Tuesday, February 18, 2014

Flow Chart and List of Ideas

Flow chart:


  • Build clusters by recursing spectrally on Darren's data.
  • Run CART on those clusters to find and list the conditions that differ between clusters.
  • For each cluster, run a12 and bootstrap to find other clusters that are better: better on at least one dependent variable and worse on none.
  • Find the differences in conditions when traversing from one cluster to another along the CART branches.
  • Generate a new dataset and repeat to see whether those conditions actually matter.
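The "better on at least one, worse on none" step can be sketched roughly as below. This is a minimal sketch, not the code actually used: it assumes a12 means the Vargha-Delaney A12 effect size, uses a plain percentile bootstrap on the difference in means (the real bootstrap may differ), assumes larger scores are better, and the 0.6 threshold and all function names are illustrative choices.

```python
import random

def a12(xs, ys):
    """Vargha-Delaney A12: chance that a value from xs exceeds one from ys (ties count half)."""
    more = sum(1 for x in xs for y in ys if x > y)
    same = sum(1 for x in xs for y in ys if x == y)
    return (more + 0.5 * same) / (len(xs) * len(ys))

def bootstrap_differs(xs, ys, rounds=1000, conf=0.95, seed=1):
    """Percentile bootstrap on the difference in means.
    Returns True if the confidence interval excludes zero."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(rounds):
        bx = [rng.choice(xs) for _ in xs]  # resample with replacement
        by = [rng.choice(ys) for _ in ys]
        diffs.append(sum(bx) / len(bx) - sum(by) / len(by))
    diffs.sort()
    lo = diffs[int((1 - conf) / 2 * rounds)]
    hi = diffs[int((1 + conf) / 2 * rounds) - 1]
    return lo > 0 or hi < 0

def better(cand, base, small=0.6):
    """cand, base: dict of dependent variable -> list of scores.
    Assumes larger is better; `small` is an arbitrary effect-size threshold."""
    wins = 0
    for dep in base:
        if not bootstrap_differs(cand[dep], base[dep]):
            continue  # statistically indistinguishable on this variable
        effect = a12(cand[dep], base[dep])
        if effect >= small:
            wins += 1
        elif effect <= 1 - small:
            return False  # worse on this variable
    return wins > 0
```

Calling `better(cluster_b, cluster_a)` for every pair of clusters gives, for each cluster, the list of others worth moving toward.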


Ideas:

  1. ***Label clusters with how close they are.
  2. ***Round floating-point numbers to 2 decimal places.
  3. ***For each dependent variable, run a12 and then bootstrap to see whether the clusters are actually different.
  4. ***Generate conditions for each branch and find the differences between them.
  5. Change criterion to "gini" and check results.
  6. Don't split the clusters if the neighbors' scores are not statistically significantly different.
  7. Train on N things nearest to the centroids of each cluster and test on all.
Note: items marked *** are done.
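Idea 7 could look something like the following. It is only a sketch under assumptions: rows are plain lists of numbers, distance is Euclidean, and the names `centroid`, `nearest_to_centroid` and `training_set` are made up for illustration. The resulting (row, label) pairs would be the training data; testing would then use every row in every cluster.

```python
import math

def centroid(rows):
    """Mean of each column across the rows of one cluster."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_to_centroid(rows, n):
    """The n rows of a cluster closest (Euclidean) to its centroid."""
    c = centroid(rows)
    return sorted(rows, key=lambda r: math.dist(r, c))[:n]

def training_set(clusters, n):
    """clusters: dict of cluster label -> list of rows.
    Returns (row, label) pairs using only the n most central rows per cluster."""
    return [(row, label)
            for label, rows in clusters.items()
            for row in nearest_to_centroid(rows, n)]
```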

Current work:

Working on the spectrally generated clusters: using WHICH to see whether the rules it finds on the variables between clusters are the same as the ones produced by CART. The goal is to check whether CART is right.
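One simple way to score that agreement is sketched below. It assumes the output of each learner has already been reduced to a set of (variable, operator, value) conditions; that representation is an assumption for illustration, not the native output format of WHICH or CART, and the variable names in the usage example are hypothetical.

```python
def agreement(rules_a, rules_b):
    """Compare two collections of (variable, operator, value) conditions.
    Returns what they share, what is unique to each, and the Jaccard overlap."""
    a, b = set(rules_a), set(rules_b)
    shared, union = a & b, a | b
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

# Hypothetical conditions, for illustration only.
cart_rules = {("loc", ">", 100.0), ("cbo", "<=", 5.0)}
which_rules = {("loc", ">", 100.0), ("rfc", ">", 20.0)}
report = agreement(cart_rules, which_rules)
```

A Jaccard score near 1 would suggest the two learners pick out the same conditions; a low score points at the conditions in `only_a` and `only_b` as the ones to inspect.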
