Worked example of Ordonez distance algorithm
Code snippet - shows code and marks the place where I have verified the algorithm
pseudoCode up to verified line
awk code where I'm hung up
Must do: decide on a class that meets requirements; hard, as there are very few open
Thursday, July 26, 2012
Tuesday, July 24, 2012
Erin's To Dos
Ellington NN clustering, preliminary
Continue working on Binary Sparse NN implementation
I think the centroids need to be binary with a stdDev, instead of averages of the points.
Nearly every instance is clustered, and without a stdDev cutoff, unknown instances are not being identified
Double check methods to make sure they are working as I think they should
Start GRE studying - exam July 31
Schedule
Thursday, July 19, 2012
POM
POM Learner Results:
Software Project Performance Metrics: http://i.imgur.com/ERu9M.png
Learner Metrics: http://i.imgur.com/AJoq1.png
Cluster SA results
Results/Comparisons:
https://docs.google.com/document/d/1E0lwTCm-GDijJUjmfu_aA0EwuLCkm2B2rikJ2hoh-MM/edit
How to measure success/compare methods?
The NSGA-II paper [1] used two performance measures, neither of which was AUC. Since these models were widely used, the authors took 500 known, evenly distributed points on the Pareto front and measured the average distance from each resulting point to the nearest of them; this was the first metric. The second metric measured the spread of the obtained solutions across the Pareto front, calculated with a given algorithm.
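A minimal sketch of the two metrics, as I read them from the paper. It assumes minimization, two objectives, and Euclidean distance; the function names and the sample points are illustrative, not taken from the paper or from our code.

```python
import math

def convergence_metric(obtained, reference_front):
    """Mean distance from each obtained point to its nearest reference point."""
    nearest = [min(math.dist(p, r) for r in reference_front) for p in obtained]
    return sum(nearest) / len(nearest)

def spread_metric(obtained, extreme_first, extreme_last):
    """Spread of a two-objective front (needs at least two obtained points).

    0 means the points are evenly spaced and reach both extremes of the true front.
    """
    pts = sorted(obtained)  # order along the front by the first objective
    gaps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
    mean_gap = sum(gaps) / len(gaps)
    d_f = math.dist(extreme_first, pts[0])   # gap to one extreme of the true front
    d_l = math.dist(extreme_last, pts[-1])   # gap to the other extreme
    deviation = sum(abs(g - mean_gap) for g in gaps)
    return (d_f + d_l + deviation) / (d_f + d_l + len(gaps) * mean_gap)

# Made-up points, for illustration only.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(convergence_metric([(0.0, 2.0), (1.0, 0.0)], reference))
print(spread_metric(reference, (0.0, 1.0), (1.0, 0.0)))
```

Lower is better for both numbers, so either one could serve as a common yardstick when comparing SA, DE and GA runs on the same model.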
REFERENCES:
[1] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, April 2002
Tuesday, July 17, 2012
FpFSS slides
Leverages Association Learning (FP Growth) and Clustering (EM) to create a predictive data model in an unclassified database where the numbers of rows and columns are similar. The model is then applied to a related time series database, where cluster concentrations can be predicted for future time values.
Erin's To Dos
Labels:
Association Learning,
Clustering,
ErinM,
Fp Growth,
FSS,
Regression,
Sequence Mining,
Time Series
Local Cluster SA
Only SA for Constr (really bad):
[1] Baseline from NSGA-II for Constr:
SA on cluster for Constr (top: all points, bottom: only dominating):
Other runs for SA on cluster for Constr, only dominating:
Conclusion: Running SA with clusters was much, MUCH better, but could use some improvement. By limiting SA to within a cluster, we get points that aren't going to be on the Pareto frontier.
Up Next: DE
REFERENCES
[1] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, April 2002
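The "only dominating" plots in the Local Cluster SA post keep just the points that no other point dominates. A minimal sketch of that filter, assuming every objective is minimized; the function names and sample points are illustrative and are not taken from the cluster SA code.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up points, for illustration only.
print(non_dominated([(1, 5), (2, 2), (3, 3), (5, 1)]))
```

This is the simple quadratic-time version; it is fine for plotting a few hundred points, but a full optimizer would want something faster.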
POM
Todo List: https://docs.google.com/spreadsheet/ccc?key=0AolajDUgsGZ7dGF0Y2ppYk9XS0g5aTQ3bFRVbjRXSmc
POM: Portman Owens Menzies
What is it: A software project emulator model. It lets full projects that take 200 days or more complete in mere seconds. It can gather a variety of statistics, such as days to complete, money spent, and many more. I've coded a version of it for use with a learner.
How it works: My POM model runs on Actory, which is a finite state machine of sorts, where each team/person in the development project is a different machine. We also add a project manager and an "assigner", whose job is to decide which task is best for the team/person.
Coded in: Python
Reason for Building POM: The transitions between machines in Actory have priorities. The main goal of POM was to use a learner (BORE, "best or rest") to learn the best transition priorities in Actory.
Methodology for Learning: We run POM 1000 times to generate average statistics and then package them with the currently used (random) transition priorities. This package gets sent to the learner, which spits out some data analysis on what the best transition priorities should be. After learning the best transition priorities, we run POM another 1000 times, regenerate the statistics, and compare them with the earlier ones to see if anything improved.
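The averaging-and-packaging step can be sketched roughly as follows. Everything here is an assumption made for illustration: `run_pom` is a hypothetical stand-in that returns made-up numbers rather than real POM output, and `average_stats` and `package_for_learner` are illustrative names, not functions from the actual POM code.

```python
import random

STATS = ("days", "s1", "s2", "s3", "s4")

def run_pom(priorities, rng):
    """Hypothetical stand-in for one POM run; returns made-up numbers, not real POM output."""
    return {name: rng.random() * sum(priorities) for name in STATS}

def average_stats(priorities, runs=1000, seed=0):
    """Run the simulator `runs` times and average each statistic."""
    rng = random.Random(seed)
    totals = dict.fromkeys(STATS, 0.0)
    for _ in range(runs):
        result = run_pom(priorities, rng)
        for name in STATS:
            totals[name] += result[name]
    return {name: total / runs for name, total in totals.items()}

def package_for_learner(priorities, runs=1000):
    """Bundle the priorities with their averaged statistics as one row for a learner."""
    return {"priorities": list(priorities), **average_stats(priorities, runs)}

print(package_for_learner([1, 2, 3], runs=10))
```

Each call produces one row for the learner; repeating it over many random priority settings gives the table that the learner then splits into "best" and "rest".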
Data Results: The five statistics used are as follows:
- - - days = Days to Complete Project
- - - s1 = Money per Day Spent
- - - s2 = Money per LOC
- - - s3 = Days per LOC
- - - s4 = Average time spent IDLE for a team/person
Before learning:
- - - days = 269
- - - s1 = 1240
- - - s2 = 10.25
- - - s3 = 0.0083
- - - s4 = 0.5004
After learning:
- - - days = 268
- - - s1 = 1242
- - - s2 = 10.22
- - - s3 = 0.0082
- - - s4 = 0.4001
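To make the before/after comparison easier to read, the relative change in each statistic can be computed directly from the numbers above. A small sketch; the dictionary and function names are just illustrative.

```python
before = {"days": 269, "s1": 1240, "s2": 10.25, "s3": 0.0083, "s4": 0.5004}
after = {"days": 268, "s1": 1242, "s2": 10.22, "s3": 0.0082, "s4": 0.4001}

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

changes = {name: percent_change(before[name], after[name]) for name in before}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

Idle time (s4) drops by roughly 20%, while every other statistic moves by less than 1.5%, so the learned priorities mostly reduce idling rather than shorten the project.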
Brooks's Law: Adding members to a project at a late phase in the game will only make it later. We test this in POM by allowing teams/persons to gain experience and become better coders the more they work on the project. We test the effects and prove Brooks's Law by running POM 35 times and recording the number of days the project takes when teams/persons can be added at different phases of its completion. The following chart depicts the results, and indicates a steady increase in the days when members can be added earlier in the development.
http://i.imgur.com/frqkH.png
Y Axis: Days
X Axis: (0 to 100%) Percent of the Project Completed (Teams/Persons can only be added to the project when it is this much complete)
Thursday, July 12, 2012
Wednesday, July 11, 2012
To-Do List:
- Summer Report 3: researching MOEA performance metrics, testing NIS active breeding pool updating, new performance metrics?
- Summer Report 4: researching niching techniques, testing a new idea for a niching technique against those in the literature, new MOEA?
- Thesis: compiling summer research results into a thesis
- Paper? I'm pretty sure that my Non-dominated Insertion Sort will make MOEAs converge faster. I think that the algorithm running times are worth publication on their own. If it also accelerates convergence, I think the resulting algorithm could be called NSGA-III (or NISGA).
- Prepare for job hunt: After this, I'd like to prepare for getting a job. I'd like to do some research that can relate to landing a data mining or game coding job.
Jared Update
**UPDATED**
Notes:
Dominance eastwest heuristic worked well
splitting while y decreasing was very prone to wacky results (some clusters of 10, some of 2)
next up:
reorganize and rethink code (very hard to run SA on both models and real data as is)
get some/any pareto graph out for a model
DE & GA
Erin's update
To Dos
With paper details
Slope as Estimator of Cumulative Rule Percentages
These charts show the cumulative rule percentages for Atrazine and Bromacil. There is a graph displaying all of the rules for both chemicals. The other graphs are an example of using the slope of the cumulative rule percentage to estimate the rule percentage in a later round of experimentation. The 'A' rule set is used for both examples.
The example shows that the slope of the line in early rounds, 6 for Atrazine and 3 for Bromacil, provides a good estimate that could allow experimenters to jump forward in the aptamer discovery process by several rounds.
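The idea of using an early-round slope to estimate a later round can be sketched as a straight-line fit followed by extrapolation. The post does not say how the slope was computed, so ordinary least squares is an assumption here, and the sample values are made up; they are not the Atrazine or Bromacil data.

```python
def fit_line(rounds, percentages):
    """Ordinary least-squares slope and intercept for percentage as a function of round."""
    n = len(rounds)
    mean_x = sum(rounds) / n
    mean_y = sum(percentages) / n
    sxx = sum((x - mean_x) ** 2 for x in rounds)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rounds, percentages))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_at(round_number, rounds, percentages):
    """Extrapolate the fitted line to a later round."""
    slope, intercept = fit_line(rounds, percentages)
    return slope * round_number + intercept

# Made-up early-round values, for illustration only.
early_rounds = [1, 2, 3]
early_percentages = [3.0, 5.0, 7.0]
print(estimate_at(6, early_rounds, early_percentages))
```

A straight line only makes sense while the cumulative percentage is still growing steadily; once the curve flattens, the extrapolation will overshoot.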
Tuesday, July 10, 2012