Tuesday, February 28, 2012
Blarg
Not a whole lot yet because of mid-terms last week, plus I lost about two days when I couldn't get Texmaker installed properly in Ubuntu.
Tuesday, January 24, 2012
Concept Lattice Generation
Using a tool called Lattice Miner (written in Java), I was able to generate the following lattices.
NOTE: This lattice was generated using only PD, PF, and PREC as objectives (i.e., 3 dimensions), which produced 25 rules. Using all 6 dimensions I have been working with (those three plus number of bugs found, lines of code, and bugs/LOC), I was getting 70 rules, which made the lattice chart very large and unwieldy, and therefore not ideal for the presentation.
Limitations: it doesn't allow labelling of edges, only of points.
I haven't found a summary feature (to average the PDs, PFs, etc.) in this tool yet.
No command-line interface that I have found yet.
Benefits: the file structure makes it very easy to generate lattice grid files on the fly.
Very quick.
Visually appealing.
Worth noting: lattices higher on the tree (towards the top) tend to have higher PD, higher PF, and lower precision.
As you move down the tree (towards more complicated rules), you start to see lower PF and lower precision, but also lower PD.
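For anyone who wants to poke at this outside Lattice Miner, here's a minimal Python sketch of how formal concepts can be enumerated from a binary object-attribute context. The context, rule names, and attribute names below are made up for illustration, and this is not Lattice Miner's file format or API:

```python
from itertools import combinations

# Toy binary context: each object (a rule, say) mapped to the attributes it has.
# Objects and attribute names here are purely illustrative.
context = {
    "rule1": {"highPD", "highPF"},
    "rule2": {"highPD", "lowPREC"},
    "rule3": {"highPD", "highPF", "lowPREC"},
    "rule4": {"lowPREC"},
}
all_attributes = set().union(*context.values())

def intent(objects):
    """Attributes shared by every object in the set."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else set(all_attributes)

def extent(attributes):
    """Objects that have every attribute in the set."""
    return {o for o, attrs in context.items() if attributes <= attrs}

# Brute-force enumeration: closing every subset of objects yields a formal concept.
concepts = set()
objs = list(context)
for r in range(len(objs) + 1):
    for subset in combinations(objs, r):
        b = intent(set(subset))   # shared attributes of the subset
        a = extent(b)             # all objects carrying those attributes
        concepts.add((frozenset(a), frozenset(b)))

for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), "->", sorted(b))
```

The concepts printed here, ordered by extent size, are exactly the nodes a tool like Lattice Miner draws; the edges come from the subset ordering between extents.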
Tuesday, January 17, 2012
Frustration with Concept Lattices
I've been trying to implement this:
http://www.gbspublisher.com/ijtacs/1010.pdf
Unfortunately, I just haven't been able to get it working so far. Hopefully I'll have something by the meeting.
Tuesday, January 10, 2012
Virtual Machine and You: A match made in coding heaven
Here's the link to set up an Ubuntu virtual machine, you crazy coders you:
From here, go to Projects->Homework 1
E-mail me if you have questions.
Tuesday, October 4, 2011
Crowd pruning
Tried 2 methods:
1) Based on the ratio of (max allowed / current rule count), I randomly decide whether to include each point.
2) I sort by the first dimension, then add every nth item in order until I'm down to 100. (If n is not an integer, it selects each item that makes the running "step" cross the next integer value.)
Using algorithm one, I saw a fair amount of performance loss, as gaps began to appear in the frontier.
However, when I used the second algorithm, I saw mostly the same level of performance as the data where I didn't crowd-prune the rules, and generated the results in minutes rather than a few hours.
For comparison, here's the Fonseca curve at generation 9 without rule pruning.
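For reference, here is a rough Python sketch of the two pruning methods described above. The cap of 100 and sorting by the first dimension come from the description; the point format and everything else are assumptions rather than the actual code:

```python
import random

def prune_random(points, max_allowed):
    """Method 1: keep each point with probability max_allowed / len(points)."""
    if len(points) <= max_allowed:
        return list(points)
    keep_prob = max_allowed / len(points)
    return [p for p in points if random.random() < keep_prob]

def prune_every_nth(points, max_allowed):
    """Method 2: sort by the first dimension, then take every nth point,
    stepping by a (possibly fractional) stride so roughly max_allowed survive."""
    if len(points) <= max_allowed:
        return list(points)
    ordered = sorted(points, key=lambda p: p[0])
    stride = len(ordered) / max_allowed          # n may be non-integer
    picked, next_step = [], 0.0
    for i, p in enumerate(ordered):
        if i >= next_step:                       # crossed the next "step" boundary
            picked.append(p)
            next_step += stride
    return picked

# Example with fake 2-D points; the real data are rule scores in objective space.
pts = [(random.random(), random.random()) for _ in range(1000)]
print(len(prune_random(pts, 100)), len(prune_every_nth(pts, 100)))
```

The second method keeps the pruned set spread across the sorted range, which matches why it left fewer gaps in the frontier than random dropping.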
Tuesday, September 13, 2011
CITRE Data Analysis
Not shockingly, the field that is an order of magnitude larger than the others affects the data the most.
By comparison, when looking at the other fields, few of the percentages were substantially different from 20% (if, within a class, each of the possible values had a 20% share, that would indicate a truly random distribution).
Data (Big file)
As for progress on Which and NSGA-II, I've made some, though due to illness not as much as I would like. Hopefully I'll complete both (or at least one) by next week, with some results.
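To make the 20% baseline concrete, here's a minimal sketch of that kind of per-class tally. The CSV layout, file name, and column names are hypothetical; only the idea of comparing each value's share against 20% comes from the analysis above:

```python
import csv
from collections import Counter, defaultdict

def share_by_class(path, class_col, field_col):
    """Print how far each field value's share is from the 20% uniform baseline,
    broken down by class."""
    counts = defaultdict(Counter)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[class_col]][row[field_col]] += 1
    for cls, tally in counts.items():
        total = sum(tally.values())
        for value, n in sorted(tally.items()):
            pct = 100.0 * n / total
            flag = "  <-- far from 20%" if abs(pct - 20.0) > 5.0 else ""
            print(f"{cls}\t{value}\t{pct:.1f}%{flag}")

# share_by_class("citre.csv", "class", "ReviewDocs2")  # hypothetical file/column names
```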
Tuesday, August 30, 2011
Results on CITRE from last semester
5 "bands" further examination
ReviewDocs2 = 250
ReviewDocs2 = 275
ReviewDocs2 = 300
ReviewDocs2 = 325
ReviewDocs2 = 350
Monday, August 29, 2011
Which2 Multidimensional optimizer
Immediate results using the given multi-dimensional functions are poor, but promising
Fonseca data
All of our rules with 2-bin discretization using Which were in the tiny green square. However, the goal with Fonseca is to minimize, so being in the top right corner is very bad. By comparison, with 8-bin, our rules were mostly in the top right blue square, but we had one rule with coordinates f1=0.2497, f2=0.9575.
In Kursawe
Our rules were all in the mass in the center left when I chose maximize as the optimization goal. With 8-bin, the rules were more spread out than with 2-bin.
This is because 8-bin allows more detail than 2-bin. However, I posit that once I am able to recurse this process, applying the rules' constraints, 2-bin will be better overall.
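For reference, the Fonseca-Fleming and Kursawe benchmarks have the standard textbook forms sketched below in Python; the vector length and sampling ranges here are illustrative assumptions, not necessarily the exact settings used above:

```python
import math
import random

def fonseca(x):
    """Fonseca-Fleming: minimize both f1 and f2; n-dimensional input."""
    n = len(x)
    s1 = sum((xi - 1.0 / math.sqrt(n)) ** 2 for xi in x)
    s2 = sum((xi + 1.0 / math.sqrt(n)) ** 2 for xi in x)
    return 1.0 - math.exp(-s1), 1.0 - math.exp(-s2)

def kursawe(x):
    """Kursawe: minimize both objectives; needs len(x) >= 2."""
    f1 = sum(-10.0 * math.exp(-0.2 * math.sqrt(x[i] ** 2 + x[i + 1] ** 2))
             for i in range(len(x) - 1))
    f2 = sum(abs(xi) ** 0.8 + 5.0 * math.sin(xi ** 3) for xi in x)
    return f1, f2

# Random input vectors of the kind a rule learner could be trained on
# (the range and dimensionality are assumptions).
for _ in range(3):
    x = [random.uniform(-4.0, 4.0) for _ in range(3)]
    print(fonseca(x), kursawe(x))
```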
Further exploring these rules (by applying them as new constraints on the randomized input vectors in the data database) will involve a massive recoding. However, having applied some constraints manually using the generated rules, the results improve in round 2 (treating this as round 1). However, you cannot simply pick one rule to explore. Basically, your unconstrained start point is the head of an infinite tree; the branches from each node are the rules generated by running Which with that node's and all of its ancestors' rules as constraints on the input data. The rules can then be mapped to coordinates in the space of (f1, f2, ..., fn). Ideally, these rules will approach the Pareto frontier.
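A very rough sketch of that recursion is below. The run_which(data) callable and the rule.objectives()/rule.matches(row) methods are hypothetical stand-ins, not the real Which interface:

```python
def explore(data, run_which, depth, max_depth, frontier):
    """Recursive exploration sketch: each node's rules become constraints on the
    data for its children, so ancestor constraints accumulate down the tree.
    `run_which` is a hypothetical callable returning rule objects that expose
    `objectives()` (their (f1, ..., fn) coordinates) and `matches(row)`."""
    if depth >= max_depth or not data:
        return
    for rule in run_which(data):
        frontier.append(rule.objectives())     # map the rule into objective space
        constrained = [row for row in data if rule.matches(row)]
        explore(constrained, run_which, depth + 1, max_depth, frontier)

# Usage idea (placeholders only, not the real Which interface):
# frontier = []
# explore(input_vectors, run_which, depth=0, max_depth=3, frontier=frontier)
```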
Which is running through the data very quickly, but until I have further results, which will take a massive reworking of the code, I can't say anything definitive about its long-term usefulness just yet.
Tuesday, January 25, 2011
Will McBurney

It is no secret that Will is a major nerd. Will skipped his senior prom to go to the National Science Bowl in 2006 and has read more Star Wars books than most people have read books at all. On the inter-tubes, Will uses the pseudonym "Death by Smiley", a name made up on the spur of the moment while playing Halo in the dorms at WVU.
-Will McBurney