
Tuesday, February 28, 2012

Drafting of the Paper

Not a whole lot of progress yet: mid-terms ate most of last week, and I lost about two days when I couldn't get Texmaker installed properly in Ubuntu.

Blarg

Tuesday, January 24, 2012

Concept Lattice Generation

Using a tool called Lattice Miner (written in Java), I was able to generate the following lattices.

NOTE: This lattice was generated using only PD, PF, and PREC as objectives (i.e., 3 dimensions), which produced 25 rules. Using all 6 dimensions I have been working with (adding number of bugs found, lines of code, and bugs/LOC), I was getting 70 rules, which made the lattice chart very large and unwieldy, and therefore not ideal for the presentation:

[lattice images omitted]

Limitations: it doesn't allow labelling of edges, only of points.
No summary feature (to average the PDs, PFs, etc.) that I have found yet.
No command line interface that I have found yet.

Benefits: the file structure makes it very easy to generate lattice grid files on the fly.
Very quick.
Visually appealing.
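
Under the hood, this kind of lattice is simple to sketch. Below is a minimal Python version using the intersection-closure method from formal concept analysis; the toy context and attribute names are made-up stand-ins for the discretized PD/PF/PREC objectives, and Lattice Miner's actual algorithm may well differ.

```python
# Toy formal context (hypothetical data): each object (rule) maps to
# the set of attributes it satisfies.
context = {
    "rule1": frozenset({"highPD", "highPF"}),
    "rule2": frozenset({"highPD", "highPREC"}),
    "rule3": frozenset({"lowPF", "highPREC"}),
}

def concepts(ctx):
    """Every intent is an intersection of object attribute sets, so
    close the set of intents under intersection, then pair each intent
    with its extent (all objects whose attributes contain it)."""
    all_attrs = frozenset().union(*ctx.values())
    intents = {all_attrs}                      # top of the lattice
    for attrs in ctx.values():
        intents |= {i & attrs for i in intents}
    return [(frozenset(g for g, a in ctx.items() if i <= a), i)
            for i in intents]

for extent, intent in sorted(concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), "->", sorted(intent))
```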

Worth noting: nodes higher in the lattice (towards the top) tend to have higher PD, higher PF, and lower precision.
As you move down the lattice (towards more complicated rules), you start to see lower PF and lower precision, but also lower PD.

Tuesday, January 17, 2012

Frustration with Concept Lattices

I've been trying to implement this:
http://www.gbspublisher.com/ijtacs/1010.pdf

Unfortunately, I just haven't been able to so far. Hopefully I'll have something by the meeting.

Tuesday, January 10, 2012

Virtual Machine and You: A match made in coding heaven

Here's the link to set up an Ubuntu virtual machine, you crazy coders you:


From here, go to Projects->Homework 1

E-mail me if you have questions.

Tuesday, October 4, 2011

Crowd pruning

Tried 2 methods:

1) Based on the ratio (max allowed / current rule count), I randomly decide whether to include each point.

2) I sort by the first dimension, then keep, in order, every nth item to get down to 100. (If n is not an integer, it selects each item that makes the cumulative "step" pass the next integer value.)
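
In sketch form, the two methods look something like this (Python; the function names are mine, and points are assumed to be tuples of objective scores rather than the real rule objects):

```python
import random

def prune_random(points, max_allowed):
    """Method 1: keep each point independently with probability
    (max allowed / current count); nothing guarantees even coverage,
    which is why gaps can open up in the frontier."""
    ratio = max_allowed / len(points)
    return [p for p in points if random.random() < ratio]

def prune_every_nth(points, target=100):
    """Method 2: sort by the first dimension, then keep an item each
    time the index crosses the next (possibly fractional) step mark."""
    pts = sorted(points, key=lambda p: p[0])
    step = len(pts) / target        # n; may be non-integer
    kept, mark = [], 0.0
    for i, p in enumerate(pts):
        if i >= mark:
            kept.append(p)
            mark += step
    return kept
```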

Using algorithm one, I saw a fair amount of performance loss, as gaps began to appear in the frontier.

[plot omitted]

However, when I used the second algorithm, I saw mostly the same level of performance as with the unpruned rules, and generated the results in minutes rather than a few hours.

[plot omitted]

For comparison, here's the Fonseca curve at generation 9 without rule pruning.

[plot omitted]

Tuesday, September 13, 2011

CITRE Data Analysis

Not shockingly, the field that is an order of magnitude larger than the others affects the data the most.

[chart omitted]

By comparison, when looking at the other fields, few of the percentages were substantially different from 20% (if, within a class, each of the possible values appeared 20% of the time, that would indicate a truly random distribution).
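
Computing those per-field shares is just a tally, something like the sketch below (the dict-per-record format and field name are illustrative assumptions, not the actual CITRE schema):

```python
from collections import Counter

def value_shares(rows, field):
    """Percentage share of each value of `field` within `rows`.
    If every possible value sits near 20%, the field is distributed
    (close to) uniformly at random within that class."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}

# e.g. value_shares(rows_for_class, "some_field") -> {"A": 21.3, ...}
```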

Data  (Big file)

As for progress on Which and NSGA-II: I've made some, though due to illness, not as much as I would like. I hope to complete both (or at least one) by next week, with some results.

Monday, August 29, 2011

Which2 Multidimensional optimizer

Immediate results using the given multi-dimensional functions are poor, but promising.

Fonseca data

[plot omitted]

All of our rules with 2-bin discretization using Which were in the tiny green square. However, the goal with Fonseca is to minimize, so being in the top right corner is very bad. By comparison, with 8-bin, our rules were mostly in the top right blue square, but we had one rule at f1=0.2497, f2=0.9575.
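
For reference, here is the standard two-objective Fonseca-Fleming benchmark (assuming that's the variant in play here); both objectives are minimized, which is why the top right corner is the worst place to be:

```python
import math

def fonseca(x):
    """Fonseca-Fleming benchmark: minimize both f1 and f2."""
    s = 1.0 / math.sqrt(len(x))
    f1 = 1.0 - math.exp(-sum((xi - s) ** 2 for xi in x))
    f2 = 1.0 - math.exp(-sum((xi + s) ** 2 for xi in x))
    return f1, f2
```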

In Kursawe

[plot omitted]

Our rules were all in the mass in the center left when I chose to optimize by maximizing. With 8-bin, the rules were more spread out than with 2-bin.
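
For reference again, the standard Kursawe benchmark, assuming the usual form; both objectives are minimized here too:

```python
import math

def kursawe(x):
    """Kursawe benchmark: minimize both f1 and f2."""
    f1 = sum(-10.0 * math.exp(-0.2 * math.sqrt(x[i] ** 2 + x[i + 1] ** 2))
             for i in range(len(x) - 1))
    f2 = sum(abs(xi) ** 0.8 + 5.0 * math.sin(xi ** 3) for xi in x)
    return f1, f2
```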

This is because 8-bin allows more detail than 2-bin. However, I posit that once I am able to recurse this process, applying the rules' constraints at each level, 2-bin will be better overall.

Further exploring these rules (by applying them as new constraints on the randomized input vectors in the data database) will involve a massive recoding. However, having applied some constraints manually using the generated rules, the results improve in round 2 (treating the runs above as round 1). You cannot simply pick one rule to explore, though. Basically, the unconstrained start point is the root of an infinite tree; the branches from each node are the rules generated by a run of Which that uses that node's rule, plus the rules of all its ancestors, as constraints on the input data. Each rule can then be mapped to coordinates in the space of (f1, f2, ..., fn). Ideally, these rules will approach the Pareto frontier.
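
In sketch form (the learner interface and rule representation below are placeholders, not the real Which code), the exploration would look something like this:

```python
def explore(data, constraints, learner, depth, max_depth):
    """Grow one level of the exploration tree: filter the input
    vectors by every ancestor rule, run the learner on what's left,
    and recurse with each new rule added as a constraint."""
    rows = [r for r in data if all(rule(r) for rule in constraints)]
    if depth >= max_depth or not rows:
        return []
    tree = []
    for rule in learner(rows):    # each rule is a predicate over a row
        children = explore(data, constraints + [rule],
                           learner, depth + 1, max_depth)
        tree.append((rule, children))
    return tree

# root of the (in principle infinite) tree: no constraints at all
# tree = explore(input_vectors, [], run_which, 0, max_depth=3)
```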

Which is running through the data very quickly, but until I have further results, which will take a massive reworking of code, I can't say anything definitive about its long-term usefulness just yet.

Tuesday, January 25, 2011

Will McBurney

Currently a graduate student in Computer Science at West Virginia University, Will McBurney is a programmer, musician, and blogger on the side. Will began programming in his senior year of high school in Charleston, WV, his hometown. Originally attending WVU as an undergraduate aspiring to become a Mechanical Engineer, Will switched majors to Computer Science midway through his freshman year.

It is no secret that Will is a major nerd. Will skipped his senior prom to go to the National Science Bowl in 2006 and has read more Star Wars books than most people have read books at all. On the inter-tubes, Will uses the pseudonym "Death by Smiley", a name made up on the spur of the moment while playing Halo in the dorms at WVU.

-Will McBurney