Tuesday, January 31, 2012

Results for Week #3

http://unbox.org/stuff/var/jar/report-2012-01-17-1.pdf

Idea vs. 90 solo-methods, 18 datasets, ranked by Median MRE


After running Idea.awk through 8 variations (MinSize = 5, 10, 20, or 30, crossed with MedianSplit = 0 or 1) and matching it against 90 solo-methods on 18 datasets, ranking by Median MRE sorted largest to smallest on each dataset, we arrive at a median rank of 29.5 for Idea.awk (a sketch of the ranking computation follows the table).

Dataset          Idea's median rank (by MRE)
finnish 2
telecom1 9
kemerer 12
desharnaisL1 13
maxwell 13
cocomo81o 17
desharnais 19
desharnaisL2 23
miyazaki94 29
desharnaisL3 30
nasa93center5 35
albrecht 36
nasa93 36
nasa93center2 42
cocomo81e 45
nasa93center1 47
cocomo81 54
cocomo81s 75
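
For concreteness, here is a minimal sketch of that ranking computation (the data and method names are hypothetical, not from the actual experiment; MRE = |actual - predicted| / actual):

from statistics import median

# Hypothetical MREs: mres[dataset][method] -> MRE values per test case,
# where MRE = abs(actual - predicted) / actual.
mres = {
    "finnish":  {"idea": [0.30, 0.40], "m1": [0.90, 1.10], "m2": [0.50, 0.60]},
    "telecom1": {"idea": [0.70, 0.80], "m1": [0.20, 0.30], "m2": [0.40, 0.50]},
}

def ranks_for(dataset):
    """Rank every method on one dataset by its median MRE.

    This sorts largest to smallest, as in the post; flip reverse=
    if rank 1 should instead mean the lowest median MRE."""
    meds = {m: median(v) for m, v in mres[dataset].items()}
    ordered = sorted(meds, key=meds.get, reverse=True)
    return {m: i + 1 for i, m in enumerate(ordered)}

# Idea's rank on each dataset, then the median of those ranks.
idea_ranks = sorted(ranks_for(d)["idea"] for d in mres)
print(idea_ranks, median(idea_ranks))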

Aspects of Replayability Paper

Version 2!

Paper link

Monday, January 30, 2012

Active Learning Paper and Tukutuku

The active learning paper is here.

The Tukutuku work (ongoing) is here.

Delaunay Tessellation

I have a k-dimensional Delaunay tessellation algorithm based on the Bowyer-Watson algorithm. It is prototyped in Python and is terribly slow, but I believe it can be accelerated by orders of magnitude using a ball tree.
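
As one way in, here is a minimal sketch of the ball-tree idea using scikit-learn's BallTree (an assumed dependency, standing in for the hand-rolled prototype):

import numpy as np
from sklearn.neighbors import BallTree

pts  = np.random.rand(1000, 4)    # hypothetical 4-dimensional point cloud
tree = BallTree(pts)              # built once; incremental rebuilds not shown

def candidates(new_point, k=20):
    """Indices of the k points nearest new_point.

    In Bowyer-Watson, the 'bad' simplices (those whose circumspheres
    contain the new point) are built from nearby points, so the cavity
    search can start from these candidates instead of scanning every
    simplex in the tessellation."""
    _, idx = tree.query(new_point.reshape(1, -1), k=k)
    return idx[0]

print(candidates(np.random.rand(4)))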

The next step toward seeing whether I can beat NSGA-II is to solve the problem of iteratively generating new solutions that fall within the bounds represented by an individual's nearest neighbors.
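
One candidate scheme (an assumption on my part, not a settled design): take a random convex combination of the individual and its nearest neighbors, which by construction stays inside their convex hull:

import numpy as np

def sample_within(individual, neighbors, rng=np.random.default_rng()):
    """Random convex combination of an individual and its nearest
    neighbors; any such point lies within the region the neighbors
    'represent' (their convex hull)."""
    pts = np.vstack([individual, neighbors])
    w = rng.dirichlet(np.ones(len(pts)))    # non-negative weights summing to 1
    return w @ pts

ind  = np.array([0.2, 0.5, 0.1])
nbrs = np.array([[0.3, 0.4, 0.2],
                 [0.1, 0.6, 0.0]])
print(sample_within(ind, nbrs))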

Tuesday, January 24, 2012

Concept Lattice Generation

Using a tool called Lattice Miner (written in Java), I was able to generate the following lattices.

NOTE: This lattice was generated using only PD, PF, and PREC as objectives (i.e., 3 dimensions), which yielded 25 rules. Using all 6 dimensions (those three plus number of bugs found, lines of code, and bugs/LOC), I was getting 70 rules, which made the lattice chart very large and unwieldy, and therefore not ideal for the presentation:

[Lattice diagrams generated by Lattice Miner]
Limitations:
  • It doesn't allow labelling of edges, only of points.
  • No summary tool (to average the PDs, PFs, etc.) that I have found yet.
  • No command-line interface that I have found yet.

Benefits:
  • The file structure makes it very easy to generate lattice grid files on the fly.
  • Very quick.
  • Visually appealing.

Worth noting:
  • Nodes higher in the lattice (towards the top) tend to have higher PD, higher PF, and lower precision.
  • As you move down the lattice (towards more complicated rules), you start to see lower PF and lower precision, but also lower PD.

Repairing feature models


From http://www.splot-research.org/

INPUT:
<feature_model name="My feature model">    <-- feature model start tag and name attribute (mandatory)
  <meta>                                                             <-- optional
    <data name="description">Model description</data>               <-- optional
    <data name="creator">Model's creator</data>                     <-- optional
    <data name="email">Model creator's email</data>                 <-- optional
    <data name="date">Model creation date</data>                    <-- optional
    <data name="department">Model creator's department</data>       <-- optional
    <data name="organization">Model creator's organization</data>   <-- optional
    <data name="address">Model creator's address</data>             <-- optional
    <data name="phone">Model creator's phone</data>                 <-- optional
    <data name="website">Model creator's website</data>             <-- optional
    <data name="reference">Model's related publication</data>       <-- optional
  </meta>
  <feature_tree>                    <-- feature tree start tag
    :r root (root_id)               <-- root feature named 'root' with unique ID 'root_id'
      :o opt1 (id_opt1)             <-- an optional feature named opt1 with unique id id_opt1
        :o opt2 (id_opt2)           <-- an optional feature named opt2, child of opt1, with unique id id_opt2
      :m man1 (id_man1)             <-- a mandatory feature named man1 with unique id id_man1
        :g [1,*]                    <-- an inclusive-OR feature group with cardinality [1..*] ([1..3] also allowed)
          : a (id_a)                <-- a grouped feature named a with ID id_a
          : b (id_b)                <-- a grouped feature named b with ID id_b
            :o opt3 (id_opt3)       <-- an optional feature opt3, child of b, with unique id id_opt3
          : c (id_c)                <-- a grouped feature named c with ID id_c
        :g [1,1]                    <-- an exclusive-OR feature group with cardinality [1..1]
          : d (id_d)                <-- a grouped feature named d with ID id_d
          : e (id_e)                <-- a grouped feature named e with ID id_e
            :g [2,3]                <-- a feature group with cardinality [2..3]; children of feature e
              : f (id_f)            <-- a grouped feature named f with ID id_f
              : g (id_g)            <-- a grouped feature named g with ID id_g
              : h (id_h)            <-- a grouped feature named h with ID id_h
  </feature_tree>                   <-- feature tree end tag (mandatory)
  <constraints>                     <-- extra constraints start tag (mandatory)
    c1: ~id_a or id_opt2            <-- extra constraint named c1: id_a implies id_opt2 (must be a CNF clause)
    c2: ~id_c or ~id_e              <-- extra constraint named c2: id_c excludes id_e (must be a CNF clause)
  </constraints>                    <-- extra constraints end tag (mandatory)
</feature_model>                    <-- feature model end tag (mandatory)



Important:
  • Meta data tags are optional but encouraged
  • Feature tree must have at least one feature, i.e., the root node
  • Extra constraints are optional
  • Each extra constraint must be a CNF clause (any arity)
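
As a minimal sketch of how the feature-tree lines above can be read (my own reading of the format, not SPLOT's actual parser; it assumes two spaces of indentation per level):

import re

# Each feature-tree line looks like:  <indent>:<kind> [cardinality] [name] [(id)]
NODE = re.compile(r"^(?P<indent>\s*):(?P<kind>[romg]?)\s*"
                  r"(?P<card>\[[^\]]+\])?\s*"
                  r"(?P<name>\w+)?\s*"
                  r"(?:\((?P<id>\w+)\))?")

def parse_tree(text):
    """Yield (depth, kind, cardinality, name, id) for each feature line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        m = NODE.match(line)
        depth = len(m.group("indent")) // 2     # assumed: 2 spaces per level
        yield depth, m.group("kind"), m.group("card"), m.group("name"), m.group("id")

sample = """:r root (root_id)
  :o opt1 (id_opt1)
    :g [1,*]
      : a (id_a)"""
for node in parse_tree(sample):
    print(node)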

State of the art in checking these models:
  • Richard Pohl, Kim Lauenroth and Klaus Pohl, "A Performance Comparison of Contemporary Algorithmic Approaches for Automated Analysis Operations on Feature Models", IEEE ASE 2011.
  • These approaches have exponential runtimes.


Also, such analysis just says "no" and maybe offers one counterexample.


Here, by contrast, a linear-time greedy Bayes search (think: simpler than W2) generates the following output in 6 seconds, in awk, from 10,000 random models of a 300+ class system (a rough Python reconstruction follows the output):

Output:
                                                  MEDIAN      N    treatment
                                                  ------    ---    ---------
[                                ------ | ------   ] 80 | 10000    all rows
[                                     ----|  ---   ] 85 |  4945    _id_35 = 1
[                                       ---  |--   ] 90 |  2521    _id_32 = 1
[                                       ---  |--   ] 90 |  2213    _id_68 = 0
[                                       ---  |--   ] 90 |  1903    _id_63 = 0
[                                       ---  |--   ] 90 |  1427    _id_77 = 0
[                                       ---  |--   ] 90 |   711    _id_15 = 1
[                                       ---  | -   ] 90 |   355    _id_17 = 1
[                                       ---  | -   ] 90 |   170    _id_22 = 1
[                                       ---  | -   ] 90 |    86    _id_86 = 0
[                                       ---  |--   ] 90 |    47    _id_24 = 1
[                                       -----| ----] 90 |    20    _id_3 = 0 
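
A minimal reconstruction of that kind of greedy Bayes search, in Python rather than awk (the data, the scoring function, and the b*b/(b+r) ranking heuristic are my assumptions; the actual awk implementation may differ):

from statistics import median
import random

random.seed(1)

# Hypothetical stand-in: 1,000 random models over 5 boolean settings
# (the real run scored 10,000 random models from a 300+ class system).
names  = [f"_id_{i}" for i in range(5)]
models = [{s: random.randint(0, 1) for s in names} for _ in range(1000)]
scores = [50 + 30 * m["_id_3"] - 10 * m["_id_1"] + random.randint(0, 9)
          for m in models]

def greedy_bayes(models, scores, rounds=3, top=0.1):
    rows = list(zip(models, scores))
    for _ in range(rounds):
        rows.sort(key=lambda r: r[1], reverse=True)
        best = rows[:max(1, int(top * len(rows)))]

        def like(s, v):
            # b*b/(b+r): frequency in the best-scoring models, squared,
            # discounted by frequency overall.
            b = sum(m[s] == v for m, _ in best) / len(best)
            r = sum(m[s] == v for m, _ in rows) / len(rows)
            return b * b / (b + r) if b + r else 0.0

        s, v = max(((s, v) for s in names for v in (0, 1)),
                   key=lambda sv: like(*sv))
        rows = [row for row in rows if row[0][s] == v]   # commit to the constraint
        print(f"{s} = {v}: median {median(x for _, x in rows)}, n = {len(rows)}")

greedy_bayes(models, scores)

Each round commits to the single best constraint and recurses on the surviving models, which yields a ladder of treatments like the table above.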


What's fun here is that the output tells us both what to do (e.g., select _id_35) and what not to do (e.g., avoid _id_68).