Basic Presentation for DM and Aptamers
for NanoSAFE fellows on Oct 31, 2012
Target: General audience
Monday, October 28, 2013
Monday, October 21, 2013
DataSet ID Comparison
All comments based on KNN performance
http://unbox.org/things/var/fayola/docs/newcliff-v6.pdf
fig. 5
Data Set --- ID --- Comment
labor-neg :: 0.68 Noticeable downtrend
glass     :: 0.75 Downtrend, with some improvement at 20%
iris      :: 1.11 Flat
hepatitis :: 2.45 Noticeable downtrend on ACC and Prec; not convinced on PD and PF
ecoli     :: 3.27 Flat (acts like the software sets for pd, prec, pf)
bcancer   :: 0.00 No results (unsure of source)
heartc    :: 0.00 No results (restricted)
lymph     :: 0.00 No results (restricted)
vote      :: 0.00 No results
Note: I would like to see the UCI data sets plotted with error bars, to check whether they resemble the software sets in any way.
http://unbox.org/things/var/brian/2013/projects/data-quality/mucker2/output/10-7-2013%20plots/
----------------
xerces1.4 :: 0.01 Flat/Not Convinced
poi-3.0   :: 1.45 Flat/Not Convinced
ivy-1.1   :: 1.74 Flat/Not Convinced
synap1.2  :: 1.89 Flat/Not Convinced
xalan2.6  :: 1.90 Flat/Not Convinced
jedit-4   :: 2.03 Flat/Not Convinced
ant-1.7   :: 2.08 Flat/Not Convinced
log4j-1.1 :: 2.67 Flat/Not Convinced
lucene2.4 :: 2.94 Flat/Not Convinced
veloci1.6 :: 3.00 Flat/Not Convinced
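For reference, the measures named in the comments above (ACC, Prec, PD, PF) can be computed from a binary confusion matrix roughly as follows. This is a minimal sketch that assumes the usual defect-prediction definitions (PD is recall, PF is the false-alarm rate); the function name and the example counts are made up for illustration and are not taken from the KNN runs.

```python
def knn_measures(tp, fp, tn, fn):
    """Return acc, prec, pd and pf for one binary confusion matrix.

    Assumed definitions (not confirmed by the linked paper):
      acc  = (tp + tn) / all
      prec = tp / (tp + fp)
      pd   = tp / (tp + fn)   # probability of detection (recall)
      pf   = fp / (fp + tn)   # probability of false alarm
    """
    def ratio(num, den):
        # Report 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    return {
        "acc": ratio(tp + tn, tp + fp + tn + fn),
        "prec": ratio(tp, tp + fp),
        "pd": ratio(tp, tp + fn),
        "pf": ratio(fp, fp + tn),
    }


# Illustrative counts only.
example = knn_measures(tp=30, fp=10, tn=50, fn=10)
```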
TODO:
*Create random data sets as per "Reflections on the NASA MDP data sets" - D. Gray
*Get my (Vasil's) rig running with the UCI data sets to see if we get similar results.
*Find sources for [bcancer, lymph, vote]
*Get to the bottom of why the noise is not a problem, if ID does not explain it.
*????
Tuesday, October 8, 2013
UPC Find
Update 5/01/14
Nutritional Facts information: we've got it for about 30% of the items.
Source / nutrition field : Unique UPCs (count, %) | Total Items (count, %)
Items matching Walmart.com : 5436 34.3% | 5904 35.0%
Items with Walmart nutrition : 4514 28.5% | 4940 29.0%
'Sodium' : 4540 28.7% | 4972 30.0%
'Total Carbohydrate' : 4540 28.7% | 4973 30.0%
'Protein' : 4514 28.5% | 4940 29.8%
'Total Fat' : 4512 28.5% | 4949 29.9%
'Sugars' : 3924 24.8% | 4251 25.7%
'Saturated Fat' : 3778 23.8% | 4228 25.5%
'Cholesterol' : 3744 23.6% | 4175 25.2%
'Trans Fat' : 3446 21.7% | 3849 23.2%
'Dietary Fiber' : 3296 20.8% | 3503 21.1%
'Potassium' : 1462 9.2% | 1506 9.1%
'Calories' : 614 3.9% | 729 4.4%
'Dietary' : 360 2.3% | 567 3.4%
'Monounsaturated Fat' : 274 1.7% | 333 2.0%
'' : 250 1.6% | 248 1.5%
'Saturated' : 64 0.4% | 69 0.4%
'Trans' : 64 0.4% | 54 0.3%
'Sugars Less Than' : 12 0.1% | 9 0.1%
'Total' : 8 0.1% | 4 0.0%
'Dietary Fiber Less Than' : 6 0.0% | 4 0.0%
'Monounsaturated Fat 1.5' : 2 0.0% | 1 0.0%
'Dietary Fiber 2' : 2 0.0% | 7 0.0%
Items matching upcdatabase.com : 4344 27.4% | 5125 30.0%
Items matching local UPC list : 2594 16.4% | 3046 18.0%
Items matching ONLY Walmart.com : 3352 21.2% | 3606 21.0%
Items matching ONLY upcdatabase : 3352 21.2% | 3606 21.0%
Items matching ONLY local list : 3352 21.2% | 3606 21.0%
Items matching all three sources: 1474 9.3% | 1717 10.0%
Items matching any source : 7886 49.8% | 8891 53.0%
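The "ONLY", "all three sources" and "any source" rows are set differences, an intersection and a union over the per-source match sets. A minimal sketch of that bookkeeping, assuming each source's matches are available as a collection of UPC strings (the function name and the toy values are hypothetical):

```python
def source_overlap(walmart, upcdb, local):
    """Count UPCs per source, exclusive to one source, in all, and in any."""
    walmart, upcdb, local = set(walmart), set(upcdb), set(local)
    return {
        "walmart": len(walmart),
        "upcdatabase": len(upcdb),
        "local": len(local),
        # A UPC is exclusive to a source when no other source matched it.
        "only_walmart": len(walmart - upcdb - local),
        "only_upcdatabase": len(upcdb - walmart - local),
        "only_local": len(local - walmart - upcdb),
        "all_three": len(walmart & upcdb & local),
        "any": len(walmart | upcdb | local),
    }


# Toy identifiers standing in for UPCs; not real data.
counts = source_overlap(
    walmart={"a", "b", "c", "d"},
    upcdb={"c", "d", "e"},
    local={"d", "f"},
)
```

Dividing each count by the number of unique UPCs (or by the total number of items, with a list in place of a set) would give the two percentage columns.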
Update 3/27/14
Criteria for match success: an item must match at least ONE of the following:
-Walmart.com result
-UPC database result + NDB match
-Food database result + NDB match
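Read as a boolean rule, the criteria above look roughly like this. It is only a sketch: the four flags are hypothetical names for "this lookup returned something", and it assumes a single NDB match flag is shared by the UPC-database and food-database paths.

```python
def is_match(walmart_hit, upcdb_hit, fooddb_hit, ndb_hit):
    """True if an item satisfies at least one of the match criteria."""
    return bool(
        walmart_hit                    # Walmart.com result on its own
        or (upcdb_hit and ndb_hit)     # UPC database result + NDB match
        or (fooddb_hit and ndb_hit)    # food database result + NDB match
    )
```

For example, a UPC-database hit without an NDB match does not count as a success under this rule.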
Graphs (some of these are repeats from lost blog posts):
Tabular Results:
top 1000 items
Update--Web Sources for Barcode Lookup:
Summary:
- Of the 50 items scanned from my fiance's kitchen, a match was found for 29 (58%).
- If beverages are removed from the list, the match rate for this set of scans rises to 28/40 (70%).
- An interesting observation: in this set of scans, ALL store-brand products failed to match.
- Partial matching of UPCs yielded poor results.
- For this set, 81% of the items which failed to match had a brand code which matched to something in the database. Perhaps brand could be used in some circumstances to make an educated guess about the properties of the item if a match cannot be made.
- positive example: Florida's Natural (low brand variance)
- negative example: Sam's Choice (high brand variance)
UPCA == TMMMMMPPPPPX
where T is type (0 for US UPC)
M is manufacturer code
P is product code
X is check digit
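A minimal sketch of splitting a code into these fields and verifying the check digit. It uses the standard UPC-A rule (digits in odd positions weighted 3, even positions weighted 1, check digit brings the total to a multiple of 10); the function names are my own and the sample code is one of the scanned items from this post.

```python
def parse_upca(code):
    """Split a 12-digit UPC-A string into its T / M / P / X fields."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("UPC-A must be exactly 12 digits")
    return {
        "type": code[0],            # T
        "manufacturer": code[1:6],  # MMMMM
        "product": code[6:11],      # PPPPP
        "check": code[11],          # X
    }


def upca_check_digit(first11):
    """Compute the check digit from the first 11 digits."""
    digits = [int(c) for c in first11]
    # Positions 1, 3, 5, ... (index 0, 2, 4, ...) are weighted by 3.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10


def is_valid_upca(code):
    """True if the last digit agrees with the computed check digit."""
    fields = parse_upca(code)
    return fields["check"] == str(upca_check_digit(code[:11]))


fields = parse_upca("078742095233")
valid = is_valid_upca("078742095233")
```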
1) UPCA => Jif Peanut Butter
2) UPCA => Nutella Hazelnut Spread with Skim Milk & Cocoa
3) UPCA => No Match 078742095233
4) UPCA => No Match 011110833303
5) UPCA => No Match 041498127824
6) UPCA => No Match 044000031138
7) UPCA => Bush's Best Baked Beans
8) UPCA => Kraft Dinners Easy Mac
9) UPCA => No Match 038000844966
10) UPCA => Campbell's Pasta
11) UPCA => Betty Crocker Instant Potatoes
12) UPCA => Campbell's R&W Condensed Soup
13) UPCA => No Match 085000016176
14) UPCA => No Match 085000019894
15) UPCA => No Match 081172780006
16) UPCA => Smart Balance Cooking Spray
17) UPCA => No Match 031200002945
18) UPCA => No Match 078742351896
19) UPCA => Kraft Philadelphia Cream Cheese Spread
20) UPCA => No Match 016300151304
21) UPCA => No Match 070847000037
22) UPCA => Coca-Cola Cola
23) UPCA => No Match 087692591009
24) UPCB => UPCA 8857378 => 088573000078
24) UPCA => No Match 088573000078
25) UPCA => Sweet Baby Ray's Barbecue Sauce
26) UPCB => UPCA 1364008 => 013000006408
26) UPCA => Heinz Ketchup
27) UPCA => French's Classic Yellow Mustard
28) UPCA => Hellmann's Mayonnaise
29) UPCA => DiGiorno Pizza & Breadsticks
30) UPCA => Birds Eye Steamfresh Corn
31) UPCA => No Match 011110673565
32) UPCA => Lance Toastchee
33) UPCA => No Match 078742434377
34) UPCA => McCormick Grill Mates Seasoning
35) UPCA => Sun Chips Flavored Multigrain Snack
36) UPCA => Doritos Tortilla Chips
37) UPCA => No Match 050000497256
38) UPCA => Smucker's Preserves
39) UPCA => No Match 011110786715
40) UPCA => Tostitos Dip
41) UPCA => Prego Italian Sauce
42) UPCA => Quaker Oatmeal Instant Oatmeal
43) UPCA => Swiss Miss Hot Cocoa Mix
44) UPCA => Swanson RTS Broth
45) UPCA => No Match 078742030104
46) UPCA => No Match 011110492630
47) UPCA => Betty Crocker Loaded Mashed
48) UPCA => No Match 072736014880
49) UPCA => Duncan Hines Cake Mix
50) UPCA => Knorr Side Dishes Fiesta Sides
match: 29/50 58.0%
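Items 24 and 26 in the list above were scanned as 7-digit codes (labeled UPCB) and expanded to UPC-A before lookup. Both expansions shown are consistent with the standard UPC-E zero-suppression rules, assuming the 7 digits are six data digits plus a check digit with an implied leading 0. Whether the scanner software used exactly this routine is an assumption; the function name is hypothetical.

```python
def expand_short_upc(code7):
    """Expand six zero-suppressed digits plus a check digit to UPC-A.

    Assumes number system 0 (the leading digit is not in the input).
    """
    if len(code7) != 7 or not (code7.isascii() and code7.isdigit()):
        raise ValueError("expected 7 digits: six data digits plus a check digit")
    d, check = code7[:6], code7[6]
    last = d[5]
    if last in "012":
        # Manufacturer code ends in X00, product code is 00NNN.
        manufacturer, product = d[0:2] + last + "00", "00" + d[2:5]
    elif last == "3":
        manufacturer, product = d[0:3] + "00", "000" + d[3:5]
    elif last == "4":
        manufacturer, product = d[0:4] + "0", "0000" + d[4]
    else:
        # 5 to 9: full 5-digit manufacturer code, product is 0000N.
        manufacturer, product = d[0:5], "0000" + last
    return "0" + manufacturer + product + check


item_24 = expand_short_upc("8857378")
item_26 = expand_short_upc("1364008")
```

With these rules the two short codes expand to 088573000078 and 013000006408, the same UPC-A values listed for items 24 and 26.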
Items not found:
Note: one-off, two-off are the number of database items which differ by one or two characters
UPCA | FoundBrand | one-off | two-off | type (* indicates store-brand)
078742095233 | T | 0 | 2 | trail mix*
011110833303 | T | 0 | 0 | trail mix*
041498127824 | T | 0 | 0 | baking soda*
044000031138 | T | 0 | 9 | ritz snack packs
038000844966 | T | 0 | 0 | pringles
085000016176 | F | 0 | 2 | gin
085000019894 | F | 0 | 0 | wine
081172780006 | T | 0 | 0 | gummy candy
031200002945 | T | 0 | 18 | craisins
078742351896 | T | 0 | 2 | skim milk*
016300151304 | T | 0 | 4 | orange juice
070847000037 | T | 0 | 2 | energy drink
087692591009 | F | 0 | 0 | beer
088573000078 | F | 0 | 0 | beer
011110673565 | T | 0 | 0 | jello*
078742434377 | T | 0 | 1 | paprika*
050000497256 | T | 0 | 0 | coffee mate
011110786715 | T | 0 | 0 | applesauce*
078742030104 | T | 0 | 1 | chicken breast*
011110492630 | T | 0 | 2 | cola*
072736014880 | T | 0 | 0 | vinaigrette
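The columns in this table can be reproduced with a character-by-character comparison against the database. The sketch below assumes that "FoundBrand" means some database entry shares the first six digits (type plus manufacturer code) and that one-off / two-off mean exactly one or two differing positions; the function names and the toy database are made up. Note that the one-off column being zero throughout is expected if every database entry is a valid UPC-A: changing any single digit changes the check digit, so two valid codes always differ in at least two positions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def near_miss_report(upc, database):
    """Summarise how close an unmatched UPC is to known database entries."""
    brand = upc[:6]  # assumed brand key: type digit + manufacturer code
    distances = [hamming(upc, known) for known in database]
    return {
        "found_brand": any(known[:6] == brand for known in database),
        "one_off": sum(1 for dist in distances if dist == 1),
        "two_off": sum(1 for dist in distances if dist == 2),
    }


# Toy database entries, invented for illustration.
toy_db = ["078742095226", "078742000000", "999999999999"]
report = near_miss_report("078742095233", toy_db)
```

Applied to the table above, 17 of the 21 unmatched items have FoundBrand = T, which is where the 81% figure in the summary comes from.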
List of Items Scanned:
1) Jif reduced fat PB
2) Nutella
3) tropical Trail Mix
4) traditional trail mix
5) baker's corner baking soda
6) ritz fresh snacks
7) bush's baked beans
8) kraft Easy mac
9) pringles original
10) SpaghettiOs meatballs
11) betty crocker sour cream and chives
12) campbell's chicken noodle
13) New Amsterdam Gin
14) Barefoot Red moscato
15) PVZ gummies
16) smart balance cooking spray
17) craisins
18) great value skim milk
19) philadelphia cream cheese
20) florida's natural
21) monster absolutely zero
22) coke 2 litre
23) sam adams latitude 48
24) shiner bock
25) sweet baby ray's
26) heinz
27) french's
28) hellmann's
29) pizza and breadsticks
30) bird's eye corn
31) kroger strawberry jello
32) lance toast chee
33) great value paprika
34) grill mates mesquite spice
35) sun chips
36) doritos
37) coffee mate french vanilla
38) smuckers strawberry preserves
39) kroger applesauce
40) tostitos creamy spanish
41) prego meat
42) quaker fruit n cream
43) swiss miss
44) swanson chicken broth
45) great value chicken breast
46) big k vanilla cola
47) betty crocker loaded mash
48) vinaigrette
49) duncan hines spice cake
50) knorr taco rice
Tuesday, October 1, 2013
WMC3CDA Results
Shown here are results for GALE/NSGA-II/SPEA2 for a single run each on the CDA simulation for the WMC3 framework (i.e. WMC3CDA).
Note that each run evaluates every member of every population for comparison purposes, even though GALE does not need every evaluation for its optimization. As a result, each algorithm takes about 4 hours per run.
The results below are for a single run of each algorithm. A more complete analysis would need at least 10 runs, and about 20 would be the minimum for a solid statistical comparison.
Results:
Dot Plots: 1
http://i.imgur.com/jyfLyDH.png
Line Plots:
GALE is on top. The bottom plots have poorly chosen plot markers, so read carefully.
http://i.imgur.com/LxkL6eh.png
Comments:
Although GALE reaches some lows not seen by the other algorithms, GALE has much larger variance, which could theoretically be fixed by removing random jiggles.
Note that the adjustments in making GALE deterministic (removing all randoms) are not used in these results.
What's Next:
Need more Decision Variables!
- Different landing routes
- Radar Tower parameters?
Prune some objectives away? Focus on time lost instead of number of delayed/interrupted tasks.
NASA furloughs - no access to their servers for now. Need to get stuff working on my machines.