Wednesday, January 9, 2013

Erin's update

Here is the current pseudocode and code for my algorithm.  BicBin is the algorithm that I will run comparisons against.


PSEUDOCODE

loop over (confidence levels, or coverage)

# ATTRIBUTE SELECTION
run FP Growth
extract unique 4-mers from results
compile the list of selected attribute numbers
derive unselected attribute numbers

# CLUSTERING
EM cluster db using selected attributes
label rows by the clusters
parse results


# CLASSIFIED TESTING
divide db into 10 bins
run NaiveBayes, JRip, OneR, Prism, ZeroR, J48 using cluster labels as the class

keep clusters that are greater than the db average (effect size)
AND that pass the pd/pf calculation you gave me today for classification

record the cluster patterns, and rows and columns

change 1s in the clusters to 0s (non overlapping biclusters)

end loop
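
The last step of the loop (zeroing out the cells of a recorded cluster so later passes cannot reuse them) could look roughly like the sketch below. It assumes the data is available as plain comma-separated 0/1 rows with no ARFF header, and that the cluster is described by 1-based row and column numbers. The function name `mask_bicluster` is made up for illustration and is not part of the script further down.

```shell
#!/bin/bash
# mask_bicluster ROWS COLS  (hypothetical helper, not from the original script)
# ROWS and COLS are comma-separated lists of 1-based row and column numbers.
# Reads 0/1 CSV rows on stdin and writes them back out with every cell of
# the bicluster (ROWS x COLS) set to 0.
mask_bicluster() {
  awk -F, -v OFS=, -v rows="$1" -v cols="$2" '
    BEGIN {
      n = split(rows, r, ",")
      for (i = 1; i <= n; i++) inrow[r[i]] = 1   # remember the cluster rows
      m = split(cols, c, ",")                    # cluster column numbers
    }
    (NR in inrow) {
      for (j = 1; j <= m; j++) $(c[j]) = 0       # zero the cluster cells
    }
    { print }
  '
}

# Example: mask rows 1 and 3, columns 1 and 2 of a tiny 3x3 matrix.
printf '1,1,0\n0,1,1\n1,1,1\n' | mask_bicluster 1,3 1,2
```

Rows outside the cluster pass through unchanged, so the output can be fed straight into the next pass.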

? do I recalculate the db average on each pass or use the initial average
? classification performance criteria
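
For the performance criteria, here is a minimal sketch of pd and pf computed from confusion-matrix counts. It assumes pd and pf mean the usual probability of detection, TP / (TP + FN), and probability of false alarm, FP / (FP + TN). The exact calculation mentioned above is not shown in this post, so treat this as a placeholder. The function name `pd_pf` and the counts in the example are invented.

```shell
#!/bin/bash
# pd_pf TP FN FP TN  (hypothetical helper)
# Prints "pd pf" rounded to two decimals.
#   pd = TP / (TP + FN)   probability of detection (recall)
#   pf = FP / (FP + TN)   probability of false alarm
pd_pf() {
  awk -v tp="$1" -v fneg="$2" -v fpos="$3" -v tn="$4" 'BEGIN {
    pd = (tp + fneg > 0) ? tp / (tp + fneg) : 0   # guard against empty class
    pf = (fpos + tn > 0) ? fpos / (fpos + tn) : 0
    printf "%.2f %.2f\n", pd, pf
  }'
}

# Made-up counts for one cluster label: 40 TP, 10 FN, 5 FP, 45 TN.
pd_pf 40 10 5 45
```

A cluster would then be kept only if pd is high enough and pf low enough, with the thresholds still to be decided.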

CODE (thus far)
########################

#!/bin/bash

Jar="/usr/share/java/weka.jar"
Weka="nice -n 20 java -Xmx2048M -cp $Jar "
Input="/tiny/data/DNA/WordRules/secondTry/ell4Words.arff"
CONFIDENCE=0.99
STEP=0.05
MinRulesPerc=.60
UPPER=1.0
InvAttSet=""
DbAvg=0

#COPY THE INPUT DATA
cp $Input /tiny/data/trialInput.arff
Input="/tiny/data/trialInput.arff"

#GET THE DB AVERAGE
./dbAvg.awk < "$Input"
#DbAvg=$(./dbAvg.awk < "$Input")
#echo "DbAvg" $DbAvg

#LOOP OVER CONFIDENCE LEVELS or MinRulesPerc(aka coverage)

  #integer percent here; convert to 0.NN before passing to -C
  #for ((CONFIDENCE=99;CONFIDENCE>=80;CONFIDENCE=CONFIDENCE-5)); do

#GET ATTRIBUTES FROM FP GROWTH
#run FPGrowth
$Weka weka.associations.FPGrowth -P 1 -I -1 -N 20 -T 0 -C $CONFIDENCE -D $STEP -U $UPPER -M $MinRulesPerc -t $Input > fpResults$MinRulesPerc$CONFIDENCE.txt

#extract unique 4-mers from FP

sed -e 's/_binarized/ /g' -e 's/=1/ /g' -e 's/=0/ /g' fpResults$MinRulesPerc$CONFIDENCE.txt |
./fpFieldParser.awk > attA$MinRulesPerc$CONFIDENCE.txt

#compile list of attribute numbers

cat fieldsNum.txt separator.txt attA$MinRulesPerc$CONFIDENCE.txt  |
./makeFieldNumList.awk > fieldListA$MinRulesPerc$CONFIDENCE.txt
#remove last comma

sed 's/.\{1\}$//' attIndicies.txt > fldListA$MinRulesPerc$CONFIDENCE.txt
sed 's/.\{1\}$//' inverseAttIndicies.txt > fldListIA$MinRulesPerc$CONFIDENCE.txt

#put list into variable

InvAttSet=`cat fldListIA$MinRulesPerc$CONFIDENCE.txt`
echo $InvAttSet

#cluster all instances on the reduced attribute list

#the filter reads its input via -i; -t is not an option EM accepts inside -W
$Weka weka.filters.unsupervised.attribute.AddCluster -W "weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100" -I "$InvAttSet" -i "$Input"

#keep clusters that are 30% more than the db average
#AND Test well 70+

#done

####
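
The "30% more than the db average" test is not coded yet. One way it might go is sketched below, assuming "db average" means the fraction of cells equal to 1 in dense 0/1 rows. `dbAvg.awk` is not shown in this post and may compute something different. Both function names are invented for this sketch.

```shell
#!/bin/bash
# density  (hypothetical helper)
# Reads comma-separated 0/1 rows on stdin and prints the fraction of cells
# equal to 1. ARFF header lines (@...), comments (%) and blanks are skipped.
density() {
  awk -F, '
    /^[@%]/ || NF == 0 { next }
    { for (i = 1; i <= NF; i++) { cells++; if ($i == 1) ones++ } }
    END { printf "%.4f\n", (cells ? ones / cells : 0) }
  '
}

# keep_cluster CLUSTER_AVG DB_AVG  (hypothetical helper)
# Succeeds (exit status 0) when the cluster average is at least 30% above
# the db average, fails otherwise.
keep_cluster() {
  awk -v c="$1" -v d="$2" 'BEGIN { exit !(c >= 1.3 * d) }'
}

# Example with a tiny made-up matrix and a made-up cluster average of 0.40.
DbAvg=$(printf '@data\n1,0,0,1\n0,0,0,0\n' | density)
if keep_cluster 0.40 "$DbAvg"; then
  echo "keep cluster (db average $DbAvg)"
fi
```

Whether `DbAvg` is computed once up front or again after each masking pass is the open question noted in the pseudocode. The sketch works either way.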
used 'time' and got these numbers


real 0m33.106s
user 0m38.313s
sys 0m0.890s
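
The commented-out `for` loop in the script is meant to step the confidence down from 99 to 80 in steps of 5. Bash arithmetic is integer only, and stepping by 5 from 99 never lands exactly on 80, so the sketch below counts in whole percent with a `>=` test and converts each value to the decimal form passed to `-C`. The function name `conf_levels` is made up, and the `echo` stands in for the real FPGrowth call.

```shell
#!/bin/bash
# conf_levels  (hypothetical helper)
# Prints confidence levels from 0.99 down to (at least) 0.80 in steps of 0.05.
conf_levels() {
  local pct
  for ((pct = 99; pct >= 80; pct -= 5)); do
    # Turn the integer percentage into a decimal such as 0.94.
    printf '0.%02d\n' "$pct"
  done
}

for CONFIDENCE in $(conf_levels); do
  echo "would run FPGrowth with -C $CONFIDENCE"
done
```

The loop visits 0.99, 0.94, 0.89 and 0.84, then stops because the next value (79) falls below the bound.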



