Monday, August 26, 2013

Yet another data mining toolkit

AUK

https://github.com/timm/auk/tree/v0


E.g., a Naive Bayes classifier. It finds the class with the highest likelihood.
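Read as a formula, the function below appears to score each hypothesis h as follows. This is a sketch inferred from the code: n_h is the number of rows seen for h, N is the total passed in, |H| is the number of hypotheses, and k and m are the smoothing constants. The Gaussian form of the numeric term is an assumption, since norm is not shown in this post.

```latex
\[
P(h) = \frac{n_h + k}{N + k\,|H|}
\]
\[
\mathrm{score}(h) = \log P(h)
  + \sum_{c \,\in\, \mathrm{terms}_h} \log \frac{\mathrm{counts}_{h,c,x_c} + m\,P(h)}{n_h + m}
  + \sum_{c \,\in\, \mathrm{nums}_h} \log \mathcal{N}\!\left(x_c;\ \mu_{h,c},\ \sigma_{h,c}\right)
\]
```

Columns whose value x_c is "?" are skipped in both sums, and the function returns the h with the largest score.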

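# Returns the hypothesis (class) with the highest log-likelihood for row.
# Per-class scores are also stored in l. Needs gawk 4+ (arrays of arrays).
# k and m are smoothing constants; the second line of parameters holds locals.
function likelihood(row,total,hypotheses,l,_Tables,k,m,
      like,h,nh,prior,tmp,c,x,y,best) {
   like  = NINF                    # smaller than any log
   total = total + k * length(hypotheses)
   for(h in hypotheses) {
      nh    = length(datas[h])     # number of rows seen for class h
      prior = (nh+k)/total         # class prior, smoothed by k
      tmp   = log(prior)
      for(c in terms[h]) {         # symbolic columns
         x = row[c]
         if (x == "?") continue    # skip missing values
         y = counts[h][c][x]
         tmp += log((y + m*prior) / (nh + m))   # m-estimate
      }
      for(c in nums[h]) {          # numeric columns
         x = row[c]
         if (x == "?") continue    # skip missing values
         y = norm(x, mus[h][c], sds[h][c])
         tmp += log(y)
      }
      l[h] = tmp
      if ( tmp >= like ) {like = tmp; best=h}
   }
   return best
}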
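
The function relies on NINF and norm, neither of which is shown here. A minimal sketch of what they might look like, assuming norm is a Gaussian density; the constant, the guards, and the floor below are illustrative guesses, not AUK's actual code:

```awk
# Hypothetical helpers: the post does not show NINF or norm,
# so everything here is an assumption for illustration.
BEGIN { NINF = -1e30 }                  # assumed: below any log we expect

# Gaussian probability density of x, given mean mu and standard deviation sd.
function norm(x, mu, sd,    e, pdf, tiny) {
   tiny = 1e-300                        # assumed floor so log() stays finite
   if (sd < 1e-9) sd = 1e-9             # assumed guard against zero spread
   e = -0.5 * ((x - mu) / sd) ^ 2
   if (e < -700) return tiny            # exp() would underflow to zero
   pdf = exp(e) / (sd * sqrt(2 * 3.141592653589793))
   return (pdf < tiny) ? tiny : pdf
}

BEGIN { printf "%.4f\n", norm(0, 0, 1) }   # density at the mean of N(0,1)
```

Saved as norm.awk, running `awk -f norm.awk` should print 0.3989, which is 1/sqrt(2*pi). The floor matters because likelihood takes log(y): without it, a value far from the mean would give log(0).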
