Friday, January 15, 2010

Checking prior results in student retention estimation

Prior results in learning predictors for student retention are shown here.

Those reports offer a variety of numbers. Using the Zhang equations, we can compute the missing values from the given ones:

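```
% calc(+Pos,+Neg,+Prec,+Recall,-Pf,-Acc)
% From the class counts, precision and recall, derive the
% false alarm rate (Pf) and the accuracy (Acc).
calc(Pos,Neg,Prec,Recall, Pf,Acc) :-
    Pf  is Pos/Neg * (1-Prec)/Prec * Recall,
    D   is Recall * Pos,       % true positives
    C   is Pf * Neg,           % false positives
    A   is C*(1/Pf - 1),       % true negatives (same as Neg - C)
    Acc is (A+D)/(Neg + Pos).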
```
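The first line of `calc` follows from the usual confusion-matrix definitions. The sketch below reuses the cell names from the code (A for true negatives, C for false positives, D for true positives); those labels are read off the code rather than stated in the reports, so treat them as an assumption.

```latex
\begin{align*}
\text{recall} &= \frac{D}{\text{pos}}
  &&\Rightarrow\; D = \text{recall}\cdot\text{pos} \\
\text{prec} &= \frac{D}{C + D}
  &&\Rightarrow\; C = D\cdot\frac{1-\text{prec}}{\text{prec}} \\
\text{pf} &= \frac{C}{\text{neg}}
  = \frac{\text{pos}}{\text{neg}}\cdot\frac{1-\text{prec}}{\text{prec}}\cdot\text{recall} \\
A &= \text{neg} - C = C\left(\frac{1}{\text{pf}} - 1\right) \\
\text{acc} &= \frac{A + D}{\text{neg} + \text{pos}}
\end{align*}
```

Note that the expression for A divides by pf, so it is undefined when precision is exactly 1 (no false positives).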
Then we can write a simulator to explore a range of possible values. For example, for the Atwell paper:

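```
% Fields written Key/Value show up as percentages in the output;
% fields written Key=Value show up unchanged.
run(atwel,[prec/Prec,neg=Neg,pos=Pos,pf/Pf,pd/Recall,acc/Acc]) :-
    nl,
    member(Prec,[0.88,0.82,0.73]),   % candidate precisions
    N   = 5990,                      % total number of examples
    Neg = 4881,
    Pos is (N-Neg),
    member(Recall,[0.65,0.7,0.75,0.8,0.85,0.9]),  % range of recalls to explore
    calc(Pos,Neg,Prec,Recall,Pf,Acc).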
```
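The post does not show how the rows are printed. One way it might be done is sketched below; `report/1`, `print_row/2` and `show/2` are hypothetical names introduced here, and the reading of `Key/Value` as "display as a percentage" is inferred from the printed results, not confirmed by the source.

```prolog
% Hypothetical driver, not from the original post.
% report(+Who): print one row for every solution of run(Who, Fields).
% Assumes calc/6 and the run/2 clauses are loaded alongside this code.
report(Who) :-
    forall(run(Who, Fields), print_row(Who, Fields)).

% print_row(+Who, +Fields): convert each field for display, then print the row.
print_row(Who, Fields) :-
    maplist(show, Fields, Shown),
    format("~w~n", [[who=Who|Shown]]).

% show(+Field, -Displayed)
%   Key/Ratio : a ratio in 0..1, displayed as a rounded percentage.
%   Key=Value : a raw value, displayed unchanged.
show(Key/Ratio, Key=Percent) :- Percent is round(Ratio * 100).
show(Key=Value, Key=Value).
```

With the definitions above loaded, `?- report(atwel).` would enumerate every precision and recall combination by backtracking, and `?- print_row(atwel, [prec/0.88, neg=4881]).` exercises the formatting on its own.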
When we run this, we get the following numbers (all rates are shown as percentages). Note the suspiciously low false alarm rates (pf), which never exceed 8%:

```
[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=65, acc=92]
[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=70, acc=93]
[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=75, acc=93]
[who=atwel, prec=88, neg=4881, pos=1109, pf=2, pd=80, acc=94]
[who=atwel, prec=88, neg=4881, pos=1109, pf=3, pd=85, acc=95]
[who=atwel, prec=88, neg=4881, pos=1109, pf=3, pd=90, acc=96]
[who=atwel, prec=82, neg=4881, pos=1109, pf=3, pd=65, acc=91]
[who=atwel, prec=82, neg=4881, pos=1109, pf=3, pd=70, acc=92]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=75, acc=92]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=80, acc=93]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=85, acc=94]
[who=atwel, prec=82, neg=4881, pos=1109, pf=4, pd=90, acc=94]
[who=atwel, prec=73, neg=4881, pos=1109, pf=5, pd=65, acc=89]
[who=atwel, prec=73, neg=4881, pos=1109, pf=6, pd=70, acc=90]
[who=atwel, prec=73, neg=4881, pos=1109, pf=6, pd=75, acc=90]
[who=atwel, prec=73, neg=4881, pos=1109, pf=7, pd=80, acc=91]
[who=atwel, prec=73, neg=4881, pos=1109, pf=7, pd=85, acc=91]
[who=atwel, prec=73, neg=4881, pos=1109, pf=8, pd=90, acc=92]

```
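To see where such a row comes from, here is the arithmetic for the first one (prec = 0.88, recall = 0.65), worked through the same equations as `calc`. Intermediate values are rounded for display.

```latex
\begin{align*}
\text{pf}  &= \frac{1109}{4881}\cdot\frac{1-0.88}{0.88}\cdot 0.65 \approx 0.0201 \\
D          &= 0.65 \cdot 1109 = 720.85 \\
C          &= \text{pf}\cdot 4881 \approx 98.30 \\
A          &= 4881 - C \approx 4782.70 \\
\text{acc} &= \frac{4782.70 + 720.85}{5990} \approx 0.9188
\end{align*}
```

The low pf is driven largely by the class ratio: 1109/4881 is about 0.23, so the same number of false positives is spread over a much larger pool of negatives.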
We can do the same for the delong results. Here's the query:

```
run(delong,[prec/Prec,neg=Neg,pos=Pos,pf/Pf,pd/Recall,acc/Acc]) :-
    nl,
    Neg is 500,                      % balanced classes
    Pos is 500,
    member(Prec,[0.57,0.58,0.59]),   % candidate precisions
    member(Recall,[0.65,0.7,0.75,0.8,0.85,0.9]),  % range of recalls to explore
    calc(Pos,Neg,Prec,Recall,Pf,Acc).
```
And here are the results. Note the very high false alarm rates (pf between 45% and 68%) and mediocre accuracies (58% to 64%).

```
[who=delong, prec=57, neg=500, pos=500, pf=49, pd=65, acc=58]
[who=delong, prec=57, neg=500, pos=500, pf=53, pd=70, acc=59]
[who=delong, prec=57, neg=500, pos=500, pf=57, pd=75, acc=59]
[who=delong, prec=57, neg=500, pos=500, pf=60, pd=80, acc=60]
[who=delong, prec=57, neg=500, pos=500, pf=64, pd=85, acc=60]
[who=delong, prec=57, neg=500, pos=500, pf=68, pd=90, acc=61]
[who=delong, prec=58, neg=500, pos=500, pf=47, pd=65, acc=59]
[who=delong, prec=58, neg=500, pos=500, pf=51, pd=70, acc=60]
[who=delong, prec=58, neg=500, pos=500, pf=54, pd=75, acc=60]
[who=delong, prec=58, neg=500, pos=500, pf=58, pd=80, acc=61]
[who=delong, prec=58, neg=500, pos=500, pf=62, pd=85, acc=62]
[who=delong, prec=58, neg=500, pos=500, pf=65, pd=90, acc=62]
[who=delong, prec=59, neg=500, pos=500, pf=45, pd=65, acc=60]
[who=delong, prec=59, neg=500, pos=500, pf=49, pd=70, acc=61]
[who=delong, prec=59, neg=500, pos=500, pf=52, pd=75, acc=61]
[who=delong, prec=59, neg=500, pos=500, pf=56, pd=80, acc=62]
[who=delong, prec=59, neg=500, pos=500, pf=59, pd=85, acc=63]
[who=delong, prec=59, neg=500, pos=500, pf=63, pd=90, acc=64]

```
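Because the delong query uses equal class sizes, the equations in `calc` simplify, which helps explain the pattern above. The following is a sketch of that special case, using the same cell names as the code:

```latex
\begin{align*}
\text{pos} = \text{neg} \;\Rightarrow\;
\text{pf}  &= \frac{1-\text{prec}}{\text{prec}}\cdot\text{recall} \\
\text{acc} &= \frac{A + D}{2\,\text{pos}}
            = \frac{(1-\text{pf}) + \text{recall}}{2} \\[4pt]
\text{prec}=0.57,\ \text{recall}=0.65:\quad
\text{pf}  &\approx 0.4904, \qquad
\text{acc} \approx \frac{0.5096 + 0.65}{2} \approx 0.5798
\end{align*}
```

With precision this close to 0.5, pf stays close to recall, so any gain in recall is almost cancelled by the matching rise in false alarms and accuracy barely moves.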