Precision, recall, sensitivity and specificity

01 January 2012

Nowadays I work for a medical device company, where the big indicators of success for a medical test are specificity and sensitivity. Every medical test strives to reach 100% on both criteria. Imagine my surprise today when I found out that other fields use different metrics for the exact same problem. To analyze this, I present to you the confusion matrix:

Confusion Matrix

For example, say we have a pregnancy test that classifies people as pregnant (positive) or not pregnant (negative). And now some equations...

Sensitivity and specificity are statistical measures of the performance of a binary classification test:

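$$\text{sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$

$$\text{specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}}$$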

Sensitivity in yellow, specificity in red

In pattern recognition and information retrieval:

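$$\text{precision} = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$

$$\text{recall} = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$$

Let's translate: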

Precision in red, recall in yellow

Standardized equations

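With TP, FP, TN and FN standing for the counts of true positives, false positives, true negatives and false negatives:

$$\text{sensitivity} = \text{recall} = \frac{TP}{TP + FN}$$

$$\text{specificity} = \frac{TN}{TN + FP}$$

$$\text{precision} = \frac{TP}{TP + FP}$$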
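To make the four metrics concrete, here is a minimal Python sketch that computes them from the four cells of a confusion matrix. The function name `binary_metrics` and the counts passed to it are made up for illustration; they are not taken from any real test.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the four metrics from the cells of a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives we caught
        "specificity": tn / (tn + fp),  # share of actual negatives we cleared
        "precision": tp / (tp + fp),    # share of positive calls that were right
        "recall": tp / (tp + fn),       # same quantity as sensitivity
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=30, fn=10, fp=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that recall and sensitivity are the same number under two names, while precision and specificity are genuinely different quantities.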

More ways to cheat

A specificity buff: let's continue with our pregnancy test, where our experiments resulted in the following confusion matrix:
                 Test positive   Test negative
Pregnant               8               2
Not pregnant          10              80
Our specificity is only 88.9% (80 of 90 non-pregnant subjects), and we need 97% for our FDA approval. We can tell our patients to run the test twice and count only double positives (e.g. two red lines). The false positive rate then drops from 10/90 ≈ 11.1% to (10/90)² ≈ 1.2%, so we suddenly have 98.8% specificity. Magic. This would only be kosher if the test results were proven to be independent. Most tests probably are not (e.g. blood parasite tests that are triggered by antibodies may repeatedly give false positives for the same patient).

A less ethical (though IANAL) approach would be to add 300 men to our pregnancy test experiment. Of course, part of our test is to ask "are you male?" and mark these patients as "not pregnant". Thus we get a lot of easy true negatives, and this is the resulting confusion matrix:
                 Test positive   Test negative
Pregnant               8               2
Not pregnant          10             380
Voilà! 97.4% specificity with a single test. Have fun trying to get that FDA approval though; I doubt they'll overlook the 300 red herrings.
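The arithmetic behind both tricks can be checked with a short Python sketch. It assumes the matrix cells read as 8 true positives, 2 false negatives, 10 false positives and 80 true negatives (the reading that gives the specificity quoted above), and that the two runs of the repeated test are fully independent. The helper name `specificity` is just for illustration.

```python
def specificity(tn, fp):
    """Fraction of actual negatives that the test correctly calls negative."""
    return tn / (tn + fp)


# Cells of the original experiment (assumed reading of the matrix).
tp, fn, fp, tn = 8, 2, 10, 80

baseline = specificity(tn, fp)

# Trick 1: run the test twice and count only double positives.
# Under independence, a false positive must happen twice in a row.
false_positive_rate = 1 - baseline
double_test = 1 - false_positive_rate ** 2

# Trick 2: pad the experiment with 300 men, all trivially true negatives.
padded = specificity(tn + 300, fp)

print(f"baseline:    {baseline:.1%}")     # 88.9%
print(f"double test: {double_test:.1%}")  # 98.8%
print(f"padded:      {padded:.1%}")       # 97.4%
```

The printed values are rounded to one decimal place, so they may differ in the last digit from figures that were truncated instead.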

What does it mean, who won?

Finally the punchline: