Dietary Assessment

Top-1 vs Top-5 Accuracy

The convention for reporting classification performance at the strictest threshold (top-1, the single best prediction) and a more permissive threshold (top-5, any of the best five).

By James Oliver · Editor & Publisher · Updated April 18, 2026

Key takeaways

Top-1 accuracy is the fraction of cases where the model's single best guess is correct.
Top-5 accuracy is the fraction where any of the model's five best guesses is correct.
Top-5 is typically 10 to 15 percentage points higher than top-1 on food-identification benchmarks such as Food-101.
For consumer calorie logging, top-1 is the operative metric because only one label drives the calorie calculation.

Top-1 and top-5 accuracy are the two conventional threshold metrics for reporting the performance of a classification model. Top-1 counts a prediction as correct only when the model's most-confident single label matches the ground truth. Top-5 counts a prediction as correct when any of the model's top five most-confident labels matches. The two figures are reported together on most visual-classification benchmarks, and the gap between them is a rough gauge of how ambiguous the input is.

Arithmetic

For n test images:

Top-1 = (number of images whose highest-ranked label matches the truth) / n × 100.
Top-5 = (number of images where any of the five highest-ranked labels matches the truth) / n × 100.

By construction, top-5 ≥ top-1, because every top-1 hit is also a top-5 hit. The gap is large when the test set contains items that are visually similar to several alternatives (different varieties of a fruit, different cuts of meat, subtly different preparations) and small when the classes in the test set are visually distinct.
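The two formulas can be sketched in a few lines of Python. This is a minimal illustration, not taken from any particular benchmark toolkit: the function name `top_k_accuracy`, the layout of `scores` (one list of per-class confidences per image) and the toy numbers are all assumptions made for the example.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Percentage of images whose true class is among the k highest-scoring classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, truth in zip(scores, labels):
        # Class indices ordered from most to least confident.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data (hypothetical): 4 images, 6 classes, one confidence per class.
scores = [
    [0.40, 0.25, 0.15, 0.10, 0.07, 0.03],  # truth 0: ranked first
    [0.30, 0.35, 0.15, 0.10, 0.07, 0.03],  # truth 0: ranked second
    [0.03, 0.07, 0.45, 0.10, 0.15, 0.20],  # truth 2: ranked first
    [0.35, 0.25, 0.18, 0.12, 0.08, 0.02],  # truth 5: ranked sixth
]
labels = [0, 0, 2, 5]

top1 = top_k_accuracy(scores, labels, k=1)  # 2 of 4 images
top5 = top_k_accuracy(scores, labels, k=5)  # 3 of 4 images
print(f"top-1 = {top1:.1f}%, top-5 = {top5:.1f}%")
```

On this toy set the second image is a top-5 hit but a top-1 miss, which is the kind of case that opens the gap; the fourth image is missed at both thresholds.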

Gap on food classification

On the Food-101 benchmark, published models show top-1 accuracy in the 80 to 93 per cent range and top-5 accuracy in the 94 to 99 per cent range. That is a spread of roughly 10 to 15 percentage points toward the lower end of those ranges, and necessarily a smaller one for models whose top-1 accuracy is already above 90 per cent. The gap narrows on more discriminative benchmarks and widens on visually ambiguous ones (the various Asian noodle dishes, which can look similar, are known contributors to the gap between top-1 and top-5).
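One way to see which classes contribute most to the gap is to break the two accuracies down per class. The sketch below is illustrative only: the function name `per_class_gap`, the input format (one ranked list of predicted labels per image) and the toy dishes are assumptions, and the example uses k=2 rather than 5 purely to keep the toy lists short.

```python
from collections import defaultdict
from typing import Dict, Sequence


def per_class_gap(ranked: Sequence[Sequence[str]],
                  truths: Sequence[str],
                  k: int = 5) -> Dict[str, Dict[str, float]]:
    """Per-class top-1 accuracy, top-k accuracy and their difference, in percent."""
    totals = defaultdict(int)
    top1_hits = defaultdict(int)
    topk_hits = defaultdict(int)
    for preds, truth in zip(ranked, truths):
        totals[truth] += 1
        if preds[0] == truth:
            top1_hits[truth] += 1
        if truth in preds[:k]:
            topk_hits[truth] += 1
    return {
        cls: {
            "top1": 100.0 * top1_hits[cls] / n,
            "topk": 100.0 * topk_hits[cls] / n,
            "gap": 100.0 * (topk_hits[cls] - top1_hits[cls]) / n,
        }
        for cls, n in totals.items()
    }


# Toy predictions (hypothetical), most confident label first.
ranked = [
    ["ramen", "pho", "udon"],
    ["pho", "ramen", "udon"],    # ramen confused with a look-alike noodle dish
    ["pizza", "salad", "ramen"],
    ["pizza", "ramen", "pho"],
]
truths = ["ramen", "ramen", "pizza", "pizza"]

gaps = per_class_gap(ranked, truths, k=2)
for cls, stats in sorted(gaps.items(), key=lambda item: -item[1]["gap"]):
    print(cls, stats)
```

Classes with a large per-class gap are those the model usually ranks near the top but not first, which is the pattern the noodle dishes are described as showing.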

Why top-1 is the operative metric for calorie logging

A calorie-estimation pipeline acts on a single food label. If the top-1 prediction is "pasta carbonara" and the top-2 is "fettuccine alfredo," the two cannot both be logged. Only one label is selected to drive the per-gram calorie calculation, and in almost all implementations that is the top-1. The system's performance on calorie estimation is therefore bounded above by top-1 identification accuracy, not by top-5.

Top-5 accuracy enters the picture as a user-experience affordance: apps that offer "did you mean…?" alternatives surface the remaining top-5 labels so a user can correct a wrong top-1. Whether the user actually makes the correction, and whether that correction is logged for model improvement, is an implementation detail that varies across apps.
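The division of labour described above can be sketched as follows. This is a hedged illustration, not any app's actual implementation: `LogEntry`, `log_meal`, `apply_correction` and the `KCAL_PER_GRAM` table are hypothetical names, and the energy densities are made-up round numbers, not nutritional data.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Placeholder energy densities in kcal per gram (illustrative values only).
KCAL_PER_GRAM = {
    "pasta carbonara": 2.0,
    "fettuccine alfredo": 1.5,
    "spaghetti bolognese": 1.0,
}


@dataclass
class LogEntry:
    label: str               # the single label that drives the calorie figure
    grams: float
    kcal: float
    alternatives: List[str]  # remaining top-5 labels, shown as "did you mean...?"


def log_meal(ranked_labels: Sequence[str], grams: float) -> LogEntry:
    """Log the top-1 label; keep the next four labels only as suggestions."""
    top1 = ranked_labels[0]
    return LogEntry(top1, grams, grams * KCAL_PER_GRAM[top1],
                    list(ranked_labels[1:5]))


def apply_correction(entry: LogEntry, chosen: str) -> LogEntry:
    """Recompute the entry after the user picks one of the suggested alternatives."""
    if chosen not in entry.alternatives:
        raise ValueError(f"{chosen!r} was not among the suggested alternatives")
    remaining = [entry.label] + [a for a in entry.alternatives if a != chosen]
    return LogEntry(chosen, entry.grams,
                    entry.grams * KCAL_PER_GRAM[chosen], remaining)


entry = log_meal(
    ["pasta carbonara", "fettuccine alfredo", "spaghetti bolognese"], grams=300
)
corrected = apply_correction(entry, "fettuccine alfredo")
print(entry.label, entry.kcal)          # figure logged from the top-1 label
print(corrected.label, corrected.kcal)  # figure after a user correction
```

Unless `apply_correction` is called, the logged calories depend only on the top-1 label, which is why top-1 accuracy bounds the pipeline's calorie accuracy.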

The Bitebench split

In Bitebench 2026's food-identification subtest, the photo-logging apps showed top-1 identification accuracies spanning from 94.2 per cent (PlateLens) to 78.5 per cent (community-entry MyFitnessPal), with Cronometer's in-app recognition at 86 per cent and MacroFactor at 83 per cent. The roughly 16-point spread at top-1 (15.7 points) shrank to 5 points at top-5, suggesting that much of the apparent "gap" between systems is a ranking problem (the correct food is among the candidates but not ranked first) rather than a failure to recognise the food at all.

References

Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L. "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision, 2015. doi:10.1007/s11263-015-0816-y.
