/ˈkɒnfɪdəns skɔː/
A confidence score is a model's estimate of how likely its prediction is to be correct. In content moderation, it indicates how certain the system is that an image contains specific content types such as nudity, violence, or hate symbols.
Scores typically range from 0 (no confidence) to 1 (complete confidence). Higher scores indicate stronger certainty in the detection.
Platforms set custom thresholds based on their policies. A children's app might block content scoring above 0.3, while an adult platform might flag only at 0.9 or higher. The right threshold balances catching harmful content against generating false positives on benign material.
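A minimal sketch of threshold-based moderation, in Python. The category names, scores, and threshold values here are illustrative examples, not taken from any specific API:

```python
# Illustrative sketch: applying per-policy thresholds to confidence scores.
# Category names and values are hypothetical.

def moderate(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Return True if any category's score meets or exceeds its threshold."""
    return any(
        scores.get(category, 0.0) >= threshold
        for category, threshold in thresholds.items()
    )

# A strict policy (e.g., a children's app) blocks at low confidence;
# a permissive one (e.g., an adult platform) flags only near-certain hits.
strict = {"nudity": 0.3, "violence": 0.3}
permissive = {"nudity": 0.9, "violence": 0.9}

scores = {"nudity": 0.45, "violence": 0.02}
print(moderate(scores, strict))      # True  -- blocked under the strict policy
print(moderate(scores, permissive))  # False -- allowed under the permissive one
```

The same scores produce different outcomes under each policy, which is why the threshold, not the score itself, encodes the platform's risk tolerance.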
Modern APIs return confidence scores for multiple categories simultaneously, allowing platforms to evaluate images against various policy criteria in a single request. This enables nuanced moderation decisions based on combined risk factors.
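As a hedged sketch of how a multi-category response might be evaluated, the snippet below assumes a hypothetical response shape; real APIs vary in field names and structure:

```python
# Hypothetical sketch: one response carrying scores for several categories,
# checked against per-category policy thresholds in a single pass.
# The response shape and field names are assumptions, not a real API.

response = {
    "nudity": 0.12,
    "violence": 0.81,
    "hate_symbols": 0.05,
}

policy = {
    "nudity": 0.60,
    "violence": 0.75,
    "hate_symbols": 0.40,
}

# Collect every category whose score crosses its threshold, so the decision
# can reflect combined risk factors rather than a single signal.
violations = [c for c, s in response.items() if s >= policy[c]]

if violations:
    print(f"Flag for review: {', '.join(violations)}")  # violence
else:
    print("Allow")
```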