/ˈkɒnfɪdəns skɔː/
A confidence score is a model's estimate of how likely its prediction is to be correct. In content moderation, it indicates how certain the system is that an image contains specific content types such as nudity, violence, or hate symbols.
Scores typically range from 0 (no confidence) to 1 (complete confidence). Higher scores indicate stronger certainty in the detection.
Platforms set custom thresholds based on their policies. A children's app might block content scoring above 0.3, while an adult platform might flag only at 0.9 or higher. The right threshold balances catching harmful content against generating false positives on benign material.
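A minimal sketch of threshold-based moderation, in Python. The category names, scores, and threshold values here are illustrative examples, not taken from any specific API:

```python
# Illustrative sketch: applying per-policy thresholds to confidence scores.
# Category names and values are hypothetical.

def moderate(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Return True if any category's score meets or exceeds its threshold."""
    return any(
        scores.get(category, 0.0) >= threshold
        for category, threshold in thresholds.items()
    )

# A strict policy (e.g., a children's app) blocks at low confidence;
# a permissive one (e.g., an adult platform) flags only near-certain hits.
strict = {"nudity": 0.3, "violence": 0.3}
permissive = {"nudity": 0.9, "violence": 0.9}

scores = {"nudity": 0.45, "violence": 0.02}
print(moderate(scores, strict))      # True  -- blocked under the strict policy
print(moderate(scores, permissive))  # False -- allowed under the permissive one
```

The same scores produce different outcomes under each policy, which is why the threshold, not the score itself, encodes the platform's risk tolerance.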
Modern APIs return confidence scores for multiple categories simultaneously, allowing platforms to evaluate images against various policy criteria in a single request. This enables nuanced moderation decisions based on combined risk factors.
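As a hedged sketch of how a multi-category response might be evaluated, the snippet below assumes a hypothetical response shape; real APIs vary in field names and structure:

```python
# Hypothetical sketch: one response carrying scores for several categories,
# checked against per-category policy thresholds in a single pass.
# The response shape and field names are assumptions, not a real API.

response = {
    "nudity": 0.12,
    "violence": 0.81,
    "hate_symbols": 0.05,
}

policy = {
    "nudity": 0.60,
    "violence": 0.75,
    "hate_symbols": 0.40,
}

# Collect every category whose score crosses its threshold, so the decision
# can reflect combined risk factors rather than a single signal.
violations = [c for c, s in response.items() if s >= policy[c]]

if violations:
    print(f"Flag for review: {', '.join(violations)}")  # violence
else:
    print("Allow")
```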