A threshold of 0.997 means that, when forming conformal prediction sets, a class is included only if the model assigns it a predicted probability of at least 0.997 (99.7%). In other words, the nonconformity score cutoff is so strict that only extremely confident predictions will make it into the prediction set.
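To make this concrete, here is a minimal sketch of how such a cutoff behaves. The softmax vectors below are invented purely for illustration, not outputs of any real model:

```python
import numpy as np

# Hypothetical softmax outputs for three test examples (assumed values,
# chosen only to illustrate the effect of the cutoff).
probs = np.array([
    [0.9990, 0.0008, 0.0002],  # one class clears 0.997 -> singleton set {0}
    [0.9500, 0.0400, 0.0100],  # fairly confident, yet below the bar -> empty set
    [0.4000, 0.3500, 0.2500],  # genuinely ambiguous -> empty set
])
threshold = 0.997

for p in probs:
    # Include a class in the prediction set only if its probability
    # reaches the threshold.
    prediction_set = np.where(p >= threshold)[0].tolist()
    print(p, "->", prediction_set)
```

Only the first example produces a non-empty set; the other two, including a quite confident 0.95 prediction, return nothing at all.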
Here’s what that implies:
• Very High Confidence Requirement:
With a 0.997 threshold, the model must be almost certain (probability ≥ 0.997) about a class before it is included in the prediction set. For many examples this results in a singleton prediction set (if one class's probability exceeds 0.997) or, even worse, an empty set if no class clears that bar.
• Coverage vs. Set Size Tradeoff:
Conformal prediction is designed to guarantee that the true label is included in the prediction set at a desired rate (coverage). Crucially, that guarantee holds only when the threshold is derived from a held-out calibration set; fixing it by hand at 0.997 forfeits the guarantee entirely. With a bar this high, many instances will not have the true label in their prediction set, so coverage drops, and most sets come out empty or tiny. In practical terms, if most of your examples end up with empty or overly "confident" (but possibly incorrect) prediction sets, the threshold is too strict; the sketch after this list makes the tradeoff concrete.
• Is it Good or Bad?
– If the model is extremely well-calibrated and truly confident: A threshold of 0.997 could indicate that the model is rarely uncertain, and its predictions are reliable. In such a rare scenario, you might see high coverage (almost every true label is included) and prediction sets that almost always have a single label.
– In most realistic settings: Such a high threshold is likely too conservative. It may lead to prediction sets that are too small (or even empty), failing to capture the uncertainty inherent in the data. That would be “bad” because it undermines one of the strengths of conformal prediction—providing informative prediction sets that reflect the model’s uncertainty.
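To see the tradeoff numerically, here is a sketch on synthetic data. The logits and labels are randomly generated stand-ins for real model outputs (an assumption, not a benchmark), and we sweep the threshold while recording coverage, average set size, and the fraction of empty sets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a labeled evaluation set: n examples, k classes.
# Real softmax outputs would come from your model; these are assumed values.
n, k = 1000, 5
logits = rng.normal(size=(n, k))
labels = rng.integers(0, k, size=n)
logits[np.arange(n), labels] += 2.0   # make the "model" better than chance
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

for threshold in [0.5, 0.9, 0.99, 0.997]:
    sets = probs >= threshold                     # boolean set-membership matrix
    coverage = sets[np.arange(n), labels].mean()  # how often the true label is in its set
    avg_size = sets.sum(axis=1).mean()            # average prediction-set size
    empty = (sets.sum(axis=1) == 0).mean()        # fraction of empty sets
    print(f"threshold={threshold:<6} coverage={coverage:.2f} "
          f"avg_size={avg_size:.2f} empty={empty:.0%}")
```

As the threshold climbs toward 0.997, coverage collapses and empty sets dominate, which is exactly the failure mode described above.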
In summary, unless your model is known to be extremely confident and well-calibrated (so that nearly every correct prediction is made with probability ≥ 0.997), a threshold of 0.997 is likely too strict. You would typically aim for a threshold that balances reasonably sized prediction sets (which capture uncertainty) against your desired coverage rate (for instance, 90% coverage when targeting a 10% error rate), and in practice you would compute that threshold from a calibration set, as sketched below.
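For reference, here is a sketch of the standard split-conformal recipe for choosing that threshold from held-out data. The function and variable names are my own, and `cal_probs` / `cal_labels` are assumed to be softmax outputs and labels from a calibration split:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal cutoff for (1 - alpha) marginal coverage.

    Nonconformity score: 1 - p(true class). Prediction sets built as
    {y : p(y) >= 1 - q_hat} then contain the true label at rate
    >= 1 - alpha, assuming calibration and test data are exchangeable.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level, clipped to 1.0 for tiny n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

# Usage sketch:
# q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
# prediction_set = [y for y in range(k) if test_probs[y] >= 1 - q_hat]
```

If this procedure genuinely yields a probability cutoff of 0.997 (i.e., q_hat ≈ 0.003), the model really is that confident and the coverage guarantee still holds; if you impose 0.997 by hand instead, you get no guarantee at all.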