Confidence Level

This page explains how Taggun's API confidence level measures data reliability based on the quality of the uploaded receipt or invoice, helping developers decide whether to process or review submissions.

What is Confidence Level?

The confidenceLevel field in Taggun’s API response indicates how likely the extracted data is to be correct, based on the quality of the uploaded receipt or invoice.

This helps developers determine whether to process the data automatically or take further actions, such as human review or requesting a clearer image from the user.


How It's Generated

The confidence level is determined by analysing various aspects of the uploaded file that affect how well the OCR can extract data. Factors include:

  • Image resolution: Low-resolution images reduce readability and result in lower confidence scores.
  • Tilt or rotation: Images that are skewed or poorly aligned can decrease accuracy.
  • Blurriness or smudging: If the text is blurry or smudged, the system struggles to extract clean data.
  • Readability: Poor lighting, very small fonts, or folds and shadows across the text make the data harder to interpret and lower the confidence level.
  • Obstructions or hidden data: If parts of the receipt are hidden or cut off, the confidence level drops as the OCR has less information to work with.

Two Types of Confidence Levels

  1. Overall Receipt Confidence: A single score for the entire receipt or invoice, reflecting the general quality of the file.
  2. Data Point Confidence: Each extracted data point (e.g., total amount, tax, merchant name) has its own confidence score, giving developers a detailed view of how reliable each field is.
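
As a minimal sketch of how the two types might be read from a parsed response, the example below assumes the overall score sits in a top-level confidenceLevel field and that each extracted data point is an object carrying its own data and confidenceLevel. The sample payload, its values, and the helper names are hypothetical; check the actual response schema before relying on this layout.

```python
from typing import Any, Dict

# Hypothetical parsed response. The layout (top-level confidenceLevel plus
# one object per data point) is an assumption made for illustration.
sample_response: Dict[str, Any] = {
    "confidenceLevel": 0.82,
    "totalAmount": {"data": 42.5, "confidenceLevel": 0.91},
    "taxAmount": {"data": 3.86, "confidenceLevel": 0.64},
    "merchantName": {"data": "Example Cafe", "confidenceLevel": 0.77},
}


def overall_confidence(response: Dict[str, Any]) -> float:
    """Return the receipt-level score, or 0.0 if the field is missing."""
    return float(response.get("confidenceLevel", 0.0))


def field_confidences(response: Dict[str, Any]) -> Dict[str, float]:
    """Collect the score of every data point that reports one."""
    scores: Dict[str, float] = {}
    for name, value in response.items():
        # Only nested objects with their own confidenceLevel are data points.
        if isinstance(value, dict) and "confidenceLevel" in value:
            scores[name] = float(value["confidenceLevel"])
    return scores


if __name__ == "__main__":
    print("overall:", overall_confidence(sample_response))
    for field, score in field_confidences(sample_response).items():
        print(f"{field}: {score}")
```

Keeping the two lookups separate lets a caller accept a receipt overall while still singling out an individual field, such as the tax amount here, for closer inspection.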

Recommended Threshold

We recommend setting a confidence threshold between 0.7 and 0.8:

  • Higher-sensitivity use cases should use a threshold closer to 0.8 for greater reliability.
  • General use cases can use a threshold around 0.7, providing a balance between precision and flexibility.
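
The recommendation above can be expressed as a small check. This is a sketch only: the constant and function names are hypothetical, and treating a score exactly equal to the threshold as passing is an assumption rather than documented behaviour.

```python
# Recommended range from this page: 0.7 for general use, 0.8 for
# higher-sensitivity use cases. The names below are illustrative.
GENERAL_THRESHOLD = 0.7
HIGH_SENSITIVITY_THRESHOLD = 0.8


def pick_threshold(high_sensitivity: bool) -> float:
    """Choose the threshold for the kind of use case."""
    return HIGH_SENSITIVITY_THRESHOLD if high_sensitivity else GENERAL_THRESHOLD


def meets_threshold(confidence: float, high_sensitivity: bool = False) -> bool:
    """Return True when the score reaches the chosen threshold.

    Assumption: a score exactly equal to the threshold passes.
    """
    return confidence >= pick_threshold(high_sensitivity)


if __name__ == "__main__":
    print(meets_threshold(0.75))                         # general use case
    print(meets_threshold(0.75, high_sensitivity=True))  # stricter use case
```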

Why It's Important

The confidence level is not a reflection of Taggun’s accuracy rate or of the OCR technology’s capability. Instead, it reflects how likely the extracted data is to be correct given the quality of the uploaded document.

Here’s why this distinction is important:

  • The confidence level identifies low-quality submissions, such as blurry, tilted, or incomplete receipts. It does not measure the performance of Taggun's data extraction.
  • Even if Taggun’s OCR is highly accurate, poor-quality images will result in lower confidence scores because the system has less reliable data to work with.
  • By using the confidence level, you can manage user errors, such as submitting poorly captured images, and decide whether to:
    • Automatically process high-confidence data.
    • Flag low-confidence data for manual review.
    • Request a new image from the user when confidence levels fall below a certain threshold.
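
The three outcomes listed above could be wired together as follows. This is a sketch under stated assumptions: the Action enum and route function are hypothetical names, the 0.7 acceptance threshold comes from the recommended range, and the lower 0.4 cut-off for requesting a new image is an illustrative value that this page does not specify.

```python
from enum import Enum


class Action(Enum):
    AUTO_PROCESS = "auto_process"
    MANUAL_REVIEW = "manual_review"
    REQUEST_NEW_IMAGE = "request_new_image"


def route(
    confidence: float,
    accept_threshold: float = 0.7,  # from the recommended 0.7 to 0.8 range
    retake_threshold: float = 0.4,  # illustrative assumption, tune per use case
) -> Action:
    """Map a confidence score to one of the three handling paths."""
    if not 0.0 <= retake_threshold <= accept_threshold <= 1.0:
        raise ValueError("expected 0 <= retake_threshold <= accept_threshold <= 1")
    if confidence >= accept_threshold:
        return Action.AUTO_PROCESS
    if confidence < retake_threshold:
        # Too unreliable to be worth a reviewer's time: ask for a new image.
        return Action.REQUEST_NEW_IMAGE
    return Action.MANUAL_REVIEW


if __name__ == "__main__":
    for score in (0.9, 0.55, 0.2):
        print(score, route(score).value)
```

Using two cut-offs keeps the manual review queue for borderline cases, while very poor captures go straight back to the user.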

Conclusion

The confidence level gives you a way to handle variable image quality: by comparing it against a threshold, you decide which extracted data your system processes automatically and which needs review or a new upload.