Frequently Asked Questions: Document Processing & OCR
How do you measure and guarantee accuracy?
We establish accuracy baselines using annotated ground-truth samples from your actual documents. Accuracy is measured per field type (e.g., invoice number, date, amount) using precision, recall, and F1 scores. We provide transparent dashboards showing real-time extraction performance against these benchmarks.
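As an illustration of per-field scoring, the following is a minimal sketch and not the product's actual evaluation code. The function name `field_scores`, the dictionary layout, and the exact-match comparison are assumptions made for this example. It also assumes that a wrong value counts as both a false positive and a false negative, which is one common convention.

```python
from collections import defaultdict


def field_scores(predictions, ground_truth):
    """Compute precision, recall, and F1 per field type.

    Both arguments are lists of dicts (one dict per document) that map a
    field name to its value, or to None when the field is absent.
    Hypothetical helper for illustration only.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for pred, truth in zip(predictions, ground_truth):
        for field in set(pred) | set(truth):
            p, t = pred.get(field), truth.get(field)
            if p is not None and p == t:
                counts[field]["tp"] += 1  # extracted and correct
            else:
                if p is not None:
                    counts[field]["fp"] += 1  # extracted, but wrong or spurious
                if t is not None:
                    counts[field]["fn"] += 1  # expected, but missing or wrong

    scores = {}
    for field, c in counts.items():
        extracted = c["tp"] + c["fp"]
        expected = c["tp"] + c["fn"]
        precision = c["tp"] / extracted if extracted else 0.0
        recall = c["tp"] / expected if expected else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[field] = {"precision": precision, "recall": recall, "f1": f1}
    return scores
```

Calling `field_scores(predictions, ground_truth)` on an annotated sample returns one entry per field type, so a weak field such as a handwritten date can be spotted without being averaged away by stronger fields.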
How is my data protected during processing?
Documents are encrypted in transit (TLS 1.3) and at rest (AES-256). Processing runs in isolated environments, and no document data is persisted after extraction. We use SOC 2 Type II-compliant infrastructure and offer optional on-premises deployment for maximum control.
What happens when the AI is uncertain about an extraction?
Every extraction includes a confidence score. Extractions below your defined threshold are automatically routed to a human review queue. Reviewers can correct and approve results, and these corrections are logged for model improvement.
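The routing rule described above can be sketched as follows. This is an illustrative example under stated assumptions, not the product's API: the `Extraction` class, the `route_extractions` function, and the threshold value used in the comment are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def route_extractions(extractions, threshold):
    """Split extractions into auto-accepted results and a human review queue.

    Anything strictly below the threshold goes to review; a score equal to
    the threshold is accepted. Hypothetical helper for illustration only.
    """
    accepted, review_queue = [], []
    for item in extractions:
        if item.confidence < threshold:
            review_queue.append(item)
        else:
            accepted.append(item)
    return accepted, review_queue


# Example: with a threshold of 0.90, an amount extracted at 0.62 confidence
# would land in the review queue while a 0.98 invoice number is accepted.
```

Raising the threshold sends more extractions to reviewers and lowers the risk of accepting an error; lowering it does the opposite.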
Which document formats and languages are supported?
We support PDFs (scanned and native), images (JPEG, PNG, TIFF), Word documents, and emails with attachments. Languages include German, English, French, Italian, Spanish, and more. Template-free extraction works with any document layout.
Can I host the solution on my own infrastructure?
Yes. We offer cloud deployment (AWS, Azure, GCP), hybrid setups, or fully on-premises installation. On-prem deployments include Docker/Kubernetes packages with all dependencies and air-gapped operation capability.
How do you handle PII and sensitive data?
PII detection and redaction can be enabled to run automatically. Access controls ensure that only authorized personnel can view sensitive fields. All access is logged, and data retention policies can be configured to auto-delete documents after processing.
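To make the idea of automatic redaction concrete, here is a deliberately simplified sketch. It uses two regular expressions as a stand-in for PII detection, which in practice usually involves far more than pattern matching. The pattern set, the `redact` function, and the placeholder format are assumptions for this example and do not describe the product's detection method.

```python
import re

# Hypothetical, intentionally small pattern set for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IBAN": re.compile(
        r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
    ),
}


def redact(text):
    """Replace each detected PII span with a [LABEL] placeholder.

    Returns the redacted text plus a list of (label, original_value)
    findings, which could feed an access-controlled audit log.
    """
    findings = []
    for label, pattern in PII_PATTERNS.items():

        def _replace(match, label=label):
            findings.append((label, match.group(0)))
            return f"[{label}]"

        text = pattern.sub(_replace, text)
    return text, findings
```

Keeping the findings separate from the redacted text means the sensitive values can be stored under stricter access controls, or discarded entirely, while downstream users only see the placeholders.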