Data Curation

Arabic Calligraphy OCR Dataset

2,000+ annotated calligraphy images for Arabic OCR model training.

Custom annotation workflow, QA pipeline, Dataset formatting, Python

THE PROBLEM

The client needed a high-quality labeled dataset of Arabic calligraphy, with an exact text transcription for each image, to train an OCR model.

THE SOLUTION

Collected 2,000+ unique calligraphy images, annotated each with its exact Arabic text, validated the annotations through multi-stage QA, and delivered them in a structured format ready for model training.
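As an illustration only, a staged QA pass over annotations might look like the sketch below. The `Annotation` record, its field names, and the three checks (empty transcription, non-Arabic letters, duplicate image) are assumptions made for this example, not the project's actual pipeline.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Annotation:
    image_id: str  # hypothetical identifier for one calligraphy image
    text: str      # transcription of the text shown in the image


def is_arabic_text(text: str) -> bool:
    """True if the text has at least one letter and every letter is Arabic script."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        "ARABIC" in unicodedata.name(ch, "") for ch in letters
    )


def run_qa(annotations):
    """Return (accepted, rejected); each rejected entry pairs a record with a reason."""
    accepted, rejected = [], []
    seen_ids = set()
    for ann in annotations:
        if not ann.text.strip():
            # Stage 1: the transcription must not be blank.
            rejected.append((ann, "empty transcription"))
        elif not is_arabic_text(ann.text):
            # Stage 2: the transcription must be purely Arabic script.
            rejected.append((ann, "transcription is not purely Arabic script"))
        elif ann.image_id in seen_ids:
            # Stage 3: each image may appear only once.
            rejected.append((ann, "duplicate image"))
        else:
            seen_ids.add(ann.image_id)
            accepted.append(ann)
    return accepted, rejected
```

Rejected records keep their reason so that a human reviewer can correct them and resubmit, instead of the record being silently dropped.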

WHAT WE BUILT

2,000+ unique image samples
Field-level text annotation
Multi-stage quality validation
Structured delivery format
Ongoing dataset expansion
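One common way to structure such a delivery is a JSON Lines manifest with one record per image. The sketch below assumes that layout; the `image` and `text` field names and the example path are hypothetical, and the real delivery format may differ.

```python
import io
import json


def write_manifest(records, fp):
    """Write one JSON object per line; ensure_ascii=False keeps Arabic text readable."""
    for rec in records:
        fp.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_manifest(fp):
    """Read the manifest back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in fp if line.strip()]


# Round-trip one hypothetical record through an in-memory file.
records = [{"image": "images/img_001.png", "text": "بسم الله"}]
buffer = io.StringIO()
write_manifest(records, buffer)
buffer.seek(0)
loaded = read_manifest(buffer)
```

A line-per-record layout lets new images from a later expansion phase be appended without rewriting existing entries.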

TECH STACK

Custom annotation workflow
QA pipeline
Dataset formatting
Python

OUTCOME

Delivered a clean, benchmark-ready dataset. Work is ongoing: a second phase is expanding the dataset to 5,000+ images.