Data Curation
Arabic Calligraphy OCR Dataset
2,000+ annotated calligraphy images for Arabic OCR model training.
THE PROBLEM
The client needed a high-quality labeled dataset of Arabic calligraphy images, each paired with an exact text transcription, to train an OCR model.
THE SOLUTION
Collected 2,000+ unique calligraphy images, annotated each with its exact Arabic text, validated the annotations through multi-stage QA, and delivered them in a structured format ready for model training.
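One automated stage of a multi-stage QA pass might look like the minimal sketch below. It assumes that a valid transcription is a non-empty, NFC-normalized Unicode string containing only Arabic-script characters separated by single spaces. The function name validate_transcription and the accepted Unicode ranges are illustrative assumptions, not the project's actual checks.

```python
import unicodedata

# Assumed accepted script ranges (illustrative): Arabic, Arabic Supplement,
# and Arabic Extended-A. Presentation-form code points are deliberately rejected.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic (letters, diacritics, Arabic-Indic digits, punctuation)
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
]


def is_arabic_char(ch: str) -> bool:
    """True if the character's code point falls in one of the assumed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def validate_transcription(text: str) -> list[str]:
    """Return a list of QA issues for one transcription; an empty list means it passes."""
    issues: list[str] = []
    if not text.strip():
        issues.append("empty transcription")
        return issues
    # Canonical composition catches e.g. ALEF + combining HAMZA ABOVE typed as two code points.
    if text != unicodedata.normalize("NFC", text):
        issues.append("not NFC-normalized")
    # Leading/trailing or repeated whitespace makes labels inconsistent across annotators.
    if text != " ".join(text.split()):
        issues.append("irregular whitespace")
    bad = sorted({ch for ch in text if not ch.isspace() and not is_arabic_char(ch)})
    if bad:
        issues.append("non-Arabic characters: " + "".join(bad))
    return issues
```

Returning a list of issues rather than a boolean lets a reviewer see why a sample failed. Under this design, samples that fail could be routed back for manual re-annotation instead of being silently dropped.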
WHAT WE BUILT
2,000+ unique image samples
Field-level text annotation
Multi-stage quality validation
Structured delivery format
Ongoing dataset expansion
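Guaranteeing "unique" samples and a structured delivery can be sketched roughly as follows. This assumes each image is fingerprinted by a SHA-256 hash of its file bytes, and that the dataset ships as a JSON Lines manifest with one record per image. The Sample fields, dedupe, and to_jsonl are hypothetical names chosen for illustration. They are not the project's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class Sample:
    # Hypothetical record layout; field names are illustrative only.
    image_id: str
    image_path: str
    text: str
    sha256: str


def sha256_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of raw image bytes (catches exact duplicates only)."""
    return hashlib.sha256(data).hexdigest()


def dedupe(samples: list[Sample]) -> tuple[list[Sample], list[str]]:
    """Keep the first sample per content hash; return kept samples and dropped IDs."""
    seen: set[str] = set()
    kept: list[Sample] = []
    dropped: list[str] = []
    for s in samples:
        if s.sha256 in seen:
            dropped.append(s.image_id)
        else:
            seen.add(s.sha256)
            kept.append(s)
    return kept, dropped


def to_jsonl(samples: list[Sample]) -> str:
    """Serialize samples as JSON Lines; ensure_ascii=False keeps Arabic text readable."""
    return "".join(json.dumps(asdict(s), ensure_ascii=False) + "\n" for s in samples)
```

A byte-level hash flags only identical files. Resized or re-compressed copies of the same artwork would slip through, so a real pipeline would likely add a perceptual-hash or manual review step for near-duplicates.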
TECH STACK
Custom annotation workflow
QA pipeline
Dataset formatting
Python
OUTCOME
Delivered a clean, benchmark-ready dataset. The project is ongoing: a second phase is expanding the dataset to 5,000+ images.