Ottoman OCR: Printed naskh font
Abstract
We present an OCR tool for printed Ottoman documents in naskh font, developed as part of a project named End-to-End Conversion of Ottoman Documents to Modern Turkish. The tool uses a deep learning model trained on a data set containing original and synthetic documents. We experimentally compared this tool, named Osmanlica.com, with the Tesseract Arabic, Tesseract Persian, ABBYY FineReader, Miletos and Google Docs OCR tools (or models) on a test data set comprising 21 pages of original documents. With character recognition accuracy rates of 88.64% raw, 95.92% normalized and 97.18% joined, Osmanlica.com outperformed the other tools by a marked margin. Osmanlica.com also achieved 58% word recognition accuracy, the only rate above 50% among the OCR tools compared. We share the test data set, ground truth, OCR outputs and the test program, written in Python using difflib, at osmanlica.com/test for independent verification. © 2021 IEEE.
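As a rough illustration of how character and word recognition accuracy can be measured with difflib, the following is a minimal sketch and not the published test program. It assumes accuracy is defined as the share of ground-truth items that `difflib.SequenceMatcher` aligns with the OCR output. The function names `char_accuracy` and `word_accuracy` are hypothetical, and the raw, normalized and joined variants reported above are not reproduced because their definitions are not given here.

```python
from difflib import SequenceMatcher


def _matched_fraction(truth, output):
    """Fraction of items in `truth` that difflib aligns with items in `output`."""
    if not truth:
        # Assumption: an empty ground truth counts as perfect only if the output is empty too.
        return 1.0 if not output else 0.0
    # autojunk=False disables difflib's popularity heuristic, which would
    # otherwise ignore very frequent characters in long texts.
    matcher = SequenceMatcher(None, truth, output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def char_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Character-level accuracy (hypothetical definition, see lead-in)."""
    return _matched_fraction(ground_truth, ocr_output)


def word_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Word-level accuracy over whitespace-separated tokens (hypothetical definition)."""
    return _matched_fraction(ground_truth.split(), ocr_output.split())


if __name__ == "__main__":
    truth = "one two three four"
    ocr = "one tvo three four"
    print(f"character accuracy: {char_accuracy(truth, ocr):.2%}")
    print(f"word accuracy: {word_accuracy(truth, ocr):.2%}")
```

A single wrong character lowers character accuracy only slightly but invalidates the whole word, which is consistent with word recognition rates being much lower than character recognition rates for the same output.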