Ottoman OCR: Printed naskh font

Dölek, I.; Kurt, A.

Access

Closed access (info:eu-repo/semantics/closedAccess)

Date

2021

Authors

Dölek, I.
Kurt, A.

Abstract

We present an OCR tool for printed Ottoman documents in the naskh font, developed as part of a project named "End-to-End Conversion of Ottoman Documents to Modern Turkish". The tool uses a deep learning model trained on a dataset containing both original and synthetic documents. We experimentally compared this tool, named Osmanlica.com, with the Tesseract Arabic, Tesseract Persian, ABBYY FineReader, Miletos and Google Docs OCR tools (or models) on a test dataset of 21 pages of original documents. With character recognition accuracy rates of 88.64% (raw), 95.92% (normalized) and 97.18% (joined), Osmanlica.com outperformed the other tools by a marked margin. Osmanlica.com also achieved 58% word recognition accuracy, the only rate above 50% among the OCR tools compared. We have shared the test dataset, ground truth, OCR outputs and the test program (written in Python using difflib) at osmanlica.com/test for independent verification. © 2021 IEEE.
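The abstract states that the evaluation program uses Python's difflib but does not define the raw, normalized, joined or word accuracy metrics. The sketch below shows one plausible way to compute such scores with `difflib.SequenceMatcher`. The formula (matched units divided by ground-truth length) and the helpers `normalize` and `joined` are assumptions for illustration, not the authors' definitions; the published test program at osmanlica.com/test is the authoritative reference.

```python
import difflib
import unicodedata


def char_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Fraction of ground-truth characters matched in the OCR output.

    Assumption: accuracy = sum of matching block sizes / len(ground_truth).
    """
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    matcher = difflib.SequenceMatcher(None, ground_truth, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ground_truth)


def normalize(text: str) -> str:
    # Hypothetical normalization: NFKC folds Arabic presentation forms to base
    # letters, then combining marks (category "Mn", e.g. harakat) are dropped.
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Mn")


def joined(text: str) -> str:
    # Hypothetical "joined" variant: remove all whitespace so that wrongly
    # split or merged words are not penalized.
    return "".join(text.split())


def word_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Fraction of ground-truth words matched, in order, in the OCR output."""
    gt_words = ground_truth.split()
    ocr_words = ocr_output.split()
    if not gt_words:
        return 1.0 if not ocr_words else 0.0
    matcher = difflib.SequenceMatcher(None, gt_words, ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(gt_words)


if __name__ == "__main__":
    gt, ocr = "abcd", "abxd"
    print(char_accuracy(gt, ocr))                          # 0.75
    print(char_accuracy(joined(normalize(gt)), joined(normalize(ocr))))
    print(word_accuracy("a b c d", "a b x d"))             # 0.75
```

Note that `SequenceMatcher` uses Ratcliff/Obershelp-style matching rather than a minimal edit-distance alignment, so its scores can differ slightly from Levenshtein-based character error rates.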

Source

2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 - Proceedings

Links

https://doi.org/10.1109/INISTA52262.2021.9548616
https://hdl.handle.net/20.500.14002/428

Collections

Sakarya Vocational School (Sakarya Meslek Yüksekokulu) Collection
Scopus-Indexed Publications Collection