dc.contributor.author | Sevim S. | |
dc.contributor.author | Omurca S.İ. | |
dc.contributor.author | Ekinci E. | |
dc.date.accessioned | 2023-03-14T20:29:01Z | |
dc.date.available | 2023-03-14T20:29:01Z | |
dc.date.issued | 2022 | |
dc.identifier.isbn | 9.78303E+12 | |
dc.identifier.issn | 1867-8211 | |
dc.identifier.uri | https://doi.org/10.1007/978-3-031-01984-5_6 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14002/1565 | |
dc.description | 1st International Congress of Electrical and Computer Engineering, ICECENG 2022 -- 9 February 2022 through 12 February 2022 -- -- 277759 | en_US |
dc.description.abstract | Document image classification has attracted considerable interest in business process automation and plays an important role in document image processing (DIP) systems, so an effective framework for the task is needed. Many methods for classifying document images have been proposed in the literature. In this paper we propose an efficient document image classification method that uses vision transformers (ViTs) and exploits the visual information of the document. Transformers were developed for natural language processing tasks; owing to their high performance, their architectures have been adapted and applied to other problems. ViT is one such model. ViTs have demonstrated impressive performance in computer vision tasks compared with baselines. A ViT scans the image and models the relations between image patches using multi-head self-attention. Experiments are conducted on a real-world dataset. Despite the limited amount of training data available, our method achieves acceptable performance in document image classification. © 2022, ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. | en_US |
dc.description.sponsorship | Kocaeli Üniversitesi: FBA-2020-2152 | en_US |
dc.description.sponsorship | Acknowledgments. This work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Springer Science and Business Media Deutschland GmbH | en_US |
dc.relation.ispartof | Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Deep learning | en_US |
dc.subject | Document image classification | en_US |
dc.subject | Transformers | en_US |
dc.subject | Vision transformers | en_US |
dc.subject | Classification (of information) | en_US |
dc.subject | Information retrieval systems | en_US |
dc.subject | Natural language processing systems | en_US |
dc.subject | Automation process | en_US |
dc.subject | Business automation | en_US |
dc.subject | Document image processing | en_US |
dc.subject | Document images | en_US |
dc.subject | Images classification | en_US |
dc.subject | Performance | en_US |
dc.subject | Transformer | en_US |
dc.subject | Vision transformer | en_US |
dc.subject | Image classification | en_US |
dc.title | Document Image Classification with Vision Transformers | en_US |
dc.type | conferenceObject | en_US |
dc.department | To be determined | en_US |
dc.identifier.doi | 10.1007/978-3-031-01984-5_6 | |
dc.identifier.volume | 436 LNICST | en_US |
dc.identifier.startpage | 68 | en_US |
dc.identifier.endpage | 81 | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.authorscopusid | 57219157474 | |
dc.authorscopusid | 55014691600 | |
dc.authorscopusid | 55293166200 | |
dc.identifier.scopus | 2-s2.0-85130276747 | en_US |