dc.contributor.author: Luo, Queenie
dc.contributor.author: Chuang, Yung-Sung
dc.date.accessioned: 2024-04-04T17:40:12Z
dc.date.available: 2024-04-04T17:40:12Z
dc.date.issued: 2024-03-30
dc.identifier.issn: 2375-4699
dc.identifier.issn: 2375-4702
dc.identifier.uri: https://hdl.handle.net/1721.1/154069
dc.description.abstract: Scholars in the humanities heavily rely on ancient manuscripts to study history, religion, and socio-political structures of the past. Significant efforts have been devoted to digitizing these precious manuscripts using OCR technology. However, most manuscripts have been blemished over the centuries, making it unrealistic for OCR programs to accurately capture faded characters. This work presents the Transformer + Confidence Score mechanism architecture for post-processing Google's Tibetan OCR-ed outputs. According to the Loss and Character Error Rate metrics, our Transformer + Confidence Score mechanism architecture proves superior to the Transformer, LSTM-to-LSTM, and GRU-to-GRU architectures. Our method can be adapted to any language dealing with post-processing OCR outputs. (en_US)
dc.publisher: ACM (en_US)
dc.relation.isversionof: 10.1145/3654811 (en_US)
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. (en_US)
dc.source: ACM (en_US)
dc.subject: General Computer Science (en_US)
dc.title: Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Luo, Queenie and Chuang, Yung-Sung. 2024. "Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts." ACM Transactions on Asian and Low-Resource Language Information Processing.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal: ACM Transactions on Asian and Low-Resource Language Information Processing (en_US)
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/JournalArticle (en_US)
eprint.status: http://purl.org/eprint/status/PeerReviewed (en_US)
dc.date.updated: 2024-04-01T07:50:13Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-04-01T07:50:13Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed (en_US)

