Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/131742.2

dc.contributor.author: Belinkov, Yonatan
dc.contributor.author: Magidow, Alexander
dc.contributor.author: Barrón-Cedeño, Alberto
dc.contributor.author: Shmidman, Avi
dc.contributor.author: Romanov, Maxim
dc.date.accessioned: 2021-09-20T17:30:06Z
dc.date.available: 2021-09-20T17:30:06Z
dc.date.issued: 2019-04-12
dc.identifier.uri: https://hdl.handle.net/1721.1/131742
dc.description.abstract: Arabic is a widely-spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties. Therefore, studying the history of the language has so far been mostly limited to manual analyses on a small scale. In this work, we present a large-scale historical corpus of the written Arabic language, spanning 1400 years. We describe our efforts to clean and process this corpus using Arabic NLP tools, including the identification of reused text. We study the history of the Arabic language using a novel automatic periodization algorithm, as well as other techniques. Our findings confirm the established division of written Arabic into Modern Standard and Classical Arabic, and confirm other established periodizations, while suggesting that written Arabic may be divisible into still further periods of development. [en_US]
dc.publisher: Springer Netherlands [en_US]
dc.relation.isversionof: https://doi.org/10.1007/s10579-019-09460-w [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Springer Netherlands [en_US]
dc.title: Studying the history of the Arabic language: language technology and a large-scale historical corpus [en_US]
dc.type: Article [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2020-09-24T20:34:00Z
dc.language.rfc3066: en
dc.rights.holder: Springer Nature B.V.
dspace.embargo.terms: Y
dspace.date.submission: 2020-09-24T20:34:00Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed

