Supplementary materials for "ProppLearner: Deeply Annotating a Corpus of Russian Folktales to Enable the Machine Learning of a Russian Formalist Theory"
Author(s)
Patrick Winston; Genesis; Finlayson, Mark Alan
Download
archive.zip (8.145Mb)
Other Contributors
Genesis
Advisor
Patrick Winston
Abstract
This archive contains the supplementary material for the journal article "ProppLearner: Deeply Annotating a Corpus of Russian Folktales to Enable the Machine Learning of a Russian Formalist Theory", published in the Journal of Digital Scholarship in the Humanities (DSH), ca. 2016.
The archive contains several different types of files. First, it contains the annotation guides that were used to train the annotators. The guides are numbered to match the team numbers in Table 6 of the article. Included are both the detailed guides for some layers, as produced by the original developers of the specification, and our synopsis guides for each layer, which served as reference and further training material for the annotators. Also of interest are the general annotator and adjudicator training guides, which outline the procedures the teams followed when conducting annotation. Those organizing their own annotation projects may find this material useful.
Second, the archive contains a comprehensive manifest, in Excel spreadsheet format, listing the word counts, sources, types, and titles (in both Russian and English) of all the texts in the corpus.
Finally, the archive contains the corpus data files themselves, in Story Workbench format, an XML-encoded stand-off annotation scheme. The scheme is described in the file format specification, which is also included in the archive. The files can be parsed with any standard XML parser, or loaded and edited with the Story Workbench annotation tool, which is freely available.
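Because the corpus files are plain XML, a first look at them needs nothing beyond a standard library. The following is a minimal sketch in Python, not part of the archive itself: the function names `count_tags` and `survey_archive` are hypothetical, and the file suffix is left as a parameter because the extension used by the corpus files is not stated here. The sketch assumes nothing about the Story Workbench schema; it only counts how often each element name occurs, which can be a useful orientation before reading the file format specification.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_bytes):
    """Return a Counter of element tag names found in one XML document."""
    root = ET.fromstring(xml_bytes)
    # root.iter() walks the root element and all of its descendants.
    return Counter(elem.tag for elem in root.iter())


def survey_archive(zip_path, suffix):
    """Map each archive member whose name ends in `suffix` to its tag counts.

    `suffix` is an assumption supplied by the caller; check the archive
    contents for the extension the corpus files actually use.
    """
    results = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(suffix):
                results[name] = count_tags(zf.read(name))
    return results
```

Calling `survey_archive("archive.zip", ".xml")`, with the suffix replaced by whatever extension the data files carry, would return a dictionary from file name to tag counts. Interpreting those tags still requires the specification included in the archive.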