Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/136684.2

dc.contributor.author: Jin, Di
dc.contributor.author: Pan, Eileen
dc.contributor.author: Oufattole, Nassim
dc.contributor.author: Weng, Wei-Hung
dc.contributor.author: Fang, Hanyi
dc.contributor.author: Szolovits, Peter
dc.date.accessioned: 2021-10-28T12:49:47Z
dc.date.available: 2021-10-28T12:49:47Z
dc.date.issued: 2021-07-12
dc.identifier.uri: https://hdl.handle.net/1721.1/136684
dc.description.abstract: Open domain question answering (OpenQA) tasks have been recently attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method can only achieve 36.7%, 42.0%, and 70.1% of test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future. (en_US)
dc.publisher: Multidisciplinary Digital Publishing Institute (en_US)
dc.relation.isversionof: http://dx.doi.org/10.3390/app11146421 (en_US)
dc.rights: Creative Commons Attribution (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.source: Multidisciplinary Digital Publishing Institute (en_US)
dc.title: What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Applied Sciences 11 (14): 6421 (2021) (en_US)
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version (en_US)
dc.type.uri: http://purl.org/eprint/type/JournalArticle (en_US)
eprint.status: http://purl.org/eprint/status/PeerReviewed (en_US)
dc.date.updated: 2021-07-23T13:27:27Z
dspace.date.submission: 2021-07-23T13:27:27Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed (en_US)
mit.metadata.status: Authority Work and Publication Information Needed

