Notice
This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/136684.2
What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Jin, Di | |
| dc.contributor.author | Pan, Eileen | |
| dc.contributor.author | Oufattole, Nassim | |
| dc.contributor.author | Weng, Wei-Hung | |
| dc.contributor.author | Fang, Hanyi | |
| dc.contributor.author | Szolovits, Peter | |
| dc.date.accessioned | 2021-10-28T12:49:47Z | |
| dc.date.available | 2021-10-28T12:49:47Z | |
| dc.date.issued | 2021-07-12 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/136684 | |
| dc.description.abstract | Open domain question answering (OpenQA) tasks have recently attracted increasing attention from the natural language processing (NLP) community. In this work, we present MedQA, the first free-form multiple-choice OpenQA dataset for solving medical problems, collected from professional medical board exams. It covers three languages (English, simplified Chinese, and traditional Chinese) and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method achieves test accuracies of only 36.7%, 42.0%, and 70.1% on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future. | en_US |
| dc.publisher | Multidisciplinary Digital Publishing Institute | en_US |
| dc.relation.isversionof | http://dx.doi.org/10.3390/app11146421 | en_US |
| dc.rights | Creative Commons Attribution | en_US |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
| dc.source | Multidisciplinary Digital Publishing Institute | en_US |
| dc.title | What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Applied Sciences 11 (14): 6421 (2021) | en_US |
| dc.identifier.mitlicense | PUBLISHER_CC | |
| dc.eprint.version | Final published version | en_US |
| dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
| eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
| dc.date.updated | 2021-07-23T13:27:27Z | |
| dspace.date.submission | 2021-07-23T13:27:27Z | |
| mit.license | PUBLISHER_CC | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |
