
dc.contributor.advisor: Szolovits, Peter
dc.contributor.author: Lehman, Eric (Computer scientist)
dc.date.accessioned: 2022-08-29T15:59:38Z
dc.date.available: 2022-08-29T15:59:38Z
dc.date.issued: 2022-05
dc.date.submitted: 2022-06-21T19:25:42.893Z
dc.identifier.uri: https://hdl.handle.net/1721.1/144613
dc.description.abstract: Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and, as a result, fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are generated by medical experts from 100+ MIMIC-III discharge summaries. We analyze this dataset to characterize the types of information sought by medical experts. We also train baseline models for trigger detection and question generation (QG), paired with unsupervised answer retrieval over EHRs. Our baseline model is able to generate high quality questions in over 62% of cases when prompted with human selected triggers. We will release this dataset (and all code to reproduce baseline model results) to facilitate further research into realistic clinical QA and QG.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Learning to Ask Like a Physician
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: 0000-0001-9919-2257
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

