Automation of Drosophila gene expression pattern image annotation : development of web-based image annotation tool and application of machine learning methods
Author(s)
Ayuso, Anna Maria E
Alternative title
LabelLife : a web-based image annotation tool for gene expression pattern images
Development of web-based image annotation tool and application of machine learning methods
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Manolis Kellis.
Abstract
Large-scale in situ hybridization screens are providing an abundance of spatio-temporal gene expression pattern data that are valuable for understanding the mechanisms of gene regulation. The Berkeley Drosophila Genome Project (BDGP) has generated Drosophila gene expression pattern images for over 7,000 genes in over 90,000 digital images. These images are currently hand-curated by field experts, who annotate them with developmental and anatomical terms based on the stained regions. These annotations enable the integration of spatial expression patterns with other genomic data sets that link regulators with their downstream targets. However, manual curation has become a bottleneck in analyzing the rapidly generated data, so it is necessary to explore computational methods for curating gene expression pattern images. This thesis addresses the problem in two ways: by improving the manual annotation process with a web-based image annotation tool, and by enabling automation of the process using machine learning methods. First, a tool called LabelLife was developed to provide a systematic and flexible way of annotating images, groups of images, and shapes within images using terms from a controlled vocabulary. Second, machine learning methods for automatically predicting vocabulary terms for a given image from its image feature data were explored and implemented. The applied machine learning methods show promising predictive ability, which has the potential to simplify and expedite the curation process, thereby increasing the rate at which biologically significant data can be evaluated and new insights gained.
Description
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 91-92).
Date issued
2011
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.