An exploratory analysis of large health cohort study using Bayesian networks

Shen, Delin

dc.contributor.advisor	Peter Szolovits.	en_US
dc.contributor.author	Shen, Delin	en_US
dc.contributor.other	Harvard University--MIT Division of Health Sciences and Technology.	en_US
dc.date.accessioned	2007-10-22T16:24:19Z
dc.date.available	2007-10-22T16:24:19Z
dc.date.copyright	2006	en_US
dc.date.issued	2006	en_US
dc.identifier.uri	http://dspace.mit.edu/handle/1721.1/34478	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/34478
dc.description	Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2006.	en_US
dc.description	Includes bibliographical references (p. 91-98).	en_US
dc.description.abstract	Large health cohort studies are among the most effective ways of studying the causes, treatments and outcomes of diseases, because they systematically collect a wide range of data over long periods. The wealth of data in such studies may yield important results beyond the already numerous findings, especially when subjected to newer analytical methods. Bayesian Networks (BN) provide a relatively new method of representing uncertain relationships among variables, using the tools of probability and graph theory, and have been widely used in analyzing dependencies and the interplay between variables. We used BN to perform an exploratory analysis on a rich collection of data from one large health cohort study, the Nurses' Health Study (NHS), with a focus on breast cancer. We explored the data from the NHS using BN to look for breast cancer risk factors, including a group of Single Nucleotide Polymorphisms (SNPs). We found no association between the SNPs and breast cancer, but found a dependency between clomid and breast cancer. We evaluated clomid as a potential risk factor after matching on age and number of children. Our results showed that clomid was associated with an increased risk of estrogen receptor positive breast cancer (odds ratio 1.52, 95% CI 1.11-2.09) and a decreased risk of estrogen receptor negative breast cancer (odds ratio 0.46, 95% CI 0.22-0.97).	en_US
dc.description.abstract	We developed breast cancer risk models using BN. We trained models on 75% of the data, and evaluated them on the remaining 25%. Because of the clinical importance of predicting risks for Estrogen Receptor positive and Progesterone Receptor positive breast cancer, we focused on this specific type of breast cancer to predict two-year, four-year, and six-year risks. The concordance statistics of the prediction results on test sets are 0.70 (95% CI: 0.67-0.74), 0.68 (95% CI: 0.64-0.72), and 0.66 (95% CI: 0.62-0.69) for the two-, four-, and six-year models, respectively. We also evaluated the calibration performance of the models, and applied a filter to the output, using Agglomerative Information Bottleneck clustering, to improve the linear relationship between predicted and observed risks without sacrificing much discrimination performance.	en_US
dc.description.statementofresponsibility	by Delin Shen.	en_US
dc.format.extent	98 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/34478	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582
dc.subject	Harvard University--MIT Division of Health Sciences and Technology.	en_US
dc.title	An exploratory analysis of large health cohort study using Bayesian networks	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D.	en_US
dc.contributor.department	Harvard University--MIT Division of Health Sciences and Technology
dc.identifier.oclc	70784336	en_US

Files in this item

Name: 70784336-MIT.pdf
Size: 5.471Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses