Show simple item record

dc.contributor.advisor: Peter Szolovits. [en_US]
dc.contributor.author: Shen, Delin [en_US]
dc.contributor.other: Harvard University--MIT Division of Health Sciences and Technology. [en_US]
dc.date.accessioned: 2007-10-22T16:24:19Z
dc.date.available: 2007-10-22T16:24:19Z
dc.date.copyright: 2006 [en_US]
dc.date.issued: 2006 [en_US]
dc.identifier.uri: http://dspace.mit.edu/handle/1721.1/34478 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/34478
dc.description: Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2006. [en_US]
dc.description: Includes bibliographical references (p. 91-98). [en_US]
dc.description.abstract: Large health cohort studies are among the most effective ways of studying the causes, treatments, and outcomes of diseases by systematically collecting a wide range of data over long periods. The wealth of data in such studies may yield important results in addition to the already numerous findings, especially when subjected to newer analytical methods. Bayesian Networks (BN) provide a relatively new method of representing uncertain relationships among variables, using the tools of probability and graph theory, and have been widely used in analyzing dependencies and the interplay between variables. We used BN to perform an exploratory analysis on a rich collection of data from one large health cohort study, the Nurses' Health Study (NHS), with a focus on breast cancer. We explored the data from the NHS using BN to look for breast cancer risk factors, including a group of Single Nucleotide Polymorphisms (SNPs). We found no association between the SNPs and breast cancer, but found a dependency between clomid and breast cancer. We evaluated clomid as a potential risk factor after matching on age and number of children. Our results showed for clomid an increased risk of estrogen receptor positive breast cancer (odds ratio 1.52, 95% CI 1.11-2.09) and a decreased risk of estrogen receptor negative breast cancer (odds ratio 0.46, 95% CI 0.22-0.97). [en_US]
dc.description.abstract: We developed breast cancer risk models using BN. We trained models on 75% of the data and evaluated them on the remaining 25%. Because of the clinical importance of predicting risks for Estrogen Receptor positive and Progesterone Receptor positive breast cancer, we focused on this specific type of breast cancer to predict two-year, four-year, and six-year risks. The concordance statistics of the prediction results on test sets are 0.70 (95% CI: 0.67-0.74), 0.68 (95% CI: 0.64-0.72), and 0.66 (95% CI: 0.62-0.69) for the two-, four-, and six-year models, respectively. We also evaluated the calibration performance of the models, and applied a filter based on Agglomerative Information Bottleneck clustering to the output, which improved the linear relationship between predicted and observed risks without sacrificing much discrimination performance. [en_US]
dc.description.statementofresponsibility: by Delin Shen. [en_US]
dc.format.extent: 98 p. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/34478 [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582
dc.subject: Harvard University--MIT Division of Health Sciences and Technology. [en_US]
dc.title: An exploratory analysis of large health cohort study using Bayesian networks [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph.D. [en_US]
dc.contributor.department: Harvard University--MIT Division of Health Sciences and Technology
dc.identifier.oclc: 70784336 [en_US]

