
dc.contributor.advisor: Lucila Ohno-Machado. [en_US]
dc.contributor.author: Vantzelfde, Nathan Hans [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2006-07-13T15:18:53Z
dc.date.available: 2006-07-13T15:18:53Z
dc.date.copyright: 2005 [en_US]
dc.date.issued: 2005 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/33370
dc.description: Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. [en_US]
dc.description: Includes bibliographical references (leaves 103-107). [en_US]
dc.description.abstract: Malignant pleural mesothelioma is a rare and lethal form of cancer affecting the external lining of the lungs. Extrapleural pneumonectomy (EPP), which involves the removal of the affected lung, is one of the few treatments shown to have some effectiveness against the disease [39], but the procedure carries a high risk of mortality and morbidity [8]. This paper is concerned with building models that use gene expression levels to predict patient survival following EPP; these models could potentially be used to guide patient treatment. A study by Gordon et al. built a predictor based on ratios of gene expression levels that was 88% accurate on a set of 29 independent test samples, in terms of classifying whether patients survived shorter or longer than the median survival time [15]. These results were recreated both on the original data set used by Gordon et al. and on a newer data set that contained the same samples but was generated using newer software. The predictors were evaluated using N-fold cross-validation. In addition, other methods of variable selection and machine learning were investigated to build different types of predictive models. These analyses used a random training set drawn from the newer data set. The models were evaluated using N-fold cross-validation, and the best model of each of the four main types (decision trees, logistic regression, artificial neural networks, and support vector machines) was tested using a small set of samples excluded from the training set. Of these four models, the neural network with eight hidden neurons and weight decay regularization performed best, achieving a zero cross-validation error rate and, on the test set, 71% accuracy, an ROC area of 0.67, and a log-rank p-value of 0.219. The support vector machine model with a linear kernel also had zero cross-validation error and, on the test set, 71% accuracy and an ROC area of 0.67, but a higher log-rank p-value of 0.515. Both had a lower cross-validation error than the ratio-based predictors of Gordon et al., which had an N-fold cross-validation error rate of 35%; however, these results may not be comparable because the neural network and support vector machine used a different training set than the Gordon et al. study. Regression analysis was also performed; the best neural network model was incorrect by an average of 4.6 months on the six test samples. The method of variable selection based on the signal-to-noise ratio of genes, originally used by Golub et al., proved more effective on the randomly generated training set than the method involving Student's t-tests and fold change used by Gordon et al. Ultimately, however, these models will need to be evaluated using a large independent test set. [en_US]
dc.description.statementofresponsibility: by Nathan Hans Vantzelfde. [en_US]
dc.format.extent: 107 leaves [en_US]
dc.format.extent: 6106147 bytes
dc.format.extent: 6110573 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/pdf
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Prognostic models for mesothelioma : variable selection and machine learning [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M.Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 62521929 [en_US]
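
Note on the abstract: it refers to variable selection based on the signal-to-noise ratio of genes, as used by Golub et al. The record does not give the thesis's exact formula or code, so the following is only a minimal sketch under the assumption that the score takes the commonly cited form (difference of class means divided by the sum of class standard deviations). The function names, gene names, and expression values are hypothetical and are not taken from the thesis.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Assumed score: (mean_a - mean_b) / (sd_a + sd_b).

    Each class needs at least two samples, because stdev() computes
    the sample standard deviation.
    """
    denom = stdev(class_a) + stdev(class_b)
    if denom == 0:
        return 0.0  # no spread in either class; treat as uninformative
    return (mean(class_a) - mean(class_b)) / denom


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names ordered by absolute score.

    expression: dict mapping gene name -> list of expression levels,
                one per sample (hypothetical layout).
    labels:     list of 0/1 class labels, one per sample, e.g.
                0 = survival below the median, 1 = above the median.
    """
    scores = {}
    for gene, levels in expression.items():
        a = [x for x, y in zip(levels, labels) if y == 0]
        b = [x for x, y in zip(levels, labels) if y == 1]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]


# Made-up toy data: GENE_A separates the two classes, GENE_B does not.
expression = {
    "GENE_A": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],
    "GENE_B": [3.0, 2.0, 4.0, 3.1, 2.1, 3.9],
}
labels = [0, 0, 0, 1, 1, 1]
selected = rank_genes(expression, labels, top_k=1)
```

In this toy example the gene whose levels differ most between the two classes, relative to its within-class spread, is ranked first. In a setting like the one the abstract describes, such a ranking would be computed on the training samples only, so that the held-out test samples play no part in variable selection.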

