dc.contributor.advisor: Rus, Daniela L.
dc.contributor.author: Williams, Christien Spencer
dc.date.accessioned: 2022-06-15T13:08:49Z
dc.date.available: 2022-06-15T13:08:49Z
dc.date.issued: 2022-02
dc.date.submitted: 2022-02-22T18:32:30.878Z
dc.identifier.uri: https://hdl.handle.net/1721.1/143272
dc.description.abstract: Satellite imagery data analysis has made great strides, but it is still held back by the difficulty of generating robust, labeled datasets for complex learners. Variation in the data and the diversity of tasks make it difficult both to crowdsource such datasets and to offload the work to the small number of expert annotators that exist. Currently, no general machine learning methods can automatically generate data labels in all regimes. A chief data labeling concern for remote sensing projects is cloud mask dataset creation. Using optical satellite images requires accurately detecting all clouds in every image. For many applications, automatic cloud detection methods are not accurate enough. This thesis reformulates the problem, moving away from the search for a single automatic annotation algorithm. We amplify an expert annotator’s efforts with an algorithm that learns from the annotator’s labels to annotate datasets more efficiently, and with an active learning loop that multiplies this labeling effort. This thesis first contributes a fast, machine-learning-based annotation system and demonstrates on Sentinel-2 images that it reaches more than 95% accuracy in four clicks or fewer. To obtain these statistics, we constructed an eclectic database of partially cloudy images with ground truth, and evaluated its accuracy to be greater than 98%. We then show that our fast, supervised annotation is far more accurate than recent sophisticated cloud detectors. Next, we develop an active learning system that employs uncertainty sampling for query selection and uses a modified Efficient Neural Network (ENet) model as its backbone. We evaluate this active learning system by comparing different scoring functions for the uncertainty metric that powers query selection. We show that, with this uncertainty measurement, the active learning system performs better while using fewer data points.
Ultimately, with a minimal number of clicks/annotations, the annotator can build a robust, large, labeled dataset.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Fast Supervised Annotation and Active Learning with Uncertainty for Cloud Mask Dataset Generation
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science