Fast Supervised Annotation and Active Learning with Uncertainty for Cloud Mask Dataset Generation

Williams, Christien Spencer

dc.contributor.advisor	Rus, Daniela L.
dc.contributor.author	Williams, Christien Spencer
dc.date.accessioned	2022-06-15T13:08:49Z
dc.date.available	2022-06-15T13:08:49Z
dc.date.issued	2022-02
dc.date.submitted	2022-02-22T18:32:30.878Z
dc.identifier.uri	https://hdl.handle.net/1721.1/143272
dc.description.abstract	Satellite imagery data analysis has made great strides; however, it is still held back by the difficulty of generating robust, labeled datasets for complex learners. Variation in the data and the diversity of tasks make it difficult both to crowdsource such datasets in general and to offload this responsibility to the small number of expert annotators who exist. Currently, no general machine learning method can automatically generate data labels in all regimes. A chief data labeling concern for remote sensing projects is the creation of cloud mask datasets: using optical satellite images requires accurately detecting all clouds in any image, and for many applications automatic cloud detection methods are not accurate enough. This thesis reframes the problem. Rather than seeking a single automatic algorithm to conduct annotation, we amplify an expert annotator's efforts in two ways: an algorithm that learns from his annotations to annotate datasets more efficiently, and an active learning loop that acts as a force multiplier on this labeling effort. This thesis first contributes a fast, machine-learning-based annotation system and demonstrates on Sentinel-2 images that it reaches more than 95% accuracy in four clicks or fewer. To obtain these statistics, we constructed an eclectic database of partially cloudy images and their ground truth, and evaluated the accuracy of this ground truth to be greater than 98%. We then show that our fast, supervised annotation is far more accurate than recent sophisticated cloud detectors. Next, we develop an active learning system that employs uncertainty sampling for query selection and uses a modified Efficient Neural Network (ENet) model as its backbone. We evaluate this active learning system by comparing different scoring functions for the uncertainty metric that drives query selection. We show that, with this uncertainty measure, the active learning system performs better while using fewer data points.
Ultimately, with a minimal number of clicks and annotations, an annotator can build a large, robust, labeled dataset.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Fast Supervised Annotation and Active Learning with Uncertainty for Cloud Mask Dataset Generation
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science
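The abstract describes uncertainty sampling for query selection, in which candidate images are ranked by an uncertainty score and the most uncertain ones are sent to the annotator. The record does not name the scoring functions the thesis compares. The sketch below therefore assumes two common choices for a binary cloud/clear prediction: least confidence and binary entropy. It also assumes that per-pixel scores are averaged into a single image score. The probability maps, tile names, and helper functions (`image_uncertainty`, `select_queries`) are hypothetical. No ENet model is included; the per-pixel P(cloud) values are taken as given.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumed per-pixel uncertainty scores for a binary prediction p = P(cloud).
# Both return values in [0, 1]; higher means the model is less certain.
# (For two classes, margin sampling ranks pixels exactly like least confidence.)

def least_confidence(p: float) -> float:
    # 1 - max class probability, rescaled from [0, 0.5] to [0, 1]
    return 2.0 * (1.0 - max(p, 1.0 - p))

def entropy(p: float) -> float:
    # Binary entropy in bits; terms with probability 0 contribute 0
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)

def image_uncertainty(probs: Sequence[float], score: Callable[[float], float]) -> float:
    # Hypothetical aggregation: mean per-pixel uncertainty over the image
    return sum(score(p) for p in probs) / len(probs)

def select_queries(pool: Dict[str, Sequence[float]],
                   score: Callable[[float], float], k: int) -> List[str]:
    # Rank unlabeled images from most to least uncertain and query the top k
    ranked = sorted(pool, key=lambda name: image_uncertainty(pool[name], score),
                    reverse=True)
    return ranked[:k]

# Hypothetical flattened P(cloud) maps for three unlabeled tiles
pool = {
    "tile_a": [0.02, 0.98, 0.01, 0.97],  # model is confident everywhere
    "tile_b": [0.45, 0.55, 0.60, 0.40],  # model is unsure everywhere
    "tile_c": [0.10, 0.85, 0.30, 0.70],  # mixed
}
print(select_queries(pool, least_confidence, k=2))  # ['tile_b', 'tile_c']
```

Swapping `least_confidence` for `entropy` changes only the scoring function. Comparing ranked queries and downstream accuracy across such functions is the kind of evaluation the abstract describes.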

Files in this item

Name:: Williams-willch-meng-eecs-2022 ...
Size:: 14.62Mb
Format:: PDF
Description:: Thesis PDF


This item appears in the following Collection(s)

Graduate Theses
