Measuring image difficulty under limited presentation time: towards building better test sets for object recognition

Lin, Xinyu

dc.contributor.advisor	Katz, Boris
dc.contributor.author	Lin, Xinyu
dc.date.accessioned	2022-08-29T16:28:48Z
dc.date.available	2022-08-29T16:28:48Z
dc.date.issued	2022-05
dc.date.submitted	2022-05-27T16:18:44.568Z
dc.identifier.uri	https://hdl.handle.net/1721.1/145037
dc.description.abstract	Datasets are crucial to computer vision and to machine learning more broadly. In particular, as techniques that are less well understood theoretically have advanced, raw performance on datasets such as ImageNet has become the main driver of developments and the main source of feedback on the state of the field. However, the sources of data that datasets draw on today are highly biased; for example, object class is correlated with background, and many phenomena are omitted. In addition, objects mostly appear in stereotypical rotations with little occlusion. The resulting datasets are similarly biased. Thus, performance on these datasets is a limited predictor of the performance users can expect on their own tasks. To approach this problem, datasets such as ObjectNet were built with images that more closely resemble real-world scenarios by controlling for object backgrounds, rotations, imaging viewpoints, etc. In this thesis, we further address this problem by proposing a novel difficulty metric that reflects how well humans recognize images. We derive this metric by conducting extensive psychophysics experiments to determine the minimal time humans need to recognize an image. This new metric can be used to construct datasets that control for the difficulty of the different scenes and views humans see on a daily basis. The models’ performance on these datasets will also better represent the performance of humans on their own tasks. However, collecting these labels can be costly, so we also propose machine proxies that can effectively estimate human difficulty for different images and datasets.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Measuring image difficulty under limited presentation time: towards building better test sets for object recognition
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science

Files in this item

Name: Lin-linx3-meng-eecs-2022-thesis.pdf
Size: 5.539Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
