| dc.contributor.author | Cai, Bill | |
| dc.date.accessioned | 2024-08-01T18:42:32Z | |
| dc.date.available | 2024-08-01T18:42:32Z | |
| dc.date.issued | 2024-05-13 | |
| dc.identifier.isbn | 979-8-4007-0172-6 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/155843 | |
| dc.description | WWW ’24 Companion, May 13–17, 2024, Singapore, Singapore | en_US |
| dc.description.abstract | The Online Safety Prize Challenge (OSPC) presented several challenges: (1) the lack of a training or sample dataset, and limited interactions with the submission portal, (2) limitations in hardware, software package size and processing time. In this report, we present our method, which consistently achieved an AUROC score above 0.74 (within the top 3 submissions). The following factors improved the AUROC score significantly: (1) use of multilingual optical character recognition (OCR) models (+0.024), (2) use of exact logit scores instead of sampled decoding (+0.040), (3) fine-tuning of pretrained models on synthetically generated datasets (+0.076 to +0.106). We outline key implementation details in this report, including the use of model quantization and robust integration testing that covers GPU memory leak checks and inference time restrictions. | en_US |
| dc.publisher | ACM\|Companion Proceedings of the ACM Web Conference 2024 | en_US |
| dc.relation.isversionof | 10.1145/3589335.3665997 | en_US |
| dc.rights | Creative Commons Attribution | en_US |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
| dc.source | Association for Computing Machinery | en_US |
| dc.title | OSPC: Multimodal Harmful Content Detection using Fine-tuned Language Models | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Cai, Bill. 2024. "OSPC: Multimodal Harmful Content Detection using Fine-tuned Language Models." | |
| dc.contributor.department | Massachusetts Institute of Technology. Computation for Design and Optimization Program | |
| dc.identifier.mitlicense | PUBLISHER_CC | |
| dc.eprint.version | Final published version | en_US |
| dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
| eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
| dc.date.updated | 2024-08-01T07:45:11Z | |
| dc.language.rfc3066 | en | |
| dc.rights.holder | The author(s) | |
| dspace.date.submission | 2024-08-01T07:45:12Z | |
| mit.license | PUBLISHER_CC | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |