dc.contributor.author: Cai, Bill
dc.date.accessioned: 2024-08-01T18:42:32Z
dc.date.available: 2024-08-01T18:42:32Z
dc.date.issued: 2024-05-13
dc.identifier.isbn: 979-8-4007-0172-6
dc.identifier.uri: https://hdl.handle.net/1721.1/155843
dc.description: WWW ’24 Companion, May 13–17, 2024, Singapore, Singapore [en_US]
dc.description.abstract: The Online Safety Prize Challenge (OSPC) presented several challenges: (1) the lack of a training or sample dataset, and limited interactions with the submission portal, (2) limitations in hardware, software package size and processing time. In this report, we present our method that was consistently able to achieve AUROC score of above 0.74 (within top 3 of submissions). The following factors improved AUROC score significantly: (1) use of multilingual optical character recognition (OCR) models (+0.024), (2) exact logit scores instead of sampled decoding (+0.040), (3) fine-tuning of pretrained models on synthetically generated datasets (+0.076 to +0.106). We outline key implementation details in this report including the use of model quantization, robust integration testing including GPU memory leak checks and inference time restrictions. [en_US]
dc.publisher: ACM|Companion Proceedings of the ACM Web Conference 2024 [en_US]
dc.relation.isversionof: 10.1145/3589335.3665997 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: OSPC: Multimodal Harmful Content Detection using Fine-tuned Language Models [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Cai, Bill. 2024. "OSPC: Multimodal Harmful Content Detection using Fine-tuned Language Models."
dc.contributor.department: Massachusetts Institute of Technology. Computation for Design and Optimization Program
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2024-08-01T07:45:11Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2024-08-01T07:45:12Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

