Show simple item record

dc.contributor.author: Al-Hadhrami, Suheer
dc.contributor.author: Menai, Mohamed El Bachir
dc.contributor.author: Al-Ahmadi, Saad
dc.contributor.author: Alnafessah, Ahmad
dc.date.accessioned: 2024-02-13T20:53:47Z
dc.date.available: 2024-02-13T20:53:47Z
dc.date.issued: 2023-08-28
dc.identifier.issn: 2076-3417
dc.identifier.uri: https://hdl.handle.net/1721.1/153515
dc.description.abstract: Visual question answering (VQA) is the task of generating or predicting an answer, in human language, to a question about a visual image. VQA is an active field that combines two branches of AI: natural language processing (NLP) and computer vision. Medical VQA is still at an early stage and needs substantial effort and exploration to reach practical use. This paper proposes two models that utilize the latest vision and NLP transformers, which outperform the state of the art (SOTA) and have not yet been applied to medical VQA. The ELECTRA-base transformer is used for textual feature extraction, whereas the SWIN transformer is used for visual feature extraction. In SOTA medical VQA, model selection takes either the model with the highest validation accuracy or the last model from training. The first proposed model, the best-value-based model, is selected based on the highest validation accuracy. The second, the greedy-soup-based model, sets its parameters with a greedy-soup technique that fuses multiple fine-tuned models: it merges the parameters of the fine-tuned models that perform strongly on validation accuracy during training. The greedy-soup-based model outperforms the best-value-based model, and both proposed models outperform the SOTA, which has an accuracy of 83.49%. The greedy-soup-based model was further optimized over batch size and learning rate; during this optimization, seven additional models exceeded the SOTA accuracy. The best model, trained with a learning rate of 1.0 × 10⁻⁴ and a batch size of 16, achieves an accuracy of 87.41%. [en_US]
dc.language.iso: en_US
dc.publisher: MDPI [en_US]
dc.relation.isversionof: 10.3390/app13179735 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: MDPI [en_US]
dc.subject: Fluid Flow and Transfer Processes [en_US]
dc.subject: Computer Science Applications [en_US]
dc.subject: Process Chemistry and Technology [en_US]
dc.subject: General Engineering [en_US]
dc.subject: Instrumentation [en_US]
dc.subject: General Materials Science [en_US]
dc.title: An Effective Med-VQA Method Using a Transformer with Weights Fusion of Multiple Fine-Tuned Models [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Al-Hadhrami, S.; Menai, M.E.B.; Al-Ahmadi, S.; Alnafessah, A. An Effective Med-VQA Method Using a Transformer with Weights Fusion of Multiple Fine-Tuned Models. Appl. Sci. 2023, 13. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Sociotechnical Systems Research Center
dc.relation.journal: Applied Sciences [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.date.submission: 2024-02-13T20:50:02Z
mit.journal.volume: 13 [en_US]
mit.journal.issue: 17 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
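
The greedy-soup weight fusion described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical implementation of the general greedy-soup recipe (average the weights of fine-tuned checkpoints, keeping each checkpoint only if validation accuracy does not drop), not the authors' code; the names `checkpoints`, `build_model`, and `evaluate` are placeholder assumptions.

```python
import torch

def greedy_soup(checkpoints, build_model, evaluate):
    """Greedy-soup weight fusion (illustrative sketch, not the paper's code).

    checkpoints: list of (state_dict, val_acc) pairs from fine-tuned models.
    build_model: returns a fresh model instance for loading fused weights.
    evaluate:    returns validation accuracy for a given model.
    """
    def average(state_dicts):
        # Uniform average of each parameter tensor across soup ingredients.
        return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
                for k in state_dicts[0]}

    # Rank candidate checkpoints by individual validation accuracy, best first.
    ranked = [sd for sd, _ in sorted(checkpoints, key=lambda c: c[1], reverse=True)]

    model = build_model()
    soup = [ranked[0]]                  # start the soup with the best model
    model.load_state_dict(average(soup))
    best_acc = evaluate(model)

    for candidate in ranked[1:]:
        # Tentatively fuse the candidate's weights into the soup.
        model.load_state_dict(average(soup + [candidate]))
        acc = evaluate(model)
        if acc >= best_acc:             # keep it only if validation accuracy holds
            soup.append(candidate)
            best_acc = acc

    return average(soup), best_acc
```

The acceptance test (keep an ingredient only if held-out accuracy does not decrease) is what distinguishes a greedy soup from a uniform average over all checkpoints.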

