ChatGPT 4.0: challenges in multimodal text interpretation
DOI: https://doi.org/10.46230/lef.v16i2.13157
Keywords: ChatGPT, cartoon interpretation, human-AI comparison
Abstract
This study investigates the ability of the AI model ChatGPT 4.0 to interpret cartoons, using human interpretations as a benchmark. Cartoons were chosen because they integrate verbal and non-verbal elements, which allows a detailed assessment of how ChatGPT handles contextual nuance, humor, and satire. The results show that although ChatGPT identifies the main visual elements, it faces significant difficulties in grasping broader contexts and in interpreting complex humor and subtext. ChatGPT's interpretations tend to be superficial and less detailed than human interpretations, particularly regarding artistic style, visual techniques, and cultural context. ChatGPT also struggles to capture the depth and critical intent of satirical elements, so its interpretations do not fully reflect the implicit messages in the cartoons. These findings contribute to understanding the current capabilities and limitations of AI models in interpreting complex discourse, offering insights for the advancement of cognitive linguistics and natural language processing technologies.
License
Copyright (c) 2024 Paulo Henrique Duque
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish in Linguagem em Foco Scientific Journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication. The articles are simultaneously licensed under the Creative Commons Attribution License, which allows the work to be shared with acknowledgement of its authorship and initial publication in this journal.
- The views expressed in signed articles are the absolute and exclusive responsibility of their authors. Therefore, we request a Statement of Copyright, which must be submitted with the manuscript as a Supplementary Document.
- Authors are authorized to make the version of the text published in Linguagem em Foco Scientific Journal available in institutional repositories or other academic distribution platforms (e.g., ResearchGate, Academia.edu).