Overview of FLARES at IberLEF 2024: Fine-grained Language-based Reliability Detection in Spanish News

Sepúlveda-Torres, Robiert; Bonet-Jover, Alba; Diab, Isam; Guillén-Pacho, Ibai; Cabrera-de Castro, Isabel; Badenes-Olmedo, Carlos; Saquete, Estela; Martín-Valdivia, M. Teresa; Martínez-Barco, Patricio; Ureña-López, L. Alfonso

Journal:

Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2024

Issue: 73

Pages: 369-379

Type: Article

Abstract

This paper presents FLARES, a shared task organised within IberLEF 2024, the evaluation campaign for Natural Language Processing systems in Spanish and other Iberian languages. FLARES aims to detect patterns of reliability in the language of news, which will support the development of effective techniques for detecting misleading information. To this end, the task builds on the 5W1H journalistic technique for identifying the relevant content of a news item, together with an annotation guideline designed to detect linguistic reliability. Two subtasks are proposed: the first focuses on identifying the 5W1H elements and the second on detecting reliability. A total of 7 participants registered for the shared task, of which 3 participated in the first subtask and 4 in the second. The teams proposed various approaches, mainly based on fine-tuning encoder models and instruction tuning of decoder models.

Data source: Dialnet