Automatic counter-narrative generation for hate speech in Spanish

  1. Montejo Ráez, Arturo
  2. Martín Valdivia, María Teresa
  3. Vallecillo Rodríguez, M. Estrella
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2023

Issue: 71

Pages: 227-245

Type: Article

Abstract

This paper analyzes the use of language models to automatically generate counter-narratives to hate speech in Spanish. Although some studies exist for English and other languages, no previous work has addressed this task specifically for Spanish. The paper shows that GPT-3 outperforms other models at generating non-offensive and informative counter-narratives, which in some cases include convincing arguments. We used different few-shot learning algorithms, applying several prompting strategies and analyzing the results for each of them. In addition, a new corpus called CONAN-SP, consisting of 238 pairs of hate speech and counter-narratives in Spanish, has been made available to the research community to facilitate further research in this area. These results highlight the potential of language models to combat hate speech in Spanish through the generation of counter-narratives.
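
As an illustration of the few-shot prompting mentioned in the abstract, the following is a minimal sketch of how a prompt could be assembled from hate speech and counter-narrative pairs. It is not the authors' implementation: the function name `build_few_shot_prompt`, the instruction text, the field labels, and the placeholder examples are all assumptions made for this sketch, and no text from CONAN-SP is reproduced here.

```python
from typing import List, Tuple

# Hypothetical instruction text; the prompts used in the paper are not given in this record.
DEFAULT_INSTRUCTION = (
    "Write a polite, informative counter-narrative in Spanish for the message below."
)


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    target: str,
    instruction: str = DEFAULT_INSTRUCTION,
) -> str:
    """Assemble a few-shot prompt from (hate speech, counter-narrative) pairs.

    The labels "Hate speech:" and "Counter-narrative:" are illustrative choices,
    not a format confirmed by the source.
    """
    blocks = [instruction]
    # Each demonstration pair becomes one block shown to the model.
    for hate_speech, counter_narrative in examples:
        blocks.append(
            f"Hate speech: {hate_speech}\nCounter-narrative: {counter_narrative}"
        )
    # The final block leaves the counter-narrative empty for the model to complete.
    blocks.append(f"Hate speech: {target}\nCounter-narrative:")
    return "\n\n".join(blocks)


# Placeholder pairs stand in for corpus entries.
examples = [
    ("<hate speech message 1>", "<counter-narrative 1>"),
    ("<hate speech message 2>", "<counter-narrative 2>"),
]
prompt = build_few_shot_prompt(examples, "<new hate speech message>")
print(prompt)
```

The resulting string would be sent to a language model as the completion context. Varying the number of demonstration pairs or the instruction wording is one simple way to compare prompting strategies.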
