Automatic counter-narrative generation for hate speech in Spanish

Montejo Ráez, Arturo; Martín Valdivia, María Teresa; Vallecillo Rodríguez, M. Estrella

Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2023

Issue: 71

Pages: 227-245

Type: Article

Abstract

This paper analyzes the use of language models to automatically generate counter-narratives against hate speech in Spanish. Although a few studies exist for English and other languages, no previous work has addressed this task for Spanish. The results show that GPT-3 outperforms other models, generating non-offensive and informative counter-narratives that sometimes present compelling arguments. The experiments use few-shot learning with several prompting strategies, and the results are analyzed for each strategy. In addition, a new corpus called CONAN-SP, consisting of 238 pairs of hate speech and counter-narratives in Spanish, has been made available to the research community to facilitate further research in this area. These findings highlight the potential of language models to combat hate speech in Spanish through counter-narrative generation.

Bibliographic References

Ashida, M. and M. Komachi. 2022. Towards automatic generation of messages countering online hate speech and microaggressions. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 11–23.
Bang, Y., S. Cahyawijaya, N. Lee, W. Dai, D. Su, B. Wilie, H. Lovenia, Z. Ji, T. Yu, W. Chung, Q. V. Do, Y. Xu, and P. Fung. 2023. A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity.
Benesch, S. 2014. Countering dangerous speech: New ideas for genocide prevention. Available at SSRN 3686876.
Brown, T., B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
Cawsey, A. J., R. B. Jones, and J. Pearson. 2000. The evaluation of a personalised health information system for patients with cancer. User Modeling and User-Adapted Interaction, 10:47–72.
Chung, H. W., L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, E. Li, X. Wang, M. Dehghani, S. Brahma, et al. 2022. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
Chung, Y.-L., E. Kuzmenko, S. S. Tekiroglu, and M. Guerini. 2019. Conan–counter narratives through nichesourcing: a multilingual dataset of responses to fight online hate speech. arXiv preprint arXiv:1910.03270.
Chung, Y.-L., S. S. Tekiroglu, and M. Guerini. 2021. Towards knowledge-grounded counter narrative generation for hate speech. arXiv preprint arXiv:2106.11783.
Djuric, N., J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, and N. Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th international conference on world wide web, pages 29–30.
Doddington, G. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the second international conference on Human Language Technology Research, pages 138–145.
Fandiño, A. G., J. A. Estape, M. Pamies, J. L. Palao, J. S. Ocampo, C. P. Carrino, C. A. Oller, C. R. Penagos, A. G. Agirre, and M. Villegas. 2022. Maria: Spanish language models. Procesamiento del Lenguaje Natural, 68.
Fanton, M., H. Bonaldi, S. S. Tekiroglu, and M. Guerini. 2021. Human-in-the-loop for data collection: a multi-target counter narrative dataset to fight online hate speech. arXiv preprint arXiv:2107.08720.
Fortuna, P., M. Domínguez, L. Wanner, and Z. Talat. 2022. Directions for nlp practices applied to online hate speech detection. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11794–11805.
Fortuna, P. and S. Nunes. 2018. A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4):1–30.
Frenda, S., B. Ghanem, M. Montes-y Gomez, and P. Rosso. 2019. Online hate speech against women: Automatic identification of misogyny and sexism on twitter. Journal of Intelligent & Fuzzy Systems, 36(5):4743–4752.
Gu, Y., X. Han, Z. Liu, and M. Huang. 2022. PPT: Pre-trained prompt tuning for few-shot learning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8410–8423, Dublin, Ireland, May. Association for Computational Linguistics.
Hangartner, D., G. Gennaro, S. Alasiri, N. Bahrich, A. Bornhoft, J. Boucher, B. B. Demirci, L. Derksen, A. Hall, M. Jochum, et al. 2021. Empathy-based counterspeech can reduce racist hate speech in a social media field experiment. Proceedings of the National Academy of Sciences, 118(50):e2116310118.
Lin, C.-Y. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81.
Mathew, B., R. Dutt, P. Goyal, and A. Mukherjee. 2019. Spread of hate speech in online social media. In Proceedings of the 10th ACM conference on web science, pages 173–182.
Mathew, B., N. Kumar, P. Goyal, A. Mukherjee, et al. 2018. Analyzing the hate and counter speech accounts on twitter. arXiv preprint arXiv:1812.02712.
Mathew, B., P. Saha, H. Tharad, S. Rajgaria, P. Singhania, S. K. Maity, P. Goyal, and A. Mukherjee. 2019. Thou shalt not hate: Countering online hate speech. In Proceedings of the international AAAI conference on web and social media, volume 13, pages 369–380.
OpenAI. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318.
Plaza-Del-Arco, F.-M., M. D. Molina-Gonzalez, L. A. Ureña-Lopez, and M. T. Martín-Valdivia. 2020. Detecting misogyny and xenophobia in spanish tweets using language technologies. ACM Transactions on Internet Technology (TOIT), 20(2):1–19.
Plaza-del-Arco, F. M., M. D. Molina-Gonzalez, L. A. Ureña-Lopez, and M. T. Martín-Valdivia. 2021. Comparing pretrained language models for spanish hate speech detection. Expert Systems with Applications, 166:114120.
Qian, J., A. Bethke, Y. Liu, E. Belding, and W. Y. Wang. 2019. A benchmark dataset for learning to intervene in online hate speech. arXiv preprint arXiv:1909.04251.
Radford, A., J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
Richards, R. D. and C. Calvert. 2000. Counterspeech 2000: A new look at the old remedy for bad speech. BYU L. Rev., page 553.
Scao, T. L., A. Fan, C. Akiki, E. Pavlick, S. Ilic, D. Hesslow, R. Castagne, A. S. Luccioni, F. Yvon, M. Galle, et al. 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100.
Tekiroglu, S. S., Y.-L. Chung, and M. Guerini. 2020. Generating counter narratives against online hate speech: Data and strategies. arXiv preprint arXiv:2004.04216.
Touvron, H., T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Roziere, N. Goyal, E. Hambro, F. Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

Data source: Dialnet