Overview of MeOffendEs at IberLEF 2021: Offensive Language Detection in Spanish Variants

  1. Jarquín-Vásquez, Horacio
  2. Villaseñor Pineda, Luis
  3. Plaza-del-Arco, Flor Miriam
  4. Casavantes, Marco
  5. Escalante, Hugo Jair
  6. Martín Valdivia, María Teresa
  7. Montejo Ráez, Arturo
  8. Montes y Gómez, Manuel
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2021

Issue: 67

Pages: 183-194

Type: Article

Abstract

This paper presents the MeOffendEs 2021 shared task, organized at IberLEF 2021 and co-located with the 37th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2021). The main goal of MeOffendEs is to promote research on offensive language detection in variants of Spanish. The shared task comprises four subtasks: the first two address the identification of offensive language categories in generic Spanish texts collected from different social media platforms, while Subtasks 3 and 4 address the identification of offensive language in the Mexican variant of Spanish. Two datasets annotated for offensive language were released to the Natural Language Processing community for the competition. MeOffendEs attracted a large number of participants: 69 registered for the task, 12 submitted official results on the evaluation data, and 10 submitted papers describing their systems. The datasets and official results are available on the shared task website: https://competitions.codalab.org/competitions/28679.
