Técnicas de clasificación de opiniones aplicadas a un corpus en español

  1. Martínez Cámara, Eugenio
  2. Martín Valdivia, María Teresa
  3. Perea Ortega, José Manuel
  4. Ureña López, Luis Alfonso

Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Publication year: 2011

Issue: 47

Pages: 163-170

Type: Article

Abstract

Sentiment analysis is a new and challenging task related to Text Mining and Natural Language Processing (NLP). Although some work already exists, most of it focuses only on English texts. However, the number of web pages, blogs and opinions on the Internet grows every day in many languages, not only in English. Other languages, such as Spanish, are increasingly present, so we have carried out an experimental study on a corpus of Spanish film reviews. Our main goal is to evaluate the results obtained by several classifiers trained to determine opinion polarity. We have tested two classification algorithms (SVM and Naïve Bayes), several weighting schemes and different linguistic preprocessing steps (a stopper for stopword removal and a stemmer). The experiments show that SVM performs better than Naïve Bayes. In addition, applying the stopper and the stemmer yields a slight improvement.
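The abstract does not detail the pipeline, so the following Python code is only a rough, hypothetical sketch of the kind of setup it describes, not the authors' implementation. It shows a toy stopword filter (a "stopper"), a crude suffix-stripping function standing in for a real Spanish stemmer, three common term-weighting schemes (binary, TF and TF-IDF), and a multinomial Naïve Bayes polarity classifier. The stopword list, the suffixes, all function and class names, and the training sentences are illustrative assumptions rather than the paper's corpus, tools or configuration. The classifier uses raw term counts, the weighting function is shown separately, and SVM is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical toy stopword list; a real "stopper" would use a full Spanish list.
STOPWORDS = {"el", "la", "los", "las", "un", "una", "de", "y", "es", "muy", "que", "en"}
# Hypothetical suffixes for a crude stand-in stemmer (longest first), not Snowball.
SUFFIXES = ("mente", "es", "as", "os", "a", "o", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, stop=True, stem=True):
    """Lowercase and whitespace-tokenize; optionally apply the stopper and stemmer."""
    tokens = text.lower().split()
    if stop:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return tokens


def weight_vectors(docs, scheme="tfidf"):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "binary":
            vectors.append({t: 1.0 for t in tf})
        elif scheme == "tf":
            vectors.append({t: float(c) for t, c in tf.items()})
        elif scheme == "tfidf":
            # idf = log(N / df); a term present in every document gets weight 0.
            vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
        else:
            raise ValueError(f"unknown weighting scheme: {scheme}")
    return vectors


class NaiveBayes:
    """Multinomial Naive Bayes over raw term counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}
        return self

    def predict(self, doc):
        def score(c):
            # Terms never seen in training are skipped.
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in doc if t in self.vocab
            )

        return max(self.classes, key=score)


# Invented toy reviews; not taken from the corpus used in the paper.
train_texts = [
    "una película muy buena y divertida",
    "excelente guion y buenos actores",
    "la película es aburrida y mala",
    "un guion malo y actores aburridos",
]
train_labels = ["pos", "pos", "neg", "neg"]

docs = [preprocess(t) for t in train_texts]
model = NaiveBayes().fit(docs, train_labels)
print(model.predict(preprocess("una comedia buena")))  # pos
```

Passing `stop=False` or `stem=False` to `preprocess` disables either step, which is one simple way to compare preprocessing configurations in the spirit of the experiments described. An SVM over the `weight_vectors` output would need a separate solver and is not sketched here.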
