Automatic lexical collocate extraction for corpus-based ontology building and refinement: A FunGramKB case study of the THEFT conceptual scenario
- Nicolás José Fernández-Martínez
- Ángel Miguel Felices-Lago
ISSN: 0213-2028
Year of publication: 2021
Volume: 34
Issue: 2
Pages: 435-463
Type: Article
More publications in: Revista española de lingüística aplicada
Abstract
Traditional corpus-based methods rely on manual inspection and extraction of lexical collocates in the study of selection preferences, which is a costly, labor-intensive, and time-consuming task. Automatic methods for lexical collocate extraction are therefore necessary to handle this task and the sheer size of the corpora available. Leveraging the Sketch Engine platform and its built-in corpora, we propose a working prototype of a Lexical Collocate Extractor (LeCoExt), a command-line tool that mines lexical collocates of all types of verbs according to their syntactic constituents and Collocate Frequency Score (CFS). This may be the first tool that performs comprehensive corpus-based studies of the selection preferences of individual verbs or groups of verbs by exploiting the capabilities offered by Sketch Engine. The tool may facilitate the extraction of rich lexico-semantic knowledge from diverse corpora in a few seconds. We test its performance for ontology building and refinement, taking as a starting point the detailed analysis of stealing verbs carried out by Fernández-Martínez & Faber (2020). We show how the proposed tool is used to extract conceptual-cognitive knowledge from the THEFT scenario and implement it in the FunGramKB Core Ontology through the creation and modification of theft-related conceptual units.
Funding information
This article is based on the R&D Project within the framework of the "Programa Operativo FEDER Andalucía 2014-2020 (code B-HUM177-UGR18)" and funded by Junta de Andalucía, Spain.
Funders
- European Regional Development Fund, European Union
- B-HUM177-UGR18
- Junta de Andalucía, Spain
Bibliographic References
- Asaro, C., Biasiotti, M. A., Guidotti, P., Papini, M., Sagri, M. T., & Tiscornia, D. (2003) A domain ontology: Italian crime ontology. In Proceedings of the ICAIL 2003 Workshop on Legal Ontologies & Web based legal information management, 1–7.
- Berman, R. (1982) On the Nature of ‘Oblique’ Objects in Bitransitive Constructions. Lingua, 56(2), 101–125. https://doi.org/10.1016/0024-3841(82)90026-2
- Boas, H. (2013) Frame Semantics and Translation. In A. Rojo & I. Ibarretxe-Antuñano (Eds.), Cognitive Linguistics and Translation (pp. 125–158). Berlin/New York: Mouton de Gruyter. https://doi.org/10.1515/9783110302943.125
- British National Corpus, version 3 (BNC XML Edition) (2007) Distributed by Oxford University Computing Services on behalf of the BNC Consortium. Available at www.natcorp.ox.ac.uk/ [last accessed 15 May 2019]
- Bušta, J., & Herman, O. (2017) JSI Newsfeed Corpus. In The 9th International Corpus Linguistics Conference, University of Birmingham, 25–28 July 2017.
- Dux, R. (2018) Frames, Verbs, and Constructions: German Constructions with Verbs of Stealing. In A. Ziem & H. Boas (Eds.), Approaching German Syntax from a Constructionist Perspective (pp. 367–405). Berlin/New York: Mouton de Gruyter. https://doi.org/10.1515/9783110457155-010
- Faber, P., & Mairal-Usón, R. (1999) Constructing a Lexicon of English Verbs. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110800623
- Faber, P., & Mairal-Usón, R. (2018) A Conceptually-Oriented Approach to Semantic Composition in RRG. In R. D. Van Valin (Ed.), The Cambridge Handbook of Role and Reference Grammar. Cambridge: Cambridge University Press.
- Felices-Lago, Á. (2014) The emergence of axiology as a key parameter in modern linguistics. In G. Thompson & L. Alba-Juez (Eds.), Evaluation in Context (pp. 27–46). John Benjamins. https://doi.org/10.1075/pbns.242.02fel
- Felices-Lago, Á. (2015) Foundational considerations for the development of the Globalcrimeterm subontology: A research project based on FunGramKB. Onomázein, 31(1), 127–144. https://doi.org/10.7764/onomazein.31.9
- Felices-Lago, Á. (2016) The Process of Constructing Ontological Meaning Based on Criminal Law Verbs. Círculo de Lingüística Aplicada a la Comunicación, 65, 109–148. https://doi.org/10.5209/rev_CLAC.2016.v65.51983
- Fernández-Martínez, N. J., & Faber, P. (2020) Who stole what from whom? A corpus-based, cross-linguistic study of English and Spanish verbs of stealing. Languages in Contrast, 20(1), 107–140. https://doi.org/10.1075/lic.19002.fer
- Fillmore, C., & Baker, C. (2010) A Frames Approach to Semantic Analysis. In B. Heine & H. Narrog (Eds.), The Oxford Handbook of Linguistic Analysis (pp. 313–340). New York: Oxford University Press.
- Gangemi, A., Sagri, M., & Tiscornia, D. (2005) A Constructive Framework for Legal Ontologies. In V. R. Benjamins (Eds.), Law and the Semantic Web (pp. 97–124). Berlin: Springer. https://doi.org/10.1007/978-3-540-32253-5_7
- Goldberg, A. (2010) Verbs, Constructions and Semantic Frames. In M. Rappaport-Hovav, E. Doron & I. Sichel (Eds.), Syntax, Lexical Semantics and Event Structure (pp. 39–58). Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199544325.003.0003
- Jakubíček, M., Kilgarriff, A., McCarthy, D., & Rychlý, P. (2010) Fast Syntactic Searching in Very Large Corpora for Many Languages. PACLIC, 741–747.
- Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., & Suchomel, V. (2013) The TenTen Corpus Family. Seventh International Corpus Linguistics Conference CL, 125–127.
- Jiménez-Briones, R., & Luzondo-Oyón, A. (2011) Building Ontological Meaning in a Lexico-conceptual Knowledge Base. Onomázein, 23, 11–40.
- Kilgarriff, A., Kovář, V., Krek, S., Srdanović, I., & Tiberius, C. (2010) A Quantitative Evaluation of Word Sketches. Proceedings of the 14th EURALEX International Congress, 372–379.
- Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014) The Sketch Engine: Ten Years on. Lexicography, 1, 7–36. Available at www.sketchengine.co.uk [last accessed 28 December 2018]
- Leary, R., Vandenberghe, W., & Zeleznikow, J. (2004) Towards a financial fraud ontology: a legal modelling approach. ICAIL 2003 Workshop on Legal Ontologies & Web based legal information management, 1–33.
- Lenci, A. (2000) SIMPLE: A general framework for the development of multilingual lexicon. International Journal of Lexicography, 13(4), 249–263. https://doi.org/10.1093/ijl/13.4.249
- Masolo, C. (2003) WonderWeb Deliverable D18: Ontology Library. Laboratory for Applied Ontology, ISTC-CNR.
- McCarthy, D., Kilgarriff, A., Jakubíček, M., & Reddy, S. (2015) Semantic Word Sketches. Corpus Linguistics (CL2015), 1–5.
- Miller, G., & Fellbaum, C. (2007) WordNet Then and Now. Language Resources and Evaluation, 41(2), 209–214. Available at https://wordnet.princeton.edu/ [last accessed 17 May 2019] https://doi.org/10.1007/s10579-007-9044-6
- Niles, I., & Pease, A. (2001) Towards a standard Upper Ontology. In Proceedings of the Second International Conference on Formal Ontology in Information Systems. Ogunquit. Available at www.adampease.org/professional/FOIS.pdf [last accessed 10 January 2019] https://doi.org/10.1145/505168.505170
- Pedersen, B. S., & Keson, B. (1999) SIMPLE–Semantic information for multifunctional plurilingual lexica: some examples of Danish concrete nouns. Proceedings of the SIGLEX-99 Workshop. Maryland. Available at clair.eecs.umich.edu/aan/paper.php?paper_id=W99-0507#pdf [last accessed 15 January 2019]
- Periñán-Pascual, C. (2012) En defensa del procesamiento del lenguaje natural fundamentado en la lingüística teórica. Onomázein, 26, 13–48.
- Periñán-Pascual, C. (2013) A knowledge-engineering approach to the cognitive categorization of lexical meaning. VIAL – Vigo International Journal of Applied Linguistics, 10, 85–104.
- Periñán-Pascual, C., & Arcas-Túnez, F. (2004) Meaning postulates in a lexico-conceptual knowledge base. 15th International Workshop on Databases and Expert Systems Applications, IEEE, Los Alamitos (California), 38–42. https://doi.org/10.1109/DEXA.2004.1333446
- Periñán-Pascual, C., & Arcas-Túnez, F. (2005) Microconceptual-Knowledge Spreading in FunGramKB. Proceedings of the 9th IASTED International Conference on Artificial Intelligence and Soft Computing. Anaheim-Calgary-Zurich: ACTA Press, 239–244.
- Periñán-Pascual, C., & Arcas-Túnez, F. (2010a) The architecture of FunGramKB. Proceedings of the 7th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), 2667–2674.
- Periñán-Pascual, C., & Arcas-Túnez, F. (2010b) Ontological commitments in FunGramKB. Procesamiento del Lenguaje Natural, 44, 27–34.
- Periñán-Pascual, C., & Mairal-Usón, R. (2009) Bringing Role and Reference Grammar to Natural Language Understanding. Procesamiento del Lenguaje Natural, 43, 265–273.
- Periñán-Pascual, C., & Mairal-Usón, R. (2010) La gramática de COREL: un lenguaje de representación conceptual. Onomázein, 21, 11–45.
- Periñán-Pascual, C., & Mairal-Usón, R. (2011) The COHERENT Methodology in FunGramKB. Onomázein, 24, 13–33.
- Ruiz-de-Mendoza Ibáñez, F., & Mairal-Usón, R. (2009) Constructing meaning: a brief overview of the Lexical Constructional Model. In M. Brdar (Ed.), Converging and diverging tendencies in Cognitive Linguistics. Amsterdam/Philadelphia: John Benjamins.
- Ruppenhofer, J., Boas, H., & Baker, C. (2017) FrameNet. In P. Fuertes-Olivera (Ed.), The Routledge Handbook of Lexicography (pp. 383–398). New York: Routledge. https://doi.org/10.4324/9781315104942-25
- Rychlý, P. (2008) A Lexicographer-Friendly Association Score. Proceedings of the 2nd Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN, 2, 6–9.
- Sartor, G., Casanovas, P., Biasiotti, M. A., & Fernández-Barrera, M. (Eds.) (2011) Approaches to legal ontologies, theories, domains, methodologies. Berlin: Springer. https://doi.org/10.1007/978-94-007-0120-5
- Thorgren, S. (2005) Transaction Verbs: A Lexical and Semantic Analysis of Rob and Steal. Reports from the Department of Language and Culture, 3, 1–44.
- Valente, A. (2005) Types and roles of legal ontologies. In R. Benjamins, P. Casanovas, J. Breuker & A. Gangemi (Eds.), Law and the semantic web (pp. 65–76). Berlin: Springer. https://doi.org/10.1007/978-3-540-32253-5_5
- Van Valin, R. (2005) Exploring the Syntax-Semantics Interface. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511610578
- Velardi, P., Pazienza, M., & Fasolo, M. (1991) How to Encode Semantic Knowledge: A Method for Meaning Representation and Computer-aided Acquisition. Computational Linguistics, 17(2), 153–170.