Generación de un tesauro de similitud multilingüe a partir de un corpus comparable a CLIR

  1. Martín Valdivia, María Teresa
  2. García Vega, Manuel
  3. Martínez Santiago, Fernando
  4. Ureña López, Luis Alfonso
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2002

Issue: 28

Pages: 55-62

Type: Article

Abstract

This work describes a new approach to automatically generating a similarity thesaurus from a comparable corpus, with the aim of applying it to Cross-Language Information Retrieval (CLIR). Although linguistic resources are increasingly available, some of them remain difficult to access, especially in multilingual settings. Moreover, the complexity of the CLIR task itself requires the combined use of several resources to increase the efficiency of the system. Comparable corpora are one such multilingual resource, and they are especially interesting because of their availability and because they can be generated automatically. However, to be useful, these corpora should be aligned at least at the document level. Clustering techniques have been used to carry out this alignment. Once the documents are aligned, the similarity thesaurus is generated from them. The experiments show that a multilingual similarity thesaurus is a good option when other, more suitable resources are not available.