Nueva propuesta evolutiva para el agrupamiento de documentos en sistemas de recuperación de información

Castillo Sequera, José Luis

Supervised by:

León Atilano González Sotos (thesis supervisor)
José Raúl Fernández del Castillo Díez (thesis supervisor)

University of defense: Universidad de Alcalá

Date of defense: 17 December 2010

Examination committee:

Pedro Burillo López (chair)
María José Domínguez Alda (secretary)
Miguel Ángel Patricio Guisado (member)
José Javier Martínez Herráiz (member)
Ramón Fuentes González (member)

Subject areas:

Computer Science

Type: Doctoral thesis

Teseo: 302952 DIALNET e_Buah editor

Abstract

The explicit knowledge of organizations is kept in highly controlled document collections that are available to their users. A large document collection requires tools that organize and reveal its content and let users explore it easily, so that they can get to know its nature and discover relations, patterns, trends and other features in order to “understand information”. The need for expertise in Information Retrieval Systems has pushed researchers to analyze intelligent systems that seek to incorporate and use such knowledge in order to optimize the system. This thesis presents an evolutionary system (EVS) and the results obtained by building a system of this nature. It contributes to the field of Information Retrieval (IR) by proposing a new system that uses evolutionary techniques to implement unsupervised learning: it groups the documents of an Information Retrieval System (IRS) when neither the groups nor their number are known a priori by the system. The criteria used to create document clusters are based on the similarity and distance between documents, so that related documents form groups or clusters. This allows the documents of an IRS to be clustered in an acceptable manner and makes the approach a valid alternative to traditional clustering methods, with which its experimental results are compared. The most relevant lexemes of each document, obtained by applying IR techniques, are used to enrich the information associated with the documents in the collection and serve as metadata values for the evolutionary algorithm. Thus, the system works through a document processing method that selects the lexemes of documents using information retrieval criteria. The results demonstrate the feasibility of building a large-scale application of this type that can be integrated into a knowledge management system that needs to handle large controlled document collections.
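
The general idea described in the abstract, grouping documents by similarity with an evolutionary search when the number of groups is not known in advance, can be illustrated with a minimal Python sketch. This is not the algorithm of the thesis, whose representation, operators and fitness function are not given in this record. The sketch assumes bag-of-lexemes vectors, cosine similarity, a mutation-only evolutionary loop with elitism, and a fitness equal to the mean within-cluster similarity minus the mean cross-cluster similarity. All function names (`cosine`, `similarity_matrix`, `fitness`, `evolve`) and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-lexemes vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Pairwise cosine similarities; each document is a space-separated lexeme string."""
    vecs = [Counter(d.split()) for d in docs]
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def fitness(labels, sim):
    """Assumed fitness: mean same-cluster similarity minus mean cross-cluster similarity."""
    within, across = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else across).append(sim[i][j])
    return _mean(within) - _mean(across)


def evolve(docs, generations=200, pop_size=20, seed=0):
    """Mutation-only evolutionary search over cluster labelings (illustrative only)."""
    rng = random.Random(seed)
    sim = similarity_matrix(docs)
    n = len(docs)
    # An individual assigns one label per document. Labels range over 0..n-1,
    # so the number of clusters is not fixed in advance.
    pop = [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, sim), reverse=True)
        survivors = pop[: pop_size // 2]  # elitism: the better half is kept unchanged
        children = []
        for parent in survivors:
            child = parent[:]
            child[rng.randrange(n)] = rng.randrange(n)  # relabel one random document
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda ind: fitness(ind, sim))
    return best, fitness(best, sim)


if __name__ == "__main__":
    sample = [
        "genetic algorithm evolution",
        "genetic evolution mutation",
        "soccer match goal",
        "soccer goal referee",
    ]
    labels, score = evolve(sample)
    print(labels, round(score, 3))
```

In this sketch, documents that receive the same label form one cluster, so the search decides both the grouping and the number of groups. A realistic system would add crossover, a lexeme selection step, and a fitness that penalizes degenerate groupings.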