Quality Data (Smart Data) Methodologies for Deep Learning: The Class Noise Problem and Applications to Corals and COVID-19

Gómez Ríos, Anabel

Supervised by:

Francisco Herrera Triguero (co-supervisor)
Julián Luengo Martín (co-supervisor)

University where the thesis was defended: Universidad de Granada

Defense date: 19 July 2022

Examination committee:

Salvador García Lopez (Chair)
Rocío C. Romero Zaliz (Secretary)
María José del Jesús Díaz (Member)
Amelia Zafra Gómez (Member)
David Camacho Fernández (Member)

Type: Thesis

Teseo: 730739

Abstract

Currently, the processes carried out in governments, companies and research centres all generate data, which are then processed to extract valuable information. The process of extracting relevant information from data is known as Knowledge Discovery in Databases (KDD). It comprises two important steps: data cleaning and preprocessing, and data mining. The first step cleans the data of inconsistencies, missing values, noise (errors in the data) and similar problems. The second step takes the clean, or smart, data produced by the first and applies Machine Learning algorithms to extract patterns and information from it. Deep Learning, a branch of Machine Learning, is now widely used because of its strong performance, particularly on image data, where it often outperforms other Machine Learning algorithms. However, Deep Learning is known to require large amounts of data to perform well, which hinders its application in scenarios where only a small volume of data is available. In this thesis, we propose the use of different preprocessing and optimization techniques that make it possible to apply Deep Learning, and Convolutional Neural Networks in particular, when the available image data sets are small (fewer than 1500 images) because obtaining more data is costly or difficult. In this way, we transform small data sets into smart data that can be used to train Convolutional Neural Networks.
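The abstract describes the cleaning step only in general terms, so the following is purely an illustration of what removing class noise (mislabelled instances) can look like. It is a minimal sketch of one classical filter, Wilson's Edited Nearest Neighbours (ENN), and is not presented as the method used in this thesis. The function name `enn_filter`, the Euclidean distance on plain feature tuples and the default `k=3` are assumptions made for the example.

```python
from collections import Counter
from math import dist


def enn_filter(points, labels, k=3):
    """Hypothetical helper: indices of instances kept by Wilson's ENN filter.

    An instance is treated as likely class noise, and dropped, when its label
    disagrees with the majority label of its k nearest neighbours.
    """
    kept = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Leave-one-out: distances from instance i to every other instance,
        # sorted by distance only so that labels are never compared.
        neighbours = sorted(
            ((dist(p, q), labels[j]) for j, q in enumerate(points) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        # Majority label among the k nearest neighbours.
        majority, _ = Counter(lbl for _, lbl in neighbours).most_common(1)[0]
        if majority == y:
            kept.append(i)
    return kept
```

For example, with two well-separated groups of 2D points and one point at `(0.1, 0.1)` wrongly labelled as the far group's class, only that point is removed. A real image pipeline would apply such a filter to learned feature vectors rather than raw pixels; that choice is also an assumption of this sketch.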