Metodologías de datos de calidad (Smart Data) para Deep Learning: el problema del ruido de clase y aplicaciones en corales y COVID-19

Gómez Ríos, Anabel

Metodologías de datos de calidad (Smart Data) para Deep Learning: el problema del ruido de clase y aplicaciones en corales y COVID-19

Gómez Ríos, Anabel

Supervised by:

Francisco Herrera Triguero Co-director
Julián Luengo Martín Co-director

Defence university: Universidad de Granada

Fecha de defensa: 19 July 2022

Committee:

Salvador García Lopez Chair
Rocío C. Romero Zaliz Secretary
María José del Jesús Díaz Committee member
Amelia Zafra Gómez Committee member
David Camacho Fernández Committee member

Type: Thesis

Teseo: 730739 DIALNET DIGIBUG editor

Abstract

Currently, all the processes that are being executed in governments, companies and research centres are generating data that will be processed to extract valuable information. The process of extracting relevant information in data is known as Knowledge Discovery in Databases. This process contains two important steps, which are data cleaning and preprocessing, and data mining. The first one cleans the data in terms of inconsistencies, possible missing values, noise (errors in the data), etc. The second one uses the clean or smart data generated in the first step and applies Machine Learning algorithms to extract patterns and information from the data. Deep Learning, a branch of Machine Learning, is now being widely used due to its good performance, especially when the data is composed of images, even outperforming other Machine Learning algorithms. However, Deep Learning is known to need great quantities of data to perform well, which is a drawback for the application of Deep Learning algorithms in scenarios that lack a big volume of data. In this thesis, we propose the use of different preprocessing and optimization techniques to be able to use Deep Learning, and in particular, Convolutional Neural Networks, when the image data sets that we have available are small (below 1500 images), because it is costly or hard to obtain more data. That way, we transform the small data sets into smart data that can be used to train Convolutional Neural Networks.