Quality data (Smart Data) methodologies for Deep Learning: the class noise problem and applications to corals and COVID-19
- Gómez Ríos, Anabel
- Francisco Herrera Triguero Co-director
- Julián Luengo Martín Co-director
Defence university: Universidad de Granada
Defence date: 19 July 2022
- Salvador García López Chair
- Rocío C. Romero Zaliz Secretary
- María José del Jesús Díaz Committee member
- Amelia Zafra Gómez Committee member
- David Camacho Fernández Committee member
Type: Thesis
Abstract
Governments, companies and research centres now generate data continuously through the processes they run, and that data is processed to extract valuable information. The process of extracting relevant information from data is known as Knowledge Discovery in Databases. It contains two important steps: data cleaning and preprocessing, and data mining. The first step cleans the data of inconsistencies, missing values, noise (errors in the data), and similar problems. The second step takes the clean, or smart, data produced by the first and applies Machine Learning algorithms to extract patterns and information from it. Deep Learning, a branch of Machine Learning, is now widely used because of its good performance, especially on image data, where it can even outperform other Machine Learning algorithms. However, Deep Learning is known to need large quantities of data to perform well, which is a drawback for applying it in scenarios that lack a large volume of data. In this thesis, we propose using different preprocessing and optimization techniques to make Deep Learning, and in particular Convolutional Neural Networks, usable when the available image data sets are small (fewer than 1500 images) because obtaining more data is costly or difficult. In this way, we transform the small data sets into smart data that can be used to train Convolutional Neural Networks.