Big Data Learning in Practice

This page provides information about the talk "Big Data Learning in Practice", given at Benelearn 2016:


Extracting valuable knowledge from big datasets by means of machine learning techniques may result in more accurate models than ever before. However, most standard methods cannot cope with the resulting space and time requirements. Fortunately, recent advances in distributed technologies enable machine learning techniques to discover unknown patterns or hidden relations in voluminous data much faster. Nevertheless, the issues posed by complex real-world data go beyond computational complexity: big data mining techniques face multiple challenges with respect to scalability, dimensionality, and even the lack of particular kinds of data (e.g. few annotated samples or class imbalance).
In the first part of this talk, I will give a brief introduction to the big data problem, including MapReduce as the most representative programming paradigm, as well as a quick overview of recent technologies (Hadoop ecosystem, Spark). I will then go through some machine learning libraries for big data (Mahout, MLlib, FlinkML), stating their main advantages.
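For readers unfamiliar with the MapReduce paradigm mentioned above, the following is a minimal single-machine sketch of its three phases (map, shuffle, reduce), using word counting as the usual toy problem. It is not taken from the talk and does not use Hadoop or Spark; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are illustrative assumptions. In a real framework each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially (a cluster would distribute them)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["big data learning", "learning from big data", "data"]
    print(word_count(docs))
```

The point of the paradigm is that `map_phase` and `reduce_phase` only ever see one record or one key at a time, which is what allows a framework to spread the work over a cluster.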
The second part of this talk will focus on a demonstration with Apache Spark's MLlib and some of the models I am working with to tackle imbalanced big data classification problems.
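To make the class-imbalance problem concrete, here is a minimal sketch of random oversampling, a generic baseline that duplicates minority-class samples until all classes are equally frequent. This is not one of the models presented in the talk and it does not use MLlib; the function name `random_oversample` and its `seed` parameter are illustrative assumptions, and the code runs on a single machine in plain Python.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())

    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        # All original samples that belong to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement to fill the gap to the majority size.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]  # four majority samples, one minority sample
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
```

On big data, the difficulty is that such rebalancing must itself be distributed, and naive duplication can greatly increase the data volume.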

Material Benelearn 2016:

For more information about big data, you can have a look at the tutorial we gave at WCCI 2016: Link

(c) Copyright: Isaac Triguero Velázquez
