I have moved to Newcastle University. This web page is maintained for historical reasons.



Project ideas

Jaume Bacardit - jqb@cs.nott.ac.uk


Big data infrastructure for evolutionary learning methods

We live in the era of big data. All aspects of science and society, from the internet to biology to astronomy, generate very large amounts of data. All this data, however, is useless unless we have computational techniques that can cope with its scale and extract meaningful, value-adding information from it. Within this context, Google's MapReduce methodology/framework for handling big data (popularised through the Hadoop open-source implementation) is currently the bread-and-butter of big data analysis and the platform of choice for most of this community.

The objective of this project is to adapt the evolutionary data mining systems developed in the last few years at the University of Nottingham to Hadoop, so that they can tackle problem sizes that were previously impossible to handle. To achieve this aim the project will make use of the University's new High-Performance Computing cluster.
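
To make the intended decomposition concrete, below is a minimal sketch, in plain Python rather than actual Hadoop code, of how the fitness evaluation of a population of classification rules splits into a map phase (each worker scores the rules on its local shard of the data) and a reduce phase (the partial counts are merged). The rule and dataset formats are illustrative assumptions, not those of the actual Nottingham systems.

    # Minimal sketch of how per-rule fitness evaluation decomposes into
    # MapReduce-style map and reduce phases. Plain Python stands in for
    # Hadoop here; the rule and dataset formats are illustrative assumptions.
    from collections import Counter
    from functools import reduce

    # A "rule" is a hypothetical predicate over a feature vector plus a
    # predicted class; real systems evolve these conditions.
    rules = [
        {"id": 0, "match": lambda x: x[0] > 0.5, "predicts": 1},
        {"id": 1, "match": lambda x: x[1] <= 0.2, "predicts": 0},
    ]

    def map_partition(partition):
        """Map phase: score every rule on one local data shard, emitting
        (rule_id, kind) -> count pairs for matches and correct predictions."""
        counts = Counter()
        for features, label in partition:
            for rule in rules:
                if rule["match"](features):
                    counts[(rule["id"], "matched")] += 1
                    if rule["predicts"] == label:
                        counts[(rule["id"], "correct")] += 1
        return counts

    def reduce_counts(a, b):
        """Reduce phase: merge the partial counts produced by each shard."""
        return a + b

    # Toy dataset split into shards, as Hadoop would split an input file.
    shards = [
        [((0.9, 0.1), 1), ((0.4, 0.1), 0)],
        [((0.7, 0.3), 1), ((0.6, 0.8), 0)],
    ]

    totals = reduce(reduce_counts, map(map_partition, shards))
    for rule in rules:
        matched = totals[(rule["id"], "matched")]
        correct = totals[(rule["id"], "correct")]
        accuracy = correct / matched if matched else 0.0
        print(f"rule {rule['id']}: matched={matched} accuracy={accuracy:.2f}")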

Web service for -omics data analysis using rule-based machine learning

In the last decade biological research has seen the development of many experimental technologies that are able to generate high-throughput quantitative data from biological samples. Their usage has improved our understanding of many different aspects of life. However, the effectiveness of these technologies is constrained by the limitations of the analysis methods applied to this data.

Recently at the University of Nottingham we have developed a new methodology based on rule-based machine learning to mine these kinds of datasets, generating robust prediction models and extracting meaningful information from the mining process. The goal of this project is to develop a web service with a user-friendly web interface so that anybody can access our methodology. This web service will interact in the backend with the University's new High-Performance Computing cluster to perform all the computationally heavy parts of its operation.
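
As a rough illustration of the intended architecture, the sketch below shows a minimal web endpoint that accepts an uploaded dataset and queues the heavy analysis as a batch job. Flask, the qsub command and the run_rule_miner.sh submission script are all assumptions made for the example; the real service would use whatever web framework and scheduler the University cluster provides.

    # Minimal sketch of the web front end, assuming Flask and a batch
    # scheduler reachable via "qsub"; endpoint names, file paths and the
    # run_rule_miner.sh script are illustrative placeholders.
    import subprocess
    import uuid
    from pathlib import Path

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    UPLOAD_DIR = Path("/tmp/omics_jobs")  # placeholder staging area

    @app.route("/analyse", methods=["POST"])
    def analyse():
        """Accept an uploaded -omics dataset and queue the heavy analysis
        on the HPC cluster, returning a job id the user can poll later."""
        job_id = uuid.uuid4().hex
        job_dir = UPLOAD_DIR / job_id
        job_dir.mkdir(parents=True)
        request.files["dataset"].save(str(job_dir / "input.data"))
        # Hand the computationally heavy mining off to the cluster
        # scheduler; "run_rule_miner.sh" is a hypothetical job script.
        result = subprocess.run(
            ["qsub", "-o", str(job_dir), "run_rule_miner.sh", str(job_dir)],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            return jsonify({"error": result.stderr}), 500
        return jsonify({"job_id": job_id, "scheduler_output": result.stdout.strip()})

    if __name__ == "__main__":
        app.run()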

Rule-based knowledge representations for -omics data analysis

In the last decade biological research has seen the development of many experimental technologies that are able to generate high-throughput quantitative data from biological samples. Their usage has improved our understanding of many different aspects of life. However, the effectiveness of these technologies is constrained by the limitations of the analysis methods applied to this data.

Recently at the University of Nottingham we have developed a new methodology based on rule-based machine learning to mine these kinds of datasets, generating robust prediction models and extracting meaningful information from the mining process. The type of information that we can extract from the data mining process greatly depends on how the rule-based knowledge representations are defined. The goal of this project is to create and thoroughly evaluate a broad range of knowledge-representation variants for -omics data analysis, in order to identify their domains of competence in terms of both predictive capacity and knowledge discovery.
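
As a small illustration of how the choice of representation shapes what can be mined, the sketch below contrasts two simplified rule encodings for continuous -omics attributes: interval (hyperrectangle) conditions and one-sided threshold conditions. Both are illustrative simplifications, not the exact representations used by our systems.

    # Minimal sketch contrasting two rule-based knowledge representations
    # for continuous -omics attributes; both are illustrative
    # simplifications, not the exact encodings used by our systems.
    from dataclasses import dataclass

    @dataclass
    class IntervalRule:
        """Hyperrectangle rule: each used attribute gets a [low, high] interval."""
        intervals: dict        # attribute index -> (low, high); absent = don't care
        predicted_class: int

        def matches(self, sample):
            return all(lo <= sample[i] <= hi
                       for i, (lo, hi) in self.intervals.items())

    @dataclass
    class ThresholdRule:
        """One-sided threshold rule: reads more like 'gene G over-expressed',
        which can make the mined model easier to interpret."""
        tests: dict            # attribute index -> (">" or "<", threshold)
        predicted_class: int

        def matches(self, sample):
            return all(sample[i] > t if op == ">" else sample[i] < t
                       for i, (op, t) in self.tests.items())

    # The same sample under both representations: expression levels of 4 genes.
    sample = [2.3, 0.1, 5.7, 1.0]
    r1 = IntervalRule({0: (2.0, 3.0), 2: (5.0, 6.0)}, predicted_class=1)
    r2 = ThresholdRule({0: (">", 2.0), 2: (">", 5.0)}, predicted_class=1)
    print(r1.matches(sample), r2.matches(sample))  # True True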

Large Scale Data Mining Challenge: Contact Map prediction

Bioinformatics is a fascinating research area in which many scientific disciplines, such as mathematics, computer science and engineering, come together to solve biological problems and bring new insight into our understanding of how life works. Within the bioinformatics context, one of the most relevant research topics is proteomics, the study of the role and structure of proteins and, in particular, protein structure prediction (PSP).

The prediction of the (sub)structure of proteins is a very challenging task from a data mining point of view: very large sets of records, high-dimensional feature spaces and severe class imbalance are just some of the difficulties involved. This project focuses on a specific type of PSP, contact map (CM) prediction, which involves all of these challenges. The CM prediction method developed at Nottingham is currently one of the world's top methods for this class of problems, but its training process is extremely costly, consuming tens of thousands of CPU hours.
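
As one concrete example of the kind of data-mining issue involved, the sketch below shows a standard way of handling the class imbalance in contact map training sets: keeping every contact example and undersampling the far more abundant non-contacts. This illustrates the general technique only; it is not a description of our method's exact training protocol.

    # Minimal sketch of a standard way to tame class imbalance in contact
    # map training data: keep all contacts, undersample non-contacts.
    # This shows the general technique, not our method's exact protocol.
    import random

    def undersample(records, ratio=2.0, minority_label=1, seed=42):
        """Keep every minority-class record (contact) plus a random subset
        of majority-class records, at `ratio` majority per minority."""
        rng = random.Random(seed)
        minority = [r for r in records if r[-1] == minority_label]
        majority = [r for r in records if r[-1] != minority_label]
        keep = min(len(majority), int(ratio * len(minority)))
        return minority + rng.sample(majority, keep)

    # Toy residue-pair records: (feature, label); label 1 = contact.
    records = [(i, 1 if i % 50 == 0 else 0) for i in range(1000)]
    balanced = undersample(records)
    print(len(records), "->", len(balanced))  # 1000 -> 60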

The aim of this project is to perform a data-mining-centric reassessment of our CM prediction method in order to (a) improve the quality of the predictions and (b) reduce the computational cost of training the model.