This page contains a non-exhaustive list of projects I am keen to work on with potential undergraduate and MSc students. It is a work in progress and may be updated fairly often. All projects are open to modification and negotiation if you think you have a way to make them more interesting.

  • 1. Experimental projects

    Projects focusing on scientific output. They typically demand less software engineering and more of a scientific mindset, and they are evaluated using experiments.

    • 1.1 Multilingual argument claim extraction with recurrent neural networks.


    • 1.2 Data augmentation approaches versus Mixup regularisation for stance classification

      In this project you will evaluate a set of data augmentation approaches in combination with the Mixup regularisation method for training a stance classifier on a small dataset.
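
      For readers unfamiliar with Mixup, the core idea is to train on convex combinations of pairs of examples and their labels. Below is a minimal sketch, assuming each example is already encoded as a fixed-length feature vector with a one-hot stance label; the function name `mixup` and the default `alpha=0.2` are illustrative choices, not part of the project specification.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (features, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Two toy examples: hypothetical 3-dimensional features, two stance classes.
rng = random.Random(0)
x, y, lam = mixup([1.0, 0.0, 2.0], [1.0, 0.0], [0.0, 4.0, 2.0], [0.0, 1.0], rng=rng)
```

      The mixed label `y` is a soft label that still sums to one, so it can be fed to a standard cross-entropy loss.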

    • 1.3 [Propaganda project]


    • 1.4 A comparative study of deep networks for mental workload estimation

      In this project you will compare recurrent networks, temporal convolutional networks and siamese networks on the task of mental workload estimation, which consists of classifying functional near-infrared spectroscopy (fNIRS) data into a mental workload class (e.g. light, medium or heavy workload).

  • 2. Engineering-oriented projects

    Projects focusing on building an application. Less science and more software development. Those projects are typically evaluated with software tests and, if time permits, human users.

    • 2.1 A case-based reasoning system for semi-automated essay marking.

      Case-Based Reasoning (CBR) is a general problem-solving methodology: a new problem is compared to a set of previously solved problems, and the solution to one of them is adapted to solve the new problem. We want to use CBR to build a system that can automatically mark essay-type questions based on a marking grid and a few examples, in order to enhance distance-learning platforms.
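
      The retrieve-and-reuse part of the CBR cycle can be sketched as follows. This is only an illustration under strong assumptions: similarity is plain word overlap (Jaccard), the "adaptation" is an average of the nearest marks, and the case base, essay texts and marks are invented for the example. A real system would use the marking grid and a much richer similarity measure.

```python
def tokens(text):
    """Lower-cased bag of distinct words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(case_base, essay, k=1):
    """Retrieve step: the k previously marked essays most similar to the new one."""
    query = tokens(essay)
    ranked = sorted(
        case_base,
        key=lambda case: jaccard(query, tokens(case["text"])),
        reverse=True,
    )
    return ranked[:k]


def propose_mark(case_base, essay, k=2):
    """Reuse step (simplistic): average the marks of the k nearest cases."""
    nearest = retrieve(case_base, essay, k)
    return sum(case["mark"] for case in nearest) / len(nearest)


# Invented case base: essays already marked out of 10.
case_base = [
    {"text": "photosynthesis converts light into chemical energy", "mark": 8},
    {"text": "plants are green and grow in soil", "mark": 3},
    {"text": "light energy is converted to chemical energy by chloroplasts", "mark": 9},
]
new_essay = "chloroplasts convert light energy into chemical energy"
best = retrieve(case_base, new_essay, k=1)[0]
suggested = propose_mark(case_base, new_essay, k=2)
```

      Since the marking is meant to be semi-automated, the suggested mark would be shown to a human marker for confirmation rather than recorded directly.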

    • 2.2 An open source labeling application.

      Creating a good dataset is one of the major pain points when investigating new avenues in machine learning. One reason is that labeling data is difficult and time-consuming, and software designed to ease that process is scarce and expensive. This project aims to produce an open source alternative to those solutions, with which the designer of an experiment can configure a hosted, personalised labeling application in a couple of hours. The application would then allow authorised users to log in to a web interface and be presented with a sequence of examples to label. Choosing that sequence is where the research part of the project lies: one of the goals is to investigate and develop a set of ordering heuristics that tell the system which instance to present to the user next. These heuristics can depend on intrinsic features of the data (text, image, audio) or on features derived from more complex models (e.g. distance from a separating hyperplane).
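
      As an example of one such ordering heuristic, the sketch below ranks unlabeled instances by their distance from a separating hyperplane, so that the instances the current model is least sure about are shown first. It assumes a linear classifier with hypothetical weights and bias, and made-up two-dimensional feature vectors.

```python
import math


def margin_distance(weights, bias, x):
    """Distance from point x to the hyperplane weights . x + bias = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(sum(w * xi for w, xi in zip(weights, x)) + bias) / norm


def order_for_labeling(weights, bias, unlabeled):
    """Instances closest to the hyperplane (most uncertain) come first."""
    return sorted(
        unlabeled,
        key=lambda item: margin_distance(weights, bias, item["features"]),
    )


# Hypothetical linear model and unlabeled pool.
weights, bias = [1.0, -1.0], 0.0
unlabeled = [
    {"id": "a", "features": [3.0, 0.0]},
    {"id": "b", "features": [1.0, 0.9]},
    {"id": "c", "features": [0.0, 2.0]},
]
queue = [item["id"] for item in order_for_labeling(weights, bias, unlabeled)]
```

      Here instance `b` sits almost on the decision boundary, so it is placed at the front of the labeling queue.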

    • 2.3 A food substitution recommender system.

      In this project you will work on building a food substitution recommender system, taking as inputs (1) a recipe and (2) an ingredient, and returning a ranked list of potential substitutes for that ingredient, based on several orthogonal dimensions (healthiness, taste, specific diet).
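
      The ranking step could look something like the sketch below. It assumes candidate substitutes have already been generated for the ingredient and recipe, and that each candidate carries a score in [0, 1] per dimension; the candidate names, scores, diet tags and equal weights are placeholders invented for illustration, not real nutritional data.

```python
def rank_substitutes(candidates, diet=None, weights=None):
    """Filter candidates by diet, then rank them by a weighted sum of dimension scores."""
    weights = weights or {"healthiness": 0.5, "taste": 0.5}
    allowed = [c for c in candidates if diet is None or diet in c["diets"]]

    def score(candidate):
        return sum(weight * candidate[dim] for dim, weight in weights.items())

    return sorted(allowed, key=score, reverse=True)


# Placeholder candidates for replacing butter; all numbers are made up.
candidates = [
    {"name": "margarine", "healthiness": 0.4, "taste": 0.8, "diets": {"vegan", "vegetarian"}},
    {"name": "olive oil", "healthiness": 0.9, "taste": 0.6, "diets": {"vegan", "vegetarian"}},
    {"name": "ghee", "healthiness": 0.3, "taste": 0.9, "diets": {"vegetarian"}},
]
ranked = [c["name"] for c in rank_substitutes(candidates, diet="vegan")]
```

      Treating the diet as a hard filter and the other dimensions as a weighted sum is just one possible design; how to combine the dimensions is part of what the project would explore.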

    • 2.4 A personalised movie plot generator.

      In this project you will build a system that generates movie plots based on specific personal features of the user. The user will receive a set of candidate plots and be able to mark them up or down against a specific set of criteria (whether they like the plot, whether it makes sense, whether it has too many grammatical mistakes). The system will use that feedback to generate new movie plots. It will later be extended to support cross-user recommendation: users will be shown plots generated for other users in order to build a collaborative filtering profile, and that collective information will be used to write better plots.

    • 2.5 A policy intervention search engine with dynamic user feedback

      In this project you will develop a search engine for a restricted set of structured documents. Its main features will be: (1) a dynamic feedback mechanism that allows the search engine to learn from explicit feedback and refine its rankings; and (2) a machine learning-powered interface that allows authorised users to add new documents to the document base, by dynamically predicting the representation of those new documents across a few selected representative dimensions.
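
      To make feature (1) concrete, here is a deliberately naive sketch of explicit feedback changing a ranking. It assumes documents are plain strings scored by word overlap with the query, and that feedback nudges a per-(term, document) adjustment; the class name `FeedbackSearch`, the learning rate and the toy documents are all hypothetical.

```python
class FeedbackSearch:
    """Toy search engine whose ranking is adjusted by explicit relevance feedback."""

    def __init__(self, documents, learning_rate=0.5):
        self.documents = documents          # doc id -> text
        self.learning_rate = learning_rate
        self.boost = {}                     # (term, doc id) -> learned adjustment

    def score(self, query, doc_id):
        terms = query.lower().split()
        words = set(self.documents[doc_id].lower().split())
        overlap = sum(1 for term in terms if term in words)
        learned = sum(self.boost.get((term, doc_id), 0.0) for term in terms)
        return overlap + learned

    def search(self, query):
        """Return doc ids ordered from most to least relevant."""
        return sorted(self.documents, key=lambda d: self.score(query, d), reverse=True)

    def feedback(self, query, doc_id, relevant):
        """Record that a user judged doc_id relevant (or not) for this query."""
        delta = self.learning_rate if relevant else -self.learning_rate
        for term in query.lower().split():
            key = (term, doc_id)
            self.boost[key] = self.boost.get(key, 0.0) + delta


engine = FeedbackSearch({
    "d1": "school meals subsidy policy",
    "d2": "school transport subsidy",
    "d3": "road safety campaign",
})
before = engine.search("school subsidy")
engine.feedback("school subsidy", "d2", relevant=True)
after = engine.search("school subsidy")
```

      Before feedback, `d1` and `d2` tie on word overlap and keep their original order; after one positive judgement, `d2` moves to the top for that query.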