Project Ideas

General

I am happy to discuss any interesting project ideas with students. Some examples of projects I have supervised, past and present, are given below. My research interests revolve around machine learning, data analytics and visualisation, and computational and mathematical modelling, often but not exclusively applied to biological problems. More broadly, I am interested in anything to do with virtual reality, cryptocurrencies, robotics and computer security.

Project ideas

Past Projects (not available for supervision)

Building and using an underwater drone to track fish

Supervisor:

Dr. Jamie Twycross (Computer Science)

Description:

In this project, you will design and build an underwater drone (for example [1, 2]) and implement a control system (for example [3, 4]) to allow the drone to autonomously find and track fish in a real-world environment.

References:

[1] RC Submarine 4.0, July 2022.

[2] Underwater Drone: The Story of the Madness. Ievgenii Tkachenko. Feb 3, 2019.

[3] Ji D, Rehman F ur, Ajwad SA, et al. Design and development of autonomous robotic fish for object detection and tracking. International Journal of Advanced Robotic Systems. May 2020.

[4] Chen J, Yin B, Wang C, Xie F, Du R, and Zhong Y. Bioinspired Closed-loop CPG-based Control of a Robot Fish for Obstacle Avoidance and Direction Tracking. J Bionic Eng 18, 171–183, 2021.

Genome-scale metabolic modelling of Clostridium cellulolyticum

Supervisors:

Dr. Jamie Twycross (Computer Science)
Dr. Nicole Pearcy (Veterinary Medicine and Science)

Description:

The bacterium Clostridium cellulolyticum is an attractive microbial host for producing biofuels and biochemicals due to its ability to naturally digest lignocellulose, an abundant waste feedstock from agriculture, crops, and municipal waste [1]. Optimising this strain to convert this waste by-product into valuable chemicals, however, requires a system-level understanding of the bacterium’s metabolism, which genome-scale metabolic models (GSMs) can provide (see [2, 3] for reviews on GSMs). In this project, you will use and, if necessary, update the published GSM of Clostridium cellulolyticum [4] to assess the bacterium’s capabilities as a microbial chassis. You will work with researchers in the Synthetic Biology Research Centre (SBRC) to complete this project. As a stretch goal, you will apply techniques such as OptKnock [5] and OptGene [6] to identify potential knockout strategies for producing an SBRC-relevant biochemical. The project will require good Python programming skills and an interest in applying mathematical modelling to biological problems. A basic understanding of linear programming is desirable.
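The description does not name a specific analysis method, but GSMs are commonly analysed with flux balance analysis (FBA), which is where the linear programming background comes in. As a rough sketch, assuming a stoichiometric matrix S (metabolites by reactions), a flux vector v, flux bounds l and u, and an objective vector c (for example, selecting a biomass reaction), FBA solves:

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& l \le v \le u .
\end{aligned}
```

In this picture, a reaction knockout can be simulated by setting both bounds of the affected reaction j to zero (l_j = u_j = 0) and re-solving the linear program. The symbols here are illustrative notation, not taken from the published model [4].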

References:

[1] Gaida SM, Liedtke A, Jentges AH, Engels B, Jennewein S. Metabolic engineering of Clostridium cellulolyticum for the production of n-butanol from crystalline cellulose. Microbial cell factories. 2016 Dec;15(1):1-1.

[2] Fang X, Lloyd CJ, Palsson BO. Reconstructing organisms in silico: genome-scale models and their emerging applications. Nature Reviews Microbiology. 2020 Dec;18(12):731-43.

[3] Gu C, Kim GB, Kim WJ, Kim HU, Lee SY. Current status and applications of genome-scale metabolic models. Genome biology. 2019 Dec;20(1):1-8.

[4] Salimi F, Zhuang K, Mahadevan R. Genome‐scale metabolic modeling of a clostridial co‐culture for consolidated bioprocessing. Biotechnology journal. 2010 Jul;5(7):726-38.

[5] Burgard AP, Pharkya P, Maranas CD. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnology and bioengineering. 2003 Dec 20;84(6):647-57.

[6] Patil KR, Rocha I, Förster J, Nielsen J. Evolutionary programming as a platform for in silico metabolic engineering. BMC bioinformatics. 2005 Dec;6(1):1-2.

Predicting gene essentiality from experimental data using machine learning approaches

Supervisors:

Dr. Jamie Twycross (Computer Science)
Dr. Nicole Pearcy (Veterinary Medicine and Science)

Description:

The set of genes that are essential for the growth and survival of bacteria is useful for both biotechnological and biomedical purposes (e.g. for developing new strategies to increase product yields or producing new treatments). Genome-wide essentiality screening of micro-organisms now makes it possible to determine whether genes are ‘essential’ or ‘non-essential’ (see for example [1, 2]). Current approaches, however, rely on setting an ambiguous threshold for the classification, which is problematic for more complex cases, such as those involving a combination of essential and non-essential domains within the gene (see [3] for examples). In this project, you will develop a machine learning approach which is able to accurately classify these more complicated cases. You will have the opportunity to work with researchers in the Synthetic Biology Research Centre (SBRC), as well as experimentalists from DeepBranch, to complete this project. A good understanding of machine learning approaches is required for this project.
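To illustrate why a single threshold can mislead for genes with mixed domains, here is a minimal sketch. It assumes essentiality is scored by an insertion index (transposon insertions per base pair), and that a low index suggests an essential region. The function names, the threshold value and the toy insertion positions are all hypothetical and chosen only for illustration; they are not taken from the tools in [1, 2].

```python
def insertion_index(positions, start, end):
    """Insertions per base pair within the half-open region [start, end)."""
    hits = sum(1 for p in positions if start <= p < end)
    return hits / (end - start)


def classify_gene(positions, length, threshold):
    """Whole-gene call: one threshold applied to the whole-gene index."""
    index = insertion_index(positions, 0, length)
    return "essential" if index < threshold else "non-essential"


def classify_windows(positions, length, window, threshold):
    """Apply the same threshold to consecutive windows along the gene."""
    labels = []
    for start in range(0, length, window):
        end = min(start + window, length)
        index = insertion_index(positions, start, end)
        labels.append("essential" if index < threshold else "non-essential")
    return labels


# Hypothetical 1000 bp gene: no insertions in the first half,
# 20 evenly spaced insertions in the second half.
GENE_LENGTH = 1000
THRESHOLD = 0.01  # assumed cut-off, for illustration only
toy_insertions = list(range(500, 1000, 25))

whole = classify_gene(toy_insertions, GENE_LENGTH, THRESHOLD)
windows = classify_windows(toy_insertions, GENE_LENGTH, 250, THRESHOLD)
print(whole)
print(windows)
```

The whole-gene call hides the insertion-free first half, whereas the windowed view exposes it. A machine learning approach could use such per-region features instead of a single cut-off.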

References:

[1] Page AJ, Bastkowski S, Yasir M, Turner AK, Le Viet T, Savva GM, Webber MA, Charles IG. AlbaTraDIS: comparative analysis of large datasets from parallel transposon mutagenesis experiments. PLoS Computational Biology. 2020;16(7):e1007980.

[2] Barquist L, Mayho M, Cummins C, Cain AK, Boinett CJ, Page AJ, Langridge GC, Quail MA, Keane JA, Parkhill J. The TraDIS toolkit: sequencing and analysis for dense transposon mutant libraries. Bioinformatics. 2016 Apr 1;32(7):1109-11.

[3] Goodall EC, Robinson A, Johnston IG, Jabbari S, Turner KA, Cunningham AF, Lund PA, Cole JA, Henderson IR. The essential genome of Escherichia coli K-12. mBio. 2018;9(1):e02096-17.

Predicting Keypresses using an Audio Side-Channel Attack and Machine Learning

Supervisor:

Dr. Jamie Twycross (Computer Science)

Description:

A number of attacks which use audio as a side-channel have been demonstrated, for example [1]. The central research question this project addresses is: can keystrokes be predicted from keypress sounds? Researchers have investigated a number of approaches addressing this question, for example [2, 3, 4]. This project will focus on investigating machine learning approaches to address this question. In this project, you will implement a keylogger which logs keystrokes and their associated keypress sounds to generate a data set. You will then evaluate the effectiveness of a number of state-of-the-art machine learning approaches on predicting keypresses using this data set. A stretch goal could be to also implement a program which records keypress sounds in the wild and uses a trained machine learning algorithm to predict keypresses. This project will require good Python programming skills and an interest in machine learning and computer security.
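The data set generation step could be sketched roughly as follows. This assumes the keylogger has already produced a list of (key, timestamp) events and a single audio recording as a list of samples; the function name, the 0.1 second window and the toy data are hypothetical choices for illustration, not part of any of the cited approaches.

```python
def extract_windows(samples, sample_rate, key_events, window_s=0.1):
    """Pair each logged key with the audio clip that starts at its timestamp.

    samples     -- audio samples from one recording
    sample_rate -- samples per second
    key_events  -- list of (key, time_in_seconds) tuples from the keylogger
    window_s    -- assumed clip length in seconds
    """
    clip_len = round(window_s * sample_rate)
    dataset = []
    for key, t in key_events:
        start = round(t * sample_rate)
        clip = samples[start:start + clip_len]
        # Drop events too close to the end of the recording to fill a clip.
        if len(clip) == clip_len:
            dataset.append((key, clip))
    return dataset


# Toy recording: one second at 1000 samples per second, where each sample
# value is simply its own index so the slicing is easy to inspect.
toy_samples = list(range(1000))
toy_events = [("a", 0.2), ("b", 0.95)]

dataset = extract_windows(toy_samples, 1000, toy_events)
print([(key, len(clip)) for key, clip in dataset])
```

Each (key, clip) pair is one labelled example. In practice, the clips would be converted to features (for example, spectral features) before being passed to a classifier.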

References:

[1] Physical locks are less hackable than digital locks, right? Maybe not: Boffins break in with a microphone. Tim Anderson. The Register, Aug 21, 2020.

[2] Don't Skype & Type! Acoustic Eavesdropping in Voice-Over-IP. Alberto Compagno, Mauro Conti, Daniele Lain, Gene Tsudik. ASIA CCS '17: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 703–715, Apr 2017.

[3] Acoustic Eavesdropping: Predicting Keystrokes With Google AutoML Vision. Charith De Silva. Jul 22, 2020.

[4] Keytap: description and some random thoughts. Georgi Gerganov. Nov 30, 2018.

Selecting Cryptocurrency Trading Pairs

Supervisor:

Dr. Jamie Twycross (Computer Science)

Description:

The Sequencer is an interesting new machine learning algorithm designed to automatically reveal the main trend or sequence in a dataset [1]. It can find patterns in data, which can be used to generate hypotheses based on these patterns (for example [2]). The idea behind this project is to apply the Sequencer algorithm and/or other unsupervised machine learning approaches [3] to cryptocurrency trading data to identify sets of cryptocurrency pairs with different trends, considering a range of metrics (e.g. price, volatility and volume indicators). These identified sets could be used, for example, to aid in the selection of a diverse portfolio of cryptocurrencies, or to select trading pairs with specific characteristics.
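As a starting point, each trading pair could be summarised by a small feature vector before any unsupervised method is applied. The sketch below is not the Sequencer algorithm; it only computes simple per-pair metrics (mean log return, volatility as the standard deviation of log returns, and mean volume). The pair names and the price and volume series are made up for illustration.

```python
import math
import statistics


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pair_features(prices, volumes):
    """Summarise one trading pair as a small dictionary of metrics."""
    returns = log_returns(prices)
    return {
        "mean_return": statistics.fmean(returns),
        "volatility": statistics.pstdev(returns),
        "mean_volume": statistics.fmean(volumes),
    }


# Hypothetical pairs with invented price and volume series.
toy_market = {
    "AAA/USD": ([100.0, 100.0, 100.0, 100.0], [10, 10, 10, 10]),
    "BBB/USD": ([100.0, 120.0, 90.0, 130.0], [50, 80, 60, 90]),
}

features = {pair: pair_features(p, v) for pair, (p, v) in toy_market.items()}
# Order pairs from least to most volatile.
ranked = sorted(features, key=lambda pair: features[pair]["volatility"])
print(ranked)
```

Feature vectors like these could then be passed to a clustering or ordering method to group pairs with similar behaviour.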

References:

[1] The Sequencer Algorithm.

[2] Machine learning helps geoboffins spot huge beds of hot rocks 1,000km across deep below Earth's surface (The Register).

[3] Unsupervised learning (Wikipedia).

Informing School Decision Making

Supervisor:

Dr. Jamie Twycross (Computer Science)

Description:

Part of the effective management of a primary or secondary school involves making decisions on how to distribute financial and staff resources in order to improve school performance (i.e. improve the education of pupils). One way to help schools make these decisions is to look at how other successful and similar schools have distributed their resources (see for example [1]). In particular, if patterns can be recognised across a number of such schools, these patterns could suggest effective ways to distribute resources. Datasets are available for all U.K. schools over a number of years detailing, for example, school spending, school academic performance, and school staffing [2]. This project involves exploring these datasets using approaches (e.g. unsupervised machine learning [3, 4]) from data analytics [5], data mining [6], and data visualisation [7, 8] in order to identify, explore and visualise patterns which could be useful in the decision making process. For example, one could use these approaches to explore whether there is anything in common in the way schools rated outstanding spend their money, or whether academic performance is related to the way schools are staffed.

References:

[1] Schools financial benchmarking (U.K. Government)

[2] Schools financial benchmarking - Data sources (U.K. Government)

[3] Data Exploration using Unsupervised Machine Learning — Cluster Analysis (Medium).

[4] Unsupervised learning (Wikipedia).

[5] Data analysis (Wikipedia).

[6] Data mining (Wikipedia).

[7] D3 - Gallery (Data-Driven Documents).

[8] Data visualization (Wikipedia).

Using Machine Learning to Identify Individual Birds by Call

Supervisors:

Dr. Jamie Twycross (Computer Science)
Drs. Alan Burbidge/Alan Wilkins (Biology)

Description:

The Manx shearwater [1] is a migratory seabird which can live for over 50 years. Manx shearwaters nest in burrows, and biologists have collected audio data from microphones placed in a number of burrows. This data records the calls made by the birds in the burrows. It has previously been shown for other bird species [2, 3] that individual birds can be identified by variations in their calls. In this project, you will develop a machine learning approach and associated software system which will be able to identify individual Manx shearwaters from recordings of their calls. Potential questions which this system could help address include determining, for example: whether two birds stay in the same burrow; if a bird has gone missing from a burrow; or if one bird has been replaced by another in a burrow. As well as working with me on the Computer Science aspect of the project, you will have the opportunity to work with the biologists who collected the recordings.

References:

[1] Manx shearwater (Wikipedia).

[2] Dan Stowell, Veronica Morfi, Lisa F. Gill. Individual Identity in Songbirds: Signal Representations and Metric Learning for Locating the Information in Complex Corvid Calls. INTERSPEECH 2016: 2607-2611.

[3] Julio G. Arriaga, Hector Sanchez, Edgar E. Vallejo, Richard Hedley, and Charles E. Taylor. 2016. Identification of Cassin's Vireo (Vireo cassinii) individuals from their acoustic sequences using an ensemble of learners. Neurocomput. 175, PB (January 2016), 966-979. doi:10.1016/j.neucom.2015.05.129.

3D Virtual Reality Visualisation of Phylogenetic Trees

Supervisor:

Dr. Jamie Twycross (Computer Science)

Description:

Phylogenetic trees [1] are used to show the evolutionary relationship between biological entities such as organisms, proteins and genes. These trees can often be large and complex, containing many hundreds or thousands of entities. A range of 2-dimensional approaches [2] have been developed to visualise such large trees and to explore and understand the relationships, and to a lesser extent 2.5/3-dimensional approaches [3, 4, 5]. In this project you will develop a 3D visualisation approach which allows large phylogenetic trees to be viewed and explored in a virtual reality environment.

References:

[1] Phylogenetic tree (Wikipedia).

[2] Pavlopoulos GA, Soldatos TG, Barbosa-Silva A, Schneider R. A reference guide for tree analysis and visualization. BioData Mining. 2010;3:1. doi:10.1186/1756-0381-3-1.

[3] Waese J, Provart NJ, Guttman DS. Topo-phylogeny: Visualizing evolutionary relationships on a topographic landscape. Guralnick R, ed. PLoS ONE. 2017;12(5):e0175895. doi:10.1371/journal.pone.0175895.

[4] Hughes T, Hyun Y, Liberles DA. Visualising very large phylogenetic trees in three dimensional hyperbolic space. BMC Bioinformatics. 2004;5:48. doi:10.1186/1471-2105-5-48.

[5] Kim N, Lee C. Three-Dimensional Phylogeny Explorer: Distinguishing paralogs, lateral transfer, and violation of "molecular clock" assumption with 3D visualization. BMC Bioinformatics. 2007;8:213. doi:10.1186/1471-2105-8-213.

Measuring process complexity in the diagnosis of gastrointestinal infections

Supervisors:

Dr. Jamie Twycross (Computer Science)
Dr. Mathew Diggle (Consultant Clinical Scientist, Department of Clinical Microbiology)

Description:

Gastrointestinal infections are a very common medical condition caused by micro-organisms such as bacteria and viruses [1]. Diagnosis of stool samples is often used to determine the cause of infection, and hence an appropriate treatment. However, diagnosis can be a time- and resource-consuming process, and a more quantitative understanding of the diagnostic process could ultimately lead to improvements such as better outcomes for patients (e.g. quicker diagnosis), along with time and cost savings for health care providers.

The aim of this project will be to build a prototype software system to model the complexity of the clinical diagnostic process for gastrointestinal infections. Process complexity measures have been used in a number of different areas such as business process management and software engineering (see [2] for a good overview of complexity measures for business processes). The software system you develop will employ similar approaches to gain a quantitative understanding of the diagnostic process. The software should: (1) model and visualise the steps involved in the diagnostic process (for example, using process maps or charts); (2) calculate a variety of complexity measures for the process (for example, coefficient of network complexity, complexity index). As well as working with me on the Computer Science aspect of the project, you will be expected to work with clinicians in the Microbiology Laboratory of the Queen's Medical Centre to understand and quantify the complexity of the diagnostic process. A good maths and programming background is essential for this project.
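As a small illustration of requirement (2), the sketch below computes a coefficient of network complexity (CNC) for a process modelled as a directed graph. Definitions of CNC vary in the literature: some use the ratio of arcs to nodes, others the square of the number of arcs divided by the number of nodes, so both are computed here and the choice should be checked against [2]. The process steps are hypothetical and do not describe the actual laboratory workflow.

```python
def count_nodes_and_arcs(process):
    """Count steps (nodes) and transitions (arcs) in a process map.

    process -- dict mapping each step to the list of steps that can follow it
    """
    nodes = set(process)
    for targets in process.values():
        nodes.update(targets)
    arcs = sum(len(targets) for targets in process.values())
    return len(nodes), arcs


def cnc_ratio(process):
    """CNC variant defined as arcs divided by nodes."""
    n_nodes, n_arcs = count_nodes_and_arcs(process)
    return n_arcs / n_nodes


def cnc_squared(process):
    """CNC variant defined as arcs squared divided by nodes."""
    n_nodes, n_arcs = count_nodes_and_arcs(process)
    return n_arcs ** 2 / n_nodes


# Hypothetical diagnostic process: four steps, four transitions.
toy_process = {
    "receive sample": ["culture", "PCR"],
    "culture": ["report"],
    "PCR": ["report"],
}

print(count_nodes_and_arcs(toy_process))
print(cnc_ratio(toy_process), cnc_squared(toy_process))
```

The same dictionary representation could also feed the visualisation in requirement (1), since it already lists every step and transition.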

References:

[1] Diarrhoea and vomiting (gastroenteritis). NHS Choices.

[2] Finding a Complexity Measure for Business Process Models. Antti Latva-Koivisto, 2001.

Visualising synthetic chemical reaction variables

Supervisors:

Dr. Jamie Twycross (Computer Science)
Dr. James Dowden (Chemistry)

Description:

The history of synthetic reactions has been recorded in the literature and archived in databases such as Reaxys [1] and Chemical Abstracts [2] (see [3] for a review). Currently, one can investigate this history of synthesis by drawing chemical sub-structures, but the returned information is typically a vertical stack of individual records. Individual reaction variables (e.g. solvent) can be filtered to assist navigation, but only one at a time.

The aim of this project will be to develop and implement a more user-friendly and powerful way of visualising reaction variables embedded in the synthetic chemical literature. Ideally, the user should be able to view key parameters as clusters in order to quickly decide which are most suitable for a planned transformation. Good software development skills are essential for this project, and ideally some knowledge of chemistry. You will be expected to work with researchers in Chemistry as well as Computer Science to complete this project.

References:

[1] Reaxys

[2] Chemical Abstracts Service

[3] A Short Review of Chemical Reaction Database Systems, Computer-Aided Synthesis Design, Reaction Prediction and Synthetic Feasibility. Wendy A. Warr. Molecular Informatics, 33:469–476, 2014. doi:10.1002/minf.201400052

www:    https://www.cs.nott.ac.uk/~pszjpt
email:  jamie.twycross AT nottingham.ac.uk
office: B48 School of Computer Science
        Jubilee Campus
        University of Nottingham
        Wollaton Road
        Nottingham NG8 1BB
        U.K.