MSC Dissertation

Student projects
A dense simplex solver for HiGHS

Project supervisor: Julian Hall
Project type
standard
Suitable degrees
Operational Research with Computational Optimization

Operational Research with Data Science
Project description
HiGHS is a high performance open-source software suite being developed by Julian Hall and his PhD students. It is built on an
efficient sparsity-exploiting implementation of the dual revised simplex method. However, for problems that are sufficiently dense,
the techniques for exploiting sparsity become inefficient, in terms of both computational cost and storage requirement. For most
computational components of the revised simplex method, their implementation within HiGHS using dense matrix utilities is
relatively routine, allowing some study of cache usage for further performance optimization. The more major challenge is
understanding and implementing the Fletcher-Matthews technique for updating the dense LU decomposition of the basis matrix
following a basis change. This project will also look at specific performance issues in the context of medical data science
applications yielding large scale dense LP problems.
Prerequisites
Fluent programming skills in C++
Recommended reading
Computational Techniques of the Simplex Method by Istvan Maros
Fletcher, R. and Matthews, S.P.J., 1984. Stable modification of explicit LU factors for simplex updates. Mathematical Programming,
30(3), pp.267-284.
Page 1/108
A primal revised simplex solver for HiGHS
Project type
standard
Suitable degrees

Project description
HiGHS is a high performance open-source software suite being developed by Julian Hall and his PhD students. It is built on an
efficient implementation of the dual revised simplex method, which is preferred over the primal revised simplex method in many
applications and for most classes of LP problems. However, in several important situations, it is important to use the primal revised
simplex method to exploit the structure of an LP problem and information known about the optimal solution. This project will
review advanced algorithmic enhancements and computational techniques in the primal revised simplex method and implement
them within HiGHS.
Prerequisites
Fluent programming skills in C++
Recommended reading
Computational Techniques of the Simplex Method by Istvan Maros
Page 2/108
A toolset to model facilities location distribution problems
Project supervisor: Tim Pigden
Suitable degrees
Operational Research
Operational Research with Risk
Statistics and Operational Research
Project description
Many companies face the problem of locating warehouses and transshipment points, balancing transport costs against costs of
warehouses and inventory stock holding. While this area has been the subject of significant academic research, there is a
comparative lack of commercially available solutions that can be used by companies and consultants to assist in this type of
planning. The commercial optimisation software LocalSolver is marketed as a solution to many optimisation problems including this
type of supply-chain problem.
The objective of this project is to investigate the capabilities of the LocalSolver software for solving different types of facilities
location problem, possibly in conjunction with data and parameters generated with vehicle routing software packages. The exact
problem to work on is still open, but Optrak will assist in formulation of scenarios that have been encountered in the course of its
business and can provide access to its own software and data.
Optrak, the project sponsor, produces vehicle routing software and carries out consultancy to assist in various supply chain decision
making processes. It is specifically interested in whether a tool such as LocalSolver can significantly reduce the time taken by and
increase the scope of location distribution analysis; improving the ability of its clients to make well informed decisions to increase
efficiency and reduce costs and environmental impact of distribution operations.
Prerequisites
Good programming skills are required, as well as the course "Risk and Logistics".
Page 3/108
A toolset to model multi-echelon Urban Consolidation Centres
Project supervisor: Tim Pigden
Suitable degrees
Project description
Urban consolidation centres have the potential to significantly reduce the number of trucks going into urban environments across
many distribution activities, including cold-chain deliveries and other food services. This provides major environmental benefits as
well as reducing overall distribution costs. While smaller cities can utilities a single distribution centre and multiple local hubs,
mega-cities are likely to require multiple distribution centres on the periphery as well as local centres for distribution, hence the
particular interest in multi-echelon systems.
There are no readily available off-the-shelf products that allow urban planners or consultants to model these problems. However,
there do exist a number of tools, such as Vehicle Routing Software, commercial tool LocalSolver, open-source LP solvers and
constraints programming tools.
The aim of this project is to formulate practical approaches for tackling these problems, using the most appropriate available
software and understanding the approximations and other simplifications necessary in these approaches.
Optrak, the project sponsor, produces vehicle routing software and also carries out consultancy in related areas. Optrak will provide
access to its own software and is in a position to assist with problem definition and generation of test data. The company is
particularly interested in exploring the potential of LocalSolver for this problem.
Prerequisites
Good programming skills are required, as well as the course "Risk and Logistics".
Page 4/108
Algorithmic approaches to semidefinite optimization with applications
Project supervisor: Miguel Anjos
Project type
standard
Suitable degrees
Project description
This project is concerned with implementing and testing a new algorithm to solve semidefinite optimization problems. Semidefinite
optimization is relevant to a range of practical applications in areas such as control theory, circuit design, sensor network
localization, and principal component analysis [1].
Although several software packages can handle semidefinite optimization problems, solving large-scale instances in practice
remains a challenge. The objective of this project is to experiment with a new algorithm that uses a sequence of linear optimization
problems to solve a semidefinite problem.
In this project we will focus on an application in finance, namely the computation of the nearest correlation matrix [2]. The
well-known Markowitz portfolio optimization model requires as input the correlation matrix of all possible assets, but frequently the
data available is insufficient or too inaccurate to reliably estimate the pairwise asset correlations.
The task is to estimate the (unknown) symmetric positive semidefinite covariance matrix using only the available information by a
least-squares-like approach.
Objectives:
- study the theoretical background of the algorithm;
- understand the nearest correlation matrix problem and its formulations, especially those eligible for a semidefinite approach;
- implement (in Matlab)the new algorithm for semidefinite optimization;
- use this implementation to test the new algorithm on instances of the nearest correlation matrix problem.
Prerequisites
- Fluency in Matlab, knowledge of linear algebra.

- Very good grasp of FuO (mark above 70%) and ODS.
Recommended reading
[1] Anjos, Miguel F., and Jean B. Lasserre. "Introduction to semidefinite, conic and polynomial optimization." Handbook on
semidefinite, conic and polynomial optimization. Springer US, 2012. 1-22.
https://scholar.google.ca/scholar?oi=bibs&cluster=7157849937737684992&btnI=1&hl=en
[2] Nicholas J. Higham, Computing the Nearest Correlation Matrix-A Problem from Finance, IMA Journal of Numerical Analysis
22 (2002) No 3, 329-343. http://scholar.google.co.uk/citations?user=EYlMkOgAAAAJ&hl=en&oi=ao
Page 5/108
Approximate Bayesian Computation (ABC) for 'large" data
Project supervisor: Ruth King
Project type
standard
Suitable degrees
Project description
As data increases in size and/or models increase in complexity fitting the models to the data can become increasingly time
consuming. This project will investigate the use of sub-sampling the data in order to obtain a reduced dataset from which a standard
Bayesian analysis can be conducted and a sample from the posterior obtained using standard computational techniques, such as
Markov chain Monte Carlo. The sampled posterior values will then be used within an approximate Bayesian computation approach
to "correct" the sampled posterior values obtained from the sub-sampled data to obtain an approximate sample from the posterior of
the full dataset. The project will not only investigate the ABC algorithm but also investigate the efficiency of different sub-sampling
techniques for a particular application area identified by the student.
Prerequisites
Bayesian Theory, Bayesian Data Analysis, Generalised Regression Models; Statistical Programming
Recommended reading
Handbook of Approximate Bayesian Computation. Edited by Sisson, Fan and Beaumont. CRC Press.
Page 6/108
Assessing risk of electricity capacity shortfalls using extreme value methods
Project supervisor: Chris Dent
Project type
standard
Suitable degrees
Project description
Ensuring an appropriately high level of security of supply is one of the key issues in management of electric power systems and
markets, and thus associated risk assessment is a major topic in electric power system analysis. There has been renewed research
interest in recent years due to the advent of high capacities of renewable generation in many systems, whose availability has very
different statistical properties from that of conventional fossil fuel powered generation (as its availability is primarily determined by
the wind or solar resource, rather than by mechanical availability). Within such risk assessments it is necessary to build a joint
statistical model of demand and available renewable capacity (a statistical association between these is naturally expected, as in most
power systems temperature has an influence on demand through heating or air conditioning load, and available renewable capacity is
clearly driven by the weather). This project will take extreme value statistical methods developed for modelling wind and demand in
the Great Britain system, and apply them to data from the Texas system (which has very different characteristics compared to GB,
with the main annual peak in summer driven by air conditioning rather than in the winter driven by heating and lighting). The
models developed will then be incorporated into risk assessments for the Texas system.
Prerequisites
Ideally, familiarity with applied statistical modelling, preferably in R.

This project has been listed for some OR MSc variants - however without having taken some optional courses in statistics within the
MSc (or equivalent background), the project would be challenging.
Recommended reading
https://www.dur.ac.uk/dei/resources/briefings/blackouts/
http://sites.ieee.org/pes-rrpasc/files/2015/09/AmyWilson.pdf
http://sites.ieee.org/pes-rrpasc/files/2016/08/Amy-Wilson-Durham-University-Accounting-for-wind-demand-dependence-when-esti
mating-LoLE.pdf
Page 7/108
Assessing uncertainty in planning background for government investment decisions
Project type
standard
Suitable degrees
Project description
Capital planning decisions are, by their nature, taken against uncertainty in the future planning background. Moreover, directly
relevant data are commonly not available - the future has not yet happened, so the possibility of changes in drivers of service
demands typically makes historic data at best indirectly relevant.
This project will develop methods for assessing uncertainty in planning background and using this in decison making. In some cases
assessment of future demand might be assessed directly, in other cases it may be necessary to do some kind of system modelling
(e.g. in the case of electric vehicles, charging point demand is a consequence of number of vehicles combined with usage patterns).
Possible applications include electric vehicle and school capacity planning in the city of Edinburgh, assessing need for newly
qualified teachers, and planning of new prison capacity. Detail of application direction can be discussed with a student carrying out
the project, as there are options to take this in a statistical, simulation or optimization direction depending.
Prerequisites
Familiarity with applied statistical modelling, preferably in R.

Recommended reading
http://www.edinburgh.gov.uk/news/article/2556/edinburgh_blazes_green_trail_with_new_electric_vehicle_infrastructure_plan
https://www.nao.org.uk/wp-content/uploads/2016/02/Training-new-teachers.pdf
Page 8/108
Assigning Students to Learning Groups
Project supervisor: Joerg Kalcsics
Project type
standard
Suitable degrees
Project description
This project focusses on models and heuristic algorithms to assign students to learning groups. Several studies suggest that it is
desirable to create heterogeneous groups that include students of different genders, different cultural and academic backgrounds. The
more diverse groups are, the deeper the perspective each student can gain from his or her peers. The goal is therefore to assign
students to groups such that all students who are assigned to the same group are as "diverse" as possible. A common approach to
measure diversity is to quantify the characteristics of individual students using a binary attribute matrix that specifies for each
student whether he or she exhibits a certain attribute or not. Based on that, one can compute how "similar" two students are by
calculating, for example, the weighted sum of differences between the attribute values for all characteristics. The goal of the problem
can then be rephrased as assigning students to study groups such that a function of the pairwise distances between all students who
are assigned to the same group is as large as possible (larger = more diverse).
The abstract version of this problem is known as the maximum dispersion or maximum diversity problem. In this problem, we are
given a set of objects where each object has a non-negative weight assigned to it. Moreover, we are given a measure of distance
between each pair of objects. In addition, a number of groups is given and each group has a target value in terms of the sum of
weights, where the target value may be different for each group. The task is to exclusively assign each object to a group such that the
minimal pairwise distance or the sum of pairwise distances between two objects that are assigned to the same group is maximised
and such that the actual weight of a group meets its target weight. Hereby, the weight of a group is the sum of the weights of objects
assigned to it.
The goal of this project is to develop and inplement an efficient heuristic for the problem. There is no preference for a certain
heuristic. A natural choice would be a local search heuristic that is embedded in a Tabu Search or Variable Neighborhood Search
framework. The basic idea of a local search heuristic is to modify a given solution by moving, for example, a single object from its
current group to another group or to swap two objects that belong to different groups. If the exchange yields an improvement, we
accept it; otherwise, we reverse it and try a different exchange. We keep on doing this until we can no longer improve the solution,
i.e. until we have run into a local optimum. The heuristic frameworks Tabu Search and Variable Neighborhood Search then differ in
the way they try to escape from a local optimum and find another local optimium. A second type of heuristics that would be suitable
are matheuristics that combine concepts from mathematical programming and heuristic design. The idea of a matheuristic is to solve
the problem optimally just for a subset of the unknowns while keeping all other decisions fixed. The task is then to develop heuristic
strategies to select a "good" subset of unknows to be optimised.
Prerequisites
Page 9/108
Good programming skills. For the matheuristics knowledge of Xpress Mosel would be desirable.
Recommended reading
E. Fernandez, J. Kalcsics, S. Nickel, The Maximum Dispersion Problem, Omega 41, 721-730 (2013)
Page 10/108
Automatic Classification of Model Issues in Counterparty Credit Risk
Project supervisor: John Faben
Suitable degrees
Project description
In order to calculate counterparty credit risk exposure, trade exposure simulated across a long time
horizon. The data required for this calculation comes from a variety of sources, including market data,
trade specifics and information on legal agreements. When these simulations fail, there are therefore
a variety of different possible points of failure. Currently, identifying what has caused a particular
trade to fail requires a large amount of repetitive manual work. This project will explore whether the
identification of the reasons for failure for a particular trade can be made more efficient using
machine learning to consider the relevant properties of that trade.
Prerequisites
Student will need some knowledge of machine learning techniques, and some programming experience.
Page 11/108
Bayesian analysis of single cell genomic data
Project supervisor: Natalia Bochkina
Project type
standard
Suitable degrees
Project description
Recent breakthrough in technology allows to observe concentration of molecules in a single cell. The aim of the project is to review
the current literature on the Bayesian and frequentist models of single cell data, identify the gaps and develop a novel model for a
real data single cell data.
Prerequisites
Bayesian Data Analysis
Recommended reading
Greg Finak et al (2015) MAST: a flexible statistical framework for assessing transcriptional changes and characterizing
heterogeneity in single-cell RNA sequencing data, Genome Biology, v.16, p.278.
Davide Risso et al (2018) A general and flexible method for signal extraction from single-cell RNA-seq data. Nature
Communications, volume 9, Article number: 284
CA Vallejos, S Richardson and JC Marioni (2016) Beyond comparisons of means: understanding changes in gene expression at the
single-cell level. Genome Biology 17:70
Page 12/108
Bayesian cluster analysis
Project supervisor: Sara Wade
Project type
standard
Suitable degrees
Project description
Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. However, an important
problem, common to all clustering methods, is how to choose the number of clusters. In popular algorithms such as agglomerative
hierarchical clustering or k-means, the number of clusters is fixed or selected based on specified criteria and a single clustering
solution is returned. On the other hand, Bayesian model-based clustering provides a formal framework to assess uncertainty in the
cluster structure and the number of clusters through the posterior over the space of partitions. This project will consider and compare
three Bayesian approaches to incorporate uncertainty in the number of clusters: 1) hierarchical approaches that include a prior over
the number of clusters, 2) sparse overfitted mixtures that automatically prune unnecessary components, and 3) nonparametric infinite
mixtures.
Prerequisites
Bayesian Theory
Recommended reading
Wade and Ghahramani (2018). "Bayesian Cluster Analysis: Point Estimation and Credible Balls"
Malsiner-Walli (2016). "Model-based clustering based on sparse finite Gaussian mixtures."
Page 13/108
Bayesian decision analysis for electricity capacity procurement
Project type
standard
Suitable degrees
Project description
the wind or solar resource, rather than by mechanical availability).
In optimising procurement of electricity capacity, it is necessary to consider a full range of modelling uncertainties if one wishes to
take robust decisions for the real world. This includes not only uncertainty in planning background and in random fluctuations in the
availability of renewable and conventional generation, but also uncertainty over the statistical properties of generation availability
and demand (incluing the consequences of having finite histories of relevant data). All these uncertainties can naturally be combined
in a Bayesian decision analysis framework, where all uncertainties are quantified as probability distributions. Depending on the
interests of the student the project could take a more conceptual or computational direction.
Prerequisites

Recommended reading
mating-LoLE.pdf
https://ieeexplore.ieee.org/document/5378484
Page 14/108
Bayesian estimation of ROC curves for biomarkers subject to a limit of detection
Project supervisor: Vanda Inacio De Carvalho
Project type
standard
Suitable degrees
Project description
Accurate diagnosis of disease is of fundamental importance in clinical practice and medical research. The receiver operating
characteristic (ROC) curve is the most widely used tool for evaluating the discriminatory ability of a continuous biomarker/medical
test. In many clinical settings, due to technical limitations, measurements below a certain limit of detection cannot be obtained.
Ignoring observations below the limit of detection leads to biased estimates of the ROC curve and its corresponding area under the
curve, which is a popular summary measure of diagnostic accuracy. The aim of this project is to propose a Bayesian method that
properly takes into account the observations below the limit of detection, thus leading to valid estimates. An extensive simulation
study comparing the performance of the proposed method with the existing approaches will be conducted. Finally, an application to
data on the accuracy of glucose levels as a biomarker of diabetes will be investigated.
Prerequisites
Knowledge of R and Bayesian methods.
Recommended reading
Pepe, M. S. (1997). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series.
Perkins, N. J., Schisterman, E. F., and Vexler, A. (2007). Receiver operating characteristic curve inference from a sample with a
limit of detection. American Journal of Epidemiology, 165, 325?333.
Page 15/108
Bayesian misspecified models
Project type
standard
Suitable degrees
Project description
Many statistical models used for big data are often misspecified, ie the assumed distribution of the data often does not corresponds to
the true distribution of the data. This implies that the corresponding statistical inference, such as confidence intervals or hypothesis
testing, can be incorrect. Considering a tempered likelihood has shown to be an effective method to adjust for model
misspecification and to lead to asymptotically valid statistical inference in a Bayesian setting. This project has scope for applied
work by creating code and applying this method to real data.
Prerequisites
Bayesian Data analysis or Bayesian Theory
Recommended reading
P.D. Grünwald. Safe Probability. Journal of Statistical Planning and Inference 195, 2018, pp. 47-63
P.D. Grünwald and T. van Ommen. Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for
Repairing It . Bayesian Analysis, 2017, pp. 1069-1103
Page 16/108
Bias Reduction in Stochastic Programs Using Latin Hypercubes
Project supervisor: Burak Buke
Project type
standard
Suitable degrees
Project description
In real life many decision problems should be solved under the presence of uncertainty in parameters. Stochastic programming
mainly deals with optimizing expectation of a function which depends on some random parameters. When the parameters follow a
continuous distribution or has a large support, it is not practical or sometime not possible to evaluate expectation exactly. Sampling
from the underlying distribution is a common approach to address this problem. However, it is known that the objective values are
biased and even though the estimators are asymptotically unbiased, it is not possible to assess the value of bias for a finite sample. In
this work, we will analyze how the Latin hypercube sampling affects the optimality bias. We will first investigate the problem
analytically and then employ numerical schemes to evaluate the effect of Latin hypercubes.
Prerequisites
Simulation
Fundamentals of Optimization
Page 17/108
C++ implementation of efficient interior point method for large-scale truss topology optimization.
Project supervisor: Alemseged Weldeyesus
Project type
standard
Suitable degrees

Project description
Trusses are engineering structures that consist of straight members or bars connected at joints. Given some loading conditions, we
are concerned with finding the lightest truss structure that can sustain the given set of loads. The problems are usually modelled
using a ground structure approach in which a finite set of nodes are distributed in the design domain and are connected by the
potentials bars. The design variables are then the cross-sectional areas and internal forces of these bars.
We are concerned with problems that can be formulated as linear programming. However, they often lead to a very large scale
problems due to the large number of possible connections between each of the nodes, i.e., for n nodes, we will have n(n-1)/2
potential member bars. Then, the problems become computationally challenging. This imposes further requirements to optimization
method. Recently, a specialized interior point method for such problems is proposed in [1], which employs several novel approaches
such as column generation, warm-start strategy, the exploitation of sparsity structures, and the use iterative methods for the linear
systems originating in the interior point algorithm.
The purpose of this project is to develop a C++ implementation of the efficient interior point method described in [1].
References
[1] Weldeyesus, A. G., Gondzio, J.: A specialized primal-dual interior point method for the plastic truss layout optimization.
Computational Optimization and Applications 71(3), 613?640 (2018)
Page 18/108
Prerequisites
Familiarity with large-scale optimization problems, interior point methods, and linear algebra, and a good programming skill in C++
are requirements for the project.
Page 19/108
Calculation of bounds for the traveling salesman problem
Project supervisor: Sergio GarcÃa Quiles
Project type
standard
Suitable degrees
Project description
The Traveling Salesman Problem (TSP) is one of the most widely studied problems in Operational Research. Given a set of nodes
(cities), we need to find a route that visits each city exactly once and ends at the node we started the route (e.g., the home of a seller).
This problem has many applications not only in logistics but also in other areas such as drilling holes in circuit boards, astronomy,
and DNA sequencing.
The TSP is very difficult to solve because its formulation has an exponential number of constraints. Therefore, an important effort in
research is devoted to the calculation of bounds that help to solve this problem faster, for example when using branch-and-bound.
The goal of this project is to do a literature review of these methods and to choose some of them that will be coded and compared.
Examples (but the project is not restricted to them) are Clarke-Wright savings algorithm, moat packing, and the Lin-Kernighan
heuristic.
Prerequisites
- Integer programming.
- Experience with an optimization solver (e.g., Xpress).
- Programming skills (e.g., C++).
Recommended reading
- "The traveling salesman problem", D.L. Applegate, R.E. Bixby, V. Chvátal, and W.J. Cook, Princeton University Press (2007).
- "Scheduling of vehicles from a central depot to a number of delivery points", G.U. Clarke, and J.W. Wright. Operations Research
12 (4): 568?581 (1964).
"New primal and dual matching heuristics", M. Junger and W. Pulleyblank. Algorithmica 13:357-380 (1995).
- "An effective heuristic algorithm for the traveling-salesman problem", S. Lin and B.W. Kernighan. Operations Research 21 (2):
498?516 (1973).
Page 20/108
Censoring in multi-list data
Project supervisor: Serveh Sharifi Far
Project type
standard
Suitable degrees
Project description
The project will consider the analysis of multi-list data in the presence of censored cell observations. For example, due to the data
collection process some cells may be left censored so that the cell entry corresponds to only an upper limit of the cell entry.
Alternatively cell entries <5 are often censored for data protection issues. The aim of this project is to develop a model-fitting
approach for fitting log linear models to such contingency table data in the presence of censored cells. The effect of censoring the
cells will be investigated for simulated and/or real data.
Prerequisites
Familiarity with statistical modelling in R, and Generalised linear models.
Recommended reading
Bishop, Fienberg and Holland (1975), Discrete Multivariate Analysis: Theory and Practice. M.I.T. Press, Cambridge, MA.
Overstall, A. M., King, R., Bird, S. M., Hutchinson, S. J. and Hay, G. (2014), Incomplete Contingency Tables with Censored Cells
with Application to Estimating the Number of People who Inject Drugs in Scotland. Statistics in Medicine 33 1564-1579.
Page 21/108
Challenging mixed integer nonlinear programming problems for the maintenance planning for
hydropower plants
Project type
standard
Suitable degrees
Project description
The aim of the project is to study a crucial problem in energy management: the optimal maintenance planning for hydroelectric
power generation plants.
This is a challenging optimization problem and it is of great practice importance because the strategic decisions taken during
maintenance planning directly impact the availability of hydropower stations. This availability is a key input for the unit
commitment decisions taken daily by utility companies.
The objective of the project is to propose novel mixed integer nonlinear optimization approaches that take into account both the
standard constraints in maintenance planning for hydropower plants and the nonlinear aspects of the power output function, often
linearized in the literature but with loss of information.
This project will be carried out in collaboration with researchers from CNRS & École Polytechnique (France), and EDF R&D
(France).
Recommended reading
https://ieeexplore.ieee.org/abstract/document/8353798
Page 22/108
Column Generation for Combinatorial Optimization Problems
Project supervisor: Jacek Gondzio
Project type
standard
Suitable degrees
Project description
This project will require to study and implement a column generationformulation of a particular combinatorial optimization
problem.The approach relies on an existing primal-dual column generationalgorithm which has already been used in a number of
applications: - the multicommodity network flow problem (MCNF), - the cutting stock problem (CSP), - the vehicle routing problem
with time windows (VRPTW), and - the capacitated lot sizing problem with setup times (CLSPST).The student may choose their
class of combinatorial optimizationproblem, such as for example, travelling salesman problem (TSP),Hamiltonian cycle problem
(HCP), quadratic assignment problem (QAP),or some other relevant OR problem.Students who took the OR in Telecommunication
course may suggestan interesting application which arises in telecom industry.
Objective: - study the theoretical background of the chosen problem; - study its formulations, especially those eligible for column
generation; - implement the approach within the context of PDCGM, Primal-Dual Column Generation Method based on HOPDM
software; - apply this implementation to solve real-life public domain instances of problems.
Prerequisites
- fluency in the C programming language,

- very good grasp of FuO (mark above 70%) and ODS and CO
Recommended reading
J. Gondzio, P. Gonzalez-Brevis, P. Munari

New Developments in the Primal-Dual Column Generation Technique
European Journal of Operational Research 224 (2013) 41--51.
DOI: http://dx.doi.org/10.1016/j.ejor.2012.07.024
(http://www.maths.ed.ac.uk/~gondzio/reports/pdcgm2011.html)
J. Gondzio, P. Gonzalez-Brevis, P. Munari

Large-Scale Optimization with the Primal-Dual Column Generation Method
http://www.maths.ed.ac.uk/~gondzio/reports/pdcgmDemo.html
{it Mathematical Programming Computation} 8 (2016), pp. 47--82.
DOI: 10.1007/s12532-015-0090-6
Page 23/108
(http://www.maths.ed.ac.uk/~gondzio/reports/pdcgmDemo.html)
Software available: Primal-Dual Column Generation Method (PDCGM):

http://www.maths.ed.ac.uk/~gondzio/software/pdcgm.html
Page 24/108
Comparative study of nonparametric ROC regression techniques
Project supervisor: Vanda Inacio De Carvalho
Project type
standard
Suitable degrees
Project description
The statistical evaluation of diagnostic and screening procedures, such as biomarkers and imaging technologies, is of great
importance in public health and medical research. The receiver operating characteristic (ROC) curve is a popular tool for evaluating
the performance of continuous markers and it is widely used in medical studies. It has been recently recognized that several factors
can affect the marker distribution beyond the disease status; examples of such factors include different test settings and
subject-specific characteristics (e.g., age or gender). For instance, in this project, we are interested in evaluating the influence of age
on the performance of blood glucose levels to accurately diagnose individuals with diabetes. Ignoring the covariate information may
yield biased or oversimplified inferences. Several methods have been proposed to assess covariate effects on the ROC curve. The
so-called "induced methodology" models the distribution of the marker in healthy and diseased populations separately and then
computes the induced ROC curve. Recently, some approaches have been developed for nonparametric estimation of the
covariate-specific ROC curve, within the induced methodology. The aim of this project is to compare the performance of
nonparametric methods based on kernel and splines techniques to estimate the covariate-specific ROC curve. An extensive
simulation study will be conducted and an application to the aforementioned diabetes study will be provided.
Prerequisites
Knowledge of R. Knowledge of basic nonparametric techniques is useful but not mandatory.
Recommended reading
Pepe, M. S. (1997). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series.
Rodriguez-Alvarez, M.X., Tahoces, P.G., Cadarso-Suarez, C., and Lado, M.J. (2011). Comparative study of ROC regression
techniques?Applications for the computer aided diagnostic system in breast cancer detection. Computational Statistics and Data
Analysis, 55, 888-902.
Rodriguez-Alvarez, M.X., Roca-Pardiñas, J., and Cadarso-Suarez, C. (2011). ROC curves and covariates: extending induced
methodology to the non-parametric framework. Statistics and Computing, 21, 483?499.
Page 25/108
Comparing environmental and behavioural factors in determining zoonotic disease risk
Project supervisor: Ioannis Papastathopoulos
Project type
standard
Suitable degrees
Project description
Many emerging or re-emerging diseases have zoonotic origins and some have had severe consequences for public health (e.g.
Ebola). Vietnam has been identified as a ?hotspot' for emerging zoonotic diseases which is thought to be due to the rapid
environmental changes which have occurred in Vietnam in recent years, as well as social and cultural practices facilitating close
contact between humans, livestock and wildlife (Jones et al. 2008). VIZIONS (Vietnam Initiative on Zoonotic InfectiONS) was a
multidisciplinary Vietnam-based project established to increase understanding of the origins, risks and emergence of zoonotic
infections (Rabaa et al. 2015). Data on disease symptoms, demographics, and animal contact behaviour were collected from hospital
patients with enteric disease using questionnaires distributed at six hospitals throughout Vietnam from 2012 to 2016. Clinical
specimens were also collected from patients to test for pathogens associated with zoonotic infections and environmental data were
collated from publically available datasets. This project aims to determine whether local environmental factors (e.g. land use,
livestock density, and climate) or behavioural factors (e.g. animal contact and water use) are better predictors of zoonotic disease
risk in hospital patients in Vietnam.
The student will carry out statistical modelling using the VIZIONS dataset to identify the most important risk factors for testing
positive for a zoonotic pathogen, and determine whether individual human behaviour or local environment better predict risk of
contracting a zoonotic infection.
References:
Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL & Daszak P (2008) Global trends in emerging infectious
diseases. Nature 451(7181): 990?994. doi:10.1038/nature06536
Rabaa, M. A., Tue, N. T., Phuc, T. M., Carrique-Mas, J., Saylors, K., Cotten, M., et al? Baker, S. (2015). The Vietnam Initiative on
Zoonotic Infections (VIZIONS): A Strategic Approach to Studying Emerging Zoonotic Infectious Diseases. EcoHealth, 12(4),
726?735. doi:10.1007/s10393-015-1061-0
Recommended reading
Coker, R. J., Hunter, B. M., Rudge, J. W., Liverani, M., & Hanvoravongchai, P. (2011). Emerging infectious diseases in southeast
Asia: Regional challenges to control. The Lancet, 377(9765), 599?609. https://doi.org/10.1016/S0140-6736(10)62004-1
Page 26/108
Graham et al. 2004. Spatial analysis for epidemiology. Acta Tropica 91, 219-225. https://doi.org/10.1016/j.actatropica.2004.05.001.
Diggle, P.J. 2003. Statistical analysis of spatial point patterns. Second edition. Oxford University Press
Page 27/108
Comparison of various linearizations of quadratic programmes
Project type
standard
Suitable degrees
Project description
In practice quadratic programming problems are sometimes solved by creating an equivalent linear model. There are a number of
different approaches to linearizing (mixed-integer) quadratic programming instances. They vary in the number of additional
variables and constraints, and in tightness. Additionally, most linearization induce some sort of structure in the constraints that can
be exploited by the linear solver. The goal of this dissertation project is to study and implement a few different linearization
approaches and compare them via numerical experiments. The comparison will be performed using problems from QPLIB, a library
of quadratic programming instances of various types (convex/nonconvex, continuous/binary/mixed-integer, sparse/dense) and sizes.
Prerequisites
Fluent programming skills, preferably C/C++
Recommended reading
Sherali, Smith 2007: An improved linearization strategy for zero-one quadratic programming problems
Page 28/108
Computational study of the exactness of the semidefinite relaxation for large-scale optimal power flow
benchmarks
Project type
standard
Suitable degrees
Project description
This project is concerned with analysis of semidefinite optimization relaxations of Optimal Power Flow (OPF).
The OPF problem is an important nonlinear nonconvex optimization problem which minimizes the electricity generation costs while
taking into account the constraints of the high voltage transmission network represented by the nonconvex power flow
equations. Recently semidefinite relaxations of OPF have been introduced that produce very good lower bounds. In some cases the
relaxation is even exact, and a global solution may be computed from the optimal solution of the relaxation. Moreover,
decomposition schemes using graph theory applied to the graph of the physical electric grid allow to decompose the large
semidefinite constraints into smaller ones; as a consequence, relaxations of OPF may be solved even for large-scale realistic
instances with more than 10,000 variables.
Many benchmarks have been published in repositories such as Matpower, Nesta, PGlib. Their OPF formulations may have
different characteristics. Specifically, the objective function (linear or quadratic) or the constraints (different ways of representing
maximum power on the lines) are not always the same.
The objective of this project is to carry out a thorough computational analysis of the exactness of the semidefinite relaxation in terms
of its dependence on the characteristics (objective function, flow constraints) of the formulation used. The project begins with
collecting all the available benchmarks and solving the semidefinite relaxation of each of them to determine the status (exact or not)
of the relaxation. In a second step, different variations of the characteristics will be considered for each benchmark instances, and for
each of them the exactness of the relaxation will again be determined. The set of results will then be analyzed using appropriate
statistical tools to quantify as precisely as possible the dependence of the exactness of the relaxation on the characteristics of the
formulation solved.
This project will be carried out in collaboration with RTE (France).

Prerequisites
Page 29/108
- Fluency in Matlab, knowledge of linear algebra
- Very good grasp of FuO (mark above 70%) and ODS
Recommended reading
Christian Bingane, Miguel F. Anjos, Sébastien Le Digabel. Tight-and-cheap conic relaxation for the optimal reactive power dispatch
problem. To appear in IEEE Transactions on Power Systems. https://www.gerad.ca/en/papers/G-2018-76/view
Anjos, Miguel F., and Jean B. Lasserre. "Introduction to semidefinite, conic and polynomial optimization." Handbook on
semidefinite, conic and polynomial optimization. Springer US, 2012. 1-22.
https://scholar.google.ca/scholar?oi=bibs&cluster=7157849937737684992&btnI=1&hl=en
Page 30/108
Considering load uncertainty in truss topology optimization with integer variables
Project supervisor: Alemseged Weldeyesus
Project type
standard
Suitable degrees

Project description
Trusses are engineering structures that consist of straight members or bars connected at joints. The mathematical models for
optimization of truss structures are usually formulated using the so-called ground structure approach, where a set of nodes are
distributed in the design domain and the interconnecting bars are generated.
In this project, we will fuss when the design variables are the discrete finite set of cross-sectional areas of these bars. Moreover,
due to the practical uncertainty in the applied loads, we will choose the loads from ellipsoids to promote robustness of the optimal
designs.
The goal of the project is to propose a (mixed-integer) semidefinite programming problem formulation for robust truss topology
optimization based on [1] and [2].
[1] Kocvara, M.: Truss topology design with integer variables made easy, Opt. Online (2010)
[2] Ben-Tal, A., Nemirovski, A.: Robust truss topology optimization via semidefinite programming. SIAM J. Optim. 7, 991?1016
(1997)
Prerequisites
Familiarity with semidefinite programming and skill in Matlab are requirements for the project.
Page 31/108
Dynamic scenario selection for stochastic programming
Project supervisor: Andreas Grothey
Project type
standard
Suitable degrees
Project description
The second assignment for the course "Optimization Methods in Finance" was using the following setup: You seek to divide your
money into 8 different investment opportunities in order to maxmimise expected return (actually minimize expected shortfall
relative to a return target). Future return of the investments are unknown but are assumed to be described (exactly) by 100000
(equally likely) future scenarios. Your aim is to find the investment that minimizes expected shortfall subject to these 100000
scenarios.
Unfortunately solving this problem is very expensive, so you seek a cheaper way of solving the problem approximately. In OMF
assignment 2 various avenues were explored, for example only using 10 well chosen scenarios as an approximation (these can be
found by clustering) or using stochastic gradient methods.
The aim of this project is to follow up with more sophisticated approaches:
- After solving the problem with 10 scenarios one could try to identify which other scenarios should be added (or how the 10
scenarios should be modifed) in order to improve the performance of the investment.
- One could try to estimate the difference between the predicted performance of the investment (evaluated over the 10 chosen
scenarios) and the real performance (measure by evaluating expected shortfall over all 100000 scenarios) - i.e. the model error and
use this estimate as a correction to the 10-scenario model.
Prerequisites
While this project follows on naturally from Optimization Methods in Finance, it is not necessary that students have attended this
course, not is any previous knowledge of finance required. Indeed the same problem can be posed in any optimization problem that
includes uncertainty.
Page 32/108
Eat and grow old â€“ interplay of ingestion and ageing in immune cells
Project supervisor: Linus Schumacher
Project type
standard
Suitable degrees
Project description
Mathematical models in biology can be used to predict the macroscale behaviour of cell populations from the microscale dynamics
of individual cells. This project will use stochastic descriptions at the cellular scale to model experimental data from liver immune
cells, in a collaboration between the mathematical modelling group of Dr Schumacher and Dr Campana from the experimental group
of Prof. Forbes at the Centre of Regenerative Medicine in Edinburgh.
Immune cells play an important part in resolving tissue injury. Macrophages are a type of immune cell that can "eat up" debris from
dead cells, through a process known as phagocytosis. This is a necessary step in tissue regeneration, and failure in phagocytosis has
been implicated autoimmune and inflammatory disorders. A current challenge is to understand how phagocytosis is impaired by, and
itself causes, ageing-like stress, known as cellular senescence.
Recent experiments with liver macrophages [1] provide accessible data to explore some basic hypotheses of how phagocytosis and
senescence (eating and ageing) may be interlinked. To do this, we will use simple mathematical frameworks (broadly akin to [2]) to
test which cellular behaviours can explain the measured cell population data for healthy and age-stressed liver macrophages. A
rough outline of steps in this project is as follows:
? Construct stochastic models of macrophage phagocytosis, the dependence of phagocytosis on senescence, and senescence as a
function debris ingested
? Analyse models for their statistical differences in population behaviour
? Fit models against experimental data of phagocytosis over time
? Perform statistical model comparison based on fit to data and model complexity
? Outstanding students could (a) perform model comparison in Bayesian framework and (b) treat the cell behaviour as a sample
space reducing stochastic process, and with phagocytosis as a function of senesce to state-dependent driving rate
Questions about the project should be directed to Dr Schumacher.
Page 33/108
Prerequisites
- knowledge of stochastic processes

- interest in biological applications
Recommended reading
1.Campana L, Starkey Lewis PJ, Pellicoro A, et al. The STAT3?IL-10?IL-6 Pathway Is a Novel Regulator of Macrophage
Efferocytosis and Phenotypic Conversion in Sterile Liver Injury. J. Immunol. 2018;200(3):1169?1187. Available at:
http://www.jimmunol.org/lookup/doi/10.4049/jimmunol.1701247
2.Karin O, Agrawal A, Porat Z, Krizhanovsky V, Alon U. Senescent cells and the dynamics of aging. bioRxiv. 2018. Available at:
http://dx.doi.org/10.1101/470500.
Page 34/108
Effect of zero observations on modelling categorical data
Project supervisor: Serveh Sharifi Far
Project type
standard
Suitable degrees
Project description
The project will focus on the analysis of categorical data in presence of sampling zero observations. Fitting log-linear and logistic
models to count data in contingency tables are considered. The zero cell entries in tables can cause parameter estimates with big
standard errors. The main aim of the project is to address and compare the effect of sampling zero observations on the two models,
using simulated and/or real data, and to develop models to overcome the problem. The project can be extended to assessing the
model's identifiability and also the Bayesian method of estimating the parameters.
Prerequisites
Familiarity with statistical modelling in R, and Generalised linear models.
Recommended reading
Agresti, A. (2002). Categorical data analysis, Second Edition. John Wiley and Sons publication.
Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. The Annals of Statistics. 40(2),
996--1023.
Brown, M. B., Fuchs, C. (1983). On Maximum likelihood estimation in sparse contingency table. Computational Statistics and Data
Analysis, 1, 3?15.
Page 35/108
Efficiency of Approximate Bayesian Computation
Project type
standard
Suitable degrees
Project description
Approximate Bayesian Computation (ABC) algorithm is very popular in practice when the likelihood is intractable. Different
versions of it (for instance, the original rejection algorithm and its smoother local linear regression version) will be compared to
study their efficiency for different types of models, for instance arising in genetics. Both statistical and computational efficiency of
the methods will be studied.
Prerequisites
Bayesian Data Analysis
Recommended reading
1. Mark A. Beaumont (2010) Approximate Bayesian Computation in Evolution and Ecology

Annual Review of Ecology, Evolution, and Systematics , Vol. 41:379-406,
https://doi.org/10.1146/annurev-ecolsys-102209-144621
2. Approximate Bayesian Computation in Population Genetics

Mark A. Beaumont, Wenyang Zhang and David J. Balding
GENETICS December 1, 2002 vol. 162 no. 4 2025-2035
http://www.genetics.org/content/162/4/2025
Page 36/108
Electricity security of supply risk assessment including climate considerations
Project type
standard
Suitable degrees
Project description
the wind or solar resource, rather than by mechanical availability). Within such risk assessments it is necessary to build a joint
statistical model of demand and available renewable capacity (a statistical association between these is naturally expected, as in most
power systems temperature has an influence on demand through heating or air conditioning load, and available renewable capacity is
clearly driven by the weather).
This project will investigate whether the statistical modelling can be improved by bringing in climate statistics as explanatory
variables for the wind and demand models. Candidates include the "North Atlantic Oscillation" and wind direction statistics. The
project will further consider how this can be used to assess uncertainty in estimates of headline risk indices for the Great Britain
system.
Prerequisites

Recommended reading
mating-LoLE.pdf
http://centaur.reading.ac.uk/33770/1/1-s2.0-S0301421513005223-main.pdf
https://ieeexplore.ieee.org/document/8440216
Page 37/108
Energy Systems Planning and Operation
Project supervisor: Ken McKinnon
Project type
standard
Suitable degrees
Project description
This project will build on the planning model presented in Part A of the OR in the Energy Industry course. The project will tie in
with some of the work we are doing in UK National Centre for Energy Systems Integration (CESI).
Several separate projects can run in this area. There are several different possible topics that can be included in the project and these
can be combined together in different ways depending on your interests and degree programme. If the area interests you, contact me
to discuss details.
Topics:
1. Build a stochastic planning model. Investigate how to find scenarios that behave badly under the current plans and find how to
incorporate new scenarios incrementally so as to adapt the plans to cover these. Build an evaluator to test how plans perform on a
range of possible future scenarios. This can be done either by building your own stochastic planning model based on your OREI
model, or by extending our existing stochastic planning model, which is written in julia/JuMP.
2. When there is storage in the system it is no longer possible to plan future system using load duration curves. To assess how well
particular investments behave we can run the resulting operational model over realistic time sequences of demand and renewable
resources. The sequences need to cover several years to capture the range of weather conditions -- (e.g. there are persistently hot and
persistently cold years). It takes too long to solve the operational model over enough year to capture all these variations, so there is
need to understand these variations and select a set of "slices" which, when used in the operational model, result in the same
investment decisions as would occur if the entire multi-year operational model had been used. There is no generally accepted
methodology for selecting the slices, so this project will review the methods that have have suggested in the literature and the then
propose and develop and test a promising strategy.
3. Develop a model for district heating and incorporate this in the whole system model. Assess the economics of heat storage over
different time periods from night-day to summer-winter.
4. Long term storage of energy on an annual time scale, e.g. storing heat from summer to winter can be incorporated in the
operational model by including entire years. However as mentioned in 2, including so much data leads to very slow solution times.
Page 38/108
This project will develop methods of a combining model of annual storage using long time steps that capture the slow dynamics of
the bulk storage, with slices from different times of year that operate with short time steps and capture the fast dynamics of load and
renewable generaton.
5. There are usually economies of scale in developing and deploying new renewable technologies. This results in non-convex
planning problems and the possibility that there will be multiple optimal solutions, in which different technologies have been
developed. This project will develop and multi-period planning model incorporating commissioning and decommissioning of units
over and accounting for economies of scale. Methods will be investigated for finding global andlocal optima of the problem.
6 Some locations are better on average for wind, solar and wave power generation than others. However good locations are often in
places remote from the major demand centres. Also renewable generation of the same type and a different pattern at different
locations and this diversity means that the total generation over well spaced locations is is less variable that at individual locations.
Delivering the power to demand centres from possibly remote locations may need network reinforcement. This project will
investigate what is the optimal placement for new wind, solar and wave power generation, taking into account necessary network
reinforcements.
7 CO2 is produced in the manufacture of equipment used in the energy system, such as generators and storage, and this CO2
production should be taken into account in a whole system model. Gather information on this "embedded carbon" for the main
components of the energy system and incorporate this into the system planning model. Analyse how different the conclusion are
from investment models that do and don't include embedded carbon.
Prerequisites
Knowledge equivalent to have take the OREI course.
Recommended reading
Material covered in Part A of the OREI course provides the background. See: OREI Web site
http://www.maths.ed.ac.uk/mckinnon/OREI-PartA/
Page 39/108
European Power Market Modelling in Aurora
Project supervisor: Dan Eager
Suitable degrees
Project description
Wood Mackenzie (WM), a Verisk business, is a trusted source of commercial intelligence for the world's natural resources sector.
With detailed coverage of power market fundamentals, gas, coal, solar, wind, energy storage, and grid edge technologies, we make it
easier to understand the rapidly evolving power landscape.
Our Power & Renewables (P&R) Research department are in the process of migrating their European power modelling capability to
a dispatch optimization modelling suite provided by Aurora.
Aurora is price forecasting and analysis software based on the fundamentals of the competitive electric market.[1] It applies
economic principles and dispatch simulation to model the relationships of supply, transmission, and demand for electric energy to
forecast market prices.
The operation of existing and future resources are based on forecasts of key fundamental drivers such as demand, fuel prices, hydro
conditions, and operating characteristics of new resources.
The Aurora software has been applied already by WM P&R to model power markets in the US and the intention is to draw on the
wealth of knowledge & expertise gained to migrate the existing Europe Power Model from the existing monthly time-step AIMMS
model to the 8760 hourly Aurora option.
The purpose of this MSc dissertation project will be to explore/review one or more of the following functionalities in Aurora with a
case study application to one of the key European power markets to calibrate model results[2]:
Long-term power planning with possible extension to include portfolio optimization.
Energy storage - price and dispatch impacts of [battery and pumped storage] and [solar/wind + storage] units.
Hydro modelling capabilities
The project will require consolidation of existing data and, where necessary, the research of data to enrich the existing power plant
database and other complimentary modelling.
Page 40/108
Depending on the 1-3 option chosen, the calibration exercise will include a combination of comparing model results against 1)
out-turn market data (e.g., storage plant operation); 2) WM's existing European Power Model; and 3) if appropriate/possible a
comparison of results with a similar OR department dispatch model.
[1] See: epis.com/aurora/
[2] Options include France, Germany, GB, Integrated Single Electricity Market is a new wholesale market for Ireland and Northern
Ireland, Italy, Portugal, Spain.
Prerequisites
OR in the Energy Industry
Recommended reading
TBC
Page 41/108
Exploring the efficient frontier for genetic management and improvement of populations
Project type
standard
Suitable degrees
Project description
The classical Markowitz Portfolio model is most familiar in financial portfolio analysis, where it balances the maximization of
expected return (a linear function of asset distribution) against the minimization of risk (a convex quadratic function of asset
distribution) by using a parameter lambda to weight the latter in the objective. However, the mathematically equivalent problem is
obtained in genetic management and improvement of populations when balancing the maximization of response to selection against
the desire to maintain genetic diversity. As the parameter lambda varies, the optimal values of the desired response to selection and
genetic diversity trace out an "efficient frontier", the study of which is of value to population managers (breeders). With the
exception of the case when lambda is zero, so the problem reduces to a LP, each point on the efficient frontier corresponds to the
solution of a quadratic program. However, the efficient frontier can be traced by solving the KKT conditions parametrically, starting
from the solution of the LP when lambda is zero. This project will study the algorithm required and implement it in C++ or C#,
using HiGHS to solve the initial LP problem. The performance of the algorithm will be studied in the context of pedigree and
genomic data in collaboration with Gregor Gorjanc of The Roslin Institute.
Prerequisites
Fluent programming skills in C++ or C#
Recommended reading
Markowitz, Harry M., and G. Peter Todd. Mean-variance analysis in portfolio choice and capital markets. Vol. 66. John Wiley &
Sons, 2000.
Page 42/108
Exploring the efficient frontier in portfolio analysis
Project type
standard
Suitable degrees
Project description
The classical Markowitz Portfolio model balances the maximization of expected return (a linear function of asset distribution)
against minimization of risk (a convex quadratic function of asset distribution) by using a parameter lambda to weight the latter in
the objective. As the parameter varies, the optimal values of return and risk trace out an "efficient frontier", the study of which is of
value to financial analysts. With the exception of the case when lambda is zero, so the problem reduces to an LP, each point on the
efficient frontier corresponds to the solution of a quadratic programming problem. However, the efficient frontier can be traced by
solving the KKT conditions parametrically, starting from the solution of the LP when lambda is zero. This project will study the
algorithm required and implement it in C++ or C#, using HiGHS to solve the initial LP problem. The performance of the algorithm
will be studied in the context of either portfolio analysis for a local financial services company or (see related project OR166) in the
context of genomic data.
Prerequisites
Recommended reading
Markowitz, Harry M., and G. Peter Todd. Mean-variance analysis in portfolio choice and capital markets. Vol. 66. John Wiley &
Sons, 2000.
Page 43/108
Exponential family models on affine spaces
Project supervisor: Ioannis Papastathopoulos
Project type
standard
Suitable degrees
Project description
A family of probability densities on a finite dimensional affine space is standard exponential (EF) if the log-densities are affine
functions. When the likelihood of an exponential family cannot be calculated exactly, it can sometimes be calculated by Monte
Carlo using the Metropolis algorithm or the Gibbs sampler. The Monte Carlo log-likelihood, i.e., the log-likelihood in the
exponential family generated by the Monte Carlo empirical distribution, then converges almost surely over sample paths to the true
log-likelihood. During this project, EF models will be for used for modelling the event times of extremes in spatial environmental
applications. Particular emphasis will be placed on Gibbs sampling and in particular, in the simulation of a given class of
conditionally specified random fields.
Prerequisites
This project requires strong programming skills.
Recommended reading
- Davison, Padoan and Ribatet (2012). Statistical modeling of spatial extremes. Statistical Science.
- Charles J. Geyer, (1990). Likelihood and exponential models. PhD Thesis. University of Washington
- Whittle, P. J. (1954). On stationary processes on the plane. Biometrika
Page 44/108
Extended log-linear models for multi-list data
Project supervisor: Ruth King
Project type
standard
Suitable degrees
Project description
This project will consider multi-list data, where individuals may be recorded by multiple sources. Log-linear models are typically
applied to such data, which in turn permits an estimation of the number of unobserved individuals. The case will be considered
where individuals are not simply recorded as being observed by a given source, but also the time that they are observed. The data
can be summarised as the number of individuals observed by each source in a particular order. New, extended, log-linear models
will be developed within the project and fitted to data (either simulated or real data).
Prerequisites
Generalised regression models; Statistical Programming.
Recommended reading
Fienberg (1972) The multiple recapture census for closed populations and incomplete 2^k contingency tables. Biometrika 59,
591-603
Page 45/108
Forecasting time-series as a problem within learning of linear dynamical systems
Project supervisor: Jakub Marecek
Suitable degrees
Project description
Consider the problem of predicting the evolution of stock prices over time. This is, in general, impossible to do perfectly. One could
imagine, however, that the prices are a reflection of some underlying state of the market that evolves in a fashion that is not directly
observable. Clearly, there is also noise entering both the evolution of the hidden state and in the observations. If the relationships
between the hidden state and the observations are linear, and the noise is additive, this model is called a linear dynamical system
comprising a ?state equation' and an ?observation equation'. While the state equation provides predictions of the hidden state at the
next time-step, the observation equation relates the hidden state with the observation at the same time-step. The task, where one is to
combine the estimate of the hidden state with observations made at the time of prediction using this model, is known as ?filtering'.
For more than half a century, the dominant approach in filtering is based on the work of Rudolf Emil Kálmán. There, one guesses
what the system could be, and using observed measurements, possibly corrupted by noise, gauges one's confidence in the guess, and
produces estimates of its outputs based on a combination of the guess and recent observations. Despite Kalman filtering being a
staple of undergraduate courses across Engineering and Statistics, there has been little or no understanding as to how to perform or
circumvent the initial guess, such that one can guarantee the performance of the system relative to the use of the best possible guess.
Working with Mark Kozdoba and Shie Mannor at the Technion, the Israel Institute of Technology, the supervisor has recently asked
what conditions make it possible to model the predictions of Kalman filtering as a regression of a few past observations. We have
shown that when one sees the Kalman filter as an infinite sum of autoregressive terms, the dependence on the terms from the past is
decaying exponentially, whenever the linear dynamical system is observable and process noise is non-degenerate. Therefore,
Kalman filter can be approximated by regression on a few recent observations. This makes it possible to circumvent the initial guess
in Kalman filtering.
In particular, in a paper presented at AAAI 2019 (https://arxiv.org/pdf/1809.05870.pdf),we introduced an on-line algorithm for
estimating the output of a linear dynamical system considering only a few most recent observations and the respective autoregressive
terms within an on-line convex optimisation framework. The algorithm is imminently practical: its per-update run-time is linear in
the number of observations used (the regression depth). The above-mentioned decay results make it possible to prove the first-ever
regret bounds relative to Kalman filters, that is, relative to the use of Kalman filtering with the best initial guess in hindsight. In
computational experiments on an instance proposed by Hazan et al. (NIPS 2017), as well as stock market price evolution considered
by other authors, our present work reduces the error considerably.
Working with the code we have open-sourced:
https://github.com/jmarecek/OnlineLDS
Page 46/108
the project aims to test the approach computationally and, depending on the skill and interests of the student, extend to robustness to
heavy-tailed noise.
Prerequisites
An ability to work in Python. Interest in time-series forecasting or online optimisation is an advantage.
Recommended reading
On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters

Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor
AAAI 2019, https://arxiv.org/abs/1809.05870
Page 47/108
Identifying the Author of Literary Texts and Social Media Posts
Project supervisor: Gordon Ross
Project type
standard
Suitable degrees

Project description
Statistical methods can be used to quantify linguistic style for the purpose of authorship attribution. The assumption is that authors'
have a distinctive writing style which is stable over time, and which can be identified in all their writings. As such, a statistical
model that has been learned for the linguistic style of a particular author can be used to answer questions such as whether texts with
unknown authorship were written by the author in question. Applications include the analysis of literary texts with disputed
authorship (e.g. several Shakespeare plays) and internet security, where compromised social media accounts can be flagged as
having new authors. This project will explore various statistical approaches for this task
Prerequisites
Comfortable with computer programming and data analysis
Page 48/108
Improved algorithms for the Bin Packing Problem
Project supervisor: Maxence Delorme
Project type
standard
Suitable degrees
Project description
Given a set of weighted items and an unlimited number of identical capacitated bins, the Bin Packing Problem (BPP) consists in
packing all the items into the minimum number of bins. The BPP has been studied for 80 years now and we count thousands of
published papers dealing with the subject. The BPP has various real world application such as cutting problems in industry (e.g.,
wood, paper, aluminium) and loading problems (e.g., containers in harbours, file storage on computer).
One particular challenge for the researchers working on the BPP is to find good Integer Linear Programming (ILP) models. In the
last decades, no less than seven different ILP formulations have been proposed for the problem: in chronological order, the textbook
model, the set covering formulation, the one-cut, the arc-flow [1], the DP-flow, the general arc-flow with graph compression, and
reflect [2]. A survey on the BPP and most of these models is given in [3].
Among these models, the arc-flow formulation was shown to be a very good compromised between simplicity (the model can be
coded in few days) and performance. In the arc-flow, a bin is represented by a path that goes from node 0 to node c (the capacity of
the bin), in a graph where arcs are items. For example, to pack the items of size (4, 4, 3, 3, 2, 2) in bins of capacity 9, the optimal
solution of the arc-flow model would be two identical paths (0,4), (4,7), (7,9), that correspond to two identical bins containing one
item of each size.
It is a known fact that the way to build the graph has a major impact on the performances of the arc-flow model: as a variable in the
model is associated with each arc, the number of arcs should be as small as possible. However, it must always be possible to build
the path that corresponds to any feasible bin to ensure the correctness of the model.
In the literature, only one algorithm to build the graph was studied, and is based on ordering the items by non-increasing weight.
However, it is easy to find counter examples in which other strategies (e.g., put all the items with even size first) lead to graph of
smaller size.
In this project, we will build algorithms that determine the item ordering that leads to the graph of smallest size and we will measure
the performance of various ordering strategies (non-increasing, non-decreasing, even first ?).
The goals of the project are:
Page 49/108
Review the literature on the arc-flow formulation applied to packing problem
Build a brute force algorithm that tries every possible item ordering, and test it on small size instances
Study statistically the results of the algorithm and outline the best ordering strategies
Build an ILP model that determines the best item ordering and test it on medium size instances
Design some BPP instances that favour some specific strategies
Extend this work to the arc-flow model applied to the BPP with item fragmentation
Prerequisites
- Good modelling skills

- Good programming skills
Recommended reading
[1] Valério de Carvalho, J.M. (2002). LP models for bin packing and cutting stock problems, European Journal of Operational
Research, 141(2), 253?273.
[2] Delorme M., Iori M. (2019). Enhanced pseudo-polynomial formulations for bin packing and cutting stock problems, INFORMS
Journal on Computing, forthcoming.
[3] Delorme M., Iori M., Martello S. (2016). Bin packing and cutting stock problems: mathematical models and exact algorithms,
European Journal of Operational Research, 255(1):1?20.
Page 50/108
Inference for dynamical systems models of nerve regeneration
Project supervisor: Linus Schumacher
Project type
standard
Suitable degrees
Project description
Mathematical models can help to resolve complex interactions of multiple cell types in tissue regeneration. This project involves
modelling spinal cord repair in zebrafish using systems of non-linear differential equations representing cells and molecules at the
wound site. A successful project will contribute to ongoing research in a collaboration between the mathematical modelling group of
Dr Schumacher at the Centre of Regenerative Medicine in Edinburgh, and the Becker lab at the Centre for Inflammation Research.
Zebrafish have the ability to regenerate their spinal cord after injury, and immune cells of various kinds play important roles in
regulating inflammation and repair immediately after injury. However, the proposed interactions remain imprecisely defined, and
have not been quantitatively tested. This project will use recently published data [1] to construct, parameterise, and evaluate
quantiative models of cell interactions, and use these to predict outcomes of future experiments. A rough project outline is as
follows:
? Construct coupled ordinary differential equations to represent abundances of cells, interactions through secreted chemicals, and
their interactions
? Solve models numerically and fit to time-course data from normal and mutant animals
? Quantify model and parameter uncertainty using Bayesian inference approaches (see reference [2])
Questions about the project should be directed to Dr Schumacher.
Prerequisites
- differential equation modelling

- Bayesian statistics
- interest in Biological applications
Recommended reading
1. Tsarouchas, T. M., Wehner, D., Cavone, L., Munir, T., Keatinge, M., Lambertus, M., ? Becker, C. G. (2018). Dynamic control of
proinflammatory cytokines Il-1? and Tnf-? by macrophages in zebrafish spinal cord regeneration. Nature Communications, 9(1),
4670. https://doi.org/10.1038/s41467-018-07036-w
Page 51/108
2.Toni, T., Welch, D., Strelkowa, N., Ipsen, A., & Stumpf, M. P. H. (2009). Approximate Bayesian computation scheme for
parameter inference and model selection in dynamical systems. Journal of The Royal Society Interface, 6(31), 187?202.
https://doi.org/10.1098/rsif.2008.0172
Page 52/108
Integer programming formulations for edge deletion problems in graphs with applications to bovine
epidemiology
Project type
standard
Suitable degrees
Project description
Livestock trading can be modelled through a graph where the nodes are farms with animals and the edges represent movement of the
animals. These nodes are in danger of becoming infected. Once a node is infected we assume that over time all the nodes in the
graph that can be reach from this infected node will also become infected. In the control and prevention of livestock disease it is
important the effect of deleting some edges of this graph, which in practice means forbidding some trade patterns or having extra
vaccination on some routes. In practice, deleting edges may create protected isolated subgraphs - the disease cannot jump from one
subgraph to another if there are no edges between them. The goal is to delete some edges so that the largest isolated subgraph is as
small as possible (it has as few nodes as possible)
So far there have been computer science approaches to this problem based on tree decompositions, but there have not been formal
mathematical programming formulations. The objective of this project will be to provide one or more possible formulations based
on integer programming and study how to solve them. As an extension, we will study temporal graphs, which happen when the
edges exist only during some time periods of the time horizon.
Prerequisites
Recommended reading
- "Deleting edges to restrict the size of an epidemic", Jessica Enright and Kitty Meeks. Research paper on Arxiv:
https://arxiv.org/abs/1504.05773
Page 53/108
Interior Point Methods for LP and QP
Project type
standard
Suitable degrees
Project description
This project will require to study and implementan interior point method for a particular class of linearor quadratic programming
problem. It is a generic project.A student with a genuine interest in it is encouraged to contact Prof Jacek Gondzio to discuss
possible directions of research.
- study the theoretical background of the method; - implement the interior point method for a particular application.
Prerequisites
- fluency in Matlab, knowledge of linear algebra,

- fluency in the C programming language would be a plus,
- very good grasp of FuO (mark above 70%) and ODS.
Recommended reading
J. Gondzio, Interior Point Methods 25 Years Later,

European Journal of Operational Research, 218 (2012), pp. 587--601.
http://www.maths.ed.ac.uk/~gondzio/reports/ipmXXV.html
Page 54/108
Intra-day Electricity Market Design and Optimization
Project type
standard
Suitable degrees
Project description
Intra-Day Market is an intermediate electricity market between the Day-Ahead Market and the Reserve Capacity Market that aims to
reduce the likelihood of imbalances (fluctuations in generation and consumption of electricity) by creating additional trading
opportunities for the electricity market participants. Intraday markets help to reduce imbalances arising from utility breakdowns,
fluctuations of power generation from renewables, and etc. In Europe, there are a number of Intra-day market designs used by
different countries. These designs mainly differ from each other by their focus on speed and liquidity. In this project, we aim to
provide mathematical models to formulate the different approaches to intra-day market modelling, and provide effective solution
methods to such models.
Page 55/108
Joint reliability assessment of the Great Britain and Ireland electricity systems
Project type
standard
Suitable degrees
Project description
the wind or solar resource, rather than by mechanical availability).
Within such risk assessments it is necessary to build a joint statistical model of demand and available renewable capacity (a
statistical association between these is naturally expected, as in most power systems temperature has an influence on demand
through heating or air conditioning load, and available renewable capacity is clearly driven by the weather).
This project will take methods developed for modelling wind and demand in the Great Britain system, and apply them to joint
reliability assessment of the GB and Ireland systems. Key applied questions will include how the reliability of each system depends
on the size of interconnection between them and the protocol for sharing resource between systems in the event of a shortfall.
Methodological questions will include the approach for joint modelling of wind generation and demand in the two systems
(including at extremes of the distribution of net demand), assessing the uncertainty in statistical estimates of reliability levels, and
efficient computation for calculations in the two area system.
Prerequisites

Recommended reading
Page 56/108
mating-LoLE.pdf
Page 57/108
Laplace-type distributions for statistical modelling
Project supervisor: Bruce J Worton
Project type
standard
Suitable degrees
Project description
This project involves studying the Laplace distribution, and its numerous generalizations and extensions, for statistical modelling.
There will be the opportunity to investigate the theory as well as apply the methods to data sets from areas such as Communications,
Economics, Engineering, and Finance.
Prerequisites
Generalised Regression Models (MATH11187)

Statistical Programming (MATH11176)
Bayesian Theory (MATH11177)
Statistical Research Skills (MATH11188)
Recommended reading
Barnard, G.A. (1989). Sophisticated theory and practice in quality improvement. Philos. Trans. R. Soc. Lond. A 327, 581-589.
Barnard, G.A. (1995). Pivotal models and the fiducial argument. International Statistical Review 63, 309-323.
Kotz, S., Kozubowski, T.J. and Podgorski, K. (2001). The Laplace Distribution and Generalizations. New York: Wiley.
Wallis, K.F. (2014). The two-piece normal, binormal, or double Gaussian distribution: its origin and rediscoveries. Statistical
Science 29, 106-112.
Page 58/108
Learning and Forgetting Effects in Worker Cross-Training
Project type
standard
Suitable degrees
Project description
Cross-training workers to work on several tasks has proven to be one of the most efficient methods to hedge against the variabilities
in demand and increase utilization of the available resources. However, inclusion of cross-training significantly complicates the
capacity planning for organizations. One needs to decide not only the number of workers to be employed, but also decide which
individuals will be capable of doing which set of tasks. Yet another complication is the learning and forgetting effects due to being
cross-trained on a set of tasks. In psychology terms, the workers may be subject to retroactive and proactive inhibition effects which
will change their productivity on different tasks.
Our goal in this project will be to develop optimization methods to minimize the cost of capacity. We will utilize integer and
stochastic programming methods and analyze how the learning effects influences the capacity planning.
Page 59/108
Least Cost Generation Planning in Developing Countries
Project supervisor: Alice Waltham
Suitable degrees
Project description
EMRC has developed a suite of models that support electricity utilities in developing countries with constrained resources to
optimise their investment. These utilities typically suffer from insufficient generation and network capacity, resulting in frequent
outages and low levels of customer satisfaction, leading to low willingness to pay, high aggregate technical commercial and
collection (ATC&C) losses and a dire financial situation for the utility. To support utilities in breaking this cycle, EMRC are
developing a practical suite of models that will allow utilities to optimise the use of their limited capital to improve their financial
position.
The models already allow:
Energy demand study using time series econometrics to model the underlying demand that is supressed by inadequate
generation/network constraints.
Optimised Modelling of Network Investments (OMNI) including the optimisation of the decision to invest capital expenditure in
loss reduction and/or network upgrades to remove constraints. The methodology recognises revenue/kWh for each feeder is different
depending on the mix of customers and ATC&C losses. The revenue/kWh by feeder provides a value for reducing technical losses
and a Value of Unmet Demand.
Optimised approach to electricity supply in currently unserved areas ? supporting the decision of whether to invest in network
expansion or off-grid supply (mini-grids or solar homes).
Modelling the financial performance of the utility given the selected investments.
This assignment will develop a least cost generation investment planning tool, to allow utilities to also optimise the procurement or
development of new electricity generation over a 5 ? 10 year planning horizon. The tool will be integrated with the other models to
provide an overall investment planning approach for utilities in developing countries.
The tool will need to consider different technologies (solar, hydro, diesel, LPG and possibly wind) and the great uncertainty attached
to the output of grid based generators. The possibility of intra-day trading to balance out load with existing generators will also need
to be considered.
The project will also improve the interface of the energy demand models to allow users to model supressed demand and derive load
duration curves more easily.
Page 60/108
The student will demonstrate the least cost generation plan operating in selected scenarios, based on confidential data EMRC will
provide for a specific developing country. An enthusiastic student may be able to demonstrate the full application of the integrated
model for selected scenarios.
Prerequisites
Advanced modelling skills in Excel are essential. Python and R also desirable for the econometrics element.
Page 61/108
Matrix-Free Interior Point Method
Project type
standard
Suitable degrees
Project description
This project will require to study and implementan interior point method for a particular class of linear(and separable quadratic)
programming problems in which the numberof variables significantly exceeds the number of constraints.The approach relies on
conjugate gradient algorithm anda special preconditioner which exploits particular featuresof the interior point method.
Objective - study the theoretical background of the method; - implement the interior point method with only an implicit access to the
constraint matrix; - apply this implementation to solve several classes of real-life problems originating from network optimization
and/or signal processing.
Prerequisites
- fluency in Matlab, knowledge of linear algebra,

- fluency in the C programming language would be a plus,
- very good grasp of FuO (mark above 70%) and ODS.
Recommended reading
J. Gondzio, Matrix-Free Interior Point Method,

Computational Optimization and Applications, 51 (2012), pp. 457--480.
http://www.maths.ed.ac.uk/~gondzio/reports/mtxFree.html
J. Gondzio, Interior Point Methods 25 Years Later,

European Journal of Operational Research, 218 (2012), pp. 587--601.
http://www.maths.ed.ac.uk/~gondzio/reports/ipmXXV.html
Page 62/108
MIP/MINLP techniques for sparse regression/subset selection
Project supervisor: Lars Schewe
Suitable degrees
Project description
A standard additional constraint in many regression problems is to enforce sparsity on the solution or, to put it another way, to select
a subset of variables before solving the regression problem.
In the literature, there are a number of approaches to this problem
replace the sparsity constraint by a convex penalty
use combinatorial algorithms to find the correct subset (often via an active set type approach)
use mixed-integer nonlinear programs to solve the problem
If one is interested in algorithms for large data sets then approach A is the most common. One, however, cannot guarantee global
optimality in most of the cases. To guarantee global optimality, one typically uses approach B. But general purpose solvers have
become much much better over the last years. So, how good is approach C?
The goal of this project is to give an answer to this question.
Prerequisites
Experience with an MINLP solver, e.g., SCIP is of advantage.
Page 63/108
Misalignment and uncertainty in remote sensing data
Project supervisor: Finn Lindgren
Project type
academic
Suitable degrees
Project description
Remote sensing data from satellites for weather and climate analysis comes in large volumes and at high resolution. When
performing spatial analysis it is often necessary to aggregate the raw pixel information into larger gridboxes or other regions. If the
aggregated data later needs to be converted into data on a different grid, misalignment errors are introduced. That aim of the
proposed project is to assess the effect of such data aggregation and regridding, in the context of spatial statistics methods based on
finite element mesh calculations, and is based on problems from the EUSTACE EU project on high resolution historical climate
reconstruction in cooperation with the UK MetOffice.
Prerequisites
Linear models are essential. Bayesian methods may also be useful. Some R coding experience is necessary.
Recommended reading
F. Lindgren, H. Rue, J. Lindström (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the
stochastic partial differential equation approach, Journal of the Royal Statistical Society Series B, 73(4), pp. 423?498.
Lindgren, F. and Rue, H., 2015. Bayesian Spatial Modelling with R-INLA. Journal of Statistical Software, 63 (19).
Page 64/108
Mixed Integer Linear Programming for the Bin Packing Problem with Item Fragility
Project type
standard
Suitable degrees
Project description
Given a set of weighted items and an unlimited number of identical capacitated bins, the Bin Packing Problem (BPP) consists in
packing all the items into the minimum number of bins. The BPP has been studied for 80 years now and we count thousands of
published papers dealing with the subject.
One variant of the BPP is the BPP with Fragile Objects (BPPFO). In the BPPFO, each item has a fragility in addition of a weight,
and the bin capacity is not fixed anymore: it is now equal to the smallest fragility among the items inserted in the bin. The BPPFO is
important in telecommunications, where it models the allocation of cellular calls to frequency channels. In Code Division Multiple
Access (CDMA) systems, a network is divided into cells, and each cell is equipped with an antenna having a limited number of
frequency channels. Any time a user makes a call, the CDMA system assigns the call to a frequency channel. See [1] for further
details.
One particular challenge for the researchers working on the BPP is to find good Integer Linear Programming (ILP) models. In the
last decades, no less than seven different ILP formulations have been proposed for the problem: in chronological order, the textbook
model, the set covering formulation, the one-cut, the arc-flow [2], the DP-flow, the general arc-flow with graph compression, and
reflect. A survey on the BPP and most of these models is given in [3].
Among these models, the arc-flow formulation was shown to be a very good compromised between simplicity (the model can be
coded in few days) and performance. In the arc-flow, a bin is represented by a path that goes from node 0 to node c (the capacity of
the bin), in a graph where arcs are items. For example, to pack the items of size (4, 4, 3, 3, 2, 2) in bins of capacity 9, the optimal
solution of the arc-flow model would be two identical paths (0,4), (4,7), (7,9), that correspond to two identical bins containing one
item of each size.
In this project, we will extend the arflow formulation (that was designed for the BPP) to the BPPFO.
Review the literature on the BPPFO
Page 65/108
Propose a naïve implementation of the arcflow for the BPPFO, with one graph per item fragility
Try several ideas (item ordering, alternative graph construction) to improve the performances of the naïve model
Develop heuristics to solve the problem
Prerequisites

Recommended reading
[1] Martínez, M. A. A., Clautiaux, F., Dell'Amico, M., & Iori, M. (2013). Exact algorithms for the bin packing problem with fragile
objects. Discrete Optimization, 10(3), 210-223.
[2] Valério de Carvalho, J.M. (2002). LP models for bin packing and cutting stock problems, European Journal of Operational
Research, 141(2), 253?273.
[3] Delorme M., Iori M., Martello S. (2016). Bin packing and cutting stock problems: mathematical models and exact algorithms,
European Journal of Operational Research, 255(1):1?20.
Page 66/108
Mixed Integer Linear Programming for the Diet Problem
Project type
standard
Suitable degrees
Project description
Human diet is complicated to organize: in addition of the obvious budget constraint shared by any individual, one has to pick a set of
aliments that he likes, respects the recommended daily allowance (RDA) for each of the main nutrients, provides satiety, takes into
account any personal restriction (e.g., low-cholesterol, vegetarian, low-energy), and is sufficiently diversified to be followed.
As all of the constraints can be expressed in linear forms, the aim of the project is to use a mixed integer linear programming (MILP)
model to construct a diet schedule suitable for a multitude of profiles. The Cisqual tables (see
https://ciqual.anses.fr/#/cms/download/node/20) will be used as input data for the aliments while official UK guidelines (
https://www.nutrition.org.uk/nutritionscience/nutrients-food-and-ingredients/nutrient-requirements.html) will serve for the nutrient
constraints.
Review the literature on the MILP models proposed for the diet problem
Simplify the Cisqual tables that originally have 2807 aliments and 60 attributes through statistical analysis (e.g., by merging groups
of aliments and eliminating unnecessary attributes)
Build an MILP model able to solve the diet problem
Define additional rules (i.e., constraints) to make the diet feasible (e.g., [1] stated that a naive modelling generated a solution
requiring to drink gallons of vinegar)
Generate a diet schedule for a week for various user profiles
Analyse the limits of the models: e.g., up to how many aliments can the MILP model take into account until it is not able to give
the optimal solution anymore? Can we propose some preprocessing techniques excluding automatically some aliments with very
poor nutritive values?
Page 67/108
Prerequisites

Recommended reading
[1] Dantzig, G. B. (1990). The diet problem. Interfaces, 20(4), 43-47.

[2] Sklan, D., & Dariel, I. (1993). Diet planning for humans using mixed-integer linear programming. British Journal of Nutrition,
70(1), 27-35.
[3] Anderson, A. M., & Earle, M. D. (1983). Diet planning in the third world by linear and goal programming. Journal of the
Operational Research Society, 34(1), 9-16.
Page 68/108
Mixed Integer Linear Programming in video games: Dota 2 Auto Chess case study
Project type
standard
Suitable degrees
Project description
Before year 2000, most of the video games (e.g., Zelda, Mario, Sonic) had a linear gameplay: they consisted in defeating a fictional
enemy in an offline world where only few decisions were left to the player. Nevertheless, some optimization problems could already
be found: how to pack efficiently the "loots" in the character's knapsack to maximize the profit (Knapsack problem) or how to finish
the game as fast as possible in some speed run challenges (Shortest path problem).
When affordable broadband internet connectivity spread, other types of video games such as:
MMORPG -- Massively Multiplayer Online Role-Playing Game (e.g., World of Warcraft, Dofus)
MOBA -- Multiplayer Online Battle Arena (e.g., League of Legends, Dota)
RTS -- Real-Time Strategy (e.g., Warcraft 3, Age of Mythology)
DCCG -- Digital Collectible Card Game (Hearthstone, Magic)
were developed. These video games usually allow players to compete one against each other, and motivated by the huge cash price
involved (e.g., up to 100,000 $ for the winner of some Hearthstone competitions) various strategies based on optimization emerged
in order to be the strongest. For example, some websites (e.g., Raidbots https://www.raidbots.com/simbot) determine the set of items
for a World of Warcraft character that maximizes his damages. Some others (e.g., HSRreplay https://hsreplay.net) perform the win
rate of some Hearthstone cards so that the player knows which one he should keep in his starting hand. Some algorithms [1] also
determine the tours in Pokemon Go that maximize the items collected.
In this project, we are interested in Dota 2 Auto Chess, a recent game in which, at each turn, the player picks some pieces to position
on a chessboard in order to fight one of the seven other players (chosen randomly). Each piece has a cost, and only 5 pieces out of 55
(in the patch 1.1.7) are proposed to the player at each turn. The pieces can be sold for gold, or combined with similar pieces to
become stronger. The pieces also have a chance of appearance, a species, and a class. Some specific boosts (such as damage
multiplicator or defence bonus) can be obtained if the player has enough pieces of the same species / class on board.
Page 69/108
The main topic of the project is to identify the optimization problems related to the game, and propose some techniques to solve
them. In particular, the student should:
Review the literature on optimization applied to video games and games in general (e.g., [2] and [3]).
Design an MILP model that will, according to arbitrary weight functions, determine the optimal combination of pieces (i.e., the one
that has maximum weight).
Simulate the phases where the player picks the pieces, and determine the likelihood of obtaining a given combination of pieces. If
the best combination has too many rare pieces, it is unlikely that it can be obtained by the player.
Modify the MILP to incorporate some data about (i) the positive or negative interaction between boosts and pieces, and (ii) the
likelihood of obtaining the mentioned pieces.
Modify the simulation and the MILP model to make them as close as possible to the original game (e.g., the rarity of a piece is
impacted by the other players' choices, the gold income depends on the current gold of the player (interests), and his results of the
last games (last game won, winning or losing streak ?).
Prerequisites

- Excellent programming skills (ideally in c++)
- Interest in video games (ideally in Dota 2 Auto Chess)
Recommended reading
[1] Álvarez-Miranda E., Luipersbeck M., Sinnl M. (2018). Gotta (efficiently) catch them all: Pokémon GO meets Orienteering
Problems. European Journal of Operational Research, 265(2), 779-794.
[2] Hertog D.D., Hulshof P.B. (2006). Solving Rummikub Problems by Integer Linear Programming. The Computer Journal, 49(6),
665-669.
[3] Kiyomi M., Matsui T. (2001). Integer Programming Based Algorithms for Peg Solitaire Problems. International Conference on
Computers and Games, 2063, 229-240.
Page 70/108
Mixed-Integer Equilibrium problems
Suitable degrees
Project description
A common model for energy markets, eps. electricity markets, are equilibrium models. We assume that we have a number of players
who want to maximise their profit and they all need to fulfill a market clearing condition. The theory for these problems is
well-studied in the case that each player has to optimise a convex problem. In real markets, however, nonconvexities appear
naturally. One instance are fill-or-kill bids, i.e. one either commits to produce a fixed amount or nothing. These requirements lead to
the question, how to include integer variables into the equilibrium framework. There are a number of approaches in the literature.
In this project, the goal is to analyse the results of different proposed schemes for a model problem.
Recommended reading
O'Neill, R. P., Sotkiewicz, P. M., Hobbs, B. F., Rothkopf, M. H., & Stewart Jr, W. R. (2005). Efficient market-clearing prices in
markets with nonconvexities. European Journal of Operational Research, 164(1), 269-285.
Huppmann, D., & Siddiqui, S. (2018). An exact solution method for binary equilibrium problems with compensation and the power
market uplift problem. European Journal of Operational Research, 266(2), 622-638.
Page 71/108
Money Dashboard - Duplicate Transactions
Project supervisor: Chris Dryden
Suitable degrees
Project description
Money Dashboard holds 300 million UK consumer banking transactions. This reflects the spend in people's current, savings and
credit card accounts. This data can provide invaluable insight into consumer spending habits. The data is rich and complex and
Money Dashboard continue to strive to understanding and "taming" that data. This allows us to provide a better service to our users
and generate revenue from understanding consumer spending.
Money Dashboard has a web and mobile application which users can sign up to and provide their banking transaction data. We
inevitably end up with duplicate data where users signup multiple times (with different email addresses), where users add their bank
accounts multiple times or where users have a joint account with another Money Dashboard users.
This duplicate data can create "noise" within our analysis and the identification and management of duplicate handling is something
we need to handle. We have a process currently which can be shared.
The identification of duplicate accounts is one process we do well. The identification of duplicate transactions is much more
challenging (we can legitimately have two transactions with iTunes for the same amount on the same day which is not a duplicate, or
someone can buy two rounds of drinks on the same day for the same value or where someone withdraws the same amount of money
from the same cash machine).
This project would aim to find a consistent method of identifying duplicates within the data and provide a confidence rating for
those duplicates based on the transactions or trends in the data that identifies duplicates.
The final result would be a methodology that could be applied to identify duplicates in existing and incoming transactions.
Page 72/108
Money Dashboard - Income Identification
Suitable degrees
Project description
Money Dashboard is poor at identifying people's income. This presents challenges when we are trying to help our users be better
with their money. A fundamental first step in that help is to understand how much the user earns.
Income transactions are complex. Some people are paid monthly, others every four weeks and some weekly. Some people have two
jobs. People change jobs and therefore change the pattern of how they get paid. Some people get promoted and change their salary
amounts.
We need to define an algorithm that will allow us, with a high degree of confidence to identify income transactions and understand
the users income.
This approach would require machine learning techniques and feature identification over a large dataset.
The final result would be a method and algorithm that can be applied to a new user to accurately identify their income.
Difficulty: Hard (variations in income make the statistics very hard to implement)
Page 73/108
Money Dashboard - Metric Normalisation
Suitable degrees
Project description
Money Dashboard's user base is growing at an accelerated pace as more users adopt the platform, additionally some users are also
leaving the platform, ending their spending data history. We need to be able to control for this shifting panel size by creating steady
user panels which can be used to understand long-term real patterns in consumer spend.
Money Dashboard has developed a way to do this which does control for the panel but the method has its weaknesses.
This project would be looking to explore different ways to normalise the data to see if other methods should be used to improve
upon the current method. The efficacy of the method adopted should be measured by the following criteria:
The method should effectively control for the shifting panel size. Allowing for consistent analysis across time periods as the panel
grows.
The method should allow for calculation of metrics (such as spend, transaction count etc.) at different dimension granularities i.e.
daily, weekly, monthly, merchant, merchant brand.
The method should allow for any results to be replicable
The method should be intuitive enough to be easily communicated to our data clients
Difficulty: Medium (a number of methodologies for normalisation will more than likely exist which can be tested against the data)
Page 74/108
Money Dashboard - Representative Panel
Suitable degrees
Project description
Money Dashboard has a skew in it's panel. We have more men than women (67% to 33%), younger, technology savvy users (we are
under represented in the over 50 age ground).
We would like to define a method that would weight the panel and their transactions to allow us to make the data more
representative of the UK population as a whole.
Using publicly available datasets the project would look to understand the representation of the UK population (by gender, age,
salary and postcode) and then define a method that would weight the Money Dashboard panel allowing the statistical methods to
analyse consumer spend to be more representative.
The result would be a dynamic method that could be maintained that would adjust to the constant changing Money Dashboard panel.
Difficulty: Hard (due to the need to map to publicly available external data sources and to assess the quality of the result).
Page 75/108
Multi-trip Multi-period Vehicle Routing Problems with Time Windows
Project type
standard
Suitable degrees
Project description
The project is motivated by a problem brought to us by a Scottish company. The company manufacturs goods and distributes them
with their own fleet of vehicles to customers. If a customer places an order with the company before 3 pm on a given day, then the
company guarantees delivery on the next day. After 3 pm, transport planner of the company then starts to build the delivery routes
for the next day. However, not all customers require a next-day delivery. Some of them specify a time window saying that, for
example, they need the delivery within the next three days, or, if they order on Tuesday, that the delivery should be on Wednesday
or Friday. Hence, there is some flexibility for the planner as to when to schedule the deliveries of those customers. Thus, ideally, the
planner should create routes not only for the next day but for the next 2-3 days. However, given the complexity of the problem and
the size of data, this is usually not possible. In addition, most vehicles do two tours per day. That is, they have to return to the depot
for reloading during the day. The company has only two bays for reloading vehicles and if the trips are not properly scheduled,
queues form at the reloading stations, considerably delaying deliveries and increasing the working hours of the drivers. The
company is now seeking help with their routing problem.
The goal of the project is to develop and implement a model for above described Multi-trip Multi-period Vehicle Routing Problems
with Time Windows. There are two possible approaches that can be taken
Derive a mixed-integer linear programming formulation and implement it using a standard solver, e.g. Xpress.
Derive and implement a heuristic method.
It is not yet clear to what extent the company will want to get involved in the project and whether we will be able to obtain actual
data from them. If not, then we can either generate random data sets or use publicly available data sources, e.g., Ordnance Survey,
the Office of National Statistics, or the UK Data Service.
The project can be taken by two students, one working on a MIP formulation and implementation and the other one the heuristic.
Both students would then work together on gathering data and analysing the findings.
Page 76/108
Nonparametric Bayesian Methods
Project type
standard
Suitable degrees

Project description
In parametric Bayesian inference, one assumes that the data is generated by a probability distribution with a known form, such as the
Normal or Exponential distribution. However in most realistic situations the form of the distribution will not be known, and
inference based on a miss-specified parametric model can give wildly misleading results. This has motivated the development of
nonparametric and semi-parametric methods which do not require a full specification of the data distribution and can hence be used
when this is unknown. This project will explore the use of popular Bayesian nonparametric models such as the Dirichlet process and
Polya Tree process. A solid computational/programming background is required.
Prerequisites
Strong computer programming, and knowledge of basic Bayesian statistics
Page 77/108
Optimal offering strategy of distributed energy assets in the UK balancing market
Project supervisor: Dr. Nick Good
Suitable degrees
Project description
PROJECT DESCRIPTION:
Upside Energy develops software and hardware to control and dispatch distributed energy assets to provide power system services and/or to p
The aim of this project is to develop, validate, and test a stochastic programming model for selecting a optimal set of bids in the UK balancin
The student will be provided with the technical characteristics of the cluster, the generation schedule already contracted, and a set of balancin
· build a multi-stage stochastic program for optimal offering in the UK balancing market able to capture the delay between decision makin
· implement standard techniques for scenario generation and reduction in order to obtain a tractable scenario tree;
· validate the model on in-sample analysis.
Then, if time allows
· evaluate the trade-off between complexity/runtime and optimality/accuracy;
· test the decision-making tool in an out-of-sample analysis;
· evaluate how the scenario generation and reduction process influences the optimal decisions in the out-of-sample analysis;
Prerequisites
Essential:
*proficiency in LP and MILP optimization.
*computer programming and modelling skills (e.g., Julia/JuMP, Xpress, Python).
Desired/Recommended:
*basic knowledge of energy system optimization (OREI).
*basic knowledge of scenario generation/reduction techniques.
Page 78/108
Optimistic Priors and Pessimistic Data? Measuring Compatibility
Project supervisor: Miguel de Carvalho
Project type
standard
Suitable degrees
Project description
The goal of this project will be on quantifying how much two experts agree, and how much do they agree with the observed data.
The measures to be reviewed in the project are based on a geometrical interpretation of Bayes theorem that can be used for
measuring the level of agreement between priors, likelihoods, and posteriors. The starting point for the construction of the said
geometry is the observation that the marginal likelihood can be regarded as an inner product between the prior and the likelihood.
Some examples are used to illustrate our methods, including data related to on-the-job drug usage, midge wing length, and prostate
cancer.
Prerequisites
Recommended reading
Christensen, R., Johnson, W., Branscum, A. and Hanson, T. E., (2011). Bayesian Ideas and Data Analysis: An Introduction for
Scientists and Statisticians. Boca Raton, FL: Chapman & Hall/CRC.
Page 79/108
Pension Decumulation via Prospect Theory
Project supervisor: Dr John Dagpunar
Suitable degrees
Project description
Recently, people in the UK with a defined contribution pension pot have been able to drawdown or decumulate the pot. The question
of how much to take from the pot each year is a classic example of sequential decision making under uncertainty. The uncertainty
and consequential risk arises from (a) longevity ? living longer than expected risks ?failure' or ?ruin' in that the pension pot may be
exhausted, (b) future investment performance, and (c) future inflation behaviour. In addition, the investment performance depends
upon decisions taken to vary the balance between risky and risk-free assets.
Also, it is not obvious what the objective should be. Clearly, it is related to the income withdrawn each year, to a desire not to run
out of money, and perhaps a bequest motive. Here we follow the Prospect Theory of Kahneman and Tversky (1979) where the
emphasis is on an individual's yearly and terminal expectations (reference points), which in this case we choose to be their target
withdrawal amounts and target bequest amount. Regarding the former, suppose the aim is to withdraw a target amount £$T_n$ in
real terms at age $n$. If the actual real annual withdrawal amount is £$w_n$ then we define a strictly increasing value function
$v_n$such that $v_n>0$ and concave when $w_n>T_n$, elseit is negative and convex. We can think of the value in any year as the
degree of happiness or pain experienced in a year according to whether the target is met or not. As with pure concave utility
functions, losses (below target) result in more pain than the happiness resulting from gains (above target) of the same magnitude.
That is called loss aversion, but note that in Prospect Theory there is a mix of concave and convex. The implication is that people are
risk averse when meeting their expectations and risk taking when falling below them. According to Kahneman and Tversky there is
much exp$erimental evidence for this.
Now suppose that $1-phi_n$is the probability of surviving to age $n+1$ given the individual's current age is $n$ and that $R_n$is
the real (after inflation) investment return during that year. Let $a$ denote the age at which decumulation starts, $b$the age at death,
$s_n$the pension pot at age $n$, $y(s_b)$the value of a bequest of$s_b$, and $f_n^{ast}(s_n)$ the maximum of the expected value
of $y(s_b)+sum_{i=n}^{b}v_i(w_i)$. Then, to determine the withdrawal amount $w_n$we consider the stochastic dynamic
program
$$ f_n^{ast}(s)=max_{w_n<s}left{v_n(w_n)+sum_{r}P(R_n=r)left([1-phi_n]f_{n+1}^{ast}[(s-w_n)(1+r)]+phi_n y[(s-w_n)(1+r)]

right) right} $$
The project will involve numerical computations and so excellent programming skills are required. Data on survival probabilities is
available from the Office of National Statistics. We use a hypothetical distribution of investment returns and assume independence.
We experiment with different value functions to model different degrees of importance attached to meeting withdrawal and bequest
targets. Subsequently, investment returns could be simulated showing the range of outcomes for an optimal strategy. An
understanding of the decision process might lead to suggested heuristic strategies and some progress might be made towards
strategies for dynamically changing a mix of two assets.
Page 80/108
Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of Decision under Risk, Econometrica, 47, 263-291
A pdf of the project description can be found here.
Page 81/108
Planning school capacities and catchment areas
Project type
standard
Suitable degrees
Project description
In most UK cities, there is no freedom of choice when it comes deciding on where to send ones children. Instead, the designated
school is determined by the home address of the child. More precisely, each school has a predefined catchment area, which does not
overlap with a catchment area of any other school of the same type, and all children living within this catchment are obliged to go to
that school.
While this system avoids having to go through a - potentially lengthy and tedious - selection process during the summer, it severly
restricts the possibilities of school planners when facing overcrowded or underutilised schools. To deal with these situations, there
are essentially only two options: adjust the capacity of the school or change its catchment area. Both options come with obstacles.
Reducing the capacity of a school typically involves asking teachers to move schools. Extending the capacity of a school will
usually mean errecting new buildings or adding annexes to existing ones, both of which entail a considerable financial investment;
provided that there is space for an extension in the first place, which is not always the case in densly populated areas. While
changing the outline of catchment areas seems to be the easier option, and one that comes at no direct cost, this is is often as difficult
to implement as the former. The reason for this is that families with chilrdren often base their decision where to rent or buy based on
the school catchment area the property belongs to. Changing the catchment area will result in considerable opposition from affected
parents and may even impact housing prices, making the city susceptible for compensation claims from house owners. Thus, the
school planners have to find a "good" compromise between both options.
The goal in this project is to develop and solve a mathematical model that finds such a compromise. More precisely, given an
existing layout of schools and their catchment areas, the model should decide which schools to updgrade, which to downgrade, and
how to adjust catchment areas such that no school is overcrowded or underutilised and the financial invesment and adjustments to
catchment areas are minimal. There are two possible approaches that can be taken
Derive a mixed-integer linear programming formulation and implement it using a standard solver, e.g. Xpress.
Derive and implement a heuristic method.
The project will be in collaboration with Edinburgh City Council and they will provide us with data for the city of Edinburgh. It is,
however, not yet clear how much data they will be able to provide us with, as some of the data is highly sensitive, e.g., the actual
school capacities, number of classes, and ? especially ? the number of pupils per street/postal code and school year. If we are not
Page 82/108
able to get this data, then we can either "make it up" by randomly generating it, or by trying to extrapolate it from publicly available
data sources, e.g., Ordnance Survey, the Office of National Statistics, or the UK Data Service.
The project can be taken by two students, one working on a MIP formulation and implementation and the other one the heuristic.
Both students would then work together on gathering data and analysing the findings.
Prerequisites
Good programming skills.
Knowledge of Geographical Information Systems will be helpful.
Page 83/108
Portfolio Optimization: What is it good for?
Project type
standard
Suitable degrees
Project description
Optimization has been a popular tool to aid portfolio selection(i.e. how much to invest in which assets in order to maximise
profitwhile minimizing risk exposure) ever since the event of Markowitz'portfolio selection model[1]. Recently other models based
on conceptssuch as (conditional) Value-at-Risk, Robust Optimization andStochastic Dominance have been suggested to address
certain(perceived) shortcomings of the Markowitz model. Unfortunatelyobserved performance of all these models has been at least
patchy -indeed it seems very difficult for these sophisticated models tooutperform the simple "equally balanced" portfolio (i.e.
investingequal amounts in all available assets).
The aim of this project is to study how the above mentioned modelsmight be combined to obtain a portfolio selection strategy that
doesperform in practice. It is expected that the student will becomefamiliar with the standard portfolio optimization models. The
testcasewould be the international stockmarket over the lively period of the past 15 years.
Prerequisites
Optimization Methods in Finance (although could be adapted to students who did not do this course)
Page 84/108
Pre-processing for semidefinite programming
Project supervisor: Jakub Marecek
Suitable degrees
Project description
Semidefinite programming (SDP) is a broad generalization of linear programming (LP), which encompasses also convex quadratic
programming, second-order cone programming (SOCP), and convex quadratically-constrained quadratic programming (QCQP). Out
of the applications not covered LP and QCQP, one could list:
SDP provides the strongest available bound (approximation algorithm) computable in time polynomial in dimensions of instances
of many combinatorial problems, such as MAX CUT, VERTEX COVER, and maximum constraint satisfaction problems, under
plausible complexity-theoretic assumptions [Khot, 2002].
SDP also provides a principled way of solving non-convex problems in polynomial optimization [Lasserre, 2015], under minor
assumptions, albeit not in polynomial time due to the size of the instances of SDP involved.
SDP has extensive applications in statistics and machine learning, and where covariance matrices are positive semidefinite.
SDP has extensive applications in computational finance, especially when seen as optimal control. For example, the optimal
trading trajectory problem under uncertainty and transaction costs [Calafiore, 2009] can be cast as an SDP.
On classical computers, there are so-called interior-point-method (IPM) solvers with run-time to error of epsilon that is polynomial
in the dimension n, the number m of constraints, and polylogarithmic in 1/epsilon. Prior to running the IPM, one often applies some
pre-processing. The most popular method for the pre-processing is based on chordal completion (Grone et al. 1984). Another
method is known a facial reduction (Borwein and Wolkowicz 1981). Some 35 years after Grone et al (1981), Raghunathan and
Knyazev (2016) showed that chordal pre-processing leads to numerical issues (degeneracy), but did not suggest how to address the
issue, other than by not performing the pre-processing, which would limit the reach of IPM on sparse instances considerably.
Kungurtsev and Marecek (2018) suggested that facial reduction should be used after chordal-completion procedures. In
computational experiments on SDP instances from the SDPLib, a benchmark, and structured instances from polynomial and binary
quadratic optimisation, they showed that such two-step pre-processing with a standard interior-point method outperforms the interior
point method, with or without the traditional pre-processing. Their code was in MATLAB, which allowed the reuse of chordal
completion package SparseCoLo, but introduced some very distinct limitations.
The goal of this project would be to translate the pre-processing to Python and Chompack, cf:
https://github.com/cvxopt/chompack/blob/master/doc/source/examples.ipynb
Page 85/108
and implement and analyse a novel facial reduction procedure, suggested by the supervisor.
The code should be open-source.
Prerequisites
Interest in Python programming and computational optimization.
Recommended reading
V. Kungurtsev and J. Marecek: A Two-Step Pre-Processing for Semidefinite Programming, 2018.

https://arxiv.org/abs/1806.10868
L. Vandenberghe and M. S. Andersen: Chordal Graphs and Semidefinite Optimization, Foundations and Trends in Optimization,
2015.
http://seas.ucla.edu/~vandenbe/publications/chordalsdp.pdf
Page 86/108
Predicting the outcome of football matches
Project type
standard
Suitable degrees
Project description
This project will explore the use of statistical methods for predicting the outcome of football matches. Potential methods will include
ELO ranking models for tracking the relative strengths of teams, regression/machine models for handing the prediction task, and a
study of the various factors which might affect results.
Page 87/108
Presolve for mixed integer programming problems
Project type
standard
Suitable degrees
Project description
Mixed integer programming (MIP) problems are used in many practical applications. A linear objective function is minimized,
subject to linear constraints as well as integrality constraints on some of the variables. Presolve algorithms transform a MIP into an
equivalent problem of smaller dimensions. The reduced problem is solved and then the solution to the original problem is recovered
via a postsolve procedure. This project covers both the theory and the implementation of presolve for mixed integer programming
problems. Techniques for presolving and the corresponding postsolve techniques will be studied and then the algorithms will be
implemented in C++ or C#, using HiGHS to solve the reduced MIP problem.
Prerequisites
Recommended reading
Presolving Mixed-Integer Linear Programs

Ashutosh Mahajan
https://www.mcs.anl.gov/papers/P1752.pdf
Page 88/108
Progressive Hedging for structured MIP
Suitable degrees
Project description
When we model optimization problems, we often end up with a natural block structure. Each block then represents,e.g., a scenario in
stochastic optimization, one product in a supply chain model, or a day in a multi-day energy systems model. This project deals with
one method to use exploit such a block structure in mixed-integer optimization models.
The method was developed for stochastic programming: To get a good model of the uncertainty in a strochasitc program one often
needs a large number of scenarios. A classical technique to deal with these scenarios in linear or convex quadratic optimization
problems is to use a technique called "progressive hedging". However, its theoretical properties make it unsuitable for mixed-integer
problems. In a recent article, Boland et al. have discussed one method to extend this technique to stochastic mixed-integer programs.
The question is now, can we extend this technique to arbritrary block structured MIP?
To tackle this question, you will need to understand the technique and its prerequisites. Then you should identify problem types for
which you can apply it and give a proof-of-concept implementation.
Prerequisites
You should be able to use one programming language and be able to interface it to a MIP solver of your choice.
Recommended reading
N. Boland, J, Christiansen, B. Dandurand, A. Eberhard, J. Linderoth, J. Luedtke, F. Oliveira, "Combining Progressive Hedging with
a Frank-Wolfe Method to Compute Lagrangian Dual Bounds in Stochastic Mixed-Integer Programming", SIAM Journal on
Optimization 28-2:1312-1336, 2018.
Page 89/108
Real-time Forecasting the US Output Gap
Project supervisor: Miguel de Carvalho
Project type
standard
Suitable degrees
Project description
This project will apply multivariate singular spectrum analysis (SSA) methods for tracking and forecasting business cycles. The
forecasts will be produced resorting to real-time vintages, that is the version of the data that would be available to the forecaster at
the time at which the forecasts would have to be conducted. The statistical enquiry to be conducted will be based on multivariate
SSA methods which can be used to decompose a time series into several principal components of interest. To showcase the methods
in practice we focus on the US economy and contrast the obtained forecasts with the contraction periods dated by the National
Bureau of Economic Research (NBER).
Prerequisites
Time Series (MATH11131).
Recommended reading
- de Carvalho, M. & Rua, A. (2017) "Real-Time Nowcasting the US Output Gap: Singular Spectrum Analysis at Work"
International Journal of Forecasting, 33, 185?198.
Page 90/108
Relaxations and heuristics for facility layout design
Project type
standard
Suitable degrees
Project description
The facility layout design (FLD) problem consists of optimally partitioning a rectangular facility (or floorplan) of known dimensions
into a given number of non-overlapping indivisible rectangular departments with fixed areas (Anjos & Vieira, 2017). The objective
is to minimize the total cost associated with the interactions between these departments. This cost incurred by each pair of
departments is equal to the distance between their centers multiplied by the pairwise cost. This latter cost accounts for adjacency
preferences, as well as costs that may arise from transportation, the construction of a material handling system or connection wiring.
The goal is to determine the position of the centres of the department as well as their dimensions (height and width).
Two main classes of methodologies have been used to solve the FLD. First, heuristics focus on constructing a layout satisfying all
the requirements of the problem and that are near optimal (Adya & Markov, 2003; Anjos & Vannelli, 2002). Second, exact methods
look for the optimal solutions by using a branching algorithm to enumerate quickly all possible solutions (Anjos & Vieira, 2017).
These latter approaches use relaxations of the FLD to obtain lower bounds on the optimal value of the problem. The tighter the
bound is, the more efficient the branch and bound algorithm is. Using exact methods, optimal solutions were achieved only for
problems with up to 11 departments, but using heuristics from nonlinear programming, near-optimal solutions were obtained for
problems with up to 100 departments.
This project is concerned with experimenting with a new optimization approach to be able to achieve significantly strong lower
bounds for the FLD problem. More specifically, an improved mixed-integer semidefinite programming formulation wil be used,
together with new valid inequalities to improve the relaxations, and heuristic strategies to construct feasible solutions.
Computational experiments will be carried out using Matlab to test new approach on benchmark test cases.
This project will be carried out in collaboration with Professor Matthias Takouda of Laurentian University (Canada).
Can be taken by two students.
Prerequisites
- Fluency in Matlab
Page 91/108
Revenue Maximization and Customer Satisfaction on Web-Based Portals
Project type
standard
Suitable degrees
Project description
With the advances in technology, the use of internet for personal transactions is widely adopted by the society. In this work, we will
be interested in simulating the traffic in web employment portals. There are two classes of users arriving at an employment portal:
(i) employees and (ii) employers. When an employee arrives at the system, s/he checks existing job postings and if s/he can find a
suitable job s/he leaves the system. If there are no suitable jobs available, then s/he post her resume on the website. Employers also
follow the same scheme. In this project, we will be working with systems where there are multiple types of employees and
employers and the probability that a given employer and employee matches depends on the type of these users.
Our goal in this project will be to design pricing mechanisms to maximize the revenue gained by subscriptions to the web portal. We
will develop utility-based models to control the traffic in these portals and analyze how the pricing policies affect customer
satisfaction.
Prerequisites
Good command on basic simulation tools, stochastic modelling and programming.
Page 92/108
Risk in Logistics
Project type
standard
Suitable degrees
Project description
Logistics focuses on moving goods in space and time such that the right product is in the right quantity at the right time at the right
place. Classical logistics activities are the transportation and distribution of goods between facilities, e.g. suppliers, factories,
warehouses, retailers, and their storage, handling, processing, and packaging. Typical goals are to maximize customer satisfaction
(expressed through service levels, product quality, responsiveness, etc.) and-often at the same time-to minimize total costs,
environmental impact, and tied-up equity. Logistics planners face many challenges, ranging from conflicting goals, over supply and
demand uncertainties and the lack of information, to the difficulties of having to organize logistics activities across borders and
cultures in multi-national companies. Starting in the 1950s, optimization and OR techniques have been developed and used to
optimize logistics activities, with an ever increasing importance, role and proliferation since then. In 2012, the total expenditure for
logistics activities in Europe was 1,726 billion Euros and thus even small improvements may result in considerable monetary gains.
Uncertainty and risk are inherent in logistics: Customer demand can never be forecasted exactly, travel times will never be certain,
and machines or vehicles may break down, severely affecting and impairing logistics operations.
The goal of this project is to work on a logistics planning problem involving uncertainty. The specific problem or class of problems
is up to discussion and can be decided on in the first meeting. Also the exacte "scope" of uncertainty is open, i.e. the focus can be on
how to deal with uncertainty on a general conceptual level or on how to deal with a very specific type of risk and risk measure.
Moreover, also whether the focus should be on modelling and solving the problem optimally (using, e.g., Xpress) or on developing
and implementing a good heuristic is open.
Prerequisites
Having taken the course Risk and Logistics would be helpful, but is not mandatory.
Page 93/108
Robust diameter optimization of tree-structured hydrogen networks with active elements
Suitable degrees
Project description
When designing gas networks, one needs to account for the uncertainty in the future demands. At the moment, rough estimates are
made and the pipe network is designed with these estimates in mind. In a previous work, we have looked at the problem from a
robust optimization point of view. In this project, the goal is to extend the results to the case that the network contains active
elements, i.e., valves and compressors.
Recommended reading
Robinius, M., Schewe, L., Schmidt, M., Stolten, D., Thürauf, J., & Welder, L. (2018). Robust Optimal Discrete Arc Sizing for
Tree-Shaped Potential Networks. http://www.optimization-online.org/DB_HTML/2018/02/6447.html
Page 94/108
Routing Calls in Contact Centres with Agent Heterogeneity
Project type
standard
Suitable degrees
Project description
During the course of operation, contact centres receive a variety of requests and directing these requests to the most appropriate
agents is essential for efficiently operating the contact centre. Generally, the managers have information/forecasts about the
incoming requests and different agent pools that can potentially address different requests. However, due to the human nature of
agents, the performance of agents in a given pool may differ significantly. In this work, our goal will be to devise optimal or near
optimal routing mechanisms in the presence of within pool agent heterogeneity. We will analyze some recent results in the literature
that addresses the involvement of heterogeneous agents in the system and come up with methodologies to incorporate these novel
methods in routing problems.
Prerequisites
Simulation
Stochastic Modelling
Page 95/108
Solving Real Life Nonlinear Programming Problems: Optimal Power Flow
Project type
standard
Suitable degrees
Project description
The Optimal Power Flow problem is an important problem in PowerSystems Engineering which is at the base of many more
sophisticatedoptimization problems relevant to power systems operations. Theproblem is that, given an electricity network with
given demands, tofind the cost optimal generation levels of the availablegenerators. The main difficulty of the problem is that power
flows arenot "switchable": the flow follows the Laws of Physics. Simply usingthe cheapest generators may well violate network
constraints (such asline flow limits, voltage levels, reactive power limits). Indeed powerflows are AC and the applicable Laws of
Physics are the AC versions ofKirchhoffs Laws. At this point approaches vary, but power flows can bedescribed either by complex
numbers, or, if staying in the reels,Kirchhoff Laws require trigonometric functions. In any case theresulting constraints lead to a very
challenging nonlinear,nonconvex optimization problem.
The purpose of the project is to explore models and solutionapproaches that have been proposed for this problem (which coverquite
a few areas of nonlinear optimization) and if possible comparedifferent models and implementations. Students would learn quite a
bitabout modelling and solving challenging real life optimizationproblems.
Prerequisites
The project can be adapted depending on the previous knowledge of the student.
Background on power systems can be read up on quite easily. Same for background on nonlinear programming techniques. The
project can be adapted to the previous knowledge of the student.
The project would require working with different models and building solution algorithms, probably with a combination of
Matlab/Python and Xpress/AMPL.
Page 96/108
Solving the discrete p-dispersion problem with Lagrangian relaxation
Project type
standard
Suitable degrees
Project description
In the discrete p-dispersion problem p facilities mus be chosen from a set of candidates so that the minimum distance between the
selected facilities is as large as possible. This a mixed integer programming for which several formulations have been proposed in
order to solve it efficiently. One of them consists in considering the different distances and ranking them. Different formulations
have been provided in the literature for this problem. The goal of this project is to carry out a literature review of the problem and
then to use Lagrangian relaxation for solving one ore more of this formulations.
Prerequisites
Recommended reading
- "A new compact formulation for the discrete p-dispersion problem", D. Sayah and S. Irnich. European Journal of Operational
Research 256:62-67 (2017).
Page 97/108
Sparse ordinal regression: understanding perinatal depression with biological and psychosocial data
Project supervisor: Sara Wade
Project type
standard
Suitable degrees
Project description
Perinatal depression describes a clinically significant period of depressive symptoms in mothers around childbirth and can have a
severe impact on the mother, child and family. This study aims to use psychosocial data and inflammation-regulating biomarkers
based on blood samples collected during pregnancy to identify predictors of depression, both antenally and postnatally. Depression
is assessed using the Edinburgh Postnatal Depression Scale. While previous studies have ignored the discrete ordinal nature of the
response, this project will build an ordinal Bayesian regression model to more accurately model the data, with a sparsity-inducing
prior to indentify a sparse set of relevant predictors.
Prerequisites
Bayesian Theory
Recommended reading
"Bayesian Analysis of Binary and Polychotomous Response Data", Albert and Chib (1993).
"The Bayesian Lasso", Park and Casella (2012).
Page 98/108
Statistical modelling of directional data
Project type
standard
Suitable degrees
Project description
Interest in developing statistical methods to analyse directional data dates back as far as Gauss. Such data include:
- wind directions
- vanishing angles of homing pigeons - measured in range (0,2pi) or (0,360)
- times of birth over the day in hours - convert by multiplying by 360/24
- times of death from a single cause over years
This project would study the theory and application of circular data.
Topics would include: (i) basic descriptive directional statistics, (ii) common parametric models, (iii) statistical inference problems
on the circle. In addition, the student will study correlation and regression for directional data.
Application to meteorological data from JCMB weather station will provide an insight into the times of the year when KB is most
windy, and if this relates to other variables.
Prerequisites

Recommended reading
Fisher, N.I. (1993). Statistical Analysis of Circular Data. Cambridge: Cambridge University Press.
Fisher, N.I., Lewis, T. and Embleton, B.J.J. (1987). Statistical Analysis of Spherical Data. Cambridge: Cambridge University Press.
Page 99/108
Mardia, K.V. and Jupp, P.E. (2000). Directional Statistics. New York: Wiley.
Page 100/108
Statistical modelling of Olympic Games
Project type
standard
Suitable degrees
Project description
This project involves investigating data concerning the tally of Gold, Silver and Bronze medals in Olympic Games. The number of
medals a country wins in an Olympic Games is likely to depend on the size of the population of that country. For example, a country
such as the USA with a large population is likely to win more Gold medals than a country such as Luxembourg with a very small
population. This gives countries with large populations an advantage. A fairer comparison might be to take population size into
account when considering the success of a country in the Olympic Games. Similarly, the number of medals won by a country is
likely to depend on the wealth of the country.
The aim of the project is to use statistical modelling involving GLMs, and related models, to compare the performances of countries.
Prerequisites

Recommended reading
Venables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics with S (4th edition). New York: Springer.
Page 101/108
Strategic Servers in Call Centers
Project type
standard
Suitable degrees
Project description
Call centres has been integral part of many organizations both in public and private sector in the last few decades. Our goal in this
project will be to understand the behaviour of agents in call centres and analyze the effects of strategic behaviour of agents. At a call
centre, agents value the idle time they experience and try to balance their effort spent to solve customer problems and the idleness
they experience. Taking the human nature of agents into account, each agent has a different way of evaluating the value of their
effort cost and idleness and this human aspect of the call centre design is generally disregarded in analysis. In this project, we will
employ a game theoretic approach to analyze the equilibrium behaviour of agents and for this purpose we will use recent results on
the analysis of systems with agent heterogeneity.
Prerequisites
Simulation
Stochastic Modelling
Page 102/108
Support Vector Machines in Financial Applications
Project type
standard
Suitable degrees
Project description
Support Vector Machines (SVMs) are a standard method in machinelearning and data mining used to separate points belonging to
two(or more) sets in $n$-dimensional space by a linear or nonlinearsurface. There exist numerous application of SVMs ranging
frompattern recognition, such as hand-written digit recognition andimage recognition, to medical applications such as findingthe
functions of particular genes, finding the most promisingtreatment for particular conditions and many others.Interior point methods
for optimization are one of possibleclasses of algorithms well-suited to solve SVM applications.
This project will require to study and implement an SVM modelfor a particular financial application chosen by the student.
Objective - study the SVM approach, - choose a particular financial application which can be modelled and solved using SVM
approach, - implement the application, - test model's performance on real-life data.
Prerequisites
- fluency in the C programming language,

- very good grasp of ODS.
Recommended reading
M.C. Ferris, T.S. Munson,

Interior-Point Methods for Massive Support Vector Machines,
SIAM Journal on Optimization 13 (2003), 783-804.
http://epubs.siam.org/sam-bin/dbq/article/37437
K. Woodsend and J. Gondzio,

Exploiting Separability in Large Scale Linear Support Vector
Machine Training,
Computational Optimization and Applications (published on line).
http://www.springerlink.com/content/100249/?k=Woodsend
Page 103/108
K. Woodsend and J. Gondzio,
Hybrid MPI/OpenMP Parallel Linear Support Vector Machine Training
Journal of Machine Learning Research 10 (2009) 1937-1953.
http://www.jmlr.org/papers/volume10/woodsend09a/woodsend09a.pdf
Page 104/108
The split delivery vehicle routing problem with stop nodes
Project type
standard
Suitable degrees
Project description
In the classical Vehicle Routing Problem (VRP) there is a fleet of vehicles and a set of customers that demand some products.
Routes must be designed and allocated to the vehicles so that all the demand is served, no vehicle exceeds its capacity, and each
customer is visited exactly once. If we allow that a customer can be visited by more than one vehicle, we then have the Split
Delivery Vehicle Routing Problem (SDVRP), which can lead to some savings.
If we are considering vehicles with a limited autonomy (e.g., electrical vehicles), they have to divide their routes into legs so that
each leg does not exceed their autonomy. We call this problem, when we allow split delivery, the Split Delivery Vehicle Routing
Problemm with Stop Nodes (SDVRPSN). The goal of this project is do a literature review to put this problem into context in the area
vehicle routing, to write a mathematical formulation for the problem, and to design and implement a heuristic algorithm to solve it.
Prerequisites
- Experience with an optimization solver (e.g, Xpress).
Recommended reading
- "The split delivery vehicle routing problem: a survey.", C. Archetti and M.G. Speranza. In B. Golden, S. Raghavan, and E. Wasil
(editors.), "The vehicle routing problem. Latest advances and new challenges", pages 103-122. Springer (2008).
- "A randomized granular tabu search heuristic for the split delivery vehicle routing problem", L. Berbotto, S. García, and F,J,
Nogales. Annals of Operations Research 222:153-173 (2014).
- "A vehicle routing problem with split delivery and stop nodes", L. Berbotto, S. García, and F,J, Nogales. Working Paper 11-09
(06), Statistics and Econometrics Series, University Carlos III of Madrid (2011). Link:
https://e-archivo.uc3m.es/handle/10016/11026
Page 105/108
Understanding the demand for social care in an aging Scotland
Project supervisor: Nicholas Cassidy
Suitable degrees
Project description
Scotland's population is rapidly changing with a growing elderly population forecast to continue to expand. By 2030 the over 65
population is forecast to grow by over 30% to 1.33 million people, meaning this group will constitute almost a quarter of the total
Scottish population. Clearly, such a change will have massive consequences for provision of services, particularly social care. As
yet, however, growth in the elderly population has not been accompanied by an increase in social care provision at the rate that
might be expected.
There are potentially a number of explanations behind this trend. Older people may be living healthier lives or innovative
preventative initiatives may be diminishing the need for long term care. On the other hand, care may be more tightly rationed and
targeted on the highest dependency cases with families and communities doing more for others, a scenario that could lead to
significantly increased demand in future. It may be a combination of all these factors and understanding what shapes the conversion
of demographic change into effective demand needs to be more fully explored and the role of rationing and cost control explicitly
discussed.
The goal of this project is to analyse a range of social care data in order to explore predictors of demand and how these interact with
both the supply of social care provision and other demographic and socio-economic factors at a local level. This is an area of key
strategic importance in Scotland, and the project provides the opportunity to help local and national government shape their
approaches going forward.
Prerequisites
Knowledge of traditional research methods and statistical procedures. The student would be required to utilise a range of publicly
available data sources, and so knowledge of these would be advantageous, particularly health and demographic indicators.
Recommended reading
Local Government Benchmarking Framework National Overview Report available here:

http://www.improvementservice.org.uk/documents/benchmarking/overviewreport1718.pdf
National Records of Scotland Projected Population of Scotland - see PDF report and data available here:
https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/population/population-projections/population-projecti
ons-scotland/2016-based
Scottish Government Medium Term Health and Social Care Financial Framework, for an understanding of expected conversion of
aging population into provision, available here:
https://www.gov.scot/binaries/content/documents/govscot/publications/publication/2018/10/scottish-government-medium-term-healt
h-social-care-financial-framework/documents/00541276-pdf/00541276-pdf/govscot%3Adocument
Page 106/108
Accounts Commission Social Work in Scotland, available here:
http://www.audit-scotland.gov.uk/uploads/docs/report/2016/nr_160922_social_work.pdf
Page 107/108
Validating models of conventional generation for power system reliability assessment (with Astrape)
Project type
standard
Suitable degrees
Project description
Most electric power systems, unless they have significant hydro-electric resource, principally rely on conventional generating
capacity (gas, coal, nuclear and similar) for reliability of supply. Reliability of conventional generation is primarily a matter of
mechanical availability (ie whether it is broken down or not), as fuel is usually available.
The standard model for conventional generation in such calculations makes the very strong assumption that available capacities from
different units are statistically independent. This assumption has, however, rarely been validated with respect to data on observed
outcomes of unit availability. There has also been limited investigation of the appropriate data and methods to use for statistical
estimation even if the assumption of independence were valid (i.e. as it is times of highest demand which matter most in these
calculations, there is a tension between using directly relevant data from those rare times of highest demand, and the need for a
reasonable volume of data for statistical purposes).
High quality time series data on historic generation availability will be supplied by Astrape, a N American consultancy specialising
in power system reliability studies. Astrape will further advise on application issues and current methodology for industrial studies.
Prerequisites

MSc (or equivalent background), the project would be very challenging.
Recommended reading
Presentation from Astrape (particularly Appendix):
https://www.nyiso.com/documents/20142/5020603/Astrape%20presentation%20021519.pdf/8696ef09-28fc-b782-b575-1f79f7a38da
6
Please email Chris Dent if you are interested and would like to see a more detailed Astrape presentation on conventional plant
modelling.
Page 108/108

MSC Dissertation

Diunggah oleh

Informasi Dokumen

Judul Asli

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

MSC Dissertation

Diunggah oleh

Hak Cipta:

Format Tersedia

Student projects

A dense simplex solver for HiGHS

Operational Research with Computational Optimization

Fluent programming skills in C++

Computational Techniques of the Simplex Method by Istvan Maros

Operational Research with Computational Optimization

Fluent programming skills in C++

Computational Techniques of the Simplex Method by Istvan Maros

- Fluency in Matlab, knowledge of linear algebra.

Statistics and Operational Research

Ideally, familiarity with applied statistical modelling, preferably in R.

Familiarity with applied statistical modelling, preferably in R.

machine learning to consider the relevant properties of that trade.

Statistics and Operational Research

Bayesian Data Analysis

Statistics and Operational Research

Ideally, familiarity with applied statistical modelling, preferably in R.

Statistics and Operational Research

Knowledge of R and Bayesian methods.

Statistics and Operational Research

Bayesian Data analysis or Bayesian Theory

Operational Research with Computational Optimization

Statistics and Operational Research

Familiarity with statistical modelling in R, and Generalised linear models.

- fluency in the C programming language,

J. Gondzio, P. Gonzalez-Brevis, P. Munari

J. Gondzio, P. Gonzalez-Brevis, P. Munari

Software available: Primal-Dual Column Generation Method (PDCGM):

Statistics and Operational Research

Knowledge of R. Knowledge of basic nonparametric techniques is useful but not mandatory.

Statistics and Operational Research

Fluent programming skills, preferably C/C++

This project will be carried out in collaboration with RTE (France).

Operational Research with Computational Optimization

The aim of this project is to follow up with more sophisticated approaches:

Statistics and Operational Research

? Analyse models for their statistical differences in population behaviour

? Fit models against experimental data of phagocytosis over time

? Perform statistical model comparison based on fit to data and model complexity

Questions about the project should be directed to Dr Schumacher.

- knowledge of stochastic processes

Statistics and Operational Research

Familiarity with statistical modelling in R, and Generalised linear models.

Statistics and Operational Research

Bayesian Data Analysis

1. Mark A. Beaumont (2010) Approximate Bayesian Computation in Evolution and Ecology

2. Approximate Bayesian Computation in Population Genetics

Ideally, familiarity with applied statistical modelling, preferably in R.

Knowledge equivalent to have take the OREI course.

 Long-term power planning with possible extension to include portfolio optimization.

 Hydro modelling capabilities

[1] See: epis.com/aurora/

OR in the Energy Industry

Fluent programming skills in C++ or C#

Fluent programming skills in C++ or C#

Statistics and Operational Research

This project requires strong programming skills.

Statistics and Operational Research

Generalised regression models; Statistical Programming.

Working with the code we have open-sourced:

An ability to work in Python. Interest in time-series forecasting or online optimisation is an advantage.

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters

Operational Research with Data Science

Long-term power planning with possible extension to include portfolio optimization.

Hydro modelling capabilities

Design some BPP instances that favour some specific strategies

replace the sparsity constraint by a convex penalty

use mixed-integer nonlinear programs to solve the problem

Review the literature on the BPPFO

Develop heuristics to solve the problem

Build an MILP model able to solve the diet problem

Generate a diet schedule for a week for various user profiles

MOBA -- Multiplayer Online Battle Arena (e.g., League of Legends, Dota)

RTS -- Real-Time Strategy (e.g., Warcraft 3, Age of Mythology)

DCCG -- Digital Collectible Card Game (Hearthstone, Magic)

The method should allow for any results to be replicable

Derive and implement a heuristic method.

Derive and implement a heuristic method.