Motivation

It has been recently claimed that human movement is 'highly predictable' with an upper bound of 93% predictability shown.

The interest in such upper limits is fueled by the vast array of applications that benefit from accurate prediction of an individual's future location. Examples include ubiquitous advertising, service provision and intelligent agents both virtual and physical. Knowing the theoretical limits of mobility predictors provides insight into human behaviour in general. However, it also enables designers of applications to understand and work around the constraints with which they are faced at specific spatiotemporal granularities, and provides researchers with additional insights into the design of effective prediction algorithms.

However, knowing an upper bound is only useful if it is relatively tight, i.e. it is close to the true limit of predictability.

The work

In this work we reconsider the derivation of the upper bound to movement predictability and by considering real-world topological constraints we are able to achieve a tighter upper bound. Our results show that the upper bound is actually between 11-24% less than previously claimed.

Particularly at fine-grained spatial and temporal quantization, where a significant number of practical applications lie, these new (lower) upper limits raise serious questions about the use of location information alone for prediction, contributing more evidence that such prediction must integrate external variables in order to satisfy the prediction accuracy requirement of a significant number of pervasive computing (and other) applications.

Download the paper (pre-print) or get it from IEEE Xplore (published version).

Slides from the presentation at PERCOM 2014

Additional material

An expanded proof to that presented as Appendix A.

Replication of results

The code can be found on GitHub in two parts:
1. GPU based Python Library and Python interface for calculating the Entropy Rate.
2. Python code for computing the upper bound on the limit of predictability using both the original method and the new method, including code for replicating the heatmaps from the published work.

The code has been tested on Ubuntu 13.10 x64 on a machine with a NVIDIA GeForce 580 graphics card and CUDA 5.5.
The code has the following dependencies:

Python, Numpy, Scipy and Boost.Python

Ubuntu 13.10:
sudo apt-get install python-numpy libboost-python-dev python-scipy

Matlab (inc. the Parallel Toolbox)

Ubuntu 13.10:
Installed by Matlab installer. Version 2013a tested.

mlabwrap

Ubuntu 13.10:
Download from sourceforge and extract to a working directory.
Modify setup.py by setting MATLAB_VERSION = 7.3
Ensure Matlab is in your PATH variable. Note: Later if you launch from Eclipse the PATH variable in Eclipse may also need to be set.
Some dependencies: sudo apt-get install csh tcsh
Then from the working directory: python setup.py install

healpy

Ubuntu 13.10 (easiest is via pip):
sudo apt-get install python-pip
sudo pip install healpy

utm for python

Ubuntu 13.10:
sudo pip install utm

Boost.NumPy

Ubuntu 13.10:
sudo apt-get install scons
Download zip from this site
Extract and from within the ./Boost.NumPy-master directory:
scons
sudo scons install
NOTE: If you get errors surrounding libbost_numpy.so it may help to make a symlink in /usr/lib, i.e.
sudo ln -s /usr/local/lib/libboost_numpy.so /usr/lib/libboost_numpy.so

CUDA

Ubuntu 13.10:
Install via .deb package provided via NVIDIA (tested version 5.5, used .deb for Ubuntu 12.10)

[Optional] sqlite external interpreter

Ubuntu 13.10:
sudo apt-get install sqlite3

Instructions

• Download and extract both the Predictability Upper Bound project and the Entropy Rate Estimator project from Github.

• Compile the Entropy Rate Estimator library using the instructions in the project readme and copy the library into the LoP folder as libEntropyCalc.so

• Download the Geolife data set. Once downloaded, extract to a working directory and update the paths in GeolifeSymbolisation.py

• Optional: Pre-build the database cache files by running "python GeolifeSymbolisation.py". This will take a significant amount of time as it builds the trajectory database from downloaded Geolife data set and pre-computes all required trajectory quantisations. If this is not done then it will be done on first run of the main code.

• Run "python GeolifeEntropyCalc.py". This computes the upper bounds for all spatiotemporal quantisations investigated in the paper and saves them to a CSV.

• Run "python Graphing.py". This plots the results as heatmaps.

Errors?

If you get:
mlabraw.error: Error using matlabpool (line 144) Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)

Then it could be you are not allowing enough workers in the Matlab Parallel Toolbox configuration. Check the command to open the pool (in openPool.m) runs in Matlab. If it does not either adjust the number of workers requested in this file, or increase the number available.