
The Vocabulary of Big Data


To start the Big Data conversation, we need a shared vocabulary!


Big Data is everywhere, spanning the entire range of academic research. No matter what you do – from humanities to natural sciences, from social sciences to engineering to medicine – you are bound to come across copious amounts of data: this is the outcome of modern technology, which allows us to collect, measure and sample as never before. Yet data on its own, no matter how “Big”, is of little use. The challenge is to distil Big Data into actionable, useful information. This requires a range of tools from mathematics, statistics and computer science – methodologies which can appear intimidating to the uninitiated.


On 26 January 2015, eight Cambridge academics gave short presentations introducing The Vocabulary of Big Data: the range of concepts and ideas which underlie the modern analysis of large data sets. In a maths-free manner – light on technicalities yet rich in content – they aimed to convey the meaning and intuition behind these methodologies.

View the presentations and download the slides using the links below.


Machine Learning

Professor Zoubin Ghahramani introduces machine learning, at the interface between statistics, computer science, and computational neuroscience

Machine Learning - Read More…

Data sources for road traffic modelling

Dr Richard Gibbens demonstrates how analysing messy data can help us to understand our road traffic networks

Data sources for road traffic modelling - Read More…

Statistical aspects of big data

Professor Richard Samworth shows how 21st century data present 21st century statistical challenges, and where researchers at Cambridge can go for help

Statistical aspects of big data - Read More…


Algorithms

Algorithms have entered everyday vocabulary like never before. Professor Anuj Dawar describes the origins of algorithmic thinking, and how the study of algorithms is crucial to developing efficient solutions to big data problems

Algorithms - Read More…

Big data and compressed sensing: How to sample your data in a clever way

Data compression helps us to reduce file sizes so that we can move data around without losing the crucial information. But how can we retrieve the same information while collecting less data in the first place? Dr Anders Hansen shows us how an understanding of sparsity can help answer this question.

Big data and compressed sensing: How to sample your data in a clever way - Read More…
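To give a flavour of the idea: if a signal is sparse (has only a few nonzero entries), it can often be recovered exactly from far fewer random measurements than its length. The toy sketch below (with made-up sizes, not from the talk) recovers a 3-sparse signal of length 50 from just 25 measurements using orthogonal matching pursuit, one standard recovery algorithm.

```python
import numpy as np

# Hypothetical toy problem: a 3-sparse signal of length 50,
# observed through only 25 random linear measurements.
rng = np.random.default_rng(0)
n, m, k = 50, 25, 3

x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = [5.0, -3.0, 4.0]              # the few nonzero entries

A = rng.normal(size=(m, n)) / np.sqrt(m)   # random measurement matrix
y = A @ x                                  # m measurements, m < n

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily add the column most
    correlated with the current residual, then re-fit by least squares."""
    residual, chosen = y.copy(), []
    for _ in range(k):
        chosen.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, chosen], y, rcond=None)
        residual = y - A[:, chosen] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[chosen] = coef
    return x_hat

x_hat = omp(A, y, k)
error = np.linalg.norm(x_hat - x)
```

Despite having only half as many measurements as unknowns, the sparse signal is recovered essentially exactly; this is the phenomenon compressed sensing makes precise.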

Large-scale Data Processing

As data volumes get larger, processing efficiency must increase. Dr Eiko Yoneki shows how concepts like scale-up, scale-out, cloud computing and graph parallel processing are revolutionising our ability to deal with massive data volumes

Large-scale Data Processing - Read More…
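The "scale-out" idea can be sketched in a few lines: split the data into partitions, process each partition independently in parallel (the map step), then merge the partial results (the reduce step). This toy word count (illustrative only; real systems distribute the partitions across many machines) shows the pattern.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# A tiny corpus standing in for a massive data set.
corpus = [
    "big data needs big tools",
    "data beats tools",
    "big ideas",
]

def count_partition(lines):
    """Map step: compute local word counts for one partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# Two partitions standing in for two machines (scale-out).
partitions = [corpus[0::2], corpus[1::2]]

with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_partition, partitions))

# Reduce step: merge the partial counts into one global result.
total = Counter()
for partial in partial_counts:
    total += partial
```

Because the map step touches each partition independently, adding more machines (or workers) lets the same computation handle proportionally more data.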

The blessing and the curse of the big image

Mathematics is vital to understanding the underlying properties of images, particularly in today's big data world. Dr Jan Lellmann shows how a range of mathematical techniques can reveal new insights for science, medicine and art

The blessing and the curse of the big image - Read More…

Natural Language Processing

Turning computers into proficient readers of 'natural languages' (i.e. those spoken by humans) can be surprisingly complex. Dr Paula Buttery describes how Natural Language Processing is used in training computers to read, digest and understand written text, as well as some of the challenges that such data throw up

Natural Language Processing - Read More…
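Even the very first step of reading text, splitting it into tokens, hints at why this is hard. In the hypothetical example below (not from the talk), a naive whitespace split leaves punctuation glued to words, while a simple rule-based alternative separates punctuation but mangles contractions and abbreviations instead.

```python
import re

sentence = "Dr Smith didn't arrive until 9 p.m., did he?"

# Naive approach: split on whitespace; punctuation stays glued to words,
# so "p.m.," and "he?" each come out as a single token.
naive = sentence.split()

# A simple rule-based tokenizer: runs of word characters, or single
# punctuation marks. Punctuation is now separated, but "didn't" breaks
# into "didn", "'", "t", and the abbreviation "p.m." falls apart too.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
```

Neither rule is right, which is why practical NLP systems rely on carefully engineered or statistically trained tokenizers rather than simple string splitting.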