Research workshops and challenge areas

C2D3 works with groups of researchers, Departments, and other Interdisciplinary Research Centres, Strategic Research Initiatives and Networks to promote collaboration, share ideas, scope out key research questions in data science and to support the development of interdisciplinary funding proposals.

Our members are eligible to apply for small amounts of seed funding for the organisation of research workshops (for more information, see the call for proposals).

Research Workshops

Research Workshops are an important part of our programme. Workshops range from half-day meetings, at which researchers in Cambridge Departments discuss specific research questions, to multi-day conferences with external speakers and delegates. Workshop outcomes include strengthened networks, intellectual exchange and the development of new project ideas.

Data Science School: Machine Learning applications for life sciences (Online)

17-22 September 2020

Hosted remotely by the University of Cambridge Bioinformatics Training Facility via Zoom, using hosted UNIX environments accessed through a web browser. Organiser: Louisa Bellis, University of Cambridge.

Speakers: Marta Milo, AstraZeneca; John Thomas, University of Cambridge; Mario Guarracino, National Research Council, Naples, Italy; Javier Gonzalez Hernandez, Microsoft Research; Magdalena Strauss, the Sanger Institute, Cambridge; Neil Lawrence, University of Cambridge.

The course was attended by 46 participants, including 20 external participants from the Crick Institute, The Royal Veterinary Society, Kymab, Unilever, AstraZeneca, University of Oxford, UCL, KCL, Queens College London, Imperial, the Sanger Institute, NHS Addenbrooke's Hospital and KTH Royal Institute of Technology, Sweden.

This School aimed to familiarise biomedical students and researchers with the principles of Data Science. Focusing on the use of machine learning algorithms to handle biomedical data, it covered the effects of experimental design, data readiness, pipeline implementations, machine learning in Python and related statistics, as well as Gaussian Process models. By providing practical experience in implementing machine learning methods relevant to biomedical applications, including Gaussian processes, the School illustrated best practices that should be adopted to enable reproducibility in any data science application.

C2D3 Hierarchical Modelling Workshop

25 February 2020

Held at the Maxwell Centre, West Cambridge, University of Cambridge, led by Sylvia Richardson and Mark Girolami.

Bayesian hierarchical modelling (BHM) is one of the most powerful modern statistical techniques. It provides a unifying framework for dealing with diverse sources of complexity arising from the structure (e.g. dependence) of the data and its associated measurement process. The hierarchical model-building strategy involves defining latent, unobserved quantities of interest, organised into a number of levels with distinct interpretations, and building probabilistic relationships between the latent quantities and the data. Coupled with efficient computational tools, Bayesian hierarchical models have been used successfully in a very wide range of application areas (e.g. epidemiology, social sciences, education, geography, environmental sciences, biomedicine and political sciences).

Because of its generic character, this modelling strategy has the potential to bring together scientists from a wide range of disciplines across the University. Computationally, it also raises a number of algorithmic challenges that could provide useful topics for interaction.

C2D3 hosted a workshop to bring together academics interested in BHM and to develop research areas for the programme. Following an introduction to BHM from Sylvia Richardson and Mark Girolami, the largely unstructured half-day programme allowed a free-flowing and open discussion. The workshop led to a research grant application covering several application areas, bringing researchers from across the University together to work with one another for the first time.

Autumn School: Data Science: Machine learning applications for life sciences

23-26 September 2019

Held at Craik-Marshall Building, Downing Site, University of Cambridge and organised by Dr Gabriella Rustici (Head of Bioinformatics Training, University of Cambridge; Associate Director of Training, HDR UK), and Dr Marta Milo (University of Sheffield).

The Autumn School gave 44 biomedical students and researchers the opportunity to explore the principles of Data Science through a multidisciplinary approach. Focusing on the use of machine learning algorithms to handle biomedical data, the Autumn School covered the effects of experimental design, data readiness, pipeline implementations, machine learning in Python and related statistics, as well as Gaussian Process models. By providing practical experience in implementing machine learning methods relevant to biomedical applications, including Gaussian processes, the event illustrated best practices that should be adopted to enable reproducibility in any data science application.

Cambridge Networks Day 2019 (6th Edition)

29 August 2019

Network Science is an interdisciplinary field whose methods are widely applied to problems and datasets in fields as diverse as computer science, ecology, neuroscience, archaeology, medicine, economics, social sciences and engineering. The Cambridge Networks Network (CNN) brings together academics from across the University and beyond who share an interest in Network Science.

The event attracted 140 registrations from a diverse range of interdisciplinary backgrounds, with technology, physical sciences, biological sciences and the humanities well represented. Early Career Researchers were well represented among the poster presenters, the poster prize winners and the travel grant recipients.

CNDay 2019 was kindly supported by The Alan Turing Institute, C2D3 and King's College Cambridge.

Machine Learning for Environmental Sciences 2019

17-18 June 2019

This workshop, organised jointly by the British Antarctic Survey (BAS) and the University of Cambridge, followed on from the 2017 conference on Environmental Science in the Big Data Era (also hosted by BAS and University of Canterbury).

An improved understanding of the natural environment, and the ability to predict future changes, are crucial for society and the global economy. With ever-growing volumes of data produced by both increased environmental modelling capability and technological advances in earth observation systems, techniques to harness the power of this data and extract useful information have never been more important. Recent years have seen an acceleration in the application of Data Science techniques within the environmental sciences. Applying Machine Learning in this new area has also raised a number of new and interesting challenges for the data science community, with new data challenges requiring bespoke machine learning tools to deliver the next wave of scientific breakthroughs.

Activities included in the workshop:

Recognising recent achievements and proven concepts in the research area, with keynote talks from Claire Monteleoni (University of Colorado) and Emily Shuckburgh
Presentations from workshop participants on preliminary work and results
Early Career Researchers were encouraged to attend and were offered a reduced registration fee
A hands-on data challenge, designed to be inclusive of all levels of expertise and to encourage participants to work together and share expertise
A conference dinner to further encourage networking and to enhance potential collaborations.

Advances and Challenges in Machine Learning Programming Languages

20-21 May 2019

The development of machine learning programming languages is critical to supporting the research and deployment of ML solutions as data size and model complexity grow. These languages often offer built-in support for expressing machine learning models as programs and aim to automate inference, either through probabilistic analysis and simulation or through back-propagation and differentiation. Machine learning languages enable models to be deployed, critiqued and improved, support reproducible research, and lower the barrier to the use of these methods.

This workshop brought together researchers from both academia and industry to discuss recent advances and challenges in the development of, and research into, machine learning languages.

The workshop was supported by the American Statistical Association, C2D3, the Alan Turing Institute, the Isaac Newton Institute (University of Cambridge) and the International Society for Bayesian Analysis.

Winter School: 5th International Winter School on Big Data

7-11 January 2019

Held at the Department of Engineering in collaboration with the Institute for Research Development, Training and Advice.

BigDat 2019 was a research training event with a global scope, aimed at updating participants on the most recent advances in the critical and fast-developing area of big data. The area covers a wide spectrum of current research and industrial innovation, with the potential for a major impact on scientific discovery, medicine, engineering, business models and society itself. Renowned academics and industry pioneers lectured and shared their views with the audience.

Most big data subareas were covered, namely foundations, infrastructure, management, search and mining, security and privacy, and applications (to biological and health sciences, business, finance and transportation, online social networks, etc.). Major challenges in the analytics, management and storage of big data were identified through 2 keynote lectures, 24 four-hour courses and 1 round table, which tackled the most active and promising topics.

An open session gave participants the opportunity to present their own work in progress in five minutes each. There were also two special sessions focused on industry and recruitment.

The event also provided an avenue to promote C2D3 and the Alan Turing Institute, and to advertise job opportunities at the University and at Aviva.

Cambridge Big Data Research Symposium

26 November 2018, Sainsbury Laboratory Cambridge University (SLCU)

This one-day Symposium showcased cross-disciplinary research, and also highlighted research challenges, with a particular focus on projects involving biosciences and clinical medicine.

Ethics of Big Data Workshop

10 June 2016

The workshop supported an interdisciplinary conversation at the University of Cambridge about the ethics of big data research. Its aims were both to raise awareness of ethical issues associated with big data and to contribute to the development of material for the Research Group’s digital reader - a publicly accessible, interactive online resource on the ethics of big data research.

We invited speakers from the worlds of academia and policy to discuss the ethical challenges of big data research. The Ethics of Big Data team also presented an overview of our findings from the year’s programme of activities, including the development of innovative formats for fostering discussion about ethics in research, such as the performance of a mock ethics review.

The Ethics of Big Data Research Group are developing an Ethics of Big Data reader from the discussions over the past year. This will include case studies, performance notes and scenarios for researchers looking to stage a mock ethics review panel as a tool for engaging researchers, students or practitioners in other contexts in discussions about ethics in big data research. A journal article is also in preparation.

The Research Group's programme of events included:

Human-Data Interaction (20 April 2015, 14:00 - 16:00)
What is Big Data? Discovery through a Data Walkshop (7 October 2015, 14:00 - 16:00)
Inside Snowden’s suitcase (21 October 2015)
Ethics of Big Data in practice: Health and Policy research in Africa (13 January 2016, 12:00 - 14:00)
Ethics of Big Data in practice: Patient record linkage in hospitals (27 January 2016, 12:00 - 14:00)
Ethics of Big Data in practice: Administrative data (10 February 2016, 12:00 - 14:00)
Ethics of Big Data in Social Media Research (24 February 2016, 12:00 - 14:00)

Data Science for Smart Infrastructure

31 May 2016

This collaborative workshop brought together researchers in data science with the Centre for Smart Infrastructure and Construction to address challenges in the management of data from distributed sensor networks, as well as techniques for the analysis and optimisation of traffic data.

Our Digital Future - Multidisciplinary Perspectives on Long Term Data Preservation and Access

14-15 March 2016

The conference brought together 72 delegates from institutions across the UK and Europe to discuss approaches to the preservation and long-term curation of digital data in a range of disciplines.

The focus of the conference was largely on scientific and research data, but it also looked at the challenges facing national archives and memory institutions. Future areas for research include personal data archives, particularly those involving new forms of data, such as social media, online interactions, email and photos, which have unique sensitivities, for example their vulnerability to changing commercial policies and the discontinuation of services.

The specific objectives were:

- To assemble a broad and diverse community of interest

- To identify key shared challenges and share knowledge and expertise in digital preservation

- To better define the required areas of research, including technology research

- To assess and define additional areas of training, education and skills development in long-term data preservation for science and research

- To inform the case for sustained investment in preservation and in education around preservation models and their associated cost

The range of disciplines covered in the talks included high energy physics, astronomy, infrastructure modelling, bioinformatics, libraries, archives, history, policy, medical research and law.

Videos of the keynote talks, and slides for the majority of the talks on both days of the conference are available at the conference website, along with a report.

Big Data, Multimodality & Dynamic Models in Biomedical Imaging

In partnership with the EPSRC POEMS Network and the Turing Gateway to Mathematics

9 March 2016

We are currently experiencing many exciting new developments in imaging technology in biology and medicine. New advances in tomographic imaging, such as photoacoustic tomography, electron tomography, multicontrast magnetic resonance tomography (MRT) and combined MR with positron emission tomography (PET), as well as new microscopy technology such as lightsheet microscopy, mark only the beginning of an era that is revolutionising the extent of what we can see. New imaging technology always goes hand in hand with the need for mathematical models that maximise the information gained from these novel imaging techniques.

This one day meeting aimed to bring together those working on advances in imaging technology with researchers who investigate new image analysis methods, to help address these challenges. In particular, there was a focus on the following topics:

Big data problems and solutions
Multimodality
Dynamic imaging

Manufacturing Analytics: The role of Big Data in the Future of Manufacturing

1 February 2016

The Institute for Manufacturing (IfM) hosted a workshop exploring the role Big Data will play in the future of manufacturing. The objective of the meeting was to identify research priorities that address the specific challenges manufacturers encounter in using data science. Four key topics were discussed:

• Challenges of Big Data analytics in manufacturing
• Best practices and applications for manufacturing analytics
• Technologies and ICT infrastructure for Big Data analytics and deployment
• New business models for smart manufacturing systems

The workshop identified seven main challenges of big data analytics in manufacturing: (1) awareness and acceptance, given resistance to cultural change; (2) the need for Big Data standardisation; (3) data management issues such as data quality and integration; (4) financial constraints, such as the perceived value of big data; (5) the knowledge and skill set needed to implement analytics solutions; (6) policy and government support; and (7) the need for repeatable best-practice implementation processes.

A report on recommendations for future research is forthcoming.

High Dimensional Big Data Engineering

22 January 2016

This EPSRC-funded workshop brought together high-dimensional big data researchers from academia and practitioners from industry. The invited speakers presented state-of-the-art research and cutting-edge technologies, covering both the theoretical foundations of big data analysis and the algorithms and data structures required for high-performance analysis, indexing and search. As well as enhanced collaboration networks, outcomes included:

A publication in IEEE Transactions on Big Data on the approximation of high-dimensional data
A publication at the IEEE INFOCOM conference on how to transmit data efficiently
A working paper on an optimal data structure, in preparation for submission
A functioning system for high-dimensional data (the Kvasir project)
A successful proposal for a workshop on “Advances in High-Dimensional Data” at the IEEE Big Data Conference in December 2016 (http://cci.drexel.edu/bigdata/bigdata2016/). The website of the previous year’s workshop can be found here.

Neurocomputation: from Brains to Machines

In collaboration with Cambridge Neuroscience

25 November 2015

On 25 November 2015, over 70 researchers from across the University of Cambridge gathered at Corpus Christi College for an interdisciplinary workshop, Neurocomputation: from brains to machines, chaired by Professor Zoe Kourtzi and organised by Cambridge Neuroscience with support from Cambridge Big Data. The workshop aimed to advance our understanding of how biological and artificial systems solve sensory and motor challenges, and brought together speakers from a range of disciplines, from cognitive neuroscience and brain imaging to engineering, computer vision and robotics. The goal was to encourage dialogue using a common language of computational techniques that allow us to extract informative signals from rich biological data and to design artificial systems with practical applications.

A recurring theme of the workshop was the progress achieved in developing computational models of the brain that are also biologically plausible, highlighting the importance of a continuing dialogue between biological scientists and engineers to uncover, and take inspiration from, the mechanisms underlying brain function and cognition.

See here for a summary of the talks.

Big Data Methods for Social Science and Policy

In collaboration with Cambridge Public Policy

24 September 2015

Big data is making a growing contribution across the social sciences and public policy. While there are many practical, technical and ethical questions associated with this trend, a recent Cambridge workshop focused on its practical application across the social sciences. The workshop was hosted by two of the University's interdisciplinary research initiatives, in Big Data and in Public Policy. It brought together researchers from a range of departments and disciplines to present and discuss big data applications in social science research and public policy.

A full report of the workshop is available here.

Data and Sensing in Extreme Environments

In collaboration with the British Antarctic Survey

7 September 2015

Remote and extreme environments present new challenges for scientific data acquisition, processing and transfer. From extremely cold and remote locations in the Antarctic to novel geotechnical sensing technologies and the Big Data challenge of the Square Kilometre Array, which will operate from remote desert locations, scientists share challenges of accessibility, physical conditions, power supply and networking.

This joint workshop between the University of Cambridge and the British Antarctic Survey brought together researchers, developers and engineers addressing complementary challenges in data acquisition, transfer and processing, to share knowledge, develop new connections and collaborations, and experience hands-on demonstrations of state-of-the-art data acquisition and processing technology.

Big Data in Medicine: Exemplars and Opportunities in Data Science

19 June 2015

The data generated by medical care and medically relevant research are rapidly becoming bigger and more complex, particularly with the advent of new technologies. Our ability to advance medical care and efficiently translate science into modern medicine is bounded by our capacity to access and process these big data. From human genetics and pathogen genomics to routine clinical documentation, from internal imaging to motion capture, from digital epidemiology to pharmacokinetics, and from treatment pathways to life course assessment, the big Vs of Big Data (volume, variety, velocity and veracity) abound in medicine. Statistical, mathematical, visualisation and computational approaches from a wide range of disciplines, as well as systems for innovative ICT-based interventions, are needed to keep pace with the complexity of Big Data and to advance medicine.

On 19 June 2015 at the Cancer Research UK Cambridge Institute, Cambridge-based researchers from all Schools of the University and local research institutes, the pharmaceutical industry and our funding and commissioning partners met for an afternoon of talks demonstrating methods and opportunities for harnessing Big Data in medicine.

The Vocabulary of Big Data

26 January 2015

Big Data is everywhere, spanning the entire range of academic research. No matter what you do, from humanities to natural sciences and from social sciences to engineering to medicine, you are bound to come across copious amounts of data: this is the outcome of modern technology, which allows us to collect, measure and sample. Yet data on its own, no matter how “Big”, is of little use. The challenge is to distil Big Data into actionable, useful information. This requires a range of tools from mathematics, statistics and computer science, methodologies that might appear intimidating to the uninitiated.

On 26 January 2015, eight Cambridge academics gave short presentations introducing The Vocabulary of Big Data: the range of concepts and ideas that underlie modern analysis of large data sets. In a maths-free manner, light on technicalities yet rich in content, they aimed to outline the meaning and intuition behind the methodologies.

Research Challenges

Research challenges are areas of interdisciplinary strength at Cambridge, for which C2D3 provides ongoing support. This includes assisting in the organisation of workshops and seminar series, preparing applications for research funding, and engaging external partners to develop new collaborations and research directions. C2D3 currently supports the following Challenge areas:

Algorithms and Systems for Energy Efficient Computing

Joint activity between the Energy@Cambridge and Big Data Strategic Research Initiatives (2015- )

Algorithms and Systems for Energy Efficient Computing is a joint Grand Challenge between the Energy@Cambridge and Big Data Strategic Research Initiatives. The ICT industry has been a major stimulus for worldwide economic growth over the last two decades, but this has come at a cost in terms of growing energy demand. Increased energy efficiency is required, not just for environmental reasons, but also to exploit the opportunities offered by Big Data concepts. This Grand Challenge focuses specifically on energy-efficient algorithms; novel energy-efficient architectures for Big Data challenges; and energy-efficient system design, data management and programming models, with strong links to the Energy@Cambridge Grand Challenge in Materials for Energy Efficient ICT.

The Ethics of Big Data

A CRASSH Faculty Research Group (2015- )

As a society we are creating ever-larger volumes and varieties of data, which are also being shared at increasing velocities. The embedding of sensor networks in ‘smart cities’, the rapid expansion of mobile phone and particularly mobile internet use, and the growth of social, political and cultural interactions on social media platforms are some of the factors behind this phenomenon. Methods and tools for the computational analysis of such massive and complex datasets are being adopted in a wide range of settings by governments, international institutions, corporations, civil society organisations and academic researchers. However, the growing prevalence of big data research across the disciplines has significantly outpaced our knowledge of its ethical ramifications (boyd and Crawford 2012).

The aims of this interdisciplinary Ethics of Big Data Research Group are to explore these ethical ramifications and to develop concrete resources for scholars conducting big data research. In addition, the Research Group intends to contribute to our understanding of research ethics more broadly in terms of their relationship to rapidly evolving research practices and in terms of how they translate across disciplines.

The programme of the research group for the first term will focus on the ‘what’, ‘how’, ‘who’ and ‘why’ of the Ethics of Big Data, which we will explore through a series of public seminars and workshops.

What is big data?
How is big data produced?
Who creates, collects and researches big data?
Why do ethical approaches to big data matter?

These questions naturally provoke others: What is old and what is new in big data research? Are there specific ethical challenges arising from big data research? Which ethical frameworks can we use or adapt to meet those challenges?

Cambridge Centre for Data-Driven Discovery