The web is an integral part of our daily lives, whether we are shopping online, booking cinema tickets, registering to vote or checking whether it is going to rain today. It is also of enormous importance to researchers in the humanities and social sciences: as the site of digitised historical material, as a primary source in its own right, and as a means of promoting and communicating research to the widest possible audience. It is hard to imagine how you would write the history of the late 20th and early 21st centuries without access to all of this data. Where once we had handwritten diaries, we now have blogs; letters have been superseded by Facebook status updates; our newspapers have become major online resources; and carefully curated Flickr collections have taken the place of photo albums.
Archiving this vast range of material increasingly occupies national memory institutions such as the British Library and The National Archives in the UK. However, as things stand, we do not have the expertise, the tools or indeed the legal framework needed to exploit this invaluable resource effectively. This presentation will explore some of the challenges of working with the archived web, and focus in particular on the work of the Big UK Domain Data for the Arts and Humanities project.