This is a guest post from Dan Pett of the Portable Antiquities Scheme.
About three months ago, Matt contacted the Portable Antiquities Scheme via Twitter to ask if we wanted to be involved in his History Hack Day event, which is now imminent. Our project, which is funded via the DCMS, is based at the British Museum, has staff members around England and Wales, and records archaeological discoveries made by the public. The project has been running since 1997, and nationally (covering England and Wales) since 2003, with our entire dataset available online from April 2003. Many of you will have come across our work through the sensational discoveries that have made the media over the last few years, for example the amazing Anglo-Saxon Staffordshire Hoard, the Crosby Garrett Helmet and the Frome Hoard. All of these objects were found by metal detector users, and around 70% of our data comes from the hobby sector. All of these data are collected and published on our website under a Creative Commons Attribution Non-Commercial Share-Alike licence (we have over 60 partners, so getting them to agree to this was quite a feat!).
Staffordshire hoard
Frome hoard
Crosby Garrett helmet
In March 2010, a new website was launched (costing only £48 to rebuild, plus the cost of two new servers), with all our web resources consolidated in one place and these data made available in a wide variety of formats for consumption by a diverse (but very niche!) audience. Our aim is to record as many of the archaeological objects found annually as we physically can (in the last few months we have started to crowdsource records via public data entry), with georeferenced findspots that can be made available at high resolution to academics for research. At the time of writing, we have c. 675,000 objects from 18,500 contributors available for academic use, with around 500,000 objects available to the public. The nature of our records and people's privacy means that we do have some issues with releasing full-resolution data online. For example, some sites are extremely sensitive in archaeological terms; for others, the landowner requests that we do not publish the location where the objects were found; and our workflow process removes some records from public view, as they are usually unfinished. At the moment, our data is also undergoing a huge geospatial audit to eradicate mistakes in find locations. Some of our findspots appear to be in the sea or in the wrong county; as with all databases, there are always some errors.
Our website and database (which is still very much a beta iteration) is built on the latest Zend Framework (PHP) and uses Linux and MySQL to power the services, with a large sprinkling of API use and a liberal dosage of YQL to enhance and supplement the data that the public voluntarily reports and records online. It has been built entirely by the Scheme's ICT Adviser (Daniel Pett - @portableant). At the moment we use APIs from:
- Flickr
- Geoplanet
- Amazon web services (book search and also S3 for backup and storage)
- Akismet
- Twitter
- Wordpress XML-RPC
- Google maps - using layers from OpenStreetMap and the National Library of Scotland
- Google analytics for determining most visited content etc
- Gravatar
- Delicious
- Open Calais
- The British Museum's Collections Online Opensearch module
- ReCaptcha
- They Work for You (Parliamentary data)
- The Guardian's open platform for retrieving articles related to our work
- dbPedia
- Pleiades
- Facebook graph
Our website is heavily reliant on YQL for querying these APIs to produce data for consumption and redisplay (probably a bad thing in some ways; if you are going to use YQL on a high-traffic site, consider using the OAuth endpoint). Using YQL, we have been able to pull in all our images from our Flickr feed and redisplay them on our website; retrieve geodata from Yahoo! GeoPlanet in the form of WOEIDs and the subsidiary information that they hold (a good standalone PHP package that does not need YQL has been created by Tyler Bell and can be accessed on GitHub); and find objects within an MP's constituency, for example that of David Laws, which you can also get as KML or JSON, using data from TheyWorkForYou. Other web services that have proved extremely useful include dbPedia: we have guides for people to learn more about coins in different periods, and these have been embellished with details from dbPedia (abstract, date of birth etc.) using a SPARQL query and cleaning up the response for redisplay. One example is the database entry for Cnut the Great. Cnut is famous for trying to show that his powers extended over the sea (Henry of Huntingdon chronicled this tale) when he commanded the tide to stop. Needless to say, it didn't.
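For anyone who wants to try the dbPedia enrichment without YQL, here is a minimal Python sketch of the kind of SPARQL lookup described above. It is not the query our site runs: the resource name `Cnut_the_Great`, the choice of `dbo:abstract` and `dbo:birthDate`, the `format` parameter and the helper names are all assumptions made for illustration, and the code only builds the request URL rather than fetching it.

```python
from urllib.parse import urlencode

# Public dbPedia SPARQL endpoint (assumed to be the one you want to query).
DBPEDIA_SPARQL = "http://dbpedia.org/sparql"


def ruler_query(resource):
    """Build a SPARQL query for the English abstract and birth date of a resource.

    `resource` is the last part of a dbPedia resource URI, e.g. "Cnut_the_Great"
    (an assumed name; check that the resource actually exists).
    """
    uri = f"http://dbpedia.org/resource/{resource}"
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract ?birthDate WHERE {{
  <{uri}> dbo:abstract ?abstract .
  OPTIONAL {{ <{uri}> dbo:birthDate ?birthDate . }}
  FILTER (lang(?abstract) = "en")
}}
""".strip()


def query_url(resource):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {
        "query": ruler_query(resource),
        "format": "application/sparql-results+json",  # assumption: endpoint honours this
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)


if __name__ == "__main__":
    print(query_url("Cnut_the_Great"))
```

The birth date is wrapped in `OPTIONAL` because many early rulers have no recorded date, and the abstract still needs cleaning up before redisplay.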
The above shows how third-party services have been integrated to build, and enrich the content of, a website that details historical and archaeological discovery. However, this site can also be leveraged for the HistoryHackDay challenge. We have made available a snapshot download of our database in CSV format, which supplements the data that you can retrieve from the online site (context-switched views allow for XML, KML, RSS, Atom and JSON responses, and a full API is nearly complete).
For example, a search for gold from Bedfordshire can be represented as an HTML response at: http://www.finds.org.uk/database/search/results/material/23/county/BEDFORDSHIRE/ and as a JSON response by adding the format/json parameter, as shown here: http://www.finds.org.uk/database/search/results/material/23/county/BEDFORDSHIRE/format/json or as KML at http://www.finds.org.uk/database/search/results/material/23/county/BEDFORDSHIRE/format/kml (At present the RSS/Atom feeds don't contain image data, but this might change by this weekend, and some JSON feeds are being tweaked as you read this.) The XML response is compliant with the MIDAS specification that was developed by the Heritage Standards group and maps to the CIDOC-CRM (if you like that sort of thing....)
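The URL pattern above is simple enough to script. The following Python sketch assumes that search terms always appear as key/value path segments, exactly as in the three example URLs; the function name `search_url` is hypothetical and the code does not contact the server.

```python
from urllib.parse import quote

# Base path taken from the example URLs in this post.
BASE = "http://www.finds.org.uk/database/search/results"


def search_url(params, fmt=None):
    """Build a search URL from key/value path segments.

    params: mapping such as {"material": 23, "county": "BEDFORDSHIRE"}
    fmt:    optional response format, e.g. "json" or "kml"; None gives HTML.
    """
    parts = [BASE]
    for key, value in params.items():
        # Percent-encode each segment so spaces or slashes cannot break the path.
        parts.append(quote(str(key), safe=""))
        parts.append(quote(str(value), safe=""))
    if fmt is None:
        return "/".join(parts) + "/"
    return "/".join(parts + ["format", fmt])


if __name__ == "__main__":
    gold_in_beds = {"material": 23, "county": "BEDFORDSHIRE"}
    for fmt in (None, "json", "kml"):
        print(search_url(gold_in_beds, fmt))
```

Passing the returned JSON URL to any HTTP client should give you the same result set as the HTML page, subject to the feed tweaks mentioned above.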
Put simply, any search run through the MySQL-powered search engine can return data in formats that should help you produce something interesting. If you are interested in data mining our database, then OAI-PMH might be of interest to you; this can be reached via our target, with instructions at http://finds.org.uk/database/oai, using various metadata schemas. Perhaps you could use Omeka and its harvester plugin to create a site of objects from various collections for a region of England.
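If you would rather harvest by hand than through Omeka, the OAI-PMH protocol itself is just HTTP GET requests with a `verb` parameter, plus a `resumptionToken` for paging. The Python sketch below shows that shape. The endpoint constant is an assumption (the address above is the instructions page, and the real request URL may differ), `oai_dc` is the Dublin Core prefix that every OAI-PMH repository must support, and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: the harvesting endpoint lives at the address given in the post.
OAI_ENDPOINT = "http://finds.org.uk/database/oai"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def oai_request(verb, args=None):
    """Build an OAI-PMH request URL, e.g. verb="ListRecords"."""
    params = {"verb": verb}
    params.update(args or {})
    return OAI_ENDPOINT + "?" + urlencode(params)


def resumption_token(xml_text):
    """Return the resumptionToken from a response, or None on the last page."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//{OAI_NS}resumptionToken")
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


if __name__ == "__main__":
    # First page of a harvest in Dublin Core.
    print(oai_request("ListRecords", {"metadataPrefix": "oai_dc"}))
    # Follow-up pages carry only the verb and the token from the previous page.
    print(oai_request("ListRecords", {"resumptionToken": "example-token"}))
```

A harvest loop fetches the first URL, reads the token from the response and keeps requesting until `resumption_token` returns `None`.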
The data snapshots that are available include data relating to county of discovery, object type, numismatic details (coins) etc. These should provide a good starting point for some visualisation work, but be aware that there is HTML in several fields:
a) a set with 1km grid references (c. 122,000 records). Due to the privacy concerns mentioned above, the full data cannot be released, so 300,000 findspots cannot be made available. These can be manipulated for lat/lon pairs and Yahoo or GeoNames lookups quite easily (this has been done for the full-resolution references already, but it challenges you....)
b) a second set with county level data - c. 382,000 records (not all of these have findspots)
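As a starting point for the grid reference challenge, here is a hedged Python sketch of the first step: turning a grid reference into metric eastings and northings. It assumes the snapshot uses standard Ordnance Survey National Grid references with two letters and an even number of digits (for example `TL0549` for a 1km square); check the CSV before relying on that. It returns the south-west corner of the square and does not go as far as latitude and longitude, which needs a further projection and datum conversion.

```python
def grid_to_en(gridref):
    """Convert an OS grid reference to (easting, northing) in metres.

    The result is the south-west corner of the referenced square.
    """
    ref = gridref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    half = len(digits) // 2
    if (len(letters) != 2 or not letters.isalpha() or "I" in letters
            or len(digits) % 2 or half > 5
            or (digits and not digits.isdigit())):
        raise ValueError(f"not a valid grid reference: {gridref!r}")

    # Letter positions in the 25-letter grid alphabet, which skips 'I'.
    l1, l2 = (ord(c) - ord("A") for c in letters)
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # Index of the 100km square, counted from the grid's false origin.
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)

    # Pad the digits on the right to metre precision (5 digits each).
    east = int(digits[:half].ljust(5, "0")) if half else 0
    north = int(digits[half:].ljust(5, "0")) if half else 0
    return e100km * 100000 + east, n100km * 100000 + north


if __name__ == "__main__":
    print(grid_to_en("TL0549"))
```

To plot the centre of a 1km square rather than its corner, add 500 metres to each value before converting to lat/lon.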
There are a variety of other museum and university based resources that you can access to help with hacks that you might build. For example, the collaborative Pleiades project, which is currently NEH funded, has data for 31,559 ancient places, 26,060 ancient names and 31,527 ancient locations. Great museum APIs can be accessed via a variety of YQL tables, for example the Victoria and Albert (test search for object ID 012345), Brooklyn Museum (needs an API key), Digital NZ (API key needed), Black Country History Museums (test search for painting), Museum of London (test search for Roman sites), and the British Museum (test search for Egypt). And then of course there is the Culture Grid API; others will cover that in more detail.
A very simple example hack shows a search for Egypt across a variety of institutions using YQL for multiple queries of disparate resources. Another great example (the mashificator) of querying various museum resources was created by Jeremy Ottevanger from the Imperial War Museum, but this doesn't use YQL and is driven by Jeremy's magic.
If you want to talk more about using our data for any of your hacks this weekend, get in touch via twitter at @portableant or find Dan on the day.