Legacy Excavations and Linked Open Data: A Virtual Vision of Sir Leonard Woolley’s Ur

Figure 1: Woolley’s large workforce in action

By: W.B. Hafford, University of Pennsylvania

Digital data plays an ever-increasing role in archaeology. Archaeologists use computers for virtually every task, from artifact recording to site mapping, and the amount of data we gather is staggering. This is a good thing, but proper management and archiving of the data can overwhelm a dig crew. Take field photos, for example. Sir Leonard Woolley, digging at the ancient city of Ur some 90 years ago, took 2,350 photos over twelve seasons. A modern excavation could easily take that many in one season, perhaps even in one week. With digital cameras in the hands of every trench supervisor and potentially every excavator, no angle of the site need go unrecorded. Collecting and labeling every photo, however, is tedious and is not always done in a way that lets future archaeologists make sense of the system and recover every image of a specific area. Even Woolley’s comparatively small collection of photos has lost some of its identifying data or never had it attached in the first place.

In digital terms, the necessary organization involves assigning metadata: information that allows a particular image or document to be identified and its organizational schema to be reconstructed. Where once we put papers into filing cabinets or stuck notes into artifact bags or on the backs of photo prints, we now must tag digital files and folders (while continuing to keep notes and papers as well). Moreover, as anyone who has tried to compare information from two or more sites knows, it helps if similar files from similar digs carry similar tags. But uniting the archaeological discipline under a standardized nomenclature may not be possible. Even if we could identify unifying elements of cultural process, there are almost infinite variations of objects and adaptations across cultures. Furthermore, archaeologists have been studying these cultures in their different regions for so many years that typologies and sequences are embedded in our sub-fields; even if they could be pushed into an overarching scheme, we might well be reluctant to take the time to do so.

Perhaps by linking similar ideas expressed in different terms through a semantic process, computers will eventually be able to connect different schemas for us (on the semantic web, see the W3C’s pages on the topic). This is not yet fully possible, but it is slowly becoming a reality. And by linking such data we will hopefully be able to find patterns and differences, and come to understand our data in new ways.

Many governments around the world are embracing open, machine-readable formats for their modern data. The Economist recently touted this as a game-changer, adding: “But no one has a clue what breakthroughs open data will allow” (May 18–24, 2013, p. 73, “Open data: A new goldmine”). This is certainly true of open data concerning the ancient world, but that is exactly what makes it so exciting: increased access to and interconnection among scholarly data can only increase research potential and understanding in general.

In order to connect data, ancient or modern, it first needs to be accessible and machine readable. This is the goal of Linked Open Data (LOD). It essentially involves the metadata of metadata, organizing it into still larger schemas for cross-analysis (on linking information for the study of the ancient world specifically, see the Linked Ancient World Data Initiative).
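To make the idea of machine-readable, linked metadata a little more concrete, here is a minimal sketch of what a single artifact record might look like as JSON-LD, one common Linked Open Data serialization. It is an illustration under stated assumptions, not the Ur project’s actual data model: the base URI, the placeholder field number, the findspot, and the museum record URIs are all hypothetical, and the Dublin Core terms chosen (`identifier`, `spatial`, `isReferencedBy`) are just one reasonable vocabulary among several.

```python
import json

# Hypothetical base URI; a real project would mint its own stable URIs.
BASE_URI = "https://example.org/ur"

def object_record(field_no, museum_record_uris, findspot):
    """Return a JSON-LD description of one artifact using Dublin Core terms."""
    return {
        "@context": {"dcterms": "http://purl.org/dc/terms/"},
        "@id": f"{BASE_URI}/object/{field_no}",  # stable URI for the object
        "dcterms:identifier": field_no,          # original field number
        "dcterms:spatial": findspot,             # findspot as recorded in the field
        # Online museum records (hypothetical URIs) describing the same artifact.
        "dcterms:isReferencedBy": [{"@id": uri} for uri in museum_record_uris],
    }

record = object_record(
    "U.0001",  # placeholder field number, not a real Ur object
    ["https://example.org/museum-a/A-1", "https://example.org/museum-b/B-7"],
    "hypothetical findspot",
)
print(json.dumps(record, indent=2))
```

Because each record has its own stable URI and points outward to other records by URI, other projects could link to it without having to adopt the same internal numbering system.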

It is not solely the most modern data from excavations dug in the digital age itself that can be tied together in this machine readable way. Legacy data, that material from digs of the bygone days of archaeology, can and should be digitized, organized, and made available as well, to allow for reanalysis and comparison. In fact, this is a good place to start since it avoids some of the problems with recent excavations as far as timing is concerned. Many archaeologists are understandably reluctant to put their ideas and data out immediately in raw form since they wish to write them up after due consideration and analysis. Old excavations have already gone through the process of analysis and in most cases the original excavators have published at least some of the data. But that does not mean that more research cannot be done. Indeed, research should be ongoing.

The Ur excavations are a good example of the huge scale of work in archaeology’s heyday and an excellent target for complete online publication to allow for continuing research. Twelve seasons of intensive excavation with an average of 170 workmen per year generated a tremendous amount of data (see figure 1). Even the ten published volumes on the excavations and the nine on cuneiform texts from the site could not cover everything. The rarely seen archival documents are key to filling in the gaps, as well as to understanding Woolley’s excavation and recording methods. By making them available and linking them to published discussions, online museum records, drawings, field photos, and studies both old and new, the Ur project will facilitate research and further education.

Ur of the Chaldees: A Virtual Vision of Woolley’s Excavations, undertaken with lead funding from the Leon Levy Foundation, will combine the efforts of teams of scholars from the British Museum and the Penn Museum in digitizing the legacy materials they acquired through their joint excavations at Ur. As part of the growing Linked Open Data movement in ancient world studies, the Ur project will make all of its information on the ancient city, excavated from 1922 to 1934, available in a sharable, machine-readable format. This will allow for comparison with other sites, but it will also organize the material internally so that Woolley’s efforts can be reanalyzed, applying modern understandings of the Ancient Near East to his fundamental work.

Though timing and preserving first rights to data are usually not problems with such legacy data, many other issues arise. First, it is expensive and time-consuming to digitize and organize. It also lacks some of the appeal and immediacy of new excavation. More limiting, however, is the extent of recording done on the original excavation and the original team’s ability to describe their recording systems so that we can understand and reconstruct them in the digital representation. In essence, there will be two schemas in the new digital online publication: the original, and the one that organizes the digital version. Ideally, the digital schema will reconstruct the old one, but it must also make it usable and be able to incorporate later additions, such as publications under new numbers by modern researchers.

At the core of this project is the material stored in the British Museum and the Penn Museum, since these were the excavating institutions that together received half of the Ur finds. The other half resides in Baghdad at the Iraq National Museum and will be added as and when its inventories allow. Much of the archival documentation, however, is stored in London and/or Philadelphia. For example, Woolley’s field catalogues, comprising more than 15,000 handwritten index cards, and his field notes, at least another 4,500 cards, are stored at the British Museum. These handwritten notes were used to guide publication, but many contain information that can fuel new research.

We now have scans of all of these documents, which already makes them far more accessible. No longer must one travel to London and handle the originals; a scan of any card can be sent digitally. But we want to go much further than that. We have already moved the catalogue cards into a database, transcribing the information they contain and marking which cards have original drawings so they can be found in a search. These will be linked to the records of each object so that field catalogue entries and later museum records will be available through a single reference. The field notes, however, are more problematic (see figure 2). They contain far more handwritten text and often include sketch maps of locations and/or artifacts. Transcribing them is a huge task, requiring much time and money. Yet many people who enjoy history and archaeology are willing to volunteer their assistance, so we are recruiting them for a virtual project in a virtual way, through UrCrowdsource.org. The site currently has around 2,000 documents uploaded in scanned form, and volunteers have already transcribed around 650 of them.
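The linking step described here, where field catalogue entries and later museum records become available through a single reference, could be sketched roughly as follows. This is a hypothetical illustration, not the project’s database: it assumes the field number is the shared key, and the class names, function name, museum names, and sample records are all invented for the example. Museum records whose field number has been lost are set aside for manual reconnection rather than guessed at.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CatalogueCard:
    field_no: str        # field number as written on the card (format assumed)
    text: str            # transcription of the card
    has_drawing: bool    # flags cards carrying an original drawing

@dataclass
class MuseumRecord:
    museum: str
    museum_no: str
    field_no: Optional[str]  # None where the link to the field record is lost

def link_records(cards, museum_records):
    """Group cards and museum records under a shared field number.

    Returns (linked, unlinked): linked maps field_no -> {"cards": [...], "museum": [...]};
    unlinked lists museum records that still need to be reconnected by hand.
    """
    linked = {}
    unlinked = []
    for card in cards:
        entry = linked.setdefault(card.field_no, {"cards": [], "museum": []})
        entry["cards"].append(card)
    for rec in museum_records:
        if rec.field_no is None:
            unlinked.append(rec)
            continue
        entry = linked.setdefault(rec.field_no, {"cards": [], "museum": []})
        entry["museum"].append(rec)
    return linked, unlinked

# Hypothetical sample data for illustration only.
cards = [
    CatalogueCard("U.0001", "hypothetical bead", has_drawing=True),
    CatalogueCard("U.0002", "hypothetical seal", has_drawing=False),
]
museum_records = [
    MuseumRecord("Museum A", "A-1", "U.0001"),
    MuseumRecord("Museum B", "B-7", "U.0001"),
    MuseumRecord("Museum B", "B-8", None),
]
linked, unlinked = link_records(cards, museum_records)
with_drawings = [c.field_no for c in cards if c.has_drawing]
```

Keying everything on the field number means that one lookup returns both the original card and every museum record for the same object, while a simple flag such as `has_drawing` lets cards with original drawings be pulled out by a search.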


Figure 2: One of the field note cards from Ur

The site uses entirely open-source software: Omeka with the Scripto plugin. Anyone can view and search it, but those who wish to transcribe need to request a login. Help files note the quirks of Woolley’s handwriting and list many terms that may be unfamiliar to those not involved in Near Eastern archaeology.

UrCrowdsource.org can be improved, and we hope to do much more with it. For example, file tagging needs to be streamlined so that files are easier to locate and so that every occurrence of a given field number, for instance, is gathered by a single search through the final website we are creating to present the Ur data online. We also hope to create a forum where volunteers can easily communicate with each other. Currently there are discussion pages for individual files, but it can be difficult to find any particular query. With a shared communication channel, one person who solves a particular problem can post about it, and others facing similar problems can learn from that solution rather than having to invent their own. Transcription volunteers are essential to the Ur project and have already made a tremendous difference in the workload.

Some of the records being transcribed are not field notes, but letters, accounts, and reports from the field. The 1920s and 30s formed an interesting period for archaeology and for the Middle East, with the formation of many modern nations. As such, these documents are of interest to the modern historian, pointing as they do to the issues of the day. Thus, the data we are making available is not solely for archaeological discovery but for many other purposes — scholarly, educational and general interest.

Much of the data, however, will be of primary interest to the intensive archaeological researcher. It is currently difficult to perform unified research on Ur, since objects from a single tomb, for example, may well be in all three of the primary museums, and some may even be housed in peripheral museums. Furthermore, many of the objects from Ur have lost their connection to their original field records over the years, either because museum numbers took precedence over field numbers or because the notes that once sat in old artifact bags have long since disappeared or become illegible. By reconnecting that information and placing it online, we are not just increasing access but also reestablishing as much context as possible.

As all archaeologists know, context is paramount. As the project progresses, we will link all findspots to maps so that the distribution of any artifact or group of artifacts can be observed. Naturally, this is limited by the level of detail with which Woolley originally recorded object locations. Unfortunately, he did not record provenance for every object, and we are increasingly finding that he did not assign field numbers to every object either. Nevertheless, our efforts have already reconnected many artifacts to their field data, and this work is beginning to reveal things we did not know before.

The main goal right now, however, is not to reinvestigate but to organize, interconnect, and make available all of the records of the excavation, starting with a test website within one year. Pictures of every available artifact and every original record will be online, as will transcriptions of the notes. If we cannot make publications available due to copyright, we can at least refer to them, providing a complete bibliography on Ur. Most importantly, as more sites are published in this digital way, our data can be interconnected beyond the site level through machine-readable formats and stable URIs, and hopefully linked through semantic web or similar techniques to allow for extensive research that will enhance and multiply our understanding of the ancient Near East.

Brad Hafford is Leon Levy Foundation Research Associate, Ur Digitization Project at the University of Pennsylvania Museum of Archaeology and Anthropology. He blogs at Travels and Travails and tweets @BradHafford.


