Mark Hedges, Sustaining archives

Archives, physical or digital. All sorts of documents, but many are important to historians, e.g. scraps of paper from early days of computing can be very important later on.

Time consuming to find things. Dangers to sustainability – stuff gets lost, thrown away, destroyed by accident or fire.

Digital archives, easier to access, but often funding runs out and we need them to last.

NOF-Digitise programme, ran for 5 years, ended 6 years ago, awarded £50m to 155 projects. What happened to them?

  • 30 websites still exist and have been enhanced since
  • 10 absorbed into larger archives
  • 83 websites exist but haven’t changed in 6 years since project ceased
  • 31 no URL available or doesn’t work.
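A quick tally of the figures above, as a sketch (the category labels are shorthand, not the speaker's wording). Note the four counts sum to 154, one short of the 155 projects quoted, so the shares are of the 154 accounted for.

```python
# Outcome counts as noted in the talk; labels are shorthand assumptions.
outcomes = {
    "enhanced": 30,
    "absorbed": 10,
    "unchanged": 83,
    "missing": 31,
}

total = sum(outcomes.values())  # projects accounted for in the list

# Percentage share of each outcome, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in outcomes.items()}

for name, pct in shares.items():
    print(f"{name}: {outcomes[name]} projects ({pct}%)")
```

On these numbers, roughly three quarters of the listed projects are either frozen or unreachable.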

Archives can die:

  • Server fails/vanishes
  • Available but unchanged, becomes obsolescent
  • Content obsolete, new material not included
  • Inadequate metadata
  • Hidden archives, stuff’s there but no one can find it
  • Isolated (from the web of data)

Can we involve the community? Most archives have a focus, so there may be a community interested in it.

Can exploit the interest of specific groups for specific archives, e.g. Flickr tagging of photos. But this can be too libertarian, open to misuse. Not appropriate for more formal archives, e.g. tagging often too loose.

Middle way between professional cataloguers on one hand, free tagging on the other.

Split work up into self-contained tasks that can be sent to volunteers to be performed over internet. Problem with free tagging is that it’s insufficiently accurate. Use task replication to get consensus, calibration of performance, etc.
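A minimal sketch of what task replication and calibration could look like, assuming the same task goes to several volunteers and some tasks have known answers. The function names, the 0.6 agreement threshold and the normalisation step are illustrative assumptions, not details given in the talk.

```python
from collections import Counter


def consensus(answers, min_agreement=0.6):
    """Return the agreed answer, or None if no answer reaches the threshold."""
    if not answers:
        return None
    # Normalise so trivial differences in case/spacing do not block agreement.
    normalised = [" ".join(a.lower().split()) for a in answers]
    top, count = Counter(normalised).most_common(1)[0]
    return top if count / len(normalised) >= min_agreement else None


def volunteer_accuracy(responses, gold):
    """Calibration: share of known-answer tasks a volunteer got right.

    responses and gold both map task id -> answer.
    Returns None if the volunteer has done no known-answer tasks.
    """
    scored = [task for task in responses if task in gold]
    if not scored:
        return None
    correct = sum(responses[task] == gold[task] for task in scored)
    return correct / len(scored)


# Three volunteers transcribe the same playbill title.
print(consensus(["Hamlet", "hamlet ", "Hamlett"]))
# No two volunteers agree, so the task would be sent out again.
print(consensus(["Hamlet", "Macbeth", "Othello"]))
```

Tasks that return None can be replicated to more volunteers or passed to a professional, and accuracy scores could be used to weight or filter contributions.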

Apply this methodology to digital archives and cultural heritage collections. Want to sustain and enhance the archives. Want specific communities to adopt archives to ensure longer term prospects.

Very early stage project, TELDAP, rich archive of Chinese and Taiwanese cultural material. But doesn’t have high visibility worldwide. Needs metadata enhancement, etc.

Great Ormond Street Hospital historic case notes, e.g. Dr Garrad, chronological view of his case notes. Tasks: transcription, marking up key ideas, cross-referencing. Specialised knowledge required, so community is retired nurses, doctors, etc.

East London Theatre Archive Project, contains digitised material from playbills, photos, posters. Images have metadata, but there’s a lot of textual information which hasn’t been extracted and isn’t therefore accessible.

Experimenting with a variety of tasks: transcription; identification of ‘special’ text, e.g. cast lists which could be linked to a list of actors, or play type.

Some images have text, but it’s arranged in a complex way: columns, sections, embedded pictures. So not entirely easy. It would be useful to divide images into their different sections and classify them according to their nature.

Hybrid approach, OCR them first to produce rough draft, then get volunteer contributions rather than starting with original image.

Ephemeral materials produce very important information.

Communities. Different communities: people with intrinsic interest in topic, e.g. academic, professional; local social communities, e.g. schools; history groups, genealogists; international pool of potential volunteers with East London ancestors.

Size of community less important than having an interest in a particular topic. Important to identify people who have an interest in the fate of the archive. Small groups.

Issues to address. Open-endedness of the tasks makes it hard to assess how well it’s going. Can also attract people with malicious intent.

Want to develop guidelines for this sort of community building.

How are volunteer outputs integrated with professional outputs? Resistance from professionals to anyone else doing stuff.

With volunteer thinkers as one stage in the project, one could have more complex processes: after the volunteers have done their part, professionals can come in to do more specialised XML mark-up, giving a ‘production line’ that makes best use of everyone’s skills.

Getting communities to participate in related archives might help people preserve their cultural identity in an increasingly globalised world.