Digitisation of audio data
In qualitative research, audio-visual data are typically captured by recording interviews and group discussions, as well as through participant observation.
Sharing audio-visual data from social research can be challenging for a number of reasons:
- ethical and consent issues
- quality of recording
- various proprietary formats
- storage capacity
The UK Data Archive currently acquires audio-visual data as part of qualitative collections, but not routinely. Both analogue and born-digital data are considered. Data collections
offered to the UK Data Archive are evaluated for quality and format, and ethical and legal issues are also considered. Currently, audio-visual data are not anonymised in the way that textual data, such as transcripts, are.
Audio collections are processed, documented and catalogued in the same way as any other data type and are disseminated over the web behind a registration system.
There are three key challenges when considering the storage and dissemination of audio-visual data:
- minimising data storage requirements whilst simultaneously maximising audio quality
- using sustainable open formats for long-term preservation
- using optimal and flexible formats for delivery
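The trade-off between storage and quality in the first point can be made concrete with a little arithmetic. The following is a minimal sketch, not an Archive tool: it estimates the size of uncompressed PCM audio at the 48 kHz/24-bit settings mentioned in the case study further down. The function name and the channel counts are assumptions made for illustration; the source does not say whether recordings were mono or stereo.

```python
def pcm_size_bytes(duration_s, sample_rate_hz=48_000, bit_depth=24, channels=1):
    """Return the size in bytes of raw PCM audio (file headers excluded)."""
    bytes_per_sample = bit_depth // 8
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


one_hour = 60 * 60  # seconds

# Channel counts are illustrative assumptions, not figures from the project.
mono = pcm_size_bytes(one_hour, channels=1)
stereo = pcm_size_bytes(one_hour, channels=2)

print(f"1 hour mono:   {mono / 1e9:.2f} GB")
print(f"1 hour stereo: {stereo / 1e9:.2f} GB")
```

At these settings an hour of audio needs roughly half a gigabyte per channel, which is why a compressed delivery format is attractive alongside an uncompressed preservation copy.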
The UK Data Archive deals with analogue audio data sources including:
- reel-to-reel tape
- audio cassette
- micro cassette (e.g. dictaphone)
UK Data Archive undertakes digitisation of audio-visual materials. Depending on the size of the collection, all the recordings or just extracts may be digitised. The UK Data Archive has
digitised a large reel-to-reel collection as well as collections on audio cassette.
Case study of digitisation
In 2006, a large-scale reel-to-reel digitisation project of The Edwardians, a study already held at the UK Data Archive, was
carried out by the British Library with guidelines on digitisation methods and metadata standards provided by the UK Data Archive. The Edwardians consisted of a total of 537 interviews
recorded on reel-to-reel audio tape, with 453 interviews later transcribed as typed paper documents. The interviews were open-ended (guided by a schedule) and lasted between
one and six hours. In the digitisation project, 1,203 reels were converted by an external supplier at the
recommended sampling rate and bit depth of 48 kHz/24 bits, with no filtering or post-processing applied, dead air checks and long gaps edited out, and a
Batches were returned to the UK Data Archive every three weeks, where they were audited, checked by random listening, and verified against their MD5 checksums. In total, 2,517 .wav files were
produced, totalling some two terabytes. These will be made available for dissemination via MP3 web-based file download and streaming. This collection is too large
for the usual download of the whole collection as a zip file. Work is ongoing to build a selective file-based download and an open source streaming solution. Extracts of large
audio recordings can easily be streamed, thus offering a taster of the audio material.
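The MD5 check carried out on each returned batch can be sketched as follows. This is an illustrative example only, assuming the supplier provides an expected MD5 value for each file. The function names and the manifest structure (a mapping from file path to expected checksum) are hypothetical and not taken from the project.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large .wav files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_batch(manifest):
    """Return the paths whose file is missing or whose checksum does not match.

    `manifest` maps a file path to its expected MD5 hex digest
    (a hypothetical structure chosen for this sketch).
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or md5_of_file(path) != expected.lower():
            failures.append(path)
    return failures
```

An empty list from `verify_batch` means every file in the batch arrived intact. Any path it returns would need to be re-requested or investigated before the batch is accepted.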
Metadata for audio-visual data
The UK Data Archive aims to capture rich metadata. Audio-visual file metadata is recorded in a database rather than embedded in the audio file, because embedded information can be
lost on conversion. The Library of Congress (LoC) metadata has been chosen as current best practice.
An in-house systematic file naming convention is used for audio-visual data:
- Study number + interview number: 2000int001.rtf
- Study number + interview number + _audio file number: 2000int001_1.rtf
- Study number + interview number + _clip number
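A naming convention like this is easiest to keep consistent when names are built and checked by code. The sketch below is inferred from the two examples above and is not the Archive's own tooling. It assumes a numeric study number, the literal `int`, a three-digit zero-padded interview number, and an optional numeric suffix after an underscore. The source does not show how an audio file number is told apart from a clip number, so the sketch treats both as a generic `part`. File extensions are left to the caller.

```python
import re

# Pattern inferred from "2000int001" and "2000int001_1" (an assumption).
NAME_PATTERN = re.compile(
    r"^(?P<study>\d+)int(?P<interview>\d{3})(?:_(?P<part>\d+))?$"
)


def build_name(study, interview, part=None):
    """Build a file name stem (no extension) from its numeric components."""
    stem = f"{study}int{interview:03d}"
    return stem if part is None else f"{stem}_{part}"


def parse_name(stem):
    """Split a file name stem into its components, or raise ValueError."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"name does not follow the convention: {stem!r}")
    part = match.group("part")
    return {
        "study": int(match.group("study")),
        "interview": int(match.group("interview")),
        "part": int(part) if part is not None else None,
    }


print(build_name(2000, 1))         # 2000int001
print(build_name(2000, 1, 1))      # 2000int001_1
print(parse_name("2000int001_1"))
```

Running `parse_name` over the stems in an incoming collection gives a quick way to spot files that do not follow the convention before they are catalogued.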
For further advice or specific queries, contact
See also digitisation of textual data