Data management - preservation
A distinction can be drawn between data management and digital preservation.
Data management primarily occurs within the life cycle of the project and is
carried out by the research team, while digital preservation typically occurs
after the research project has finished and is usually carried out by a data
centre or archive - ESDS in the case of research funded by ESRC or The British
Academy.
Researchers should have some appreciation of what digital preservation entails,
since the value of what is preserved depends as much on the efficiency of the
prior data management as on the digital preservation strategy itself. The
Digital Preservation Coalition (DPC) provides comprehensive information on
digital preservation.
ESDS is at the forefront of digital preservation, but the grant holder must
still carry out data management work during the project to ensure that the data
become first-class resources for the UK and international social science
research communities. Additional advice and guidance is provided in the
following sections.
Ensuring authenticity and controlling access
Digital data can be copied, altered or deleted very easily, and this makes it
very important to be able to demonstrate the authenticity of data, and to
prevent unauthorised access to data for ethical, legal and quality reasons. An
important related concept is that of the master file: a formalised and checked
final copy of the data (or other materials), or a copy taken at a defined stage
of development, as opposed to temporary working versions of data and other
files.
Key points for ensuring authenticity and controlling access include:
- assign responsibility for master files to individual members of the project team
- restrict write access to master versions (to members of the project team with responsibility for them)
- create formal procedures for destroying master files
- record changes to master files
- maintain old master files (in case later ones contain errors)
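Keeping a change log that stores a checksum for each version of a master file is one way to put "record changes to master files" into practice and to back up claims of authenticity: if the file no longer matches its last logged checksum, it has been altered outside the agreed procedure. The Python sketch below is illustrative only; the JSON-lines log format, the choice of SHA-256 and the function names `record_change` and `verify_master` are assumptions, not part of any ESDS procedure.

```python
import datetime
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_change(master: Path, log: Path, changed_by: str, note: str) -> dict:
    """Append one entry describing a change to a master file (hypothetical log format)."""
    entry = {
        "file": master.name,
        "sha256": sha256_of(master),
        "changed_by": changed_by,
        "note": note,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def verify_master(master: Path, log: Path) -> bool:
    """True if the master file matches the checksum of its most recent logged change."""
    if not log.exists():
        return False
    lines = log.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line]
    recorded = [e for e in entries if e["file"] == master.name]
    return bool(recorded) and recorded[-1]["sha256"] == sha256_of(master)
```

Because old entries are never overwritten, the log also shows who changed each master file and when, which complements keeping old master files.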
Version control
Different copies of a file, materials held in different formats, and information
cross-referenced between files should all be subject to version control: checks
and procedures which make sure that if the information in one file is altered,
the corresponding information in other files is altered too. Key points include:
- uniquely identify files (preferably using a Uniform Resource Name (URN) convention)
- record version and status, e.g. draft, interim, final, internal
- record relationships between items. In many cases the information contained in a single file is supported by information held in other files, e.g. between the code and the data file it is run against, or between the data file and the documentation or metadata that relates to it
- track the location of all items if they are stored in a variety of locations
- maintain single master files in a suitable format, to avoid the version control problems that arise when multiple working versions are developed in parallel
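The key points above can be sketched as a small file register. In this Python sketch the `urn:example:` identifiers (the URN namespace reserved for documentation examples), the field names and the `Register` class are all hypothetical. Each record holds an identifier, version, status, storage locations and the items it depends on; `affected_by` lists every item that depends, directly or indirectly, on a changed item and so should be checked, for example the analysis code and documentation that relate to a revised data file.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRecord:
    """One entry in a hypothetical project file register."""
    urn: str                     # unique identifier, e.g. "urn:example:proj:survey-data"
    version: str                 # e.g. "1.2"
    status: str                  # e.g. "draft", "interim", "final", "internal"
    locations: list[str] = field(default_factory=list)   # where copies are held
    related_to: list[str] = field(default_factory=list)  # URNs this item depends on


class Register:
    def __init__(self) -> None:
        self._items: dict[str, ItemRecord] = {}

    def add(self, item: ItemRecord) -> None:
        # Identifiers must be unique, otherwise relationships become ambiguous.
        if item.urn in self._items:
            raise ValueError(f"duplicate identifier: {item.urn}")
        self._items[item.urn] = item

    def locations_of(self, urn: str) -> list[str]:
        """Return every recorded location of an item."""
        return list(self._items[urn].locations)

    def affected_by(self, urn: str) -> list[str]:
        """Return URNs of items that depend (directly or indirectly) on `urn`."""
        affected: list[str] = []
        pending = [urn]
        seen = {urn}
        while pending:
            current = pending.pop()
            for item in self._items.values():
                if current in item.related_to and item.urn not in seen:
                    seen.add(item.urn)
                    affected.append(item.urn)
                    pending.append(item.urn)
        return sorted(affected)
```

In practice such a register might equally be a spreadsheet or database table; the point is that relationships are recorded explicitly, so a change to one file prompts checks on the others.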
Data storage
The Digital Preservation Coalition (DPC) provides details of good practice
regarding data storage.
Security
Security covers both IT systems and physical security.
Network security
- ensure restricted access to files
- confidential data should not be stored on servers that host internet services (web or email)
- especially sensitive material should be stored on computers that are not connected to a network
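On a shared file server, restricted access to files is usually enforced through operating-system permissions. The Python sketch below assumes a POSIX system (Linux or macOS) and uses only the standard `os.chmod` function and `stat` constants; the helper names `lock_master`, `unlock_for_editing` and `permission_bits` are hypothetical. It makes a master file readable by its owner only and writable by no one, in line with restricting write access to the team member responsible for it. On Windows, `os.chmod` only toggles the read-only flag, so these bits would not apply there.

```python
import os
import stat
from pathlib import Path


def lock_master(path: Path) -> None:
    """Owner may read; nobody may write; group and others get no access (0o400).
    Assumes POSIX permission semantics."""
    os.chmod(path, stat.S_IRUSR)


def unlock_for_editing(path: Path) -> None:
    """Temporarily give the owner write access again (0o600) for a recorded change."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def permission_bits(path: Path) -> int:
    """Return only the permission bits of the file's mode."""
    return stat.S_IMODE(path.stat().st_mode)
```

File permissions complement, rather than replace, the network-level measures listed above.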
Upgrades and patches
- apply all relevant security-related upgrades and patches to operating systems and applications as quickly as possible
Physical security of systems
- locking rooms when staff are absent, limiting access to rooms where computers or media are held to a few individuals, logging computer media or hardcopy material that is removed from store rooms, recording who holds keys, etc.
Viruses
- all project computers should have regularly updated virus detection software