Academics results - LSE

					                                 VIF Project (Academics) Survey Results
Section 1: About you

1. Which of the following best describes your role?
                                                                          Replies   Percentage
Professor                                                                 8.00      13.33
Lecturer / Associate Professor                                            31.00     51.67
Post doctoral research staff                                              3.00      5.00
Student (PhD or other research degree)                                    4.00      6.67
Contract / freelance researcher                                           0.00      0.00
Other (please specify)                                                    14.00     23.33
Academic Related - Student Support Officer
Cluster Associate (ie 50% teaching assistant and 50% PhD)
Dental Education Advisor
Librarian
Librarian
Non-Academic
Part time student and full time usability analyst
R&D project manager and group manager
Reader (glorified senior lecturer...)
Research Fellow
Senior Lecturer
Senior scientist
Student MSc
Support staff
Teaching Fellow & IT Officer (2 separate roles)

Total replies:                                                            60.00

2. At which institution are you based?

Universitat de Barcelona                                                  1.00      1.67
University of Bristol                                                     1.00      1.67
Brunel University                                                         1.00      1.67
Crop & Food Research                                                                    1.00    1.67
East Asian Studies                                                                      1.00    1.67
University of Leeds                                                                     27.00   45.00
University of Lincoln                                                                   1.00    1.67
London School of Economics and Political Science                                        6.00    10.00
University of Memphis                                                                   1.00    1.67
National Centre for e-Social Science                                                    1.00    1.67
Newcastle University                                                                    2.00    3.33
University of New South Wales                                                           1.00    1.67
University of Nottingham                                                                6.00    10.00
New York University in London                                                           1.00    1.67
University of Otago                                                                     1.00    1.67
School of Oriental and African Studies                                                  1.00    1.67
University of Southampton                                                               1.00    1.67
STFC Rutherford Appleton Laboratory                                                     1.00    1.67
University of Strathclyde                                                               1.00    1.67
Swinburne University of Technology                                                      1.00    1.67
University of Tasmania                                                                  1.00    1.67
University of the West of England                                                       1.00    1.67
University of York                                                                      1.00    1.67

Total replies:                                                                          60.00

3. In which country are you based?

Australia                                                                               3.00    5.00
New Zealand                                                                             2.00    3.33
Spain                                                                                   1.00    1.67
United Kingdom                                                                          51.00   85.00
United States                                                                           3.00    5.00

Total replies:                                                                          60.00

Section 2: The question of version identification

4. In your opinion how important do you consider the following identification issues?
a) Ease in identifying definitive published/'final' version

Essential                                                                                                          42.00   70.00
Important                                                                                                          8.00    13.33
Slightly important                                                                                                 7.00    11.67
Unimportant                                                                                                        0.00    0.00
Don't know                                                                                                         3.00    5.00

Total replies:                                                                                                     60.00

Please use the boxes below to further explain your answers if necessary:

Important for linking, citing etc

Most published versions should be not too dissimilar from the definitive "final" version and it's usually the
gist of the paper that will attract my attention. Radical changes would be a different issue, of course!
Publishers versions are likely to have errors introduced by editorial staff. I prefer a postprint.
The concept of a 'definitive published/"final" version' seems questionable to me - a hangover from a
publishing model in which printing froze the object in a final form. Digital objects are more flexible and
adaptable. (It is also currently used by journal publishers to obfuscate the fact that their version may contain
uncorrected authorial or copy-editing or production errors, and lack post-publication and updates by the
author.)

b) Ease in identifying all versions

Essential                                                                                                          14.00   23.33
Important                                                                                                          31.00   51.67
Slightly important                                                                                                 9.00    15.00
Unimportant                                                                                                        2.00    3.33
Don't know                                                                                                         4.00    6.67

Total replies:                                                                                                     60.00

Please use the boxes below to further explain your answers if necessary:
But being able to see any version is more important!
Updates would be important to acknowledge

c) Linking multiple versions held in different locations

Essential                                                                                                       5.00    8.47
Important                                                                                                       25.00   42.37
Slightly important                                                                                              17.00   28.81
Unimportant                                                                                                     8.00    13.56
Don't know                                                                                                      4.00    6.78

Total replies:                                                                                                  59.00

Please use the boxes below to further explain your answers if necessary:

But being able to see any version is more important!
I'm not sure that *linking* versions in different locations is particularly important. The discoverability of
different versions is what matters most to me.
If the filenames are clear, and the system isn't being used for syncing changes
If there is a final version available it would be desirable to know it from any other version
Presence of multiple versions is usually a mistake except where a preprint is present for legal reasons. The
most recent version should be the only one online.
Useful

d) Being clear about the differences between multiple versions

Essential                                                                                                       18.00   31.03
Important                                                                                                       28.00   48.28
Slightly important                                                                                              5.00    8.62
Unimportant                                                                                                     3.00    5.17
Don't know                                                                                                      4.00    6.90

Total replies:                                                                                                  58.00

Please use the boxes below to further explain your answers if necessary:
But being able to see any version is more important!
Do you mean: the author should say what content etc. differences there are between the versions? (That's the
question I've answered here)
If it's clear whether it's a final/published version then not important to know the differences, unless you are
involved with the project in which case you probably know!

e) Accurately describing the object (effective metadata)

Essential                                                                                                         20.00   33.33
Important                                                                                                         20.00   33.33
Slightly important                                                                                                9.00    15.00
Unimportant                                                                                                       3.00    5.00
Don't know                                                                                                        8.00    13.33

Total replies:                                                                                                    60.00

Please use the boxes below to further explain your answers if necessary:

I mostly search for papers and there the content itself is usually a sufficient description.
Ideally a good thing to do, but who wants to spend their time doing that?
Nobody ever sees the metadata. Well hardly anybody.

f) Clear signposting to the 'best' version for a particular user

Essential                                                                                                         12.00   21.05
Important                                                                                                         25.00   43.86
Slightly important                                                                                                6.00    10.53
Unimportant                                                                                                       6.00    10.53
Don't know                                                                                                        8.00    14.04

Total replies:                                                                                                    57.00

Please use the boxes below to further explain your answers if necessary:

Absurd idea - what is 'best'?
Not sure I can answer that - "best" will depend on what the user needs.
Odd question. How do you predict what is best for particular users, given the potential diversity of their needs?

g) Trust in the version at hand

Essential                                                                                                        33.00    55.93
Important                                                                                                        16.00    27.12
Slightly important                                                                                               5.00     8.47
Unimportant                                                                                                      0.00     0.00
Don't know                                                                                                       5.00     8.47

Total replies:                                                                                                   59.00

Please use the boxes below to further explain your answers if necessary:

But only the published version can be trusted!!
Caveat emptor, just as with paper versions.
If a claim is made about the provenance of this document, I would want to be able to verify it. Otherwise
there is no point in making the claim.
If I know what the version I am holding is then I can use my own discretion.
Trust meaning guarantee of authenticity/provenance trail? I would want to believe that no one has tampered with it, or is claiming a status that it does not have (e.g. published in Natu

5. Do you think that version identification is particularly relevant to some types of digital object more than others?

No, version identification is important for all types of digital object                                          28.00    48.28
Don't know                                                                                                       15.00    25.86

Version identification is more important for the following types of digital object*:                             15.00    25.86

Total replies:                                                                                                   58.00

*Please note that, due to a bug with the Bristol Online Survey software, no results were returned for specific
types of digital object

Audio                                                                                                            0.00     0.00
Datasets                                                                                                         0.00     0.00
Images                                                                                                           0.00     0.00
Learning objects                                                                                                 0.00        0.00
Text documents                                                                                                   0.00        0.00
Video                                                                                                            0.00        0.00

If you think version identification is more important for some types of digital objects please explain
why you think this is the case:

Audio/images/video would tend to be discrete files, and different "versions" would actually be different items
altogether
Changing statements in a text usually involves greater modification than "touching up" editing of images etc.
Data is most likely to be extracted verbatim and/or analysed without further verification or discussion from
these datasources.

Data: Updated or corrected *data* is clearly crucial: data integrity underpins everything else.
Learning objects: In my own case, these are the most fluid, continuously updated, digital objects I produce.
So recency is of relevance.
Text: changes to text documents are likely to contain corrections or updates.
I tend to think of audio, image and video as static. But this could well be because I don't work with them
much, and don't understand the issues. (And audio, images and videos may be learning objects.)
Dating is important when documents or data are liable to be improved or updated.
I only marked this type because it is the one I usually work with
If quoting a text need to ensure that it hasn't changed - images etc can be directly incorporated into any derivative work
In these cases versions will normally indicate progress or stage of development.
Text is more often and more easily shared and collaborated on with multiple iterations
The capture process and edit processes for those unchecked have less likelihood of there being versions.
These are likely to be the objects that will go through the most revisions.

6. To what extent have you found it easy to identify different versions of digital objects within
'single' repositories? (This question refers to versions of the same digital object within a single
repository, e.g. an institutional repository)

Different versions of digital objects are easily identifiable                                                    3.00        5.00
Some room for improvement                                                                                        25.00       41.67
Substantial room for improvement                                                                                 14.00       23.33
Different versions of digital objects are unidentifiable                                                         2.00        3.33
Don't know                                                                                                       16.00       26.67
Total replies:                                                                                                   60.00

Do you find versions of any specific types of digital objects harder to identify than others?

Audio                                                                                                            3.00       6.52
Datasets                                                                                                         1.00       2.17
Images                                                                                                           7.00       15.22
Learning objects                                                                                                 6.00       13.04
Text documents                                                                                                   7.00       15.22
Video                                                                                                            2.00       4.35
No, any issues with versions are consistent for all types of digital objects                                     20.00      43.48

Total replies:                                                                                                   46.00

Why do you find versions of these particular types of digital objects harder to identify than others?

Because sometimes the author/originator of the object doesn't clearly identify when changes have been made.
Difference in opinion of what is a version and what is an update or correction varies across content
creators
Don't understand the question/jargon
If learning objects include computer code then this is harder to identify as it can require careful scrutiny.
More difficult to realize differences.
More of them!
Most commonly used.
Often the source of the image is not given
Poor file naming, lack of date in document, etc.
Sometimes it isn't made clear which version (ie preprint, postprint, author's corrected version) it is you are looking at

7. Which of the following do you usually use when searching for digital objects across multiple repositories?

Internet search engine (e.g. Google)                                                                             52.00      60.47
OpenDOAR                                                                                                         3.00       3.49
OAIster                                                                                                          5.00       5.81
Intute Repositories Search Service                                                                               8.00       9.30
Do not search across multiple repositories                                                                       9.00       10.47
Other (please specify):                                                                                     9.00    10.47
Again don't understand the question
COPAC
Google Scholar -- not a specific repository search but often brings back repository documents
I use Google Scholar the most
I usually look at people's own pages, and then at the journal's pages. (Only ever looking for papers, not
other types of objects)
Local gateways such as ADT, AuseSearch. Google Scholar is also vital (ie not just plain Google)
Metalib
Overwhelmingly Google; my use of repository search services is sporadic and experimental. My search for
digital objects often goes via the author's web-pages.
Web of Science

Total replies:                                                                                              86.00

8. To what extent have you found it easy to identify different versions of digital objects across
'multiple' repositories? (This question refers to versions of the same digital object within multiple
repositories)

Different versions of digital objects are easily identifiable                                               1.00    1.79
Some room for improvement                                                                                   14.00   25.00
Substantial room for improvement                                                                            17.00   30.36
Different versions of digital objects are unidentifiable                                                    4.00    7.14
Don't know                                                                                                  20.00   35.71

Total replies:                                                                                              56.00

Do you find versions of any specific types of digital objects harder to identify than others?

Audio                                                                                                       0.00    0.00
Datasets                                                                                                    2.00    7.14
Images                                                                                                      1.00    3.57
Learning objects                                                                                            0.00    0.00
Text documents                                                                                              5.00    17.86
Video                                                                                                       0.00    0.00
No, any issues with versions are consistent for all types of digital objects                                20.00   71.43
Total replies:                                                                                            28.00

Why do you find versions of these particular types of digital objects harder to identify than others?

Again because the origin is often unspecified this makes it hard to determine what is definitive
Again, most commonly used type, multiple mounting points for documents, ie. author mounted, institution
mounted etc.
More difficult to realize differences.
Multiple versions in multiple repositories.

Section 3: Current practice when creating academic material

9. Which of the following types of academic material do you currently create, or do you intend to
create?

Audio

Currently create                                                                                          16.00   30.19
Plan to create in future                                                                                  9.00    16.98
Do not create                                                                                             28.00   52.83

Total replies:                                                                                            53.00

Datasets

Currently create                                                                                          22.00   44.00
Plan to create in future                                                                                  12.00   24.00
Do not create                                                                                             16.00   32.00

Total replies:                                                                                            50.00

Images

Currently create                                                                                          30.00   58.82
Plan to create in future                                                                                  7.00    13.73
Do not create                                                                                          14.00   27.45

Total replies:                                                                                         51.00

Learning objects

Currently create                                                                                       32.00   58.18
Plan to create in future                                                                               9.00    16.36
Do not create                                                                                          14.00   25.45

Total replies:                                                                                         55.00

Text documents

Currently create                                                                                       57.00   95.00
Plan to create in future                                                                               2.00    3.33
Do not create                                                                                          1.00    1.67

Total replies:                                                                                         60.00

Video

Currently create                                                                                       15.00   28.85
Plan to create in future                                                                               15.00   28.85
Do not create                                                                                          22.00   42.31

Total replies:                                                                                         52.00

10. Thinking about revisions you make to academic material during creation, which revisions do you
personally keep, or do you plan to keep, in electronic form (i.e. on your computer or network drive)
at the end of the process?

Keep all revisions                                                                                     16.00   26.67
Keep only 'major' revisions                                                                            33.00   55.00
Keep the latest version worked on only                                                                 6.00    10.00
Do not keep a personal copy                                                                            0.00    0.00
Don't know                                                                                                        1.00    1.67
Do not produce academic material at present                                                                       0.00    0.00
Other (please specify)                                                                                            4.00    6.67
For learning objects, generally only the latest version (unless the revision involves a one-off change in
learning objectives, from which I may want to backtrack in the future). For text documents, I will keep the
latest version; but if a document is re-published in a different form, I will keep each published form (hence
all major revisions). Note that publication and republication here may be in printed form (e.g. a book
chapter, and a subsequent modified anthologisation of the same chapter): hence I don't think of versioning
as an issue specific to digital objects.
I sometimes have alternative versions. When working with a co-author I keep more of the revision history
than when I am the author. I also keep anonymised and named versions when required by journals. I often
archive discarded material in case I decide to use it later.
Keep all revisions : Usually revise and backup teaching material on a yearly basis, works in progress
papers etc copies are kept on an external medium.
Keep latest version and usually the preceding version

Total replies:                                                                                                    60.00

11. Are you satisfied with the way in which you organise versions of your own work, on your own
computer or storage medium?

Yes                                                                                                               39.00   66.10
No                                                                                                                19.00   32.20
Don't know                                                                                                        1.00    1.69
Do not produce academic material at present                                                                       0.00    0.00

Total replies:                                                                                                    59.00

12. When creating academic material do you use any of the following to differentiate between
different versions?

Own naming conventions to describe different versions e.g. first draft, final version for a journal article or,
alternatively, rough cut, final edit for a film.                                                                  47.00   42.73
Formal naming conventions to describe different versions                                                          6.00    5.45
Date and/or time stamps                                                                                           29.00   26.36
Numeric approach e.g. the second 'major' revision of an object that has since undergone three 'minor'
revisions could be described as v.2.3                                                                        23.00    20.91
Do not produce academic material at present                                                                  1.00     0.91
Other (please specify)                                                                                       4.00     3.64
also save old versions in a different folder
Different directories named appropriately
E-mail each version and use the metadata in that as id
Unique IDs

Total replies:                                                                                               110.00

13. Does your university/institution have a digital repository where you can deposit your academic
material?

Yes                                                                                                          33.00    55.00
No                                                                                                           5.00     8.33
Don't know                                                                                                   22.00    36.67

Total replies:                                                                                               60.00

If 'Yes', have you ever placed academic material into the repository?

Yes                                                                                                          25.00    69.44
No                                                                                                           8.00     22.22
Don't know                                                                                                   3.00     8.33
Do not produce academic material at present                                                                  0.00     0.00

Total replies:                                                                                               36.00

If 'No' please briefly explain why this is the case

Copyright issues - I believe if we want to publish something and put it in the repository we need to
correspond with the relevant journal regarding copyright-this additional hassle has deterred me from using
the repository
Not accepted from research students - restricted to staff only
Repository is still under development.
The repository is common among several universities in Catalonia.
We have back up of our work but I don't know if we have a specific "digital repository"

14. In addition to formal publication and dissemination in university/institutional repositories do you
disseminate academic material through any of the following channels?

Personal website                                                                                          23.00   27.06
Departmental website                                                                                      34.00   40.00
Other university website e.g. website for working paper or discussion paper series                        9.00    10.59
Do not make available through any other channels                                                          7.00    8.24
Do not produce academic material at present                                                               2.00    2.35
Other (please specify)                                                                                    10.00   11.76
'Personal website' above means my own and personally maintained pages on the Faculty web pages.
Blog
Disciplinary repository (DLIST)
I mean my personal area of the departmental website.
International collaborative subject-specific websites.
Teaching material on departmental website (as above) and university VLE.
Virtual learning environment (not a repository) and a shared network drive with restricted access.
VLE
VLE (Blackboard)
E-mail in collaborative projects
VLE/Portal

Total replies:                                                                                            85.00

15. Which versions of the academic material you produce would you like to see made available
through an institutional repository?

Latest version only                                                                                       38.00   64.41
All 'major' revisions                                                                                     9.00    15.25
All available versions                                                                                    1.00    1.69
Do not produce academic material at present                                                               1.00    1.69
Don't know                                                                                                5.00    8.47
Other (please specify)                                                                                    5.00    8.47
Certainly the latest version. I wouldn't want earlier drafts of the same paper, but might want say PPT
presentation or poster associated with the paper at a conference.
It depends: some journals don't allow you to publish in a personal repository the revised version... I would
like to see in the repository the last publicly available version, which might not be the latest.
Nothing until the final version
only the final, published version.
This depends on the terms of copyright assignment/licensing with any given journal: publisher's pdf;
author's final version (after peer review and editorial correction); in some cases, author's final version with
subsequent updates.

Total replies:                                                                                                    59.00

Section 4: Possible solutions to version identification: Taxonomies

16. If a standard taxonomy for describing versions in the lifecycles of digital objects (e.g. uncut
footage of the film 'Blade Runner', rough edit, cinematic version and director's cut or, alternatively, a
first draft, peer reviewed and publisher's version of a journal article) could be developed how useful
do you think this would be?

Very useful                                                                                                       17.00   28.81
Useful                                                                                                            27.00   45.76
Of limited use                                                                                                    9.00    15.25
Not very useful                                                                                                   3.00    5.08
Don't know                                                                                                        3.00    5.08

Total replies:                                                                                                    59.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                             5.00    6.33
Datasets                                                                                                          6.00    7.59
Images                                                                                                            6.00    7.59
Learning objects                                                                                                  8.00    10.13
Text documents                                                                                                    16.00   20.25
Video                                                                                                             8.00    10.13
Relevant to all file types                                                                                             30.00   37.97

Total replies:                                                                                                         79.00

Please use the text box below to expand on your answer if necessary

Could be useful to see how ideas develop - though direct contact with the author would generally be more
illuminating though of course not always possible
Difficult to answer across all fields when I work in just some. I would think that if I want information on
version for something, all workers would want the same for their areas/objects.

I don't deal much with other types, though I imagine that datasets would require a very different taxonomy.
I don't think the needs of scholarly writers (peer reviewed, etc.) are the same as those of artists (director's
cut, etc.).
I doubt whether such a convention could be made to stick. (Certainly not if it tries to take over existing
terminology, such as 'first draft', and impose a completely different meaning on it, as suggested in the
section preamble.) Think of the inconsistencies already evident in the use of the terms 'preprint' and
'postprint'.
I think it will be difficult to get academics to stick to a "universal" naming convention.
Models and datasets
Only well known
While useful, unclear how hard it will be to get people to actually use it in practice; just like metadata, people
won't see the benefit vs. effort to create
Whilst it might be relevant to all file types, the wording might need to be different for different types of object.

17. If a standard taxonomy for describing versions of digital objects in relation to other versions of
the same object (e.g a 'full size' JPEG from a digital camera, a compressed JPEG and a thumbnail
image or, alternatively, a DOC, PDF and HTML version of a text file) could be developed how useful
do you think this would be?

Very useful                                                                                                            19.00   33.33
Useful                                                                                                                 25.00   43.86
Of limited use                                                                                                         8.00    14.04
Not very useful                                                                                                        1.00    1.75
Don't know                                                                                                             4.00    7.02

Total replies:                                                                                                      57.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                               5.00    7.94
Datasets                                                                                                            3.00    4.76
Images                                                                                                              10.00   15.87
Learning objects                                                                                                    4.00    6.35
Text documents                                                                                                      4.00    6.35
Video                                                                                                               8.00    12.70
Relevant to all file types                                                                                          29.00   46.03

Total replies:                                                                                                      63.00

Please use the text box below to expand on your answer if necessary

But don't we already have such a taxonomy? Terms like thumbnail, DOC, PDF and HTML are in common
use.
Don't we already have this????? Maybe I don't understand this question.
I was recently given 2 versions of a video file, with the words "small" and "medium" in the titles. I assumed
this related to the resolution, which it did, but the "small" file was actually shorter - having chunks edited
out.
I'm currently working in digital accessibility, and I would find it useful to indicate if a PDF is accessible, if an
HTML file is valid and complies with WCAG .... but I understand that this could be a very particular view.
Useful to indicate that there may exist another version with higher resolution

18. If a standard taxonomy for describing versions of digital objects and their relationship to other
objects (e.g. how a video file of the complete film 'Casablanca', still images of the movie and a
sound file containing only the audio relate to each other) could be developed how useful do you
think this would be?


Very useful                                                                                                         13.00   22.81
Useful                                                                                                              24.00   42.11
Of limited use                                                                                                      8.00    14.04
Not very useful                                                                                                     0.00    0.00
Don't know                                                                                                          12.00   21.05

Total replies:                                                                                                      57.00

Do you think this would be useful when dealing with versioning issues for any particular
combinations of file? (Please specify)

dataset - model - paper produced would be useful to link
I'm not sure I like this idea 'standard taxonomy' -- don't we already have terms for this? It wouldn't be a good
thing to impose categories that may not be meaningful.
Our particular interest is actually in combining quite different types of files (e.g. those that contribute to an
architectural project) so we're interested in what we might call conceptual taxonomies - A repository
resource may have a jpeg of a window, a video flythrough of a model, a written technical briefing - all of
which would be part of a larger building, building type, and architect's output - so we're trying to develop a
taxonomy that will cover a very large range of issues.
Really need something like this

That will be in increasing demand as new formats like DAISY include many forms of the same
document, and some users may be interested in getting only part of them (only the audio, only the text...).
Useful for multimedia when it is dissected into parts e.g. audio only of a video.


Section 5: Possible solutions to version identification: Chronological and numeric approaches

19. If a standard indication could be given to show the 'version' of an object (e.g. v.1.1) how useful
do you think this would be when dealing with digital objects?

Very useful                                                                                                         14.00   24.56
Useful                                                                                                              26.00   45.61
Of limited use                                                                                                      11.00   19.30
Not very useful                                                                                                     1.00    1.75
Don't know                                                                                                          5.00    8.77

Total replies:                                                                                                      57.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                              2.00    3.39
Datasets                                                                                                           5.00    8.47
Images                                                                                                             3.00    5.08
Learning objects                                                                                                   5.00    8.47
Text documents                                                                                                     11.00   18.64
Video                                                                                                              3.00    5.08
Relevant to all file types                                                                                         30.00   50.85

Total replies:                                                                                                     59.00

Please use the text box below to expand on your answer if necessary

as before for this sort of question/response
I think controlled vocab would be better. I think content creators would find it easier to choose from a list of
terms, rather than just numbers they have to reference
It would involve imposing arbitrary boundaries on what is, in my experience, more likely to be a continuously
incremental process of change.
It's the semantics of the versions that is important, not the labelling.
Need to know what the latest version is ie need to know how many further versions exist to understand how
"finished" the object is
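
The respondent's point about needing to know the latest version can be illustrated with a minimal Python sketch. It assumes labels follow the 'major.minor' form used in the question (e.g. v.1.1); the names `parse_version` and `latest_version` are hypothetical and are not part of the survey or of any repository software. Comparing the numbers rather than the text matters because, as plain strings, 'v.2.10' would sort before 'v.2.3'.

```python
import re

# Accepts labels such as "v.2.3", "v2.3" or "V1.2" (assumed format).
VERSION_PATTERN = re.compile(r"^v\.?(\d+)\.(\d+)$", re.IGNORECASE)


def parse_version(label):
    """Split a label like 'v.2.3' into a (major, minor) tuple of integers."""
    match = VERSION_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised version label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def latest_version(labels):
    """Return the label with the highest (major, minor) pair."""
    return max(labels, key=parse_version)
```

A search interface built this way could return the latest version directly instead of leaving the user to scan for the highest number.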

20. If a standard system could be devised for indicating versions through the record identification
numbering system in a repository (i.e. 1st version of an object is given the number 1, subsequent
upload is given the number 2 etc.) how useful do you think this would be when dealing with digital
objects?

Very useful                                                                                                        4.00    7.02
Useful                                                                                                             22.00   38.60
Of limited use                                                                                                     15.00   26.32
Not very useful                                                                                                    6.00    10.53
Don't know                                                                                                         10.00   17.54

Total replies:                                                                                                     57.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                        1.00    2.50
Datasets                                                                                                     2.00    5.00
Images                                                                                                       1.00    2.50
Learning objects                                                                                             2.00    5.00
Text documents                                                                                               3.00    7.50
Video                                                                                                        1.00    2.50
Relevant to all file types                                                                                   30.00   75.00

Total replies:                                                                                               40.00

Please use the text box below to expand on your answer if necessary

Date of upload (see below) would do the same job, and be richer informationally.
Deposit sequence may be unrelated to creation sequence
If a search query can return the latest one, fine. If I just get the metadata and have to scan to find the
highest number; not fine
Order of uploading might not relate to version order.

21. If date and time stamp information could always be provided to identify versions, how useful
do you feel this would be when dealing with digital objects?

Very useful                                                                                                  21.00   36.21
Useful                                                                                                       24.00   41.38
Of limited use                                                                                               7.00    12.07
Not very useful                                                                                              2.00    3.45
Don't know                                                                                                   4.00    6.90

Total replies:                                                                                               58.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                              1.00    2.17
Images                                                                                                             1.00    2.17
Learning objects                                                                                                   1.00    2.17
Text documents                                                                                                     6.00    13.04
Video                                                                                                              1.00    2.17
Relevant to all file types                                                                                         36.00   78.26

Total replies:                                                                                                     46.00

Please use the text box below to expand on your answer if necessary

But even this would not necessarily tell users everything they might want to know. (I can imagine, for
example, earlier versions of a text document being uploaded later than the final version: after all, drafts of a
classic text become interesting, and get edited and published, *after* the final version has become popular.)
Descriptive information is what we need.
If a search query can return the latest one, fine. If I just get the metadata and have to scan to find the
highest number; not fine
This isn't as straightforward to identify as numbers. Plus Americans put the dates back to front.
Whilst a date and time stamp can be very useful, it might be misleading unless other versioning information
is available.
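
The concern about date order can be illustrated with a short Python sketch. It assumes date and time stamps are written in ISO 8601 form (year first, in UTC), which avoids the day/month ambiguity one respondent mentions; the names `version_stamp` and `latest_stamp` are hypothetical. As other respondents note, a stamp records when something happened to a file and does not by itself say how versions relate to one another.

```python
from datetime import datetime, timezone


def version_stamp(moment):
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


def latest_stamp(stamps):
    """Return the most recent of several ISO 8601 stamps."""
    return max(stamps, key=datetime.fromisoformat)
```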

Section 6: Possible solutions to version identification: Other

22. ID tags/file properties can be used to store metadata with the object itself e.g. ID3 tags which are
used to store data such as artist name, song title, genre etc. within MP3 audio files and File
properties within Microsoft Word used to store information such as author, subject etc. If a standard
system for adding and completing ID tags for digital objects could be developed how useful do you
think this would be?

Very useful                                                                                                        15.00   27.78
Useful                                                                                                             23.00   42.59
Of limited use                                                                                                     9.00    16.67
Not very useful                                                                                                    1.00    1.85
Don't know                                                                                                         6.00    11.11

Total replies:                                                                                                     54.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                    4.00    8.51
Datasets                                                                                                 3.00    6.38
Images                                                                                                   3.00    6.38
Learning objects                                                                                         0.00    0.00
Text documents                                                                                           3.00    6.38
Video                                                                                                    3.00    6.38
Relevant to all file types                                                                               31.00   65.96

Total replies:                                                                                           47.00

Please use the text box below to expand on your answer if necessary

Labelling that indicated last date of modification would be useful
Might need to contain different information dependent on type of object.
Such metadata could identify the source and give information that would allow correct referencing.

23. If former versions of digital objects could be stored by repositories with audit trails created to
record changes how useful do you think this would be?

Very useful                                                                                              6.00    11.11
Useful                                                                                                   11.00   20.37
Of limited use                                                                                           19.00   35.19
Not very useful                                                                                          8.00    14.81
Don't know                                                                                               10.00   18.52

Total replies:                                                                                           54.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                    0.00    0.00
Datasets                                                                                                 1.00    4.00
Images                                                                                                        1.00    4.00
Learning objects                                                                                              1.00    4.00
Text documents                                                                                                1.00    4.00
Video                                                                                                         0.00    0.00
Relevant to all file types                                                                                    21.00   84.00

Total replies:                                                                                                25.00

Please use the text box below to expand on your answer if necessary

But the authors might want to keep the audit trail to themselves.
I've not thought about this, but first reaction is: a detailed record of changes won't happen; a vague one
won't be much use.
May be a disincentive to deposit some forms of data. Too cumbersome, too obvious when changes were
made and by whom.
Preservation depends on trust. Why else, but for preservation, have old versions of things in repositories?
My experience with content creators at Jorum is that they absolutely do not want this

24. How useful do you think performing file comparisons (i.e. comparing data between files to
establish whether one file is identical to another) is when identifying different versions of digital
objects?

Very useful                                                                                                   19.00   35.19
Useful                                                                                                        15.00   27.78
Of limited use                                                                                                13.00   24.07
Not very useful                                                                                               4.00    7.41
Don't know                                                                                                    3.00    5.56

Total replies:                                                                                                54.00

If you agree that this would be useful, do you think it would be of particular use when dealing with
versioning issues for any specific types of file?

Audio                                                                                                         1.00    2.44
Datasets                                                                                                      2.00    4.88
Images                                                                                                        2.00    4.88
Learning objects                                                                                               4.00    9.76
Text documents                                                                                                 5.00    12.20
Video                                                                                                          1.00    2.44
Relevant to all file types                                                                                     26.00   63.41

Total replies:                                                                                                 41.00

Please use the text box below to expand on your answer if necessary

I have a firmer opinion on this since listening and participating in a brief debate in a Library User Group
meeting recently, we were talking about the importance of verifying that a Repository file (in the case of a
'final revision pre-publication') was really the same as that eventually published by the journal. Had the
author sneakily kept in some stuff an editor or referee wanted out?
I'm not sure about this. Probably depends on the object type. However, if versioning is clear, why would I need
this?
It can be useful but mainly as a "desperation measure".
Is this more than bitwise comparison; are the canonical forms compared?
Presuming learning objects includes computer code
Sounds time consuming
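
The file comparison described in question 24 can be sketched in Python as follows. This is a minimal illustration, assuming that matching SHA-256 digests are an acceptable stand-in for a byte-by-byte check; the names `file_digest` and `files_identical` are hypothetical. It answers only the bitwise question: two files with the same content in different formats, or with trivial differences in encoding, would be reported as different, which is the limitation raised in the comment about canonical forms.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True when both files have the same size and the same digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # different sizes cannot be identical
    return file_digest(a) == file_digest(b)
```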

Section 7: Additional comments

25. Do you have any other suggestions for identifying versions of audio and video files, images,
datasets, text documents or learning objects in digital repositories?

Every page should include copyright and referencing information. It is often very difficult to establish
authorship.
For papers, I think it is not helpful at all to store non-final versions in repositories. What should be
distributed is the final, published version only. Otherwise, it may bypass the peer-review process to an
extent.
FRBR
Learn from the open source software community
Relate the format identification to the software that could read it. Including version.
Standard conventions are the most useful and perhaps an automatic increment system could be
developed.
Standardization of naming conventions would be very useful

26. Have you managed to resolve any problems you may have faced with versioning issues and if so
how has this been achieved?

I have never thought I had such problems.
I have not encountered such problems.
I stumble sometimes, and have learned to develop my own conventions on my own computer.
In a digital design development environment, especially if using a variety of bespoke software packages
and modules, students can often lose track of their work. Main folders become cluttered, and duplicating
file names can cause problems, typical example being overwriting versions accidentally or when in
hindsight the version may have been useful. Concentration and good organisation help resolve most
problems.
Integrated FRBR with Handle system
Just kept everything myself on my email network - protected, backed up, accessible and secure
Just looking through multiple versions and trying to find a published copy with which to compare - the long,
hard, old-fashioned way!
Not sure I understand this question
No - Sometimes, if there is no naming convention, I have to look at all the files. This is extremely time-
consuming.
On a small scale I just keep backing up and checking date stamps. this is a pain in the neck if it's not
something I wrote myself, so I know what changes I *expect* to see
Only by numeric numbering V1.1 V1.2 etc
Rename the file with new dates because Word overwrites the same file and damages it
Trial and error
Use timestamps or forensic techniques, such as comparisons.
Using dates and/or version numbers in file names
Version numbering and initials of last person who worked on it

We are still at a very early stage of developing our project. Actually, we are initially going to be looking to
work mostly with finished products in the short term, so we don't expect versioning to become an issue.

27. Are there any specific issues regarding version identification that you would like to see
addressed and which have not been covered in the questions above?

I think comparison of versions would be very helpful
Multiple versions created by multiple authors of a single document
Not exactly issues to be addressed - but background information that may throw light on my perspective.
I'm a classicist. Many categories of ancient text never had a stable definitive form; those which did have not
been transmitted in a definitive form (because they've passed through multiple manual recopyings). So the
material I deal with academically is always presented in the form of different modern reconstructions of a
hypothetical form of the text in question. This background may explain why I take a fairly relaxed view of
multiple versions: the earnestness with which this issue is taken by people I talk to in some other
disciplines always takes me by surprise. The version-fluidity that becomes possible in the digital world is
what I'm used to in the material I study, so it doesn't seem novel or threatening; and (as I think I said in one
of my earlier answers) it has positive advantages, considered as the reverse side of revisability.
I'm also a great admirer of Bruckner's symphonies. Bruckner continually revised his symphonies; the
relationship between his revisions and what appears in published editions is variable (and sometimes
opaque); a given conductor may have recorded the same symphony several times (not necessarily using
the same edition); and the same recording may have been remastered and reissued by multiple publishers.
Have a look at http://www.abruckner.com/ to get an idea of the complexity of the problem. The important
thing here is the potential multidimensionality of versioning: no simple taxonomy is likely to be able to
capture the complexity once version proliferation acquires a bushy, rather than a unilinear, shape.
Rentability! Why waste time and effort comparing versions to put in repositories?
Standardization
The bigger issue is often the directory structure
Variants inside a format. For example indicate not only that it is a Word document, but also that it is a Word
2003 ...
Versioning of metadata vs. versioning of content.
Webpages - obviously continuously updated, which creates difficulties if I need to ref a particular version of
a webpage - can of course state the date webpage accessed
What is a version? What is an upgrade?
                              VIF Project (Information Professionals) Survey Results
Section 1: About you

1. Which of the following best describes your role?
                                                                                                Replies   Percentage
Library staff - directly involved with repository                                               38.00     39.18
Library / IT staff - technical repository support                                               9.00      9.28
Library staff - not directly involved with repository                                           14.00     14.43
University senior management                                                                    3.00      3.09
Research funder or quality agency                                                               1.00      1.03
Library / IT standards development - metadata, OAI                                              4.00      4.12
Library / IT consultant                                                                         4.00      4.12
Other (please specify)                                                                          24.00     24.74
County Record Office staff - imaging specialist
Data archive staff
Data Center Staff - Digital Archivist
Data Centre staff
Data manager
Data services officer: social science data library
Digital archivist within a university
Independent Consultant/ Researcher/ Trainer in library and information industry.
Industrial researcher
Information Analyst
Information staff - not directly involved with repository
IT consultant managing JISC repository projects
IT/IS professional
Librarian working for repository software vendor in product development and customer support.
Library/IT strategy
National research institute - science manager
Programmer / learning technologist
Project manager working particularly in itembanking
Records Manager
Records Manager
Research student in area of repositories
Responsible for digitised course readings - mounting on Virtual Learning Environment, and eventually storing, with relevant
metadata, on Digital Repository
Service manager at national data centre.
SHERPA team member with admin responsibility for the local repo (within IS directorate but not directly library or IT)...
Web and intranet administrator

Total replies:                                                                                                                97.00

2. At which institution are you based?

Aberystwyth University                                                                                                        3.00    3.09
University of Auckland                                                                                                        1.00    1.03
University of Birmingham                                                                                                      1.00    1.03
University of Bolton                                                                                                          1.00    1.03
British Library                                                                                                               1.00    1.03
Brunel University                                                                                                             1.00    1.03
Buckinghamshire Chilterns University College                                                                                  1.00    1.03
University of Cambridge                                                                                                       1.00    1.03
Columbia University                                                                                                           1.00    1.03
Cornell University                                                                                                            1.00    1.03
Cranfield University                                                                                                          1.00    1.03
University College Dublin                                                                                                     1.00    1.03
University of Edinburgh                                                                                                       4.00    4.12
Ghent University                                                                                                              1.00    1.03
University of Glasgow                                                                                                         1.00    1.03
Goldsmiths                                                                                                                    1.00    1.03
"Government agency"                                                                                                           1.00    1.03
Hewlett-Packard Laboratories                                                                                                  1.00    1.03
University of Hull                                                                                                            2.00    2.06
Imperial College of Science, Technology and Medicine                                                                          1.00    1.03
Independent                                                                                                                   3.00    3.09
Information Automation Limited                                                                                                1.00    1.03
Intrallect Ltd.                                                                                                               1.00    1.03
King's College London                                                                                                         1.00    1.03
Landcare Research                                                                                                             1.00    1.03
University of Leeds                                                                                                           13.00   13.40
Leeds Metropolitan University                      1.00    1.03
University of Leicester                            1.00    1.03
Leicestershire County Council                      1.00    1.03
University of London                               2.00    2.06
London School of Economics and Political Science   5.00    5.15
Loughborough University                            1.00    1.03
Manchester Metropolitan University                 2.00    2.06
Ministry of Research, Science & Technology         1.00    1.03
Monash University                                  2.00    2.06
National Library of Scotland                       2.00    2.06
None specific                                      1.00    1.03
University of Nottingham                           3.00    3.09
University of Otago                                1.00    1.03
University of Oxford                               3.00    3.09
Queen Mary (UL)                                    1.00    1.03
School of Oriental and African Studies             1.00    1.03
Science and Technology Facilities Council          3.00    3.09
Scottish Qualifications Agency                     1.00    1.03
University of Southern Queensland                  1.00    1.03
University of St Andrews                           1.00    1.03
University of Strathclyde                          2.00    2.06
University of Sussex                               1.00    1.03
Swinburne University of Technology                 2.00    2.06
University of Tasmania                             1.00    1.03
UKOLN                                              1.00    1.03
University College London                          2.00    2.06
Waikato District Health Board                      1.00    1.03
University of the West of England                  2.00    2.06
West Sussex Record Office                          1.00    1.03
University of Wolverhampton                        1.00    1.03
University of York                                 3.00    3.09

Total replies:                                     97.00

3. In which country are you based?
Australia                                                                                                                   6.00    6.19
Belgium                                                                                                                     1.00    1.03
New Zealand                                                                                                                 6.00    6.19
Republic of Ireland                                                                                                         1.00    1.03
United Kingdom                                                                                                              80.00   82.47
United States                                                                                                               2.00    2.06
Uruguay                                                                                                                     1.00    1.03

Total replies:                                                                                                              97.00

Section 2: The question of version identification

4. In your opinion how important do you consider the following identification issues?

a) Ease in identifying definitive published/'final' version

Essential                                                                                                                   71.00   73.96
Important                                                                                                                   22.00   22.92
Slightly important                                                                                                          3.00    3.13
Unimportant                                                                                                                 0.00    0.00
Don't know                                                                                                                  0.00    0.00

Total replies:                                                                                                              96.00

Please use the boxes below to further explain your answers if necessary:

A way to identify documents (particularly datasets) that have been updated after their 'final' version is important
Essential for academic papers. Not relevant to unpublished materials unless these have been published in some form later.
Provenance is more important - the difference between derivative and current versions and versions in between. New
publishing trends potentially undermine the concept of final published version.
Only final should go into a repository - author should keep earlier drafts
Particularly if the copy is being provided under the terms of the CLA Digitisation licence
To know whether it had been peer reviewed

b) Ease in identifying all versions
Essential                                                                                                                              28.00   29.17
Important                                                                                                                              51.00   53.13
Slightly important                                                                                                                     12.00   12.50
Unimportant                                                                                                                            3.00    3.13
Don't know                                                                                                                             2.00    2.08

Total replies:                                                                                                                         96.00

Please use the boxes below to further explain your answers if necessary:

Ambiguous
Helps minimise risk of author sending wrong version from their computer
Version control is vital to support any audit trail where versions have evidential value

c) Linking multiple versions held in different locations

Essential                                                                                                                              15.00   15.63
Important                                                                                                                              52.00   54.17
Slightly important                                                                                                                     22.00   22.92
Unimportant                                                                                                                            3.00    3.13
Don't know                                                                                                                             4.00    4.17

Total replies:                                                                                                                         96.00

Please use the boxes below to further explain your answers if necessary:

Depends on the transience or otherwise of the material
I think that only the final version should be archived unless there is a good reason to archive drafts
Its importance will vary according to the location and practicality of linking the different versions. I would consider it essential
to link different versions in an institution or consortium archive. However, it may be considerably more difficult, if not
impossible to link every version available on the Internet.
Not sure if you mean identical or similar here...
This shouldn't be necessary if only finals are in the public arena
Unlikely to be relevant to repositories not dealing with academic papers. e.g. the repository I work on deals with digital archive
and manuscript material collected as primary sources for future research, e.g. personal papers of writers, politicians and
scientists. These are normally donated to a single archive or library. May be useful in the academic output repositories?

d) Being clear about the differences between multiple versions

Essential                                                                                                                               39.00   40.63
Important                                                                                                                               46.00   47.92
Slightly important                                                                                                                      9.00    9.38
Unimportant                                                                                                                             1.00    1.04
Don't know                                                                                                                              1.00    1.04

Total replies:                                                                                                                          96.00

Please use the boxes below to further explain your answers if necessary:

I'm not sure. Sequential data may be sufficient rather than comparing versions.
It is useful. However, the ability to identify differences between multiple versions will vary significantly according to the type of
resource. Quantitative changes, such as the addition of 300 images are easy to measure. However, it is difficult to measure
qualitative changes that are made to a research paper.

Question unclear: does 'multiple' just mean 'different'? are the differences those between the versions or their versioning?
Sorry I may be misunderstanding main force of the question... I can see it is important to know that there are different
versions but knowing the differences between versions might (or indeed, possibly more importantly, might not) assist in the
scholarly study of such differences

e) Accurately describing the object (effective metadata)

Essential                                                                                                                               63.00   65.63
Important                                                                                                                               27.00   28.13
Slightly important                                                                                                                      4.00    4.17
Unimportant                                                                                                                             2.00    2.08
Don't know                                                                                                                              0.00    0.00

Total replies:                                                                                                                          96.00
Please use the boxes below to further explain your answers if necessary:

But with the caveat that accuracy and effectiveness do not necessarily come from too much data
Crucial to enabling users to make decisions.
Essential for searchers as authors circle is limited by nerks and through time
Good metadata requires the input of the Creator and a repository administrator responsible for the catalogue.
This is entirely separate from versioning.

f) Clear signposting to the 'best' version for a particular user

Essential                                                                                                                          25.00   26.04
Important                                                                                                                          42.00   43.75
Slightly important                                                                                                                 20.00   20.83
Unimportant                                                                                                                        8.00    8.33
Don't know                                                                                                                         1.00    1.04

Total replies:                                                                                                                     96.00

Please use the boxes below to further explain your answers if necessary:

But possibly difficult to do, given the range of users (ie, from undergrads to professional)
Deciding the 'best' version is perhaps best left to the user, assuming they have enough information to be able to make an
informed decision.
Defining the 'best' for each user may be very tricky
Hard to do, so lower priority than some of the others
How do you define what is 'best' for end-users? Clear sign posting (in the form of effective metadata) yes, but no value
judgements on the object should be made.
I think it would be up to the user to decide what is "best" - but we need to make it possible for them to identify what version
they have found.
Is this about a plain English summary, the full monty and the manager version?
Not sure of application of this in learning object repositories; it would be up to the user.
Sounds a bit subjective.
This suggests an overhead in versioning way beyond its worth. The other 'effective metadata' should provide this, separately.
Useful. However, it is difficult to define and requires some knowledge of the user. In most circumstances, the 'best' version is
the latest. In our own repository, less than 10 users have requested a previous copy of a resource.
Who is to decide (and then who will disagree?)
Who's best? impossible in general

g) Trust in the version at hand

Essential                                                                                                                           62.00   64.58
Important                                                                                                                           24.00   25.00
Slightly important                                                                                                                  8.00    8.33
Unimportant                                                                                                                         1.00    1.04
Don't know                                                                                                                          1.00    1.04

Total replies:                                                                                                                      96.00

Please use the boxes below to further explain your answers if necessary:

Again, hard to implement so better to concentrate on high-quality metadata to help humans decide
Could differ if other options are offered
credibility is paramount or it will not/should not be used
Not so important in learning object repositories; except where high stakes material is involved, like latest medical information,
or assessments.
This, too, is in the eye of the user, and dependent on far more than just identifying the version.

5. Do you think that version identification is particularly relevant to some types of digital object more than others?

No, version identification is important for all types of digital object                                                             69.00   71.88
Don't know                                                                                                                          5.00    5.21

Version identification is more important for the following types of digital object*:                                                22.00   22.92

Total replies:                                                                                                                      96.00

*Please note that, due to a bug with the Bristol Online Survey software, no results were returned for specific types of digital
object

Audio                                                                                                                               0.00    0.00
Datasets                                                                                                                            0.00    0.00
Images                                                                                                                                0.00   0.00
Learning objects                                                                                                                      0.00   0.00
Text documents                                                                                                                        0.00   0.00
Video                                                                                                                                 0.00   0.00

If you think version identification is more important for some types of digital objects please explain why you think
this is the case:


Any digital object that may be subject to editing procedures should be retained in a version structure, for reliability of recall.
Any half-decent text document intended for public consumption should include a date, at or near the top. This can serve as a
version identifier, without any additional numbering system. However, datasets and learning objects are more likely to
be developed incrementally and not be explicitly dated. Images, audio files and video can easily be date stamped but the date is
often not apparent in the file itself.
Because of changes to information items over time and need to know provenance and lineage and status.
because these are what is actually to be used (directly) and will be verified.
Changes after peer review implemented. Moral rights upheld for authors.
Datasets tend to be consumed by machines, and tend to be turned into results/outputs. Version metadata is essential to
explain different results. With (large) text documents, version metadata may be the only clue to a change - ie, it is difficult for
the human user to spot changes themselves.
From the research/T&T perspective it is important to be clear that everyone is talking about the same 'object'. However I'm not
too certain at this stage that the costs involved would merit meeting such a need. The costs include issues of having data
available which might then be demanded by the CLA
High risk of manipulation.
I would expect it is generally critical to know which version reflects the final representation of the author's thoughts, and which
show the genesis of the work towards that final form.
In my work, this is the type of object that we are more likely to get multiple versions of. Though with some of our datasets
(such as geophysics), we will need to know which is the raw data and which is the processed data and which is the derived
data - and this is a similar issue to versioning
It seems to me that version is most important with these types for use in research, to ensure the final version of some
intellectual effort is used and cited, or the correct version of research data associated with subsequent analyses.
may relate to authenticity of the object
Need to identify final/latest version.
preprint and postprint difference, for evaluation purposes.
scholarly research papers - essential to know you have the definitive version.
Some types of content may be less dependent on clear versioning (e.g. broadcast or surveillance content is better described
by precise timing metadata), but that is not _medium_ specific (most of the categories in this list are media).
text by its very nature tends to be undated more often
Text docs and datasets often contain intellectual work that is likely to change between versions. The previous version of
text+datasets may remain important. Video data may also go through several stages of development that contain different
types of information (e.g. transcripts, broadcast notes, etc.)
Text versions and other forms without immediate version control information readily available (prior to publisher submission,
post editing/corrected proofs), if one wants to cite/identify it becomes relevant no?
The importance of versioning is more a function of what the object content is, not the object type
These are objects that I use in my work, and hence version is important.
They are more likely to have multiple editions.
To an extent this is to do with my exposure to these material types and additionally I'm slow to realise that other forms of material
can be born digital and open to versioning too!
Version ID is particularly important in cases where derivatives are made, irrespective of digital object type.

6. To what extent have you found it easy to identify different versions of digital objects within 'single' repositories?
(This question refers to versions of the same digital object within a single repository, e.g. an institutional repository)


Different versions of digital objects are easily identifiable                                                                        6.00    6.52
Some room for improvement                                                                                                            21.00   22.83
Substantial room for improvement                                                                                                     40.00   43.48
Different versions of digital objects are unidentifiable                                                                             3.00    3.26
Don't know                                                                                                                           22.00   23.91

Total replies:                                                                                                                       92.00

Do you find versions of any specific types of digital objects harder to identify than others?

Audio                                                                                                                                6.00    8.11
Datasets                                                                                                                             5.00    6.76
Images                                                                                                                               7.00    9.46
Learning objects                                                                                                                     4.00    5.41
Text documents                                                                                                                       3.00    4.05
Video                                                                                                                                6.00    8.11
No, any issues with versions are consistent for all types of digital objects                                                          43.00   58.11

Total replies:                                                                                                                        74.00

Why do you find versions of these particular types of digital objects harder to identify than others?

Difficult to differentiate between versions. Such variation on ways of describing material.
I don't use data in repositories

I have not found many of these types of object in repositories, so this is supposition, but the conventional terms post-print, pre-
print etc., used for text documents do not apply and also the item itself may not include any form of citation or indication of
where it was performed etc., so there's a need to collect more contextual information at point of collection.
I only have experience of text documents
I'm usually looking for text documents only.

Images and text often have metadata which is automatically embedded through creation tools and/or editing software (exifdata,
pdfinfo, creation date, date last modified, etc.). Other forms are less likely to offer the opportunity for data extraction.
Images rarely have version information associated with them. Video and audio are better, but there is a similar lack of
consideration around version information. Datasets are OK where owners are conscious of versioning, but less easy where
not.
Not sure. Depends partly on whether earlier and later versions are present (or discoverable) to enable comparison if version
metadata is lacking.
Objects that contain graphical or sound information are difficult to quantify.
Often lack version information and in the case of data.
One major problem is filename conventions. Large files can no longer be identified without resorting to metadata or opening
the file. We now have new filename conventions in place that help with version identification but that does not account for all
types of versions only for the user versions (i.e. original file, large web, medium web, small web) but not archival file types
(i.e. unedited master, edited master, unedited access, edited access). With large files it helps to have shelfmark/identifier
plus version info in the filename but that is not encouraged by digitisation staff.
people do not label as diligently and the automatic systems are less than for text docs or digital filing systems for images -
software issue too
Presentations often develop over time and hard to identify timeline
Rarely sufficient metadata - software objects much better at version control.
Some are less likely to have significant variation (ie video) but metadata is often of poor quality for all sorts of objects
Textual documents can often contain dates and versioning information. other types of digital object rely on the metadata
which can be patchy.

7. Which of the following do you usually use when searching for digital objects across multiple repositories?

Internet search engine (e.g. Google)                                                                                             61.00    37.89
OpenDOAR                                                                                                                         22.00    13.66
OAIster                                                                                                                          30.00    18.63
Intute Repositories Search Service                                                                                               5.00     3.11
Do not search across multiple repositories                                                                                       23.00    14.29
Other (please specify):                                                                                                          20.00    12.42
ARROW Discovery Service, ADT etc
ARROW Discovery Service (search.arrow.edu.au)
Australasian Digital Thesis program
BASE
BASE
BASE (Search engine)
Don't really do this - yet
Google Scholar
I'm not a researcher, but am a contributor to a cross-repository search system (IRIScotland JISC project)
internal solutions (controlled list based terms solution)
Intute isn't really a 'service' yet!
Intute Repositories Search is not yet fully released?
Multiple repositories for the data sets I am interested in don't exist yet.
NERC data grid discovery service, GCMD, GoGEO
OAI, TDWG TAPIR protocol, LSID
OCLC, COPAC but not much digital material available.
PNDS
Scirus, WoK
Te Puna - Library linkage system across agencies and sector here in NZ
Though I expect I'll be using the others once I've had a chance to check them out, or, as in the case of Intute, when they are
up and running.

Total replies:                                                                                                                   161.00

8. To what extent have you found it easy to identify different versions of digital objects across 'multiple'
repositories? (This question refers to versions of the same digital object within multiple repositories)
Different versions of digital objects are easily identifiable                                                                  1.00    1.08
Some room for improvement                                                                                                      8.00    8.60
Substantial room for improvement                                                                                               40.00   43.01
Different versions of digital objects are unidentifiable                                                                       13.00   13.98
Don't know                                                                                                                     31.00   33.33

Total replies:                                                                                                                 93.00

Do you find versions of any specific types of digital objects harder to identify than others?

Audio                                                                                                                          6.00    8.57
Datasets                                                                                                                       6.00    8.57
Images                                                                                                                         7.00    10.00
Learning objects                                                                                                               3.00    4.29
Text documents                                                                                                                 7.00    10.00
Video                                                                                                                          5.00    7.14
No, any issues with versions are consistent for all types of digital objects                                                   36.00   51.43

Total replies:                                                                                                                 70.00

Why do you find versions of these particular types of digital objects harder to identify than others?

Again, only experience of text documents
All are hard to ID - especially text.
Harder to compare side by side - requires either perfect memory or tedious transcription
I'm usually looking for text documents only.
Insufficient origination data (but this may also be down to copyright issues)
Knowing that you have the latest available. Also titles for datasets change even though the phenomena described is the
same and thus in essence are versions of the data.
Often less evidence in that the others have a near physical metadata attribute
The difficulty in identifying different versions may also be affected by the delivery method. E.g. datasets may be delivered
through a search interface; videos may be streamed, etc.
The issues are not consistent. They are different. But they are all just as problematic.
This is difficult to answer as I've never had occasion to look for different versions of a work across repositories.

Section 3: Current practice within your own repository
9. What repository software do you currently use?

Archimede                                                                                                                                0.00    0.00
ARNO                                                                                                                                     1.00    1.22
CDSWare                                                                                                                                  0.00    0.00
Digital Commons                                                                                                                          2.00    2.44
DigiTool                                                                                                                                 2.00    2.44
DSpace                                                                                                                                   15.00   18.29
EPrints                                                                                                                                  21.00   25.61
Fedora                                                                                                                                   9.00    10.98
GSDL                                                                                                                                     0.00    0.00
InterLibrary                                                                                                                             0.00    0.00
OPUS                                                                                                                                     0.00    0.00
Other (please specify):                                                                                                                  32.00   39.02
Access to repository through LSE Library. Electronic records management system software would be different to these listed.
arXiv
As a researcher I have installed and worked with the "Big Three" - DSpace, Eprints and Fedora. The rest are on the list for a
rainy day! :-)
As an independent advisor I work with many different formats and have no particular preference.
BioMed Central's adaptation of DSpace
Curator
Custom-built Filemaker database
Customised harvesters of distributed biodiversity data
DS CALM
DSpace : We also support EPrints, although we only use DSpace. I am involved with EPrints only in providing Welsh
language support for the RSP team.
DSpace : We use more than 1 also a proprietary repository.
DSpace and Fedora and JSR-170 based repositories.
EPrints : And DSpace!
EPrints : eprints for the 'institutional' repository (WRRO); we're just setting up a digital library but haven't selected software for
that yet.
epubs
Fedora : Bespoke software at the moment, though we are moving to the use of Fedora.
Fedora : I wanted to click multiple things here because I work with lots. Eprints, DSpace, Fedora: Fez, VITAL, Mura)
File Manager
Home grown
In house
In house ePubs software
Inhouse system (MySQL with Access front end), don't know what other software is used.
InterLibrary : It's "intraLibrary"!! Thanks for mis-spelling our product :-)
Open Repository
Open Repository from BioMed Central
Various - mainly internally developed
VITAL
VITAL
We have a system specific to the major government agency I work for - electronic document management system. Also use
Library systems
We have no digital repository for sheet music.
We run our own digital archive that is not based on any type of software. We use ColdFusion, Oracle, Java, html to create
our own system
Wisdom 6.4

Total replies:                                                                                                                  82.00

10. In your view, at present is there a particular repository software application which deals with any aspect of

Yes                                                                                                                             20.00   22.47
No                                                                                                                              6.00    6.74
Don't know                                                                                                                      63.00   70.79

Total replies:                                                                                                                  89.00

If 'Yes', which software does so and how is this achieved? (please provide links to pages if this helps)

EDRMS used in MSD New Zealand - its in house and covers about 1/4 of the public service in this country
EPrints appears to do this better than DSpace. On the other hand, it also appears to be rather overcomplicated and less user-
epubs - when a work(FRBR) is edited, the new version is saved as a workinstance.
ePubs has implemented FRBR and thus gives the opportunity to be explicit about versioning. I have already read, but have
Fedora
Fedora - handles multiple objects well.
Fedora - keeps multiple versions as they are added and also it's possible to link related items and have complex digital
Fedora enables ingest of multiple datastreams for each object, generates a new versions for each datastream that is either
Fedora has a system for versioning. Not up-to-date with capabilities of others.
Fedora provides integrated versioning of a bitstream (object or metadata). However, it does not support cross-repository
Fedora which uses multiple datastreams within an object, each of which can be versioned (say 1.0, 1.1 etc). Versions
Fedora, because it is very flexible.
Fedora, which records a separate version whenever an object is saved within the repository. This doesn't always meet
Fedora. EPrints has some provision for linking earlier and later versions, but not perhaps for more sophisticated linking of
IntraLibrary will be offering vastly improved support in v3.0 which will be out before the end of the year.
It's easy to link multiple versions in ePrints. I'm not sure how other systems compare. However, it's just a sequential link.
JSR-170 and JSR-283 based repositories
Life Science Identifier (LSID) use of resolvable GUIDs + versioning works well
Most ERMS software I've seen handles version control fairly well, but this is because the idea is to create one version which
My only experience of this is with a previous installation of Eprints software where versions of research papers could be
Versions are linked in EPrints
Wisdom 6.4, but only for electronic media


11. Which of the following types of material do you currently store, or do you intend to store in your repository?

Audio

Currently store                                                                                                                 20.00   27.78
Plan to store in future                                                                                                         33.00   45.83
Do not store                                                                                                                    19.00   26.39

Total replies:                                                                                                                  72.00

Datasets

Currently store                                                                                                                 18.00   23.38
Plan to store in future                                                                                                         42.00   54.55
Do not store                                                                                                                    17.00   22.08

Total replies:                                                                                                                  77.00

Images

Currently store                                                                                                                 38.00   48.72
Plan to store in future                                                                                               27.00   34.62
Do not store                                                                                                          13.00   16.67

Total replies:                                                                                                        78.00

Learning objects

Currently store                                                                                                       11.00   15.49
Plan to store in future                                                                                               22.00   30.99
Do not store                                                                                                          38.00   53.52

Total replies:                                                                                                        71.00

Text documents

Currently store                                                                                                       68.00   78.16
Plan to store in future                                                                                               15.00   17.24
Do not store                                                                                                          4.00    4.60

Total replies:                                                                                                        87.00

Video

Currently store                                                                                                       19.00   26.03
Plan to store in future                                                                                               36.00   49.32
Do not store                                                                                                          18.00   24.66

Total replies:                                                                                                        73.00

12. If you currently store any material outside the scope of this survey that you feel would benefit from a version
identification framework please give details below


assessment items (probably a subset of learning objects though)
citations
email is an achievable, but often ignored area that may be interesting to examine.
email, websites - these could be classified under the headings above depending on the data model used.
Grey literature (reports) and conference papers - often multiple versions of the same paper given at different venues.
Historical documents, eg republications of early 18th c novels.
I could not answer all of the questions in (11) as some categories have not been discussed in policy meetings, and no
requests yet made. We probably would store any of these categories, however.
Large datasets - geophysics, 3D laser scanning etc. Problems identifying which is the primary raw data and what has been
done to the data in order to create new versions
maps and plans
Methodology i.e. the research process leading to the research output
Please note response to 11 based on what might need to store in an ERMS.
Policy and associated administrative documents
Software - we do not currently store this, but the issue has been raised by users.
spatial data
The links that are established between different types of materials. ie Datasets to images.
We are also looking at storing music scores in our repository.
What about printed music and music notation files?
Whilst not personally planning to store any material I suspect Music needs to be considered as a special category not least
because of the differences in the way copyright applies to music
Would e-theses where content may be removed for copyright reasons be material for consideration? The omissions
somehow have to be described in the metadata. This could happen in theses with regard to images, or with regard to
Appendices containing copies of published research papers (which cannot be made available online also for copyright
reasons) A future type of material might be akin to Data sets but be a mixture of research material which is in progress eg
combination of text documents, research articles, book chapters, statistical data, annotated texts which are linked together
but maybe at different stages of development. Exam papers?

13. How important do you think the following issues are when storing digital objects in your repository, specifically
audio and video files, images, datasets, text documents and learning objects?

Importance of keeping a high quality copy for archival reasons (e.g. high resolution image, high quality video,
uncompressed sound file)

Essential                                                                                                                      48.00   55.17
Important                                                                                                                      33.00   37.93
Slightly important                                                                                                             3.00    3.45
Unimportant                                                                                                                    1.00    1.15
Don't know                                                                                                                     2.00    2.30

Total replies:                                                                                                                     87.00

Please use the boxes below to further explain your answers if necessary:

but mainly for image collection
for those that need to be retained permanently - not all will qualify
I'd say essential but (1) we may never see a high quality copy (start with a jpg image not a tiff), (2) files may cause storage
problems (broadcast digital video) due to size.
If one can control the master, why carry a copy?
The expectations of users will change over several years. The archival copy is likely to be a better quality that can be used to
generate new delivery versions.
The preservation role is to be discussed in a policy meeting this week, so this is a personal response.
This is the whole point of being an archive

Making thumbnails/previews/abstracts available to user

Essential                                                                                                                          21.00   23.86
Important                                                                                                                          39.00   44.32
Slightly important                                                                                                                 24.00   27.27
Unimportant                                                                                                                        2.00    2.27
Don't know                                                                                                                         2.00    2.27

Total replies:                                                                                                                     88.00

Please use the boxes below to further explain your answers if necessary:

Depends on what type of file it is.
NOT the case with text documents, only multimedia.

Making multiple versions of an object available in different file formats (identical apart from file type) to aid
usability?

Essential                                                                                                                          11.00   12.50
Important                                                                                                                          31.00   35.23
Slightly important                                                                                                                 37.00   42.05
Unimportant                                                                                                                       6.00    6.82
Don't know                                                                                                                        3.00    3.41

Total replies:                                                                                                                    88.00

Please use the boxes below to further explain your answers if necessary:

I think we shall stick to widely accepted and easily(?) preserved types as much as possible: eg pdf for text. Available images
will be jpg (though tiff datastream for preservation) but we have thoughts of jpeg 2000 in the future and on-demand scaling
and/or format conversion.
may assist with sustainability (contradicts a above)
Not sure how this would aid usability
or just use a single, open, widely supported format, eg. ODF
There is a huge problem with repository software that only does PDF for text - I work on a project dedicated to helping people
produce more accessible (HTML) and preservable content (XML)
This is particularly important for datasets and other complex resources. If making data available for download, an archive
should distribute data in a format suitable for ease of use (e.g. MS Access) and a platform-independent format for other
users.
Usability in the true sense of the word is hindered by different file formats -- more choices for users (when most of them will
select the default anyway). I would describe this as 'utility'.
We try to provide things in a format which we know there is a free viewer for so that we only need to supply it once
with a preference for cross-platform and openly documented formats.

File size as low as possible due to storage space

Essential                                                                                                                         7.00    8.05
Important                                                                                                                         16.00   18.39
Slightly important                                                                                                                23.00   26.44
Unimportant                                                                                                                       39.00   44.83
Don't know                                                                                                                        2.00    2.30

Total replies:                                                                                                                    87.00

Please use the boxes below to further explain your answers if necessary:

Although there may be other reasons for providing smaller file sizes, eg download times.
Depends - not so relevant for text documents but may be for extremely large datasets or image files
Lack of storage space should no longer be an issue for anyone!

No answer applies. Ideally need the best quality you can keep but clearly filesize may be an issue for compromise (see 13a)
Often not possible with the type of data we deal with.
Only where it doesn't reduce the quality of the digital object (though we may disseminate lower quality versions online but
preserve the highest quality one)
Re file size, I'd be more concerned about end user bandwidth than local storage space.
Repository is archival, so evidential value takes precedence over file size, though this is an important practicality.

Storing all available versions of an object

Essential                                                                                                                          8.00    9.20
Important                                                                                                                          27.00   31.03
Slightly important                                                                                                                 33.00   37.93
Unimportant                                                                                                                        15.00   17.24
Don't know                                                                                                                         4.00    4.60

Total replies:                                                                                                                     87.00

Please use the boxes below to further explain your answers if necessary:

Depends how useful they are - do this on a case by case basis
If the repository has copies of every version, it is essential that they continue to store it. However, they should not be
responsible for locating every version.
It depends on what they are
May upgrade this to "essential" if policy meeting decides that preservation is a stated aim.
No answer applies. All 'useful' versions of an object? Maybe if we are talking about author's final etc. Revisions of an object
1.0 1.1 yes, but 0.5, 0.6 maybe/maybe not.
Or establishing guidelines for storing the 'important' versions.
Really difficult to answer this. All available versions of an object may be massive overkill for some types of work e.g. minor
changes to a text but may be crucial for others.
subjective - down to evidential weight
Though 'diff' or 'patch' like functions could also help - so you have the main version and list of changes to make it any of the
other versions.
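
The 'diff' or 'patch' idea in the last response above can be illustrated with a minimal sketch. It assumes each version is a list of text lines and uses Python's standard difflib module; the function names make_patch and apply_patch and the sample versions are hypothetical, not taken from any repository software named in the survey.

```python
from difflib import SequenceMatcher


def make_patch(base, other):
    """Return the list of changes that turns `base` into `other` (lists of lines)."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    # Keep only the spans that differ; unchanged lines are not stored again.
    return [
        (i1, i2, other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(base, patch):
    """Rebuild the other version from `base` plus the stored changes."""
    result, cursor = [], 0
    for i1, i2, replacement in patch:
        result.extend(base[cursor:i1])  # copy lines that did not change
        result.extend(replacement)      # add the lines specific to the other version
        cursor = i2                     # skip the base lines that were replaced
    result.extend(base[cursor:])
    return result


# Hypothetical example: keep the main version in full, the earlier one as changes only.
main_version = ["Title", "Draft abstract", "Body text"]
earlier_version = ["Title", "Body text", "Old appendix"]

patch = make_patch(main_version, earlier_version)
rebuilt = apply_patch(main_version, patch)
```

Under this arrangement only the main version is stored whole; any other version is recovered on demand by applying its list of changes. This suits line-based text and would not carry over directly to images, audio or video.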

Having 'most up-to-date' version of an object

Essential                                                                                                                          39.00   44.32
Important                                                                                                                          43.00   48.86
Slightly important                                                                                                                 6.00    6.82
Unimportant                                                                                                                        0.00    0.00
Don't know                                                                                                                         0.00    0.00

Total replies:                                                                                                                     88.00

Please use the boxes below to further explain your answers if necessary:

Depends on what the depositor wants to deposit and make available
It is not important that a repository possesses or makes available an 'up-to-date' version, only that it is available for access
from some location.
Though having the most authoritative version is much more important.
We can't stop our depositors playing around with their data after they've deposited with us - there is no way we can ensure
they haven't created a more up-to-date version after they've deposited

Object not being altered/updated without this being made clear to the user

Essential                                                                                                                          48.00   55.81
Important                                                                                                                          26.00   30.23
Slightly important                                                                                                                 8.00    9.30
Unimportant                                                                                                                        4.00    4.65
Don't know                                                                                                                         0.00    0.00

Total replies:                                                                                                                     86.00

Please use the boxes below to further explain your answers if necessary:

Authenticity and evidential value very important to archivists and to users of archives and manuscripts.
It is important for referencing purposes that any changes to the data are recorded in the associated metadata. If possible, the
user should be able to view the version of an object that they previously encountered, irrespective of the fact that a new
manifestation is available.
Just make it clear when the user obtains the object that it might change - else it becomes unmanageable.
We would make it clear to the depositor not the user. If we have to update or alter the object (because of typos or data quality
issues) we will get permission from depositor. The user does not need to know if changes have been agreed with depositor

Version of an object be made available persistently i.e. object is given a unique URL which will provide access to
that object in perpetuity

Essential                                                                                                                          48.00   54.55
Important                                                                                                                          33.00   37.50
Slightly important                                                                                                                 5.00    5.68
Unimportant                                                                                                                        0.00    0.00
Don't know                                                                                                                         2.00    2.27

Total replies:                                                                                                                     88.00

Please use the boxes below to further explain your answers if necessary:

Although varies enormously according to the object concerned.
But not through unique URL.
but not using URLs - preferable to use resolvable GUIDs - LSID, Handles, DOI etc.
Crucial for inter-repository linking.
Do I here say 'dream on...?' The URL ought at least explain what happened to an object if it is no longer available (taken
down, lost, ...)
Important - but depends on the planned longevity of the item.
Requires appropriate software
This depends on the material and rights issues. Material that is in copyright and can only be made available for access
inhouse as and when required should be held on a secure server and be made available to the user in a specific location
temporarily.
Very important. We have identifiers for traditional manuscripts that have persisted for hundreds of years - we need the same
for digital manuscripts.
We have replaced one object, to my knowledge, and the new version had the same persistent URL as the old. I think this
suited the depositor: of course, if they have distributed the persistent URL, it probably suits them better to have the URL
remain the same
Would be nice! We haven't solved this one yet though!
Yes, though the object's location itself can change as long as there's a persistent resolvable identifier

Relationship between versions of an object being made clear to the user

Essential                                                                                                        43.00   49.43
Important                                                                                                        40.00   45.98
Slightly important                                                                                               3.00    3.45
Unimportant                                                                                                      1.00    1.15
Don't know                                                                                                       0.00    0.00

Total replies:                                                                                                   87.00

Please use the boxes below to further explain your answers if necessary:

The status of the version (author's final...) must be clear. Not sure that relationships need to be explained?

Clear signposting to a 'more appropriate' version of an object which may be available elsewhere

Essential                                                                                                        29.00   32.95
Important                                                                                                        39.00   44.32
Slightly important                                                                                               17.00   19.32
Unimportant                                                                                                      1.00    1.14
Don't know                                                                                                       2.00    2.27

Total replies:                                                                                                   88.00

Please use the boxes below to further explain your answers if necessary:

Again, not sure of the application of this for learning objects.
Appropriate is too subjective. I think we can only indicate if there is a later version of the work.
by more appropriate it may be suggested that it is different, therefore not the same version.
except can't tell if other version is more appropriate
Yes, if it's possible to define what that is for that user


14. How would you best describe your current practice for dealing with versioning issues with digital objects?

Express relationships using Metadata
Audio                                                   9.00    10.00
Datasets                                                14.00   15.56
Images                                                  18.00   20.00
Learning objects                                        7.00    7.78
Text documents                                          35.00   38.89
Video                                                   7.00    7.78

Total replies:                                          90.00

Use of ID tags

Audio                                                   6.00    15.00
Datasets                                                6.00    15.00
Images                                                  10.00   25.00
Learning objects                                        4.00    10.00
Text documents                                          10.00   25.00
Video                                                   4.00    10.00

Total replies:                                          40.00

Utilise functionality of repository software

Audio                                                   5.00    7.58
Datasets                                                8.00    12.12
Images                                                  14.00   21.21
Learning objects                                        5.00    7.58
Text documents                                          28.00   42.42
Video                                                   6.00    9.09

Total replies:                                          66.00

Naming of actual objects using pre-defined taxonomies

Audio                                                   5.00    10.87
Datasets                                                9.00    19.57
Images                                                                                                           9.00    19.57
Learning objects                                                                                                 3.00    6.52
Text documents                                                                                                   16.00   34.78
Video                                                                                                            4.00    8.70

Total replies:                                                                                                   46.00

Naming of actual objects on an ad hoc basis

Audio                                                                                                            7.00    13.73
Datasets                                                                                                         5.00    9.80
Images                                                                                                           11.00   21.57
Learning objects                                                                                                 4.00    7.84
Text documents                                                                                                   17.00   33.33
Video                                                                                                            7.00    13.73

Total replies:                                                                                                   51.00

Use of digital object creator's own system (sticking to any system used before files came into the repository)

Audio                                                                                                            5.00    13.16
Datasets                                                                                                         5.00    13.16
Images                                                                                                           6.00    15.79
Learning objects                                                                                                 6.00    15.79
Text documents                                                                                                   11.00   28.95
Video                                                                                                            5.00    13.16

Total replies:                                                                                                   38.00

No system in place for dealing with versioning issues

Audio                                                                                                            16.00   17.39
Datasets                                                                                                         18.00   19.57
Images                                                                                                           16.00   17.39
Learning objects                                                                                                 14.00   15.22
Text documents                                                                                                   10.00   10.87
Video                                                                                                                            18.00   19.57

Total replies:                                                                                                                   92.00

Don't know

Audio                                                                                                                            13.00   17.33
Datasets                                                                                                                         11.00   14.67
Images                                                                                                                           13.00   17.33
Learning objects                                                                                                                 15.00   20.00
Text documents                                                                                                                   11.00   14.67
Video                                                                                                                            12.00   16.00

Total replies:                                                                                                                   75.00


15. Please specify any other methods you currently have in place or are thinking of using to deal with versioning
issues with digital objects, specifically audio and video files, images, datasets, text documents and learning objects

All we have done so far is think about the metadata fields we might be able to use to describe a version and express
relationships. Our repository is so far predominantly etheses and this is sometimes the expression of the relationship
between the print and eversions, in addition to any potential description of removed content in the eversion. I would
recommend metadata plus automated linking by the repository software, in a standard way.
Currently, have only the P drive to play with so naming conventions the only real option
Don't understand Q 14 at all - decided not to answer
I would prefer it to be system driven than administrator driven, repository systems built with the capability
IDs, describing in lineage field in metadata.
Including version number in title if it is a new version of a particular product. Encouraging producers to use unique,
descriptive, titles for each product that they produce.
Not my role - I monitor across several agencies - I use own agency systems
Our versioning issues are mainly related to changes the repository makes itself through preservation actions rather than
versions that may be submitted by the actual creator. As with paper archives, we would rely on archival description, including
intellectual arrangement and descriptive metadata (such as date) to enable researchers to discern the creator's own
versions.
Recording revisions in metadata (where the change is actual, but not significant to represent a new version)
Saving a new file with a new version number manually (all file types)
Thinking about how an application profile such as SWAP could be used to assist with this.
Use of file type to distinguish tiff and jpg versions
We are just starting and have not resolved this issue yet.
We currently tag all our file content from a pre-defined taxonomy (very simple) of different object type versions.
We have two options available on the deposit form: a drop-down list for the depositor to choose from to describe the version
and a free-text box so they can describe it in their own words
We're looking at the scholarly works application profile for text based works. Hopefully this will be incorporated at the
software developer end.
where possible I embed metadata. I also extract id3 tags, exifdata, etc. to associated data sets.
will usually be through file naming and through file level metadata received from depositor (though they often don't tell us
everything we need to know) or metadata created in-house.

16. To what extent is any system you currently have in place for dealing with versioning issues automated?


Mostly automated                                                                                                                     11.00   15.07
Some automation                                                                                                                      18.00   24.66
No automation                                                                                                                        36.00   49.32
Other (please specify)                                                                                                               8.00    10.96
Mostly automated : Automated

No automation : Experience with content management system has shown that it is not important to provide user interface to
versioning system, but let users get an administrator to find lost versions etc. Version control is very hard for people to grasp.
No automation : It is up to the creator of the metadata record to decide on the relationship between versions.
No automation : this is a problem at the moment
No automation : We don't often receive multiple versions of things ...if we do, we normally query this with depositor because it
often means they have sent something in error.
Planned to be automated. Adding metadata as part of preservation migration actions, etc. For creator versions will be more
manual, as human judgment generally required.
Still setting up
The storage is automated for changes made to an item but addition of new related items is manual

Total replies:                                                                                                                       73.00

Please explain the form this automation takes
Any changes to any datastream (eg metadata or files containing content) are recorded by retaining the 'old' versions. But only
the latest version is exposed to users
any document is automatically versioned on the production of changes to its content via a save instruction
Automatic creation of jpg files from tiff at point of ingest - both stored as instances associated with same metadata
changes in digital object data/metadata automatically create LSID tagged versions
command line embedding, extraction and editing of metadata within digital objects
Convention for theses Linking for versions (older/newer)
ePrints automatically identifies duplicates using metadata and displays title after search.
EPrints has a function to link later with earlier versions
EPrints links records for different versions of objects, in date order, and these appear in search results identified by phrase
'this is a newer version of ......' or 'there is a newer version of .....'. When a compressed version of a video file, or an excerpt is
included but the full-size file is NOT included in the repository, I describe what kind of file and the relationship to the original,
textually in the 'Additional information' metadata field.
ePrints offers the option to add a new version of a work. The works are then linked by the software.
Feature within eprints to relate records
In Fedora the versioning is done by the system whenever an object is uploaded
In taking content developed by an author using the repository for storage we may have the sequence of revisions (0.1, 0.2 ->
1.0). These could be kept as our software clones the author's object into the library-managed 'public' space.
Most recently submitted datastream becomes the next version and, by default, becomes the active version, which can be
changed manually, if needed.
Not personally able to comment
one has to learn where to point and click to make it happen
provided that the source of the file provides the relevant version information from a pull-down list we ensure that this
information is used throughout the system to tag the file, to display the version to end-users, and to influence sorting options
and other preference related options.
System creates a new version whenever a work is edited, stores it with an automatically generated id.
The Fedora system currently being implemented will handle versioning to some extent in an automated manner. However, it
is likely to require staff intervention for many tasks.
Users can add new versions within an upload workflow, and the versions are automatically linked within the repository so
end users can "View All Versions", with the most recent version highlighted. External links made to objects within the system
always link to the most recent version. When adding a new version, metadata is copied over so that the user can update it for
the new version.
Users upload different versions of papers based on copying existing metadata and Eprints creates links
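
Several of the replies above describe the same basic mechanism: each edit or upload creates a new version with a generated identifier, metadata is copied forward, versions are linked in order, and links to the object resolve to the most recent one. A minimal Python sketch of that idea follows. It is an illustration only and does not reproduce the EPrints or Fedora interfaces; the names VersionedObject, add_version, latest and label are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Version:
    version_id: str   # automatically generated identifier
    number: int       # position in the chain, starting at 1
    metadata: dict


@dataclass
class VersionedObject:
    versions: list = field(default_factory=list)

    def add_version(self, metadata_updates=None):
        """Create a new version, copying the previous metadata forward."""
        metadata = dict(self.versions[-1].metadata) if self.versions else {}
        metadata.update(metadata_updates or {})
        version = Version(str(uuid.uuid4()), len(self.versions) + 1, metadata)
        self.versions.append(version)
        return version

    def latest(self):
        """External links resolve to the most recent version."""
        return self.versions[-1]

    def label(self, version):
        """Signpost how a version relates to the rest of the chain."""
        if version is self.latest():
            return "this is the most recent version"
        return "there is a newer version of this item"


paper = VersionedObject()
draft = paper.add_version({"title": "Working paper", "status": "draft"})
final = paper.add_version({"status": "peer reviewed"})
print(paper.label(draft))
print(paper.label(final))
```

Older versions are retained rather than overwritten, so a user who cited the draft can still reach it while being told that a newer version exists.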

17. Do you have any input at the digital object creation level before objects come to the repository?
Yes                                                                                                                         13.00   16.05
Occasionally                                                                                                                33.00   40.74
No                                                                                                                          27.00   33.33
Don't know                                                                                                                  8.00    9.88

Total replies:                                                                                                              81.00
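
The percentage column in these tables appears to be each reply count expressed as a share of the total replies to that question, rounded to two decimal places. A small Python sketch using the question 17 counts above (the function name percentages is illustrative):

```python
def percentages(counts):
    """Return each count as a percentage of the total, to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


q17 = {"Yes": 13, "Occasionally": 33, "No": 27, "Don't know": 8}
shares = percentages(q17)
for label, share in shares.items():
    print(f"{label:<14}{q17[label]:>6.2f}{share:>8.2f}")
```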

If 'Yes' or 'Occasionally' what form does this dialogue take?

Advice on file types to use                                                                                                 38.00   44.19
Advice on naming conventions                                                                                                25.00   29.07
Advice on using ID tags                                                                                                     8.00    9.30
Other (please specify)                                                                                                      15.00   17.44
(1) We may require a file type (theses must be pdf). (2) If the object is developed using the repository we can require
metadata be submitted with it (for text we can pre-populate this fairly fully, less so for other MIME types).
Advice on documentation, intellectual property issues, and appraisal.
advice on metadata creation
Advice on what metadata to supply alongside data
Advise on metadata
Advising on file structure and organisation
Create as well
Creation of cover sheets which can carry some metadata
Metadata
Quality control; recommendation of digital objects to be deposited with the repository.
Scanning files (when object not born digital)
So far we have intervened in one case where two files were not internally referenced to each other properly (i.e. not
repository metadata but actual files) and repaired this for the user, who had not got the technical ability.
Sometimes we have contact with archive creators prior to transfer and provide advice, sometimes not. We can't mandate any
of these things regardless.
What metadata elements to use to specify, for example, relationships.

Total replies:                                                                                                              86.00

18. Do you exercise any form of 'quality control' over files before they can be allowed into your repository? (e.g.
specifying minimum resolution for images, article must be peer reviewed etc.)
Yes                                                                                                                                  44.00   55.70
No                                                                                                                                   22.00   27.85
Don't know                                                                                                                           13.00   16.46

Total replies:                                                                                                                       79.00

If 'Yes' please give an example(s) of the types of control that are in place

A full staging area for repository content in which copyright is checked, file versions are checked to be accurate, metadata is
verified and augmented, and other rights information is gathered. Once a library administrator is happy with the object it can
then be made public.
Accuracy of citation to repository object
Aim to stick with common file types for articles and presentations, e.g. HTML, Word, PDF.
All entry into the IR is moderated by select library staff
All files are screened before entering the repository.
Check any licensing issues e.g. includes 3rd party data/objects and in the case of data, that it is what it said it is.
Check publishers OA policy
Content must be within scope of the repository. Where possible, files should meet good practice for accessibility, and for
preservation, but at the moment will not refuse files on these grounds.
Copyright compliance
depends how you use the term repository - by law in NZ all needs to be kept - it's weeded by Archives NZ later on using legal
criteria
Depends on publisher copyright restrictions. Peer-reviewed post-prints preferred.
Each collection has its own appraisal process, which each object must meet. The repository also reviews each object to
ensure that it has been appraised at the collection level and that it meets repository standards.
Ensure appropriate file type
File format conforms to standard. Metadata conventions. file size checks.
File types are specified and specific type of document specified with direction to alternative repositories for content other than
eprints.
Image resolution, colour management procedures, cropping procedures
inserting copyright information and splitting files into parts
Material will be checked for metadata and for appropriate rights declarations etc. There may eventually be a non-checked
express route for open-access text.
minimal: appropriateness which includes format/size to some extent
Must be compliant with author-publisher copyright agreement. Would make text files into PDFs
need to be signed off by the itembank manager
No: but are considering this at a policy level later in the year
Our product allows users to do this, if the customer sets up the system accordingly.
PDF files must be OCR enabled + searchable, but must be below a 10meg limit
Peer Review Selected by Faculty in liaison with library, verbal discussion
peer reviewed
quality check (peer review), format check (pdf), verity check (authors)
resolution
Review by librarian
Size, copyright issues, metadata quality
Specific metadata must be attached to the record
Specific metadata must be attached to the record
specify minimum resolution for images; emphasise the importance of storing uncompressed images and audio; define codecs
for audio and video compression
Text must be peer-reviewed or published version, not preprint
The depositor must clarify copyright clearance and demonstrate the resource is useful for the research community.
Tiffs are resized for web display to three predefined sizes, resolution 72dpi, thumbnails sharpened
Very minimal in technical terms. Making sure that the file is readable and that it is a pdf. Plus control of size of files. Give
advice on conversion of files to pdf format which we use as a standard. Quality control of non thesis material will be controlled
by channelling material through a local research publications database before it enters the repository. So most material will
be peer reviewed, published material, and we still have to draw up guidelines for "other" content eg working papers, more
ephemeral publications.
We have different repositories for different content types. The repository for published peer-reviewed material only contains
this material.
We only accept certain file types, we only accept data if we have appropriate documentation to make it useful and re-usable
in the future (ensuring independent utility of data)
We only accept certain file types i.e. pdf in some areas

Section 4: Possible solutions to version identification: Taxonomies


19. If a standard taxonomy for describing versions in the lifecycles of digital objects (e.g. uncut footage of the film
'Bladerunner', rough edit, cinematic version and director's cut or, alternatively, a first draft, peer reviewed and
publishers version of a journal article) could be developed how useful do you think this would be?


Very useful                                                                                                                         47.00   50.54
Useful                                                                                                                               41.00    44.09
Of limited use                                                                                                                       4.00     4.30
Not very useful                                                                                                                      1.00     1.08
Don't know                                                                                                                           0.00     0.00

Total replies:                                                                                                                       93.00

If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                                11.00    7.33
Datasets                                                                                                                             12.00    8.00
Images                                                                                                                               14.00    9.33
Learning objects                                                                                                                     10.00    6.67
Text documents                                                                                                                       22.00    14.67
Video                                                                                                                                12.00    8.00
Relevant to all file types                                                                                                           69.00    46.00

Total replies:                                                                                                                       150.00

Please use the text box below to expand on your answer if necessary

...but of course it would only be useful if people used and stuck to it...
Although the standard would need to reflect different terms used against different media.
Any standard taxonomy might well cover the issue of different published versions e.g. for Music identification of the Paris
versus the Dresden versions of Wagner's Tannhauser (plus date of publication)
but the final not the working versions of text and data are what should be publicly available
I have chosen these material types, because I am aware that there is informal definition of these phases (as shown by the
question!)
I think it's useful, but it's only one part of the puzzle. For example for a text document to say that one document is a pre-peer
reviewed draft and another is a peer-reviewed accepted manuscript doesn't actually tell you whether the document has had
one minor edit or a major re-write.
If it were possible (a very big 'if'), the categorizations would need to be against types of content, not against different media.
It has to be simple enough and clear enough that people assigning metadata can get it right. Often this will be a non-
specialist, so it has to be obvious.
It presumes a predictable set of variants. Fine for published articles, but much less the case for e.g. datasets or learning
objects
It would be useful but so would universal metadata
It would need to be clear to the user that we were using a taxonomy with specialist vocabulary, and what the vocabulary
meant
May need to consider context as well as filetype. Not sure how well taxonomy created in open access academic output-type
framework will apply in other areas, such as archives and manuscripts, learning objects, etc.
mostly video although DVD now have numerous deleted scenes/alternative endings etc
Pre-print and post-print may be sufficient for text.
relevant to all, but difficult to achieve something which is both standardised and relevant to all dig object types - easy to
imagine taxonomy explosion.
The concepts are too loosely defined, and in practice would be used so inconsistently by different people that they would not
help users at all. Simply adding date to all files and versions would be better, though not so easy for datasets, which would
have to have periodic snapshots archived. For *most* types of content only latest version matters and early ones should be
discarded - to simplify and enhance large-scale object management.
The hard part would be enforcing the taxonomy, once you'd spent three years arguing about it of course.
The main difficulty of this would be different uses of terminology by different community
The RMS "Bulletin" published paper this year on Version control - authored by Paul Dodgson
The taxonomies would have to be different for each file type.
The version control method may vary according to the environment in which it is created.
very difficult to get this information clearly stated from the submitter, however
We don't deal with a lot of audio/video and where we do, there is only ever one version.


20. If a standard taxonomy for describing versions of digital objects in relation to other versions of the same object
(e.g a 'full size' JPEG from a digital camera, a compressed JPEG and a thumbnail image or, alternatively, a DOC, PDF
and HTML version of a text file) could be developed how useful do you think this would be?

Very useful                                                                                                                     36.00   38.71
Useful                                                                                                                          46.00   49.46
Of limited use                                                                                                                  7.00    7.53
Not very useful                                                                                                                 1.00    1.08
Don't know                                                                                                                      3.00    3.23

Total replies:                                                                                                                  93.00
If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                                5.00    5.21
Datasets                                                                                                                             3.00    3.13
Images                                                                                                                               12.00   12.50
Learning objects                                                                                                                     1.00    1.04
Text documents                                                                                                                       9.00    9.38
Video                                                                                                                                5.00    5.21
Relevant to all file types                                                                                                           61.00   63.54

Total replies:                                                                                                                       96.00

Please use the text box below to expand on your answer if necessary

Again I think this is useful as part of the overall picture that could be built up through a framework of version identification
approaches.
I suspect the stable door has been opened on this one.
It would be especially useful if the taxonomy also related to appropriate file types for given uses (eg for web/screen, for print,
for full-colour print...)
smaller JPEGs display quicker, not everybody has Microsoft Word (so I'm told - hard to believe now though)
The taxonomies would have to be different for each file type.
This should be achievable
Very important for a good user experience.
Very, very hard to define, and even harder to keep up to date.
We do this by ingesting multiple datastreams for each object and selecting specific ones for dissemination.
What is described is a categorization of formats for the same content, not versioning.
when loading to a website this becomes critical
With any digital objects a clear and consistent file naming system is very useful. However, most people seem to be very poor
at designing and following such a system.
Would need to see more examples to comment.

21. If a standard taxonomy for describing versions of digital objects and their relationship to other objects (e.g. how
a video file of the complete film 'Casablanca', still images of the movie and a sound file containing only the audio
relate to each other) could be developed how useful do you think this would be?
Very useful                                                                                                                     34.00   36.17
Useful                                                                                                                          49.00   52.13
Of limited use                                                                                                                  8.00    8.51
Not very useful                                                                                                                 0.00    0.00
Don't know                                                                                                                      3.00    3.19

Total replies:                                                                                                                  94.00

Do you think this would be useful when dealing with versioning issues for any particular combinations of file?
(Please specify)

All datastreams for an object are ingested as part of the object.
But can you design one?
Complex objects of various kinds - multimedia and web especially.
For us, more useful for some of our more obscure datasets
However, ability to display hierarchical and other relationships between objects in interface would be even more useful.
I presume that we are moving into FRBR territory here? Certainly their relationships should be the basis...
Linking papers with presentations based on or derived from them.
Lots of different permutations of the example above!
Not as vital as preceding options.
Off the top of my head, this seems trickier to do.
Off topic: we should be aware that whatever happens, a taxonomy like this must be extensible in some way.
Sounds like FRBR territory to me.
Standardized forms of such an ontology would be very powerful; RDF/XML sets out on this path, but a shared vocabulary for
it might be a bigger (and more difficult) step forward.
This idea is subject to subjective decisions on when the items are related and when they are not.
Use of the FRBR model would help to address this, as was the experience of the SWAP work. I'm not sure a taxonomy is the
most useful approach here.
Useful to the extent that all these versions are related in a FRBR sense to the parent work, which could be quite an abstract
context. But in practical terms it might end up a step too far?
Useful, but again difficult to find standard but descriptive terms.
video/image

Section 5: Possible solutions to version identification: Chronological and numeric approaches
22. If a standard indication could be given to show the 'version' of an object (e.g. v.1.1) how useful do you think this
would be when dealing with digital objects?


Very useful                                                                                                                         28.00   30.43
Useful                                                                                                                              43.00   46.74
Of limited use                                                                                                                      15.00   16.30
Not very useful                                                                                                                     2.00    2.17
Don't know                                                                                                                          4.00    4.35

Total replies:                                                                                                                      92.00

If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                               1.00    1.18
Datasets                                                                                                                            6.00    7.06
Images                                                                                                                              2.00    2.35
Learning objects                                                                                                                    5.00    5.88
Text documents                                                                                                                      11.00   12.94
Video                                                                                                                               1.00    1.18
Relevant to all file types                                                                                                          59.00   69.41

Total replies:                                                                                                                      85.00

Please use the text box below to expand on your answer if necessary

but just a chronological or number version system will not give a clear idea of authority and there is no way of knowing if a
later version exists - whereas "First Draft" would imply something may exist that supersedes this version...
But we would need to know, I think, about all the versions in order to apply this scheme. This could be difficult.
but with the caveat that it might be difficult to use in respect of versions of historic material (see my Wagner Tannhauser
answer to previous question)
Date stamp would be more appropriate for datasets
have this for text docs now
It is at least a convention that is understood, but the gray area of version vs revision would need to be treated very carefully.
It may need to be more complex than simply a version number - to be able to demonstrate 'final as published in print' / v2.3
as deposited by me / v2.3 as deposited [elsewhere] by co-author
It partly might depend on the nature of the new version - major change (in meaning or content) or minor change (eg spelling
error)
Need to consider whether this would clash with creator's own versioning system. Would favour whole numbers over decimals.

Repository staff would not know which version they had. Depositor might deposit their versions 1 and 4, but not 2 or 3.
Versions in repository would therefore get numbered 1 and 2, but author's no. 2 might appear later on.
This feature may not be enough for all types of material.
This kind of versioning leads to long development times and is too hard for real users. Consider using a Subversion-style
repository revision number and allow users to 'fork' documents if they need to by creating a new file.
This would give some idea of chronology but not changes of content.
version numbers are good as long as people don't overload their own models of versioning onto them automatically. v1.0 for a
document given to a software engineer is likely to be interpreted as the first release, not the first version.
We ask producers to designate each release with a version number or, in some cases a descriptive term, to differentiate it
from previous versions.
What is meant by a 'standard indication'? Either that the opportunity for an indication becomes standard (to supplement
numerical versioning), or that the data in the indication is standardized (which is the big 'if', again).
You would obviously need to understand the version number, which may not be straightforward, especially for non-expert
users. I have doubts as to whether it's actually possible to compress the version information in this way. For example, how
would you version number: Published, publisher's copy, peer-reviewed article in an understandable way. v2.3.1 wouldn't tell
you much useful.

23. If a standard system could be devised for indicating versions through the record identification numbering system
in a repository (i.e. 1st version of an object is given the number 1, subsequent upload is given the number 2 etc.)
how useful do you think this would be when dealing with digital objects?

Very useful                                                                                                                     17.00   18.48
Useful                                                                                                                          39.00   42.39
Of limited use                                                                                                                  20.00   21.74
Not very useful                                                                                                                 10.00   10.87
Don't know                                                                                                                      6.00    6.52

Total replies:                                                                                                                  92.00
If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                             1.00    1.56
Datasets                                                                                                                          3.00    4.69
Images                                                                                                                            2.00    3.13
Learning objects                                                                                                                  2.00    3.13
Text documents                                                                                                                    4.00    6.25
Video                                                                                                                             1.00    1.56
Relevant to all file types                                                                                                        51.00   79.69

Total replies:                                                                                                                    64.00

Please use the text box below to expand on your answer if necessary

Fedora automatically assigns a new version number for each datastream that replaces a current datastream.
I am supposing that keepers of datasets have systems in place for keeping track of versions.
I don't think this approach would be useful as it is linking the deposit in the repository rather than changes in the base
document, someone could put an older version in second and this approach would indicate that it was more recent

numbering does not indicate what is different, it just indicates more than one version
Objects may not be uploaded on chronological version order
Problem with this approach is that a depositor might upload versions in a different order, e.g. latest version first, then some
older ones
better to just go for a repository-wide version number; anything else is too hard for users.

seems far too simplistic approach. My gut reaction suggests that you would end up papering over a lot of cracks this way.
The issue is how to automatically link the object to be ingested to pre-existing versions already in the repository.
This solution will be wrecked by corrections and recoveries.
This would have to be a system you would use for all objects in repository
Too much of a blunt object, doesn't deal with the subtleties of version vs revision.
What if it conflicts with my current system?
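
Several of the comments above raise the same objection: numbering by upload order records the order of deposit, not the order in which versions were created. A minimal sketch of that objection follows. The names (Deposit, assign_repo_numbers, misordered) and the idea of an author-supplied version number are assumptions made for illustration only; they do not describe any repository's actual behaviour.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    author_version: int  # hypothetical: the version number the author uses
    repo_number: int     # sequence number assigned by the repository


def assign_repo_numbers(upload_order):
    """Number deposits 1, 2, 3... strictly in the order they are uploaded."""
    return [Deposit(author_version, n)
            for n, author_version in enumerate(upload_order, start=1)]


def misordered(deposits):
    """Return pairs where a later repository number holds an older author version."""
    return [(earlier, later)
            for i, earlier in enumerate(deposits)
            for later in deposits[i + 1:]
            if later.author_version < earlier.author_version]


# The author deposits their versions 1 and 4, then later uploads version 2.
deposits = assign_repo_numbers([1, 4, 2])
for earlier, later in misordered(deposits):
    print(f"repository no. {later.repo_number} is author version "
          f"{later.author_version}, older than repository no. {earlier.repo_number}")
```

In this sketch the repository number alone cannot reveal that its third item predates its second, which is why some respondents ask for additional information (such as a date or the creator's own version label) alongside it.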

24. If a date and time stamp information could always be provided to identify versions, how useful do you feel this
would be when dealing with digital objects?
Very useful                                                                                                                        42.00   45.16
Useful                                                                                                                             35.00   37.63
Of limited use                                                                                                                     9.00    9.68
Not very useful                                                                                                                    5.00    5.38
Don't know                                                                                                                         2.00    2.15

Total replies:                                                                                                                     93.00

If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                              4.00    5.06
Images                                                                                                                             4.00    5.06
Learning objects                                                                                                                   1.00    1.27
Text documents                                                                                                                     1.00    1.27
Video                                                                                                                              4.00    5.06
Relevant to all file types                                                                                                         65.00   82.28

Total replies:                                                                                                                     79.00

Please use the text box below to expand on your answer if necessary

All our revisions and versions will have date-time stamps.
as long as there was flexibility within the system to change it if upon checking you discovered that the preferred version may
be older and more detailed than a later version
can be forged
challenging data to obtain and guarantee accurate
Date and time stamps are useful. However, embedded stamps in files provided by the creator should not be trusted.
date info is always useful but this isn't necessarily a good indicator of the version of a work. An older version could be added
later.
Date very useful, time usually (not always) irrelevant.
Fedora records the date and time of ingest for each datastream.
If it's good enough for YouTube it should be for repos!
It is unlikely that versions will be created at exactly the same time, this would be very useful in identifying one version from
another.
Need to be careful, as date stamps may not be forever, nor do they indicate actual precedence - for instance you might get a
publisher version of an article before a preprint, even though it is a later version of the work.
Providing there was agreement on what the date was referring to. e.g. in case of data - release date or date data was
captured.
So long as it is accurate. Can sometimes be misleading.
Stop talking about 'file types'! Be specific: file formats (not display formats), content/information types, media etc.
There is a problem of which date to use. Date of file creation, processing software creation, date of arrival, etc.
This feature may not be enough for all types of material.

This model works very well in existing CVS systems for program source code, and so I expect it would transfer very well to
other object types. With CVS, I can retrieve a previous version by remembering on what date I made the edits. (CVS also
tends to allow the user to put in a comment along with the date - this is also very useful for retrieval, especially when there
have been many minor edits.) I also use CVS for text documents because I find it so simple and useful.
This would enable all users to compare the version they have with others they have found, from whichever source, providing
that the other versions were also dated.
Timestamps can be fiddled and upload does not necessarily happen in a timely manner or the "right" order
Usually depends on when you save the file so you have to be careful not to save over it.
Why not datasets? Snapshots of datasets are time-dependent and this information would give valuable context to the
environment in which the dataset was collected.
Would depend on what the date and time represented. eg "Date of deposit" in the repository could be misleading?

Section 6: Possible solutions to version identification: Other

25. ID tags/file properties can be used to store metadata with the object itself e.g. ID3 tags which are used to store
data such as artist name, song title, genre etc. within MP3 audio files and File properties within Microsoft Word used
to store information such as author, subject etc. If a standard system for adding and completing ID tags for digital
objects could be developed how useful do you think this would be?

Very useful                                                                                                                       32.00   35.16
Useful                                                                                                                            37.00   40.66
Of limited use                                                                                                                    9.00    9.89
Not very useful                                                                                                                   1.00    1.10
Don't know                                                                                                                        12.00   13.19

Total replies:                                                                                                                    91.00
If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                                  9.00    10.34
Datasets                                                                                                                               3.00    3.45
Images                                                                                                                                 8.00    9.20
Learning objects                                                                                                                       3.00    3.45
Text documents                                                                                                                         3.00    3.45
Video                                                                                                                                  7.00    8.05
Relevant to all file types                                                                                                             54.00   62.07

Total replies:                                                                                                                         87.00

Please use the text box below to expand on your answer if necessary

Any automation of metadata construction would be helpful
anything that reduces user input would be a very good thing
Do all digital types provide ID tags and are they consistent?
Fedora captures information to describe each datastream during ingest.
How hard was it to get Dublin Core used (not agreed)? Just think how big a mountain you are suggesting.
It's potentially useful; it depends what else is being captured elsewhere.
Problem is, in a digital archive you need to be able to migrate an object into different formats over time, we would rather
metadata was separate from the object rather than embedded as then you can be sure it will survive whatever file format the
object is migrated into in the future
Simplistic tags will not be useful for all types, e.g. the current mp3 works with pop music i.e. songs that have artist, song title
etc. 'Classical' music would require more information that may be more difficult to integrate.
Such things already exist for (at least) audio, images, and video.
Think there could be problems with data as there would have to be agreement by commercial and open source software
suppliers on how to handle such tags when reading and exporting the data.
This should make use of existing schemas for recording information rather than create something additional.
This sounds interesting. I hadn't come across this before.
Use of ID tags is patchy at best for digital objects (other than mp3 etc)
Useful, but how can you even ask this? You can't compromise the authenticity of the object, but can see it might be useful for
some contexts. Still require metadata separate from object.
Would not use it in an archival context as it would change other people's file formats.

26. If former versions of digital objects could be stored by repositories with audit trails created to record changes
how useful do you think this would be?


Very useful                                                                                                                       24.00   26.67
Useful                                                                                                                            49.00   54.44
Of limited use                                                                                                                    11.00   12.22
Not very useful                                                                                                                   1.00    1.11
Don't know                                                                                                                        5.00    5.56

Total replies:                                                                                                                    90.00

If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                             0.00    0.00
Datasets                                                                                                                          0.00    0.00
Images                                                                                                                            1.00    1.52
Learning objects                                                                                                                  0.00    0.00
Text documents                                                                                                                    1.00    1.52
Video                                                                                                                             1.00    1.52
Relevant to all file types                                                                                                        63.00   95.45

Total replies:                                                                                                                    66.00

Please use the text box below to expand on your answer if necessary

Audit trails are important but archives would want to keep copies of the different versions anyway.
Could be useful: but I am not sure that we (and depositors) will always want previous versions kept? That would be a
question to ask!
Could not the taxonomy in itself provide sufficient information such that a separate audit trail is in effect unnecessary (for the
majority of users)?
Fedora keeps each version of a datastream along with the date and time of ingest and any other information associated with
each datastream.
I am torn on this - an audit trail is useful for rollback of changes, but might create a lot of complexity and data which is
unnecessary. Perhaps only worthwhile for significant changes, or significant resources?
Most content is not stored in its original/development location (which usually allow some sort of such functionality).
Probably need to record in metadata what the change represents.

This is what we hope to be able to do with DSpace at some point in the future. It is explicitly part of the new architectural design.
This is what we plan to do with repository created derivatives using the premis metadata standard.
Would be useful for some projects

27. How useful do you think performing file comparisons (i.e. comparing data between files to establish whether one
file is identical to another) is when identifying different versions of digital objects?

Very useful                                                                                                                          23.00   25.56
Useful                                                                                                                               38.00   42.22
Of limited use                                                                                                                       13.00   14.44
Not very useful                                                                                                                      4.00    4.44
Don't know                                                                                                                           12.00   13.33

Total replies:                                                                                                                       90.00

If you agree that this would be useful, do you think it would be of particular use when dealing with versioning issues
for any specific types of file?

Audio                                                                                                                                3.00    4.84
Datasets                                                                                                                             1.00    1.61
Images                                                                                                                               3.00    4.84
Learning objects                                                                                                                     1.00    1.61
Text documents                                                                                                                       4.00    6.45
Video                                                                                                                                3.00    4.84
Relevant to all file types                                                                                                           47.00   75.81

Total replies:                                                                                                                       62.00

Please use the text box below to expand on your answer if necessary

A file comparison is unlikely to assist you to identify the version number of the object.
But lots of implementation challenges
But only necessary if all other methods of version identification fail. File comparisons "on the fly" and performed as part of a
deduplication process at the discovery service, and transparent to the user, might be handy.
Could be useful to detecting duplicates. You'd need to know which was the more recent instance. It might be too blunt for
some types of material e.g. identical text could be presented in a completely different way.
I don't know whether automated file comparisons will catch all differences relating to audio files or music notation.
I find diff essential with source code and text documents. It might be hard to implement with audio and video but doubtless
would be just as useful.
I think this would be most useful where there is most likely to be disputes over the original of an object.
Is this something that would be meaningful to users?
It can be useful, but it will only match identical files, and will not tell you anything about files which are actually the same thing
presented in a different format. Should not be relied upon.
Some repositories (such as patents) need to show that the information has not been tampered with.
To me this seems the most important thing of all in order to 'save the time of the reader'
Useful for de-duping or finding copies in other repositories but not necessarily for an end-user
Useful tool for repository managers, cataloguers and researchers.
Useful, but few users have the skills to make reliable inferences.

Would be hard, particularly as new versions of the data might be supplied in different file formats and file content structures.
Would be nice if it was free too!
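
As an illustration of the kind of comparison the question describes, the sketch below checks whether two files are byte-for-byte identical by comparing SHA-256 digests from Python's standard hashlib module. The function names file_digest and same_bytes are hypothetical. As several respondents point out, a match of this kind only identifies exact copies: the same content saved in a different format, or with a single corrected character, will not match.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True only if the two files have identical digests (i.e. are exact copies)."""
    return file_digest(path_a) == file_digest(path_b)
```

A repository could store such a digest at ingest and use it for de-duplication, but it says nothing about which of two differing files is the more recent version.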

Section 7: Additional comments

28. Do you have any other suggestions for identifying versions of audio and video files, images, datasets, text
documents or learning objects in digital repositories?

As long as it's easy to use, well explained and supported by lots of repositories, I'll go with whatever is suggested.
Audio file versions can be complex, especially when considering archival material. More work on version identification would
be welcome.
Enabling users to add and share tags could be useful.
hash identifiers
I think the subversion model is pretty good. We have a system that uses it at the back end, but users do not seem to demand
a user interface
I've seen visual clues used - for example a green tick to indicate a peer-reviewed, published article (the implication being this
is safe to use for research) and also a rosette system...
if the metadata in a complex record aggregates several format types, how should this parent record be described?
It would be worth, if possible, getting version standards out to creators. I realise it's not easy, but if the metadata around the
objects is good to start with, it will make controlling versions much easier in the repository. Records Managers might be helpful in
this sort of work.
Keep it simple, otherwise it is doomed to fail. People understand dates. Version numbers, e.g. of software, only make limited
relative sense.
Maybe some interaction with the OAI-ORE work on describing complex objects might be useful? Most versioning probably
needs to be addressed by repository software developers; whatever framework is agreed needs to be simple and transparent
to the end user even if some fancy relationship stuff is going on in the background!
No
possibly 'special' issues surrounding music
see Version control article in RMS Bulletin
Sensible metadata and documentation from the depositor is key

The crux of the issue is establishing consensus on what any particular versioning data (numbers or vocabulary) mean. This
will be dependent on shared meaning of content/information types. Start there and limit the scope to a specific community.
The question on relationships probably covered this, but there is the need to work out how complex objects are managed,
where the complex object itself will have versions, and so will the constituent components.
this survey has a very narrow sense of what constitutes a digital repository. There's a lot going on in this area outside the
realm of 'library oriented digital repositories'.

29. Have you managed to resolve any problems you may have faced with versioning issues and if so how has this
been achieved?

As outlined earlier, we have used FRBR to give a framework for relating versions of the same thing together. This has been
useful but needs quite a bit of user education for successful end-user uptake.
Bought an electronic document management system for $$$$$$.$$
By establishing with both contributors and readers what the different steps are in the lifecycle of different types of content and
what they are called -- but only within a small, fairly homogenous community.
De-duplication is still carried out with sql queries.
Due to technical limitations with P drive, only naming conventions really useful.
For over 99% of publications it is not an issue. Only the final (or very-near final) version of an article is published, and that is
all that anyone wants. Very few publications or objects warrant forensic analysis.
For repository generated versions we will use a number in sequence and create relationship metadata using PREMIS. For
access versions (e.g. thumbnail, medium, high res) would use Fedora's object model which would contain these different
disseminations within the same object. For creator versions we will rely on descriptive metadata.
For research papers, we indicate explicitly if it's the author version of the work, but it's a very blunt approach and is not
captured in structured metadata.
Happy with what Fedora offers at the moment, though conscious we need more real life usage to clarify issues.

I wrote a couple of scripts that would link (automatically) two versions of a paper together using the qualified dc:relation fields
(versionOf I think it was) but that simply linked versions - giving a "see also" link rather than a "this is the version for you"...

Just about. We have a very simple taxonomy which was agreed by committee to cover the bases. It is not sophisticated, but it
is easy to use and gives the end user all the information they probably need. Is no good for archival quality, though, IMO.
Just numbering versions, and creating a DC relation ("is a version of", etc.)
Manual control of new versions. Perhaps inefficient time-wise but with a high quality audit trail to previous versions.
No examples
No, I think adding a textual description as I do, is too variable though better than nothing
Not with audio. Images are given automated version filenames but this is very limited.

not yet, but I'd like to use relationships more. Also, I think that current repository software blurs this in its user interfaces - I
think a lot could be solved with better user education from creation and software interfaces that ask the right questions.
Pre-release versions are assigned names (and sometimes numbers as well if necessary) such as XXXv3 alpha 1,
XXXv3beta 1, XXXv3beta 2, etc.
See previous answers
simply using file name with increasing numbers (file ver 1, file ver 2, etc.); also used specific dates from which the latest can be
chosen.
Speaking to our depositors - often we find they sent us extra versions in error and we can just delete them, meaning we do
not have a problem any more!
Still struggling but the lack of data (and repositories) in our area means that users are happy just to get data. Concerns about
versions will rise as the idea of sharing and reuse matures.

We make it clear in the metadata if we have archived an author's postprint, and make it clear where the published version is.
We have archived one piece of software, and replaced the old version with a newer one, at the same persistent URL.
We only add one version of a file, the most 'complete' or 'finished' version we can obtain eg publisher PDF in preference to
author's postprint if possible
We use v1, v2, v3 etc and our users seem to understand this. We have considered a branching version structure to indicate
author and publisher updates but I think that would likely cause too much user confusion.

30. Are there any specific issues regarding version identification that you would like to see addressed and which
have not been covered in the questions above?
Adoption and application of a standard ISO date format, e.g. 2007-09-25 or 20070925, would solve most issues and
problems with version identification. The solutions are largely cultural, procedural and human, not technical, just as with the
problems of metadata interoperability in general.
Apologies if I ramble a bit in this box! I do wonder a bit about the use of the word 'version' and exactly what you mean by a
version. I think the survey covers different types, but doesn't necessarily say so - I'm sure you are aware of this. For example,
with a text document, there might be different versions (or revisions) which are significantly textually different and it's
important to make these revisions explicit to users; but then there are 'versions' which have minor typographical revisions -
you might want to timestamp these somehow but not make all of these available to the user. An image might be exactly the
same pictorially but different file sizes might be considered versions.

As will be apparent from the foregoing answers I am not closely involved in digital repository developments but I do have
certain responsibilities for copyright licences and hence concerns about the way copyright issues may restrict the sensible
development of digitised approaches to L&T and Research (development of repositories and VLEs). It may be desirable to
'be seen to' involve the UUK Copyright Negotiating Group in any national developments in relation to repositories.
Comparison of files using a file comparison tool to establish what the differences are between versions and presenting these
to the reader.
I think it would be good to be able to stamp metadata and version info into downloads (metadata in PDFs, filenames)
I think the language in this survey may cause confusion and that you need to distinguish very carefully between
developmental versions like 0.1, 0.2... and versions like author's final, published...
I think there will be slightly different issues involved with archival material; there might be problems making certain all the
versions in an ERMS are transferred to the repository.
I would like to see full version identification for each type of material discussed.
It's fine
No, you seem to have picked up on the main issues

One problem I see is the difficulty in persuading academics to identify versions at the point they are either submitted to
repositories or appear on their web sites. Also the issue of the version publishers allow to be deposited in repositories - I
believe this is not always the final PDF. How will you track revisions that academics add to their papers on their web sites?
Probably! I'll tell you when they come up.
The difference between a version and a revision.
The relationship between raw/processed/derived data - this is a tricky one!
This all seems very comprehensive.
Tools which facilitate version comparison.
What about user identity / ownership of versions? It might be useful to know who made what edit, and why - although there
may be privacy issues etc. Maybe interesting to explore.
When data or learning objects are created by merging (parts of) other already existing data or learning objects.
yes - how to deal with minor corrections (eg typos) and major corrections (eg a wrong answer); minor revisions (eg an update
to an image file in a learning object), major revisions (eg the re-writing of an item to change the context)
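
One respondent to this question proposes a standard ISO date format (e.g. 2007-09-25) as the main version identifier. A minimal sketch of why that can work in practice, assuming each version carries a date in the hyphenated ISO form (the file names and dates below are hypothetical):

```python
from datetime import date


def latest_version(versions: dict[str, str]) -> str:
    """Return the name of the version with the most recent ISO date."""
    return max(versions, key=lambda name: date.fromisoformat(versions[name]))


# Hypothetical versions of one item, each labelled with an ISO 8601 date.
versions = {
    "report_a.pdf": "2007-09-25",
    "report_b.pdf": "2007-11-02",
    "report_c.pdf": "2006-12-31",
}
latest = latest_version(versions)

# ISO dates in this form also sort correctly as plain text, so no date
# parsing is needed to order versions by file name or metadata string.
text_order = sorted(versions.values())
date_order = [d.isoformat() for d in sorted(map(date.fromisoformat, versions.values()))]
```

Dates alone do not distinguish a minor correction from a major revision, which is the gap the final comment above raises.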

				