Oregon State University (OSU) CONTENTdm Controlled Vocabulary Analyzer
Modified: April 26, 2004
Terry Reese
541-737-6384
firstname.lastname@example.org

Background:
The Controlled Vocabulary Analyzer came about out of a need for an authority control mechanism in our CONTENTdm collections. One of the difficulties with CONTENTdm is maintaining a single master vocabulary list for specific collections, particularly when multiple users are creating and adding metadata. The Controlled Vocabulary Analyzer allows me to quickly generate a series of reports that show which metadata terms are currently set up as controlled vocabulary, and for which project, as well as which metadata terms are actually in use within a project or projects.

The script can be run in two modes: list mode and report mode. In list mode, the script generates two files. The first file is a master list of all available controlled vocabulary terms for a particular metadata element. The second file is a master list of all controlled vocabulary terms actually used within the project for a particular metadata element. The second mode, report mode, generates 9 separate reports covering both the controlled vocabulary lists and the actual usage of metadata elements.

Video Examples:
Analyzing Selected Collections -- Size: 880 K
Analyzing all collections -- Size: 398 K

Potential Uses:
Maintaining a master list of controlled vocabulary elements for particular Dublin Core fields. This is probably most useful for Creator, Contributor, Coverage.Spatial and Subject.
Identifying conflicting terms or names between projects (which would affect cross-project searching) or within an individual project.

System Requirements:
PERL

Arguments:
server: Name of the CONTENTdm server that you want to query.
projects: (OPTIONAL) Limits the analyzer to a specific list of projects.
The projects argument should be constructed as a colon-delimited list, e.g., dna:archives:streamsurvey. This argument is checked against the server's catalog file, and so long as the collection is present, it will be analyzed.
dc: Name of the Dublin Core element that you wish to query. The element can be passed in CONTENTdm shorthand or in Dublin Core notation, including the dc: or dcterms: namespace (example: dcterms:spatial).
nicks: This argument specifies a project and a CONTENTdm nickname. If the nicks value is present, the data in projects and dc are ignored.
debug: Set to 1 to turn on report mode. Set to 0 (or omit) to activate list mode.

Generated Files (List Mode):
Master Controlled Vocab. list: This is a complete list of all controlled vocabulary terms from all projects on the designated server.
Master Controlled Vocab. list for actual in use items: This is a complete list of all controlled vocabulary terms, for all projects on the designated server, that are currently used in a project's metadata.

Generated Files (Report Mode):
Tab Delimited Term/Count list: A tab-delimited list noting each defined controlled vocabulary term, as well as the number of times the term has been defined within one's CONTENTdm projects.
Tab Delimited Cross Term usage list: A tab-delimited list that notes each controlled vocabulary term and the projects in which the term is defined. This file only includes terms that are defined in multiple projects. Note that a term being defined as a controlled vocabulary element does not mean that it has actually been used within the project.
HTML Info list -- Displays Term and Projects: This is an HTML-generated table that includes the same information as found in the Tab Delimited Term/Project list. The file notes which terms are defined in which projects.
Controlled Vocabulary Usage Table: This is an HTML report that notes whether a project uses a controlled vocabulary for a particular Dublin Core element, and the number of elements that use, or do not use, a controlled vocabulary.
Tab Delimited Term/Project list: This is a tab-delimited list that notes which terms are defined in which projects.
In Use Tab Delimited Term/Count List: The terms in this list are terms that are actually in use (that is, they appear in the metadata of a project). The count is the number of projects in which the term has been used, not the frequency of use within a project.
In use Tab Delimited Cross Term usage list: This is a tab-delimited list that notes cross-term usage between collections. For example, if you have used Rivers within the subject element of your metadata in Project1 and in Project2, but in Project3 you have only defined Rivers as an available controlled vocabulary term, then the term Rivers would be output along with Project1 and Project2.
In use HTML Info List -- Displays Term and Projects: Displays the same information as found in the In Use Tab Delimited Term/Project list. It is an HTML representation of a report that notes each controlled vocabulary term currently in use and the projects in which the term appears (i.e., is used within the project metadata).
In Use Tab Delimited Term/Project list: This is a tab-delimited representation of a report that notes each controlled vocabulary term currently in use and the projects in which the term appears (i.e., is used within the project metadata).
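
To make the colon-delimited projects argument and the "in use" reports more concrete, here is a minimal sketch of the logic described above. It is written in Python for brevity and is not the analyzer's own Perl code. The function names, the input dictionary and the output layout are all assumptions made for illustration, as is the choice to list only terms used in more than one project in the cross-term report.

```python
from collections import defaultdict


def parse_projects(arg):
    """Split a colon-delimited projects argument, e.g. 'dna:archives:streamsurvey'."""
    return [name for name in arg.split(":") if name]


def projects_by_term(used_terms):
    """Map each term to the sorted list of projects whose metadata uses it.

    used_terms is a hypothetical structure: project name -> terms found in
    that project's metadata. Repeated terms within a project are collapsed.
    """
    index = defaultdict(set)
    for project, terms in used_terms.items():
        for term in terms:
            index[term].add(project)
    return {term: sorted(projects) for term, projects in index.items()}


def in_use_term_counts(used_terms):
    """Count the number of projects using each term (not frequency within a project)."""
    return {term: len(projects)
            for term, projects in projects_by_term(used_terms).items()}


def in_use_cross_terms(used_terms):
    """Keep only terms used in more than one project (assumed report behavior)."""
    return {term: projects
            for term, projects in projects_by_term(used_terms).items()
            if len(projects) > 1}


# Rivers is used in Project1 and Project2; Project3 only defines it,
# so Project3 contributes nothing to the in-use reports.
used = {
    "Project1": ["Rivers", "Bridges", "Rivers"],
    "Project2": ["Rivers"],
    "Project3": [],
}

for term, projects in sorted(in_use_cross_terms(used).items()):
    # One tab-delimited line per term: the term, then the projects that use it.
    print(term + "\t" + "\t".join(projects))
```

With the sample data, only Rivers is reported as a cross-project term, alongside Project1 and Project2. Bridges is counted once in the term/count data but does not appear in the cross-term output.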