UNC RM Presentations - Data Warehousing: The Use of DSpace to Create Institutional Repositories for Universities

Harry Keiner, Ph.D., CA
University Archivist & Director of Records Management
Appalachian State University

September 14, 2004

Introduction:

Good afternoon. Thank you for the opportunity to present for discussion an issue which, I believe, is an important concern for all of us with some measure of responsibility for the preservation of records created in various digital formats. My goals are to provide you with an introduction to digital Institutional Repositories; to explain the relationship between data warehousing and institutional repositories; to describe how a software product, DSpace, has become one of the driving forces behind the development of institutional repositories; and finally to present some ideas on the importance of institutional repositories to archivists and records managers and the opportunities that I see for implementation of an IR strategy based on DSpace.

Definitions:



I want to begin with some definitions as a way of introducing you to some of the terminology I will be using in this presentation. These definitions will also serve to narrow the focus of my remarks, as it is my intention not to present solutions to all our problems with electronic records, but rather to focus on one solution for a certain class of records that we have traditionally been asked to manage and preserve.

First we must deal with the concept of data warehousing. This term has its origins in the IT units of major US corporations, government departments, and large non-profit institutions. At the heart of these entities are massive databases of information that allow their employees to conduct day-to-day transactions with customers and clients, to translate these transactions into accounts payable and accounts receivable, to provide data for the research and development of new products and services, and to provide managers with reports to track current operations and predict future business trends. As these databases have grown, mutated, migrated, and been reborn into new information resources, IT managers have recognized that data is an institutional asset that must be carefully managed. Data warehousing is the solution to this problem: the creation of secure electronic storage systems where databases and the data sets that underlie them can be maintained, protected, and preserved.

The IT managers at universities within the UNC system are dealing with this issue or will have to deal with it soon. Already many campuses have established Student Information Systems to manage data related to an array of student activities from admission, through attendance, to graduation and the transition from student to alumnus. And, with the increasing development of Banner applications in areas such as payroll and accounting, the need to think about establishing formal programs for data warehousing on all campuses will increase. These developments, of course, have major implications for university records management programs, and we can only hope that the IT managers and campus administrators will save a place at the table for their records managers when the issues of data warehousing are discussed.

In the university setting, Institutional Repositories may be seen as a subset of data warehouses. Their primary purpose is to provide a virtual record center where the intellectual output of the university in various digital formats may be deposited, stored, indexed, and preserved for use by the university community and the scholarly world beyond. This intellectual output begins with the work product of faculty members, and can include journal articles, presentations at conferences, research notes in written and statistical formats, course syllabi, and bibliographies. Student work is another type of output, in the form of papers, presentations, theses, exhibitions, and audio-visual productions. A third category of output includes administrative reports, publications, photographs, catalogs, video productions, and press releases produced and distributed electronically.

The most important tool that has been developed to manage Institutional Repositories is DSpace, a software product that grew out of a collaboration between MIT and Hewlett-Packard, beginning in 2001.

DSpace-Purpose:

DSpace was developed to address two issues. First, it provides an Institutional Repository that captures, preserves, and communicates the intellectual output of a university’s faculty, researchers, students, and administrators. Its design allows the management and preservation of digital material in a variety of formats. Second, DSpace was designed from the outset as an “open source” system that would be made available at no cost to other institutions. The goal is to extend the benefits of DSpace technology to a federation of Institutional Repositories and thereby make available the collective intellectual resources of research institutions around the world. It is hoped, then, that the adoption of DSpace by other institutions will address a number of problems that currently impede scholarly communication.

• By adopting the open source model, DSpace will be developed into a robust and sustainable system as useful developments by the federation of Institutional Repositories will be freely shared among all members.

• By creating a federation of like-minded members, DSpace will have an important voice in the development of standards for describing digital objects, while influencing development in the commercial sector that might wish to make use of digital assets within the federation’s Institutional Repositories.

• Federation will also positively influence the development of new solutions to digital collection management, preservation, and scholarly communication by supporting interoperation among institutions running DSpace. One area of great promise is the development of a new model of scholarly communication that addresses the growing dysfunction of the current system, which is caused by the high cost of scholarly publishing. By bringing scholars together through the DSpace federation, it is hoped that new publishing ventures such as faculty-based journals or e-communities will result.

Characteristics of a DSpace Repository:

An Institutional Repository built on DSpace requires the organization of Communities within the university to control the contribution of content. These communities might be departments, labs, research centers, or administrative offices. Once formed, the community would choose a community administrator, whose job would be to oversee the submission of defined types of content, paying particular attention to seeing that it is correctly tagged using a metadata scheme based on Dublin Core. The content could include:
• Documents, such as articles, preprints, working papers, field notes, technical reports, conference papers
• Books
• Theses
• Data sets
• Computer programs
• Visualizations, simulations, and other models
• Publications
• Administrative records
• Bibliographies
• Images
• Audio files
• Video files
• Digital library collections
• Web pages
• Course curriculums and syllabi
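
As an illustration of the tagging check a community administrator might perform, the following Python sketch tests a submission's metadata against the fifteen elements of the simple Dublin Core element set. It is only a sketch under stated assumptions: the list of required elements is a hypothetical local policy, the function name `check_record` and the sample record are invented for illustration, and none of this is DSpace's own code.

```python
# The fifteen elements of the simple Dublin Core element set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical local policy: elements this community insists on.
REQUIRED_ELEMENTS = {"title", "creator", "date", "type", "format"}


def check_record(record):
    """Return a list of problems in a metadata record (empty if none)."""
    problems = []
    # Flag any field that is not a Dublin Core element.
    for name in sorted(set(record) - DUBLIN_CORE_ELEMENTS):
        problems.append(f"unknown element: {name}")
    # Flag required elements that are absent or blank.
    for name in sorted(REQUIRED_ELEMENTS):
        if not str(record.get(name, "")).strip():
            problems.append(f"missing required element: {name}")
    return problems


# A made-up submission, for illustration only.
submission = {
    "title": "Field Notes, Summer Excavation",
    "creator": "Doe, Jane",
    "date": "2004-07-15",
    "type": "Text",
    "format": "application/pdf",
}
print(check_record(submission))
```

A record that passes yields an empty list; anything else gives the administrator a list of items to send back to the contributor.
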
Within the Institutional Repository, DSpace provides three levels of digital preservation, determined by file format and by the ongoing software support and development available for that format. No one can control which formats community members will use to produce content. Instead, the administrator of the Institutional Repository assigns each format to one of three levels of preservation that DSpace will provide:
• Supported formats will be functionally preserved using either format migration or emulation techniques. These “open” formats currently include TIFF, XML and PDF.

• Known formats are those that are proprietary but which are so popular today that migration tools will likely be available in the future. These formats include Microsoft Word and PowerPoint, and WordPerfect.

• Unsupported formats are those which are highly specialized or custom designed by the content provider. Content submitted in these formats will be preserved at the bit level; that is, the bit streams will be maintained without change, but it will be up to the content provider or successors in the future to develop tools to read the content of files or tables.
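
The three levels above amount to a lookup from file format to preservation commitment. The Python sketch below illustrates the idea. It is not DSpace's own code: the file extensions are assumptions chosen to match the formats named above, and the function name `preservation_level` is hypothetical.

```python
import os

# Extensions assumed for the formats named in the talk.
SUPPORTED = {".tif", ".tiff", ".xml", ".pdf"}   # open formats
KNOWN = {".doc", ".ppt", ".wpd"}                # Word, PowerPoint, WordPerfect


def preservation_level(filename):
    """Classify a file by its extension into one of three levels."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in SUPPORTED:
        return "supported"      # functional preservation (migration or emulation)
    if ext in KNOWN:
        return "known"          # migration tools likely to become available
    return "unsupported"        # bit-level preservation only


for name in ["thesis.PDF", "slides.ppt", "survey.dat"]:
    print(name, "->", preservation_level(name))
```

Anything not explicitly listed falls through to the unsupported level, which mirrors the policy that unrecognized content is still kept, but only as unchanged bit streams.
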

University Libraries are the natural home for a DSpace Institutional Repository. Traditionally, these libraries have served as the central repository of knowledge for those supporting the University’s teaching, research, and related activities. Their collections of print and non-print materials represent the intellectual output of 3000 years of human history; and they also provide custodial care over many collections of research materials produced by the university’s members in the form of theses, dissertations, and publications. Moreover, because these libraries are often the home of university archives and records management programs, they hold unpublished manuscripts, images, and audio-visual files that document the life and work of the university in “hard copy.” As the traditional university library transforms itself into an information commons by providing its members with access to published electronic resources beyond its walls, it is imperative that the library host an Institutional Repository that will preserve, in the electronic formats chosen by its members, the result of their investigations, studies, and research, results for which there will most likely be no print alternative.

Having the University Library take on the role of supplying the digital Institutional Repository has advantages for both the library and the communities supplying content.

• By setting standards and maintaining the content submitted, the Library retains its primacy as the academic community’s central repository.

• By forming communities and submitting content with the attached metadata, faculty, students, and administrators are guaranteed the preservation of their work and continued access both within and without the academy’s walls.

To pay for the Institutional Repository, a system of chargebacks based on the number of gigabytes stored by each community would seem to provide a simple solution.
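
A storage-based chargeback of this kind reduces to simple arithmetic: each community's stored gigabytes multiplied by a rate. The Python sketch below shows one way it might be computed. The rate, the community names, the storage figures, and the function name `monthly_chargebacks` are all hypothetical, chosen only to illustrate the calculation.

```python
GB = 1024 ** 3  # bytes per gigabyte (binary convention, an assumption)


def monthly_chargebacks(bytes_by_community, rate_per_gb):
    """Return each community's charge: gigabytes stored times the rate."""
    return {
        community: round(stored_bytes / GB * rate_per_gb, 2)
        for community, stored_bytes in bytes_by_community.items()
    }


# Made-up storage totals and a made-up rate of 2.00 per gigabyte.
usage = {"History": 50 * GB, "Biology": 125 * GB}
print(monthly_chargebacks(usage, rate_per_gb=2.00))
```
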

DSpace and University Archivists and Records Managers:

University Archivists and Records Managers would benefit from contributing their time to advocating, establishing, and administering an Institutional Repository.
• A DSpace Institutional Repository provides archivists and records managers with a powerful tool to help solve the problem of documenting the work of faculty and students in the digital age. By appraising the contents of the repository and scheduling certain electronic files for retention, or for the transfer of copies to the Archives, a system can be set up on sound Archival and RM principles that replicates the disposition of records in paper formats.

• By organizing an Administrative Community to deposit electronic records, the archivist can guarantee the continuity of certain important records series: annual reports, fact books, digital photographs, the general and graduate catalogs, e-newsletters, and the University’s web site, which exist increasingly only in digital formats.

Institutional Repositories and the University of North Carolina:

It seems obvious to me that Institutional Repositories, whether based on DSpace or some other software, are on the near horizon or already in our midst. Our large, private neighbor, Duke University, is already using DSpace on a trial basis. And UNC Chapel Hill is studying its potential; indeed, a team of SILS students recently developed a DSpace application as a project for a graduate seminar.

There seem to me to be three avenues for the development of Institutional Repositories within the University system.

• First, the Board of Governors might provide system-wide sponsorship for such an effort, offering the opportunity to set up an Institutional Repository to each University Librarian. That is, the UNC system would bear the cost of development, implementation, and training, and develop common metadata standards to guarantee interoperability across all 16 campuses. UNC, in a sense, would become its own DSpace federation. I think the outcomes of such an initiative would be of great value. For example, think of the benefits to the state’s archaeologists to be able to share field notes, reports of excavations, and artifact data and images so that students and faculty could participate as virtual teams in uncovering and analyzing the state’s archaeological resources.

• Second, if the Board of Governors does not step up to the plate, then perhaps groups of university campuses will band together and jointly develop a DSpace Institutional Repository. For example, Western Carolina, UNC Asheville, and Appalachian State, already joined together in the Western North Carolina Library Network, might opt for such a solution as a way to control costs.

• Third, DSpace may be developed independently by the system’s two flagship research institutions, UNC Chapel Hill and NC State, leaving the other 14 campuses to struggle to do their own thing. My guess is that some forms of Institutional Repositories would emerge, but without the coordination and strength of more unified efforts.

So, in conclusion, who knows how this story will be written? Thank you.


 
Joyner Library, East Carolina University
East Fifth Street | Greenville, NC 27858-4353 USA
© 2008 | terms of use | Last Updated: 04.18.2006