Table of contents
- First things first – a digital object
- Digital library – an overview
- Models, frameworks and software for digital libraries
- Interoperability of digital libraries
- Searching digital libraries
- Examples of popular digital libraries
First things first – a digital object
Introduction
A library provides its users with various books, newspapers, journals, etc. A user can borrow a book or read it in the reading room. Analogously, a digital library serves its users by providing them with various digital objects. A user can download a digital object for immediate or later use, or view it instantly in a web browser. Although this comparison is simplistic and does not give detailed information, it aims to give an overall sense of how a digital object should be understood. The basic conclusion that follows from this comparison, also stated in the article "A framework for distributed digital object services" by Robert Kahn and Robert Wilensky, published in 2006 in the International Journal on Digital Libraries, is that digital libraries are built of digital objects [source]. This is crucial for any further study of digital objects and the elements that compose them.
A digital object can represent a book, a newspaper issue, or even a multi-volume work such as an encyclopedia or a whole newspaper. The basic idea behind the digital object is the unique identification of digital data. This means that a digital object is defined by its data and metadata, where the metadata must include a unique identifier of the digital object. While the metadata (including the unique identifier) enables discovery of the digital object, the data (also referred to as content) holds its intellectual value. In more detail, the following elements of a digital object can be distinguished:
- Unique identifier
- Metadata
- Content
Although the unique identifier can be understood as a part of the metadata, it is worth listing it explicitly in order to underline its importance.
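The three elements listed above can be captured in a small data structure. The sketch below is a hypothetical Python model, not part of any standard or library: the class name `DigitalObject` and its field names are assumptions, metadata is kept as a simple mapping from element names to values, and content maps a format (MIME type) to a file location, since the same content may exist in several formats.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical model of a digital object: identifier + metadata + content."""

    # Unique, persistent identifier (kept separate to underline its role).
    identifier: str
    # Metadata: element name -> list of values (elements may repeat).
    metadata: dict[str, list[str]] = field(default_factory=dict)
    # Content: format (MIME type) -> location of the file holding that representation.
    content: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # An object without an identifier cannot be referenced or discovered.
        if not self.identifier.strip():
            raise ValueError("a digital object needs a non-empty unique identifier")


# Illustrative values only; the file paths are hypothetical.
book = DigitalObject(
    identifier="oai:www.wbc.poznan.pl:21323",
    metadata={"title": ["Example title"], "creator": ["Example author"]},
    content={"application/pdf": "files/21323.pdf", "image/tiff": "master/21323.tif"},
)
```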
Identifiers
In order to use a unique identifier to resolve a digital object, it is necessary to define a standard way of interpreting the identifier. There are various standards related to the identification of digital objects. On the Internet, the most frequently used identifiers are URIs (Uniform Resource Identifiers). Another example, this time related to books, is the ISBN (International Standard Book Number). In the world of digital objects, there are several standards, such as the Digital Object Identifier System (DOI® System) or the OAI identifier format.
The DOI® System, maintained by the International DOI Foundation, is defined as a system for identifying content objects in the digital environment. DOI® names are assigned to any entity for use on digital networks. They are used to provide current information, including where the entities (or information about them) can be found on the Internet. Information about a digital object may change over time, including where to find it, but its DOI name will not change [source]. To register a new digital object in the DOI® System (that is, to assign a DOI name to it), it is generally required to pay a fee to a so-called registration agency. This fee covers the guarantee that the assigned DOI name will persist over time. An example identifier is: 10.1000/182.
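A DOI name consists of a prefix, which starts with "10." followed by a registrant code, and a suffix chosen by the registrant, separated by the first "/". The sketch below splits a DOI name and builds a link through the public https://doi.org/ resolver. The function names are hypothetical, and the validation is deliberately loose: it does not implement the full DOI syntax rules.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix).

    Only the first "/" separates the parts; the suffix may contain more slashes.
    """
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or len(prefix) <= 3 or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build a link through the doi.org proxy resolver."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(f"{prefix}/{suffix}", safe="/")


print(split_doi("10.1000/182"))   # ('10.1000', '182')
print(doi_to_url("10.1000/182"))  # https://doi.org/10.1000/182
```

Because resolution goes through the resolver rather than a fixed server address, the link keeps working when the object moves, which is exactly the persistence the fee is meant to guarantee.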
The OAI identifier format is closely related to the well-known public protocol for metadata harvesting (OAI-PMH), which is widely used by digital libraries. The OAI identifier format is intended to provide persistent resource identifiers for items in repositories that implement OAI-PMH [source]. There are no fees for registering new identifiers, but the institution that assigns new identifiers has to maintain them itself. An example identifier is: oai:www.wbc.poznan.pl:21323.
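An OAI identifier has three parts separated by colons: the scheme `oai`, a namespace identifier (a domain name belonging to the repository) and a local identifier assigned by the repository. The following sketch parses such identifiers with a regular expression modelled on the syntax given in the OAI identifier format specification; the function name is hypothetical. Since the local identifier may itself contain colons, only the first two colons act as separators.

```python
import re

# oai:<namespace-identifier>:<local-identifier>
OAI_ID = re.compile(
    r"^oai:"
    r"(?P<namespace>[a-zA-Z][a-zA-Z0-9\-]*(?:\.[a-zA-Z][a-zA-Z0-9\-]*)+)"  # domain name
    r":(?P<local>[a-zA-Z0-9\-_.!~*'();/?:@&=+$,%]+)$"                      # repository-assigned
)


def parse_oai_identifier(identifier: str) -> tuple[str, str]:
    """Return (namespace identifier, local identifier) or raise ValueError."""
    match = OAI_ID.match(identifier)
    if match is None:
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return match.group("namespace"), match.group("local")


print(parse_oai_identifier("oai:www.wbc.poznan.pl:21323"))
# ('www.wbc.poznan.pl', '21323')
```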
Metadata
The metadata of a digital object may describe its various aspects. The Library of Congress, in its early research on metadata and later when defining the METS standard, distinguished three basic types of metadata:
- Administrative metadata describe aspects related to preservation and management of the digital object in the digital library.
- Structural metadata describe the way a digital object is structured. This kind of metadata helps a digital library to appropriately store and present digital objects.
- Descriptive metadata are used for the discovery of digital objects. This kind of metadata is usually prepared so that it can be easily read and understood by users. Online catalogues are examples of Internet/intranet systems that provide descriptive metadata to library users.
Having an overall view of metadata, it is now possible to look more closely at the standards and formats that encode these types of metadata in a standard and strictly defined way. Probably the best-known descriptive metadata standard is MARC (Machine-Readable Cataloging), which is dedicated to bibliographic information. There are also various other metadata standards, such as Dublin Core, METS, MODS, MPEG-7, EAD, CDWA, TEI and many more. From the perspective of digital libraries, at least two are worth mentioning: METS and Dublin Core.
The XML-based METS standard defines a way of encoding and transmitting administrative, structural and descriptive metadata. The standard is quite extensive, but in return it provides means for encoding all of these kinds of metadata. One of its strengths is the ability to embed other standards, which makes it flexible enough to be applied in many different situations.
In the world of digital libraries (and beyond), the Dublin Core Metadata Element Set (DCMES) is a very popular standard for encoding descriptive metadata. Its main strength is simplicity, which makes it easy to apply in Internet systems such as digital libraries. DCMES is a set of fifteen core properties for describing a digital object, published in the following standards:
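To make the idea of embedding other standards concrete, here is a minimal sketch that builds a METS document with Python's standard `xml.etree.ElementTree`. It assumes a single PDF access file at a hypothetical URL and embeds one Dublin Core element inside the METS descriptive metadata section (`dmdSec`, `mdWrap`, `xmlData`). Administrative metadata (`amdSec`) and the METS header are left out for brevity, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def minimal_mets(title: str, file_url: str) -> ET.Element:
    root = ET.Element(q(METS_NS, "mets"))

    # Descriptive metadata: a Dublin Core element embedded via mdWrap/xmlData.
    dmd = ET.SubElement(root, q(METS_NS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(xml_data, q(DC_NS, "title")).text = title

    # File section: one access file in a file group.
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), USE="access")
    file_el = ET.SubElement(group, q(METS_NS, "file"), ID="FILE1", MIMETYPE="application/pdf")
    ET.SubElement(file_el, q(METS_NS, "FLocat"),
                  {"LOCTYPE": "URL", q(XLINK_NS, "href"): file_url})

    # Structural map: the logical unit points to its metadata and its file.
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"), TYPE="book", DMDID="DMD1")
    ET.SubElement(div, q(METS_NS, "fptr"), FILEID="FILE1")
    return root


mets = minimal_mets("Example title", "http://example.org/files/book.pdf")
print(ET.tostring(mets, encoding="unicode"))
```

The `structMap` is what ties the three kinds of metadata together: the `div` refers to descriptive metadata through `DMDID` and to content files through `fptr`.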
- ISO Standard 15836-2003 of February 2003 [ISO15836]
- ANSI/NISO Standard Z39.85-2007 of May 2007 [NISOZ3985]
- IETF RFC 5013 of August 2007 [RFC5013]
An excerpt from the above standards can be found on the Dublin Core Metadata Initiative web page at http://dublincore.org/documents/dces/. Please take a few minutes to get familiar with this standard (available in English only). Since January 2008, following Semantic Web principles, DCMI has extended the properties defined in DCMES with formally specified domains and ranges. This resulted in a new document, "DCMI Metadata Terms", DCTERMS for short. Currently, the use of the semantically more precise DCTERMS is encouraged.
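The simplicity of DCMES shows in code: a record is just a flat set of optional, repeatable elements drawn from a fixed list of fifteen names. The sketch below builds such a record with `xml.etree.ElementTree`, wrapping it in a container element modelled on the `oai_dc` format used with OAI-PMH. The function name and sample values are hypothetical, and the check only rejects names outside the fifteen DCMES elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DCMES = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def dc_record(values: dict[str, list[str]]) -> ET.Element:
    """Build a simple DC description; every element is optional and repeatable."""
    unknown = set(values) - DCMES
    if unknown:
        raise ValueError(f"not DCMES elements: {sorted(unknown)}")
    record = ET.Element(f"{{{OAI_DC_NS}}}dc")  # oai_dc-style container
    for name, items in values.items():
        for item in items:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = item
    return record


record = dc_record({
    "title": ["Example title"],
    "creator": ["Example author"],
    "identifier": ["oai:www.wbc.poznan.pl:21323"],
})
print(ET.tostring(record, encoding="unicode"))
```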
Content
The content of a digital object is its actual intellectual information and value. To present the content of a digital object, one of the available formats has to be used, such as a text file, PDF or MP3. The content is usually presented in one format, but it is not limited to one: the same content can be provided in several formats, depending on the purpose of the presentation, e.g. a mobile device vs. a regular PC vs. long-term preservation. For example, a text file can be used on a mobile device and an image on a PC, while long-term preservation requires a high-quality image.
An important distinction within the content of a digital object is between master files and access files. Master files are the original files created during digitisation, e.g. scanning. These are large, high-quality files created for long-term preservation. Access files are those presented on the digital library web page and intended for users. Access files are usually derived from master files by decreasing their quality, and therefore their size, for smoother user interaction with the digital library system. Various image enhancement transformations can be applied when creating access files. There are multiple guidelines on digitisation and on formats for master and access files, such as the Digital Imaging Guidelines for Hudson River Valley Heritage or the Technical Guidelines for Digitizing Cultural Heritage Materials from the Library of Congress.
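The size difference between master and access files follows from simple arithmetic: an uncompressed raster image takes width × height × bit depth bits. The sketch below assumes an A4 page (about 8.27 × 11.69 inches) and 24-bit colour, and uses 400 ppi for the master and 150 ppi for the access copy purely as illustrative values; the guidelines mentioned above should be consulted for actual recommendations.

```python
def uncompressed_size_mb(width_in: float, height_in: float,
                         ppi: int, bits_per_pixel: int) -> float:
    """Size of an uncompressed raster image in megabytes (10**6 bytes)."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


# Illustrative values: A4 page, 24-bit colour, 400 ppi master vs 150 ppi access copy.
A4_WIDTH_IN, A4_HEIGHT_IN = 8.27, 11.69
master = uncompressed_size_mb(A4_WIDTH_IN, A4_HEIGHT_IN, ppi=400, bits_per_pixel=24)
access = uncompressed_size_mb(A4_WIDTH_IN, A4_HEIGHT_IN, ppi=150, bits_per_pixel=24)
print(f"master: {master:.1f} MB, access: {access:.1f} MB, ratio: {master / access:.1f}")
```

With these assumptions the master comes to roughly 46 MB before compression, about seven times the access copy (the ratio is (400/150)² ≈ 7.1). Access files are usually also stored in a lossy format such as JPEG, which shrinks them further.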
Digital library – an overview
Digital objects make up a digital library, but a digital library consists of more than digital objects. Besides digital objects, there are people, equipment, software systems, procedures, communities, etc. A good working definition of a digital library can be found on the Digital Library Federation portal [source]:
Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.
The most important point is that a digital library is neither a software system nor a collection of objects. Nor is it a web portal that merely serves digital content. Digital libraries are organisations that maintain and provide all of the resources necessary to present and preserve digital objects over time for future generations. From the definition above it can be concluded that a digital library needs to address the following points:
- Specialised staff for activities related to building and maintaining a digital library.
- Availability of digital objects for users over time.
- End users that are the focus group for a digital library.
Specialised staff is the key to success. Knowledge itself, but also the ability to acquire new knowledge, are the crucial factors for a good team developing a digital library. The staff is responsible not only for day-to-day duties related to the production of digital objects, but also for defining procedures, standards and rules, so that each part of the digital library production process is consistent and explicitly defined.
A very important issue in the context of a digital library is the long-term preservation of digital objects, which the definition above refers to as ensuring the persistence of collections over time. There are many initiatives and projects related to long-term preservation, including Digital Preservation Europe, the PLANETS project, the Digital Library Federation and the National Digital Information Infrastructure and Preservation Program of the Library of Congress. Basically, there are two important aspects of long-term preservation: accessibility and readability. Accessibility means keeping the digital content available and ready to access over a long period of time, which in practice means indefinitely. Readability is an additional requirement: the digital content must not only remain accessible, but also stay readable for interested users in the future. These aspects have to be considered because technology and software change over time, which can eventually lead to data loss. Few people today remember the DOS or Windows 3.1 environments, although they were popular only two decades ago (in 1992). 5¼-inch floppy disks are now seen as dinosaurs in the world of storage devices, and probably not many people remember floppy disks of any kind. Because of these changes, constant attention to the accessibility and readability of digital resources is a must.
Finally, end users, the target audience of a digital library, are the people or machines that will use the digital objects available in it. It is highly important to have a common view of the target audience in order to fine-tune procedures within the digital library building process so that they suit the audience's preferences.
After this overall introduction to the definition of the digital library, it is possible to examine the core elements of digital libraries in a more practical way. So far it has been shown how a digital library is organised and what its core building blocks and ideas are. Now it is time to focus on practical aspects. These are well presented in the DELOS Digital Library Manifesto, which focuses on practical aspects such as IT systems, specific actors and relations, rather than on a general definition like the one from the Digital Library Federation. According to this manifesto, a Digital Library (DL) is an organisation (possibly virtual), but it is stressed that the digital library itself is supported by the software systems required for online presentation and long-term preservation of the digital content. In fact, the software system is split into two basic elements. The first one is the Digital Library System (DLS), which is the access point for the end users of a digital library; the second one is a generic software system providing the software infrastructure and core functionality for the DLS, called the Digital Library Management System (DLMS). This idea is depicted in the graphic below:
Figure 1: DELOS Digital Library organisation – the figure comes from the DELOS Digital Library Reference Model document.
The overall definition of the digital library from the DLF and the more detailed description from the DELOS Digital Library Manifesto identify the following elements that a digital library has to take care of:
- Specialized staff for running a digital library, preparing and fine-tuning various procedures/rules.
- A software system (framework) for distribution and storage of the digital content over the Internet or intranet.
- Awareness of end users, so that the digital library is useful to society.
Having a clear vision of what a digital library is, it is possible to look more closely at the benefits a digital library brings both to its users and to the organisation maintaining it. The following advantages seem to be the most important:
- Increased accessibility of a digital object by means of the Internet and 24-hour availability of the digital library portal.
- Easy searching and browsing, similar to online catalogues.
- An unlimited number of copies: each copy of a digital object is identical to the original, and there is no limit on the number of copies that can be made.
- Little physical space required: in comparison to traditional books, digital objects require very little physical space.
- Information exchange – easy and automatic information exchange related to collections of digital objects available in the digital library.
The advantages and benefits of digital libraries have also been noticed by the European Commission, which advocated the creation of a European digital library in its i2010 strategy related to digital libraries. The main benefits identified included online accessibility, digitisation of analogue collections, and preservation and storage, which aligns with the list above.
Apart from the definition of the digital library, there are other aspects that need to be considered when creating a digital library. One of the most significant is the legal aspect. There is a wide discussion about the availability and licensing of digital objects whose sources are in the so-called public domain. Although there are legal regulations that define the time frame of intellectual property rights, there is no obvious interpretation of the intellectual property rights to digitised content. Various institutions take part in the discussion, including Europeana, which published the Europeana Public Domain Charter representing Europeana's view on public domain content and its digitised representations. In short, the Europeana Public Domain Charter stresses that:
1. Copyright protection is temporary. Copyright gives creators a time-limited monopoly regarding the control of their works. Once this period has expired, these works automatically fall into the Public Domain. The mass of knowledge over recorded time is in the Public Domain; copyright offers an appropriate and time-limited exception to this status.
2. What is in the Public Domain needs to remain in the Public Domain. Exclusive control over Public Domain works cannot be re-established by claiming exclusive rights in technical reproductions of the works, or by using technical and/or contractual measures to limit access to technical reproductions of such works. Works that are in the Public Domain in the analogue form continue to be in the Public Domain once they have been digitised.
3. The lawful user of a digital copy of a Public Domain work should be free to (re-) use, copy and modify the work. The Public Domain status of a work guarantees the right to re-use, modify and make reproductions, and this must not be limited through technical and/or contractual measures. When a work has entered the Public Domain, there is no longer a legal basis to impose restrictions on the use of that work.
All of the above issues related to the definition of the digital library, a practical view on it, and legal issues constitute the context of the digital library environment and have an influence on the various aspects of the digital library creation process.
Models, frameworks and software for digital libraries
This section takes a closer look at the world of digital libraries. While the first part of this section describes theoretical models and frameworks for building digital libraries, the second focuses on software systems dedicated to building digital libraries. The software systems are given special attention because they are usually the core part of the whole digital library: they interact both with digital library employees who upload digital content and with digital library users who use it.
Digital Library models
There are several interesting models for digital libraries. In this section three well-known models are presented: the DELOS Digital Library Reference Model, the Reference Model for an Open Archival Information System (OAIS) and the 5S Framework for Digital Libraries.
The DELOS Digital Library Reference Model
This model was prepared and is maintained by the DELOS Network of Excellence on Digital Libraries. It is closely related to the DELOS Digital Library Manifesto and deals with three basic systems:
1. The Digital Library (DL) is responsible for the creation, management and long-term preservation of digital content.
2. The Digital Library System (DLS) is responsible for interaction with end users, both those using digital content and those creating it.
3. The Digital Library Management System (DLMS) acts as the software infrastructure for the production and management processes of the Digital Library System: it can be understood as lower-layer software, comparable to databases or operating systems in other domains.
From a broader perspective, the DELOS model defines six basic concepts related to the Digital Library:
- Content understood as the data and information that is available in the digital library. The content is composed of a set of collections which organise it appropriately to the needs of a particular digital library. The content means not only the raw data, but also metadata and other information related to digital objects.
- A User, which covers any actor interacting with the digital library, including humans and machines. This concept covers elements such as user management, rights assignment, user preferences, etc.
- Functionality, which is the whole set of actions that can be performed on a digital library, including searching and browsing, registration of new objects, etc.
- Quality defines a set of parameters that characterise and evaluate the content and behaviour of the digital library. This is related not only to the functionality of the digital library, but also to the quality of content.
- Policy represents rules, procedures and regulations governing the interaction between the digital library and its users, both virtual and real.
- Architecture is related to the Digital Library System element and concerns its internal composition and building blocks.
The above description is depicted in the following image:
Figure 2: Concepts related to digital libraries – the figure comes from the DELOS Digital Library Reference Model.
The model is completed by the following groups of identified users:
- DL End-Users exploiting a digital library and consuming digital objects.
- DL Designers, who define how a digital library is organised and maintained so that DL End-Users are satisfied.
- DL System Administrators, who select digital library software components and are responsible for the high quality of the digital library software and the accessibility of the digital resources.
- DL Application Developers developing software components both for a digital library system and a digital library management system.
The model describes a digital library software system in detail and can easily be used when developing one. The important aspect here is the precise identification of software components and actors, and of the interactions between the particular elements of the model. This model was also used as the basis for The Digital Library Reference Model prepared within the EU-funded DL.org project.
Open Archival Information System
The OAIS model is recommended and published by the Consultative Committee for Space Data Systems (CCSDS), an international committee of space agencies that includes the US National Aeronautics and Space Administration (NASA). It is also an ISO standard: ISO 14721:2003. According to the OAIS reference model, an OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. It is underlined that the model is intended for any archive that needs to preserve content over a long time. It is also stressed that the model does not cover any details of software implementation, but gives a general overview of an archive information system and the way it should be organised.
The OAIS archive environment includes the following external actors:
- Producers, which provide the information to be stored in the archive system.
- Consumers, which search, browse and download information from the archive.
- Management, which is responsible for defining the rules and procedures of the archive and is not directly involved in its day-to-day activities.
The basic interactions between these actors and the archive are as follows:
- Producers transfer data to the archive, which stores the data.
- Consumers retrieve the data available in the archive.
These interactions define the basic functional scope that an OAIS archive should offer to external actors. Besides these basic functions, the OAIS model identifies the need for report generation, data redundancy, handling of exceptional situations, consistency checks and a hierarchy of digital objects.
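The producer, consumer and management interactions, together with data redundancy and consistency checks, can be illustrated with a toy in-memory archive. This is a hypothetical sketch, not an OAIS implementation (the model deliberately says nothing about software): the class `Archive` and its methods are assumptions, redundancy is reduced to keeping two copies, and the consistency check compares SHA-256 checksums recorded at ingest.

```python
import hashlib
from dataclasses import dataclass, field


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Archive:
    """Hypothetical toy archive; names and behaviour are illustrative only."""

    primary: dict[str, bytes] = field(default_factory=dict)
    replica: dict[str, bytes] = field(default_factory=dict)   # redundant copy
    checksums: dict[str, str] = field(default_factory=dict)   # fixity values from ingest

    def ingest(self, object_id: str, data: bytes) -> None:
        """Producer -> archive: store two copies and record a checksum."""
        self.primary[object_id] = data
        self.replica[object_id] = data
        self.checksums[object_id] = sha256(data)

    def access(self, object_id: str) -> bytes:
        """Archive -> consumer: return the first copy whose checksum still matches."""
        for store in (self.primary, self.replica):
            data = store.get(object_id)
            if data is not None and sha256(data) == self.checksums[object_id]:
                return data
        raise LookupError(f"no intact copy of {object_id!r}")

    def audit(self) -> list[str]:
        """Consistency check for management: ids whose primary copy is damaged."""
        return [oid for oid, data in self.primary.items()
                if sha256(data) != self.checksums[oid]]


archive = Archive()
archive.ingest("obj-1", b"scanned page")
archive.primary["obj-1"] = b"scanned pagX"  # simulate silent corruption
print(archive.audit())           # ['obj-1']
print(archive.access("obj-1"))   # b'scanned page' (served from the replica)
```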
5S Framework for Digital Libraries
The 5S framework is a formal theory for digital libraries which identifies five complementary dimensions:
- Stream is a sequence of elements which defines content, either static or dynamic, such as characters, bits, image pixels, etc.
- Structure indicates how the information in the digital library is organised, including relations between objects as well as the structure of a digital object.
- Space defines logical and presentational views on digital library components and is distinguished from stream and structure by the operations related to digital library objects. For example, it can be a retrieval view on a digital library, a browsing view or a user interface view.
- Scenario is a story that describes an operation that can be performed on a digital library: scenarios show how the system can be used by end users.
- Society defines a range of different actors that interact with a digital library. This includes end users, administrators, library staff, and also external services. In general, these are actors that either make use of a digital library or support its creation.
Software for digital libraries
There is a wide range of software systems used for building digital libraries, including the most popular ones such as Eprints, DSpace, Fedora and Greenstone. Each of them provides a similar set of core functions, complemented by additional features dedicated to particular applications of the software. A short summary of each can be found in the following paragraphs.
DSpace and Fedora
These two software solutions are maintained within the DuraSpace organisation, which was established to provide leadership and innovation in open source and cloud-based technologies primarily for libraries, universities, research centers, and cultural heritage organizations [source].
According to the information on the DuraSpace web page (technology section), DSpace is an out-of-the-box open source repository application for delivering digital content to end users. With over 750 repository deployments, DSpace is the most widely used open source software for institutional and open access repositories. DSpace stores any type of content and offers built-in workflows for content submission and review. Organizations can easily make their digital collections available on the Web using DSpace's customizable end-user interfaces along with many community-developed features and utilities. Fedora, on the other hand, is a robust, modular repository system for the management and dissemination of digital content. It is especially suited to digital libraries and archives, both for access and preservation. It is also used to provide specialized access to very large and complex digital collections of historic and cultural materials as well as scientific data. It also makes it possible to express rich sets of relationships among digital resources and to query the repository using SPARQL, the Semantic Web query language. Fedora is a set of components that provide the means for building a digital library, rather than a software system that is ready to install and use [source].
Eprints
The Eprints software is developed and maintained by the University of Southampton in the UK and is publicly available under the GPL license. Its creators describe it as the first professional software platform for building high-quality OAI-compliant repositories. They underline that the major aim of the Eprints platform is to be the easiest and fastest way to set up repositories of open access research literature, scientific data, theses, reports and multimedia. The software is mainly focused on open access repositories, and its new features are developed in this direction [source].
Greenstone
Greenstone is developed and maintained by the New Zealand Digital Library Project at the University of Waikato in cooperation with UNESCO and the Human Info NGO. According to the Greenstone main page, it is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. The software is open source, distributed under the terms of the GNU General Public License. Its aim is to provide an easy and inexpensive way of creating and building digital libraries in various countries of the world, including developing countries. This is also why it was designed to run from a CD-ROM with no Internet connection available [source].
Interoperability of digital libraries
Introduction
Interoperability is a very important element of the worldwide infrastructure, and it is not constrained to the digital library environment. Paul Miller, in an article published in issue 24 of Ariadne, defines interoperability as follows:
to be interoperable, one should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organisation are managed in such a way as to maximise opportunities for exchange and re-use of information, whether internally or externally
There are also several aspects of interoperability identified by the UK Interoperability Focus. These include:
- Technical interoperability related to communication between systems, data representation or storage standards. All of this is usually based on protocols, standards, formats, etc.
- Semantic interoperability, which adds meaning to the data transferred between systems. The aim of this area of interoperability is to determine which of the different terms relate to the same concept. For instance, the same concept may be described by terms such as “Author”, “Creator” or “Composer”, and the task of semantic interoperability is to detect and relate those different terms.
- Political/Human interoperability which relates to the process of deciding which resources of the organisation are to be available for others.
- Inter-community interoperability, related to interoperability across domains and disciplines.
- International interoperability defines the interoperability issue on the international level with respect to all above aspects of interoperability.
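A simple way to relate “Author”, “Creator” and “Composer” is a crosswalk: a table mapping each source field name to a shared element. The sketch below assumes Dublin Core element names as the shared vocabulary; the mapping table, the function name and the `unmapped:` prefix for fields without a match are hypothetical choices.

```python
# Hypothetical crosswalk: field names from different systems -> shared element.
CROSSWALK = {
    "author": "creator",     # library catalogue
    "creator": "creator",    # generic repository
    "composer": "creator",   # music archive
    "title": "title",
}


def normalise(record: dict[str, str]) -> dict[str, list[str]]:
    """Rename fields to shared element names, keeping unmapped fields visible."""
    result: dict[str, list[str]] = {}
    for name, value in record.items():
        target = CROSSWALK.get(name.strip().lower(), f"unmapped:{name}")
        result.setdefault(target, []).append(value)
    return result


print(normalise({"Author": "Adam Mickiewicz", "Title": "Pan Tadeusz"}))
# {'creator': ['Adam Mickiewicz'], 'title': ['Pan Tadeusz']}
print(normalise({"Composer": "Fryderyk Chopin", "Opus": "7"}))
# {'creator': ['Fryderyk Chopin'], 'unmapped:Opus': ['7']}
```

Keeping unmapped fields rather than dropping them makes gaps in the crosswalk visible, which matters because semantic mappings usually have to be refined by people who know both vocabularies.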
An example of the interoperability in action in the world of digital libraries is the Europeana portal which aggregates metadata from a number of diverse software systems and, thanks to that, enables its users to browse and search European cultural heritage resources available in different countries across Europe.
Interoperability in digital libraries
Digital libraries also need interoperability. The importance of this issue is confirmed by various initiatives, including DL.org, Europeana and Digital Preservation Europe. An interesting insight into the interoperability of digital libraries is given in a briefing paper by Stefan Gradmann from DigitalPreservationEurope. There are many standards, formats and protocols that enable digital libraries to interoperate. A short introduction to the most important ones, from the digital libraries' perspective, is given below.
Metadata standards
Metadata standards provide a common understanding of the meaning of the data described by metadata. Metadata are nothing more than information about data, often defined as “data about data”. Popular metadata standards include:
- EAD – Encoded Archival Description, an XML-based format maintained by the Library of Congress, used to describe finding aids.
- MARC – MAchine Readable Cataloging for representing bibliographic information.
- TEI – Text Encoding Initiative for representing texts in the digital form.
- Dublin Core – a general-purpose metadata standard focused on Internet resources. It is probably the best-known standard.
Conceptual models go one level up in comparison with the metadata standards. They try to introduce semantic interoperability for the digital resources. In the digital libraries area there are two important conceptual models: the FRBR and the CIDOC Conceptual Reference Model. There is also another model trying to harmonise these two, called FRBR-CRM or FRBRoo.
FRBR is a conceptual entity-relationship model maintained and developed by IFLA, intended to be independent of any cataloguing implementation. FRBR defines three groups of entities. The first group includes works, expressions, manifestations and items. The second includes persons and corporate bodies. The third includes concepts, objects, events and places. These entities are connected by relationships, on top of which user tasks are defined [source].
CIDOC CRM, developed by the ICOM/CIDOC Documentation Standards Group, provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation [source]. In 2006 it became an ISO standard, ISO 21127:2006.
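The first group of FRBR entities forms a chain: a work is realised through expressions, embodied in manifestations and exemplified by items. The sketch below models only that chain as Python dataclasses; the attribute names and sample data are hypothetical, and the other two groups and most FRBR relationships are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Item:              # a single exemplar, e.g. one copy or one file
    location: str


@dataclass
class Manifestation:     # physical embodiment of an expression, e.g. an edition
    description: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:        # intellectual realisation of a work, e.g. a translation
    language: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:              # distinct intellectual creation
    title: str
    expressions: list[Expression] = field(default_factory=list)

    def items(self) -> list[Item]:
        """Walk work -> expressions -> manifestations -> items."""
        return [item
                for expression in self.expressions
                for manifestation in expression.manifestations
                for item in manifestation.items]


# Hypothetical sample: one work, an original and a translation.
work = Work("Pan Tadeusz", [
    Expression("pl", [Manifestation("printed edition", [Item("shelf A1"), Item("shelf A2")])]),
    Expression("en", [Manifestation("digitised edition", [Item("https://example.org/pt.pdf")])]),
])
print(len(work.items()))  # 3
```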
While FRBR originated in the library community, CIDOC CRM originated in the museum community. The idea of harmonising the two approaches appeared in 2000 and resulted in a common formal model called FRBRoo. In general, its aim is to make it possible to express information coming from various cultural heritage institutions, including libraries and museums, using the same notions.
Communication protocols and standards
Communication protocols and standards enable software systems to exchange data. Those relevant to digital libraries (though not exclusive to them) include OAI-PMH, RDF, Atom, Microformats and OAI-ORE.
Probably the best-known communication protocol in the world of digital libraries is OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). According to its specification, it provides an application-independent interoperability framework based on metadata harvesting [source]. OAI-PMH distinguishes two basic roles:
- Data provider, which maintains a repository that exposes metadata to service providers.
- Service provider, which uses a harvester to collect metadata from selected data providers.
Data providers and service providers communicate with each other to exchange metadata about the digital objects (items) stored on the data provider's side. Technically, OAI-PMH is simple and straightforward to use, yet powerful and flexible enough to be applied in various settings, and this combination led to its widespread adoption. OAI-PMH enables information systems to share metadata about the data they store and to gather metadata about the data stored in other systems. This is one of the basic capabilities for the interoperability of systems [source].
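A service provider's harvester essentially issues HTTP GET requests carrying a `verb` argument and follows resumption tokens until the repository signals the end of the list. The Python sketch below assumes a hypothetical base URL and keeps only each record's identifier and datestamp from the record headers; a real harvester would also parse the metadata, handle OAI-PMH error responses and retry failed requests. The `fetch_page` parameter is a hypothetical hook that lets the network call be swapped out, e.g. for canned responses in tests.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Tuple, Union

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 XML namespace


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build a ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only 'verb' may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch(url: str) -> bytes:
    """Download one response page as raw bytes (no retries or error handling).
    Returning bytes lets the XML parser honour the declared encoding."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url: str,
            fetch_page: Callable[[str], Union[str, bytes]] = fetch,
            from_date: Optional[str] = None) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, datestamp) for each record header, following
    resumption tokens until the repository signals the last page."""
    token: Optional[str] = None
    while True:
        url = list_records_url(base_url, resumption_token=token, from_date=from_date)
        root = ET.fromstring(fetch_page(url))
        for header in root.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier"), header.findtext(f"{OAI}datestamp")
        token_el = root.find(f".//{OAI}resumptionToken")
        # An absent or empty resumptionToken element marks the end of the list.
        if token_el is None or not (token_el.text or "").strip():
            break
        token = token_el.text.strip()
```

For example, `list(harvest("http://example.org/oai"))` would collect every record header from that (hypothetical) endpoint, while passing `from_date="2010-06-01"` restricts the harvest to records changed since that date, which is one way an aggregator can refresh its copy without re-harvesting everything.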
RDF, Atom and Microformats are means of representing semantic metadata and data in a uniform and simple way. These complementary standards (RDF is commonly serialised as RDF/XML, Atom is an XML format, and Microformats are embedded in HTML) aim to represent metadata and data so that they are easily understood and accessed by external services, not only on the technical level but also on the semantic level. RDF is usually used for data modelling, Atom for data syndication, and Microformats for easy exposure of information in web pages.
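As an illustration of syndication, the Python sketch below builds a minimal Atom feed announcing newly added digital objects. It gives every entry the `id`, `title` and `updated` elements Atom requires, satisfies the author requirement with a single feed-level author, and, since the entries carry no `content` element, gives each one an `alternate` link instead. The feed and entry identifiers, titles and URLs in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, List, Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def _add(parent: ET.Element, name: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append an Atom-namespaced child element with optional text and attributes."""
    node = ET.SubElement(parent, f"{{{ATOM_NS}}}{name}", attrs)
    if text is not None:
        node.text = text
    return node


def atom_feed(feed_id: str, title: str, author: str, entries: List[Dict]) -> str:
    """Build a minimal Atom feed from at least one entry. Each entry dict needs
    'id', 'title', 'link' and a timezone-aware datetime under 'updated'."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _add(feed, "id", feed_id)
    _add(feed, "title", title)
    # Here the feed's 'updated' value is taken to be the latest entry timestamp.
    _add(feed, "updated", max(e["updated"] for e in entries).isoformat())
    # A feed-level author covers every entry that has no author of its own.
    _add(_add(feed, "author"), "name", author)
    for e in entries:
        entry = _add(feed, "entry")
        _add(entry, "id", e["id"])
        _add(entry, "title", e["title"])
        _add(entry, "updated", e["updated"].isoformat())
        # Without a content element, an entry must carry an alternate link.
        _add(entry, "link", rel="alternate", href=e["link"])
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(atom_feed(
        "urn:example:new-objects", "Recently added objects", "Example Digital Library",
        [{"id": "urn:example:object:1", "title": "Digitised map",
          "link": "http://example.org/object/1",
          "updated": datetime(2010, 6, 1, tzinfo=timezone.utc)}],
    ))
```

A feed reader or another service polling such a feed can learn about new objects without any library-specific protocol, which is the appeal of Atom for syndication.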
OAI-ORE (Object Reuse and Exchange) is a relatively recent development of the Open Archives Initiative. It is defined as a standard for the description and exchange of aggregations of Web resources. The main idea behind OAI-ORE is to provide a simple way of exchanging information about compound objects. The exchange is done by means of aggregations and resource maps. An example use of OAI-ORE would be the exchange of information about the structure of digital objects stored in a digital library [source].
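In the ORE model, a Resource Map describes exactly one Aggregation, and the Aggregation aggregates the individual Web resources that make up the compound object. The Python sketch below represents a digitised book as such an aggregation and emits the core relationships as RDF triples in N-Triples syntax. All URIs under `example.org`, including the `#aggregation` naming, are hypothetical, and the sketch omits the additional metadata about the Resource Map itself that the ORE specification calls for.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


@dataclass
class Aggregation:
    uri: str
    aggregated: List[str] = field(default_factory=list)  # URIs of the parts


@dataclass
class ResourceMap:
    uri: str
    aggregation: Aggregation  # a Resource Map describes exactly one Aggregation

    def triples(self) -> List[Triple]:
        """Core ORE relationships as (subject, predicate, object) URIs."""
        agg = self.aggregation
        result = [
            (self.uri, RDF_TYPE, ORE + "ResourceMap"),
            (self.uri, ORE + "describes", agg.uri),
            (agg.uri, RDF_TYPE, ORE + "Aggregation"),
        ]
        result += [(agg.uri, ORE + "aggregates", part) for part in agg.aggregated]
        return result

    def to_ntriples(self) -> str:
        """Serialise the triples in N-Triples syntax (URIs only, no literals)."""
        return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in self.triples())


if __name__ == "__main__":
    book = ResourceMap(
        "http://example.org/rem/book-42",
        Aggregation("http://example.org/rem/book-42#aggregation", [
            "http://example.org/book-42/page-1.jpg",
            "http://example.org/book-42/page-2.jpg",
            "http://example.org/book-42/book.pdf",
        ]),
    )
    print(book.to_ntriples())
```

Keeping the Resource Map and the Aggregation as separate URIs is what lets the same compound object be described by different documents without confusing the description with the thing described.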
Searching digital libraries
One of the core functionalities of a digital library is search. For a single digital library, there is a range of efficient technical solutions for the search function and no significant problems in this area. It gets complicated when several digital libraries need to be searched through one single access point. A common situation is that, for instance, libraries across a country or region each maintain their own digital libraries, and over time these institutions want to create a single national access point for the country's cultural heritage. The problem that arises is how to let a user search all of these libraries through that single access point. There are two possible approaches:
- Distributed search, where the search engine simultaneously forwards the query to each of the individual search engines, then gathers the results and presents them to the user. The consequence of this approach is that the user has to wait until all of the searches are completed: if one of the individual search engines is very slow, the overall response time will also be slow, even if the other engines respond much faster.
- Aggregation, which is based on periodically harvesting data from the individual repositories and indexing it in the aggregator's search engine. This solution returns search results in a time independent of the load on the search engines of the individual repositories. The consequence here is that the aggregator's search engine does not always have up-to-date data, because the data are updated periodically, e.g. daily.
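The trade-off between the two approaches can be sketched in a few lines of Python. The toy model below assumes that a distributed search queries all repositories in parallel, so its response time is that of the slowest repository, while the aggregator answers from a local inverted index that is only as fresh as its last harvest. Repository names, latencies and titles are hypothetical, and matching is simplified to case-insensitive whole-word lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, str]  # (repository name, record identifier)


@dataclass
class Repository:
    name: str
    latency_s: float  # hypothetical response time of the repository's own search engine
    records: Dict[str, str] = field(default_factory=dict)  # identifier -> title


def matches(title: str, term: str) -> bool:
    """Simplified whole-word, case-insensitive matching."""
    return term.lower() in title.lower().split()


def distributed_search(repos: List[Repository], term: str) -> Tuple[List[Hit], float]:
    """Forward the query to every repository at query time. Even with the
    queries running in parallel, the merged result list is complete only
    when the slowest repository has answered."""
    hits = sorted((r.name, ident) for r in repos
                  for ident, title in r.records.items() if matches(title, term))
    return hits, max(r.latency_s for r in repos)


class Aggregator:
    """Harvests all repositories periodically into one local inverted index."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[Hit]] = {}

    def harvest(self, repos: List[Repository]) -> None:
        # Full re-harvest for simplicity; an OAI-PMH harvester could instead
        # request only records changed since the previous run.
        self.index = {}
        for r in repos:
            for ident, title in r.records.items():
                for word in title.lower().split():
                    self.index.setdefault(word, set()).add((r.name, ident))

    def search(self, term: str) -> List[Hit]:
        # Answered locally: independent of repository latency, but only as
        # fresh as the last harvest.
        return sorted(self.index.get(term.lower(), set()))


if __name__ == "__main__":
    repos = [Repository("A", 0.2, {"a1": "Old maps of Poznań"}),
             Repository("B", 5.0, {"b1": "Maps of Europe"})]
    aggregator = Aggregator()
    aggregator.harvest(repos)
    repos[0].records["a2"] = "New maps collection"  # added after the harvest
    print(distributed_search(repos, "maps"))  # ([('A', 'a1'), ('A', 'a2'), ('B', 'b1')], 5.0)
    print(aggregator.search("maps"))          # [('A', 'a1'), ('B', 'b1')] until the next harvest
```

The distributed search sees the newest record but makes the user wait for the slowest repository; the aggregator answers immediately but misses the record until it harvests again.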
Since cultural heritage repositories do not require frequent updating of data, the aggregation approach has become commonly accepted for searching across digital libraries. Prominent examples of such search engines are:
- OAIster, which gives access to freely available digital resources stored in digital libraries, institutional repositories and online journals. Communication between OAIster and the content repositories is performed by means of the OAI-PMH protocol.
- Europeana, one of the flagship projects of the European Union, which aims at the worldwide popularization of the cultural and scientific heritage available in digital form from various European countries. Multilingualism, Web 2.0 capabilities and a timeline are just a few of Europeana's notable features. Communication between Europeana and digital libraries is based on OAI-PMH.
- PIONIER Network Digital Libraries Federation (DLF), developed and maintained by Poznań Supercomputing and Networking Center, is an example of a national metadata aggregator for Polish digital resources coming from various cultural institutions. Communication is also based on the OAI-PMH protocol. The Digital Libraries Federation acts not only as a metadata aggregator and search engine for Polish digital resources, but also as a repository for Europeana, which harvests information from DLF at the national level.
Besides aggregators and metasearch engines, there is also a set of Internet portals that maintain registries of OAI-PMH repositories and provide means both for adding new OAI-PMH-enabled repositories and for finding those already registered. The most popular registries include:
Figure 3: Usage of Open Access Repository Software – Europe. An up-to-date version of the chart is available from OpenDOAR.
- OpenDOAR – The Directory of Open Access Repositories – was funded by the Open Society Institute (OSI), the Joint Information Systems Committee (JISC), the Consortium of Research Libraries (CURL) and SPARC Europe. It maintains a list of open access repositories with a variety of search criteria. An interesting feature of the directory is its statistical charts, which can also be embedded in an external web page. For example, the usage of repository software in Europe is presented in Figure 3.
- ROAR – Registry of Open Access Repositories, part of the EPrints.org network, is hosted at the University of Southampton, UK, and funded by the Joint Information Systems Committee (JISC). ROAR provides a simple and clear way of searching and browsing open access repositories, including browsing by country, repository type and software.
- The University of Illinois OAI-PMH Data Provider Registry has many interesting features, such as an RSS (RDF Site Summary) feed of the latest changes in the registry, extensive statistics and information about each repository, a list of other known registries of OAI-PMH repositories, and a graphical map of all repositories and the relations between them.
Data discovery is a crucial element of the digital library environment, and there are several methods for content discovery. The most straightforward is searching a specific digital library (repository). But as the number of digital libraries grows, there is a need for metasearch engines or metadata aggregators that let users find digital objects from various digital libraries through a single search point. Besides these, there are also registries of OAI-PMH repositories, which provide search and browse interfaces for discovering repositories.
Examples of popular digital libraries
There is a tremendous number of digital libraries all over the world, several of which are worth mentioning because of their historical role, geographical scope or the authority of the hosting institution. Below is a short description of some interesting initiatives related to the creation of digital libraries.
Project Gutenberg (http://www.gutenberg.org/)
It is commonly regarded as the first digital library ever created. Project Gutenberg was started by Michael Hart in 1971 with the digitization of the United States Declaration of Independence. The mission statement written by Michael Hart states that the mission is to encourage the creation and distribution of eBooks. The project is run entirely by volunteers and gives them a lot of freedom; in general, everyone is welcome to contribute to Project Gutenberg. Currently Project Gutenberg gives access to over 32 000 free ebooks.
World Digital Library (http://www.wdl.org/)
The World Digital Library (WDL) is an initiative of UNESCO and the Library of Congress, launched in April 2009. According to its mission statement, “The World Digital Library makes available on the Internet, free of charge and in multilingual format, significant primary materials from countries and cultures around the world”. The objectives of the WDL include promoting digital resources, expanding the variety of digital resources available on the Internet, equipping researchers, educators and other interested parties with heterogeneous digital resources, and fostering knowledge exchange between participating institutions. The total number of digital resources is currently close to 1300, including some of the most prominent digital resources from all over the world. The WDL has a very interesting approach to browsing: the main page itself displays a map of the world from which resources from a particular region can be browsed instantly.
Europeana (http://www.europeana.eu/)
Europeana was initiated by the European Commission and continues to be supported by the EC in various fields of activity. It is a Thematic Network funded by the European Commission under the eContentplus programme, as part of the i2010 policy. Europeana gives access to European cultural heritage, including texts, images, videos and audio recordings. The number of digital objects accessible through Europeana is growing rapidly: by the middle of 2010 it had reached 6 million. The digital objects are complemented by a number of interactive and innovative features, such as My Europeana or ThoughtLab [source].
Internet Archive (http://www.archive.org/)
The Internet Archive was founded by Brewster Kahle in 1996 and is described on its official web page as follows: “The Internet Archive is a non-profit organization that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format.” An interesting feature of this digital library is the Wayback Machine (a digital library of Internet websites), which can show the user a specific web page as it appeared in the past, e.g. 10 years ago. Currently it holds over 150 billion web pages and around 3 million other resources [source].
Google Books (http://books.google.com/)
Google Books is a service from the commercial company Google that makes it possible to search the full texts of books that were scanned, processed and indexed by Google. Google Books provides free access to all resources that are in the public domain, and offers means of locating or buying those that are still under licence. Currently, over 7 million books are searchable in the service. Google has recently announced that a new book store will be launched soon. The Google Editions Book Store will complement the Book Search service by giving users online access to all the books on their bookshelves. The idea is that a user can buy a book and then access it online whenever needed, in line with the Google Editions Book Store tagline: “buy anywhere, read anywhere” [source].