In the past months the Virtual Library team has been exploring the options for harvesting of document repositories. The purpose of these efforts is to turn availability of publications into accessibility.
Institutes and system-wide programs have been maintaining publication lists or databases and providing links to documents that are available electronically, and they have often made efforts to make their repositories harvestable (via OAI-PMH). We felt that now is the time to be more proactive and to make the publications more accessible by bringing this information to where the potential users are. The first step is to bring the information from these different databases together; the next is to create value-added services.
As a first step we have experimented with methods of harvesting the different publication databases into a central one, and we have successfully used two different systems (PKP Harvester and DLESE). An experimental preview of the PKP system is available at http://cgiar.perpustakaan.net/ (but be aware that this is a test environment where things may appear and disappear). We have also investigated methods to make the different repositories harvestable. Some databases use software that comes with native OAI-PMH support (DSpace, NewGenLib), and we have looked for, and found, two methods to make harvestable the Inmagic databases that many institutes use. For two other database engines (SQL Server, Aigaion) we still have to find solutions.
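To give an idea of what harvesting a repository involves, here is a minimal Python sketch of an OAI-PMH ListRecords request and of reading a few Dublin Core fields from the response. It is an illustration only, not how PKP Harvester or DLESE work internally. The base URL, the function names and the sample record are made up for the example; the verb, the oai_dc metadata prefix, the resumption token and the XML namespaces come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    When a resumption token is given, the protocol allows no other
    argument besides the verb, so metadataPrefix is left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS) if t.text],
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS) if c.text],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means there are no further pages.
    return records, (token.strip() or None)


# Made-up response with a single record, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample report</dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

records, next_token = parse_list_records(SAMPLE_RESPONSE)
```

A real harvester would fetch each URL over HTTP, store the records in the central database, and keep requesting with the returned token until none comes back.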
As we said, this is only a first step. To improve accessibility, we have to think about the value-added services that we can create (and which would be much more difficult to create for the individual publication databases). We are thinking of methods to get the information into important search engines, for example by providing Google sitemap files, or by promoting harvesting by important scientific databases such as Scirus / Scopus, CABI etc. We are also thinking of providing topical (RSS) news feeds, canned searches etc. The system may also be able to serve reporting needs within the CG system. Some services will require further data harmonization across the different source databases (e.g. a common scheme for document types). Unique author identification might be quite a challenge. The information managers are working on a document to take stock of where we are now and where we want to go.
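As an illustration of the sitemap idea, the following Python sketch turns a list of harvested records into a sitemap file in the sitemaps.org format that Google reads. It is only a sketch under assumptions: the function name, the record fields ("url", "lastmod") and the example URLs are invented here, and the central database may well expose its records differently.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Return sitemap XML (as a string) for a list of record dicts.

    Each record needs a "url" key; "lastmod" (a W3C date such as
    2009-01-15) is optional. Both field names are assumptions made
    for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rec in records:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = rec["url"]
        if rec.get("lastmod"):
            ET.SubElement(url, "lastmod").text = rec["lastmod"]
    # ElementTree escapes characters such as "&" in the URLs for us.
    return ET.tostring(urlset, encoding="unicode")


# Made-up records standing in for entries of the central database.
example_records = [
    {"url": "http://example.org/doc/1", "lastmod": "2009-01-15"},
    {"url": "http://example.org/doc/2?a=1&b=2"},
]
sitemap_xml = build_sitemap(example_records)
```

The resulting string would be saved as a file on the web server and submitted to the search engine. Note that the sitemap protocol caps a single file at 50,000 URLs, so a large central database would need several files plus a sitemap index.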
An idea we should explore for value-added services is how to link CG publications and documents (and whatever else is harvested) to the projects they originate from. In CGMap, we’re carrying out a similar effort: collecting the project information in the Medium Term Plans in a central location for querying and analysis. As projects move into implementation, aggregating the publications/documents/IPGs produced as part of the projects would enable (1) a more up-to-date view of project status and (2) reporting in the context of project implementation.
Interesting point. Projects would have to use unique identifiers, and we should see if it is feasible to assign those codes to outputs (documents and the like). In fact, at Wageningen University (sorry to come back to that experience, it’s what I have got) the data collection channel of the project system is also the data collection channel for publications, so people do not have to enter things in two different systems.
Will contact you directly to be enlightened about the CGMap system.