
The proposal was submitted on 2014-01-10. The submitted version (pdf) is attached.

This work is licensed under a Creative Commons license: CC BY 4.0.

The hbz

The North Rhine-Westphalian Library Service Center (hbz) is a central service and development organization for university libraries in North Rhine-Westphalia. It was founded in 1973. At the center of hbz's services is the provision and management of a union catalog for university and other libraries. As a result, the hbz has considerable expertise in data aggregation, data normalization and the provision of discovery interfaces.

The hbz has been active in Open Access (OA) since 2002. It began by hosting institutional repositories and, since 2004, has provided the OA journal platform Digital Peer Publishing (DiPP). For more than four years, the hbz has been actively promoting web standards and the open licensing of data published by libraries. The hbz was one of the library organizations worldwide that pioneered opening up library data. As Peter Suber put it in his SPARC Open Access Newsletter review of 2010:

[In 2010] libraries around the world began lifting restrictions and putting their bibliographic data into the public domain, usually under CC0. The movement seems to have started in Germany, with six libraries in Cologne plus the Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen... [1]

Alongside publishing open data, the hbz works on making open data easily discoverable, accessible and reusable on the web. Since 2009, we have therefore been promoting the use of web standards in the library world, especially linked open data (LOD) technologies. Together with the ZBW - German National Library of Economics / Leibniz Information Centre for Economics, the hbz established the SWIB (Semantic Web in Libraries) conference. SWIB has grown from a German event into "the premiere conference for semantic web technologies in libraries" worldwide [2].

The hbz now has several years of experience with linked data and with developing open source software for LOD applications, especially in the context of its linked open data service lobid.

Germany currently lags behind in OER development: a significant number of OER activities have only emerged during the last two years. The hbz participated early on in German OER activities. Since 2012 it has been exploring the possibilities of open educational resources and contributing its experience with metadata, web standards and technology. It has become one of the leading institutions from the German library world involved in the OER movement.

This proposal is submitted in partnership with Felix Ostrowski, graphthinking. This is the first consultancy that hbz and Felix are offering internationally. Over the last years, the consultants have gained a lot of experience in international collaboration. For example, hbz staff members have participated in working groups within the World Wide Web Consortium (W3C), the Open Knowledge Foundation, the Eclipse Foundation and UNESCO, and of course in library-centric contexts. The consultants generally situate their work in an international environment. They use English as the default language for communication and publication on the web (see, for example, the blog of the lobid team or the issues on GitHub).

The consultants

Pascal Christoph (Data Management)

Pascal (@dr0ide on twitter) studied computational linguistics at the University of Cologne. He has worked for many years as a Linux system administrator and software developer focusing on data storage and retrieval. Since 2008 he has been employed at the North Rhine-Westphalian Library Service Center (hbz) as a developer and administrator of open source software, focusing on search engines and Linked Open Data. He is very experienced with the search engine FAST ESP and, more recently, with the NoSQL database and search engine Elasticsearch. Pascal knows the SPARQL database 4store and RDBMSs like MySQL and Postgres well. He is an active contributor to Metafacture, an ETL (Extract, Transform and Load) tool written in Java and optimized for fast transformations of complex data from various formats.

Jan Neumann (Project Management)

Jan (@trugwaldsaenger on twitter) is head of legal affairs and organisation at the hbz. He studied law, economics and systems thinking and has more than 10 years of experience in (international) project management for various publishing houses and libraries. He attended the 2012 World OER Congress. He is a member of the educational advisory board of the German UNESCO chapter, concentrating on OER, and a member of the POERUP advisory board. In 2013 he joined the review team of the first major German OER conference, OERde 13. Jan blogs about OER on oersys.org.

Adrian Pohl (Metadata & RDF Vocabularies)

Adrian (@acka47 on twitter) has been working at the hbz since 2008. He is the metadata and RDF vocabulary expert for the development of lobid and is responsible for project management. He has been actively promoting open knowledge, web standards and the use of open source software in the international library community. Examples are his coordination of the Open Knowledge Foundation's Working Group on Open Bibliographic Data since June 2010 and, more recently, his initiation of the Libraries Empowerment Manifesto. Adrian has been on the organizing team of the SWIB conference since 2011. He recently initiated the establishment of a German-speaking working group on OER metadata within the Competence Center for Interoperable Metadata (KIM) of the German Initiative for Network Information (DINI), and he moderates this group. The working group provides a forum for professional communication in German-speaking countries about metadata for OER, covering metadata standards, data conversion tools, data publication on the web etc.

Fabian Steeg (API Development)

Fabian (@fsteeg on twitter) is a software developer with 10 years of experience in building information systems, API design and open source software. While studying information processing, linguistics and geography, he developed e-learning and text mining tools at the University of Cologne. He also worked on a web-based information system at DIMDI, the German Institute for Medical Documentation and Information. After completing his Master's degree in 2008, Fabian developed parts of the digital preservation services, the service registry and the API of Planets, an EU project led by the British Library. He later developed collaborative text digitization tools in a project with Swiss partners. He then took a detour into the startup world, where he helped to launch the backend of doo.net, a cloud platform for digital document management. In 2012 he joined the LOD group at hbz to work towards an open library infrastructure. He implemented the Hadoop-based data processing and the Play-based web API for lobid.org. Fabian is an active open source contributor, a Google Summer of Code participant and an Eclipse committer.

Felix Ostrowski (Frontend Development)

Felix (@literarymachine on twitter) acts as hbz's partner in this project. He is a web engineer, linked data technologist and knowledge management consultant who has been on the web since the mid-'90s. After graduating in communication studies and computer science, he worked as a software developer and repository manager at hbz from 2008 to 2010, helping to build OPUS 4. He was also a driving force behind the institution's Linked Open Data (LOD) strategy. He created the initial data models and implemented a Linked Data based frontend for library data for the first version of lobid.org. He also created visual representations of the data: a map of library institutions in Germany and a visualization of the hbz data as part of the LOD cloud. In 2010 he moved on to work as a research assistant in a joint project on digital long-term preservation by the Berlin School of Library and Information Science, Stanford University Libraries and the German National Library. In 2013 he founded graphthinking. One of the projects he has worked on since then is Edoweb, a repository system for which he is building a Drupal-based frontend.

The OER World Map consultation & the hbz: a perfect match

For hbz, the Hewlett Foundation's request for participation for the OER World Map comes at exactly the right time. For some months we have been planning a project, to be carried out this year, to develop a search engine for open educational resources in German. We identified the following aspects as crucial for a successful project:

  • Promoting the use of open licenses not only for OER but for the accompanying metadata to legally enable its aggregation and re-use.
  • Promoting the use of open web standards (especially linked open data) and the adoption of the LRMI (Learning Resource Metadata Initiative) metadata schema for describing OER.

We see these aspects as the basis for optimally aggregating OER throughout the web and building an OER search engine. The Hewlett Foundation's request for participation emphasizes the need for a data set of OER projects, institutions and repositories. Such a data set could act as an ideal starting point for crawling the different OER project sites to create an OER search index. Another thing the OER World Map adds to our thinking is the focus on user experience. We put data, technology and best practices for metadata provision in the foreground. The OER World Map, by contrast, starts from the perspective of the user who wants to interact with the registry data through a map interface and by browsing and searching agents and institutions in the OER field. We think that our plans and approach blend well with the Hewlett Foundation's goals and viewpoint.

We are able to provide a performant, flexible and scalable infrastructure for the OER World Map. We are also aware that its overall success depends not only on the technology but even more on other aspects, such as finding appropriate incentives for engaging the community.

Basic development principles

In general, we believe in these five principles for providing access to data on the web:

  1. Publish data according to the open definition. (This includes providing a full dump of the data with incremental updates.)
  2. Develop services as Free/Open Source Software.
  3. Use open web standards for the publication of data on the web, in particular following the best practices of the linked open data community.
  4. Provide an open web API (see below) for web developers to easily interact with the data.
  5. Provide a simple and intuitive interface for people to explore and interact with the data.
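Principle 1 asks for a full dump plus incremental updates. Here is a minimal sketch of how a consumer could keep a local copy in sync. It assumes a hypothetical update format, not an existing hbz format: each record carries an `op` (`upsert` or `delete`), an `id` (the resource URI) and, for upserts, the new `data`.

```python
from typing import Dict, Iterable


def load_dump(records: Iterable[dict]) -> Dict[str, dict]:
    """Build a local copy from a full dump, keyed by each resource's URI ("id")."""
    return {record["id"]: record for record in records}


def apply_updates(local: Dict[str, dict], updates: Iterable[dict]) -> Dict[str, dict]:
    """Apply incremental updates in order, so later updates to the same URI win.

    Hypothetical update format: {"op": "upsert" | "delete", "id": <URI>, "data": {...}}
    """
    for update in updates:
        if update["op"] == "upsert":
            local[update["id"]] = update["data"]
        elif update["op"] == "delete":
            local.pop(update["id"], None)  # deleting an unknown URI is a no-op
        else:
            raise ValueError(f"unknown operation: {update['op']!r}")
    return local
```

Because updates are applied strictly in order, replaying the same update stream on top of the same dump always yields the same local copy.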

The hbz's principles for publishing data on the web resonate well with the Hewlett Foundation's criteria for openness. To consider projects worthy of funding, the Hewlett Foundation expects them to meet the following criteria: "free and legal to use, revise, remix, and redistribute", "formats that are usable, sharable, revisable, and remixable with free and open source software", "offer accessibility to a diverse body of users", "follow standards developed by the OER movement to enhance discoverability, interoperability, quality, and accessibility". These criteria reflect the thinking we have developed and promoted during the last five years.

A flexible and scalable approach

Benefits of linked open data: Web integration and flexibility of data model

Our prototype will serve the OER World Map data as 5-star linked open data [3]. For some years we have been promoting the publication of data as linked data. The OER World Map will use HTTP URIs as identifiers and link resources together. Like any other linked data set, it will thereby become part of the growing web of data [4].

Search engines now also consume linked data, provided it uses the schema.org vocabulary. The OER World Map data uses carefully crafted terms from the vocabularies mentioned below. It will also be mapped to the less precise schema.org terms and embedded in the HTML (using RDFa or JSON-LD) so that search engines like Bing and Google can harvest it. This improves results for people looking for OER agents and activities in the search engine of their choice.
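Here is a minimal sketch of the schema.org embedding. It maps a hypothetical organization record to schema.org terms and wraps the result in a JSON-LD `script` element. The resource URI and the choice of fields are assumptions. The actual mapping would follow the application profile developed in the project.

```python
import json


def schema_org_jsonld(resource_uri: str, name: str, country_code: str) -> dict:
    """Map a (hypothetical) OER World Map organization record to schema.org terms."""
    return {
        "@context": "http://schema.org",
        "@id": resource_uri,
        "@type": "Organization",
        "name": name,
        "address": {"@type": "PostalAddress", "addressCountry": country_code},
    }


def embed_in_html(jsonld: dict) -> str:
    """Wrap JSON-LD in a script element that search engines can harvest."""
    # Escape "</" so a value containing "</script>" cannot close the element early;
    # "\/" is a valid JSON escape for "/", so the payload still parses as JSON.
    payload = json.dumps(jsonld, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

The returned string would go into the `<head>` or `<body>` of the HTML page describing the organization.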

One great advantage of linked data and RDF is the flexibility and scalability of the data model. New data elements can easily be added as needed, and the data can be published and interlinked between different web sites at web scale. Also, a wide range of established RDF vocabularies already exists that can be used to store and expose the OER map data.

It has not yet been fully defined what kinds of resources the OER map shall cover. The request for proposals mentions projects and initiatives as well as people (OER experts). The report summarizing the international conversation that led to this consultation process also mentions journals, repositories and services [5].

Naturally, information on projects will also include the institutions running them. Information on some further resources would also be valuable: events (congresses, workshops, hackdays etc.), publications, (software) tools in the OER world, or job offers.

In this context, we take a closer look at the connections between four kinds of resources on which it would be useful to collect information: organizations, projects, persons and services (including repositories). These resources are linked to each other in various ways. For example, organizations run projects that are carried out by persons who use and produce services. The image shows these resources and some of their connections.

The great benefit of the linked data approach is that stating a thing's links to other things is an essential part of describing it. This holds whether those other things are described on the same web site or in a different context. Linked data is thus the best fit for describing and interlinking the different kinds of things that are relevant for mapping the development of the OER community. Returning to RDF vocabularies: for many of these kinds of resources, RDF vocabularies already exist, most of them widely used. For example, FOAF, the Organization Ontology and the Academic Institution Internal Structure Ontology (AIISO) can be used to describe persons and institutions.

In developing the prototype, we would focus on the three kinds of resources coloured orange in the image and, for the time being, not include services or other things. In effect, the prototype would enable the registration of an organization, a person or a project as well as the interlinking of these resources. The nature of linked data and the RDF data model would, however, make it easy to add other types of resources that are considered valuable enough to become part of the OER map data corpus.
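The following sketch illustrates how the three prototype resource types could be interlinked. It serializes a few statements as N-Triples, using real terms from two vocabularies:

- FOAF: `foaf:Person`, `foaf:Project`, `foaf:name`, `foaf:currentProject`
- W3C Organization Ontology: `org:Organization`, `org:memberOf`

The resource URIs and names are hypothetical placeholders. The choice of properties is an assumption, not the project's application profile.

```python
# Vocabulary namespaces: RDF, FOAF and the W3C Organization Ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
ORG = "http://www.w3.org/ns/org#"


def iri(value: str) -> str:
    """Serialize an IRI for N-Triples."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical resources: an organization, a person who is a member of it,
# and a project the person is currently working on.
organization = iri("http://example.org/organization/1")
person = iri("http://example.org/person/1")
project = iri("http://example.org/project/1")

triples = [
    (organization, iri(RDF_TYPE), iri(ORG + "Organization")),
    (organization, iri(FOAF + "name"), literal("Example University")),
    (person, iri(RDF_TYPE), iri(FOAF + "Person")),
    (person, iri(FOAF + "name"), literal("Jane Example")),
    (person, iri(ORG + "memberOf"), organization),
    (project, iri(RDF_TYPE), iri(FOAF + "Project")),
    (project, iri(FOAF + "name"), literal("Example OER Project")),
    (person, iri(FOAF + "currentProject"), project),
]

print(to_ntriples(triples))
```

Adding a new resource type (e.g. events) would only mean adding new types and links, without changing the existing statements.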

Benefits of an API-based approach

As already noted, we are not only providing linked open data but also following an API-driven approach. This means we will first focus on building an HTTP-based application programming interface (API) for querying and adding to the data behind the OER World Map. Based on this API, we will provide an interface for humans that enables them to interact with the data, add new entries and edit existing ones. This section explains why we think this is the best approach for the OER World Map, as it is for other data-centric websites.

The API is a technology-independent layer through which applications interact with the underlying data. The first two applications built on top of the API would be a web form that lets people edit the data and the OER World Map itself for exploring the data. The following image gives an impression of an API-driven approach that decouples the machine interface from the data and the user front end.

A web API has many benefits, one of them being modularity. Data management, data presentation and data manipulation are clearly separated modules that can be adapted and replaced independently without directly affecting the others. There is therefore no dependency on any particular database technology, as the API layer abstracts from it. If one wants to switch to another technology at some point, API users would ideally not even notice the change.
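The decoupling argument can be sketched in a few lines. The hypothetical `Api` class below talks only to an abstract `Store`. Two very different storage backends, a simple dict and an append-only log, therefore produce identical responses. Both backends are stand-ins for real databases such as Elasticsearch. This illustrates the principle only; it is not the lobid or Play implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class Store(ABC):
    """Storage interface hidden behind the API; concrete backends are interchangeable."""

    @abstractmethod
    def get(self, resource_id: str) -> Optional[dict]: ...

    @abstractmethod
    def put(self, resource_id: str, doc: dict) -> None: ...


class DictStore(Store):
    """Keeps only the latest version of each document."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def get(self, resource_id):
        return self._docs.get(resource_id)

    def put(self, resource_id, doc):
        self._docs[resource_id] = doc


class LogStore(Store):
    """Appends every write to a log and reads back the most recent entry."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, dict]] = []

    def get(self, resource_id):
        for rid, doc in reversed(self._log):
            if rid == resource_id:
                return doc
        return None

    def put(self, resource_id, doc):
        self._log.append((resource_id, doc))


class Api:
    """Hypothetical API layer: clients only see status codes and JSON-like bodies."""

    def __init__(self, store: Store) -> None:
        self._store = store

    def read(self, resource_id: str) -> Tuple[int, dict]:
        doc = self._store.get(resource_id)
        return (200, doc) if doc is not None else (404, {"error": "not found"})

    def write(self, resource_id: str, doc: dict) -> Tuple[int, dict]:
        self._store.put(resource_id, doc)
        return 200, doc
```

Swapping `DictStore()` for `LogStore()` changes nothing an API client can observe, which is the property described above.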

Another benefit of this approach is that it provides maximum reusability for third-party users. Openly licensed data in a standard format like JSON-LD, served via a well-documented and stable API, lets web developers easily build applications on top of the data. They do not have to set up the whole infrastructure locally. Nonetheless, an API does not remove the need to provide a service's complete data in bulk for anyone who wants to work with the whole data set locally.
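From a third-party developer's point of view, using the API could look like the following sketch. It builds (but does not send) an HTTP GET request that asks for JSON-LD through content negotiation. The base URL `http://api.example.org/oer`, the path segment and the `q` parameter are hypothetical, since the proposal does not yet fix the API's URL scheme.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real API paths and parameters are not yet defined.
API_BASE = "http://api.example.org/oer"


def build_query(resource_type: str, query: str,
                media_type: str = "application/ld+json") -> Request:
    """Build (but do not send) a GET request asking for JSON-LD via the Accept header."""
    url = f"{API_BASE}/{resource_type}?{urlencode({'q': query})}"
    return Request(url, headers={"Accept": media_type})
```

Passing the request to `urllib.request.urlopen` would send it. As long as the API stays stable, such client code keeps working even if the backend behind it changes.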

Technology stack

For the prototype, we will use and adapt the free software we have developed and published openly on GitHub to provide a linked open data API for the OER World Map data. On top of the API, we will build a front end using the free content management software Drupal. The front end will let users search, browse and visualize the data, and will provide a web form for adding data to the OER World Map. The following image gives an overview of the technology stack.

We will not go through the whole data flow here or describe the interplay of the individual software components in detail. A recent presentation at the SWIB conference gives some insights into the rationale behind the API approach, the software and hardware used, and its operation [6].

Nonetheless, here is a brief explanation of the software components.

  • Metafacture is a versatile tool for extraction, transformation and loading of (meta)data, developed by the German National Library with help from a growing community. The code and issues are on GitHub at https://github.com/culturegraph/metafacture-core/.
  • Hadoop is a framework for distributed computing, comprising a programming model (MapReduce) and a distributed file system (HDFS). It is an open source project hosted at the Apache Software Foundation and has established itself as an industry standard for distributed computing. Hadoop is not required for the amount of data used in the prototype. Because it is part of our existing infrastructure at hbz, however, it lets us implement a scalable approach without additional effort.
  • Elasticsearch (ES) is a search engine and NoSQL data store based on Lucene. ES is easy to administer, yet it offers rich and complex functionality: high availability, near-linear scaling, near-real-time search, built-in geolocation support, client APIs for many languages, a REST API and lots of plugins. It also has a fast-growing user base.
  • The Play Framework is a framework for developing HTTP-centric web applications and APIs. It combines the rapid development process known from PHP or from frameworks like Ruby on Rails with the performance and security of Java for server-side development.
  • Drupal is a widespread, extendable open source content and data management system (CMS) with a large and vibrant community. In contrast to WordPress, it has an extremely flexible data model and a storage layer that makes it easy to integrate into the proposed decoupled data management architecture. The frontend for displaying and editing the OER World Map data will be created as a Drupal module that communicates with the backend (API). The Edoweb project has already proven the suitability of this approach; it has a similar architecture, albeit a different data model.
  • Leaflet.js is a JavaScript library for creating interactive maps.
  • OpenStreetMap (OSM) is a community project that provides an open geographic information system. Its Nominatim API finds coordinates for street addresses. OSM also provides map tiles that can be used to display maps for end users.
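The following sketch shows how OSM and Leaflet could fit together. It builds a Nominatim search URL, where `q`, `format=json` and `limit` are standard Nominatim search parameters. It then turns a Nominatim result into a GeoJSON point, which Leaflet can display through its GeoJSON layer. The feature properties are hypothetical. Note that Nominatim returns `lat`/`lon` as strings, while GeoJSON expects coordinates in longitude, latitude order.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def nominatim_url(address: str, limit: int = 1) -> str:
    """URL for a Nominatim free-text search that returns JSON."""
    return NOMINATIM_SEARCH + "?" + urlencode({"q": address, "format": "json", "limit": limit})


def to_feature(result: dict, properties: dict) -> dict:
    """Turn one Nominatim result into a GeoJSON Point feature.

    Nominatim returns "lat"/"lon" as strings; GeoJSON expects [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [float(result["lon"]), float(result["lat"])],
        },
        "properties": properties,
    }


def feature_collection(features: list) -> dict:
    """Bundle features so a map client can load them in one go."""
    return {"type": "FeatureCollection", "features": features}
```

A real client would also have to respect Nominatim's usage policy, e.g. by identifying the application and keeping the request rate low. Alternatively, it could geocode addresses once during data loading rather than on every page view.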

The selection of software components ensures that the system will scale as the data set grows and matures. Metafacture, Hadoop and Elasticsearch provide scalability through stream-based data processing and cluster computing. Together with the RDF data model, this gives us a system that is highly adaptable to future challenges.
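Stream-based processing can be illustrated with chained Python generators. Each record flows through extract, transform and load one at a time, so memory use does not grow with the size of the input. This is only an analogy to how Metafacture pipelines work, not Metafacture's API. The CSV columns `id` and `name` and the target document shape are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def extract(text: str) -> Iterator[Dict[str, str]]:
    """Read records one at a time from a CSV source."""
    yield from csv.DictReader(io.StringIO(text))


def transform(records: Iterable[Dict[str, str]]) -> Iterator[dict]:
    """Map each raw record to a JSON-LD-like document (hypothetical target shape)."""
    for record in records:
        yield {
            "@id": "http://example.org/organization/" + record["id"],
            "@type": "http://www.w3.org/ns/org#Organization",
            "name": record["name"].strip(),
        }


def load(docs: Iterable[dict], index: Dict[str, dict]) -> int:
    """Store each document under its URI (stand-in for indexing into a search engine)."""
    count = 0
    for doc in docs:
        index[doc["@id"]] = doc
        count += 1
    return count
```

Because each stage is a generator, `load(transform(extract(source)), index)` pulls one record at a time through the whole pipeline.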

Potential data sources

While exploring data sets that could be used to initially populate the data store of the prototype, we discovered the following:

The first work package of the work plan identifies, from the listed data sources, the initial data set used to populate the data store. The Serendipity data is openly available and covers a lot of information, so we are inclined to choose it as the initial data set for our prototype. Provided that we have the necessary resources left, we will also add one or more other data sets to demonstrate the prototype's potential to aggregate data from different sources.
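Aggregating several sources requires deciding when two records describe the same thing. One simple heuristic is to match records on a normalized homepage URL and keep track of which sources contributed, as in the sketch below. The field names (`homepage`, `name`, `country`) and source labels are hypothetical. The sketch assumes nothing about the actual structure of the Serendipity data.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Crude normalization: ignore scheme, lower-case host, drop 'www.' and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_sources(sources: dict) -> dict:
    """Merge records from several named sources, keyed by normalized homepage.

    Fields from earlier sources take precedence; later sources only fill gaps.
    """
    merged = {}
    for source_name, records in sources.items():
        for record in records:
            key = normalize_url(record["homepage"])
            entry = merged.setdefault(key, {"sources": []})
            for field, value in record.items():
                entry.setdefault(field, value)
            entry["sources"].append(source_name)
    return merged
```

URL matching is deliberately crude. Real deduplication would probably need additional signals such as names or locations.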

Supported use cases

Here are some possible use cases that would be covered by the described prototype:

  • Creating/editing descriptions of organizations, persons and projects.
  • Easy interlinking of projects, people and organizations through auto-complete functions for linking data.
  • Browsing the descriptions of different resources.
  • Querying the data and getting the results in different formats (JSON, XML etc.).
  • Getting a list of OER initiatives by country or language and loading it into a spreadsheet for post-processing.
  • Embedding the interactive OER World Map on other web sites via JavaScript.
  • Customizing the editorial workflow for adding and editing descriptions in various ways (as part of the Drupal CMS software).
  • Building additional ways of visualizing the data on top of the API, e.g. with d3.js.
  • Pulling in regular updates from different data sources.
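The spreadsheet use case could be served by a CSV export. The sketch below filters hypothetical initiative records by country code and writes the selected columns with Python's `csv` module. The field names are assumptions.

```python
import csv
import io
from typing import Iterable


def initiatives_to_csv(initiatives: Iterable[dict], country: str) -> str:
    """Filter initiatives by country code and serialize selected columns as CSV."""
    output = io.StringIO()
    writer = csv.DictWriter(
        output,
        fieldnames=["name", "country", "language"],
        extrasaction="ignore",  # skip fields not meant for the spreadsheet
    )
    writer.writeheader()
    for item in initiatives:
        if item.get("country") == country:
            writer.writerow(item)
    return output.getvalue()
```

Extra fields in a record (such as an internal `id`) are ignored, and missing fields are written as empty cells.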

Project management

Work plan & timeline

We worked out the following project plan in order to reach our goal of building an OER world map prototype.

| Work package | Time frame | Effort (person-days) | Assigned to | Comments |
|---|---|---|---|---|
| Identification of data sources | 03-04 February | 2 | Adrian Pohl | Selection of initial data source |
| Application profile development (AP) | 05-10 February | 4 | Adrian Pohl | Identification of core data elements & development of RDF application profile |
| Data transformation & indexing | 11-18 February | 6 | Pascal Christoph | Transform initial data according to AP; technology used: Metafacture, Elasticsearch |
| API development | 19 February - 18 March | 20 | Fabian Steeg | Build read/write web API; technology used: Play Framework, Elasticsearch |
| Web forms development | 05-18 March | 10 | Felix Ostrowski | Technology used: Drupal, reading and writing data from/to the API |
| Map interface development | 19 March - 01 April | 10 | Felix Ostrowski | Based on OpenStreetMap and data delivered by the API |
| Creating project report | 02-15 April | 10 | Jan Neumann | |
| Project management and communication overhead | 02 February - 15 April | 18 | Jan Neumann/All | We add roughly 30% on top of the work packages for project management and communication overhead |
| Sum | 02 February - 15 April | 80 | | |


The project timeline is visualized in the following Gantt diagram.

Budget

As the work plan indicates, a total of 80 person-days is needed to develop the prototype. The daily rate is assumed to be $500.

The hbz expects to gain valuable experience from developing the OER World Map that it can apply to its own future projects and products. It is therefore willing to subsidize the 60 person-days of its employees by 50% of the daily rate ($250 each), which leaves $15,000 to be funded. The daily rate for the 20 person-days of the external collaborator is $500 (including 19% VAT), adding up to $10,000. The total expenses to be covered thus come to $25,000.
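The budget figures can be checked directly against the work plan. The sketch below recomputes them from the person-days per work package and the stated rates. Its only assumption is that all days not assigned to Felix Ostrowski count as hbz employee days, which matches the 60/20 split given above.

```python
DAILY_RATE = 500                   # USD per person-day
SUBSIDISED_RATE = DAILY_RATE // 2  # hbz subsidizes its employees' days by 50% -> $250
EXTERNAL = "Felix Ostrowski"       # external collaborator (graphthinking)
OVERHEAD = "Project management and communication overhead"

# Work package -> (effort in person-days, assignee), taken from the work plan
PLAN = {
    "Identification of data sources": (2, "Adrian Pohl"),
    "Application profile development": (4, "Adrian Pohl"),
    "Data transformation & indexing": (6, "Pascal Christoph"),
    "API development": (20, "Fabian Steeg"),
    "Web forms development": (10, "Felix Ostrowski"),
    "Map interface development": (10, "Felix Ostrowski"),
    "Creating project report": (10, "Jan Neumann"),
    OVERHEAD: (18, "Jan Neumann/All"),
}

core_days = sum(days for name, (days, _) in PLAN.items() if name != OVERHEAD)
total_days = sum(days for days, _ in PLAN.values())
external_days = sum(days for days, who in PLAN.values() if who == EXTERNAL)
hbz_days = total_days - external_days

hbz_cost = hbz_days * SUBSIDISED_RATE
external_cost = external_days * DAILY_RATE

print(f"core: {core_days} days, overhead share: {PLAN[OVERHEAD][0] / core_days:.0%}")
print(f"total: {total_days} days (hbz: {hbz_days}, external: {external_days})")
print(f"to be funded: ${hbz_cost:,} + ${external_cost:,} = ${hbz_cost + external_cost:,}")
```

The 18 overhead days correspond to roughly 29% of the 62 core days. This is the plan's "additional 30%" (18.6 days) rounded down to whole days.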

No hardware or software costs will be incurred, as we will set up the prototype using open source software on existing hardware.

Potential partners

The hbz is an associated partner in the European project LinkedUp – Linking Web Data for Education. We will seek to communicate our progress to the LinkedUp network and receive its feedback, and may also collaborate with LinkedUp partners.

Endnotes

[1] Suber, Peter (2011): SPARC Open Access Newsletter, issue #153: Open Access in 2010. URL: http://legacy.earlham.edu/~peters/fos/newsletter/01-02-11.htm#2010
[2] Salo, Dorothea (2013): Linked Data in the Creases. URL: http://lj.libraryjournal.com/2013/12/opinion/peer-to-peer-review/linked-data-in-the-creases-peer-to-peer-review/
[3] Berners-Lee, Tim: Linked Data. Design Issues. URL: http://www.w3.org/DesignIssues/LinkedData.html. See also the overview of the 5-star linked open data scheme at http://5stardata.info/.
[4] Just recently, the W3C announced the W3C Data Activity - Building the Web of Data to further promote the growth of the web of data.
[5] D'Antoni, S. (2013): A world map of Open Educational Resources initiatives: Can the global OER community design and build it together? Summary report of an international conversation: 12–30 November 2012, p. 6. URL: https://oerknowledgecloud.org/sites/oerknowledgecloud.org/files/OER%20mapping%20discussion%20Summary%20Report%20Final.pdf
[6] Christoph, Pascal and Steeg, Fabian (2013): From strings to things: A Linked Open Data API for library hackers and web developers. Presentation slides: https://speakerdeck.com/lobid/from-strings-to-things-a-linked-open-data-api-for-library-hackers-and-web-developers. AV: http://www.scivee.tv/node/61578

10 Comments

    • Promoting the use of open licenses not only for OER but for the accompanying metadata to legally enable its aggregation and re-use.
    • Promoting the use of open web standards (especially linked open data) and the adoption of the LRMI (Learning Resource Metadata Initiative) metadata schema for describing OER.

    Doesn't LRMI rather belong to point 2, i.e. to the metadata, than to the web standards?

  1. This already looks really great. Respect! What about the data model? Do we need to make any statements about which metadata we want to collect and for what purpose?

    1. Related to this: which data will be displayed on the map, and how will it be visualized or implemented? The call for proposals comes from an end-user perspective, and I can't quite judge whether we would disappoint expectations by essentially leaving this point out...

  2. Another use case might be supporting systematic knowledge bases about the OER movement by linking projects with the OER literature that refers to them ("OER bibliography").

  3. The editorial workflow is still missing here. How does the community work with the tool? What should the form look like? Who can enter what, and how? What does the information flow look like, schematically, when projects are recorded or changed? What problems might we have to expect? How can quality control be done? Do we need user management for the system? What kind of permission control do we need within the registry? Can anyone overwrite all data, or is the author the "owner" of their entry? We certainly don't have to answer detailed questions here. Perhaps, though, it would be good to sketch the basic model and point out that answers to important questions will have to be found in the main project?

    1. Yes, the workflow should at least be mentioned. Feel free to also point out that, from a technical perspective, using Drupal brings flexible permission management with it.

  4. Perhaps we should include all links to important external content as footnotes. If something like a bibliography emerges, then in my opinion the reference to the 5-star model, for example, should not be missing.

    1. Today I took a look at the list of competitors, and http://www.nuams.com/ in particular caught my eye, since they are evidently very Drupal-oriented. It might make sense to emphasize more clearly our orientation toward the 5-star model, which includes our storage of data as RDF in the API. Besides Drupal, our architecture potentially allows any other frontends.

      1. Although that may already be clear enough, especially from the first graphic.

  5. We should also state clearly somewhere: "Our technologies are all well and good, but ultimately the success and completeness of such a database/map depend mainly on quite different things." This raises questions about incentive systems etc. In my view, this worked quite well for the LOD cloud, for example, where people added their data and kept it up to date. That might serve as an example.

    There are already at least three services that have tried something like this: