
Blog from May, 2013

We have recently updated the organisations data in lobid.org using the Culturegraph Metafacture software (see the morph-mapping). In order to represent even more data from the German ISIL registry, we also published some new controlled vocabularies in RDF. To make these vocabularies more appealing to reusers, we switched to a sustainable URI namespace at purl.org for the lobid vocabularies, which are maintained on GitHub.

The changes

So what did we actually add to the organisation descriptions?

  • Added information about the organisation type (using the libtype vocabulary).
  • Added information about the stock size and the type of funding organisation (using the newly published stocksize and fundertype vocabularies).
  • Added opening hours information.
  • Added subject headings.

Some of these changes were already present in the triple store data but weren't reflected in the lobid.org frontend. Now you can get all this data as HTML with RDFa via your web browser or, using content negotiation, in other RDF formats.
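Content negotiation means asking for a particular serialization through the HTTP Accept header. The following Python sketch only builds such a request with the standard library. The organisation URI is a hypothetical example, and the exact media types the server supports are not listed here, so treat `text/turtle` and `application/rdf+xml` as assumptions.

```python
import urllib.request

# Hypothetical example URI: substitute the URI of any organisation described on lobid.org.
ORG_URI = "http://lobid.org/organisation/DE-605"


def rdf_request(uri, media_type="text/turtle"):
    """Build a GET request that asks for a given RDF serialization via the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


if __name__ == "__main__":
    req = rdf_request(ORG_URI, "text/turtle")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
    # Sending the request requires network access:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.headers.get("Content-Type"))
    #     print(resp.read().decode("utf-8"))
```

Requesting a different format only changes the Accept value. The URI stays the same, which is the point of content negotiation.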

Monthly updates

We've automated the updating process so that from now on the organisations data will be updated on a monthly basis.

TODO

Suppose you want to make use of the new data by querying subject headings, say: "Give me all institutions that have 'Karten' (German for 'maps') as a subject heading". In SPARQL this translates as follows:

  curl -L -H "Accept: application/sparql-results+xml" --data-urlencode 'query=
  SELECT ?s
  WHERE {
    GRAPH <http://lobid.org/organisation/> {
      ?s <http://purl.org/dc/elements/1.1/subject> ?o
      FILTER regex(?o, "^Karten")
    }
  }
  LIMIT 70000' http://lobid.org/sparql/
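If you prefer to script the request, here is a rough Python equivalent of the curl call above, using only the standard library. Like curl's `--data-urlencode`, it sends the query form-encoded in a POST body, and it asks for SPARQL XML results. The function name `sparql_post` is made up for this sketch, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://lobid.org/sparql/"

QUERY = """
SELECT ?s
WHERE {
  GRAPH <http://lobid.org/organisation/> {
    ?s <http://purl.org/dc/elements/1.1/subject> ?o
    FILTER regex(?o, "^Karten")
  }
}
LIMIT 70000
"""


def sparql_post(endpoint, query, accept="application/sparql-results+xml"):
    """Build (but do not send) a SPARQL query request with a form-encoded POST body."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": accept,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = sparql_post(ENDPOINT, QUERY)
    print(req.get_method(), req.full_url)
    # To actually run the query (network access required):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Sending it with `urllib.request.urlopen` hits the same endpoint, so expect it to be just as slow as the curl version.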

You will be disappointed: this simple query, run against a small dataset of only 350k triples, took 10 minutes the first time (without cache) and does not return the full result because of a "hit of complexity". The problem is not SPARQL per se but that you are dealing with literals, which triple stores are, understandably, not optimized for. This leads directly to another desideratum:
Subject headings should not be literals but URIs. That is already the case in the lobid data describing bibliographic resources, but not in the organisation descriptions.
URIs as subject headings have other positive side effects. Using, e.g., the Dewey Decimal Classification, you get direct access to translations of each class into many languages. You have a hierarchy of classes and, most importantly, an unambiguous identifier from a controlled vocabulary rather than a plain word that could have different meanings in different contexts.
Thus, the transformation of these literals into URIs is a TODO.
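The following Python sketch illustrates both the ambiguity problem and why URIs make lookups cheap. All of its data is invented: the organisations, the headings and the example.org concept URIs are hypothetical. The mapping table also stands in for the real work of reconciling headings against a controlled vocabulary. Note how the prefix regex `^Karten` also catches "Kartenspiele" (card games), while the URI lookup is an exact key match.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (organisation, literal subject heading) pairs.
LITERAL_SUBJECTS = [
    ("org:A", "Karten"),
    ("org:B", "Kartenspiele"),
    ("org:C", "Karten"),
    ("org:C", "Musik"),
]

# Hypothetical mapping from literal headings to concept URIs in a controlled vocabulary.
HEADING_TO_URI = {
    "Karten": "http://example.org/concept/maps",
    "Kartenspiele": "http://example.org/concept/card-games",
    "Musik": "http://example.org/concept/music",
}


def link_headings(pairs, mapping):
    """Replace literal headings by URIs; return unmappable pairs separately for review."""
    linked, unmapped = [], []
    for org, heading in pairs:
        uri = mapping.get(heading)
        if uri is None:
            unmapped.append((org, heading))
        else:
            linked.append((org, uri))
    return linked, unmapped


def index_by_uri(linked):
    """Group organisations by subject URI so a lookup is an exact key match."""
    index = defaultdict(set)
    for org, uri in linked:
        index[uri].add(org)
    return index


if __name__ == "__main__":
    # Literal approach: a prefix regex, like FILTER regex(?o, "^Karten").
    regex_hits = sorted({org for org, h in LITERAL_SUBJECTS if re.match("Karten", h)})
    print(regex_hits)  # ['org:A', 'org:B', 'org:C'] (org:B only has card games)

    linked, unmapped = link_headings(LITERAL_SUBJECTS, HEADING_TO_URI)
    uri_hits = sorted(index_by_uri(linked)["http://example.org/concept/maps"])
    print(uri_hits)  # ['org:A', 'org:C']
```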

Of course, an API based on a search engine would also be fast and would bring some extra benefits, e.g. auto-suggestions. We are working on that!
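A sorted list and binary search can show how an index answers prefix queries such as auto-suggestions without scanning every literal. This is a toy sketch using Python's `bisect` module, not a description of the planned lobid API. Real search engines use more elaborate index structures.

```python
import bisect


def build_suggester(headings):
    """Sort the distinct headings once; prefix lookups then become binary searches."""
    return sorted(set(headings))


def suggest(sorted_terms, prefix, limit=10):
    """Return up to `limit` terms starting with `prefix`.

    All terms sharing a prefix are contiguous in sorted order, beginning at the
    insertion point of the prefix itself, so no full scan is needed.
    """
    results = []
    i = bisect.bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and len(results) < limit:
        term = sorted_terms[i]
        if not term.startswith(prefix):
            break
        results.append(term)
        i += 1
    return results


if __name__ == "__main__":
    terms = build_suggester(["Karten", "Musik", "Kartographie", "Kunst", "Kartenspiele", "Karten"])
    print(suggest(terms, "Kart"))  # ['Karten', 'Kartenspiele', 'Kartographie']
    print(suggest(terms, "Kart", limit=2))  # ['Karten', 'Kartenspiele']
```

Sorting is paid once when the index is built. Each lookup then costs a binary search plus the number of hits, which is what makes typing-speed suggestions feasible.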

We do a lot of work with the German authority file GND (Gemeinsame Normdatei), specifically its RDF version. The underlying GND ontology was published last year, and I must admit I hadn't yet taken the time to get an overview of it. Today I started by getting to know the hierarchical structure of the ontology's classes. As I didn't find an overview on the web, I created my own and am publishing it here.

First, I took the current GND ontology Turtle file from the GND namespace http://d-nb.info/standards/elementset/gnd#, extracted only the classes and their rdfs:subClassOf relations as well as the few owl:equivalentClass relations, converted the result to dot format using rapper and created the following graph image from it:

(Link to high resolution image)
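The filtering step can be approximated in Python, assuming the Turtle file has first been converted to N-Triples (for instance with `rapper -i turtle -o ntriples`). This sketch is not the exact procedure used for the image. It keeps only triples whose predicate is rdfs:subClassOf or owl:equivalentClass and whose subject and object are both IRIs, so blank-node restrictions are skipped. It also writes Graphviz DOT directly instead of letting rapper serialize it.

```python
import re

GND = "http://d-nb.info/standards/elementset/gnd#"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"

# An N-Triples line whose subject, predicate and object are all IRIs.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def shorten(iri):
    """Abbreviate GND IRIs to gnd:Name; leave other IRIs untouched."""
    return "gnd:" + iri[len(GND):] if iri.startswith(GND) else iri


def class_edges(ntriples_lines):
    """Keep only subClassOf / equivalentClass triples between named classes."""
    edges = []
    for line in ntriples_lines:
        match = IRI_TRIPLE.match(line.strip())
        if not match:
            continue  # literals, blank nodes, comments, empty lines
        s, p, o = match.groups()
        if p == SUBCLASS_OF:
            edges.append((shorten(s), "subClassOf", shorten(o)))
        elif p == EQUIVALENT_CLASS:
            edges.append((shorten(s), "equivalentClass", shorten(o)))
    return edges


def to_dot(edges):
    """Serialize the edges as a Graphviz digraph; equivalences are drawn dashed."""
    out = ["digraph gnd {", "  rankdir=BT;"]
    for s, label, o in edges:
        style = ", style=dashed" if label == "equivalentClass" else ""
        out.append(f'  "{s}" -> "{o}" [label="{label}"{style}];')
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    demo = [
        f"<{GND}Person> <{SUBCLASS_OF}> <{GND}AuthorityResource> .",
        f"<{GND}Work> <{SUBCLASS_OF}> <{GND}AuthorityResource> .",
    ]
    print(to_dot(class_edges(demo)))
```

Saved to a file, the output can be rendered with Graphviz, e.g. `dot -Tpng gnd.dot -o gnd.png`.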

As one can see in the image, the GND ontology's class hierarchy extends over three levels (except for gnd:Person, which adds another level to distinguish between differentiated and undifferentiated persons, i.e. person names). The top class is gnd:AuthorityResource, which has seven subclasses, most of which have several subclasses themselves. Additionally, there is the separate class gnd:NameOfThePerson, which isn't linked (with rdfs:subClassOf) to any other GND class at all. Here is a bulleted list providing another view of the GND ontology's class hierarchy. (For better readability I put spaces into the GND class names.)

  • Name of the Person
  • Authority Resource
    • Corporate Body
      • Fictive Corporate Body
      • Organ of Corporate Body
      • Project or Program
    • Conference or Event
      • Series of Conference or Event
    • Subject Heading
      • Subject Heading Senso Stricto
      • Characters or Morphemes
      • Ethnographic Name
      • Fictive Term
      • Group of Persons
      • Historic Single Event or Era
      • Language
      • Means of Transport with Individual Name
      • Nomenclature in Biology or Chemistry
      • Product Name or Brand Name
      • Software Product
    • Work
      • Manuscript
      • Musical Work
      • Provenance Characteristic
      • Version of a Musical Work
      • Collection
      • Collective Manuscript
    • Place or Geographic Name
      • Fictive Place
      • Member State
      • Name of Small Geographic Unit Lying within another Geographic Unit
      • Natural Geographic Unit
      • Religious Territory
      • Territorial Corporate Body or Administrative Unit
      • Administrative Unit
      • Way, Border or Line
      • Building or Memorial
      • Country
      • Extraterrestrial Territory
    • Person
      • Differentiated Person
        • Collective Pseudonym
        • Gods
        • Literary or Legendary Character
        • Pseudonym
        • Royal or Member of a Royal House
        • Spirits
      • Undifferentiated Person
    • Family
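An indented list like the one above can be derived mechanically from (child, parent) rdfs:subClassOf pairs. The Python sketch below assumes the pairs have already been extracted with short class names and that the hierarchy has no cycles. Classes without any rdfs:subClassOf link, such as gnd:NameOfThePerson, never appear in such pairs and would have to be added by hand. The sample edges mirror a few entries of the list.

```python
from collections import defaultdict


def children_of(edges):
    """Map each parent class to its direct subclasses, keeping input order."""
    kids = defaultdict(list)
    for child, parent in edges:
        kids[parent].append(child)
    return kids


def roots(edges):
    """Classes that occur as a parent but never as a child."""
    children = {child for child, _ in edges}
    return sorted({parent for _, parent in edges} - children)


def outline(edges):
    """Render the hierarchy as an indented bullet list (two spaces per level)."""
    kids = children_of(edges)
    lines = []

    def walk(node, depth):
        lines.append("  " * (depth + 1) + "\u2022 " + node)
        for child in kids.get(node, []):
            walk(child, depth + 1)

    for root in roots(edges):
        walk(root, 0)
    return lines


def levels(edges):
    """Number of classes on the longest path from a root down to a leaf."""
    kids = children_of(edges)

    def height(node):
        return 1 + max((height(c) for c in kids.get(node, [])), default=0)

    return max(height(r) for r in roots(edges))


if __name__ == "__main__":
    sample = [
        ("Person", "AuthorityResource"),
        ("DifferentiatedPerson", "Person"),
        ("Pseudonym", "DifferentiatedPerson"),
        ("Work", "AuthorityResource"),
        ("Manuscript", "Work"),
    ]
    print("\n".join(outline(sample)))
    print(levels(sample))  # 4: AuthorityResource > Person > DifferentiatedPerson > Pseudonym
```

Dropping the Person branch from the sample leaves three levels, which matches the observation that only gnd:Person goes one level deeper.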