This page builds upon Converting the Open Data from the hbz to BIBO. We explain the mapping process from hbz MAB2 to our lobid.org data model, which uses BIBO, Dublin Core and other ontologies.

We took into account the considerations made in RDF representation of series and multi volumes.

Vocabularies

See the current list of vocabularies.

There are two namespaces in addition to the ones mentioned in Converting the Open Data from the hbz to BIBO:

@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .
@prefix isbd: <http://iflastandards.info/ns/isbd/elements/> .

We also now use the prefix dcterms instead of dc for the namespace <http://purl.org/dc/terms/>.

Mapping of fields

Note: for convenience, when we speak of a "field" we give only the field ID; for example, 037b_a stands for <rdfmab:field/037b_a>.

We have mapped the fields from the record-centric RDF/ISO2709 format to a resource-centric BIBO description as follows. Note that the original field names used below may contain the wildcard for a single character (.) and the quantifier (?), as used in regular expressions.
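As a minimal sketch of how these patterns can be read, the following Python snippet treats a field name from this page as a regular expression and tests it against concrete field IDs. The sample IDs are made up for illustration, the helper name field_matches is hypothetical, and it is an assumption that a pattern has to match the whole field ID.

```python
import re

def field_matches(pattern: str, field_id: str) -> bool:
    """Return True if the whole field ID matches the field pattern."""
    return re.fullmatch(pattern, field_id) is not None

# "331..?a": "331", one arbitrary character, an optional second one, then "a".
print(field_matches("331..?a", "331_1a"))   # two characters between "331" and "a"
print(field_matches("331..?a", "331_a"))    # one character between "331" and "a"
print(field_matches("331..?a", "331_12a"))  # three characters, too many
```

Under these assumptions the first two IDs match and the third does not, because the pattern allows at most two characters between the field number and the subfield code.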

Resource-URI

The URI of the resource that is to be described is derived from the identifier of the record, found in 001_.?.a. We decided to also mint URIs for ZDB-IDs. These ZDB-IDs come mainly from 026_.?a (the value has to start with "ZDB", as the field holds other values as well). Note that some ZDB-IDs for lobid.org were generated via post-processing and are not contained in the original open data dump. Read more at HTTP URIs with ZDB-IDs.
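The URI minting can be sketched roughly as follows. This is an illustration, not the actual lobid.org code: the function names and sample identifiers are hypothetical, and it is assumed that the identifier is appended unchanged to the base http://lobid.org/resource/ that the owl:sameAs section of this page names.

```python
from typing import Optional

BASE = "http://lobid.org/resource/"

def mint_uri(record_id: str) -> str:
    """Build a resource URI from a record identifier (assumed to be used as-is)."""
    return BASE + record_id.strip()

def mint_zdb_uri(value: str) -> Optional[str]:
    """Build a URI only for values that start with "ZDB"; skip all other values."""
    value = value.strip()
    if not value.startswith("ZDB"):
        return None
    return BASE + value

print(mint_uri("HT000000000"))     # hypothetical record ID
print(mint_zdb_uri("ZDB0000000"))  # hypothetical ZDB value
print(mint_zdb_uri("DNB0000000"))  # not a ZDB value, so no URI is minted
```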

dcterms:title

The title of the resource, combining main title and other title information. There will be only one dcterms:title. Sequentially the following fields will be used (if they are available): 310..?a, 331..?a, 333_.?a and these as well as these fields. (See also isbd:P1004 and isbd:P1006 below.)

dcterms:language

The language of the resource, found in 037b.a.

dcterms:issued

The year the resource was issued, sanity checked using <rdfmab:field/425..a>.

dcterms:subject

Subject-Links. These are derived from several fields:

  • 9.._.?9 fields contain identifiers from the subject authority file of the German National Library (DNB), which have been available as Linked Data since April 2010.
  • 700b.a fields contain DDC notations. In order to link to the Linked Data version of the classification, these numbers are truncated to the first three levels. If the full classification were available, we would be very happy to link to deeper levels.
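The DDC truncation can be illustrated with a short Python sketch. It assumes that "the first three levels" means the three leading digits of the notation. The function name truncate_ddc and the sample notations are hypothetical, and the target URI scheme of the linked classification is deliberately left out.

```python
import re
from typing import Optional

def truncate_ddc(notation: str) -> Optional[str]:
    """Return the three leading digits of a DDC notation, or None if there are fewer."""
    match = re.match(r"\d{3}", notation.strip())
    return match.group(0) if match else None

print(truncate_ddc("025.04"))  # keeps only the part before the decimal point
print(truncate_ddc("943"))     # already at the third level
print(truncate_ddc("B 12"))    # not a usable notation under this assumption
```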

bibo:issn

The ISSN of the resource, found in 542..?a. The ISSN is deliberately provided as a string, not a URI, since it is the string that is the identifier, not some resource identified by <uri:ISSN:ISSN>. This conforms to the range defined in the BIBO.

dcterms:extent

The extent of the resource, usually the number of pages, as found in 43[3457].?.?.?.

rdf:type

The type of a resource is derived from several fields, thus possibly resulting in multiple types for the same resource. The current mapping has become a whole lot more elaborate since the start of lobid.org, but it will be subject to further analysis. Have a look at the actual mapping fields.

bibo:volume

The volume number of the resource: the sortable form is found in 090_..?, and the descriptive form in field 089_?.?a is used.

dcterms:isPartOf

Fortunately, the original data already includes many links from subordinate to superordinate records which can be used to link the corresponding resources:

  • 010_?1?a? contains the record-id of a direct superordinate
  • 453.?.?.? contains the record-id of the first series title
  • 599..?a contains the record-id of the record describing the journal that this resource is published in.

dcterms:creator

1...19 fields contain authority numbers of the authors of the resource. We decided to no longer use bibo:authorlist because of the simplicity of dcterms:creator (no need for blank nodes); it can thus be handled well by generic Linked Data displays such as pubby. Note that there are basically two types of authority numbers in the data: those maintained by the DNB (which are available as Linked Data) and local hbz numbers, which are not available as Linked Data. In the first case, the resulting link leads to the Linked Data Service of the DNB; in the latter case the link unfortunately leads nowhere. Second note: sometime around the end of the year the hbz-PND will be merged into the dnb-GND.

dcterms:publisher

There are a lot of fields; look at the mapping: publisher_name is the name of the publisher and publisher_place the place of the publisher. To conform to the range of the dcterms:publisher predicate as defined in the DCMI Metadata Terms, we have introduced blank nodes for the publishers, typed as foaf:Organisation. The place of the publisher is attached as another blank node via geo:location. That blank node is typed geo:SpatialThing and has the name of the place attached by geonames:name, since we lack a mapping of the place names to geonames identifiers. We are aware that this seems overly complicated, but we are trying to identify and properly model the entities that are referenced in the original data, even if that results in blank nodes in the first run. As soon as an authority file for publishers is available, we will try to link there. We might even have a look at the resulting blank nodes and see if the information is clean enough to form the basis of such a file.
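To make the blank-node structure easier to picture, here is a minimal Python sketch that builds the triples for one publisher and prints them in a Turtle-like form with prefixed names. It illustrates the modelling described above rather than the actual transformation code. The function names, blank node labels and sample values are hypothetical, and the use of foaf:name for the publisher name is an assumption, since this page does not say which predicate carries the name.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

def literal(value: str) -> str:
    """Quote a string as a literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def publisher_triples(resource: str, name: str, place: str, n: int = 0) -> List[Triple]:
    """Model one publisher and its place as two linked blank nodes."""
    pub = f"_:publisher{n}"  # blank node for the publisher
    loc = f"_:place{n}"      # blank node for the place of the publisher
    return [
        (resource, "dcterms:publisher", pub),
        (pub, "rdf:type", "foaf:Organisation"),
        (pub, "foaf:name", literal(name)),  # assumption: predicate not given on this page
        (pub, "geo:location", loc),
        (loc, "rdf:type", "geo:SpatialThing"),
        (loc, "geonames:name", literal(place)),
    ]

# Hypothetical resource URI and publisher data.
for s, p, o in publisher_triples("<http://lobid.org/resource/HT000000000>", "Example Verlag", "Köln"):
    print(f"{s} {p} {o} .")
```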

frbr:exemplar

In the current state of the raw data, holding information is only implicitly available. Since the records are segmented into packages by institution, we know that an institution is the frbr:owner of at least one frbr:Item of the described frbr:Manifestation. Since we currently do not have shelf mark information, those items are once again modelled as blank nodes.

The following predicates are totally new:

dcterms:format and dcterms:medium

Fields 050 and 652_a. The MAB2 values are mapped according to the mapping (look under format). There is still work to do here: for now we do not have a (known) controlled vocabulary. Also, at the moment the values of both properties are a bit mixed up.

owl:sameAs

For now there are some owl:sameAs links if the fields 026_.?a exist. If the value is a ZDB-ID, a second owl:sameAs is created with this ID appended to http://lobid.org/resource/. Have a look at HTTP URIs with ZDB-IDs.

wdrs:describedby

A URL to the local hbz-OPAC view is generated using the field 001_.?a.

dcterms:source

There are a lot of fields; look at the mapping under source. Mostly there will be just literals, so we cannot provide dcterms:source or something similar.

isbd:P1004

Main title of the resource. It is used in parallel to dcterms:title (which combines main title and other title information). Sequentially the following fields will be used (if they are available): 310..?a, 331..?a, 333_.?a and these.

isbd:P1006

The subtitle or any other remainder of the title of the resource. There can be many dcterms:alternative, coming from these fields.

bibo:isbn10 and bibo:isbn13

The ISBN 10 and ISBN 13 of the resource, found in 540...?. The formerly used bibo:isbn has been given up in favour of these more specific predicates. The ISBN is deliberately provided as a string, not a URI, since it is the string that is the identifier, not some resource identified by <uri:ISBN:ISBN>. This conforms to the range defined in BIBO.
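One simple way to decide between the two predicates is by the length of the normalised value, as sketched below in Python. This is an assumption about how the split could be made, not a description of the actual mapping. The function name classify_isbn is hypothetical, hyphens and spaces are stripped for the comparison, and no checksum validation is attempted.

```python
from typing import Optional, Tuple

def classify_isbn(raw: str) -> Optional[Tuple[str, str]]:
    """Return (predicate, normalised ISBN string), or None if the value fits neither form."""
    value = raw.replace("-", "").replace(" ", "").upper()
    # ISBN-10: nine digits followed by a digit or the check character X.
    if len(value) == 10 and value[:9].isdigit() and (value[9].isdigit() or value[9] == "X"):
        return ("bibo:isbn10", value)
    # ISBN-13: thirteen digits.
    if len(value) == 13 and value.isdigit():
        return ("bibo:isbn13", value)
    return None

print(classify_isbn("3-16-148410-X"))
print(classify_isbn("978-3-16-148410-0"))
print(classify_isbn("12345"))
```

The value stays a plain string in both cases, in line with the decision above not to mint URIs for ISBNs.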

dcterms:abstract

This is the property we use in the rare but happy cases where the fields of description_abstract (look at the mapping) exist.

bibo:doi

If field 552b.a exists, its value is mapped to bibo:doi.

bibo:oclcnum

This is the field 25o[12]a.

bibo:edition

These are the fields 400 1a, 403 1[an], 510 1a.

dcterms:source

This is the field 021 1a. The ID points to the internal ID of the original source from which the resource is derived. From this ID a lobid resource link is assembled.

dcterms:hasFormat

All resources that are linked to through dcterms:source will be enhanced with this predicate, linking to the other resource so that there is a reciprocal relation. As this information is sadly missing in the underlying datasets, this triple will be produced after the complete data transformation (sigh); this is not yet implemented.
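Such a post-processing pass could look roughly like the following Python sketch. It assumes that the transformed data is available as (subject, predicate, object) tuples; the function name reciprocal_has_format and the sample URIs are hypothetical, and this is only one possible way to derive the reciprocal triples.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

def reciprocal_has_format(triples: Iterable[Triple]) -> List[Triple]:
    """For every 'A dcterms:source B' triple, produce 'B dcterms:hasFormat A'."""
    return [
        (obj, "dcterms:hasFormat", subj)
        for subj, pred, obj in triples
        if pred == "dcterms:source"
    ]

# Hypothetical input: a derived resource pointing to its original source.
data = [
    ("<http://lobid.org/resource/HT000000002>", "dcterms:source", "<http://lobid.org/resource/HT000000001>"),
    ("<http://lobid.org/resource/HT000000002>", "dcterms:title", '"Example"'),
]
for triple in reciprocal_has_format(data):
    print(triple)
```

Only the dcterms:source triple yields a new statement; all other triples are ignored by this pass.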

dcterms:hasPart

This is the field 529z*9a. It is a link to a resource supplement.

The resulting model 

2 Comments

  1. I will simply note here what I notice in some of the records:

    More to come...

    1. Regarding dcterms:alternative: should we ignore the "Gesamttitel" (collective title) statements (MAB 451-496) entirely, since the records are linked via dct:isPartOf anyway?
      Regarding the superordinate record: I only get what can be found under wdrs:described (click on "Feldnamen" there or, for the MAB view, on "Feldnummern"). Since the superordinate record "Philosophie der symbolischen Formen" has no subordinate records attached to it, this knowledge exists only implicitly (because the subordinate records are in fact connected to the superordinate record). This means that the implicit knowledge would have to be attached explicitly to the superordinate record through external post-processing. That is possible in principle, of course; it is just a bit more laborious.