LOD Mapping 201107

This page builds upon "Converting the Open Data from the hbz to BIBO". We explain the mapping process from hbz MAB2 to our lobid.org data model, which uses BIBO, Dublin Core and other ontologies.

We took into account the considerations made in "RDF representation of series and multi volumes".

Vocabularies

See the current list of vocabularies.

There are two namespaces in addition to the ones mentioned in "Converting the Open Data from the hbz to BIBO":

@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .
@prefix isbd: <http://iflastandards.info/ns/isbd/elements/> .

We now also use the prefix dcterms instead of dc for the namespace <http://purl.org/dc/terms/>.
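
As a minimal sketch of how these prefixes resolve, the following Python snippet expands a prefixed name into a full URI. The `PREFIXES` table and the `expand` helper are illustrative names and not part of the lobid.org tooling; only the three namespaces stated on this page are included.

```python
# Namespaces copied from the declarations above; the helper is hypothetical.
PREFIXES = {
    "wdrs": "http://www.w3.org/2007/05/powder-s#",
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("dcterms:title"))     # http://purl.org/dc/terms/title
print(expand("wdrs:describedby"))  # http://www.w3.org/2007/05/powder-s#describedby
```

Prefixes that are not declared on this page (for example the old `dc`) raise a `KeyError` in this sketch rather than being guessed.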

Mapping of fields

Note: for convenience, when we speak of a "field" we give only the field ID; for example, 037b_a stands for <rdfmab:field/037b_a>.

We have mapped the fields from the record-centric RDF/ISO2709 format to a resource-centric BIBO description as follows. Note that the original field names used below may contain regular-expression syntax, such as the wildcard for a single character (`.`) and the quantifier (`?`).
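
The wildcard notation can be read as ordinary regular expressions. The following Python sketch checks field IDs against such patterns; it assumes that a pattern has to match the whole field ID, and `field_matches` is a hypothetical helper name. Apart from `037b_a`, the field IDs are made-up examples.

```python
import re


def field_matches(pattern: str, field_id: str) -> bool:
    """Return True if the whole field ID matches the mapping pattern."""
    return re.fullmatch(pattern, field_id) is not None


# '.' stands for exactly one character, '.?' for an optional one.
print(field_matches("037b.a", "037b_a"))   # True
print(field_matches("310..?a", "310_a"))   # True
print(field_matches("310..?a", "310_1a"))  # True
print(field_matches("310..?a", "331_a"))   # False
```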

Resource-URI	The URI of the resource to be described is derived from the identifier of the record, found in `001_.?.a`. We decided to also mint URIs for ZDB-IDs. These ZDB-IDs come mainly from `026_.?a` (the value has to start with "ZDB"; the field also holds other values). Note that some ZDB-IDs for lobid.org were generated via post-processing and are not contained in the original open data dump. Read more at "HTTP URIs with ZDB-IDs".
dcterms:title	The title of the resource, combining main title and other title information. There will be only one dcterms:title. Sequentially the following fields will be used (if they are available): `310..?a`, `331..?a`, `333_.?a` and these as well as these fields. (See also isbd:P1004 and isbd:P1006 below.)
dcterms:language	The language of the resource, found in `037b.a`.
dcterms:issued	The year the resource was issued, sanity checked using `<rdfmab:field/425..a>`.
dcterms:subject	Subject links. These are derived from several fields: `9.._.?9` fields contain identifiers from the subject authority file of the German National Library (DNB), which have been available as Linked Data since April 2010. `700b.a` fields contain DDC notations. In order to link to the Linked Data version of the classification, these numbers are truncated to the first three levels. If the full classification were available, we would be very happy to link to deeper levels.
bibo:issn	The ISSN of the resource, found in `542..?a`. The ISSN is deliberately provided as a string, not a URI, since it is the string that is the identifier, not some resource identified by <uri:ISSN:ISSN>. This conforms to the range defined in the BIBO.
dcterms:extent	The extent of the resource, usually the number of pages, as found in `43[3457].?.?.?`.
rdf:type	The type of a resource is derived from several fields, thus possibly resulting in multiple types for the same resource. The current mapping has become a whole lot more elaborate since the start of lobid.org but will be subject to further analysis. Have a look at the actual mapping fields. Most resources are typed as `frbr:Manifestation`. Note that series and journals are not manifestations; see also "RDF-Repräsentation von Serien und mehrbaendigen Werke" (RDF representation of series and multi-volume works).
bibo:volume	The volume number of the resource. Field `090_..?` holds the sortable form; the descriptive form in field `089_?.?a` is the one that is used.
dcterms:isPartOf	Fortunately, the original data already includes many links from subordinate to superordinate records, which can be used to link the corresponding resources: `010_?1?a?` contains the record ID of a direct superordinate; `453.?.?.?` contains the record ID of the first series title; `599..?a` contains the record ID of the record describing the journal that this resource is published in.
dcterms:creator	`1...19` fields contain authority numbers of the authors of the resource. We decided to no longer use bibo:authorlist because of the simplicity of dcterms:creator (no need for blank nodes); it can thus be handled ideally by generic Linked Data displays such as pubby. Note that there are basically two types of authority numbers in the data: those maintained by the DNB (which are available as Linked Data) and local hbz numbers, which are not available as Linked Data. In the first case, the resulting link leads to the Linked Data Service of the DNB; in the latter case the link unfortunately leads nowhere. Second note: towards the end of the year the hbz-PND will be merged into the DNB-GND.
dcterms:publisher	There are a lot of fields, look at the mapping: `publisher_name` is the name of the publisher and `publisher_place` the place of the publisher. To conform to the range of the `dcterms:publisher` predicate as defined in the DCMI Metadata Terms, we have introduced blank nodes for the publishers, typed as `foaf:Organisation`. The place of the publisher is attached as another blank node via `geo:location`. That blank node is typed `geo:SpatialThing` and has the name of the place attached by `geonames:name`, since we lack a mapping of the place names to geonames identifiers. We are aware that this seems overly complicated, but we are trying to identify and properly model the entities that are referenced in the original data, even if that results in blank nodes in the first run. As soon as an authority file for publishers is available, we will try to link there. We might even have a look at the resulting blank nodes and see if the information is clean enough to form the basis of such a file.
frbr:exemplar	In the current state of the raw data, holding information is only implicitly available. Since the records are segmented into packages by institution, we know that an institution is the `frbr:owner` of at least one `frbr:Item` of the described `frbr:Manifestation`. Since we currently do not have signature information (shelf marks), those items are once again modelled as blank nodes.
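
Two of the rules above, the DDC truncation for dcterms:subject and the blank-node structure for dcterms:publisher, can be sketched in Python as follows. This is an illustration under assumptions, not the actual transformation code: "first three levels" is read here as the leading three digits of the notation, triples are plain tuples of prefixed names, and all helper names are hypothetical. The predicate that carries the publisher name is not stated on this page, so that triple is left out.

```python
import re
from itertools import count
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
_bnode_counter = count(1)  # hypothetical blank-node label generator


def new_bnode() -> str:
    """Return a fresh blank-node label such as '_:b1'."""
    return f"_:b{next(_bnode_counter)}"


def truncate_ddc(notation: str) -> Optional[str]:
    """Keep only the leading three digits of a DDC notation (assumption)."""
    match = re.match(r"\d{3}", notation.strip())
    return match.group(0) if match else None


def publisher_triples(resource: str, place: str) -> List[Triple]:
    """Build the blank-node structure described for dcterms:publisher."""
    publisher, location = new_bnode(), new_bnode()
    return [
        (resource, "dcterms:publisher", publisher),
        (publisher, "rdf:type", "foaf:Organisation"),
        (publisher, "geo:location", location),
        (location, "rdf:type", "geo:SpatialThing"),
        (location, "geonames:name", place),
    ]


print(truncate_ddc("025.04"))  # 025
# '_:resource' and the place name are made-up example values.
for triple in publisher_triples("_:resource", "Köln"):
    print(triple)
```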

The following predicates are totally new:

dcterms:format and dcterms:medium	Taken from `050` and `652_a`. The MAB2 values are mapped according to the mapping; look under `format`. There is still work to do here: for now we do not have a (known) controlled vocabulary, and at the moment the values of both properties are a bit mixed up.
owl:sameAs	For now there are some `owl:sameAs` links if the fields `026_.?a` exist. If the value is a ZDB-ID, a second `owl:sameAs` is created with this ID appended to http://lobid.org/resource/ . Have a look at "HTTP URIs with ZDB-IDs".
wdrs:describedby	A URL to the local hbz-OPAC view is generated using the field `001_.?a`.
dcterms:source	There are a lot of fields, look at the mapping under `source`. Mostly there will be just literals so we cannot provide `dcterms:source` or something similar.
isbd:P1004	Main title of the resource. It is used in parallel to dcterms:title (which combines main title and other title information). Sequentially the following fields will be used (if they are available): `310..?a`, `331..?a`, `333_.?a` and these.
isbd:P1006	The subtitle or any other remainder of the title of the resource. There can be many dcterms:alternative, coming from these fields.
bibo:isbn10 and bibo:isbn13	The ISBN-10 and ISBN-13 of the resource, found in `540...?`. The formerly used `bibo:isbn` has been given up in favour of these more specific predicates. The ISBN is deliberately provided as a string, not a URI, since it is the string that is the identifier, not some resource identified by <uri:ISBN:ISBN>. This conforms to the range defined in the BIBO.
dcterms:abstract	That's the property we use in the rare but happy cases in which the fields of `description_abstract` (look at the mapping) exist.
bibo:doi	If field `552b.a` exists, the value is mapped to `bibo:doi`.
bibo:oclcnum	That's the field `25o[12]a`.
bibo:edition	That's the fields `400 1a`, `403 1[an]`, `510 1a`.
dcterms:source	That's the field `021 1a`. The ID points to the internal ID of the original source from which the resource is derived. From this ID a lobid resource link is assembled.
dcterms:hasFormat	All resources that are linked to through dcterms:source will be enhanced with this predicate, linking back to the other resource so that there is a reciprocal relation. As this information is sadly missing in the underlying datasets, this triple will be produced after the complete data transformation (sigh) (not yet implemented).
dcterms:hasPart	That's the field `529z*9a`. It's a link to a resource supplement.
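
The ISBN split and the ZDB-based `owl:sameAs` link can be sketched in Python as follows. This is a minimal sketch under assumptions: the choice between `bibo:isbn10` and `bibo:isbn13` is made by length after stripping hyphens and spaces, and the ZDB value is appended unchanged to the base URI. Both function names are hypothetical, and the ZDB value in the example is made up.

```python
from typing import Optional, Tuple

LOBID_RESOURCE_BASE = "http://lobid.org/resource/"


def isbn_predicate(raw: str) -> Optional[Tuple[str, str]]:
    """Pick bibo:isbn10 or bibo:isbn13 by length (assumed rule)."""
    isbn = raw.replace("-", "").replace(" ", "")
    if len(isbn) == 10:
        return ("bibo:isbn10", isbn)
    if len(isbn) == 13:
        return ("bibo:isbn13", isbn)
    return None  # any other length is left unmapped in this sketch


def zdb_same_as(value: str) -> Optional[str]:
    """Return the owl:sameAs target if the value is a ZDB-ID."""
    if not value.startswith("ZDB"):
        return None  # the field also holds non-ZDB values
    return LOBID_RESOURCE_BASE + value


print(isbn_predicate("3-16-148410-0"))      # ('bibo:isbn10', '3161484100')
print(isbn_predicate("978-3-16-148410-0"))  # ('bibo:isbn13', '9783161484100')
print(zdb_same_as("ZDB1234"))               # http://lobid.org/resource/ZDB1234
```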
