All sources on coreferences in the Semantic Web

The resolving and lookup service of culturegraph.org aims to counter a proliferation (i.e. an increase) of coreferences in the Semantic Web and in traditional union catalogues.

Coreference occurs when two different linguistic expressions in an utterance denote the same thing. The producer of the utterance uses both expressions to refer to the same extralinguistic entity. Source: Wikipedia.

There are several approaches to handling coreference in the Semantic Web, including:

owl:sameAs
the coref ontology by Ian Millard and Hugh Glaser (see also the related publications)
the Bundle ontology by Ben O'Steen (see also this blog post)
possibly also the Similarity Ontology
... (other ideas?) ...

What are the advantages and disadvantages of the different approaches? Which approach suits our purposes best? These questions should be settled before the linking of coreferent identifiers is implemented.

In progress

The proposal is to use the Bundle ontology. Rationale:

owl:sameAs refers to identity between individuals. That holds only to a limited extent here, and when in doubt we should not use the property.
The coref ontology was developed to describe coreferences. However, it lacks a way to model the justification for a coreference (why are these things actually the same?).
MuSim and Bundle both allow provenance information to be attached. MuSim uses its own constructs for this (sim:workflow and sim:method; both with rdfs:range sim:AssociationMethod), whereas Bundle uses the Open Provenance Model Vocabulary (OPMV). The W3C Provenance Incubator Group found OPMV to be the best-modelled provenance model so far and uses it as a reference vocabulary. Bundle also extends the OAI-ORE specification, which is used, among others, by the Europeana Data Model (EDM) to aggregate distributed content. This is expected to enable better interoperability with other services.

1. owl:sameAs

Sources

The predicate owl:sameAs has frequently been used in the Linked Data web to link decentrally published datasets with one another. However, this causes a number of problems, which is why there have been ongoing and extensive discussions and publications on the use of owl:sameAs (see the sources cited above).

Halpin and Hayes (2010) write (p. 1):

Much of the supposed “crisis” over the proliferation of owl:sameAs in Linked Data can be traced to the fact that these uses of owl:sameAs tend to be mutually incompatible, and almost always violate the rather strict logical semantics of identity demanded by owl:sameAs.
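
The strictness mentioned in the quote can be illustrated with a minimal sketch. It treats owl:sameAs as the equivalence relation its semantics demand (symmetric and transitive) and computes the resulting identity classes with a small union-find structure. All URIs are invented for illustration, and the function name is hypothetical.

```python
# Sketch: owl:sameAs read as an equivalence relation.
# All URIs below are hypothetical examples, not real identifiers.

def same_as_classes(links):
    """Group URIs into the identity classes implied by owl:sameAs links."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes = {}
    for uri in parent:
        classes.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(members) for members in classes.values())


strict_links = [
    ("ex:a/goethe", "ex:b/goethe"),  # same person in two catalogues
    ("ex:c/faust", "ex:d/faust"),    # same work in two catalogues
]
# One link that was only meant as "closely related", not "identical":
loose_links = strict_links + [("ex:b/goethe", "ex:c/faust")]

strict_classes = same_as_classes(strict_links)
loose_classes = same_as_classes(loose_links)
```

With the strict links, person and work stay in two separate classes. The single loosely meant link collapses all four URIs into one class, which is the kind of unintended consequence the discussions about owl:sameAs revolve around.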

2. coref-ontology

The coref ontology was developed by Hugh Glaser and Ian Millard for use in the RKB Explorer. It is a fairly simple ontology for representing coreference in the Semantic Web and consists of

one class,
three object properties, and
one datatype property.

coref:Bundle: A bundle contains URIs that are considered coreferent within a particular context.
coref:coreferenceData: This predicate points to a bundle that contains the URIs coreferent with a given URI. Accordingly, its range is specified as coref:Bundle.
coref:duplicate: A duplicate is a URI that is coreferent with another URI. In practice, coref:Bundle is used as the domain of this predicate (see the example), although the ontology does not specify it that way.
coref:canon: The canonical URI of a bundle, which should be used preferentially. Here, too, coref:Bundle is used as the domain in practice, but the ontology does not specify it.
lastUpdated: A literal stating when the bundle was last updated. Accordingly, its domain is specified as coref:Bundle.
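
How these five terms fit together can be sketched with plain (subject, predicate, object) tuples. This is an illustration only: the bundle and resource URIs, the timestamp, and the helper name canonical_uri are invented, the coref: prefix on lastUpdated is an assumption, and the use of coref:Bundle as subject of coref:canon and coref:duplicate follows the practice described above rather than the ontology's formal definition.

```python
# Sketch of one coref bundle as plain triples.
# URIs and the timestamp are hypothetical.
triples = [
    ("ex:bundle/1", "rdf:type", "coref:Bundle"),
    ("ex:bundle/1", "coref:canon", "ex:a/goethe"),
    ("ex:bundle/1", "coref:duplicate", "ex:b/goethe"),
    ("ex:bundle/1", "coref:duplicate", "ex:c/goethe"),
    ("ex:bundle/1", "coref:lastUpdated", "2011-05-01T12:00:00Z"),  # prefix assumed
    ("ex:b/goethe", "coref:coreferenceData", "ex:bundle/1"),
]


def canonical_uri(uri, triples):
    """Follow coref:coreferenceData to the bundle and return its coref:canon."""
    for subject, predicate, bundle in triples:
        if subject == uri and predicate == "coref:coreferenceData":
            for s, p, o in triples:
                if s == bundle and p == "coref:canon":
                    return o
    return None  # no bundle known for this URI


preferred = canonical_uri("ex:b/goethe", triples)
```

A lookup service would use the canonical URI as the preferred identifier and the duplicates as alternative entry points. Note that nothing in these triples records why the URIs were bundled.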

3. Bundle ontology

Ontology
prefix: bundle
Blog post

The Bundle ontology was developed by Ben O'Steen as an explicit follow-up to the coref ontology. It is more extensive and takes into account

the justification for bundling the URIs, and
more extensive provenance information (actor, point in time), drawing on the Open Provenance Model Vocabulary.

The ontology description states:

The Bundle ontology provides classes and properties which model the assertions made when a person, software agent or some other process asserts that one URI is likely to be the same another URI. The property 'owl:sameas' is of no use for this purpose, as it is a logical assertion and not a realistic one. The loss of context as to why, how and who made the decision to equate two or more URIs makes it very hard to use at all.

Classes:

bundle:Bundle: This class is used to represent a 'bundling' of URIs together, to make the assertion that the Agent who owns or has created this bundle believes that these URIs are the same. It is expected to be linked via the 'bundle:encapsulates' property to two or more URIs, and to be linked via 'bundle:justifiedby' to an bundle:Reason node, holding an OPMV graph detailing the how, where, what and why this link was made amongst other details.
bundle:NotABundle: This class has the same form as a bundle:Bundle, but has the crucial distinction that it is used to represent a set of URIs that have been assert to not be the same by a given Agent.
bundle:Reason: This class is used to represent the evidence and process by which a bundle:Bundle was made. As a Bundle is typically created by a probabalistic, NLP or human-driven methods, it is hard to qualify exactly what is necessary here without further usage of this technique. An OPMV graph is expected however to have this node as one of the 'Artifact's generated by the bundling process, hence it subclasses from opmv:Artifact.
- bundle:AlgorithmicReason: This class is used to represent the evidence, algorithm and any other metric that serves to provide a basis for the Bundle. Use of the bundle:algorithmicvector property may be helpful if a Felligi-Sunter or other approach with a vector/numeric confidence value had been taken.
- bundle:AgentReason: This class is used to represent the evidence, and as far as possible, the reason why an Agent (foaf:Person, etc) bundled these URIs together.
- bundle:CrowdsourcedReason: This class is used to represent the evidence as far as possible, for a crowd-sourced bundling of URIs. Typically, this will hope to reflect a mass attitude at a given time rather than anything concrete.

Properties:

bundle:encapsulates: This property links together URIs to a given Bundle to form something like a congruent closure, but without any of the strengh, mathmatical or authoritative, that a logical closure would bring. It may be treated as one, if the process by which the Bundle is created is infalible, but as the author of this schema, I cannot think of any situation where that might happen. (Domain: bundle:Bundle)
bundle:justifiedby: This property links a Bundle to the bundle:Reason artifact that provides it with context as to why and who created it and for what purpose. (Domain: bundle:Bundle, Range: bundle:Reason)
bundle:algorithmicvector: This property holds a value for the result of the algorithmic matching process that created the bundle.
bundle:comment: A human readable description for a given bundle:Reason. (Domain: union of bundle:Bundle & bundle:Reason)
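
As a rough sketch of how the classes and properties above combine, the following function builds the triples for one algorithmically justified bundle. The function name, the URIs, and the score are hypothetical. Attaching bundle:algorithmicvector to the reason node is an assumption based on the description of bundle:AlgorithmicReason, and the OPMV graph that would normally accompany the reason is left out.

```python
# Sketch: triples for a bundle:Bundle with an algorithmic justification.
# All URIs and values are hypothetical; the OPMV provenance graph is omitted.

def make_bundle(bundle_uri, reason_uri, uris, score, comment):
    """Return (subject, predicate, object) tuples describing one bundle."""
    if len(uris) < 2:
        raise ValueError("a bundle is expected to link two or more URIs")
    triples = [
        (bundle_uri, "rdf:type", "bundle:Bundle"),
        (bundle_uri, "bundle:justifiedby", reason_uri),
        (reason_uri, "rdf:type", "bundle:AlgorithmicReason"),
        # Assumption: the match score is attached to the reason node.
        (reason_uri, "bundle:algorithmicvector", score),
        (reason_uri, "bundle:comment", comment),
    ]
    triples += [(bundle_uri, "bundle:encapsulates", uri) for uri in uris]
    return triples


bundle_triples = make_bundle(
    "ex:bundle/42",
    "ex:reason/42",
    ["ex:a/goethe", "ex:b/goethe"],
    0.97,
    "Matched on name and life dates",
)
```

Compared with a bare list of coreferent URIs, every consumer can see that the bundle was produced by an algorithm and with what score, and can decide whether to trust it.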

4. Similarity Ontology

Ontology
prefix: sim
Example

The Similarity Ontology closely resembles the Bundle ontology. However, as its name suggests, it models similarity rather than equality. Since it speaks generally of an "Association", it can certainly also represent the equality relation. With classes and predicates such as sim:AssociationMethod and sim:associationMethod, the method of association can be represented, much as the Bundle ontology does with bundle:Reason and bundle:justifiedby.

Classes:

sim:Association: An abstract class to define some association between things. Entities share an association if they are somehow inter-connected. Generally a directed association should have at lease one sim:subject property and one sim:object property or an undirected association should have at least two sim:element properties, however this is not a requirement and intentionally left out of the model.
sim:AssociationMethod: A concept for representing the method used to derive association or similarity statements.
sim:Influence: An abstract class indicating a directed association of influence where the subject entity has influenced the object entity
sim:Network: A network is a grouping of sim:Associations. The associations that comprise a network are specified using a series of sim:edge predicates
sim:Similarity: An abstract class to define similarity between two or more things. Entities share a similarity if they share some common characteristics of interest. A similarity is a special type of association

Properties:

sim:association: Binds a sim:Association to an arbitrary thing
sim:description: Specifies some description that discloses the process or set of processes used to derive association statements for the given AssociationMethod. This property is depricated in favor of the more appropriately named sim:workflow property
sim:distance: A weighting value for an Association where a value of 0 implies two elements are the same individual
sim:domain: Specifies appropriate object types for the sim:subject predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets directed associations
sim:edge: Specifies an edge in a sim:Network
sim:element: Specifies an entity involved in the given sim:Association and implies the given association is undirected
sim:grounding: Binds an sim:Association statement directly instantiated N3-Tr formulae or some other workflow graph which enabled the association derivation
sim:method: Specifies the sim:AssociationMethod used to derive a particular Association statement. This should be used when the process for deriving association statements can be described further
sim:object: Specifies the object of a sim:Association implying a directed association where "subject is associated to object" but the reverse association does not necessarily exist, and if it does exist, it is not an equivalent association
sim:range: Specifies appropriate object types for the sim:object predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets directed associations
sim:scope: Specifies appropriate object types for the sim:element predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets undirected associations
sim:subject: Specifies the subject of an sim:Association implying a directed association where "subject is associated to object" but the reverse association does not necessarily exist, and if it does exist, it is not an equivalent association
sim:weight: A weighting value bound to a sim:Association where a value of 0 implies two elements are not at all associated and a higher value implies a closer association
sim:workflow: Specifies a workflow that discloses the process or set of processes used to derive association statements for the given sim:AssociationMethod
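
A minimal sketch of how undirected similarity statements might be used for coreference: each statement carries its sim:element values, a sim:weight, and the sim:method that produced it, and only statements above a threshold are treated as coreference candidates. The class and function names, the URIs, and the threshold of 0.9 are assumptions for illustration; the ontology itself does not prescribe an upper bound or a cut-off for sim:weight.

```python
# Sketch: sim:Similarity statements filtered by sim:weight.
# Names, URIs, and the threshold are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Similarity:
    elements: frozenset  # sim:element values (undirected association)
    weight: float        # sim:weight, 0 means not associated at all
    method: str          # sim:method, pointing to a sim:AssociationMethod


def coreference_candidates(similarities, threshold):
    """Keep only the statements whose weight reaches the threshold."""
    return [s for s in similarities if s.weight >= threshold]


statements = [
    Similarity(frozenset({"ex:a/goethe", "ex:b/goethe"}), 0.97, "ex:method/name-match"),
    Similarity(frozenset({"ex:a/goethe", "ex:c/schiller"}), 0.35, "ex:method/name-match"),
]
candidates = coreference_candidates(statements, threshold=0.9)
```

The design difference from the Bundle ontology shows up here: equality is not asserted directly but has to be derived from a weight, so the consuming service must choose and document its own threshold.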
