Method for Ontology Concept and Instance Similarity Measurement (tool OntoSim)

OntoSim computes the similarity between two concepts or instances of an ontology.

Institution: Institute of Informatics
Technologies used: Java, Jena
Inputs: ontology concepts or instances
Outputs: similarity between the two concepts or instances

Addressed Problems

In many situations, expressing the similarity of two concepts or instances from an ontology can lead to more precise search results. OntoSim addresses the problem of comparing ontology concepts and instances quickly and effectively. Its results are intended for use by other methods within the scope of the NAZOU project.

Description

OntoSim computes the similarity between concepts and instances in an ontology. Such similarity measurements can serve many different purposes in the tools that have been developed within the NAZOU project. Similarity graphs are used as the theoretical basis of the OntoSim tool.

The implementation was written in the Java programming language using the Jena Semantic Web framework. Its objective was to compute the similarity matrix of a given sub-ontology, which is achieved by the following sequence of steps:

  1. Infer the sub-classes of the main concept: for this purpose we used the Jena generic inference engine to generate a new model containing all sub-classes of the chosen main concept. The following rule was given to the engine as input:
    [ (?C1 rdfs:subClassOf ?C2) (?C2 rdfs:subClassOf ?C3) ->
      (?C1 rdfs:subClassOf ?C3) ]
    
    This is a simple transitivity rule: it infers that C1 is a sub-class of C3 whenever there is a concept C2 such that C1 is a sub-class of C2 and C2 is a sub-class of C3.
  2. Load all inferred sub-classes of the main concept: the inferred relations are stored in a new model that contains only the newly inferred relations, so this model has to be connected to the original model. If the two models were not connected, the original relations between concepts would be lost. Once both models are connected, all concepts are retrieved, inferred and original alike.
  3. Generate the similarity matrix: the similarity matrix is generated for all combinations of concepts, i.e. a square C × C matrix, where C is the number of concepts in the sub-ontology. Expression (1) is used to compute the similarity between concepts, which requires counting the nodes shared by the two concepts. With all inferred and original relations in one model, all shared super-nodes can be retrieved using the following SPARQL query:
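    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    # ns: stands for the namespace of the sub-ontology (placeholder URI, not given in the source)
    PREFIX ns: <http://example.org/ontology#>
    SELECT DISTINCT ?x
      WHERE {
        ns:a rdfs:subClassOf ?x .
        ns:b rdfs:subClassOf ?x .
      }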
    The query retrieves all concepts ?x that are super-classes of both ns:a and ns:b; the number of returned rows is the number of super-nodes the two concepts share.
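Steps 1 and 2 can be illustrated without Jena. The following Java sketch is a minimal stand-in, assuming the asserted rdfs:subClassOf statements are held in an in-memory map from each concept to its direct super-classes instead of a Jena model; the class name `SubClassClosure`, its methods and the example concepts are hypothetical. It applies the transitivity rule repeatedly until no new statement appears, keeps the inferred statements in a separate map (mirroring the separate inferred model of step 1), and then merges them with the asserted ones so that no original relation is lost (step 2).

```java
import java.util.*;

// Hypothetical plain-Java stand-in for the Jena rule-based inference of steps 1 and 2.
// Concepts are strings; each map entry is "concept -> set of its super-classes".
public class SubClassClosure {

    // Applies [ (?C1 subClassOf ?C2) (?C2 subClassOf ?C3) -> (?C1 subClassOf ?C3) ]
    // until a fixpoint is reached and returns ONLY the newly inferred statements.
    static Map<String, Set<String>> inferTransitive(Map<String, Set<String>> asserted) {
        Map<String, Set<String>> all = copy(asserted);
        Map<String, Set<String>> inferred = new TreeMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String c1 : new ArrayList<>(all.keySet())) {
                for (String c2 : new ArrayList<>(all.get(c1))) {
                    // Copy to avoid modifying the set being iterated (e.g. for cycles).
                    for (String c3 : new ArrayList<>(all.getOrDefault(c2, Set.of()))) {
                        if (all.get(c1).add(c3)) {
                            inferred.computeIfAbsent(c1, k -> new TreeSet<>()).add(c3);
                            changed = true;
                        }
                    }
                }
            }
        }
        return inferred;
    }

    // Step 2: connect the inferred statements with the original ones.
    static Map<String, Set<String>> merge(Map<String, Set<String>> original,
                                          Map<String, Set<String>> inferred) {
        Map<String, Set<String>> out = copy(original);
        inferred.forEach((k, v) -> out.computeIfAbsent(k, x -> new TreeSet<>()).addAll(v));
        return out;
    }

    // Deep copy so that the caller's map is never modified.
    static Map<String, Set<String>> copy(Map<String, Set<String>> m) {
        Map<String, Set<String>> out = new TreeMap<>();
        m.forEach((k, v) -> out.put(k, new TreeSet<>(v)));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical toy hierarchy: Programmer, Tester -> ITJob -> Job -> Thing
        Map<String, Set<String>> asserted = new TreeMap<>();
        asserted.put("Programmer", new TreeSet<>(Set.of("ITJob")));
        asserted.put("Tester", new TreeSet<>(Set.of("ITJob")));
        asserted.put("ITJob", new TreeSet<>(Set.of("Job")));
        asserted.put("Job", new TreeSet<>(Set.of("Thing")));

        Map<String, Set<String>> inferred = inferTransitive(asserted);
        // prints: inferred: {ITJob=[Thing], Programmer=[Job, Thing], Tester=[Job, Thing]}
        System.out.println("inferred: " + inferred);
        // prints: merged: {ITJob=[Job, Thing], Job=[Thing], Programmer=[ITJob, Job, Thing], Tester=[ITJob, Job, Thing]}
        System.out.println("merged: " + merge(asserted, inferred));
    }
}
```

Printing only the inferred map shows why step 2 is needed: on its own it lacks the asserted statements such as Programmer -> ITJob.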

To obtain the similarity of concept C1 to concept C2, C1 is looked up as a row and C2 as a column; the value at their intersection is the similarity of the two concepts. The similarity measure used by this method is not symmetric, so in general:
sim(C1, C2) ≠ sim(C2, C1)
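The matrix construction of step 3 and the row/column lookup can be sketched in plain Java as well. Because expression (1) is not reproduced here, this sketch uses a hypothetical placeholder measure, sim(a, b) = |S(a) ∩ S(b)| / |S(a)|, where S(x) is the set of super-classes of x in the merged model. This is not OntoSim's expression (1); it only shares its dependence on the number of shared super-nodes and its lack of symmetry. The class `SimilarityMatrix` and its methods are hypothetical, and `sharedSuperClasses` is an in-memory equivalent of the SPARQL query above.

```java
import java.util.*;

// Hypothetical sketch of step 3: a C x C similarity matrix over a merged (closed) hierarchy.
// The measure in sim() is a PLACEHOLDER, not OntoSim's expression (1).
public class SimilarityMatrix {

    // In-memory equivalent of the SPARQL query: super-classes shared by a and b.
    static Set<String> sharedSuperClasses(Map<String, Set<String>> closed, String a, String b) {
        Set<String> shared = new TreeSet<>(closed.getOrDefault(a, Set.of()));
        shared.retainAll(closed.getOrDefault(b, Set.of()));
        return shared;
    }

    // Placeholder asymmetric measure: shared super-classes relative to those of the row concept.
    static double sim(Map<String, Set<String>> closed, String a, String b) {
        int ownSupers = closed.getOrDefault(a, Set.of()).size();
        return ownSupers == 0 ? 0.0 : (double) sharedSuperClasses(closed, a, b).size() / ownSupers;
    }

    // Row i holds the first concept, column j the second: m[i][j] = sim(concept_i, concept_j).
    static double[][] matrix(Map<String, Set<String>> closed, List<String> concepts) {
        int n = concepts.size();
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = sim(closed, concepts.get(i), concepts.get(j));
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical merged hierarchy, e.g. the result of steps 1 and 2.
        Map<String, Set<String>> closed = Map.of(
            "Programmer", Set.of("ITJob", "Job", "Thing"),
            "ITJob", Set.of("Job", "Thing"),
            "Job", Set.of("Thing"));
        List<String> concepts = List.of("Programmer", "ITJob", "Job");
        double[][] m = matrix(closed, concepts);
        for (int i = 0; i < concepts.size(); i++)
            System.out.println(concepts.get(i) + " " + Arrays.toString(m[i]));
    }
}
```

Running it prints `Programmer [1.0, 0.6666666666666666, 0.3333333333333333]`, `ITJob [1.0, 1.0, 0.5]` and `Job [1.0, 1.0, 1.0]`. Looking up row Programmer, column ITJob gives 0.67, while row ITJob, column Programmer gives 1.0, which illustrates why the matrix must be read in a fixed row/column order.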

References

  1. Balogh Z., Budinská I.: OntoSim - Ontology-based Similarity Determination of Concepts and Instances. In: Tools for Acquisition, Organisation and Presenting of Information and Knowledge, P. Navrat et al. (Eds.), Vydavatelstvo STU, Bratislava, 2006, ISBN 80-227-2468-8. NAZOU Workshop, 29-30 September 2006, held at ITAT 2006 (26 September - 1 October 2006), Chata Kosodrevina, Bystrá dolina, Nízke Tatry, Slovakia.