A              ConCom – Concept Comparer

A.1          Basic Information

Many approaches acquire user characteristics in order to populate a user model or keep it up to date, and in this way provide a basis for successful personalization of the visible aspects of adaptive web-based applications.

Such information can be acquired directly from the user (e.g., the user is asked a question or fills in a form), by observing the user’s behavior while working with the application, by analyzing logs on the web server, or by analyzing the presented content.

We focus on the analysis of the presented content, and especially on evaluating similarity in order to find common or differing aspects of that content.

A.1.1      Basic Terms

Ontological concept

A set of properties that connect the concept to related concepts. Concepts can be ordered in a hierarchy.

Instance

Represents an object from the real world.

Datatype property

Expresses relations between concept instances and RDF literals or XML Schema datatypes.

Object property

Expresses relations between two instances.

A.1.2      Method Description

The main idea of the method used in the ConCom tool is to evaluate the pairs of common properties present in both instances. Every instance of a concept can have object properties and datatype properties, which need to be treated differently. When a datatype property is evaluated, the comparison ends once a string comparison metric has been applied. Object properties are processed recursively using the respective metrics until literals are reached or no properties are left.

The total similarity is accumulated step by step as the individual evaluation metrics are applied, i.e., each metric contributes its own partial similarity, computed for the respective instances or literals, to the total.

Knowing the user’s ratings of the displayed concepts, we can use the similarity measure computed for each property to investigate the common and differing properties of the compared instances and derive useful information about the user’s interests for personalization purposes.

A.1.3      Scenarios of Use

ConCom can be used in the following scenarios:

§  The input is two instances of ontological concepts and the result is their similarity.

§  The input is one instance and the result is a set of the instances most similar to it.

ConCom should not be used in the following case:

§  The compared instances of ontological concepts do not belong to the same ontology.
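The second scenario listed above (one instance in, the most similar instances out) can be sketched as ranking candidates by a pairwise similarity function. The class name MostSimilar, the method topK and the similarity callback are assumptions made for illustration; they are not ConCom's interface. The sketch uses Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper; ConCom's real interface may differ.
final class MostSimilar {
    /** Returns up to k candidates ordered by descending similarity to the query. */
    static <T> List<T> topK(T query, List<T> candidates,
                            ToDoubleBiFunction<T, T> similarity, int k) {
        List<T> sorted = new ArrayList<>(candidates);
        // Sort by similarity to the query, highest first.
        sorted.sort(Comparator.comparingDouble(
                (T c) -> similarity.applyAsDouble(query, c)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

Any function returning a value between 0 and 1 for two instances can be plugged in as the similarity callback.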

A.1.4      External Links and Publications

Andrejko, A., Bieliková, M.: Investigating Similarity of Ontology Instances and Its Causes. In V. Kurkova, R. Neruda, J. Koutnik (eds.): Artificial Neural Networks – ICANN 2008, Prague, Czech Republic: LNCS. Springer, 2008. (to appear)

Andrejko, A., Bieliková, M.: Estimating similarity of the ontological concepts instances for the adaptive applications based on Semantic Web. [in Slovak] In: Václav Snášel (ed.): Znalosti 2008: Proceedings of the 7th annual conference, Bratislava, February 13-15, 2008, pp. 30-41.

Andrejko, A., Bieliková, M.: Estimating similarity of the ontological concepts instances for personalization purposes. [in Slovak] In: František Babič, Ján Paralič (eds.): 2nd Workshop on Intelligent and Knowledge Oriented Technologies, WIKT 2007 Proceedings, Košice, November 15-16, 2007, pp. 46-49.

Andrejko, A., Bieliková, M.: Comparing Instances of the Ontological Concepts. In: Tools for Acquisition, Organisation and Presenting of Information and Knowledge (2): Research Project Workshop Horský hotel Poľana, Slovakia September 22-23, 2007, pp. 26-35

Andrejko, A., Barla, M., Tvarožek, M.: Comparing Ontological Concepts to Evaluate Similarity. In: Tools for Acquisition, Organisation and Presenting of Information and Knowledge: Research Project Workshop Bystrá dolina, Nízke Tatry, Slovakia, September 29-30, 2006, pp. 71-78

Log4J. Java-based logging utility, Apache Software Foundation. (http://logging.apache.org/log4j)

SimMetrics. Open source Similarity Measure Library. (http://sourceforge.net/projects/simmetrics/)

A.2          Integration Manual

ConCom is developed in Java (Standard Edition 6) and distributed as a jar archive. The functionality of the tool is accessed from the command line using the call:

java -jar concom.jar [common-options] URI1 URI2

where:

§  URI1, URI2 − unique identifiers of the instances to be compared,

and [common-options] are:

§  -help – shows help

§  -server <url> – ontology server

§  -ontology <name> – ontology name

§  -username <username> – username used for repository connection

§  -password <password> – password used for repository connection

§  -use-uncommon <true|false> – whether to use 'uncommon predicates' for data nodes, default is false

§  -strong <filename> – filename of file containing URIs of 'strongly' filtered predicates

§  -weak <filename> – filename of file containing URIs of 'weakly' filtered predicates

§  -metric-data <M|L|D> – strings comparison metric used for data nodes, default is D

§  -metric-labels <M|L|D> –  strings comparison metric used for labels, default is D

where the metric code <M|L|D> is one of:

§  M – Monge-Elkan

§  L – Levenshtein

§  D – Dummy (internal)
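Since the tool is accessed through the command line, a host application written in Java might assemble the call as an argument list. The following is a minimal sketch that uses only the options listed above; the class name ConComCommand and the method build are hypothetical, and the server URL, ontology name and instance URIs passed in are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: builds the argument list for the documented command line.
final class ConComCommand {
    static List<String> build(String server, String ontology, char dataMetric,
                              String uri1, String uri2) {
        // Only the documented metric codes M, L and D are accepted.
        if ("MLD".indexOf(dataMetric) < 0) {
            throw new IllegalArgumentException("metric must be M, L or D");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("java"); cmd.add("-jar"); cmd.add("concom.jar");
        cmd.add("-server"); cmd.add(server);
        cmd.add("-ontology"); cmd.add(ontology);
        cmd.add("-metric-data"); cmd.add(String.valueOf(dataMetric));
        cmd.add(uri1); cmd.add(uri2);
        return cmd;
    }
}
```

The resulting list can be passed to new ProcessBuilder(cmd). How ConCom prints its result is not described here, so reading the output is left out of the sketch.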

ConCom is not intended as a stand-alone application; it is designed to be included in another application or tool, which calls its interface methods.

A.2.1      Dependencies

ConCom uses:

§  Log4J logging utility,

§  SimMetrics open source Similarity Measure Library.

A.2.2      Installation

Deploying ConCom in another application requires three jar archives to be included in the existing project − the archives containing ConCom, Log4J and SimMetrics.

A.2.3      Configuration

ConCom is configured either from the command line, as described above, or through parameters set in the configuration file.

A.2.4      Integration Guide

ConCom computes a similarity measure for the two instances of ontological concepts given on the command line. The result is computed using the respective similarity metrics. Furthermore, ConCom provides an interface for searching for the instances most similar to a given one.

A.3          Development Manual

A.3.1      Tool Structure

ConCom consists of the following packages:

§  sk.fiit.nazou.concom.applications – a set of applications that compute various similarities;

§  sk.fiit.nazou.concom.concept – classes and interfaces for handling instances of ontological concepts;

§  sk.fiit.nazou.concom.similarity – classes and interfaces for computing a similarity measure between two instances.

A.3.2      Method Implementation

To evaluate the similarity measure between instances, we proposed a method based on recursive evaluation of the properties that the compared instances consist of. The rough principle of the method, illustrated on the comparison of two instances instanceA and instanceB, is as follows.

 function getSimilarity(instanceA, instanceB)

   set similarity to 0.0
   set counter to 0
   store properties of instanceA and instanceB to properties

   foreach property in properties do

     increment counter

     if property is in both instances then

       store connected elements to elementX and elementY
       add computeSimilarity(property, elementX, elementY) to similarity

     else

       add 0.0 to similarity

     end if

   end foreach

   return similarity/counter

 end function

 function computeSimilarity(property, elementX, elementY)

   if property is datatype then

     return getDatatypeSimilarity(elementX, elementY)

   else

     set similarity to 0.0

     add getObjectSimilarity(elementX, elementY) to similarity

     add getSimilarity(elementX, elementY) to similarity

     return similarity/2

   end if

 end function
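The pseudocode above can be sketched in Java over a small in-memory model. This is an illustration under stated assumptions, not ConCom's implementation: the classes SketchInstance and RecursiveSimilarity are hypothetical, each property has at most one value, literals are compared by exact equality as a placeholder for a string metric, and the object-level metric is a placeholder that only checks whether both instances belong to the same concept. Returning 0.0 when neither instance has any property is also an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative in-memory model; not ConCom's classes.
final class SketchInstance {
    final String concept;                                   // concept the instance belongs to
    final Map<String, Object> properties = new HashMap<>(); // value: String literal or SketchInstance
    SketchInstance(String concept) { this.concept = concept; }
    SketchInstance with(String property, Object value) { properties.put(property, value); return this; }
}

final class RecursiveSimilarity {
    static double getSimilarity(SketchInstance a, SketchInstance b) {
        Set<String> properties = new HashSet<>(a.properties.keySet());
        properties.addAll(b.properties.keySet());
        if (properties.isEmpty()) return 0.0;   // assumption: nothing to compare
        double similarity = 0.0;
        for (String p : properties) {
            if (a.properties.containsKey(p) && b.properties.containsKey(p)) {
                similarity += computeSimilarity(a.properties.get(p), b.properties.get(p));
            } // a property present in one instance only contributes 0.0
        }
        return similarity / properties.size();
    }

    static double computeSimilarity(Object x, Object y) {
        if (x instanceof String && y instanceof String) {
            return x.equals(y) ? 1.0 : 0.0;     // placeholder string metric
        }
        if (x instanceof SketchInstance && y instanceof SketchInstance) {
            SketchInstance ix = (SketchInstance) x, iy = (SketchInstance) y;
            double objectSim = ix.concept.equals(iy.concept) ? 1.0 : 0.0; // placeholder object metric
            return (objectSim + getSimilarity(ix, iy)) / 2;
        }
        return 0.0; // assumption: a literal compared with an instance is dissimilar
    }
}
```

The sketch does not guard against circular references, so it is only safe on tree-shaped data.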

When comparing two instances, properties can appear in different cardinalities:

§  single in both instances,

§  multiple in both instances,

§  single/multiple in one instance only.

When the property has a single occurrence in both instances then the similarity of related elements (instances in the case of object properties or literals in the case of datatype properties) is evaluated using different similarity metrics. The comparison of datatype properties ends after a metric is used to compute the similarity measure between the related literals. For object properties a metric for related instances is computed (e.g., taxonomy distance) and further comparison is performed recursively on the respective instances until literals are reached or until there are no properties left.
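For datatype properties the comparison ends with a string metric such as Levenshtein. ConCom relies on the SimMetrics library for this; the following self-contained sketch does not use the SimMetrics API and only illustrates how an edit distance can be turned into a similarity between 0 and 1. Normalizing by the length of the longer string is an assumption, and the class name LevenshteinSimilarity is hypothetical.

```java
final class LevenshteinSimilarity {
    /** Classic dynamic-programming edit distance using two rolling rows. */
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                // Minimum of insertion, deletion and substitution.
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.length()];
    }

    /** Maps the distance to [0, 1]; two empty strings are treated as identical. */
    static double similarity(String s, String t) {
        int max = Math.max(s.length(), t.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(s, t) / max;
    }
}
```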

When an instance is being traversed recursively, an inverse property can connect it to an already traversed instance. If inverse or symmetric properties are not handled, the algorithm follows them back and enters an infinite loop. Therefore, we filter out properties that are inverse or symmetric to the examined property. However, loops can still occur, for example, if two different properties lead to the same instance. In such cases, the already traversed instances are omitted and further traversal stops.
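Omitting already traversed instances can be sketched with a set of visited URIs. The graph is given here as a plain map from an instance URI to the URIs it links to; this representation and the class name LoopSafeTraversal are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LoopSafeTraversal {
    /** Returns the instances reachable from start, each visited once, in depth-first order. */
    static List<String> traverse(String start, Map<String, List<String>> edges) {
        List<String> order = new ArrayList<String>();
        visit(start, edges, new HashSet<String>(), order);
        return order;
    }

    private static void visit(String node, Map<String, List<String>> edges,
                              Set<String> visited, List<String> order) {
        if (!visited.add(node)) return;  // already traversed: stop here
        order.add(node);
        for (String next : edges.getOrDefault(node, Collections.<String>emptyList())) {
            visit(next, edges, visited, order);
        }
    }
}
```

Because Set.add returns false for an element that is already present, the check and the bookkeeping happen in a single step.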

Multiple occurrences of a property in an instance are the most complex case we have to address. In this case, two sets are constructed, containing the elements connected through the examined property in the first and in the second instance respectively. These two sets can have different cardinalities; the problem is to identify (i.e., to match) similar elements between them. We use our similarity measure to identify such element pairs, which are then compared, and the computed similarity contributes to the total similarity between the two instances.
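One possible way to match elements between the two sets is a greedy strategy that repeatedly takes the most similar pair that is still unmatched. The text above does not state which matching strategy ConCom uses, so the greedy choice, the class name GreedyMatcher and the normalization by the size of the larger set (unmatched elements count as zero) are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class GreedyMatcher {
    static <T> double match(List<T> xs, List<T> ys, ToDoubleBiFunction<T, T> sim) {
        if (xs.isEmpty() || ys.isEmpty()) return 0.0;
        // Collect every candidate pair as {similarity, index in xs, index in ys}.
        List<double[]> pairs = new ArrayList<>();
        for (int i = 0; i < xs.size(); i++)
            for (int j = 0; j < ys.size(); j++)
                pairs.add(new double[] { sim.applyAsDouble(xs.get(i), ys.get(j)), i, j });
        pairs.sort((p, q) -> Double.compare(q[0], p[0]));  // best pairs first
        boolean[] usedX = new boolean[xs.size()];
        boolean[] usedY = new boolean[ys.size()];
        double total = 0.0;
        for (double[] p : pairs) {
            int i = (int) p[1], j = (int) p[2];
            if (usedX[i] || usedY[j]) continue;  // each element is matched at most once
            usedX[i] = true;
            usedY[j] = true;
            total += p[0];
        }
        // Assumption: unmatched elements of the larger set count as similarity 0.
        return total / Math.max(xs.size(), ys.size());
    }
}
```

A greedy match is simple but not guaranteed to maximize the summed similarity; an optimal assignment algorithm could be substituted if that matters for the domain.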

If a property occurs, once or multiple times, in one instance only, we set the similarity of the values attached to that property to zero. This follows from the definition of similarity, i.e., the similarity equals zero if two objects are entirely different. Here, we assume that the instances are entirely different with respect to the property, since a value is assigned to it in one instance only.

Furthermore, we investigate the reasons (properties) that influenced the user’s evaluation of the content (e.g., interest). We introduce two threshold values that are used to discover the user’s likes and dislikes. From the personalization perspective, we are only interested in the two outer sets, i.e., the positive and the negative items. The identified properties can be used by other tools to update characteristics in the user model or to acquire new ones.
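The two thresholds can be read as splitting rated items into a negative, a middle and a positive set, of which only the outer two are used. The following sketch shows such a split; the concrete threshold values, the inclusive boundaries and the names InterestClassifier and Interest are assumptions, since the text does not specify them.

```java
final class InterestClassifier {
    enum Interest { NEGATIVE, NEUTRAL, POSITIVE }

    /** Classifies a rating using a lower and an upper threshold (both inclusive by assumption). */
    static Interest classify(double rating, double lower, double upper) {
        if (lower > upper) throw new IllegalArgumentException("lower must not exceed upper");
        if (rating >= upper) return Interest.POSITIVE;
        if (rating <= lower) return Interest.NEGATIVE;
        return Interest.NEUTRAL;  // the middle set, ignored for personalization
    }
}
```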

A.3.3      Enhancements and Optimizing

Each instance of a concept is represented as a tree (or possibly a graph) consisting of nodes (instances) and edges (properties). Each element (node or edge) is represented by its URI in the repository. The object representation used to create a node from a given URI is provided by NodeFactory:

final NodeFactory nodes = Utils.getNodeFactory();

Afterwards, the node for a given URI (xUri) is acquired as follows:

final Node x = nodes.get(xUri);

The nodes and edges are acquired asynchronously in threads and are then stored in a cache so that they do not have to be acquired repeatedly. This avoids querying the repository multiple times for the same data.
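The combination of asynchronous acquisition and caching can be sketched with standard Java concurrency classes. This is not ConCom's code: the class AsyncNodeCache is hypothetical, the loader function stands in for a repository query, and the sketch uses Java 8 classes that are newer than the Java SE 6 platform mentioned earlier.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class AsyncNodeCache<V> {
    private final ConcurrentMap<String, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;   // stands in for a repository query

    AsyncNodeCache(Function<String, V> loader) { this.loader = loader; }

    /** Starts loading on the first request; later requests reuse the same future. */
    CompletableFuture<V> get(String uri) {
        return cache.computeIfAbsent(uri,
                key -> CompletableFuture.supplyAsync(() -> loader.apply(key)));
    }
}
```

Caching the future rather than the finished value means that concurrent requests for the same URI share one query. A failed load also stays cached in this sketch, so a real implementation would probably need to evict such entries.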

A.4          Manual for Adaptation to Other Domains

The method of recursive evaluation implemented in the ConCom tool is universal and exploits the ontological structure of the concept. It is based on acquiring the properties and the instances (literals) they connect. Therefore, it can also be used in other application domains. However, in some cases it may be desirable to add further metrics to achieve better results or to deal with particularities of the processed domain.

A.4.1      Configuring to Other Domain

When using ConCom in other application domains, particular attention should be paid to inverse properties because they cause circular references. Inverse properties can be identified through the owl:inverseOf property of the OWL language. However, a query returns both the base property and its inverse. Therefore, the inverse properties to be ignored for the given domain must be listed in the configuration file.
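Ignoring the configured inverse properties amounts to removing their URIs from the properties of an instance before traversal. The sketch below assumes that the list from the configuration file has already been loaded into a set; the file format is not described here, and the class name PropertyFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PropertyFilter {
    /** Returns the property URIs that are not on the ignore list, preserving their order. */
    static List<String> withoutIgnored(List<String> propertyUris, Set<String> ignoredUris) {
        List<String> kept = new ArrayList<>();
        for (String uri : propertyUris) {
            if (!ignoredUris.contains(uri)) kept.add(uri);
        }
        return kept;
    }
}
```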

A.4.2      Dependencies

Log4J and SimMetrics are used by ConCom independently of the domain.