A              Valinor – Values Normalization

A.1          Basic Information

Numeric data in databases often represent the same quantity but cannot be compared directly because they are expressed in different units. The Valinor tool addresses this problem by normalizing the values into the same units. It provides methods for automatic values normalization in an ontology repository.

In the simplest cases the task reduces to converting one basic unit into another. However, there are more complex units for which the conversion table could become rather large; for example, velocity can be expressed in meters per second, miles per hour, etc. Such cases are characterized by units that are composed of other (basic) units, which is what makes the conversion table grow.

In our project NAZOU (Návrat et al., 2005), we faced a similar problem. We store salary values for job offers, which are naturally expressed in various units, e.g. EUR per year or USD per hour. We wanted to compare salary information directly between distinct job offers, and thus we needed a method for automatic value conversion from an arbitrary unit to a specified normalized unit. We decided to store the normalized values in the original database, which is an RDF ontology of job offers gathered from the Web. In order to achieve higher flexibility, we decided not to implement conversions into the chosen normalized unit only, but to allow conversions between any two comparable units.

In the spirit of the above-mentioned examples, a possible solution should satisfy the following requirements:

§  Direct conversion coefficients should be held only for atomic units, which are not composed of other units.

§  Conversion coefficients need not be held for each pair of atomic units — the conversion table need not be fully pre-computed.

§  Units composed of other units are converted using conversions between atomic units.

§  The conversion of a particular value should be based on declaratively stated relationships between basic units and should be computed dynamically.

§  The method should be universal — usable for conversion between any comparable units.

§  The method should be flexible enough: e.g. changing the normalized unit or adding a new unit should be easy.

A.1.1      Basic Terms

Normalization

The process by which the given numeric values are transformed into one selected common unit.

Ontology

A description of the data of an information domain, together with the data itself, in the form of a taxonomy of concepts and the relations between them.

(Information) Domain

The information (problem) area in which the tool normalizes given values. We use the domain of job offers as an experimental information space.

Repository

An ontological database that holds all information about a specific domain.

A.1.2      Method Description

The values normalization method converts ontological properties which are expressed in different units but which should be comparable to each other. The method is declaratively defined and flexible with respect to changes, so the requirements stated above should be fulfilled.

Any data stored in a database, and especially in an ontology, can be viewed as logic statements about the entities described by the data. Inference of new statements from existing (known) statements is the focus of the logic programming (LP) approach, which we chose as the platform for values normalization. Logic programming is a tool universal enough for realizing value conversions.

Two LP approaches are used: Prolog and Answer Set Programming (ASP). One prototype was created for each approach.

The method is based on the following groups of rules, which are used within both prototypes (Pázman, 2007); an illustrative sketch follows the list:

§  rules for known conversion coefficients between atomic units,

§  rules defining the normalized unit for each value type,

§  rules for direct conversion using coefficients for atomic units,

§  rules for indirect conversion between atomic units using multiple coefficients,

§  rules for conversion of compound units (composition is realized by two unit arithmetic operations – division and multiplication),

§  rules for the computation of the normalized value.
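
To make these rule groups concrete, the following minimal sketch re-states them in Java, the implementation language of the tool; all class and method names and the coefficient values are illustrative only, since the actual prototypes express this logic as Prolog and ASP rules rather than Java code. Coefficients are held only for atomic units, indirect conversions are chained from known coefficients, and a compound unit of the form "a per b" is converted by combining the atomic conversions.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; the real prototypes express this logic as logic program rules.
public class ConversionSketch {

    // rule group 1: direct coefficients between atomic units
    private final Map<String, Double> direct = new HashMap<String, Double>();

    public void addCoefficient(String from, String to, double c) {
        direct.put(from + ">" + to, c);
        direct.put(to + ">" + from, 1.0 / c);   // conversions work in both directions
    }

    // rule groups 3 and 4: direct or indirect conversion between atomic units
    public Double atomicCoefficient(String from, String to, Set<String> visited) {
        if (from.equals(to)) {
            return 1.0;
        }
        Double d = direct.get(from + ">" + to);
        if (d != null) {
            return d;
        }
        visited.add(from);
        for (Map.Entry<String, Double> e : direct.entrySet()) {   // chain over known coefficients
            String[] pair = e.getKey().split(">");
            if (pair[0].equals(from) && !visited.contains(pair[1])) {
                Double rest = atomicCoefficient(pair[1], to, visited);
                if (rest != null) {
                    return e.getValue() * rest;
                }
            }
        }
        return null;
    }

    // rule group 5: a compound unit "a per b" is converted by dividing atomic coefficients
    public Double quotientCoefficient(String fromNum, String fromDen, String toNum, String toDen) {
        Double num = atomicCoefficient(fromNum, toNum, new HashSet<String>());
        Double den = atomicCoefficient(fromDen, toDen, new HashSet<String>());
        return (num == null || den == null) ? null : num / den;
    }

    public static void main(String[] args) {
        ConversionSketch c = new ConversionSketch();
        c.addCoefficient("USD", "EUR", 0.9);            // illustrative rate only
        c.addCoefficient("hour", "year", 1.0 / 2000);   // assuming 2000 working hours per year
        // rule group 6: normalized value = original value * conversion coefficient,
        // e.g. 10 USD per hour expressed in EUR per year:
        System.out.println(10 * c.quotientCoefficient("USD", "hour", "EUR", "year"));  // 18000.0
    }
}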

The normalization logic rules are similar in the two prototypes, but not identical. The expressive power of Prolog is much stronger than that of ASP. Any ASP solver is basically a propositional logic solver; the tools that allow introducing variables, function symbols etc. are called parsers (grounders), and they translate a logic program into a program suitable for ASP solvers. The result of this translation is a grounded propositional program. To make such a translation possible, the original program must fulfill some specific constraints, and thus some rules valid in Prolog are invalid in ASP. The most visible difference is in the rules for the conversion of compound units, which use function symbols. We used so-called stratified (leveled) predicates to allow the parser to generate a ground program for rules with function symbols.

The method itself and the differences between the prototypes are described in more detail in (Pázman, 2007).

A.1.3      Scenarios of Use

In general, the tool is used in one way only: as an offline batch normalization procedure. Given its performance, it is not appropriate for online values normalization. Because two LP variants are available, the one to be used has to be chosen in advance.

If the tool is to be extended to other reasoning tasks that require a logic consistency check, the ASP prototype is more appropriate. If the tool is used for normalization only, the Prolog prototype is preferable.

A.1.4      External Links and Publications

The project where the method was developed is described in:

Návrat, P. – Bieliková, M. – Rozinajová, V. (2005). Methods and Tools for Acquiring and Presenting Information and Knowledge in the Web. In: CompSysTech 2005, Rachev, B., Smrikarov, A. (Eds.), Varna, Bulgaria, June 2005. pp. IIIB.7.1–IIIB.7.6.

The detailed description of the normalization algorithm can be found in:

Pázman, R. (2007). Values Normalization with Logic Programming. In: Tools for Acquisition, Organisation and Presenting of Information and Knowledge (2), Návrat, P., Bartoš, P., Bieliková, M., Hluchý, L., Vojtáš, P. (Eds.), Proceedings in Informatics and Information Technologies, Research Project Workshop, Horský hotel Poľana, Slovakia, September 22–23, 2007, pp. 134–141.

A.2          Integration Manual

Valinor is written in Java (Standard Edition 5.0), uses either SWI-Prolog or the SMODELS ASP solver, and is distributed in the form of two JAR files and various configuration and data files. The tool consists of two layers: the domain independent ValinorTool and the domain specific ValinorWrapper. The tool depends on several other tools and libraries.

A.2.1      Dependencies

Valinor depends on the ITG-technology and OntoCM libraries. They have to be correctly installed and configured for Valinor to work. Depending on the approach used, either SMODELS or SWI-Prolog must also be installed and configured.

A.2.2      Installation

The procedure for installing Valinor into the environment of a web application is the following:

1.    Copy the files Nazou-Valinor*.jar into the directory WEB-INF/lib of the web application.

2.    Copy the files Valinor*.properties into the configuration directory (more about the directory can be found in the documentation of ITG-technology).

3.    Install the ASP solver SMODELS (together with the LParse parser) or SWI-Prolog, depending on the approach used.

4.    Create a working directory for Valinor.

5.    Copy the files *.p and Valinor*.sh into the working directory.

6.    Edit the domain and deployment specific settings in the files Valinor*.properties, including the path to the working directory. If needed, the common configuration file (Nazou-Commons.properties) can be adjusted based on the example provided in the distribution.

7.    Set up the launching of the tool on the application server, e.g. using the Valinor*-run.sh file, which should also be adjusted to the particular application environment.

8.    Add further conversion coefficients to the files coefficients-*.p according to specific needs.

9.    Increase the maximum level of the stratified (leveled) predicates for the conversion of compound units in the file normalization-asp.p, if more complex units need to be converted.

A.2.3      Configuration

The tool is configured using its configuration file. Depending on the approach used, the configuration parameters are stored either in the ValinorASP.properties or in the ValinorProlog.properties file.

Both include the configuration of the OntoCM library, specifying the ontology connection parameters. These can include e.g. SESAME_SERVER, SESAME_USER, SESAME_PASSWORD, SESAME_REPOSITORY_ID and SESAME_INFERENCING for the Sesame implementation of OntoCM. For a detailed description of OntoCM's configuration parameters, see the documentation of the OntoCM library.

The parameters common to both Valinor approaches are as follows:

§  WORKING_DIRECTORY – specifies Valinor's working directory;

§  EXTRACTION_FILE_NAME – specifies the name of the file containing the ontology properties to normalize, e.g. properties-asp.p;

§  RATES_FILE_NAME – specifies the name of the file containing the actual exchange rates, e.g. exchangerates-asp.p;

§  VALINOR_PROGRAMS – contains the list of logic program file names (separated by ';') which should be used for the computation of the values normalization.

Besides the common Valinor parameters, there are parameters specific to the ASP approach (an illustrative configuration example follows these lists):

§  LPARSE_PATH – specifies the path to the LPARSE executable file;

§  SMODELS_PATH – specifies the path to the SMODELS executable file.
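
For illustration, a ValinorASP.properties file could look roughly as follows. All values are examples only; the exact set of required OntoCM/Sesame keys should be taken from the OntoCM documentation, and the paths and file names must be adjusted to the actual environment.

# illustrative values only
SESAME_SERVER=http://localhost:8080/sesame
SESAME_USER=nazou
SESAME_PASSWORD=secret
SESAME_REPOSITORY_ID=joboffers
SESAME_INFERENCING=false
WORKING_DIRECTORY=/home/nazou/valinor
EXTRACTION_FILE_NAME=properties-asp.p
RATES_FILE_NAME=exchangerates-asp.p
VALINOR_PROGRAMS=domain.p;coefficients-asp.p;normalization-asp.p
LPARSE_PATH=/usr/local/bin/lparse
SMODELS_PATH=/usr/local/bin/smodels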

A.2.4      Integration Guide

The tool is usually used as a standalone tool, not directly (functionally) integrated with other tools. Typical usage is to regularly run one of its two main routines, which automatically recognizes the ontology properties that should be normalized, computes the normalized values and writes them back into the ontology. In the domain of job offers, it computes normalized salaries for all job offers added since the last run of the tool.

The two main routines correspond to the two approaches used. For the ASP approach, the main method of the JobWrapperASP class should be called. For the Prolog approach, the main method of the JobWrapperProlog class should be called. Both take the same parameter: the name of the *.properties file used to configure the tool. If no value is supplied, the default properties file name is used (ValinorASP.properties or ValinorProlog.properties, respectively).
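
Besides the shell script shown below, the routines can also be invoked from Java code. The following sketch assumes only what is stated above, namely that JobWrapperASP exposes a standard main method taking the properties file name as its first argument; the launcher class itself is hypothetical.

import sk.softec.nazou.valinor.wrapper.jobs.JobWrapperASP;

// Hypothetical launcher class; it simply delegates to the tool's main routine.
public class ValinorBatchRun {
    public static void main(String[] args) throws Exception {
        // an explicit properties file name; without an argument the default
        // ValinorASP.properties would be used
        JobWrapperASP.main(new String[] { "ValinorASP.properties" });
    }
}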

Typical usage of, e.g., the ASP approach is the following (Linux shell script):

APP_LIB=/home/share/apache-tomcat-5.5.20/webapps/cocoon/WEB-INF/lib

CP=$APP_LIB/../classes
CP=$CP:$APP_LIB/Nazou-ValinorTool-1.0.jar
CP=$CP:$APP_LIB/Nazou-ValinorWrapper-1.0.jar
CP=$CP:$APP_LIB/Nazou-OntoCM-2.3.jar
CP=$CP:$APP_LIB/Nazou-ITG-2.3.jar
CP=$CP:$APP_LIB/sesame.jar
CP=$CP:$APP_LIB/rio.jar
CP=$CP:$APP_LIB/openrdf-model.jar
CP=$CP:$APP_LIB/openrdf-util.jar
CP=$CP:$APP_LIB/log4j-1.2.13.jar

$JAVA_HOME/bin/java -cp $CP sk.softec.nazou.valinor.wrapper.jobs.JobWrapperASP

The APP_LIB variable is the path to the web application where Valinor and the other tools are installed and should be adapted to the environment. The versions in the library file names should also be changed to match the current environment.

A.3          Development Manual

A.3.1      Tool Structure

The tool is separated into two parts: the domain independent ValinorTool, which provides the core normalization computations, and the domain specific ValinorWrapper, which allows ValinorTool to compute values normalization in the domain of job offers.

The ValinorTool component consists of the following packages:

§  sk.softec.nazou.valinor – common Valinor functionality; it contains support for Valinor's runner (the main Valinor routine), input data extraction from the ontology, and a structure for describing the quantities to be normalized.

§  sk.softec.nazou.valinor.asp – support for the ASP prototype of Valinor; it contains the ASP runner and classes for special processing of ASP programs and for processing the resulting converted values.

§  sk.softec.nazou.valinor.prolog – support for the Prolog prototype of Valinor; it contains only the Prolog runner.

The ValinorWrapper component consists of the following packages:

§  sk.softec.nazou.valinor.wrapper.jobs – support for the job offers domain; it contains the job quantity descriptions (descriptions of the properties which should be normalized) and domain specific runners for both Prolog and ASP. It also contains support for converting from and to the AED currency.

§  sk.softec.nazou.valinor.wrapper.nbs – contains the processing of NBS (National Bank of Slovakia, www.nbs.sk) exchange rates, which are used to convert currencies during normalization.

A.3.2      Method Implementation

To support both Valinor approaches (ASP and Prolog), we created two prototypes. The ASP prototype uses the ASP solver SMODELS and its affiliated parser LPARSE, open-source tools for computing answer sets of logic programs. We chose this solver and parser because of their language extensions, namely the support for the function symbols the tool uses for compound units. The Prolog prototype uses SWI-Prolog as the logic programming engine. SWI-Prolog is an open-source Prolog system widely used in research and education, but it can be found in commercial applications as well.

The tool works, independently of the LP approach used, in the following steps (an illustrative sketch of step 3 follows the list):

1.    The tool downloads current exchange rates from the NBS web site and prepares a logic program with the conversion coefficients for currency units.

2.    The tool reads from the ontology repository those properties of job offers (salaries) which ought to be normalized and prepares a logic program with them.

3.    The programs prepared in the previous steps are joined with the universal conversion and normalization program. The resulting program is sent to the LP engine (either SWI-Prolog or LPARSE with SMODELS).

4.    The results of the computation are the expected normalized values, which are written into the ontology repository afterwards.
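
The following sketch illustrates step 3 for the ASP approach. It assumes the usual LPARSE/SMODELS pipeline and the logic program file names mentioned elsewhere in this appendix; the paths, file list and shell invocation are illustrative, since in the real tool they are taken from the configuration and executed by RunnerASP.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Illustrative sketch of step 3 (ASP approach): the prepared logic programs are
// grounded by LPARSE and solved by SMODELS; the printed answer set would then be
// parsed in step 4 and the normalized values written back into the ontology.
public class AspPipelineSketch {
    public static void main(String[] args) throws Exception {
        String cmd = "lparse domain.p coefficients-asp.p exchangerates-asp.p "
                + "properties-asp.p normalization-asp.p | smodels";
        Process p = new ProcessBuilder("/bin/sh", "-c", cmd)
                .redirectErrorStream(true)
                .start();
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println(line);   // the answer set containing the normalized values
        }
        p.waitFor();
    }
}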

The tool is implemented using the main classes described below. For a more detailed description of these and other classes, see the Javadoc documentation of the Valinor tool.

Classes in the package sk.softec.nazou.valinor:

§  InputDataExtraction – support for obtaining properties from an ontology.

§  QuantityDescription – structure for storing the properties of a quantity which should be normalized (e.g. salary).

§  Runner – abstract class for Valinor's runner; it controls the whole process of normalization.

Main classes in the package sk.softec.nazou.valinor.asp:

§  ResultDataProcessing – processes result data from normalization performed by ASP.

§  RunnerASP – concretization of the Runner class; it covers the whole process of values normalization in the ASP prototype.

Class in the package sk.softec.nazou.valinor.prolog:

§  RunnerProlog – concretization of the Runner class; it covers the whole process of values normalization in the Prolog prototype.

Classes in the package sk.softec.nazou.valinor.wrapper.jobs:

§  JobQuantityDescriptions – creates quantity descriptions for job offers domain (instances of the QuantityDescription class).

§  JobWrapperASP – main runner for the ASP prototype in the job offer domain; it contains the main method.

§  JobWrapperProlog – main runner for the Prolog prototype in the job offer domain; it contains the main method.

§  OANDACurrencyRater – converts from and to the AED currency. All other currencies are processed by the classes from the following package.

Main classes in the package sk.softec.nazou.valinor.wrapper.nbs:

§  NBSStore – retrieves current exchange rates from the NBS web site.

§  XMLMiner – retrieves data from an XML source (used for parsing the XML file with current exchange rates).

A.3.3      Enhancements and Optimizing

The tool is well suited for the conversion of any values for which the conversions between their atomic units are known. The principles of the method could also be applied in other automatic data conversion scenarios (not only values normalization).

Both prototypes provide normalized ontology properties, and both satisfy the requirements from the first section. The distinctions between them in terms of their appropriate usage are described in more detail in (Pázman, 2007). In conclusion, for the purpose of values normalization the Prolog approach is more suitable.

The prototypes are implemented to work with the test data set. In order to use them with richer data, further coefficients should be added for the currency or time period atomic units. Also, for the conversion of more complex units in the ASP prototype, the depth of the stratified (leveled) predicates (e.g. simpleCoefficient or exchangeCoefficient) has to be increased, which means adding one or more rules for each stratified predicate.

Neither the ASP nor the Prolog prototype is sufficiently optimized, and we see room for improvement here; we have not yet paid enough attention to the performance of the prototypes.

A.4          Manual for Adaptation to Other Domains

A.4.1      Configuring to Other Domain

As already mentioned, the tool consists of two components: the domain independent ValinorTool and the domain specific ValinorWrapper. ValinorTool need not be changed when moving to another domain. ValinorWrapper should be completely revised, which mainly means the following (a hypothetical sketch follows the list):

§  to re-implement the top level control of the normalization (classes JobWrapperASP and JobWrapperProlog),

§  to change the definition of the properties which should be normalized (the class JobQuantityDescriptions and the logic program domain.p),

§  to specify known and static conversion coefficients between atomic units (logic programs coefficients-asp.p and coefficients-prolog.p),

§  to implement the acquisition of any other conversion coefficients (similarly to the classes in the package sk.softec.nazou.valinor.wrapper.nbs or to the class OANDACurrencyRater).
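
As a purely hypothetical illustration, the top level control for a different domain (here, real-estate offers) could follow the outline below. The class is invented, and the real signatures of the runners, QuantityDescription and the wrapper classes should be taken from the Javadoc documentation.

// Hypothetical outline only; the signatures of the Valinor classes may differ.
public class EstateWrapperProlog {
    public static void main(String[] args) throws Exception {
        String propertiesFile = (args.length > 0) ? args[0] : "ValinorProlog.properties";
        // 1. describe the ontology properties to normalize (the counterpart of JobQuantityDescriptions),
        //    together with an adapted domain.p and coefficients-prolog.p
        // 2. obtain any dynamic conversion coefficients (the counterpart of the NBS/OANDA classes)
        // 3. configure and start the domain independent Prolog runner with propertiesFile
        System.out.println("Would run Valinor with configuration " + propertiesFile);
    }
}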

A.4.2      Dependencies

The tool has no dependencies on other tools as far as adaptation to other domains is concerned.