A              CriteriaSearch – Criteria-Based Ontology Search

A.1          Basic Information

The CriteriaSearch tool allows a user to search for information in an information domain by specifying criteria that match his or her expectations. The information being searched is stored in an ontological repository. The user enters the criteria in an input form, which is generated dynamically from the criteria definition of the domain.

During the search, the user’s criteria are translated into the query language of the ontological repository. The result of a search is a list of entities that satisfy the input criteria. Each result is rated during the search; the rating expresses the degree of conformity between the input criteria and the attributes of the result.

Even though the tool is domain independent in principle, a real-world application domain has been specified for testing purposes – the domain of job offers. The input criteria in this case are e.g. salary, job position, job location, level of education etc. The user specifies values for some (or all) of these criteria such as:

§  Salary: at least 50000 U.S. dollars / year.

§  Job position: Software Engineer.

§  Job location: New York or Toronto.

§  Level of Education: Master’s degree.

The user can specify multiple values for a criterion (see job location in the example above).

The result of a search in this domain is a list of specific job offers sorted by rating, i.e. by the degree to which a concrete job offer suits the user’s expectations.

The user can also search for offers that are similar to one specific offer already found. In this case, the input criteria are extracted from that offer and used as the basis for a new search.

A.1.1      Basic Terms

Domain

The information area in which the tool searches for entities of a given type. We use the domain of job offers as an experimental information space.

Repository

Repository is an ontological database that holds all information about a specific domain.

Criterion

An input variable, i.e. a property of the searched objects, e.g. salary. Its value (or values) is used to search for results.

Offer

This term represents an entity being searched, e.g. a job offer in the domain of job offers. Every search returns a set (0-to-n) of offers.

Rating

A number that stands for the degree of how a result entity (offer) suits the value of one input criterion or input criteria as a whole.

Distance

This term specifies how distant (different) two values are. It’s inversely related to the rating.

Criterion modifiers

Each criterion has a set of modifiers, which determine the weight (importance) and other parameters of the criterion in the overall rating.

A.1.2      Method Description

The method searches for offers in the ontological repository on the basis of criteria entered by a user. The user specifies criteria by filling out an input form. Each criterion has its own input form. The current implementation supports two kinds of criteria, distinguished by their input method: criteria with direct value input (e.g. numbers, strings and dates) and hierarchical criteria. Hierarchical criteria allow the user to browse a hierarchy and choose the value that is closest to his or her preferences. A big advantage of the ontological databases used in this implementation is the ease of defining and querying hierarchies.

The method searches not only for offers that comply with the search criteria directly, but also for offers that do not fulfill them exactly. For example, if the user is interested in job offers in Chicago with a salary of at least $50 000, the search method can also return a job offer in a different place in the same state with a salary of $45 000, but with a lower rating than a perfectly complying job offer.

The user can express not only the expected values of the properties of an ideal offer, but can also influence the search results through three types of criteria modifiers: the importance, the precision and the obligation of a criterion. In addition, each criterion can have several search values, and for each criterion the user can choose whether to search for all input values (logical conjunction) or for any one of them (logical disjunction).
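The modifiers described above can be pictured as a small value class attached to each criterion. The following is only an illustrative sketch: the class name CriterionSettings, its fields and the assumed value range 0..1 for importance and precision are hypothetical and do not reflect the tool’s actual types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one criterion's search values and modifiers (not the tool's real class). */
final class CriterionSettings {
    /** How several search values of one criterion are combined. */
    enum Combination { ALL_VALUES /* conjunction */, ANY_VALUE /* disjunction */ }

    final String name;
    final List<String> searchValues;
    final double importance;   // assumed range 0..1: weight in the overall rating
    final double precision;    // assumed range 0..1: strictness of the distance-to-rating conversion
    final boolean obligatory;  // obligatory criteria become part of the search query
    final Combination combination;

    CriterionSettings(String name, List<String> searchValues, double importance,
                      double precision, boolean obligatory, Combination combination) {
        if (importance < 0 || importance > 1 || precision < 0 || precision > 1) {
            throw new IllegalArgumentException("importance and precision must lie in [0, 1]");
        }
        this.name = name;
        // Defensive copy so that later changes to the caller's list do not leak in.
        this.searchValues = Collections.unmodifiableList(new ArrayList<>(searchValues));
        this.importance = importance;
        this.precision = precision;
        this.obligatory = obligatory;
        this.combination = combination;
    }
}
```

With such a class, the job location from the example above would be one criterion with two search values (New York, Toronto) combined by disjunction.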

The method also cooperates with the methods for semantic logging (SemanticLog) and for analyzing the semantic log (LogAnalyzer) in order to adjust to the users’ preferences. The values the user selects for all three modifiers are logged by SemanticLog and subsequently analyzed and transformed by LogAnalyzer into the user ontology. All subsequent searches by the user are evaluated according to the user properties stored in the user ontology.

The search method works in the following steps. A more detailed description of the search and rating mechanism can be found in (Pázman, 2006).

1.    The user inputs search criteria and their modifiers. The search query is created from the obligatory criteria.

2.    The search query is used to select all offers complying with it. The found offers form the set the method returns as a search result.

3.    Each found offer is rated according to the following steps:

a.    The rating of each input value of a criterion is computed in relation to each value of the found offer’s respective property. The computation is based on finding the distance between the two values and converting the distance to a rating.

The distance for numerical criteria (including dates) is computed as a weighted difference between the search value and the offer’s value. The distance for text criteria is determined by the number of words in the query which are not contained in the offer’s value.

The evaluation of the method focuses mainly on hierarchical criteria. The distance between two values in a hierarchy is based on the length of the shortest path between them. The edges of the path do not all have the same distance: edge distances increase towards the top of the hierarchy, and the distance from a parent to its child is significantly smaller than the distance in the opposite direction.

Afterwards, the rating is computed from the distance as its complement to 100%, adjusted by the precision in such a way that high values of precision penalize even small differences between the desired value and the offer’s value.

b.    The rating of each search value of a criterion is determined as a maximum of the ratings of all offer’s respective property values.

c.    The rating of a single search criterion for an offer is determined as a weighted average of two numbers: the arithmetic average of the ratings of the criterion’s values, and the lowest of these ratings (for logical conjunction) or the highest of them (for logical disjunction).

d.    The overall rating for an offer is computed as the weighted average of the criteria’s ratings, where the weights are taken from the importance modifiers of the criteria.

4.    The last step is sorting of the found offers according to their ratings.
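Steps 3b to 3d can be sketched as three small functions. This is a minimal illustration, not the tool’s implementation: the class and method names are hypothetical, ratings are assumed to lie in the range 0..1, and the parameter extremeWeight (the share given to the minimum or maximum in step 3c) is an assumption, because the text does not state the weights used.

```java
/** Illustrative sketch of rating steps 3b-3d; names and the extremeWeight parameter are assumptions. */
final class RatingAggregationSketch {
    private RatingAggregationSketch() {}

    /** Step 3b: a search value is rated by the best match among the offer's property values. */
    static double ratingOfSearchValue(double[] ratingsAgainstOfferValues) {
        double best = 0.0;
        for (double r : ratingsAgainstOfferValues) {
            best = Math.max(best, r);
        }
        return best;
    }

    /** Step 3c: weighted average of the mean and the minimum (conjunction) or maximum (disjunction). */
    static double ratingOfCriterion(double[] valueRatings, boolean conjunction, double extremeWeight) {
        if (valueRatings.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        double min = valueRatings[0];
        double max = valueRatings[0];
        for (double r : valueRatings) {
            sum += r;
            min = Math.min(min, r);
            max = Math.max(max, r);
        }
        double mean = sum / valueRatings.length;
        double extreme = conjunction ? min : max;
        return extremeWeight * extreme + (1.0 - extremeWeight) * mean;
    }

    /** Step 3d: weighted average of the criteria ratings, weighted by importance. */
    static double overallRating(double[] criterionRatings, double[] importances) {
        if (criterionRatings.length != importances.length) {
            throw new IllegalArgumentException("one importance per criterion is required");
        }
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < criterionRatings.length; i++) {
            weighted += criterionRatings[i] * importances[i];
            total += importances[i];
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }
}
```

For two value ratings 1.0 and 0.5 with extremeWeight 0.5, conjunction gives 0.5 * 0.5 + 0.5 * 0.75 = 0.625, while disjunction gives 0.5 * 1.0 + 0.5 * 0.75 = 0.875, so a criterion whose values must all match is rated more strictly.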

The search and rating method is also used for finding offers similar to a given offer. A search query is constructed from the offer in such a way that each value of its properties is used as a search value for the respective criterion. Running this query yields the offers similar to the given offer.
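The construction of a similarity query can be sketched as a simple mapping from the offer’s property values to search values. The names below are hypothetical, and offers and criteria are simplified to plain string maps; the real tool reads the values from the ontological repository.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch: turn the property values of one offer into search criteria. */
final class SimilarityQuerySketch {
    private SimilarityQuerySketch() {}

    /**
     * @param offerProperties property name -> values of the reference offer
     * @param knownCriteria   names of properties that are defined as search criteria
     * @return criterion name -> search values, in the iteration order of offerProperties
     */
    static Map<String, List<String>> criteriaFromOffer(Map<String, List<String>> offerProperties,
                                                       Set<String> knownCriteria) {
        Map<String, List<String>> criteria = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> property : offerProperties.entrySet()) {
            boolean isCriterion = knownCriteria.contains(property.getKey());
            if (isCriterion && !property.getValue().isEmpty()) {
                // Every value of the property becomes a search value of the criterion.
                criteria.put(property.getKey(), new ArrayList<>(property.getValue()));
            }
        }
        return criteria;
    }
}
```

Properties that are not defined as criteria, or that have no value in the reference offer, simply do not contribute to the query in this sketch.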

A.1.3      Scenarios of Use

The tool can be used in two ways:

§  It can be used as a standalone search application for searching offers stored in an ontology. The search result can be presented in two ways: using the tool’s own presentation, or in the faceted semantic browser (Factic).

§  It can be used as a library for searching and rating, including similarity search. The input to the library is a set of search criteria; the output is a list of rated offers that fulfill (at least partially) the given search criteria.

A.1.4      External Links and Publications

The project where the method was developed is described in:

Návrat, P. – Bieliková, M. – Rozinajová, V. (2005). Methods and Tools for Acquiring and Presenting Information and Knowledge in the Web. In: CompSysTech 2005, Rachev, B., Smrikarov, A. (Eds.), Varna, Bulgaria, June 2005. pp. IIIB.7.1–IIIB.7.6.

The detailed description of the search and rating mechanism of the tool can be found in:

Pázman, R. (2006). Ontology Search with User Preferences. In: Tools for Acquisition, Organisation and Presenting of Information and Knowledge, Návrat, P., Bartoš, P., Bieliková, M., Hluchý, L., Vojtáš, P. (Eds.), Proceedings in Informatics and Information Technologies, Research Project Workshop, Bystrá dolina, Nízke Tatry, Slovakia, September 29–30, 2006, pp. 139–147.

A.2          Integration Manual

CriteriaSearch is written in Java (Standard Edition 5.0) and is distributed as a set of three JAR files and several configuration and data files. The tool depends on some other tools and libraries (the dependencies are discussed in detail in the next section). The tool has two types of output: a web-based presentation of the results and a data structure containing these results. The methods of the tool are accessed through a set of interfaces (mainly IDisplaySearchResults).

A.2.1      Dependencies

The tool CriteriaSearch directly uses the following tools and libraries:

§  the relational interface to the corporate memory – CorporateMemory,

§  the ontological interface to the corporate memory – OntoCM,

§  the integration technology – ITG-technology,

§  the tool for semantic faceted browsing – Factic (CriteriaSearch implements some of its Java interfaces),

§  the tool for semantic logging of user events – SemanticLog,

§  the tool for analysis of the logged events – LogAnalyzer.

It also requires Apache Cocoon, MySQL and Sesame to run on a server.

In addition, it depends on a Sesame ontology repository filled with offers, their properties and the hierarchies for hierarchical properties.

A.2.2      Installation

The following steps have to be done to be able to use CriteriaSearch:

1.    Apache Tomcat (or another servlet container), Apache Cocoon, MySQL and Sesame must be correctly installed and configured. MySQL is required when using some of the advanced indexing methods discussed later in this documentation.

2.    Install all necessary libraries (see the section above) into Cocoon, which mainly means copying their JAR files to the WEB-INF/lib directory and configuring the libraries. For detailed installation instructions see their documentation.

3.    Copy CriteriaSearch*.jar from the tool distribution to Cocoon’s WEB-INF/lib directory.

4.    Copy CriteriaSearch.xml and CS*.sh from the tool distribution into build/webapp/nazou/tools/CriteriaSearch/ located in Cocoon’s root directory.

5.    Copy CriteriaSearch.properties and CSCriteriaDescription_*.xml from the tool distribution into build/webapp/nazou/tools/config/ located in Cocoon’s root directory.

6.    Set the correct values and attributes in CriteriaSearch.properties and in Nazou-Commons.properties in build/webapp/nazou/tools/config/ located in Cocoon’s root directory. Most of the values specify the location, port, username and password of the database used, but there are also language settings and other properties. For detailed information see the section Configuration.

7.    Set up the recurring creation of indexes and other off-line data by calling the procedures specified in CSCron.sh from build/webapp/nazou/tools/CriteriaSearch/ located in Cocoon’s root directory. Edit the script CSMakeIndexes.sh and adjust the environment related settings in it.

8.    Define a coplet in portal.xml located in portal\profiles\copletdata, e.g.:

<coplet-data id="CS" name="standard">
    <title>Criteria Search</title>
    <coplet-base-data>CachingURICoplet</coplet-base-data>
    <attribute>
           <name>buffer</name>
           <value xsi:type="java:java.lang.Boolean">false</value>
    </attribute>
    <attribute>
           <name>cache-enabled</name>
           <value xsi:type="java:java.lang.Boolean">false</value>
    </attribute>
    <attribute>
           <name>cache-global</name>
           <value xsi:type="java:java.lang.Boolean">false</value>
    </attribute>
    <attribute>
           <name>handleParameters</name>
           <value xsi:type="java:java.lang.Boolean">true</value>
    </attribute>
    <attribute>
           <name>uri</name>
           <value xsi:type="java:java.lang.String">
                 cocoon:/coplets/html/application
           </value>
    </attribute>
    <attribute>
           <name>temporary:application-uri</name>
           <value xsi:type="java:java.lang.String">
                 cocoon://nazou/tools/CriteriaSearch/CriteriaSearch.xml
           </value>
    </attribute>
</coplet-data>

9.    Add one or more coplets to the web page by specifying them in portal-user-anonymous.xml located in portal\profiles\layout, e.g.:

<named-item name="Criteria Search">
    <parameter name="height" value="100"/>
    <coplet-layout name="coplet" layout-renderer-name="nowindow">
           <coplet-instance-data>CS</coplet-instance-data>
    </coplet-layout>
</named-item>

10.    Finally, add an entry for each coplet to portal-user-anonymous.xml in portal\profiles\copletinstancedata, e.g.:

<coplet-instance-data id="CS-1" name="standard">
    <coplet-data>CS</coplet-data>
</coplet-instance-data>

Now, it is possible to access the web application using any standard browser entering the correct URL (e.g. http://localhost:8080/cocoon/samples/blocks/portal/).

A.2.3      Configuration

The configuration files (Nazou-Commons.properties and CriteriaSearch.properties) need to be adapted in order to make CriteriaSearch work correctly. The first file specifies default properties (useful when several tools run on the same server). The second one specifies properties specific to CriteriaSearch. If both files contain the same key, the value from the second file overrides the value from the first file. For more information on configuring tools see the documentation of ITG-technology.
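The override rule between the two files corresponds to the defaults mechanism of java.util.Properties. The sketch below only illustrates that rule; how ITG-technology actually loads the files is not described here. The class name is hypothetical, and the file contents are passed in as strings to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative sketch of layered configuration: tool-specific values override shared defaults. */
final class LayeredConfigSketch {
    private LayeredConfigSketch() {}

    /**
     * @param commonsText content of the shared defaults (e.g. Nazou-Commons.properties)
     * @param toolText    content of the tool-specific file (e.g. CriteriaSearch.properties)
     */
    static Properties load(String commonsText, String toolText) {
        try {
            Properties commons = new Properties();
            commons.load(new StringReader(commonsText));
            // Properties created with defaults fall back to them for missing keys.
            Properties tool = new Properties(commons);
            tool.load(new StringReader(toolText));
            return tool;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the fallback works through getProperty(); keys that exist only in the defaults are not returned by get() or containsKey().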

The language hierarchy can be set by changing the value of the LANGUAGE_HIERARCHY key. The value SK;EN means that Slovak labels are used and, where no Slovak label exists, the English label is used instead. Currently, only English and Slovak are supported.
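The fallback behaviour of the language hierarchy can be illustrated as follows. The class and method names are hypothetical, and labels are simplified to a map from language code to label; the tool itself obtains labels from the ontology.

```java
import java.util.Map;

/** Illustrative sketch of label selection according to a value such as "SK;EN". */
final class LabelFallbackSketch {
    private LabelFallbackSketch() {}

    /** Returns the label in the first language of the hierarchy that has one, or null if none has. */
    static String pickLabel(String languageHierarchy, Map<String, String> labelsByLanguage) {
        for (String language : languageHierarchy.split(";")) {
            String label = labelsByLanguage.get(language.trim());
            if (label != null) {
                return label;
            }
        }
        return null;
    }
}
```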

The distance-computing method is specified by the key CRITERIASEARCH_TYPE. The value is always the fully qualified Java name of a class which extends the HierarchyDistance class.
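A class named in a configuration value is typically instantiated by reflection. The following sketch shows one way this could be done; it is not taken from the tool’s source. HierarchyDistanceStandIn is a hypothetical stand-in for the real HierarchyDistance class (whose constructor and method signatures are not documented here), and a public or package-visible no-argument constructor is assumed.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for the tool's HierarchyDistance base class. */
abstract class HierarchyDistanceStandIn {
    abstract double getDistance(String fromNode, String toNode);
}

/** Trivial example strategy: distance 0 for identical nodes, 1 otherwise. */
class EqualityDistance extends HierarchyDistanceStandIn {
    @Override
    double getDistance(String fromNode, String toNode) {
        return fromNode.equals(toNode) ? 0.0 : 1.0;
    }
}

/** Illustrative sketch: create the distance strategy named by a configuration value. */
final class DistanceLoaderSketch {
    private DistanceLoaderSketch() {}

    static HierarchyDistanceStandIn load(String className) {
        try {
            // asSubclass rejects classes that do not extend the expected base class.
            Class<? extends HierarchyDistanceStandIn> type =
                    Class.forName(className).asSubclass(HierarchyDistanceStandIn.class);
            Constructor<? extends HierarchyDistanceStandIn> constructor = type.getDeclaredConstructor();
            return constructor.newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot use distance class " + className, e);
        }
    }
}
```

Failing early with a clear message is useful here, because a mistyped class name in the properties file would otherwise surface only during the first search.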

The key named CRITERIA_DESCRIPTION_FILE specifies the name of the file which contains the criteria description for the specific domain.

The criteria description is included in the distribution of the tool – the files named CSCriteriaDescription_noninfer.xml and CSCriteriaDescription_infer.xml define criteria of the job offer domain, one for a repository with RDFS inference, the other one without it.

Another key named OFFER_RDF_TYPE specifies the type of the concrete offer.

The FACTIC_MODE property specifies whether to display the search results using CriteriaSearch or to redirect to the Factic tool and send the results using session parameters.

The SEMANTIC_LOG_IMPLEMENTATION key holds the name of the class which implements the ISemanticLog interface. This class is used to log the user’s preferences concerning the modifiers.

The other properties in the tool’s configuration file come from the configuration of the OntoCM library. They include the following keys: FACTORY_NAME, SESAME_SERVER, SESAME_USER, SESAME_PASSWORD, SESAME_REPOSITORY_ID, SESAME_INFERENCING, NAMESPACES_BASE, NS_SHORTCUT_* and NS_URI_*. For more detailed information about these properties, see the OntoCM documentation.

The indexing component (CriteriaSearchIndexes, see below) also needs a connection to the ontology through the CorporateMemory library, which has its own separate configuration files. For the configuration of this library see its documentation.

A.2.4      Integration Guide

The tool CriteriaSearch can be used either as a web application that presents the user with an interface, or through the IDisplaySearchResults interface, which gives access to the tool’s functionality only.

These two types of output produced by the CriteriaSearch tool are discussed in this section.

The first one is the visual presentation. It consists of three kinds of pages generated by four main procedures. The first page is the search page, where the user selects values for the individual criteria (see e.g. Figure 2). The user can select multiple values and change the modifiers of each criterion. When the selection is finished, the search is started with the “Start Search” link. The user is then presented with the results table (see Figure 4). The table can show either only the first 15 results or all of them. To get to the next page, the user selects one of the offers. That page presents detailed information about the selected offer in a table (see Figure 5). At this point the user can either start a new search (this option is available on every page) or search for similar offers. All of these procedures are discussed in detail later in this document.

The tool’s second output returns only plain data. The classes which are responsible for producing data output implement interfaces specified in the Factic JAR archive named Nazou-Factic-Integration.jar (see the Factic documentation). This JAR file specifies three interfaces and an implementation of one of them:

§  IDisplaySearchResults

This is the main interface of this package. It declares four methods but only two of them are relevant for CriteriaSearch – findSimilar() and getSearchResults() – and are implemented in the class CSDisplaySearchResults. The first method searches for similar offers for a specific offer and returns them as a list of ISearchResult objects (these are instances of the SearchResultImpl class). The second method provides a list of search result instances for presentation and is used for passive presentation of search results in the Factic GUI.

§  IRestrictedAttribute

This interface is not relevant for the use of CriteriaSearch. There is no implementation of this interface in this project.

§  ISearchResult

This interface specifies one result returned from a search process. It declares two methods – getRating() and getURI(). The first of them returns the rating of the found job offer. The second one returns a unique identifier of a job offer.

§  SearchResultImpl

This class implements the two methods specified by ISearchResult.

The following code demonstrates how to use this interface:

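// Create the CriteriaSearch implementation of the interface from the configuration file.
IDisplaySearchResults displaySearchResult = new CSDisplaySearchResults(CONFIG_FILE);

// Obtain the ontology memory; it is used here to expand a prefixed name to a full URI.
IOntoMemory memory = OntoCreator.getMemoryFactory(CONFIG_FILE).getMemoryInstance();

// Find at most 5 job offers similar to the given offer (no user URI is supplied).
List<ISearchResult> result = displaySearchResult.findSimilar(null, "jo:JobOffer", 5, memory.getFullURI("join:S006_cierny_01054"));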

A.3          Development Manual

A.3.1      Tool Structure

The tool CriteriaSearch consists of three parts / components – CriteriaSearchTool, CriteriaSearchWrapper and CriteriaSearchIndexes. The first one is the core of the whole tool. It defines interfaces and classes, which are independent of the application domain (e.g. job offers). Wrapper consists of classes which implement the interfaces specified in Tool. These classes are domain specific. The third part is responsible for speeding up the search procedures by using indexes stored in a MySQL database.

Figure 1. CriteriaSearch’s components dependencies. There are also packages visible for each component.

Tool component consists of the following packages:

§  sk.softec.nazou.criteria.tool – provides classes and interfaces for searching and visualizing criteria and results.

§  sk.softec.nazou.criteria.tool.criteria – holds classes which specify domain-independent search criteria.

§  sk.softec.nazou.criteria.tool.search – specifies the default distance calculation method and auxiliary methods used by searching and rating.

Wrapper component consists of the following package:

§  sk.softec.nazou.criteria.wrapper – provides wrapper for the Tool component (supplies environment- and domain-specific settings to the other components) and other auxiliary classes.

Indexes component consists of the following packages:

§  sk.softec.nazou.criteria.indexes – holds classes, which are used for creating indexes on the ontological database.

§  sk.softec.nazou.cs.indexes.matrix – implements methods that are responsible for creating and using of the matrix indexes.

§  sk.softec.nazou.cs.indexes.string – implements methods that are responsible for creating and using of the string indexes.

A.3.2      Method implementation

The main Wrapper class

The central control class of the tool is CSWrapper from the sk.softec.nazou.cs.wrapper package. There are four public methods that are responsible for all procedures used from outside of the tool – from selecting results from the ontological repository to the visualization of the data. All these methods take an instance of a NazouContext object (defined in ITG-technology library) as an input parameter which holds session parameters in the Cocoon environment.

These methods are:

§  public String getSearchPage( NazouContext ctx ) – This method presents the user with a web-based graphical user interface where he can choose search criteria.

§  public String getSearchResults( NazouContext ctx ) – This method starts a search with search criteria as input and visualizes the search results.

§  public String getSimilarResults( NazouContext ctx ) – This method starts a search and visualizes the search results just like the method above but creates the search criteria from an existing job offer.

§  public String getDetailResult( NazouContext ctx ) – This method loads and visualizes details of one selected job offer.

Exact procedures invoked from these methods are discussed in this section.

The method getSearchPage() first reads the criteria values that have already been selected. It then calls the getSearchPage method of the class CriteriaInput, which takes six input parameters: the domain-specific information about criteria, the selected criteria values, the user instance (to provide user-specific modifiers), the name of the current class and two strings that name the methods that can be called from the resulting screen. The first string holds the name of the current method, which is called whenever a value for a criterion is selected; i.e. as values are selected one after another, the page remains the same and only the data (the selected criteria values) changes. The page is redisplayed with the new data (see Figure 2) until the search is started.

Figure 2. Method getSearchPage’s output (no criterion value selected).

You can see how the user interface changes if a value is selected (Figure 3). As mentioned before, the page remains the same. Only the data presented changes.

Figure 3. Method getSearchPage’s output (value selected).

The getSearchPage method of the CriteriaInput class gathers data about the criteria and combines this data into one IPage object. Objects of this type are abstract representations of a user interface and are tightly coupled with a class implementing IInterfaceBuilder, the interface which produces IPage objects.

Another important method of the class CriteriaInput is getActualCriterion. It is responsible for creating the visual output of criterion values (marked by red rectangles in Figure 2 and Figure 3). First, it determines the input method (data field or hierarchy browser) for the specific criterion, and then it presents the criterion values.

Finally, the getSearchCriteriaPage method of IInterfaceBuilder is called. This method wraps the previously generated output with standard visualization elements such as a header and a footer.

The getSearchResults() method starts by extracting the search criteria from the session parameters and converting them. A query is generated from these criteria and its results are stored in a list of ResultFound objects. Each of these objects stores the identifier (URI) and the rating of a job offer. All of this happens in the getSearchResults method of the Search class.

First, getSearchResults selects the results matching the search criteria (the selectResults method of the SelectResultHelper class). Afterwards, it calculates the rating of each result with respect to the criteria using the rateResultAccordingToCriteria method of the RateResultHelper class (the rating algorithm is described in the section Method Description). Finally, the results are sorted by their rating.
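The final ordering step can be sketched as a sort by descending rating. RatedOffer below is a hypothetical stand-in for the tool’s ResultFound class, whose actual fields and accessors are not documented here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ResultFound: the URI of an offer together with its rating. */
final class RatedOffer {
    final String uri;
    final double rating;

    RatedOffer(String uri, double rating) {
        this.uri = uri;
        this.rating = rating;
    }
}

/** Illustrative sketch of the last search step: best-rated offers first. */
final class ResultSortingSketch {
    private ResultSortingSketch() {}

    static List<RatedOffer> sortByRating(List<RatedOffer> found) {
        // Work on a copy so that the caller's list keeps its original order.
        List<RatedOffer> sorted = new ArrayList<>(found);
        sorted.sort((a, b) -> Double.compare(b.rating, a.rating));
        return sorted;
    }
}
```

Because List.sort is stable, offers with equal ratings keep the order in which the query returned them.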

Figure 4. Method getSearchResults’ output.

The method getSearchResults of the ToolResults class selects some basic information about each offer returned by the procedures described above and puts it, together with the offer’s rating, into a table (see Figure 4). This method also uses the IInterfaceBuilder interface for output; the output is wrapped by the getResultsPage method of the interface.

The getDetailResult() method is responsible for displaying detailed information about one job offer. First, it gathers data from the session parameters. Afterwards, it calls the getDetailResultPage method of the ToolResults class.

The getOfferDetails() method of this class loads detailed information about the job offer (by calling the loadDetails method of the JobOffer class) and prints it into a table as shown in Figure 5.

Figure 5. Method getDetailResult’s output.

Afterwards, this table is wrapped (header, footer, etc.) using the getResultsPage method of IInterfaceBuilder.

The last method, getSimilarResults(), searches for job offers which are similar to one specific job offer. It takes the values of this job offer and converts them into search criteria; the getSearchCriteria method of SimilarityResultHelper takes care of this. From this point on, the method proceeds exactly like getSearchResults. The presentation of the results is exactly the same as well.

Providing functionality to the outside world

The class used for pure data retrieval (callable from other tools) is situated in the sk.softec.nazou.cs.wrapper package as well. This class named CSDisplaySearchResult is an implementation of one of the interfaces specified in the Nazou-Factic-Integration.jar (see also Integration Guide).

The class CSDisplaySearchResult extends the CSWrapper class and implements the interface IDisplaySearchResults specified in the package mentioned above. It has one constructor, which performs the initialization, and four methods inherited from the interface. Only two of these methods are implemented, since the functionality of the other two does not relate to CriteriaSearch.

Here is some information on those two methods that have been implemented:

§  public List<ISearchResult> findSimilar(String userURI, String typeURI, int limit, String instanceURI)

This method takes four input parameters and returns a list of results. The first parameter, userURI, identifies a user. It is used to load the user-specific preferences obtained from the analysis of the user’s previous searches. The second parameter specifies the type of the result entities, i.e. in the current test domain the entity http://nazou.fiit.stuba.sk/nazou/ontologies/v0.6.17/offer-job#JobOffer. The third parameter specifies the desired number of results (the limit on the length of the result list). The complete list of results is produced first and only then cut off after this number of results; therefore, no performance improvement can be gained by setting this parameter to a small number. The last parameter specifies the URI of the offer for which similar offers are to be found.

§  public List<ISearchResult> getSearchResults()

This method takes no input parameters. It returns the list of the last search results that have been produced by the tool.

CriteriaSearch is currently one of three tools (together with JDBSearch and ConCom) in the pilot application of the NAZOU project that implement this interface in order to be dynamically embedded into the Similarity component. This component uses the tools to find offers similar to one specific offer, each tool implementing a different approach.

More on rating algorithm

The rating calculation for hierarchy criteria is based on setting weights for the edges between nodes in the ontological repository. The weights of the edges in the graph of the ontological repository are computed from these 3 parameters:

§  depth distance discount,

§  distance to parent,

§  distance to child.

These parameters are used in this relation:

Weight = (Depth Distance Discount)^{Depth in the tree} * X

where

§  X is either the distance to parent or the distance to child, according to the direction of the edge.

§  Depth in the tree is the number of nodes between the current node and the root of the tree.

The fourth parameter – direction turn surcharge – is added to the value of the distance if edges of both directions were used during the computation of the distance between two nodes. The direction stands for the direction of traversal – from a leaf to the root or from the root to a leaf.
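The weight relation and the direction turn surcharge can be sketched as follows. The class and method names are hypothetical, and the depth of each traversed edge is supplied by the caller, because the text does not fix which end node of an edge defines its depth.

```java
/** Illustrative sketch of edge weights in a hierarchy; all names are hypothetical. */
final class EdgeWeightSketch {
    private final double depthDistanceDiscount;
    private final double distanceToParent;
    private final double distanceToChild;
    private final double directionTurnSurcharge;

    EdgeWeightSketch(double depthDistanceDiscount, double distanceToParent,
                     double distanceToChild, double directionTurnSurcharge) {
        this.depthDistanceDiscount = depthDistanceDiscount;
        this.distanceToParent = distanceToParent;
        this.distanceToChild = distanceToChild;
        this.directionTurnSurcharge = directionTurnSurcharge;
    }

    /** Weight = discount^depth * X, where X depends on the direction of traversal. */
    double edgeWeight(int depth, boolean towardsParent) {
        double x = towardsParent ? distanceToParent : distanceToChild;
        return Math.pow(depthDistanceDiscount, depth) * x;
    }

    /**
     * Distance of a path that first climbs edges at the given depths and then descends.
     * The surcharge is added once if both directions were used.
     */
    double pathDistance(int[] upwardEdgeDepths, int[] downwardEdgeDepths) {
        double total = 0.0;
        for (int depth : upwardEdgeDepths) {
            total += edgeWeight(depth, true);
        }
        for (int depth : downwardEdgeDepths) {
            total += edgeWeight(depth, false);
        }
        if (upwardEdgeDepths.length > 0 && downwardEdgeDepths.length > 0) {
            total += directionTurnSurcharge;
        }
        return total;
    }
}
```

With a discount below 1, edges near the root weigh more than edges deep in the hierarchy, and with distance to child smaller than distance to parent, moving to a more specific value costs less than moving to a more general one, which matches the behaviour described in the section Method Description. The parameter values used for any concrete hierarchy are a matter of configuration and are not given here.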

Other important classes

Classes in the package sk.softec.nazou.cs.tool.search are:

§  HierarchyDistance

An abstract class that contains a method getDistance which returns a value of the distance between two nodes in a repository.

§  CriteriaSearchDistance

A subclass of HierarchyDistance, which contains the default implementation of HierarchyDistance. It obtains parents for the given nodes from an ontological repository, determines the least common ancestor and then computes the distance according to the specification. It does not use any indexing to improve performance (see MatrixDistance and StringDistance for algorithms with improved performance).

§  SimilarityResultHelper

Method getSearchCriteria creates a list of SearchCriterion objects from an already existing offer. It sends queries to the repository to extract values from one specific job offer and inserts them into a list. This list could be used as search criteria for another search for job offers.

§  SelectResultHelper

Method selectResults returns a list of results from a query, constructed by the getSelectQuery method. This method generates a query which selects offers fulfilling the mandatory search criteria.

§  RateResultHelper

The rateResultAccordingToCriteria method calculates the rating of one result found. It sums the ratings of the individual search criteria, obtained from the getRateForOneCriterion method, and applies the importance modifier. To evaluate the rating of one criterion, ratings are computed for all search values of that criterion and combined according to the multiple-values modifier. The ratings of values are computed from distances, which are provided by separate methods for three types of values (integer: getDistanceForTwoValuesNumber, string: getDistanceForManyValuesText, hierarchy: getDistanceForTwoValues). The convertDistanceToRating method, which converts a distance to a rating, also takes the precision modifier of the criterion into account.
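The distance functions for direct-input values and the conversion of a distance to a rating can be sketched as below. This is not the code of RateResultHelper: the method names are hypothetical, the distance passed to toRating is assumed to be normalized to the range 0..1, and the exponent used to model precision is an arbitrary illustrative choice, since the exact formula is given only in (Pázman, 2006).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch of value distances and the distance-to-rating conversion. */
final class ValueRatingSketch {
    private ValueRatingSketch() {}

    /** Text distance: number of query words that do not occur in the offer's value. */
    static int missingWords(String query, String offerValue) {
        Set<String> offerWords = new HashSet<>(
                Arrays.asList(offerValue.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        int missing = 0;
        for (String word : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty() && !offerWords.contains(word)) {
                missing++;
            }
        }
        return missing;
    }

    /** Numeric distance: weighted absolute difference between searched and offered value. */
    static double numericDistance(double searched, double offered, double weight) {
        return weight * Math.abs(searched - offered);
    }

    /**
     * Rating as the complement of the distance, made stricter by precision (both in 0..1).
     * The exponent 1 + 4 * precision is an assumption chosen only for illustration.
     */
    static double toRating(double normalizedDistance, double precision) {
        double distance = Math.min(1.0, Math.max(0.0, normalizedDistance));
        double exponent = 1.0 + 4.0 * precision;
        return Math.pow(1.0 - distance, exponent);
    }
}
```

In this sketch a precision of 0 yields the plain complement (distance 0.5 gives rating 0.5), while higher precision lowers the rating even for small distances, which is the qualitative behaviour described for the precision modifier.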

Selected classes from the package sk.softec.nazou.cs.indexes and its subpackages are:

§  MatrixDistance

MatrixDistance is a subclass of HierarchyDistance, which uses SQL tables with stored distances between every two nodes, for each hierarchy criterion. The getDistance method simply returns the distance between two given nodes obtained from a SQL query.

§  StringDistance

StringDistance is a subclass of HierarchyDistance, which uses SQL tables with a path from each node to the root of a tree, for each criterion. Method getDistance obtains two paths as strings, the first one between the root and the source node and the second one between the root and the destination node. After determining the least common ancestor of these nodes, this method computes a distance according to specification and returns that value.
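The idea behind the string index can be sketched as follows: two root-to-node paths are compared, their longest common prefix ends at the least common ancestor, and the remaining segments are the edges to climb and to descend. This is an illustration only. The path format with "/" as separator, the class name and the convention that an edge’s depth is the position of its lower node (root = 0) are assumptions.

```java
/** Illustrative sketch of a hierarchy distance computed from two root-to-node path strings. */
final class PathStringDistanceSketch {
    private PathStringDistanceSketch() {}

    /** Number of leading path segments shared by both paths. */
    static int commonPrefixLength(String[] a, String[] b) {
        int i = 0;
        while (i < a.length && i < b.length && a[i].equals(b[i])) {
            i++;
        }
        return i;
    }

    static double distance(String sourcePath, String targetPath, double discount,
                           double toParent, double toChild, double turnSurcharge) {
        String[] source = sourcePath.split("/");
        String[] target = targetPath.split("/");
        int common = commonPrefixLength(source, target);
        if (common == 0) {
            throw new IllegalArgumentException("paths do not share a root");
        }
        double total = 0.0;
        // Climb from the source node up to the least common ancestor (index common - 1).
        for (int depth = source.length - 1; depth >= common; depth--) {
            total += Math.pow(discount, depth) * toParent;
        }
        // Descend from the least common ancestor down to the target node.
        for (int depth = common; depth < target.length; depth++) {
            total += Math.pow(discount, depth) * toChild;
        }
        boolean climbed = source.length > common;
        boolean descended = target.length > common;
        if (climbed && descended) {
            total += turnSurcharge;  // both directions were used
        }
        return total;
    }
}
```

Only two short strings have to be fetched from the database per comparison, which is consistent with the observation in the section Enhancements and Optimizing that the string index needs much less space than a full distance matrix.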

Both of these classes need the index tables to be prepared and filled with data beforehand. Dedicated classes, whose methods are called from the Wrapper classes for each criterion, take care of this. These classes are:

§  MatrixIndexes

This class is used to generate the SQL tables which are required by MatrixDistance. In the fill method of this class, the ontological repository is traversed recursively by depth-first search and the retrieved information is stored in an instance of the HierarchyTree class. This class is an implementation of a tree data structure with weighted edges. The weights of the edges are computed during the construction of the tree according to the rules for evaluating distances in the ontological repository. The resulting object is processed by the Matrix class, which computes a matrix of distances between all pairs of nodes of the given tree. Finally, this matrix is saved in an SQL table.

§  StringIndexes

This class is used to generate the SQL tables which are required by the StringDistance class. In the fill method of this class, the ontological repository is traversed recursively by depth-first search, and the retrieved information about the parent of each node is used to construct a string representation of the path between the node being processed and the root of the tree. This representation is saved in an SQL table as the traversal proceeds. For the mandatory modifier to work properly, a new isOffspring predicate has to be added to simulate inference. (Even in a repository which uses inference, some hierarchies are linked by predicates that are not covered by the default inference.)

§  OffspringBuilder

OffspringBuilder uses the same recursive algorithm to traverse the tree of criteria values in the ontological repository. The fill method of this class adds triples with the isOffspring predicate to the repository.
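The recursive traversal shared by the index builders can be sketched on an in-memory hierarchy. The names are hypothetical, the hierarchy is given as a map from a node to its children, and the results are returned as Java collections, whereas the tool reads the hierarchy from the ontological repository and writes to SQL tables or back to the repository. The sketch assumes a proper tree without cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of depth-first index construction over a value hierarchy. */
final class HierarchyIndexSketch {
    private HierarchyIndexSketch() {}

    /** Maps every node to its path from the root, e.g. "World/USA/Illinois". */
    static Map<String, String> pathsFromRoot(String root, Map<String, List<String>> children) {
        Map<String, String> paths = new LinkedHashMap<>();
        fill(root, root, children, paths);
        return paths;
    }

    private static void fill(String node, String path, Map<String, List<String>> children,
                             Map<String, String> paths) {
        paths.put(node, path);
        List<String> kids = children.get(node);
        if (kids == null) {
            return;  // leaf node
        }
        for (String kid : kids) {
            fill(kid, path + "/" + kid, children, paths);
        }
    }

    /** Lists {descendant, ancestor} pairs, analogous to statements with an isOffspring predicate. */
    static List<String[]> offspringPairs(Map<String, String> paths) {
        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<String, String> entry : paths.entrySet()) {
            String[] segments = entry.getValue().split("/");
            // Every segment except the last one is an ancestor of the node.
            for (int i = 0; i < segments.length - 1; i++) {
                pairs.add(new String[] {entry.getKey(), segments[i]});
            }
        }
        return pairs;
    }
}
```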

More detailed information about each class and interface is contained in the JavaDoc documentation.

A.3.3      Enhancements and Optimizing

A drawback of the indexing approach is that even a single change in the repository requires the tables containing the indexing information to be recreated. An analysis of incremental modification of these tables has shown that triggering such a modification procedure would pose serious problems. The relatively short time needed to create a whole table suggests that full re-creation is sufficient for practical use.

A performance evaluation has been carried out. The default implementation of the distance for hierarchical criteria is approximately 7 to 10 times slower than the implementations using indexes. The string index is almost as effective as the matrix index, but it uses much less database space. Searching with string indexes in an ontology with 5 000 offers and 10 properties (5 of them hierarchical) took about 9 seconds on a contemporary desktop computer with the Sesame repository loaded into memory.

A.4          Manual for Adaptation to Other Domains

A.4.1      Configuring to Other Domain

The Tool and Indexes components are domain independent, so they do not need any changes when moving to another domain.

The Wrapper component is domain specific and has to be changed significantly for another domain. This mainly means that the properties of the environment and the tool have to be set up, the descriptions of the search criteria have to be created in the form of XML documents, and the visualization of search results has to be re-implemented.

A.4.2      Dependencies

With respect to adaptation to other domains, the tool has no dependencies on other tools.