Method for Automated On-line Annotation (tool Pannda)

Extracts ontology individuals from text according to a domain ontology.

Institution: Slovak University of Technology
Technologies used: Java, Sesame, SimMetrics, Apache Lucene, QTag, MySQL
Inputs: HTML or text document, domain ontologies
Outputs: Ontology individuals extracted from text with coordinates of each individual within text
Documentation: HTML, doc, JavaDoc

Addressed Problems

Recognizing the meaning of data shown on the web is usually trivial for a person, but often very difficult, if not impossible, for a machine. We therefore aim to add semantics to the present web, so that web data can be accessed and understood not only by humans but also by machines. Moreover, the Pannda tool uses the recognized semantic data during web browsing to mark relevant parts of the text that could be interesting to the reader.


The main task is to design a method that simplifies navigation on a web page by enhancing the page content with useful annotations. The annotation process is based on known ontologies and user preferences. Annotations are added to the document on-line, whenever a page is accessed by a web browser.
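The output described above (individuals together with their coordinates within the text) can be illustrated with a minimal Java sketch. It is not Pannda's implementation: it assumes plain-text input, uses simple case-insensitive exact matching of known labels instead of the tool's own algorithms, and all names in it (AnnotationSketch, Match, findIndividuals, annotate, the example.org URI) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class AnnotationSketch {

    /** One recognized individual and its coordinates (character offsets) in the text. */
    static final class Match {
        final String individualUri;
        final int start; // inclusive offset
        final int end;   // exclusive offset

        Match(String individualUri, int start, int end) {
            this.individualUri = individualUri;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Finds case-insensitive occurrences of each known label in the text.
     * Exact matching only; an illustrative stand-in for real recognition.
     */
    static List<Match> findIndividuals(String text, Map<String, String> labelToUri) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : labelToUri.entrySet()) {
            String label = entry.getKey();
            if (label.isEmpty()) {
                continue;
            }
            for (int i = 0; i + label.length() <= text.length(); i++) {
                if (text.regionMatches(true, i, label, 0, label.length())) {
                    matches.add(new Match(entry.getValue(), i, i + label.length()));
                    i += label.length() - 1; // continue after this occurrence
                }
            }
        }
        matches.sort(Comparator.comparingInt((Match m) -> m.start));
        return matches;
    }

    /**
     * Wraps each matched span in a marker element. Matches must be sorted by start.
     * Overlapping matches are skipped; HTML escaping is omitted for brevity.
     */
    static String annotate(String text, List<Match> sortedMatches) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Match m : sortedMatches) {
            if (m.start < cursor) {
                continue; // overlaps a span that was already annotated
            }
            out.append(text, cursor, m.start);
            out.append("<span data-individual=\"").append(m.individualUri).append("\">");
            out.append(text, m.start, m.end);
            out.append("</span>");
            cursor = m.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Study at the Slovak University of Technology today.";
        // Hypothetical label-to-individual mapping; a real one would come from the ontology.
        Map<String, String> labels =
                Map.of("Slovak University of Technology", "http://example.org/onto#STU");
        List<Match> found = findIndividuals(text, labels);
        for (Match m : found) {
            System.out.println(m.individualUri + " at [" + m.start + ", " + m.end + ")");
        }
        System.out.println(annotate(text, found));
    }
}
```

Keeping the offsets separate from the markup step means the same list of matches could be filtered first, for example by user preferences, before any annotation is written into the page.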

Annotation is done in five steps using four different algorithms. The steps fall into two groups. In the first group, the aim is to find parts of the processed text that represent (or match) concrete instances from the given ontologies describing the domain. We recognize instances according to:

The second group of annotation steps aims to find parts of the text that could potentially be instances of known concepts from the given ontologies. We recognize concepts according to:


  1. Martin Adam (2007). An Approach to Automated On-line Annotation. In Proc. of research project workshop, Tools for Acquisition, Organisation and Presenting of Information and Knowledge, P. Návrat et al. (Eds.), Polana, Slovakia.