Method for Ontology-Based Text Annotation (tool OnTeA)

The method finds or creates semantic metadata in text according to a domain ontology.

Institution: Institute of Informatics
Technologies used: RegEx, Java, Jena, Sesame
Inputs: HTML or text document, domain ontology
Outputs: Ontology individual of defined type representing input text
Documentation: HTML, doc, JavaDoc
Distribution packages: zip
Video: demonstration video

Addressed Problems

When a computer system processes documents (HTML, text), it needs to understand their structure. Web documents are structured, but their structure is understandable mainly to humans. This is a basic problem of the Semantic Web. The OnTeA method tries to create structured semantic metadata from such documents according to the application domain ontology model. OnTeA therefore does not create a new ontology; instead, it tries to map each document to its equivalent in the defined application ontology.


OnTeA analyzes a document or text using regular expression patterns and detects the equivalent semantic elements defined in the domain ontology. Several cross-application patterns are predefined, but to achieve good results, new patterns need to be defined for each application. OnTeA also creates a new ontology individual of a defined class and assigns the detected ontology elements/individuals as property values of that individual. For example, in the NAZOU pilot application, an ontology instance of a job offer is created from its text representation.
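The annotation step described above can be illustrated with a minimal sketch. It is not OnTeA's implementation: OnTeA works with Jena/Sesame ontology models, whereas this sketch uses only java.util.regex and plain maps so that it runs on its own (Java 16+ for records). The class OnteaSketch, the record types, the property names (hasLocation, hasSkill), the onto:/new: URIs and the sample text are all hypothetical. Each pattern's first capture group is looked up among individuals already known in the ontology; if none matches, a new placeholder individual is registered ("finds or creates"), and the result is attached as a property value of a new individual of the requested class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnteaSketch {

    /** Hypothetical annotation pattern: capture group 1 yields the value of an ontology property. */
    record AnnotationPattern(String property, Pattern regex) {}

    /** Hypothetical stand-in for an ontology individual: its class plus property -> individual URIs. */
    record Individual(String ontClass, Map<String, List<String>> properties) {}

    /** Returns the URI of a known individual with this label, or registers a new placeholder URI. */
    static String findOrCreate(Map<String, String> known, String label) {
        String key = label.trim().toLowerCase(Locale.ROOT);
        return known.computeIfAbsent(key, k -> "new:" + k.replace(' ', '_'));
    }

    /** Runs every pattern over the text and attaches each match to a new individual of ontClass. */
    static Individual annotate(String text, String ontClass,
                               List<AnnotationPattern> patterns, Map<String, String> known) {
        Map<String, List<String>> props = new LinkedHashMap<>();
        for (AnnotationPattern p : patterns) {
            Matcher m = p.regex().matcher(text);
            while (m.find()) {
                String uri = findOrCreate(known, m.group(1));
                props.computeIfAbsent(p.property(), k -> new ArrayList<>()).add(uri);
            }
        }
        return new Individual(ontClass, props);
    }

    public static void main(String[] args) {
        // Application-specific patterns (hypothetical examples for a job-offer domain).
        List<AnnotationPattern> patterns = List.of(
            new AnnotationPattern("hasLocation",
                Pattern.compile("(?i)location:\\s*([A-Za-z ]+)")),
            new AnnotationPattern("hasSkill",
                Pattern.compile("\\b(Java|SQL|Python)\\b")));
        // Individuals already present in the (hypothetical) domain ontology, keyed by lower-case label.
        Map<String, String> known = new HashMap<>(Map.of(
            "bratislava", "onto:Bratislava",
            "java", "onto:Java"));
        String offer = "Senior developer wanted. Location: Bratislava\nRequired: Java and SQL.";
        System.out.println(annotate(offer, "JobOffer", patterns, known));
    }
}
```

Running main prints `Individual[ontClass=JobOffer, properties={hasLocation=[onto:Bratislava], hasSkill=[onto:Java, new:sql]}]`. "SQL" has no known individual, so a placeholder is created and stored in the known map, where later documents can reuse it. In a real deployment the map lookups would be replaced by queries against the ontology store.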

Ontea Architecture


  1. Ontea at
  2. Ontea Poster
