Induction of Generalized Annotated Programs

Rule induction for monotone graded classification of job offers into classes of appropriateness

Institution: Pavol Jozef Safarik University
Technologies used: Java, MySQL, SWI-Prolog
Inputs: Information about job offers and their attributes in MySQL database
Outputs: Rules of classification of job offers in MySQL database
Distribution packages: zip

Addressed Problems

The appropriateness of a job offer is a subjective notion that depends on the user: one user may prefer a given job offer while others find it inappropriate. Each user therefore has their own classification of job offers into several classes. The key property of this classification is monotonicity, i.e. an offer of higher relevance must also fulfill the requirements placed on offers of lower relevance. Such a monotone graded classification can be represented by Generalized Annotated Program (GAP) rules, i.e. rules in Prolog notation that can contain graded literals. The Prolog notation increases the expressive power, and the graded literals guarantee monotonicity. Since our domain contains both numeric and non-numeric attributes (salary, place, etc.), the induction of GAP rules appears to be a convenient solution for the monotone graded classification of job offers.
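The monotonicity carried by graded literals can be illustrated with a minimal sketch. It relies on the standard reading of annotated atoms, where a literal annotated with a degree holds whenever the known degree is at least as high. The grade labels are borrowed from the illustration later in this text; their mapping into [0,1] and the function names are assumptions made only for this example and are not part of IGAP.

```python
# Assumed mapping of grade labels into the unit interval [0, 1].
GRADES = {"poor": 0.0, "good": 0.5, "excellent": 1.0}


def entails(known_degree, required_degree):
    """An annotated atom A:mu holds when the known degree of A is at least mu."""
    return known_degree >= required_degree


def satisfied_grades(user_grade):
    """Return every grade label whose requirement the given grade fulfills."""
    degree = GRADES[user_grade]
    return [label for label, mu in GRADES.items() if entails(degree, mu)]


if __name__ == "__main__":
    # An object graded "good" also meets the weaker "poor" requirement.
    print(satisfied_grades("good"))  # prints ['poor', 'good']
```

Under this reading, an object graded excellent automatically satisfies the requirements of every lower grade, which is the monotonicity property described above.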

Description

The method is based on the Inductive Logic Programming (ILP) system ALEPH, which induces Prolog rules (hypothesis H) from true and false Prolog facts (examples E+, E-) using a Prolog program (background knowledge B). In contrast to classical ILP, in our case the examples and the background knowledge facts are not just true or false but are evaluated with values from the unit interval [0,1]. The main idea of the method is to divide the monotone graded classification task into several ILP tasks in the following way:

[Figure: The schema of the main principle of IGAP]
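The schema itself is not reproduced in this text, so the following is only one plausible reading of the decomposition, consistent with the monotonicity requirement: for each grade threshold, objects graded at or above the threshold become positive examples and the remaining objects become negative examples of a separate crisp ILP task. The function names, the predicate name `appropriate` and the sample offers are hypothetical.

```python
def split_into_ilp_tasks(graded_examples, thresholds):
    """Build one crisp ILP task per threshold.

    graded_examples maps an object name to its degree in [0, 1].
    For threshold t, positives are the objects with degree >= t and
    negatives are all remaining objects.
    """
    tasks = {}
    for t in thresholds:
        positives = sorted(n for n, d in graded_examples.items() if d >= t)
        negatives = sorted(n for n, d in graded_examples.items() if d < t)
        tasks[t] = (positives, negatives)
    return tasks


def as_prolog_facts(predicate, names):
    """Render object names as unary Prolog facts, e.g. 'appropriate(a).'"""
    return ["{}({}).".format(predicate, name) for name in names]


if __name__ == "__main__":
    # Hypothetical job offers with assumed degrees of appropriateness.
    offers = {"offer_a": 1.0, "offer_b": 0.5, "offer_c": 0.0, "offer_d": 0.5}
    for t, (pos, neg) in split_into_ilp_tasks(offers, [0.5, 1.0]).items():
        print("threshold", t)
        print("  E+:", as_prolog_facts("appropriate", pos))
        print("  E-:", as_prolog_facts("appropriate", neg))
```

Because the positive set for a higher threshold is always a subset of the positive set for a lower one, rules induced for the separate tasks can be combined into graded rules without violating monotonicity. How IGAP actually merges the induced hypotheses is not shown here.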

Illustration

As an illustrative example we use the following table of hotels, where the objects of interest are hotels with the attributes name, distance from the city centre, price of a room per night per person, and room equipment. The user evaluates the objects with three grades (excellent, good and poor) according to how appropriate the hotel is to his or her interests.


Hotel name   Distance   Price   Equipment      User's evaluation
Apple        100        99      -              poor
Danube       1300       120     tv             good
Cherry       500        99      internet       good
Iris         1100       35      internet, tv   excellent
Lemon        500        149     -              poor
Linden       1200       60      internet, tv   excellent
Oak          500        149     internet, tv   good
Pear         500        99      tv             good
Poplar       100        99      internet, tv   good
Rhine        500        99      nothing        poor
Rose         500        99      internet, tv   excellent
Spruce       300        40      internet       good
Themse       100        149     internet, tv   poor
Tulip        800        45      internet, tv   excellent
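To make the table usable by an ILP system, its rows have to be expressed as Prolog background facts. The sketch below shows one way this could be done for a few of the rows. The predicate names `distance`, `price` and `equipment` are assumptions chosen for illustration; the actual encoding used by IGAP is not given in this text.

```python
# A few rows of the hotel table: (name, distance, price, equipment list).
HOTELS = [
    ("apple", 100, 99, []),
    ("danube", 1300, 120, ["tv"]),
    ("cherry", 500, 99, ["internet"]),
    ("iris", 1100, 35, ["internet", "tv"]),
]


def background_facts(rows):
    """Turn table rows into Prolog-style background facts (as strings).

    Numeric attributes become binary facts; the multi-valued equipment
    attribute becomes one fact per item, so an empty list yields none.
    """
    facts = []
    for name, distance, price, equipment in rows:
        facts.append("distance({}, {}).".format(name, distance))
        facts.append("price({}, {}).".format(name, price))
        for item in equipment:
            facts.append("equipment({}, {}).".format(name, item))
    return facts


if __name__ == "__main__":
    for fact in background_facts(HOTELS):
        print(fact)
```

Representing equipment as one fact per item keeps the numeric and non-numeric attributes in the same relational form, which is what allows a single rule to mix conditions on both kinds of attribute.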

The data in the table above correspond to real situations: a user rates just a few objects, which have mixed types of attributes (nominal, ordinal). Thus our task is to find GAP rules from a small, heterogeneous dataset. The following rules were computed by IGAP for the data introduced above:


Finally, we mention that IGAP is a domain-independent method: the input (data and attributes) is obtained from other tools, to which the output (computed GAP rules) is returned.


References

  1. T. Horváth, P. Vojtáš: Induction of Fuzzy and Annotated Logic Programs. In: S. Muggleton, R. Otero, and A. Tamaddoni-Nezhad (Eds.): ILP 2006, LNAI 4455, Springer-Verlag Berlin Heidelberg, 2007, pp. 260–274.
  2. T. Horváth, P. Vojtáš: Ordinal Classification with Monotonicity Constraints. In: Proceedings of the 6th Industrial Conference on Data Mining (ICDM '06), Leipzig, Germany, 2006, LNAI 4065, Springer, 2006, ISBN 3-540-36036-0, pp. 217–225.
  3. T. Horváth, P. Vojtáš: Fuzzy induction via generalized annotated programs. In: 8th International Conference on Computational Intelligence (Fuzzy Days, Dortmund '04), Dortmund, Germany, 2004, Springer, Advances in Soft Computing Series, 2005, ISBN 3-540-22807-1, pp. 419–433.
  4. T. Horváth, S. Krajči, R. Lencses, P. Vojtáš: An ILP model for a monotone graded classification problem. In: Kybernetika 40 (2004), No. 3, AV ČR, Czech Republic, ISSN 0023-5954, pp. 317–332.
  5. T. Horváth, F. Sudzina, P. Vojtáš: Mining rules from monotone classification measuring impact of information systems on business competitiveness. In: 6th International Conference on Information Technology for Balanced Automation Systems (BASYS '04), Wien, Austria, 2004, Springer, Dortmund, Germany, ISBN 0-387-22828-4, pp. 451–458.