Method for Relevant Internet Data Resource Identification (tool RIDAR)

RIDAR retrieves query-relevant resources on the Internet (URL addresses) and stores them in a database.

Institution: Institute of Informatics
Technologies used: Java, Web Services (SOAP), MySQL
Inputs: Query: Searched terms or expressions
Sources: Search engines to be used
Target: Where to store the search results (e.g. a database)
Outputs: Information Resources (URL addresses and titles) retrieved as search results
Documentation: HTML, doc, JavaDoc
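The inputs and outputs listed above could be modelled as two small Java records. This is a minimal sketch assuming Java 16 or later; the type names `RidarRequest` and `InformationResource`, the validation rules and the extra `source` field are hypothetical and are not taken from RIDAR's actual code.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical input bundle mirroring the Query / Sources / Target inputs. */
record RidarRequest(String query, List<String> sources, String target) {
    RidarRequest {
        Objects.requireNonNull(query, "query");
        Objects.requireNonNull(sources, "sources");
        Objects.requireNonNull(target, "target");
        if (query.isBlank()) {
            throw new IllegalArgumentException("query must not be blank");
        }
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("at least one search engine is required");
        }
        sources = List.copyOf(sources); // defensive, immutable copy
    }
}

/** Hypothetical output: one information resource returned as a search result. */
record InformationResource(String url, String title, String source) {
    InformationResource {
        Objects.requireNonNull(url, "url");
        title = (title == null) ? "" : title; // assume some results may lack a title
    }
}
```

Keeping the request immutable and validated at construction means every search engine adapter and storage target downstream can rely on a non-blank query and at least one source.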

Addressed Problems

Information acquisition systems often need to identify primary Internet resources. RIDAR exploits existing search engines to retrieve links to relevant Internet resources based on user-supplied search terms or more complex search expressions. Details about the identified resources (URL, title, etc.) are stored in databases.

Description

This method exploits the potential of existing search engines to identify relevant information resources on the Internet based on user-supplied search terms or more complex search expressions. RIDAR can integrate any search engine that exposes a web service API. The search engines integrated so far include Google and Yahoo.

RIDAR provides generic interfaces for integrating both search engines and targets for storing search results (e.g. databases).
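The pluggable design described above could be sketched as two Java interfaces, one per extension point, plus a small driver that sends the query to every configured engine and passes the combined results to the target. All names (`SearchResult`, `SearchEngine`, `ResultTarget`, `RidarPipeline`) are hypothetical; RIDAR's real interfaces may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** One retrieved resource (hypothetical type). */
record SearchResult(String url, String title, String engine) {}

/** Extension point for a search engine; each adapter wraps one engine's web service API. */
interface SearchEngine {
    String name();
    List<SearchResult> search(String query) throws Exception;
}

/** Extension point for a storage target (a database, a generic file, ...). */
interface ResultTarget {
    void store(List<SearchResult> results) throws Exception;
}

/** Queries every configured engine and hands the combined results to the target. */
final class RidarPipeline {
    private final List<SearchEngine> engines;
    private final ResultTarget target;

    RidarPipeline(List<SearchEngine> engines, ResultTarget target) {
        this.engines = List.copyOf(engines);
        this.target = target;
    }

    /** Runs the query against all engines; returns the number of results stored. */
    int run(String query) throws Exception {
        List<SearchResult> all = new ArrayList<>();
        for (SearchEngine engine : engines) {
            all.addAll(engine.search(query));
        }
        target.store(all);
        return all.size();
    }
}
```

Under this design, integrating another engine means writing one more `SearchEngine` adapter around its web service client, and a file target is just another `ResultTarget` implementation; the driver itself does not change.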

To access the API of any search engine, one must register to obtain an application ID (or licence key), which must be supplied every time the API is accessed. A licence key for Google and an application ID for Yahoo were obtained specifically for the NAZOU project. The drawback of such registration is that the number of queries allowed per licence key is limited.
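Because every call must carry the licence key and each key allows only a limited number of queries, an engine adapter might wrap its web service client roughly as follows. The class name `LicensedSearchClient`, the quota value passed in, and the `BiFunction` standing in for the real SOAP call are all assumptions for illustration, not RIDAR's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

/**
 * Hypothetical wrapper: attaches the registered licence key (or application ID)
 * to every call and refuses calls once the per-key query quota is used up.
 */
final class LicensedSearchClient<R> {
    private final String licenceKey;
    private final int queryLimit;
    private final AtomicInteger used = new AtomicInteger();
    // Stand-in for the engine's API call: (licenceKey, query) -> response.
    private final BiFunction<String, String, R> call;

    LicensedSearchClient(String licenceKey, int queryLimit, BiFunction<String, String, R> call) {
        if (queryLimit < 0) {
            throw new IllegalArgumentException("queryLimit must be >= 0");
        }
        this.licenceKey = licenceKey;
        this.queryLimit = queryLimit;
        this.call = call;
    }

    R search(String query) {
        // Reserve one query from the quota before contacting the engine.
        int current;
        do {
            current = used.get();
            if (current >= queryLimit) {
                throw new IllegalStateException("query limit reached for this licence key");
            }
        } while (!used.compareAndSet(current, current + 1));
        // Every call carries the licence key, as the engine APIs require.
        return call.apply(licenceKey, query);
    }

    int remaining() {
        return queryLimit - used.get();
    }
}
```

The compare-and-set loop keeps the counter correct when several threads share one key. If a provider's quota is per day (an assumption here), the counter would also need a periodic reset.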

RIDAR can also store retrieved results in any target, such as a database or a generic file. Currently, a MySQL target is implemented.
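A MySQL target could be written over plain JDBC, as in the sketch below. The table `resource`, its columns and the `StoredResource` record are hypothetical, since RIDAR's actual schema is not documented here. The code depends only on `java.sql`, so the MySQL-specific part is limited to how the connection is opened.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical result row: URL, title and the engine that returned it. */
record StoredResource(String url, String title, String engine) {}

/**
 * Sketch of a database target over plain JDBC. Table and column names are
 * assumptions; a matching MySQL schema could be:
 *   CREATE TABLE resource (url VARCHAR(2048), title VARCHAR(512), engine VARCHAR(64));
 */
final class JdbcResourceTarget {
    static final String INSERT_SQL =
        "INSERT INTO resource (url, title, engine) VALUES (?, ?, ?)";

    private final Connection connection;

    JdbcResourceTarget(Connection connection) {
        this.connection = connection;
    }

    /** Inserts all results in one batch and returns the per-row update counts. */
    int[] store(List<StoredResource> results) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            for (StoredResource r : results) {
                ps.setString(1, r.url());
                ps.setString(2, r.title());
                ps.setString(3, r.engine());
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

With MySQL Connector/J on the classpath, a connection could be opened with `DriverManager.getConnection("jdbc:mysql://localhost:3306/ridar", user, password)` (the database name `ridar` is assumed) and passed to the constructor. Batching keeps a large result set to a single round trip per batch instead of one per row.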

RIDAR Basic Architecture

[Figure: RIDAR basic architecture]
