Tool for converting documents to plain text format.

Institution: Slovak Academy of Sciences
Technologies used: Java and third-party doc_format2txt converters
Inputs: Document in non-plain-text format / directory with documents in various formats
Outputs: Converted documents in plain text
Documentation: HTML, doc, JavaDoc

Addressed Problems

Within the scope of the NAZOU project, documents acquired from public sources must be converted to plain text for further data processing. The DocConverter tool addresses this issue.


DocConverter is a simple tool that wraps multiple third-party converters, each handling a different document format, into a single tool for conversion to plain text. The user specifies each document format extension and its related conversion tool in DocConverter's configuration file. DocConverter is then capable of converting a batch of input files (stored in a specified input directory) to plain text for further data processing and information/knowledge extraction.
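
The extension-to-converter mapping described above can be illustrated with a minimal Java sketch. It is not DocConverter's actual code: the class name DocConverterSketch, the "extension = command {in} {out}" configuration syntax, and the {in}/{out} placeholders are all assumptions made for this example, and the real configuration file format may differ.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: maps file extensions to external converter commands. */
final class DocConverterSketch {

    /** Parses lines of the ASSUMED form "ext = command {in} {out}"; '#' starts a comment line. */
    static Map<String, String> parseConfig(List<String> lines) {
        Map<String, String> converters = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int eq = line.indexOf('=');
            if (eq < 0) continue; // no '=' separator: skip the malformed line
            String ext = line.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String command = line.substring(eq + 1).trim();
            if (ext.isEmpty() || command.isEmpty()) continue;
            converters.put(ext, command);
        }
        return converters;
    }

    /** Returns the lower-cased extension of a file name, or "" if it has none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Builds the argument list for one file; an empty list means no converter is configured. */
    static List<String> buildCommand(Map<String, String> converters, Path input, Path outputDir) {
        List<String> argv = new ArrayList<>();
        String template = converters.get(extensionOf(input));
        if (template == null) return argv;
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        Path output = outputDir.resolve(base + ".txt");
        // Split the template first, then substitute, so paths containing spaces stay one argument.
        for (String token : template.split("\\s+")) {
            argv.add(token.replace("{in}", input.toString()).replace("{out}", output.toString()));
        }
        return argv;
    }

    /** Runs the configured converter on each regular file in inputDir; returns the number that exited with 0. */
    static int convertAll(Map<String, String> converters, Path inputDir, Path outputDir)
            throws IOException, InterruptedException {
        Files.createDirectories(outputDir);
        int converted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                List<String> argv = buildCommand(converters, file, outputDir);
                if (argv.isEmpty()) continue; // unknown format: leave the file untouched
                Process process = new ProcessBuilder(argv).inheritIO().start();
                if (process.waitFor() == 0) converted++;
            }
        }
        return converted;
    }
}
```

With a configuration line such as "pdf = pdftotext {in} {out}" (pdftotext is used here only as an example of a third-party converter, not as one the project is known to use), buildCommand would turn an input file Report.PDF into the arguments pdftotext, the input path, and the path of Report.txt in the output directory. The sketch splits the command template on whitespace, so quoted arguments inside a template are not supported.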