Σύγχρονες τεχνικές αυτόματης ταξινόμησης εγγράφων
Modern techniques of automatic document classification
Keywords
Document classification (Ταξινόμηση εγγράφων)
Abstract
This paper surveys recent advances in document classification, the task of assigning documents to a predefined set of classes. Public and private entities hold millions of modern documents. Unfortunately, these documents remain unexploited because they exist only in physical form. Classifying the images of such documents is a crucial step towards exploiting the useful information they contain. Moreover, such processing facilitates data entry, preserves information about these documents in a durable digital form, and allows optimal use of human resources. Common classes of document images are forms, invoices/receipts, newspaper articles, letters, and scientific reports. Such a heterogeneous sample is a challenge for an Optical Character Recognition (OCR) system, because an initial categorization into the respective document class is usually required to assist further processing (e.g. extraction of regions of interest, indexing). Recent advances in Artificial Intelligence (AI), e.g. neural networks, together with innovative image processing techniques have proven to be a valuable asset for the document classification problem. This paper aims to provide an introductory presentation of such AI systems and, in particular, of how they apply to a document classification system.