Saturday, May 24, 2008

CHARACTER RECOGNITION METHODS

1 Template Matching and Correlation Techniques

In 1929, Tauschek obtained a patent on OCR in Germany; this is generally regarded as the first conceived idea of an OCR. His approach was what is now referred to in the literature as template matching. The template matching process can be roughly divided into two sub-processes: superimposing an input shape on a template, and measuring the degree of coincidence between the input shape and the template. The template that matches most closely with the unknown provides the recognition. Two-dimensional template matching is very sensitive to noise and difficult to adapt to a different font. A variation of the template matching approach is to test only selected pixels and employ a decision tree for further analysis. The peephole method is one of the simplest methods based on this selected-pixel matching approach; its main difficulty lies in selecting a discriminating set of pixels that is invariant across the alphabet. Moreover, from an artificial-intelligence perspective, template matching has been ruled out as an explanation of human performance [1, 2].
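
As an illustration, here is a minimal sketch of template matching on binarized character images, assuming numpy and same-sized binary arrays for the input and the templates; the labels and toy patterns are purely illustrative.

import numpy as np

def match_score(image, template):
    # Degree of coincidence: fraction of pixels that agree after superimposing
    # the input shape on the template.
    return np.mean(image == template)

def classify(image, templates):
    # templates: dict mapping a character label to a binary array of the same shape.
    # The template that matches most closely with the unknown provides the recognition.
    return max(templates, key=lambda label: match_score(image, templates[label]))

# Toy 3x3 "templates" for two classes and a noisy input.
templates = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
}
unknown = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 1]])  # an 'I' with one noisy pixel
print(classify(unknown, templates))  # -> "I"

The peephole method would compute the same kind of score over a small, fixed set of pixel positions rather than over the full array.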

2 Features Derived from the Statistical Distribution of Points

This technique is based on matching in a feature space: an n-dimensional space where n is the number of features. The approach is referred to as the statistical or decision-theoretic approach. Unlike template matching, where an input character is directly compared with a standard set of stored prototypes, here many samples of a pattern are used to collect statistics. This phase is known as the training phase, and its objective is to expose the system to the natural variants of a character. The recognition process then uses these statistics to partition the feature space and identify an unknown character. For instance, in the K-L (Karhunen-Loève) expansion, one of the first attempts at statistical feature extraction, orthogonal vectors are generated from a data set: the covariance matrix is constructed and its eigenvectors are computed, and these eigenvectors form the coordinate axes of the pattern space. Initially the correlation was pixel-based, which led to a large number of covariance matrices; the approach was later refined to use class-based rather than pixel-based correlation, which gave a more compact space. However, it remained very sensitive to noise and to variation in stroke thickness. To make the approach tolerant to such variation and noise, a tree structure was used for making the decision and multiple prototypes were stored for each class. Researchers have also used Fourier, Walsh, Haar, and Hadamard series expansions for classification.
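
A small sketch of the K-L expansion described above, using numpy on flattened character images; the image size and the number of retained eigenvectors are arbitrary choices for illustration.

import numpy as np

def kl_expansion(samples, k):
    # samples: (n_samples, n_pixels) matrix of flattened character images.
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = np.cov(centered, rowvar=False)      # covariance matrix of the pixels
    eigvals, eigvecs = np.linalg.eigh(cov)    # its eigenvectors form the coordinates
    order = np.argsort(eigvals)[::-1][:k]     # keep the k largest-variance directions
    return mean, eigvecs[:, order]

def project(sample, mean, basis):
    # Statistical feature vector: coordinates of the pattern in the reduced space.
    return (sample - mean) @ basis

# Toy usage with random stand-ins for 16x16 training images.
rng = np.random.default_rng(0)
train = rng.random((100, 256))
mean, basis = kl_expansion(train, k=10)
features = project(train[0], mean, basis)     # 10-dimensional feature vector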

3 Geometrical and Topological Features

The classifier is expected to recognize the natural variants of a character yet discriminate between similar-looking characters such as ‘k’ – ‘ph’ and ‘p’ – ‘Sh’. These are conflicting requirements, which makes the classification task challenging. The structural approach has the capability of meeting this requirement. Multiple prototypes are stored for each class to take care of the natural variants of the character. However, when the prototypes are generated automatically, a large number of prototypes per class are required to cover the natural variants. Alternatively, the descriptions may be handcrafted, and a suitable matching strategy incorporating the expected variations is relied upon to yield the true class. The matching strategies include dynamic programming, tests for isomorphism, inexact matching, relaxation techniques and many-to-one matching. Rocha has used a conceptual model of variations and noise along with many-to-one mapping. Yet another class of structural approach is to use a phrase-structure grammar for the prototype descriptions and to parse the unknown pattern syntactically using that grammar. Here the terminal symbols of the grammar are the stroke primitives and the non-terminals represent the pattern classes; the production rules give the spatial relationships of the constituent primitives.
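
As a sketch of one of the matching strategies listed above, the following applies dynamic programming (edit distance) to strings of stroke primitives; the primitive alphabet, the prototype descriptions and the unit costs are assumptions made purely for illustration.

def edit_distance(prototype, unknown):
    # Dynamic-programming match between two sequences of stroke primitives
    # (here 'v' = vertical stroke, 'h' = horizontal stroke); lower cost = closer match.
    m, n = len(prototype), len(unknown)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if prototype[i - 1] == unknown[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # primitive missing in the unknown
                          d[i][j - 1] + 1,         # spurious primitive (noise)
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[m][n]

# Handcrafted prototype descriptions; the closest one yields the class.
prototypes = {"L": "vh", "T": "hv", "F": "vhh"}
unknown = "vhh"
print(min(prototypes, key=lambda c: edit_distance(prototypes[c], unknown)))  # -> "F"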

4 Hybrid Approach

The statistical approach and the structural approach both have their advantages and shortcomings. Statistical features are more tolerant to noise than structural descriptions (provided the sample space over which training has been performed is representative and realistic), whereas variation due to font or writing style can be more easily abstracted in structural descriptions. The two approaches are complementary in their strengths and have therefore been combined. The primitives ultimately have to be classified using a statistical approach. One way to combine the approaches is to map variable-length, unordered sets of geometrical shapes to fixed-length numerical vectors. This hybrid approach has been used for omni-font, variable-size character recognition systems.
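
A minimal sketch of that mapping, assuming a structural analyser (not shown) has already extracted an unordered list of primitive labels from the character; the primitive inventory below is an illustrative assumption.

import numpy as np

PRIMITIVES = ["vertical", "horizontal", "left_curve", "right_curve", "dot"]

def shapes_to_vector(shapes):
    # Map a variable-length, unordered set of geometrical shapes to a
    # fixed-length numerical vector (a normalised primitive histogram).
    vec = np.zeros(len(PRIMITIVES))
    for s in shapes:
        vec[PRIMITIVES.index(s)] += 1
    total = vec.sum()
    return vec / total if total else vec

# Two extractions that differ in length and order map to comparable vectors,
# which can then be fed to any statistical classifier.
print(shapes_to_vector(["vertical", "horizontal"]))
print(shapes_to_vector(["horizontal", "vertical", "vertical"]))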

5 Neural Networks

In the beginning, character recognition was regarded as a problem that could be easily solved, but it turned out to be far more challenging than most researchers in the field expected. The challenge still exists, and an unconstrained document recognition system matching human performance is still nowhere in sight. The performance of a system deteriorates very rapidly with deterioration in the quality of the input or with the introduction of new fonts or handwriting styles; in other words, the systems do not adapt easily to a changed environment. The training phase aims at exposing the system to a large number of fonts and their natural variants. Neural networks are based on the theory of learning from known inputs. A back-propagation neural network is composed of several layers of interconnected elements; each element computes an output which is a function of the weighted sum of its inputs, and the weights are modified until the desired output is obtained. Neural networks have been employed for character recognition with varying degrees of success. They have also been employed for integrating the results of several classifiers, again by adjusting weights to obtain the desired output. The main weakness of systems based on neural networks is their poor generalization capability: there is always a chance of under-training or over-training the system. Besides this, a neural network does not provide a structural description, which is vital from an artificial-intelligence viewpoint. The neural network approach has not solved the problem of character classification any better than the approaches described earlier. Recent research results call for the use of multiple features and intelligent ways of combining them. The combination of potentially conflicting decisions from multiple classifiers should take advantage of the strengths of the individual classifiers, avoid their weaknesses, and improve the classification accuracy. The intersection and union of decision regions are the two most obvious methods for combining classifiers.
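
The following is a minimal back-propagation network in numpy, kept deliberately small to show the mechanics described above (weighted sums, squashing, and weight updates toward a desired output); the layer sizes, learning rate and toy patterns are illustrative assumptions, not a recipe for a real OCR system.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BackpropNet:
    # One hidden layer; each element outputs a function of the weighted sum of its inputs.
    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.w1)
        self.o = sigmoid(self.h @ self.w2)
        return self.o

    def train_step(self, x, target):
        o = self.forward(x)
        # Propagate the error backwards and modify the weights
        # until the desired output is obtained.
        delta_o = (o - target) * o * (1 - o)
        delta_h = (delta_o @ self.w2.T) * self.h * (1 - self.h)
        self.w2 -= self.lr * np.outer(self.h, delta_o)
        self.w1 -= self.lr * np.outer(x, delta_h)

# Toy usage: learn to separate two 4-pixel "characters".
net = BackpropNet(n_in=4, n_hidden=5, n_out=2)
samples = [(np.array([1.0, 0, 1, 0]), np.array([1.0, 0])),
           (np.array([0.0, 1, 0, 1]), np.array([0.0, 1]))]
for _ in range(2000):
    for x, t in samples:
        net.train_step(x, t)
print(net.forward(np.array([1.0, 0, 1, 0])).round(2))  # close to [1, 0]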
