On Character Recognition (OCR) Software

The task of Chinese character recognition software is to enable a computer to “read” text. Such a system usually uses a photoelectric conversion device to convert Chinese characters or other characters into electrical signals and sends them to a computer, which recognizes and reads them automatically. The technique is therefore called Optical Character Recognition, abbreviated as OCR.

The Development of OCR

The concept of OCR was first proposed by Tauschek in Germany in 1929. Later, the American scientist Handel also proposed the idea of using technology to recognize text. The earliest research on printed Chinese character recognition was carried out by Casey and Nagy of IBM, who in 1966 published the first article on Chinese character recognition, using template matching to recognize 1,000 printed Chinese characters. In the early 1970s, Japanese scholars began to study Chinese character recognition and did a great deal of work.

In China, research on Chinese character recognition started relatively late: OCR research began in the late 1970s. Early OCR software did not meet practical requirements because of factors such as low recognition rates and immature productization, and the hardware was expensive and slow, so OCR did not reach a practical level. Only a few organizations, such as information departments and press and publishing units, used OCR software. After 1986, Chinese OCR research made great progress. It brought innovations in Chinese character modeling and recognition methods, produced fruitful results in system development and application, and many companies successively launched Chinese OCR products. In the 1990s, the wide use of flatbed scanners and the spread of information automation and office automation in China greatly advanced OCR technology, so that its recognition accuracy and speed now meet users' requirements.

At present, there are many popular OCR packages. For English, the main one is OmniPage; Chinese OCR products mainly include Tsinghua Unisplendour OCR, Tsinghua Wentong OCR, Hanwang OCR, Zhongjing Shangshu OCR, Danqing OCR, and Mongolian OCR. Although the Chinese character set is very large and the character shapes are complex, OCR technology for Chinese has matured. Many OCR packages can recognize not only black-and-white printed Chinese characters but also grayscale and color printed ones. Recognition is fast, with accuracy above 99%. They can recognize various typefaces such as Song, Hei, and Kai, as well as Traditional characters, and can handle mixed text in multiple fonts and sizes; some OCR software can also recognize images and tables. Meanwhile, great progress has been made in handwritten Chinese character recognition, where the correct recognition rate has exceeded 70%.

Applications of OCR Software

In the scanner market, many office and home scanners come bundled with OCR software: Unisplendour scanners ship with Unisplendour OCR, Microtek scanners with Shangshu OCR, Mustek scanners with Danqing OCR, and so on. Together, the scanner and the OCR software cover the entire process from document input to text recognition.

Document scanning is common in offices, for example to capture relevant articles published in newspapers, magazines, and other media. The scanned pages are either recognized by OCR right away or stored as image files that can later be recognized by OCR, converted into text, and saved as a text file or Word file.

In addition, storing and transmitting information digitally is not only cost-effective and efficient but also adapts to the evolving needs of typesetting and network transmission. A large number of books, newspapers, magazines, and other paper treasures left over from history urgently need to be converted into electronic form. Building an e-library, for example, requires scanning books page by page. OCR recognition then replaces manual typing, which greatly shortens entry time, reduces labor intensity, saves manpower and cost, and improves entry accuracy, work efficiency, and modern office automation.

Today, OCR software paired with scanners is used in many fields of the information age, such as digital libraries, processing of various reports, and recognition of bills in banking and taxation systems. As networking and informatization develop and spread, its range of applications will keep widening.

Composition of an OCR system

The function of Chinese character recognition (OCR) software is to have a computer identify the pattern, or image, of each Chinese character in the input, whether printed or handwritten, and label it with the corresponding character code. Chinese character recognition is therefore ultimately an image recognition problem. Because the character set is huge, fonts and typefaces vary, and the character structures are complex, the recognition process is extremely complicated.

Because scanners are so widespread, OCR software only needs to provide an interface to the scanner and rely on the scanner's driver software. OCR software therefore consists mainly of four parts: an image processing module, a layout division module, a character recognition module, and a text editing module.

1. Image processing module

The image processing module mainly handles document scanning, image scaling, image rotation, and similar functions. A document scanned into the system becomes an image file, and the image processing module can enlarge the image so that stains and scratches can be removed. If the page was placed crookedly, the image can be rotated manually or automatically, creating better conditions for character recognition and a higher recognition rate.
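To make the cleanup and rotation steps more concrete, here is a minimal Python sketch. It assumes a tiny grayscale image stored as a list of rows (0 = black, 255 = white); the threshold of 128, the rule that an ink pixel with no ink neighbours is a speck, and all function names are illustrative assumptions, not how any of the products named above actually work. Automatic skew detection is not shown, only a 90-degree rotation for a page scanned sideways.

```python
from typing import List

Image = List[List[int]]   # grayscale rows: 0 = black ... 255 = white
Bitmap = List[List[int]]  # 1 = ink, 0 = background

def binarize(img: Image, threshold: int = 128) -> Bitmap:
    """Mark pixels darker than the (assumed) threshold as ink."""
    return [[1 if px < threshold else 0 for px in row] for row in img]

def remove_specks(bm: Bitmap) -> Bitmap:
    """Crude stain filter: clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(bm), len(bm[0])
    out = [row[:] for row in bm]
    for y in range(h):
        for x in range(w):
            if bm[y][x] != 1:
                continue
            neighbours = sum(
                bm[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out

def rotate_clockwise(bm: Bitmap) -> Bitmap:
    """Rotate the bitmap 90 degrees clockwise."""
    return [list(row) for row in zip(*bm[::-1])]

page = [
    [255, 255, 255, 255],
    [255,  20,  30, 255],
    [255,  25, 255, 255],
    [255, 255, 255,  10],  # lone dark speck in the corner
]
clean = remove_specks(binarize(page))
print(clean)                    # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
print(rotate_clockwise(clean))  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```

The speck in the bottom-right corner is removed because it touches no other ink, while the connected stroke survives.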

2. Layout division module

The layout division module mainly covers layout division and adjustment of that division, that is, understanding the layout, character segmentation, normalization, and so on, and it can work automatically or manually. Its purpose is to tell the OCR software how to separate the articles, tables, and other regions on the same page so that they can be processed separately, and in what order.
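The article does not say which division method any product uses. One common technique, assumed here for illustration, is projection-profile segmentation: count the ink pixels in each row to find the blank gaps between text lines, then count the ink in each column of a line to find the gaps between characters. The sketch works on a binary bitmap (1 = ink), and every name in it is hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background

def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the projection profile is non-zero."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def split_lines(bm: Bitmap) -> List[Bitmap]:
    """Cut a page into text lines at rows containing no ink (horizontal projection)."""
    row_profile = [sum(row) for row in bm]
    return [bm[a:b] for a, b in runs(row_profile)]

def split_chars(line: Bitmap) -> List[Bitmap]:
    """Cut one text line into characters at blank columns (vertical projection)."""
    col_profile = [sum(col) for col in zip(*line)]
    return [[row[a:b] for row in line] for a, b in runs(col_profile)]

page = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],  # blank row: gap between two text lines
    [0, 1, 1, 0, 0],
]
lines = split_lines(page)
print(len(lines))             # 2
print(split_chars(lines[0]))  # [[[1, 1], [1, 0]], [[1], [1]]]
```

Pure projection cutting is only a starting point: a Chinese character built from separated parts, such as 川, can contain blank columns of its own and would be split into several pieces by this naive rule.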

3. Character recognition module

The character recognition module is the core of OCR software. It “reads” the input Chinese characters, but it cannot read a whole line at once; the line must be cut into individual characters. Chinese characters are usually recognized one character at a time (single-character recognition), after normalization. The character recognition module extracts features from the sample Chinese characters, completes recognition, automatically finds suspicious characters, and offers functions such as contextual association with the preceding and following characters.
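The article does not specify how the recognition module compares characters. The sketch below assumes the simplest option, template matching on raw pixels (the method the history section attributes to Casey and Nagy): each segmented character is normalized to a fixed grid by nearest-neighbour scaling, compared with stored templates by counting differing pixels, and flagged as suspicious when the best and second-best matches are closer than an assumed margin. Real products extract richer features; the templates, grid size, margin, and function names here are all hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background

def normalize(bm: Bitmap, size: int = 8) -> Bitmap:
    """Rescale a character bitmap to a size x size grid (nearest-neighbour sampling)."""
    h, w = len(bm), len(bm[0])
    return [[bm[y * h // size][x * w // size] for x in range(size)] for y in range(size)]

def distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two equal-size bitmaps (Hamming distance)."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(bm: Bitmap, templates: Dict[str, Bitmap], size: int = 8,
              margin: int = 2) -> Tuple[List[str], bool]:
    """Rank template characters from best to worst match and flag ambiguous results."""
    glyph = normalize(bm, size)
    dists = {ch: distance(glyph, normalize(t, size)) for ch, t in templates.items()}
    ranked = sorted(dists, key=dists.get)  # ties keep template order
    # Assumed rule: suspicious if the runner-up is within `margin` pixels of the winner.
    suspicious = len(ranked) > 1 and dists[ranked[1]] - dists[ranked[0]] < margin
    return ranked, suspicious

# Hypothetical 5 x 5 templates for three simple characters.
CROSS = [[1 if y == 2 or x == 2 else 0 for x in range(5)] for y in range(5)]
TEMPLATES = {
    "一": [[1 if y == 2 else 0 for x in range(5)] for y in range(5)],
    "丨": [[1 if x == 2 else 0 for x in range(5)] for y in range(5)],
    "十": CROSS,
}

big_cross = [[CROSS[y // 2][x // 2] for x in range(10)] for y in range(10)]  # 十 at twice the size
print(recognize(big_cross, TEMPLATES, size=5))  # (['十', '一', '丨'], False)

ambiguous = [[1 if y == 2 or (x == 2 and y in (1, 3)) else 0 for x in range(5)] for y in range(5)]
print(recognize(ambiguous, TEMPLATES, size=5))  # (['一', '十', '丨'], True)
```

Normalization lets the double-size 十 match its 5 x 5 template exactly, while the ambiguous glyph (a horizontal stroke with two stray pixels) is equally close to 一 and 十, so it is flagged rather than silently accepted.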

4. Text editing module

The text editing module is used to correct and edit the OCR-recognized text. Characters the system suspects were recognized incorrectly are displayed in bold red or blue, and similar characters are offered as alternatives; the editor selects the right one and then outputs the text.
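A minimal sketch of this editing step, assuming the recognizer returns, for each character position, a best guess followed by similar-looking candidates, plus a suspicious flag. Because plain text has no colours, suspicious characters are wrapped in square brackets as a stand-in for the red or blue highlighting; the data structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Recognized:
    candidates: List[str]  # best guess first, then similar-looking alternatives
    suspicious: bool = False

def render(results: List[Recognized]) -> str:
    """Return the text with suspicious characters wrapped in [ ] in place of colour highlighting."""
    return "".join(f"[{r.candidates[0]}]" if r.suspicious else r.candidates[0] for r in results)

def apply_choices(results: List[Recognized], choices: Dict[int, int]) -> str:
    """Build the output text; `choices` maps a character position to the candidate the editor picked."""
    return "".join(r.candidates[choices.get(i, 0)] for i, r in enumerate(results))

# 已, 己 and 巳 look alike, so the recognizer marks its guess as suspicious.
results = [Recognized(["自"]), Recognized(["已", "己", "巳"], suspicious=True)]
print(render(results))                 # 自[已]
print(apply_choices(results, {1: 1}))  # 自己  (editor picked the second candidate)
```

Keeping the candidate list alongside each character means the editor only chooses among alternatives instead of retyping, which is the point of offering similar characters for selection.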

How to use OCR software

Although there are many kinds of OCR software, they are used in much the same way: first scan the document, then run OCR recognition. The steps are as follows:

1. Document scanning

To recognize characters with OCR software, you can scan documents directly from within the OCR software. When the OCR software is launched, its main interface appears. Take Crystal Book OCR as an example.
