Enabling text search on images
The ASP.NET Document Viewer has integrated OCR support that, when enabled, makes text contained in images searchable. Enabling it takes only a few steps. First, follow the Getting Started topic to get a basic ASP.NET document viewer running in your app. Then follow the steps below.
1. Install the Gnostice.DocumentStudio.OCR add-on NuGet package, using either the NuGet Package Manager GUI or the Package Manager Console (for example, by running Install-Package Gnostice.DocumentStudio.OCR).
2. Add a new class named DocumentViewerEventsHandler that extends the ServerEventsHandler abstract class (in the Gnostice.Controls.ASP namespace).
3. Override the OnServerStart method to provide the digitization settings, as shown below.
public class DocumentViewerEventsHandler : ServerEventsHandler
4. Build the project and run the web application in a browser.
5. Click the Open button on the viewer's toolbar and load a scanned image file containing printed text from your machine.
6. Once the image is loaded, click the "Quick find" toolbar button and search for text that appears in the image.
Enabling additional languages
The Tesseract OCR library uses training data to recognize text. The training data is stored in the tessdata folder, which sits alongside your binaries (under the bin directory of the web application). The Gnostice.DocumentStudio.OCR add-on NuGet package includes training data for English only, but Tesseract can recognize many more languages. To enable additional languages, download their training data from the Tesseract GitHub page and copy it to the tessdata folder. Each language's training data is a file named after its language code, such as deu.traineddata for German. Also remember to list the languages in the textLanguage setting, as shown in the code snippet above. Separate multiple languages with a plus sign; for example, use "eng+deu+fra" for English, German, and French.
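The following C# sketch illustrates the plus-separated language string and the tessdata file layout described above. TessLanguageHelper and its methods are hypothetical names used only for illustration; they are not part of Gnostice Document Studio. The sketch assumes each language's training data is a <code>.traineddata file placed directly inside the tessdata folder, and that the string it builds is what you would assign to the textLanguage setting.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not a Gnostice API. It builds a Tesseract language
// string such as "eng+deu+fra" and reports which languages are missing a
// "<code>.traineddata" file in the tessdata folder.
public static class TessLanguageHelper
{
    // Joins language codes with '+', the separator used for multiple languages.
    // Blank entries are dropped and duplicates are removed.
    public static string BuildLanguageString(IEnumerable<string> codes)
    {
        if (codes == null) throw new ArgumentNullException(nameof(codes));

        List<string> cleaned = codes
            .Select(c => c?.Trim())
            .Where(c => !string.IsNullOrEmpty(c))
            .Distinct(StringComparer.Ordinal)
            .ToList();

        if (cleaned.Count == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));

        return string.Join("+", cleaned);
    }

    // Returns the language codes in languageString whose training data file
    // does not exist in tessdataDir (assumed layout: tessdata/<code>.traineddata).
    public static IReadOnlyList<string> FindMissingTrainingData(string tessdataDir, string languageString)
    {
        if (tessdataDir == null) throw new ArgumentNullException(nameof(tessdataDir));
        if (languageString == null) throw new ArgumentNullException(nameof(languageString));

        return languageString
            .Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(code => !File.Exists(Path.Combine(tessdataDir, code + ".traineddata")))
            .ToList();
    }
}
```

For example, BuildLanguageString(new[] { "eng", "deu", "fra" }) returns "eng+deu+fra", and calling FindMissingTrainingData at startup would list any language whose training data has not yet been copied to the tessdata folder.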
Deployment
Make sure the additional folders (such as tessdata) and the DLLs that are loaded at runtime are deployed to the same location as the rest of the web application's binaries.
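A minimal C# sketch of a startup check for the deployment requirement above. DeploymentCheck is a hypothetical helper, not a Gnostice API, and you supply the list of paths to check: only the tessdata folder and the English training data (eng.traineddata) follow from this topic, while the names of the runtime DLLs depend on the packages you installed, so take them from your own build output. The sketch assumes that in a classic ASP.NET application AppDomain.RelativeSearchPath points at the bin folder, and that elsewhere it is empty and AppDomain.BaseDirectory is the output folder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical startup check, not a Gnostice API. It reports which expected
// files or folders are missing from the directory the app loads binaries from.
public static class DeploymentCheck
{
    // Assumption: classic ASP.NET sets RelativeSearchPath to the bin folder;
    // otherwise it is usually null and BaseDirectory holds the binaries.
    // Path.Combine returns the second argument unchanged if it is rooted.
    public static string GetBinDirectory()
    {
        string baseDir = AppDomain.CurrentDomain.BaseDirectory;
        string searchPath = AppDomain.CurrentDomain.RelativeSearchPath;
        return string.IsNullOrEmpty(searchPath) ? baseDir : Path.Combine(baseDir, searchPath);
    }

    // Each entry is a path relative to binDir that names either a file or a
    // folder. Entries that exist as neither are returned.
    public static IReadOnlyList<string> FindMissing(string binDir, IEnumerable<string> relativePaths)
    {
        if (binDir == null) throw new ArgumentNullException(nameof(binDir));
        if (relativePaths == null) throw new ArgumentNullException(nameof(relativePaths));

        return relativePaths
            .Where(p =>
            {
                string fullPath = Path.Combine(binDir, p);
                return !File.Exists(fullPath) && !Directory.Exists(fullPath);
            })
            .ToList();
    }
}
```

For example, FindMissing(GetBinDirectory(), new[] { "tessdata", Path.Combine("tessdata", "eng.traineddata") }) would return an empty list when the tessdata folder and the English training data have been deployed; logging a non-empty result at startup points directly at what was left out of the deployment.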