Extract text from specified positions in images or PDFs (AI-OCR)


App overview
Extracts text from documents in a standardized format using AI-OCR.
Extraction settings are configured based on a template document.
Extended Feature URL | cbot-extension://cloud-bot:ai:recognize-image-marker:3 |
Provider | Cloud BOT official |
External communication | Yes *This application communicates with Azure Cognitive Services API. |
Version | 3 |
Transaction | A transaction is used for each extraction: 3 transactions per page |
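You can extract text from the same location in a standardized format across multiple documents in succession.
Part of the text in the document is defined as a landmark (marker), and the target text is extracted based on its relative position from that marker.
Example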

Red frame: Text defined as a marker
By designating multiple texts that are consistent in content and position across multiple documents, and are located as far apart as possible, you can extract the target text even if the image or PDF is somewhat tilted.
Blue frame: Extraction position
The text to be extracted is identified based on its relative position from the defined marker.
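The idea of locating text relative to markers can be illustrated with a minimal sketch. This is not the algorithm used by this extension, which the guide does not describe; the function name `locate_target` and the two-marker model are assumptions chosen for illustration only. Given where two markers sit in the template and where the same markers are found in a scanned page, it estimates the rotation, scale and shift between the two and maps the template extraction position onto the scan.

```python
def locate_target(template_markers, template_target, observed_markers):
    """Map a template point onto a scanned page using two marker positions.

    Hypothetical helper for illustration; all points are (x, y) tuples.
    """
    # Treat 2D points as complex numbers so that rotation and scaling
    # together become a single multiplication.
    a0, b0 = (complex(*p) for p in template_markers)
    a1, b1 = (complex(*p) for p in observed_markers)
    target = complex(*template_target)

    # Rotation and scale that carry the template marker pair
    # onto the observed marker pair.
    transform = (b1 - a1) / (b0 - a0)

    # Apply the same rotation/scale to the target, anchored at the first marker.
    mapped = a1 + transform * (target - a0)
    return (mapped.real, mapped.imag)


# Template: markers at (0, 0) and (10, 0), extraction position at (5, 5).
# Scan: the page is rotated by 90 degrees and shifted by (2, 3).
position = locate_target([(0, 0), (10, 0)], (5, 5), [(2, 3), (2, 13)])
```

In this model the estimate divides by the distance between the two template markers, so a small error in a detected marker position has less effect when the markers are far apart. That is consistent with the advice above to choose markers located as far apart as possible.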
Preconfiguration
Creation of Extraction method
An extraction method, which records the markers and extraction positions, is created in advance.
This extraction method is created once and stored, and is used when extracting text sequentially from multiple documents.
Since this setting is performed to obtain the extraction method, it does not need to be saved as a BOT.
Extraction method (definition setting)

Extraction method
Specify the extraction method.
To set the extraction definition, select “Extract text after setting extract”.
Extract text after setting extract : Extraction settings are made based on the template document and the text is extracted using the definitions.
Enter extraction definition and extract text : Enter an extraction definition to extract text.
URL for the extraction configuration file
Specify a document file to be used as a template.
(Supported formats: PDF, JPEG/JPG, PNG, BMP, TIFF)
Target page for extraction configuration
Specify the target page from the template document for which the extraction settings are to be made.
Extraction configuration (definition setting)

[Data]
Text extracted from the template document will be displayed.
[Marker]
When the box is checked, the text is used as a marker, and its position relative to the text to be extracted is stored as definition information.
[Extraction name]
Specify a data name for the text you wish to extract.
Extraction options (definition setting)

The extraction definition information has been created.
You can go on to verify that the created information works correctly.
File URL
Specifies the file from which the text extraction is verified.
(Supported formats: PDF, JPEG/JPG, PNG, BMP, TIFF)
Page range (Optional)
Specifies the page on which the text extraction is verified.
* You can specify multiple pages to be extracted, separated by commas (,). (ex: 1,2,5)
* A range of pages can be specified with a hyphen (-). (ex: 3-6)
* If an empty value is specified, all pages are covered.
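The page range rules above can be sketched as a small parser. This is only an illustration of the documented syntax, not the extension's own code; the function name `parse_page_range` and the `total_pages` parameter are assumptions, and combining commas and hyphens in one value (such as 1,3-4) is assumed to work but is not confirmed by this guide.

```python
def parse_page_range(spec, total_pages):
    """Expand a page range string into a list of page numbers.

    Hypothetical helper: "1,2,5" -> [1, 2, 5], "3-6" -> [3, 4, 5, 6],
    and an empty value -> every page from 1 to total_pages.
    """
    spec = spec.strip()
    if not spec:
        # An empty value covers all pages.
        return list(range(1, total_pages + 1))

    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A hyphen gives an inclusive range of pages.
            start, end = (int(p) for p in part.split("-", 1))
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


pages = parse_page_range("3-6", total_pages=10)
```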
Extraction definition information
This is the definition information created by the extraction settings.
This information is used to extract text from a standard formatted document.
Screen description
Input screen
Extraction is performed using the extraction definition information created in advance.
This operation requires extraction definition information.
Extraction method

Extraction method
Specify the extraction method.
To extract text from a document, select “Enter extraction definition and extract text”.
Extraction options

File URL
Specifies the file from which text is extracted.
(Supported formats: PDF, JPEG/JPG, PNG, BMP, TIFF)
Page range (Optional)
Specifies the pages from which text is extracted.
* You can specify multiple pages to be extracted, separated by commas (,). (ex: 1,2,5)
* A range of pages can be specified with a hyphen (-). (ex: 3-6)
* If an empty value is specified, all pages are covered.
Extraction definition information
Specifies the definition information set for the extraction.
* Click here for extraction settings
Result screen
The extraction is complete.

Extraction results are displayed.
Files can be processed in succession by clicking “Next file”.