Extract text from image or PDF (AI-OCR)

App overview

Extracts text from images or PDFs using AI-OCR.

Extended Feature URL: cbot-extension://cloud-bot:ai:recognize-image:1
Provider: Cloud BOT official
External communication: Yes
*This application communicates with Azure Cognitive Services API.
Version: 1
*This app is available free of charge as a public beta. Specifications and fee structure may change in the future.

Screen description

Input screen

Extract option

File upload (*1)

Specifies the file to extract text from.

(Supported formats: PDF, JPEG/JPG, PNG, BMP, TIFF)

*1 Specify either File upload or File URL, not both.

File URL (*1)

Specifies the URL of the file to extract text from.

(Supported formats: PDF, JPEG/JPG, PNG, BMP, TIFF)

*1 Specify either File upload or File URL, not both.
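
The "one or the other" rule and the supported-format list can be checked before a file is submitted. The following is a minimal sketch only: the function name, its parameters, and the accepted extension spellings are assumptions for illustration and are not part of the Cloud BOT extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions for the formats listed above. The exact spellings accepted
# by the service are an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def check_source(file_path=None, file_url=None):
    """Hypothetical helper: return "upload" or "url", or raise ValueError."""
    # Exactly one of the two inputs must be given (footnote *1).
    if (file_path is None) == (file_url is None):
        raise ValueError("specify exactly one of file_path or file_url")

    # For a URL, only the path component is inspected, so a query string
    # such as "?x=1" does not affect the extension check.
    name = file_path if file_path is not None else urlparse(file_url).path
    ext = PurePosixPath(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")

    return "upload" if file_path is not None else "url"
```

The check relies on the file extension alone, so a URL without an extension in its path would be rejected by this sketch even if the service itself could read the file.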

Page range

Specifies the pages to extract text from.

* To extract multiple pages, separate the page numbers with commas (,). (ex: 1,2,5)

* To extract a range of consecutive pages, use a hyphen (-). (ex: 3-6)

* If the field is left empty, all pages are extracted.
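
The page range syntax above can be expanded into a list of page numbers. This is a sketch under stated assumptions: the function name is hypothetical, pages are assumed to be numbered from 1, commas and hyphens are assumed to be combinable in one value (ex: 1,3-4), and out-of-range pages are rejected. None of these details are confirmed by this guide.

```python
def parse_page_range(spec, total_pages):
    """Hypothetical helper: expand "1,2,5" or "3-6" into sorted page numbers."""
    # An empty value covers all pages.
    if not spec.strip():
        return list(range(1, total_pages + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # "3-6" means pages 3 through 6 inclusive.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: pages are 1-based and must exist in the document.
        if start < 1 or end < start or end > total_pages:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))

    # Duplicates are merged and the result is returned in ascending order.
    return sorted(pages)
```

For example, `parse_page_range("3-6", 10)` yields pages 3, 4, 5, and 6, and an empty string with `total_pages=3` yields pages 1 to 3.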

Output format

Specifies the output format of the text.

  • Text: Outputs the extraction results as a single block of text.
  • Layout: Classifies the extraction results into data categories and outputs them by category.
  • JSON: Outputs the extraction results in JSON format.

Data categories to output (displayed only when the output format is 'Layout')

Specifies the data categories to be displayed on the result screen.

*For more information on data categories, see 'Output Format: Additional explanation of "Layout"' below.

Attributes information to output (displayed only when the output format is 'Layout')

Specifies the attribute information to be displayed on the result screen.

*For more information on data attributes, see 'Output Format: Additional explanation of "Layout"' below.

Result screen

The extraction is complete.

Extraction results are displayed.

Output Format: Additional explanation of "Layout"

When "Layout" is selected, the extraction results are sorted into the following data categories and output: "Table," "Title," "Section Heading," "Footnote," "Header," "Footer," "Page number," "Barcode," and "No category."

*The AI automatically determines which data category each piece of extracted data is classified into.

When "Attributes information to output" is specified, attribute information such as "Data category" and "Page number" is output on the result screen.

* When the data category is "Table", only "Page number" is output as attribute information.
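
The category and attribute rules described above can be illustrated with a small filter. This is a sketch only: the record shape (dictionaries with "category", "page", and "text" keys) and the function name are hypothetical and do not reflect the extension's actual output format.

```python
# Data categories named in this guide.
DATA_CATEGORIES = [
    "Table", "Title", "Section Heading", "Footnote", "Header",
    "Footer", "Page number", "Barcode", "No category",
]

# Maps an attribute name to the (hypothetical) record key that holds it.
ATTRIBUTE_KEYS = {"Data category": "category", "Page number": "page"}


def format_layout(items, categories, attributes):
    """Hypothetical helper: keep selected categories and attach attributes."""
    rows = []
    for item in items:
        # Only the data categories chosen on the input screen are kept.
        if item["category"] not in categories:
            continue
        row = {"text": item["text"]}
        for attr in attributes:
            # For "Table" items, only the page number is output as
            # attribute information.
            if item["category"] == "Table" and attr != "Page number":
                continue
            row[attr] = item[ATTRIBUTE_KEYS[attr]]
        rows.append(row)
    return rows
```

With both attributes selected, a "Title" item would carry its data category and page number, while a "Table" item would carry its page number only.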