{"id":22056,"date":"2023-08-09T09:56:52","date_gmt":"2023-08-09T00:56:52","guid":{"rendered":"https:\/\/docs.c-bot.pro\/?p=22056"},"modified":"2025-01-31T17:41:07","modified_gmt":"2025-01-31T08:41:07","slug":"extract_text_from_image","status":"publish","type":"post","link":"https:\/\/docs.c-bot.pro\/en\/user_guide\/bot\/b-bot_editer\/extension\/ai\/extract_text_from_image\/","title":{"rendered":"AI \/ Extract text from image or PDF (AI-OCR)"},"content":{"rendered":"\n<p><a href=\"https:\/\/docs.c-bot.pro\/en\/\"><i class=\"fas fa-book\"> <\/i> &nbsp; Home<\/a> &gt; <a href=\"https:\/\/docs.c-bot.pro\/en\/user_guide\">User guide<\/a> &gt; <a href=\"https:\/\/docs.c-bot.pro\/en\/user_guide\/bot\/\">BOT<\/a> &gt; <a href=\"https:\/\/docs.c-bot.pro\/en\/user_guide\/bot\/b-bot_editer\/\">How to use the BOT editor<\/a> &gt; <a href=\"https:\/\/docs.c-bot.pro\/en\/user_guide\/bot\/b-bot_editer\/extension\">Extention<\/a>&gt; <a href=\"https:\/\/docs.c-bot.pro\/en\/user_guide\/bot\/b-bot_editer\/extension\/ai\">AI<\/a> &gt; Extract text from image or PDF (AI-OCR)<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-full is-resized extension_icon\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2023\/04\/icon-s_96p_ai.png\" alt=\"\" class=\"wp-image-19454\" width=\"75\" height=\"75\"\/><\/figure>\n\n\n\n<h2 id=\"outline__1\" class=\"wp-block-heading\">App overview<\/h2>\n\n\n\n<p>Use AI-OCR to convert images or PDFs into text and extract table data or text from specified coordinates.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Extended Feature URL<\/td><td>cbot-extension:\/\/cloud-bot:ai:recognize-image:4<\/td><\/tr><tr><td>Provider<\/td><td>Cloud BOT <span style=\"margin-left: 2px ; padding: 2px 7px; border:0px solid #000 ; 
background-color: #007bff ; border-radius: 5px ; color: #ffffff ; font-size: 0.7em;\" class=\"badge\">official<\/span><\/td><\/tr><tr><td>External communication<\/td><td>Yes<br>*This application communicates with <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services?activetab=pivot:azureopenaiservicetab\" data-type=\"URL\" data-id=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services?activetab=pivot:azureopenaiservicetab\" target=\"_blank\" rel=\"noreferrer noopener\">Azure Cognitive Services<\/a> API.<\/td><\/tr><tr><td>Version<\/td><td>4<\/td><\/tr><tr><td>Transaction<\/td><td>Use a transaction for each extraction.<br>3 transactions per page<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 id=\"outline__2\" class=\"wp-block-heading\">Screen description<\/h2>\n\n\n\n<h3 id=\"outline__2_1\" class=\"wp-block-heading\">Input screen<\/h3>\n\n\n\n<h4 id=\"outline__2_1_1\" class=\"wp-block-heading\">Extract option<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"701\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_input_en-1024x701.jpg\" alt=\"\" class=\"wp-image-32279\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\"><strong>File URL<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specify the URL where the file from which the text extraction is to be performed is located.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-black-color has-text-color has-small-font-size\">(Supported 
formats: PDF, JPG\/JPEG, PNG, BMP, TIFF)<\/p>\n\n\n\n<p><strong>Output format<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the output format of the text.<\/p>\n\n\n\n<ul class=\"extension_detail_item_body wp-block-list\">\n<li>Text: Outputs the extraction results as a single block of text.<\/li>\n\n\n\n<li><a href=\"#outline__2_3\" data-type=\"internal\" data-id=\"#outline__2_3\">Layout<\/a>: Classifies the extraction results into specific categories and outputs them.<\/li>\n\n\n\n<li>JSON: Outputs the extraction results in JSON format.<\/li>\n<\/ul>\n\n\n\n<p><strong>Data categories to output<\/strong><span style=\"color:#cf2e2e\" class=\"tadv-color\"> (Displayed only when the output format is 'Layout')<\/span><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the data categories to display on the result screen.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-small-font-size\">*For more information on data categories, click <a href=\"#outline__2_3\" data-type=\"internal\" data-id=\"#outline__2_3\">here<\/a>.<\/p>\n\n\n\n<p><strong>Attributes information to output<\/strong><span style=\"color:#cf2e2e\" class=\"tadv-color\"> <\/span><span style=\"color:#cf2e2e\" class=\"tadv-color\">(Displayed only when the output format is 'Layout')<\/span><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the attribute information to display on the result screen.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-small-font-size\">*For more information on data attributes, click <a href=\"#outline__2_3\" data-type=\"internal\" data-id=\"#outline__2_3\">here<\/a>.<\/p>\n\n\n\n<p><strong>Page range<\/strong> (Optional)<\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the pages from which text is extracted.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* You can specify multiple pages, separated by 
commas (,). (e.g. 1,2,5)<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* A range of pages can be specified with a hyphen (-). (e.g. 3-6)<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* If the field is left empty, all pages are extracted.<\/p>\n\n\n\n<p><strong>Detected region<\/strong> (Optional)<\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the Detected region. Text is detected only within the specified region.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* Specifying a Detected region is optional.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* You can add or remove Detected regions by clicking the Add or Delete button. A maximum of 10 regions can be set.<\/p>\n\n\n\n<p>To specify coordinates accurately, detect and confirm them in advance.<br>For the procedure, click <a href=\"#outline__2_4\" data-type=\"internal\" data-id=\"#outline__2_4\">here<\/a>.<\/p>\n\n\n\n<p><strong>[Region number]<\/strong><br>Numbers from 1 to 10 are assigned in sequence.<\/p>\n\n\n\n<p><strong>[Detected range coordinates]<\/strong><br>Specifies the coordinates of the detected range.<br>Coordinates describe a rectangle by two corner points: upper left (X1,Y1) and lower right (X2,Y2). 
For example, for 50,50,250,200, X1=50, Y1=50, X2=250, Y2=200.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1347\" height=\"615\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2023\/11\/user_guide_ex_extract_text_from_image_2311_en_02.png\" alt=\"\" class=\"wp-image-25303\"\/><\/figure>\n\n\n\n<p><strong>[Extraction mode]<\/strong><br>Select Extraction mode.<\/p>\n\n\n\n<p><small><strong>Extract information over lapping with the range<\/strong>: Extracts all information that overlaps the Detected range coordinates.<\/small><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1347\" height=\"522\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2023\/11\/user_guide_ex_extract_text_from_image_2311_en_07.png\" alt=\"\" class=\"wp-image-25320\"\/><\/figure>\n\n\n\n<p><small><strong>Extract information contained within the range<\/strong>: Only information that falls within the Detected range coordinates is extracted.<\/small><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1347\" height=\"471\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2023\/11\/user_guide_ex_extract_text_from_image_2311_en_08.png\" alt=\"\" class=\"wp-image-25321\"\/><\/figure>\n\n\n\n<p><strong>[Page number]<\/strong><br>Specifies the page number.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h3 id=\"outline__2_2\" class=\"wp-block-heading\">Result screen<\/h3>\n\n\n\n<h4 id=\"outline__2_2_1\" class=\"wp-block-heading\">The extraction is complete.<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" 
decoding=\"async\" width=\"1024\" height=\"701\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_comp_en-1024x701.jpg\" alt=\"\" class=\"wp-image-32280\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>Extraction results are displayed.<\/p>\n<\/div>\n<\/div>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 id=\"outline__2_3\" class=\"wp-block-heading\">Output Format: Additional explanation of \"Layout\"<\/h3>\n\n\n\n<p>When \"Layout\" is selected, the extraction results are classified into the following data categories and output: \"Table,\" \"Title,\" \"Section Heading,\" \"Footnote,\" \"Header,\" \"Footer,\" \"Page number,\" \"Barcode,\" and \"No category.\"<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-red-color has-text-color has-small-font-size\">*The AI automatically determines which data category each extracted item is classified into.<\/p>\n\n\n\n<p>When \"Attributes information to output\" is specified, attribute information such as \"Data category\", \"Page number\", and \"Detected region number\" can be output to the result screen.<br>Specifying \"<a href=\"#outline__2_4\" data-type=\"internal\" data-id=\"#outline__2_4\">Detected rectangle<\/a>\" also displays the coordinates of the detected text.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1654\" height=\"737\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2023\/08\/\u30ec\u30a4\u30a2\u30a6\u30c8\u8aac\u660e_en.png\" alt=\"\" class=\"wp-image-22243\"\/><\/figure>\n\n\n\n<p class=\"has-small-font-size\">* When the data category is \"Table\", only \"Page number\" is output as attribute information.<\/p>\n\n\n\n<h3 id=\"outline__2_4\" class=\"wp-block-heading\"><strong>Attributes information to output: Additional explanation of \"Detected 
rectangle\"<\/strong><\/h3>\n\n\n\n<p>When \"Detected rectangle\" is specified in \"Attributes information to output\", the coordinates of the detected text are displayed. The coordinates detected at this time can be used in the \"coordinates\" field of the \"Detected region\" input screen.<br>The detection methods are as follows.<\/p>\n\n\n\n<p><small>1. Open a Virtual browser. (*It is not necessary to record the task.)<br>2. Open the \"Extract text from image or PDF\" extension.<br>3. Upload a file or specify a URL.<br>4. Select \"Layout\" as output format.<br>5. When \"Attributes information to output\" is displayed, specify \"Detected rectangle\".<br>6. The coordinates will be detected in the result display screen.<br>7. The detected coordinates can be copied to the clipboard.<\/small><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Home &gt; User guide &gt; BOT &gt; How to use the BOT editor &gt; Extention&gt; AI &gt; Extract text from image or PDF (AI-OCR) App overview Use AI-OCR to convert images or PDFs into text and extract table data or text from specified coordinates. Extended Feature URL cbot-extension:\/\/cloud-bot:ai:recognize-image:4 Provider Cloud BOT official External communication Yes*This application communicates with Azure Cognitive Services API. Version 4 Transaction Use a transaction for each extraction.3 transactions per page Screen description Input screen Extract option File URL Specify the URL where the file from which the text extraction is to be performed is located. (Supported formats:PDF,JPG\/JPG,PNG,BMP,TIFF) Output format Specifies the output format of the text. Data categories to output (Output format:Displayed only when 'Layout' is selected) Specify the 'Data category' to be displayed on the results screen. *For more information on data categories, please click here. 
Attributes information to output (Output format:Displayed only when 'Layout' is selected) Specifies 'Attribute information' to be displayed on the results screen. *For more information on data attributes, please click here. Page range (Option) Specifies the page from which text extraction is performed. * You can specify multiple pages to be extracted, separated by commas (,). (ex: 1,2,5) * [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_locale":"en_US","_original_post":"https:\/\/docs.c-bot.pro\/?p=21903","footnotes":""},"categories":[60],"tags":[],"class_list":["post-22056","post","type-post","status-publish","format-standard","hentry","category-ai","en-US"],"_links":{"self":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts\/22056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/comments?post=22056"}],"version-history":[{"count":9,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts\/22056\/revisions"}],"predecessor-version":[{"id":32281,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts\/22056\/revisions\/32281"}],"wp:attachment":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/media?parent=22056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/categories?post=22056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/tags?post=22056"}],"curies":[{"name":"wp","hre
f":"https:\/\/api.w.org\/{rel}","templated":true}]}}