{"id":32265,"date":"2025-01-31T17:28:19","date_gmt":"2025-01-31T08:28:19","guid":{"rendered":"https:\/\/docs.c-bot.pro\/?p=32265"},"modified":"2025-01-31T17:28:19","modified_gmt":"2025-01-31T08:28:19","slug":"extract_text_from_specified_positions_in_images_or_pdfs","status":"publish","type":"post","link":"https:\/\/docs.c-bot.pro\/en\/user_guide\/bot\/b-bot_editer\/extension\/ai\/extract_text_from_specified_positions_in_images_or_pdfs\/","title":{"rendered":"Extract text from specified positions in images or PDFs (AI-OCR)"},"content":{"rendered":"\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-full is-resized extension_icon\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2023\/04\/icon-s_96p_ai.png\" alt=\"\" class=\"wp-image-19454\" width=\"75\" height=\"75\"\/><\/figure>\n\n\n\n<h2 id=\"outline__1\" class=\"wp-block-heading\">App overview<\/h2>\n\n\n\n<p>Extracts text from standardized-format documents using AI-OCR.<br>Extraction settings are configured based on a template document.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Extended Feature 
URL<\/td><td>cbot-extension:\/\/cloud-bot:ai:recognize-image-marker:3<\/td><\/tr><tr><td>Provider<\/td><td>Cloud BOT <span style=\"margin-left: 2px ; padding: 2px 7px; border:0px solid #000 ; background-color: #007bff ; border-radius: 5px ; color: #ffffff ; font-size: 0.7em;\" class=\"badge\">official<\/span><\/td><\/tr><tr><td>External communication<\/td><td>Yes<br>*This application communicates with <a rel=\"noreferrer noopener\" href=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services?activetab=pivot:azureopenaiservicetab\" data-type=\"URL\" data-id=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services?activetab=pivot:azureopenaiservicetab\" target=\"_blank\">Azure Cognitive Services<\/a> API.<\/td><\/tr><tr><td>Version<\/td><td>3<\/td><\/tr><tr><td>Transaction<\/td><td>Transactions are used for each extraction.<br>3 transactions per page<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Text at the same location in a standardized-format document can be extracted from multiple documents in succession.<br>Part of the text in the document is defined as a landmark (marker), and the target text is extracted based on its relative position from that marker.<\/p>\n\n\n\n<p><strong>Example<\/strong><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"597\" 
height=\"726\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_marker_sup_en.jpg\" alt=\"\" class=\"wp-image-32268\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-vivid-red-color has-text-color\"><strong>Red frame:<\/strong> <strong>Text defined as a marker<\/strong><\/p>\n\n\n\n<p>Designate as markers several pieces of text that have the same content and position in every document and that are located as far apart as possible. This allows the target text to be extracted even if the image or PDF is somewhat tilted.<\/p>\n\n\n\n<p class=\"has-vivid-cyan-blue-color has-text-color\"><strong>Blue frame: Extraction position<\/strong><\/p>\n\n\n\n<p>The text to be extracted is identified based on its relative position from the defined marker.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h2 id=\"outline__2\" class=\"wp-block-heading\">Preconfiguration<\/h2>\n\n\n\n<h3 id=\"outline__2_1\" class=\"wp-block-heading\">Creating the extraction method<\/h3>\n\n\n\n<p>The extraction method, which records the markers and extraction positions, is created in advance.<br>It is created once, stored, and then reused when extracting text from multiple documents in succession.<\/p>\n\n\n\n<p>Because this step is performed only to obtain the extraction method, it does not need to be saved as a BOT.<\/p>\n\n\n\n<h4 id=\"outline__2_1_1\" class=\"wp-block-heading\">Extraction method (definition setting)<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"549\" 
src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_marker_input_en-1024x549.jpg\" alt=\"\" class=\"wp-image-32269\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\"><strong>Extraction method<\/strong><\/p>\n\n\n\n<p>Specify the extraction method.<br>To set the extraction definition, select \u201cExtract text after setting extract\u201d.<\/p>\n\n\n\n<p>Extract text after setting extract: Extraction settings are made based on the template document, and the text is extracted using the resulting definitions.<br>Enter extraction definition and extract text: Enter an existing extraction definition to extract text.<\/p>\n\n\n\n<p><strong>URL for the extraction configuration file<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specify a document file to be used as a template.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-black-color has-text-color has-small-font-size\">(Supported formats: PDF, JPG\/JPEG, PNG, BMP, TIFF)<\/p>\n\n\n\n<p><strong>Target page for extraction configuration<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specify the page of the template document on which the extraction settings are to be made.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h4 id=\"outline__2_1_2\" class=\"wp-block-heading\">Extraction configuration (definition setting)<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"701\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_marker_edit_en-1024x701.jpg\" alt=\"\" class=\"wp-image-32270\"\/><\/figure>\n<\/div>\n\n\n\n<div 
class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\"><strong>[Data]<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">The text extracted from the template document is displayed.<\/p>\n\n\n\n<p><strong>[Marker]<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">When the box is checked, the text is used as a marker, and its position relative to the text to be extracted is stored as definition information.<\/p>\n\n\n\n<p><strong>[Extraction name]<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specify a data name for the text you wish to extract.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h4 id=\"outline__2_1_3\" class=\"wp-block-heading\">Extraction options (definition setting)<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"720\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_marker_option_en-1024x720.jpg\" alt=\"\" class=\"wp-image-32271\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>The extraction definition information is created.<br>You can then verify that the created information works correctly.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>File URL<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the file used to verify the text extraction.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-black-color has-text-color has-small-font-size\">(Supported formats: PDF, JPG\/JPEG, PNG, BMP, TIFF)<\/p>\n\n\n\n<p><strong>Page range (Optional)<\/strong><\/p>\n\n\n\n<p 
class=\"extension_detail_item_body\">Specifies the pages used to verify the text extraction.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* You can specify multiple pages to be extracted, separated by commas (,). (ex: 1,2,5)<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* A range of pages to be extracted can be specified with a hyphen (-). (ex: 3-6)<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* If the value is left empty, all pages are processed.<\/p>\n\n\n\n<p><strong>Extraction definition information<\/strong><\/p>\n\n\n\n<p>This is the definition information created by the extraction settings.<br>It is used to extract text from standardized-format documents.<\/p>\n<\/div>\n<\/div>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 id=\"outline__3\" class=\"wp-block-heading\">Screen description<\/h2>\n\n\n\n<h3 id=\"outline__3_1\" class=\"wp-block-heading\">Input screen<\/h3>\n\n\n\n<p>Extraction is performed using the extraction definition information created in advance.<br>The definition information must exist before you run this operation.<\/p>\n\n\n\n<small>*<a href=\"#outline__2_1\">Click here<\/a> for extraction settings<\/small>\n\n\n\n<h4 id=\"outline__3_1_1\" class=\"wp-block-heading\">Extraction method<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"549\" 
src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_marker_input_02_en-1024x549.jpg\" alt=\"\" class=\"wp-image-32272\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\"><strong>Extraction method<\/strong><\/p>\n\n\n\n<p>Specify the extraction method.<br>To extract text from a document, select \u201cEnter extraction definition and extract text\u201d.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h4 id=\"outline__3_1_2\" class=\"wp-block-heading\">Extraction options<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"720\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_marker_option_02_en-1024x720.jpg\" alt=\"\" class=\"wp-image-32273\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\"><strong>File URL<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the file from which to extract text.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-black-color has-text-color has-small-font-size\">(Supported formats: PDF, JPG\/JPEG, PNG, BMP, TIFF)<\/p>\n\n\n\n<p><strong>Page range (Optional)<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the pages from which to extract text.<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* You can specify multiple pages to be extracted, separated by commas (,). 
(ex: 1,2,5)<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* A range of pages to be extracted can be specified with a hyphen (-). (ex: 3-6)<\/p>\n\n\n\n<p class=\"extension_detail_item_body has-vivid-cyan-blue-color has-text-color has-small-font-size\">* If the value is left empty, all pages are processed.<\/p>\n\n\n\n<p><strong>Extraction definition information<\/strong><\/p>\n\n\n\n<p class=\"extension_detail_item_body\">Specifies the definition information created by the extraction settings.<\/p>\n\n\n\n<small>*<a href=\"#outline__2_1\">Click here<\/a> for extraction settings<\/small>\n<\/div>\n<\/div>\n\n\n\n<h3 id=\"outline__3_2\" class=\"wp-block-heading\">Result screen<\/h3>\n\n\n\n<h4 id=\"outline__3_2_1\" class=\"wp-block-heading\">The extraction is complete.<\/h4>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"701\" src=\"https:\/\/docs.c-bot.pro\/wp-content\/uploads\/2025\/01\/img_aiocr_marker_comp_en-1024x701.jpg\" alt=\"\" class=\"wp-image-32274\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>Extraction results are displayed.<br>Files can be processed in succession by clicking \u201cNext file\u201d.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>App overview Text extraction from standardized format documents using AI-OCR. Extraction settings will be configured based on a 
template document. Extended Feature URL cbot-extension:\/\/cloud-bot:ai:recognize-image-marker:3 Provider Cloud BOT official External communication Yes *This application communicates with Azure Cognitive Services API. Version 3 Transaction Use a transaction for each extraction. 3 transactions per page Text at the same location in a standardized-format document can be extracted from multiple documents in succession. Part of the text in the document is defined as a landmark (marker), and the target text is extracted based on its relative position from that marker. Example Red frame: Text defined as a marker By designating multiple texts that are consistent in content and position across multiple documents, and are located as far apart as possible, you can extract the target text even if the image or PDF is somewhat tilted. Blue frame: Extraction position The text to be extracted is identified based on its relative position from the defined marker. Preconfiguration Creation of Extraction method Extraction method, which records the markers and extraction positions, is created in advance. This extraction method is created once and stored, and is used when extracting text sequentially from multiple documents. Since this setting is performed to obtain the extraction method, it does not need to be saved as a BOT. 
Extraction method [&hellip;]<\/p>\n","protected":false},"author":9,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_locale":"en_US","_original_post":"https:\/\/docs.c-bot.pro\/?p=32131","footnotes":""},"categories":[60],"tags":[],"class_list":["post-32265","post","type-post","status-publish","format-standard","hentry","category-ai","en-US"],"_links":{"self":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts\/32265","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/comments?post=32265"}],"version-history":[{"count":1,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts\/32265\/revisions"}],"predecessor-version":[{"id":32275,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/posts\/32265\/revisions\/32275"}],"wp:attachment":[{"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/media?parent=32265"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/categories?post=32265"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/docs.c-bot.pro\/wp-json\/wp\/v2\/tags?post=32265"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}