How to Recognize Characters in Images with Pytesseract OCR in Python

woff · Posted on 2023-11-10 17:55:18

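1. Download tesseract-ocr-w64-setup-5.3.3.20231005.exe from GitHub and run it to install Tesseract.

2. Note the Tesseract installation path. The default is usually C:\Program Files\Tesseract-OCR.

3. Add the folder that contains tesseract.exe to the "PATH" environment variable.

4. Install pytesseract: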

pip install pytesseract
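Before running any OCR, it can help to confirm that Python is able to find the Tesseract executable from steps 2 and 3. The following is a minimal sketch, not part of the original post: find_tesseract and configure_pytesseract are hypothetical helper names, and the fallback folder is the default path from step 2, which may differ on your machine.

```python
import os
import shutil

# Default install folder from step 2 (an assumption; change it if you installed elsewhere).
DEFAULT_TESSERACT_DIR = r"C:\Program Files\Tesseract-OCR"


def find_tesseract(default_dir=DEFAULT_TESSERACT_DIR):
    """Return the path to the tesseract executable, or None if it is not found."""
    # First look on PATH (step 3), then fall back to the default install folder.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    candidate = os.path.join(default_dir, "tesseract.exe")
    return candidate if os.path.isfile(candidate) else None


def configure_pytesseract(exe_path):
    """Point pytesseract at exe_path and return the Tesseract version it reports."""
    import pytesseract  # imported here so find_tesseract works without pytesseract
    pytesseract.pytesseract.tesseract_cmd = exe_path
    return pytesseract.get_tesseract_version()


exe = find_tesseract()
print("Tesseract executable:", exe or "not found")
```

If a path is printed, calling configure_pytesseract(exe) should report the installed Tesseract version. If "not found" is printed, recheck steps 2 and 3.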


Code

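import cv2  # provided by the opencv-python package
import pytesseract
import matplotlib
matplotlib.use('TkAgg')  # select the TkAgg backend before importing pyplot (add this line)
import matplotlib.pyplot as plt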
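# Load the image (cv2.imread returns None if the file cannot be read)
image = cv2.imread('./images_ocr/receipt.png')
# Display the image (OpenCV loads BGR, matplotlib expects RGB)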
image_RGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10,6))
plt.imshow(image_RGB)
plt.axis('off')
plt.show()
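# Configuration: --psm 6 treats the image as a single uniform block of text
custom_config = r'--psm 6'
# Run OCR on the RGB copy, since pytesseract assumes RGB channel order
print(pytesseract.image_to_string(image_RGB, config=custom_config))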
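# Configuration: recognize digits only, using Tesseract's "digits" config file
# (pytesseract supplies the output base itself, so "outputbase" is not needed here)
custom_config = r'--psm 6 digits'
# Run OCR
print(pytesseract.image_to_string(image_RGB, config=custom_config))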
# Configuration: whitelist, recognize only the listed characters
custom_config = r'-c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyz --psm 6'
# Run OCR
print(pytesseract.image_to_string(image_RGB, config=custom_config))
# Configuration: blacklist, exclude the listed characters from the output
custom_config = r'-c tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyz --psm 6'
# Run OCR
print(pytesseract.image_to_string(image_RGB, config=custom_config))
# Load the image
image = cv2.imread('./images_ocr/chinese.png')
# Display the image
image_RGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10,6))
plt.imshow(image_RGB)
plt.axis('off')
plt.show()
# Recognize multiple languages: Traditional Chinese, Japanese, and English
custom_config = r'-l chi_tra+jpn+eng --psm 6'
# Run OCR
print(pytesseract.image_to_string(image_RGB, config=custom_config))
# Load the image
image = cv2.imread('./images_ocr/chinese_2.png')
# Display the image
image_RGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10,6))
plt.imshow(image_RGB)
plt.axis('off')
plt.show()
# Recognize multiple languages: Traditional Chinese, Japanese, and English
custom_config = r'-l chi_tra+jpn+eng --psm 6'
# Run OCR
print(pytesseract.image_to_string(image_RGB, config=custom_config))
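The config strings in the code above all follow the same pattern: an optional -l language list, optional -c variables, and a --psm mode. As a rough sketch of that pattern, the hypothetical helper build_config below (not part of pytesseract or the original post) assembles the same strings from keyword arguments, which may reduce typos when you try different combinations.

```python
import string


def build_config(psm=6, languages=None, whitelist=None, blacklist=None):
    """Assemble a Tesseract config string like the ones used in the post."""
    parts = []
    if languages:
        # Language codes are joined with "+", e.g. chi_tra+jpn+eng
        parts += ["-l", "+".join(languages)]
    if whitelist:
        parts += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        parts += ["-c", f"tessedit_char_blacklist={blacklist}"]
    parts += ["--psm", str(psm)]
    return " ".join(parts)


print(build_config())
print(build_config(languages=["chi_tra", "jpn", "eng"]))
print(build_config(whitelist=string.ascii_lowercase))
```

The returned string can be passed as the config argument of pytesseract.image_to_string. Note that this sketch does no quoting, so a whitelist or blacklist containing spaces or quote characters would need extra handling.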

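The multi-language examples only work if the trained data for chi_tra, jpn, and eng is installed alongside Tesseract. A small sketch for checking this up front follows. The helper names missing_languages and check_languages are hypothetical; pytesseract.get_languages is the library call that lists the installed languages.

```python
def missing_languages(requested, installed):
    """Return the codes in a 'chi_tra+jpn+eng' style string that are not installed."""
    return [code for code in requested.split("+") if code not in installed]


def check_languages(requested):
    """Compare the requested languages against what the local Tesseract reports."""
    import pytesseract  # imported here so missing_languages works without pytesseract
    installed = pytesseract.get_languages(config="")
    return missing_languages(requested, installed)


# Example with a made-up installed list, so it runs without Tesseract present.
print(missing_languages("chi_tra+jpn+eng", ["eng", "osd"]))  # ['chi_tra', 'jpn']
```

On a machine with Tesseract installed, check_languages("chi_tra+jpn+eng") should return an empty list when all three language packs are available.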

Result:

(Image: Python Pytesseract OCR recognizing digits)

Source: NetYea 網頁設計 (NetYea Web Design)
