Extracting Text from Images

First, a screenshot of the result:


Code:

from PIL import Image
import pytesseract

# Run OCR on timg.jpg with the Simplified Chinese language pack (chi_sim)
text = pytesseract.image_to_string(Image.open('timg.jpg'), lang='chi_sim')
print(text)

Step-by-step implementation:

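1. The tools used are Pillow (a replacement for PIL, which is no longer maintained), pytesseract, and the Tesseract OCR engine.

Pillow and pytesseract are Python libraries and can be installed with pip: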

pip install pytesseract

pip install pillow
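To confirm that both packages are importable before going further, a small check with the standard library's importlib.util.find_spec is enough. This is a sketch; the helper name missing_packages is hypothetical. Note that the import name differs from the pip name for Pillow, which is imported as PIL.

```python
import importlib.util


def missing_packages(names):
    """Return the import names from `names` that cannot be found (hypothetical helper)."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Pillow is imported as PIL; pytesseract keeps its pip name
    missing = missing_packages(["PIL", "pytesseract"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Pillow and pytesseract are installed")
```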

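The Tesseract OCR engine is distributed as a Windows .exe installer. After downloading it, install it and configure the environment variables.

Download link: http://download.****.net/download/l_lipo/10202168

The download includes Tesseract OCR and the required Chinese language pack.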

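After installing, add the installation directory to the PATH environment variable: D:\Learning Programs\Tesseract-OCR

Set the TESSDATA_PREFIX variable to point to D:\Learning Programs\Tesseract-OCR\tessdata

Put the language pack file (e.g. chi_sim.traineddata for Simplified Chinese) into the tessdata folder.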
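To verify this configuration from Python, the sketch below looks for tesseract on PATH with shutil.which and checks that the language pack file exists under TESSDATA_PREFIX. The helper name check_tesseract_setup is hypothetical, and it assumes that TESSDATA_PREFIX points directly at the tessdata folder (as configured in the steps above) and that language packs are named <lang>.traineddata.

```python
import os
import shutil
from pathlib import Path


def check_tesseract_setup(lang="chi_sim", env=None):
    """Report whether tesseract is on PATH and <lang>.traineddata is in TESSDATA_PREFIX.

    Hypothetical helper; `env` defaults to the real environment (os.environ).
    """
    env = os.environ if env is None else env
    # shutil.which searches the given PATH string; on Windows it also tries tesseract.exe
    exe = shutil.which("tesseract", path=env.get("PATH", ""))
    prefix = env.get("TESSDATA_PREFIX", "")
    # Assumes TESSDATA_PREFIX points directly at the tessdata folder
    has_lang = bool(prefix) and (Path(prefix) / f"{lang}.traineddata").is_file()
    return {"executable": exe, "tessdata_prefix": prefix or None, "has_language_pack": has_lang}


if __name__ == "__main__":
    for key, value in check_tesseract_setup().items():
        print(f"{key}: {value}")
```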

Edit the pytesseract.py file in the pytesseract package so that tesseract_cmd points to the engine's executable:

import os
import sys
import subprocess
import tempfile
import shlex


# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'D:/Learning Programs/Tesseract-OCR/tesseract.exe'
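Editing the installed library file works, but the change is lost whenever pytesseract is reinstalled or upgraded. pytesseract's README documents an alternative: assign the executable path to pytesseract.pytesseract.tesseract_cmd in your own script before calling image_to_string. A minimal sketch using the path from above; configure_pytesseract is a hypothetical helper that simply returns False if pytesseract cannot be imported.

```python
# Path used in this article; change it to match your own installation
TESSERACT_EXE = r"D:\Learning Programs\Tesseract-OCR\tesseract.exe"


def configure_pytesseract(exe_path=TESSERACT_EXE):
    """Point pytesseract at the tesseract executable (hypothetical helper).

    Returns True if the setting was applied, False if pytesseract cannot be imported.
    """
    try:
        import pytesseract
    except ImportError:
        return False
    # Module-level variable that pytesseract reads when it launches the tesseract process
    pytesseract.pytesseract.tesseract_cmd = exe_path
    return True


if __name__ == "__main__":
    print("configured" if configure_pytesseract() else "pytesseract is not installed")
```

With this in place, the library file itself can stay unmodified.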

That completes the setup.

2. Usage:

Put the image in the same directory as the .py file, write the code, and run it:

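from PIL import Image
import pytesseract

# Open the image (closed automatically when the block ends) and OCR it with chi_sim
with Image.open('timg.jpg') as img:
    text = pytesseract.image_to_string(img, lang='chi_sim')
print(text)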
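Image.open('timg.jpg') resolves the file name against the current working directory, not the folder containing the .py file, so the image is only found when the script is launched from that folder. If you run it from elsewhere (for example from an IDE with a different working directory), one option is to build the path from __file__. A sketch; script_relative is a hypothetical helper name.

```python
from pathlib import Path


def script_relative(name, script=None):
    """Return `name` resolved against the folder containing `script` (hypothetical helper)."""
    if script is None:
        # __file__ is undefined in an interactive session; fall back to the working directory
        script = globals().get("__file__")
    base = Path(script).resolve().parent if script else Path.cwd()
    return base / name


if __name__ == "__main__":
    print(script_relative("timg.jpg"))
```

The result can be passed straight to Pillow, e.g. Image.open(script_relative('timg.jpg')).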
