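1. As usual, convert the image to grayscale.
2. Apply a slight blur to reduce noise in the image.
3. Now our goal is to find the areas with text, i.e. the text blocks of the image. To make text block detection easier, we invert and maximize the colors of the image, which is achieved through thresholding. The text therefore becomes white (exactly 255,255,255) and the background becomes black (exactly 0,0,0).
4. To find the text blocks we need to merge all printed characters of a block. We do this via dilation (expansion of the white pixels). A larger kernel on the X axis removes all the spaces between words, while a smaller kernel on the Y axis blends the lines of one block into each other but keeps the larger gaps between text blocks intact.
5. Now a simple contour detection, with a minimum-area rectangle enclosing each contour, yields all the text blocks we need.
6. There are many ways to determine the skew angle, but we will stick to a simple one: take the largest text block and use its angle.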
import cv2

# Calculate skew angle of an image
def getSkewAngle(cvImage) -> float:
    # Prep image: copy, convert to grayscale, blur, and threshold
    newImage = cvImage.copy()
    gray = cv2.cvtColor(newImage, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # Apply dilate to merge text into meaningful lines/paragraphs.
    # Use a larger kernel on the X axis to merge characters into a single line, cancelling out any spaces.
    # But use a smaller kernel on the Y axis to keep different blocks of text separate.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (30, 5))
    dilate = cv2.dilate(thresh, kernel, iterations=5)

    # Find all contours and sort them from largest to smallest area
    contours, hierarchy = cv2.findContours(dilate, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)

    # Find the largest contour and surround it with a min area box
    largestContour = contours[0]
    minAreaRect = cv2.minAreaRect(largestContour)

    # Determine the angle. Convert it to the value that was originally used to obtain the skewed image.
    angle = minAreaRect[-1]
    if angle < -45:
        angle = 90 + angle
    return -1.0 * angle

# Rotate the image around its center
def rotateImage(cvImage, angle: float):
    newImage = cvImage.copy()
    (h, w) = newImage.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    newImage = cv2.warpAffine(newImage, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
    return newImage

# Deskew image
def deskew(cvImage):
    angle = getSkewAngle(cvImage)
    return rotateImage(cvImage, -1.0 * angle)
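
Depending on the OpenCV version, `cv2.minAreaRect` may report its angle in a different range than the [-90, 0) that the `angle < -45` check assumes (some newer releases are reported to use (0, 90]). The following is a minimal sketch, not part of the original code, of a version-independent way to fold any rectangle angle into [-45, 45). The helper name `normalize_rect_angle` is hypothetical.

```python
def normalize_rect_angle(angle: float) -> float:
    """Fold a rectangle angle in degrees into the range [-45, 45)."""
    # A rectangle looks the same after a 90 degree turn, so its orientation
    # only matters modulo 90. Shifting by 45 before the modulo centers the
    # result around zero.
    return (angle + 45.0) % 90.0 - 45.0


# Same results as the `if angle < -45` branch for angles in [-90, 0)
print(normalize_rect_angle(-80.0))  # 10.0
print(normalize_rect_angle(-5.0))   # -5.0
# An angle reported in a (0, 90] convention folds to the same small value
print(normalize_rect_angle(85.0))   # -5.0
```

If used, the folded value would take the place of the `if angle < -45` adjustment. Whether the sign then still matches the rotation direction should be checked against an image with a known skew.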
Instead of the largest block, the angle can also be chosen in other ways:
1- You can take the average angle of all text blocks:
allContourAngles = [cv2.minAreaRect(c)[-1] for c in contours]
angle = sum(allContourAngles) / len(allContourAngles)
2- You can take the angle of the middle block:
middleContour = contours[len(contours) // 2]
angle = cv2.minAreaRect(middleContour)[-1]
3- You can try the average angle of the largest, smallest and middle blocks:
largestContour = contours[0]
middleContour = contours[len(contours) // 2]
smallestContour = contours[-1]
angle = sum([cv2.minAreaRect(largestContour)[-1], cv2.minAreaRect(middleContour)[-1], cv2.minAreaRect(smallestContour)[-1]]) / 3
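
The three variants above work on the raw `minAreaRect` angle, whereas `getSkewAngle` first adjusts angles below -45. The sketch below compares the strategies on plain numbers after applying that same adjustment to every block. The function names and the sample angles are hypothetical, and the input list is assumed to be ordered from the largest block to the smallest, as `contours` is after sorting.

```python
from statistics import mean
from typing import Dict, List


def fold(angle: float) -> float:
    # Same adjustment as in getSkewAngle: bring angles below -45 back near zero
    return 90.0 + angle if angle < -45.0 else angle


def combine_angles(angles: List[float]) -> Dict[str, float]:
    """Combine per-block angles, ordered from largest block to smallest."""
    folded = [fold(a) for a in angles]
    middle = folded[len(folded) // 2]
    return {
        "largest": folded[0],
        "average": mean(folded),
        "middle": middle,
        "three_point": mean([folded[0], middle, folded[-1]]),
    }


# Hypothetical raw angles for five blocks on a nearly straight page
raw = [-0.5, -89.6, -0.3, -0.4, -89.8]
print(mean(raw))            # plain average, pulled far away from zero
print(combine_angles(raw))  # folded values all stay close to zero
```

Without the adjustment, blocks that report values near -90 and blocks that report values near 0 for the same orientation would average to something close to neither.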
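To test this approach I used a newly generated PDF file with Lorem Ipsum text. The first page of this document was rendered at 300 DPI (the most common setting when working with PDF documents). The test data set of 20 sample images was then generated by taking the original image and rotating it by a random angle in the range from -10 to +10 degrees. I saved each image together with its skew angle. You can find all the code used to generate these sample images in my GitHub repository; I won't go over it here.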
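
As a rough illustration of the sampling step only (this is not the code from the repository), the sketch below draws 20 random angles between -10 and +10 degrees and pairs each with a file name. The function name, the file naming scheme and the fixed seed are all assumptions.

```python
import random
from typing import List, Tuple


def make_sample_plan(count: int = 20, max_angle: float = 10.0,
                     seed: int = 0) -> List[Tuple[str, float]]:
    """Return (file name, rotation angle) pairs for the test images."""
    rng = random.Random(seed)  # fixed seed so the data set can be regenerated
    plan = []
    for index in range(count):
        angle = round(rng.uniform(-max_angle, max_angle), 2)
        plan.append((f"sample_{index}.png", angle))  # hypothetical naming scheme
    return plan


for name, angle in make_sample_plan():
    # Each pair would drive one call such as rotateImage(original, angle),
    # with the angle stored next to the image as the ground truth.
    print(name, angle)
```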
Item #0, with angle=1.77, calculated=1.77, difference=0.0%
Item #1, with angle=-1.2, calculated=-1.19, difference=0.83%
Item #2, with angle=8.92, calculated=8.92, difference=0.0%
Item #3, with angle=8.68, calculated=8.68, difference=0.0%
Item #4, with angle=4.83, calculated=4.82, difference=0.21%
Item #5, with angle=4.41, calculated=4.4, difference=0.23%
Item #6, with angle=-5.93, calculated=-5.91, difference=0.34%
Item #7, with angle=-3.32, calculated=-3.33, difference=0.3%
Item #8, with angle=6.53, calculated=6.54, difference=0.15%
Item #9, with angle=-2.66, calculated=-2.65, difference=0.38%
Item #10, with angle=-2.2, calculated=-2.19, difference=0.45%
Item #11, with angle=-1.42, calculated=-1.4, difference=1.41%
Item #12, with angle=-6.77, calculated=-6.77, difference=0.0%
Item #13, with angle=-9.26, calculated=-9.25, difference=0.11%
Item #14, with angle=4.36, calculated=4.35, difference=0.23%
Item #15, with angle=5.49, calculated=5.48, difference=0.18%
Item #16, with angle=-4.54, calculated=-4.55, difference=0.22%
Item #17, with angle=-2.54, calculated=-2.54, difference=0.0%
Item #18, with angle=4.65, calculated=4.66, difference=0.22%
Item #19, with angle=-4.33, calculated=-4.32, difference=0.23%
Min Error: 0.0%
Max Error: 1.41%
Avg Error: 0.27%
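
The printed differences are consistent with a relative error, i.e. the absolute difference divided by the true angle. That formula is inferred from the numbers above, not taken from the author's test code, so the sketch below is an assumption about how such a summary could be computed. The function names are hypothetical.

```python
from typing import List, Tuple


def relative_error_percent(actual: float, calculated: float) -> float:
    # Assumed formula: absolute difference relative to the true angle, in percent.
    # Undefined when the true angle is exactly 0.
    return abs(actual - calculated) / abs(actual) * 100.0


def summarize(pairs: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (min, max, average) relative error in percent."""
    errors = [relative_error_percent(a, c) for a, c in pairs]
    return min(errors), max(errors), sum(errors) / len(errors)


# Three of the (angle, calculated) pairs listed above
pairs = [(1.77, 1.77), (-1.2, -1.19), (-1.42, -1.4)]
low, high, avg = summarize(pairs)
print(f"Min Error: {low:.2f}%  Max Error: {high:.2f}%  Avg Error: {avg:.2f}%")
```

Because the error is relative, small true angles such as -1.42 produce the largest percentages even though the absolute deviation is only 0.02 degrees.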