Understanding Face Alignment in MTCNN
Concepts
The face recognition pipeline: face detection -> face alignment -> feature extraction -> similarity comparison.
Face alignment is a key step and, depending on the application scenario, directly affects the recognition result: whether or not alignment is performed changes the extracted features, and the features before and after alignment differ.
Face alignment (rectification): when the detected face is at an angle and its landmarks are out of their canonical positions, an alignment operation is needed.
The figure below compares a face before and after alignment; the aligned result looks quite good.
So how do we implement face alignment? The rough idea is: first define a set of standard facial landmark positions src, then estimate a similarity transform between src and the detected landmarks dst. The transform combines rotation, translation and scaling and yields a homogeneous transformation matrix M, which is then used as the parameter of an affine warp to produce the aligned face image.
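Condensed to its core, the procedure fits in a few lines. This is only a sketch: align_face is a hypothetical helper, src and dst are assumed to be (5, 2) float32 landmark arrays, and the full function in the next section adds size handling and bbox fallbacks.

import cv2
from skimage import transform as trans

def align_face(img, dst, src, out_size=(112, 112)):
    # Estimate the similarity transform (rotation + translation + scaling)
    # that maps the detected landmarks dst onto the template landmarks src.
    tform = trans.SimilarityTransform()
    tform.estimate(dst, src)
    M = tform.params[0:2, :]  # 2x3 affine matrix in the form OpenCV expects
    # Warp the whole image with that transform to get the aligned face.
    return cv2.warpAffine(img, M, out_size, borderValue=0.0)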
Code
Reading the following code alongside the explanation should make it clearer:
import cv2
import numpy as np
from skimage import transform as trans

def preprocess(img, bbox=None, landmark=None, **kwargs):
    if isinstance(img, str):  # isinstance checks whether img is of a given type, similar to type()
        img = read_image(img, **kwargs)  # read_image is a helper defined elsewhere in the module
    M = None
    image_size = []
    str_image_size = kwargs.get('image_size', '')
    if len(str_image_size) > 0:  # parse the target image size; here '112,112' is used
        image_size = [int(x) for x in str_image_size.split(',')]
        if len(image_size) == 1:
            image_size = [image_size[0], image_size[0]]
        assert len(image_size) == 2
        assert image_size[0] == 112
        assert image_size[1] == 112 or image_size[1] == 96  # only 112x112 and 112x96 are supported
    if landmark is not None:  # if landmarks are available, estimate M from them
        assert len(image_size) == 2
        src = np.array([  # reference positions of the 5 facial landmarks (fixed template)
            [30.2946, 51.6963],
            [65.5318, 51.5014],
            [48.0252, 71.7366],
            [33.5493, 92.3655],
            [62.7299, 92.2041]], dtype=np.float32)
        if image_size[1] == 112:  # the template is defined for a 96-pixel-wide crop; for width 112,
            src[:, 0] += 8.0      # shift all x coordinates by (112 - 96) / 2 = 8.0
        dst = landmark.astype(np.float32)  # detected landmarks, cast to float32
        tform = trans.SimilarityTransform()  # see class SimilarityTransform below
        tform.estimate(dst, src)  # estimate the transform from the corresponding point sets
        M = tform.params[0:2, :]  # top two rows of the (3, 3) homogeneous matrix: a 2x3 affine matrix
        # M = cv2.estimateRigidTransform(dst.reshape(1,5,2), src.reshape(1,5,2), False)
    if M is None:  # no transform was estimated, so fall back to cropping with the bbox
        if bbox is None:  # no bbox either: use a center crop
            det = np.zeros(4, dtype=np.int32)
            det[0] = int(img.shape[1] * 0.0625)
            det[1] = int(img.shape[0] * 0.0625)
            det[2] = img.shape[1] - det[0]
            det[3] = img.shape[0] - det[1]
        else:  # use the given bbox directly
            det = bbox
        margin = kwargs.get('margin', 44)  # margin in pixels added around the bbox (default 44)
        bb = np.zeros(4, dtype=np.int32)  # expanded bbox, clipped to the image borders
        bb[0] = np.maximum(det[0] - margin // 2, 0)
        bb[1] = np.maximum(det[1] - margin // 2, 0)
        bb[2] = np.minimum(det[2] + margin // 2, img.shape[1])
        bb[3] = np.minimum(det[3] + margin // 2, img.shape[0])
        ret = img[bb[1]:bb[3], bb[0]:bb[2], :]  # crop the expanded bbox from the image
        if len(image_size) > 0:
            ret = cv2.resize(ret, (image_size[1], image_size[0]))  # resize to the 112x112 target
        return ret
    else:  # do align using landmark
        assert len(image_size) == 2
        warped = cv2.warpAffine(img, M, (image_size[1], image_size[0]), borderValue=0.0)  # apply the affine warp
        return warped
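A minimal usage sketch (the file name and landmark values below are placeholders; in practice the five landmarks come from the MTCNN detector, in the order left eye, right eye, nose tip, left mouth corner, right mouth corner):

import cv2
import numpy as np

img = cv2.imread('face.jpg')  # placeholder path; any BGR image containing a face
landmark = np.array([
    [95.0, 120.0],   # left eye
    [150.0, 118.0],  # right eye
    [122.0, 150.0],  # nose tip
    [100.0, 180.0],  # left mouth corner
    [145.0, 182.0]], dtype=np.float32)

aligned = preprocess(img, landmark=landmark, image_size='112,112')
print(aligned.shape)  # (112, 112, 3)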
The SimilarityTransform class referenced above is excerpted from skimage.transform; EuclideanTransform (its base class) and the _umeyama helper are defined in the same module:

class SimilarityTransform(EuclideanTransform):
    """2D similarity transformation of the form:

        X = a0 * x - b0 * y + a1 =
          = s * x * cos(rotation) - s * y * sin(rotation) + a1

        Y = b0 * x + a0 * y + b1 =
          = s * x * sin(rotation) + s * y * cos(rotation) + b1

    where ``s`` is a scale factor and the homogeneous transformation matrix is::

        [[a0 -b0  a1]
         [b0  a0  b1]
         [0    0   1]]

    The similarity transformation extends the Euclidean transformation with a
    single scaling factor in addition to the rotation and translation
    parameters.

    Parameters
    ----------
    matrix : (3, 3) array, optional
        Homogeneous transformation matrix.
    scale : float, optional
        Scale factor.
    rotation : float, optional
        Rotation angle in counter-clockwise direction as radians.
    translation : (tx, ty) as array, list or tuple, optional
        x, y translation parameters.

    Attributes
    ----------
    params : (3, 3) array
        Homogeneous transformation matrix.
    """

    def __init__(self, matrix=None, scale=None, rotation=None,
                 translation=None):
        params = any(param is not None
                     for param in (scale, rotation, translation))

        if params and matrix is not None:
            raise ValueError("You cannot specify the transformation matrix and"
                             " the implicit parameters at the same time.")
        elif matrix is not None:
            if matrix.shape != (3, 3):
                raise ValueError("Invalid shape of transformation matrix.")
            self.params = matrix
        elif params:
            if scale is None:
                scale = 1
            if rotation is None:
                rotation = 0
            if translation is None:
                translation = (0, 0)

            self.params = np.array([
                [math.cos(rotation), -math.sin(rotation), 0],
                [math.sin(rotation),  math.cos(rotation), 0],
                [                 0,                   0, 1]
            ])
            self.params[0:2, 0:2] *= scale
            self.params[0:2, 2] = translation
        else:
            # default to an identity transform
            self.params = np.eye(3)

    def estimate(self, src, dst):
        """Estimate the transformation from a set of corresponding points.

        Over-, well- and under-determined systems are solved with the
        total least-squares method. The number of source and destination
        coordinates must match.

        Parameters
        ----------
        src : (N, 2) array
            Source coordinates.
        dst : (N, 2) array
            Destination coordinates.

        Returns
        -------
        success : bool
            True, if model estimation succeeds.
        """
        self.params = _umeyama(src, dst, True)
        return True

    @property
    def scale(self):
        if abs(math.cos(self.rotation)) < np.spacing(1):
            # abs(sin(self.rotation)) == 1, so read the scale from params[1, 0]
            scale = self.params[1, 0]
        else:
            scale = self.params[0, 0] / math.cos(self.rotation)
        return scale
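To make estimate() concrete, here is a small self-contained check (the rotation, scale and shift values are arbitrary): take the 112x112 landmark template from preprocess, synthesize "detected" landmarks by rotating, scaling and translating it, and recover that transform.

import numpy as np
from skimage import transform as trans

# Landmark template from preprocess (112x112 variant: x already shifted by 8.0).
src = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041]], dtype=np.float32)

# Synthesize detected landmarks: rotate the template by 10 degrees,
# scale it by 1.2 and translate it.
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
dst = 1.2 * src.dot(R.T) + np.array([20.0, -5.0])

tform = trans.SimilarityTransform()
tform.estimate(dst, src)           # maps detected points back onto the template
print(tform.scale)                 # ~0.833, i.e. 1 / 1.2
print(np.rad2deg(tform.rotation))  # ~-10.0
M = tform.params[0:2, :]           # the 2x3 matrix handed to cv2.warpAffine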
The 2-D similarity transformation formula:

    X = a0 * x - b0 * y + a1 = s * x * cos(rotation) - s * y * sin(rotation) + a1
    Y = b0 * x + a0 * y + b1 = s * x * sin(rotation) + s * y * cos(rotation) + b1

where s is the scale factor and the homogeneous transformation matrix is

    [[a0 -b0  a1]
     [b0  a0  b1]
     [0    0   1]]

Parameters:
matrix : (3, 3) array; optional homogeneous transformation matrix
scale : scale factor
rotation : counter-clockwise rotation angle, in radians
translation : (tx, ty) as array, list or tuple; the x, y translation parameters
params : (3, 3) array; the resulting homogeneous transformation matrix

The similarity transformation extends the Euclidean transformation with a single scale factor on top of rotation and translation. estimate() computes the transform from a set of corresponding points with the total least-squares method, which handles over-, well- and under-determined systems; the number of source and destination coordinates must match.
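The matrix layout can be verified directly against skimage (the scale, angle and translation values below are arbitrary): building a SimilarityTransform from explicit parameters reproduces the [[a0, -b0, a1], [b0, a0, b1], [0, 0, 1]] structure, and applying it to a point matches the X/Y equations above.

import math
import numpy as np
from skimage import transform as trans

s, theta, (tx, ty) = 2.0, math.radians(30), (10.0, 20.0)
tform = trans.SimilarityTransform(scale=s, rotation=theta, translation=(tx, ty))

a0 = s * math.cos(theta)  # appears at params[0, 0] and params[1, 1]
b0 = s * math.sin(theta)  # appears at params[1, 0]; params[0, 1] holds -b0
expected = np.array([[a0, -b0, tx],
                     [b0,  a0, ty],
                     [0.0, 0.0, 1.0]])
assert np.allclose(tform.params, expected)

# Applying the transform to a point matches X = a0*x - b0*y + a1, etc.
x, y = 3.0, 4.0
assert np.allclose(tform([[x, y]]), [[a0 * x - b0 * y + tx,
                                      b0 * x + a0 * y + ty]])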