Face detection is a special case of object detection. It uses a general detection algorithm that can detect objects of any type (eyes, mouth corners, other facial feature points, hands, cars, etc.). First, the algorithm normalizes the brightness and contrast of the snapshot. Second, it moves a ‘window’ across the snapshot by gradually changing the window's left, top, width and height; this is called window sliding. Each window is analyzed by an object classifier that decides whether that region of the snapshot is a face or not. The result is a list of regions where a face (or another object) was detected. Overlapping regions are then grouped to remove duplicate detections of the same object.
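The window-sliding and grouping steps can be illustrated with a minimal Python sketch. It is not the Silverlight implementation: the names `sliding_windows`, `group_detections` and `detect`, the square windows, and the parameters (`min_size`, `scale_step`, `shift_frac`, `min_overlap`) are assumptions chosen for illustration, and brightness/contrast normalization is left out.

```python
from typing import Callable, Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height)


def sliding_windows(img_w: int, img_h: int, min_size: int = 24,
                    scale_step: float = 1.25,
                    shift_frac: float = 0.1) -> Iterator[Rect]:
    """Yield square windows at every scale and position (assumed scheme)."""
    size = float(min_size)
    while size <= min(img_w, img_h):
        s = int(size)
        step = max(1, int(s * shift_frac))  # shift grows with window size
        for top in range(0, img_h - s + 1, step):
            for left in range(0, img_w - s + 1, step):
                yield (left, top, s, s)
        size *= scale_step


def overlap_ratio(a: Rect, b: Rect) -> float:
    """Intersection area divided by union area of two rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def group_detections(rects: List[Rect], min_overlap: float = 0.5) -> List[Rect]:
    """Merge overlapping detections by averaging each group (greedy)."""
    groups: List[List[Rect]] = []
    for r in rects:
        for g in groups:
            if overlap_ratio(r, g[0]) >= min_overlap:
                g.append(r)
                break
        else:
            groups.append([r])
    return [tuple(sum(r[i] for r in g) // len(g) for i in range(4))
            for g in groups]


def detect(img_w: int, img_h: int,
           classify: Callable[[Rect], bool]) -> List[Rect]:
    """Run the classifier on every window, then group the hits."""
    hits = [w for w in sliding_windows(img_w, img_h) if classify(w)]
    return group_detections(hits)
```

Here `classify` stands in for the object classifier; any function that maps a window rectangle to a boolean can be plugged in.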

Structure of Haar classifier

The object classifier is defined as a set of Haar features with weights and thresholds. A 2-rectangle Haar feature is the difference between the pixel sums of two adjacent rectangular areas, which can be located at any position and scale within the original image. The classifier also uses 3-rectangle and 4-rectangle features. The feature values indicate certain characteristics of a particular area of the image. Each feature type can indicate the presence (or absence) of characteristics such as edges or changes in texture. For example, a 2-rectangle feature can indicate where the border lies between a dark region and a light region.
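As a rough illustration, a 2-rectangle feature can be computed as follows. The sketch uses an integral image (summed-area table), the standard Viola-Jones technique for obtaining rectangle sums in constant time; the text above does not say whether this implementation uses one, so treat it as an assumption. The function names and the left-minus-right layout are also illustrative.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of
    all pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], left: int, top: int,
             width: int, height: int) -> int:
    """Sum of pixels inside a rectangle, using four table lookups."""
    right, bottom = left + width, top + height
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: List[List[int]], left: int, top: int,
                     width: int, height: int) -> int:
    """Left half minus right half: responds to a vertical edge."""
    half = width // 2
    return (rect_sum(ii, left, top, half, height)
            - rect_sum(ii, left + half, top, half, height))
```

On an image whose left half is dark and right half is light, the feature value is strongly negative; on a uniform region it is zero.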

Haar features

A Haar-like feature together with a polarity (+1 or -1) and a threshold defines a weak classifier. The feature value is multiplied by the polarity and compared with the threshold, which yields a boolean classification decision. A weak classifier's decision has low accuracy (typically a correct classification rate of 50%-80%). To improve accuracy, multiple statistically independent weak classifiers are grouped together, and a weighted sum of the weak decisions is used to make the final decision of a strong classifier. The features, weights and thresholds are obtained offline using the AdaBoost training algorithm and stored in an XML file. A typical face classifier contains 500-1000 weak classifiers. To speed up object detection, the weak classifiers are grouped into stages, so that a window rejected by an early stage need not be evaluated by the remaining stages; a stage usually contains 4-100 weak classifiers.
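The relationship between weak classifiers, the weighted sum and stages can be sketched in Python as below. The class and function names are hypothetical, and two details are assumptions that the text does not specify: the direction of the threshold comparison (`>=`) and the existence of a per-stage threshold on the weighted sum.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WeakClassifier:
    feature: Callable[[object], float]  # computes a Haar feature value
    polarity: int                       # +1 or -1
    threshold: float
    weight: float                       # obtained from training

    def decide(self, window: object) -> bool:
        # Feature value times polarity, compared with the threshold.
        # The ">=" direction is an assumption of this sketch.
        return self.polarity * self.feature(window) >= self.threshold


@dataclass
class Stage:
    weak: List[WeakClassifier]
    threshold: float                    # assumed per-stage threshold

    def passes(self, window: object) -> bool:
        # Weighted sum of the weak decisions that voted "object".
        score = sum(w.weight for w in self.weak if w.decide(window))
        return score >= self.threshold


def cascade_accepts(stages: List[Stage], window: object) -> bool:
    """A window is accepted only if every stage passes; all() stops
    at the first failing stage, so most windows are rejected early."""
    return all(stage.passes(window) for stage in stages)
```

Because most windows in a snapshot contain no face, early rejection by the first few stages is where the speed-up comes from.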


As mentioned above, the detection algorithm moves a ‘window’ across the snapshot and runs the classification procedure for every window. This can take 500-1000 ms on "slow" computers (Silverlight, C#). Once an object is detected, tracking is performed. The tracking procedure searches for the object only at positions adjacent to its previously detected position. Tracking takes much less time than full detection, so it can be performed more frequently (at a higher FPS).
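One way to picture tracking is as window sliding restricted to a small neighbourhood of the last detection. The following Python sketch generates such candidate windows; the function name `tracking_windows`, the square windows, and the shift and scale parameters are assumptions for illustration, not values taken from the implementation.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height)


def tracking_windows(prev: Rect, img_w: int, img_h: int,
                     shift_frac: float = 0.1,
                     scale_step: float = 1.1) -> List[Rect]:
    """Candidate windows adjacent to the previously detected position:
    three scales times a 3 x 3 grid of small shifts, clipped to the image."""
    left, top, width, _ = prev
    candidates: List[Rect] = []
    for scale in (1.0 / scale_step, 1.0, scale_step):
        s = int(round(width * scale))
        step = max(1, int(s * shift_frac))
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                l, t = left + dx, top + dy
                # Keep only windows that lie fully inside the image.
                if l >= 0 and t >= 0 and l + s <= img_w and t + s <= img_h:
                    candidates.append((l, t, s, s))
    return candidates
```

With these settings at most 27 windows are classified per frame, compared with the thousands produced by a full scan, which is why tracking can run at a higher frame rate.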

Training of the object classifier using the AdaBoost (Viola and Jones) algorithm

The training algorithm runs offline, and training a face detector takes about a week. Its input is a list of positive (i.e. face) and negative (i.e. background, non-face) examples. Its output is a list of weak classifiers grouped into stages.
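To give a feel for what training does, here is a minimal Python sketch of discrete AdaBoost over threshold classifiers. It is a textbook simplification, not the actual training tool: it assumes the feature values are precomputed, uses ±1 labels and votes, searches thresholds exhaustively, and does not group the result into stages. The names `train_adaboost` and `predict` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, polarity, threshold, weight)
Weak = Tuple[int, int, float, float]


def train_adaboost(feature_values: Sequence[Sequence[float]],
                   labels: Sequence[int], rounds: int) -> List[Weak]:
    """feature_values[j][i] is feature j on example i; labels[i] is
    +1 (face) or -1 (non-face). Returns one weak classifier per round."""
    n = len(labels)
    weights = [1.0 / n] * n
    chosen: List[Weak] = []
    for _ in range(rounds):
        best = None
        for j, values in enumerate(feature_values):
            for polarity in (+1, -1):
                for threshold in sorted({polarity * v for v in values}):
                    preds = [1 if polarity * v >= threshold else -1
                             for v in values]
                    # Weighted error: total weight of misclassified examples.
                    err = sum(w for w, p, y in zip(weights, preds, labels)
                              if p != y)
                    if best is None or err < best[0]:
                        best = (err, j, polarity, threshold, preds)
        err, j, polarity, threshold, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((j, polarity, threshold, alpha))
        # Increase the weight of misclassified examples, then normalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


def predict(chosen: List[Weak], example: Sequence[float]) -> int:
    """Sign of the weighted vote of the selected weak classifiers."""
    score = sum(alpha * (1 if polarity * example[j] >= threshold else -1)
                for j, polarity, threshold, alpha in chosen)
    return 1 if score >= 0 else -1
```

Each round picks the feature, polarity and threshold with the lowest weighted error, so later rounds concentrate on the examples that earlier weak classifiers got wrong. Repeating this search over many candidate features and a large example set is what makes real training take so long.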

Implementation in Silverlight

A functional block diagram of the algorithm is shown below.

Face detection algorithm

The implementation can be downloaded here.

Last edited Dec 27, 2011 at 1:09 PM by asv128, version 1

