Rhonda Software

Highest-quality full-cycle software development.

Expert in Computer Vision, Multimedia, Messaging, Networking and other areas. Focused on embedded software development. Competent in building cross-platform solutions and distributed software systems.

We offer standalone custom solutions as well as integration of existing products. Open to outsourcing services.

Visit us at: http://www.rhondasoftware.com

Fast & Furious face detection with OpenCV

Posted on : 18-06-2009 | By : rhondasw | In : OpenCV


The OpenCV samples directory includes a facedetect program that detects faces in images and video. It’s a lot of fun, but its speed leaves much to be desired =(. Of course it runs faster with OpenMP, and on an Intel Core Duo 2.7 GHz it is fast, but will it be fast on ARM? I have big doubts. I compiled facedetect without OpenMP, and on average it takes 600 ms to find one face at 640×480 resolution. I wanted to find out whether this time can be improved by software means or not… After some investigation, code refactoring and improvements, facedetect started to work 2.5 times faster, even on ARM. Of course, without a big loss of quality =)

I started the investigation by profiling cvHaarDetectObjects on a 640×480 image. The function cvRunHaarClassifierCascade took 70% of the computation time. But cvRunHaarClassifierCascade itself is not that heavy, so why does it take so much time? A 20×20 scanning window is moved in the X direction, the Y direction and the scale direction, and cvRunHaarClassifierCascade is called for every scanning window. In total we get 160000 calls!
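To get a feel for where such a call count comes from, here is a minimal sketch of the triple loop as a counting function. The helper name `count_windows` and the fixed pixel step are assumptions made for illustration; cvHaarDetectObjects uses its own stepping rule, so the number returned here will not necessarily match the 160000 figure.

```c
#include <assert.h>

/* Count the windows visited by a scan over X, Y and scale.
   Hypothetical helper: the fixed pixel step is an assumption,
   not the stepping rule used inside cvHaarDetectObjects. */
long count_windows(int img_w, int img_h, int base, double scale_factor, int step)
{
    long total = 0;
    double scale;

    assert(base > 0 && step > 0 && scale_factor > 1.0);

    for (scale = 1.0; ; scale *= scale_factor) {    /* scale direction */
        int win = (int)(base * scale + 0.5);        /* window side at this scale */
        long nx, ny;

        if (win > img_w || win > img_h)
            break;                                  /* window no longer fits */

        nx = (img_w - win) / step + 1;              /* positions along X */
        ny = (img_h - win) / step + 1;              /* positions along Y */
        total += nx * ny;                           /* one classifier call per position */
    }
    return total;
}
```

Calling `count_windows(640, 480, 20, 1.1, 2)` gives the count for the setup above under these assumptions. A coarser scale factor or a larger step shrinks the count quickly, which is why every extra window skipped matters.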

So to reduce the time, we need to optimize this triple loop. I know several ways:

  1. change the parameters of the cvHaarDetectObjects function. Sometimes it really helps, but let’s resort to such shamanism another time. I used the “default” parameters: 1.1 scale factor, 20×20 window.
  2. use fixed point in the algorithm. We did it here
  3. optimize the default OpenCV frontal face cascade. Cascade generation takes a lot of time, and who knows whether the result will be good or not =)
  4. somehow reduce the number of cvRunHaarClassifierCascade calls. An image contains only a few real faces, not 160000, so this makes sense.
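As an illustration of way 2, the sketch below evaluates a weighted sum of rectangle sums against a threshold using integers only. The Q16.16 format, the type name `fix16` and all function names are assumptions chosen for this example; they are not taken from OpenCV or from our implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16;                 /* Q16.16: 16 integer bits, 16 fraction bits */
#define FIX16_ONE 65536

/* Convert a double to Q16.16 with rounding (done once, offline). */
fix16 fix16_from_double(double v)
{
    return (fix16)(v * FIX16_ONE + (v >= 0 ? 0.5 : -0.5));
}

/* Multiply two Q16.16 values. The 64-bit intermediate avoids overflow;
   right-shifting a negative value is assumed to be arithmetic, as it is
   on common compilers. */
fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}

/* Integer-only feature test: sum(weight[i] * rect_sum[i]) >= threshold.
   rect_sums are plain integers (pixel sums), weights and threshold are Q16.16,
   so the accumulator is a Q16.16 value held in 64 bits. */
int feature_passes(const int32_t *rect_sums, const fix16 *weights, int n,
                   fix16 threshold)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc += (int64_t)rect_sums[i] * weights[i];
    return acc >= (int64_t)threshold;
}
```

On cores without a floating-point unit, keeping the inner loop in integers like this avoids software-emulated float operations; the price is a small quantization error in the weights and thresholds.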

We researched many approaches and combinations of the ways above and got the following results (Intel Core Duo 2.7 GHz):


                          | Original face detect           | Fast face detect
                          | 512×768 | 250×250 | up to      | 512×768 | 250×250 | up to
Total                     | 5444    | 1872    | 1748       | 5276    | 1872    | 1748
Found                     | 5420    | 1765    | -          | 5191    | 1685    | -
Hit rate                  | 99.6%   | 94.3%   | -          | 98.4%   | 90.0%   | -
FP (incorrectly found)    | 57      | 12      | 37         | 18      | 10      | 13
False alarm rate          | -       | -       | 2.1%       | -       | -       | 0.7%
FN (not found)            | 23      | 107     | -          | 85      | 187     | -
Average time, ms:
  not found               | 623.98  | 85.43   | 775.23     | 139.07  | 39.48   | 287.98
  one face found          | 629.53  | 87.07   | 1053.49    | 248.26  | 42.99   | 455.31
  two or more faces found | 632.32  | 88.04   | -          | 245.12  | 43.39   | -
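
The hit rate and FN rows can be checked from the totals, assuming hit rate = found / total and FN = total minus found (the table itself does not spell out the formulas). A minimal sketch with illustrative function names:

```c
#include <assert.h>

/* Hit rate in percent: faces found out of all faces in the test set. */
double hit_rate_percent(long found, long total)
{
    assert(total > 0 && found >= 0 && found <= total);
    return 100.0 * (double)found / (double)total;
}

/* Faces in the test set that the detector missed (FN). */
long missed_faces(long found, long total)
{
    assert(found >= 0 && found <= total);
    return total - found;
}
```

For the original detector on the 250×250 set, 1765 found out of 1872 gives a hit rate of about 94.3% and 107 missed faces, which agrees with the figures reported above.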

Comments (31)

What approach did you use? In the previous post you mentioned a skin filter…?

Hi Snik,

Yes, we are using the skin filter. The skin filter reduces false positives a lot, but it improves performance only by up to 1.3 times. That is not enough for our goals, so we found another significant technique. Besides the filter, we use original heuristics that allow us to reach the speed mentioned in this article. Unfortunately, our management doesn’t allow us to open our technologies or share code and such innovative techniques, other than results in a scientific style, but you can discuss it with them (http://www.rhonda.ru/eng/feedback) and get the full version (API library or even our code) if you need it (of course, it could require something from you as well ;))

Sorry that we cannot open it… it does not depend on us.
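
Since the actual filter is not published, the following is only a generic sketch of how a skin pre-filter can skip windows before the cascade runs. The RGB thresholds are a commonly cited rule of thumb for daylight images, and the function names, the interleaved RGB layout and the `min_percent` parameter are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Rule-of-thumb RGB skin test (assumed thresholds, not the filter used in the article). */
int is_skin(unsigned char r, unsigned char g, unsigned char b)
{
    int max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    int min = r < g ? (r < b ? r : b) : (g < b ? g : b);

    return r > 95 && g > 40 && b > 20 &&
           (max - min) > 15 && abs((int)r - (int)g) > 15 &&
           r > g && r > b;
}

/* Return 1 if at least min_percent of the pixels in the window look like skin,
   i.e. the window is worth passing to the (expensive) cascade.
   rgb points to interleaved 8-bit RGB data, stride is the row size in bytes. */
int window_worth_scanning(const unsigned char *rgb, int stride,
                          int x, int y, int w, int h, int min_percent)
{
    long skin = 0;
    int row, col;

    for (row = 0; row < h; row++) {
        const unsigned char *p = rgb + (long)(y + row) * stride + 3L * x;
        for (col = 0; col < w; col++, p += 3)
            skin += is_skin(p[0], p[1], p[2]);
    }
    return skin * 100 >= (long)min_percent * w * h;
}
```

Counting pixels per window like this is itself costly; a practical version would build an integral image of the skin mask once per frame so that each window test takes constant time.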


> Unfortunately, our management doesn’t allow us to open our technologies
> or share code and such innovative techniques, other than results in a scientific style

Are these results published anywhere in scientific periodicals?
Could you provide me with a reference or, best of all, a copy of your paper?


No, this blog is the only place where we have published our results.

I’ve found another performance problem in facedetect.c (more precisely, in cvHaarDetectObjects).

It’s not about a limited platform. It reproduces on win32 and FreeBSD, and it is about multithreading.

I modified facedetect.c so that it prints its results only to the console. facedetect.exe finds the faces in lena.jpg in ~0.5 sec on my computer.
But if I run three (3) facedetect.exe instances simultaneously (same lena.jpg), it takes ~1.6 sec!
I expected ~0.5 sec (+ thread overhead).
I.e. it works as if the three instances were running sequentially.

My little investigation says the cause is in the cvHaarDetectObjects(..) function.
Somewhere inside, something happens that blocks the system process (or thread).

This appears in OpenCV 1.0 and in the latest SVN snapshot.
I’ve compiled the source under VS 2008 and mingw, same result. I also built facedetect on FreeBSD using the current port version, same bad news.

What is the problem? It has been my headache for the last four days..

Thanks for any suggestions.

I have just compiled facedetect.c with the cvsample project from OpenCV 1.1. I only removed the code that shows the image on screen. My time is 699 ms for three instances; one instance takes 297 ms. So I didn’t see the same problem on a PC (1 core, P4 3.0 GHz).

Please try using cvsample and facedetect.c without any modifications.

Please let me know your result.


Thanks for the reply!
But I guess I made a mistake.. I forgot that face detection is a very high-load operation.

Yes, I get a result like yours for scale=1.2, min_neighbor=2, flags=0 (AMD 3500 ~ 2.2 GHz).
For now these options are my best for speed and quality.

And if I understand correctly, there is no other way to increase the speed (for ~the same detection quality) without editing the source?

Hi Andrey,

I was surprised by your question, but I was not able to reproduce it and thought that something was missing… Good that your problem is resolved!

>>no more way increase speed
>>(for ~same detect quality) without source editing?

[Aleksey] Basically, you are right. But your approach has affected the quality. I would suggest not changing the quality, i.e. using the default OpenCV parameters for HaarDetect: scale set to 1.1 and min_neighbors to at least 3 (or more), but that will reduce the speed substantially!

So I can only recommend our solution, described in this article, as the best way (sorry for the self-marketing, but it is true). If you would like to get our code for this, you could ask our management/marketing … see the “about” page for details.

Let me know if you have any more questions.


I have to ask one question about your solution. Basically, does your solution work based on Haar-like features together with some other program?

Very interesting.. Why did you remove my comment?
Hiding a real or shameful problem?

>>Very interesting.. Why did you remove my comment?
>>Hiding a real or shameful problem?

Comments are reviewed by the admin to avoid spam. Your previous comment has been approved. Let me know if you posted other comments and they were removed.


[…] We have changed facedetector and get about 15 fps – which is real time. You can see results  here and […]

Hi Aleksey, can you tell me how you got the 99.6% hit rate with the cascade(s) shipped with OpenCV? I ran an experiment with 800 images, each containing one (and only one) frontal face, and got a hit rate of 30%. Here are the main lines of my program:
const char* cascade_file_name = "c:\\program files\\opencv\\data\\haarcascades\\haarcascade_frontalface_alt_tree.xml";
CvHaarClassifierCascade* cascade = (CvHaarClassifierCascade*)cvLoad(cascade_file_name, 0, 0, 0);

image = cvLoadImage(image_file_name);
gray = cvCreateImage(cvSize(image->width, image->height), 8, 1);
faces = cvHaarDetectObjects(gray, cascade, storage, 1.1, 2, 0, cvSize(30, 30));

Hi Anh,

We use another cascade + unique parameters.

Hello Alex,
could you tell me whether the cascade with the 99%+ hit rate is one you trained yourself, and may I know the parameters?

Hi Aleksey

Thank you for kindly sharing your experiment on this blog. By the way, I have some questions about it.

1. What about the image size in the training data set? Are all of them (both positive and negative samples) 20×20?

2. I have run my first experiment. The positive data set comes from the FRGC database (700 images of size 24×24) and the negative set comes from background images (1394 images of size 160×120). I use 40 stages, but unfortunately the 13th stage takes a very long time (more than 2 days), so would you please tell me what my problem is?

Thanks for your help 🙂


Hi, please see http://www.computer-vision-software.com/blog/2009/11/faq-opencv-haartraining/ (FAQ: OpenCV Haartraining). In short:

1. Don’t build 40 stages; use 24, 20 or fewer.

2. All your positive images are rescaled to the same size when the vec file is created, so you can use positive images of any size as long as the proportions are maintained.

3. Negative images must be much bigger than the positive samples; 160×120 is insufficient, use 1280×1024 or more. If you take small background images, haartraining will not be able to extract negative samples for the higher stages. The more stages you use, the bigger the negative images you need.

4. Unfortunately, haartraining can fall into an infinite loop. Try stopping it, changing the negative images and restarting the program. It will resume from the last successful stage.

Hi Andrey

Great, first of all thanks for replying to my comment.

By the way, I still have a question about your comment.

Why can haartraining fall into an infinite loop? In my experiment it happens if the previous stages end with HR = 1 and FA = 0. So I tried free code from MATLAB Central to check the AdaBoost algorithm. But as far as I know, it should never loop infinitely even with HR = 1 and FA = 0. So I don’t know what the problem is. Would you please tell me which part of the haartraining code makes it loop infinitely?

Thanks for your help


Why should the negative images be bigger than the positive images? I trained with the MIT non-faces as negative samples and got no error message.

Hi there, thanks for your article! I’ve been porting the OpenCV Haar detection algorithm to our company’s MCU. In your approach for optimizing the detector, did you use the CV_HAAR_SCALE_IMAGE parameter? For our particular system it was more favorable to scale the images instead of the features, but is there any difference in quality?

Also, in cvSetImagesForHaarClassifierCascade, did you turn on the CV_ADJUST_FEATURES and CV_ADJUST_WEIGHTS options?
For CV_ADJUST_FEATURES the comment in the source code says something about aligning blocks; is that really necessary? And I couldn’t quite understand what CV_ADJUST_WEIGHTS is trying to do at all. Is there any significant difference in quality with these options?

Thanks in advance.

I am a bit confused about CV_ADJUST_FEATURES.
Do you know the meaning of CV_ADJUST_FEATURES? Which blocks are aligned?
Thanks in advance!

Hi Aleksey, can you tell me how to use “performance.exe”? I run it like this: “performance.exe -data test.xml -info test.txt -w 32 -h 22 -sf 1.1”, but I get the result 0 hits, 0 missed, 0 false. I think there is something wrong with my test.txt. I’m really not sure how to create the test.txt. It is frustrating! Can you help me?
Thanks for your help!


Hi Aleksey,

You got a 2.5 times gain on ARM. Was it on dual core or quad core? And is this gain only due to parallelization?


Hi Deepak,

“facedetect started to work 2.5 times faster, even on ARM”: it was on ARM11 and ARM Cortex-A8, and both are single core (no parallel computation).


Hi, I ran OpenCV face detection on an ARM926 CPU (333 MHz) and it takes about 2 s. On my PC I did face detection + face recognition at full HD resolution in 50 ms. Can you explain how I can reduce the detection time on an ARM CPU?


I’m working with OpenCV on Windows at the start of a project, but I would like to move to an ARM platform or an FPGA for embedded applications.
I want to do shape recognition on high-resolution video, so I need a platform with good performance in this field and, of course, one compatible with OpenCV.
So I thought of the Tegra2 (dual core) from NVIDIA or the OMAP4430 (dual core) from Texas Instruments, but I don’t know if I can install OpenCV on them.
I would like to know whether the two platforms are compatible with OpenCV, whether there are other possibilities, and what I should check before making my choice.

Thank you for everything.

I just changed the parameter cvSize(20, 20) or (10, 10) to (50, 50) and it seems to work a lot faster, but once you go over (50, 50) there is no improvement.
Just to note, I’m using the Kinect as a camera.


I implemented code for face detection and recognition using OpenCV in C++. But I am facing one problem: if anyone stands behind me, it is unable to recognize my face. I can’t understand where I am going wrong. Any ideas regarding this problem?

Great study with an amazing result.

How did you improve the stock cascade shipped with OpenCV?

Or did you create your own cascade with your own samples and train it with your own algorithm?

Because I guess the cascade itself has a big impact on the detection rate and speed.

Do you have an academic paper about this study?

I’m trying to use a Haar cascade on a Texas Instruments DSP. This DSP has no file system, so I cannot use the cascade .xml file. How can I load the cascade instead?
I’ll be glad to get any piece of helpful information.
I’m a mother of three with a demanding boss.
Waiting… Chaya

Hey, can you please let me know how to do performance profiling in order to find out which part of the code takes the most time and which function is called most often? It is extremely urgent. Help me out, please.
Thanks in advance
