OpenCV – Computer Vision Software

Pose Estimation and Activity recognition demo

rhondasw — Fri, 22 Apr 2022 03:27:15 +0000

This demo showcases real-time Human Pose Estimation, based on the Open Pose library ported onto the camera platform, combined with an Activity Recognition neural network designed by Rhonda for human behavior recognition. The two Deep Learning Neural Networks (DNN), along with the video pipeline, run on the Rhonda Software CV22 System on a Module (CV22 SoM).

The CV22 SoM, designed in-house as a low-power camera platform, is capable of running multiple neural networks in addition to providing superior image quality. The core of the SoM platform is an Ambarella CV22 System on a Chip: an ARM-based Image Signal Processor with a DNN inference acceleration engine, implemented on a single die.

Both CV applications run simultaneously. The Pose Estimation network performs human body detection in a full 4K frame, and people’s figures recognized in the camera’s field of view are visualized with “skeleton-like” pose markups. A blob of pixels around a foreground skeleton selected within the region of interest is passed to the Activity recognition DNN.

The activity recognition algorithm is a simple, yet robust demo built by Rhonda’s CV team from scratch, and trained to identify several activity types: walking, standing, welcome hand gestures (high-five), jumping jacks, and body-weight squats. The recognized activity for the foreground body is displayed in the upper-left corner of the screen.

After the initial port onto the CV22 platform, the Open Pose algorithm delivered a frame rate of 1 frame per second. It took a number of optimization procedures performed by Rhonda’s CV experts (such as pruning, quantization, and dedicated retraining) to achieve a fifteenfold acceleration, that is, about 15 frames per second.

The system can be trained for different use cases, such as security, elderly care, production automation, sports activity analysis, and more.

For demo and testing purposes, we’ve deployed a setup with HDMI video injection to show the platform’s recognition capabilities with additional activities.

As a road-safety application example, Rhonda Software has assembled the Pedestrian detection demo, based on the same optimized port of the Open Pose library. The algorithm is applied to automotive conditions to detect pedestrians as participants of road traffic.

Compiling OpenCV for Android using NDK 3

Alexander Permyakov — Thu, 22 Apr 2010 06:01:50 +0000

Build platform: Ubuntu 9.10
Target platform: Android

Download and prepare OpenCV library source code.

1. Download the latest version of OpenCV (http://sourceforge.net/project/showfiles.php?group_id=22870).

2. As the build platform is Linux, select the Linux version (for example OpenCV-2.1.0.tar.bz2).

3. Unpack it somewhere in your home directory.

Download and prepare cross-compiler

1. Download Android NDK 3 for Linux (http://developer.android.com/sdk/ndk/index.html)

2. Unpack it to ~/android_ndk_3/

3. Then run ~/android_ndk_3/build/host-setup.sh, but first fix the error in line 119.

Change

if [ "$result" = "Pass" ] ; then

to

if [ "$result" == "Pass" ] ; then

4. Install whatever is needed to let host-setup.sh complete successfully.

Create NDK project/Modify Makefiles

There is one big issue with the NDK toolchain: it ships a trimmed-down standard C library that does not contain the STL. Because of that, some files (like cvkdtree.cpp in cv) cannot be compiled, since they use vector, list and other STL containers. The solution is to compile the STL from source code. In my OpenCV NDK project I used the STL sources from uClibc (http://www.uclibc.org).

The simplest way to start your OpenCV NDK project is to extend the hello-jni project with the OpenCV source files.

The ~/android_ndk_3/apps/hello-jni/project/jni folder of the hello-jni project may look like this

The ~/android_ndk_3/apps/hello-jni/project/jni/Android.mk may look like this

LOCAL_PATH := $(APPS_PATH)/cv/src
LOCAL_C_INCLUDES := $(APPS_PATH)/cv/hdr $(APPS_PATH)/cxcore/hdr $(APPS_PATH)/stl/hdr

LOCAL_MODULE := cv
LOCAL_SRC_FILES := cvkdtree.cpp cvaccum.cpp cvadapthresh.cpp cvapprox.cpp cvcalccontrasthistogram.cpp cvcalcimagehomography.cpp cvcalibinit.cpp cvcalibration.cpp cvcamshift.cpp cvcanny.cpp cvcolor.cpp cvcondens.cpp cvcontours.cpp cvcontourtree.cpp cvconvhull.cpp cvcorner.cpp cvcornersubpix.cpp cvderiv.cpp cvdistransform.cpp cvdominants.cpp cvemd.cpp cvfeatureselect.cpp cvfilter.cpp cvfloodfill.cpp cvfundam.cpp cvgeometry.cpp cvhaar.cpp cvhistogram.cpp cvhough.cpp cvimgwarp.cpp cvinpaint.cpp cvkalman.cpp cvlinefit.cpp cvlkpyramid.cpp cvmatchcontours.cpp cvmoments.cpp cvmorph.cpp cvmotempl.cpp cvoptflowbm.cpp cvoptflowhs.cpp cvoptflowlk.cpp cvpgh.cpp cvposit.cpp cvprecomp.cpp cvpyramids.cpp cvpyrsegmentation.cpp cvrotcalipers.cpp cvsamplers.cpp cvsegmentation.cpp cvshapedescr.cpp cvsmooth.cpp cvsnakes.cpp cvstereobm.cpp cvstereogc.cpp cvsubdivision2d.cpp cvsumpixels.cpp cvsurf.cpp cvswitcher.cpp cvtables.cpp cvtemplmatch.cpp cvthresh.cpp cvundistort.cpp cvutils.cpp dummy.cpp

LOCAL_STATIC_LIBRARIES := cxcore stl

include $(BUILD_STATIC_LIBRARY)

############################
# cvaux
############################
include $(CLEAR_VARS)

LOCAL_PATH := $(APPS_PATH)/cvaux/src
LOCAL_C_INCLUDES := $(APPS_PATH)/cvaux/hdr $(APPS_PATH)/cv/hdr $(APPS_PATH)/cv/src $(APPS_PATH)/cxcore/hdr $(APPS_PATH)/stl/hdr

LOCAL_MODULE := cvaux
LOCAL_SRC_FILES := camshift.cpp cvaux.cpp cvauxutils.cpp cvbgfg_acmmm2003.cpp cvbgfg_codebook.cpp cvbgfg_common.cpp cvbgfg_gaussmix.cpp cvcalibfilter.cpp cvclique.cpp cvcorrespond.cpp cvcorrimages.cpp cvcreatehandmask.cpp cvdpstereo.cpp cveigenobjects.cpp cvepilines.cpp cvface.cpp cvfacedetection.cpp cvfacetemplate.cpp cvfindface.cpp cvfindhandregion.cpp cvhmm.cpp cvhmm1d.cpp cvhmmobs.cpp cvlcm.cpp cvlee.cpp cvlevmar.cpp cvlevmarprojbandle.cpp cvlevmartrif.cpp cvlines.cpp cvlmeds.cpp cvmat.cpp cvmorphcontours.cpp cvmorphing.cpp cvprewarp.cpp cvscanlines.cpp cvsegment.cpp cvsubdiv2.cpp cvtexture.cpp cvtrifocal.cpp cvvecfacetracking.cpp cvvideo.cpp decomppoly.cpp dummy.cpp enmin.cpp extendededges.cpp precomp.cpp vs/bgfg_estimation.cpp vs/blobtrackanalysis.cpp vs/blobtrackanalysishist.cpp vs/blobtrackanalysisior.cpp vs/blobtrackanalysistrackdist.cpp vs/blobtrackgen1.cpp vs/blobtrackgenyml.cpp vs/blobtrackingauto.cpp vs/blobtrackingcc.cpp vs/blobtrackingccwithcr.cpp vs/blobtrackingkalman.cpp vs/blobtrackinglist.cpp vs/blobtrackingmsfg.cpp vs/blobtrackingmsfgs.cpp vs/blobtrackpostprockalman.cpp vs/blobtrackpostproclinear.cpp vs/blobtrackpostproclist.cpp vs/enteringblobdetection.cpp vs/enteringblobdetectionreal.cpp vs/testseq.cpp

# failed to compile
#cv3dtracker.cpp

LOCAL_STATIC_LIBRARIES := cv cxcore stl

include $(BUILD_STATIC_LIBRARY)

The ~/android_ndk_3/apps/hello-jni/Application.mk file needs to be updated as follows

To build the project go to ~/android_ndk_3 and type

make APP=hello-jni

Of course there will be compile issues; understand and fix them. The easiest cases are related to syntax mismatches between different compilers. In more complicated cases some code has to be commented out. For example, libraries with optimizations for Intel processors are not needed on ARM.

HighGui can also be built, but only partially. Simply remove the files that cause problems from Android.mk. In my case the remaining files were enough to use the cvLoadImage function for BMP files.

Running facedetect openCV example

There is no way to run native C code as a separate application on Android. Instead, native C functions can be called from Java apps. Because of that, I made a native function, FaceDetect, based on the OpenCV example application facedetect.c.

The declaration looks like this

This function takes a YUV_NV21 buffer (the camera preview captured by the Java app), converts it to BGRA8888, searches for faces, draws circles around them and returns the updated RGBA8888 buffer to the Java app, which can then draw it on the screen.
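The colour conversion step can be sketched as follows. This is only an illustration, not the code of the FaceDetect function itself: the function name nv21_to_bgra, the helper clamp_fixed10 and the fixed-point coefficients (a common integer approximation of the video-range YUV to RGB conversion) are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp a fixed-point colour value (10 fractional bits) to the range 0..255.
static std::uint8_t clamp_fixed10(int v) {
    return static_cast<std::uint8_t>(std::min(std::max(v, 0), 262143) >> 10);
}

// Convert an NV21 frame (full-resolution Y plane followed by an interleaved,
// 2x2-subsampled V/U plane) to a packed BGRA8888 buffer.
// width and height are assumed to be even.
std::vector<std::uint8_t> nv21_to_bgra(const std::uint8_t* nv21, int width, int height) {
    assert(nv21 != nullptr && width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);

    std::vector<std::uint8_t> bgra(static_cast<std::size_t>(width) * height * 4);
    const std::uint8_t* y_plane = nv21;
    const std::uint8_t* vu_plane = nv21 + static_cast<std::size_t>(width) * height;

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            // One V/U pair is shared by each 2x2 block of pixels.
            const std::size_t vu_index =
                static_cast<std::size_t>(row / 2) * width + (col / 2) * 2;
            const int y = std::max(0, y_plane[static_cast<std::size_t>(row) * width + col] - 16);
            const int v = vu_plane[vu_index] - 128;
            const int u = vu_plane[vu_index + 1] - 128;

            const int y_scaled = 1192 * y;
            std::uint8_t* px = &bgra[(static_cast<std::size_t>(row) * width + col) * 4];
            px[0] = clamp_fixed10(y_scaled + 2066 * u);           // B
            px[1] = clamp_fixed10(y_scaled - 833 * v - 400 * u);  // G
            px[2] = clamp_fixed10(y_scaled + 1634 * v);           // R
            px[3] = 255;                                          // A (opaque)
        }
    }
    return bgra;
}
```

On a real device this loop would run inside the JNI function on the preview buffer handed over by the Java app; the resulting buffer can then be passed to the detector.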

USD banknotes recognition

Yuri Vashchenko — Tue, 15 Dec 2009 11:11:22 +0000

The currency recognition demo application works under Windows XP, Intel P4 3GHz. Quality of recognition: 85%. The solution is cross-platform. The application was tested on Linux, ARM11 and on Linux/Windows, Intel Atom.

FAQ: OpenCV Haartraining

rhondasw — Tue, 10 Nov 2009 06:59:44 +0000

Hi all, before posting your question, please read this FAQ carefully! You can also read the OpenCV haartraining article. If you are sure there is no answer to your question here, feel free to post a comment. Suggestions for improving this post are also welcome. The post will be updated if needed.

Positive images

Why are positive images named so?

Because a positive image contains the target object which you want the machine to detect. In contrast, a negative image doesn’t contain such target objects.

What’s a vec file in OpenCV haartraining?

During haartraining, positive samples must have the same width and height as you define with the “-w” and “-h” options. So the original positive images are resized and packed as thumbnails into a vec file. The vec file has a header (number of positive samples, width, height) and contains the positive thumbnails in its body.

Is it possible to merge vec files?

Yes, use Google, there are free tools, written by OpenCV’s community.

I have positive images. How do I create a vec file of positive samples?

There is a tool, createsamples.cpp, in C:\Program Files\OpenCV\apps\HaarTraining\src. Usage:

createsamples -info positive_description.txt -vec samples.vec -w 20 -h 20

What’s a positive description file?

Each positive image can contain several objects. Each object has a bounding rectangle: x, y, width, height. So you can describe an image like this:

positive_image_name num_of_objects x y width height x y width height …

A text file that contains such information about the positive images is called a description file. So during vec file generation only the objects are packed, not the whole images. Essentially, the vec file is needed to speed up machine learning.
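To make the line format concrete, here is a minimal sketch of a parser for one description line. It is not taken from createsamples; the names ObjectRect, PositiveEntry and parse_description_line are made up for this illustration. Note that, like the tool, it cannot handle image names that contain spaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct ObjectRect { int x, y, width, height; };

struct PositiveEntry {
    std::string image_name;
    std::vector<ObjectRect> objects;
};

// Parse one line of the form
//   image_name num_of_objects x y width height [x y width height ...]
// Returns false if the line is malformed, for example an empty line or
// fewer rectangles than num_of_objects announces.
bool parse_description_line(const std::string& line, PositiveEntry& entry) {
    std::istringstream in(line);
    int count = 0;
    if (!(in >> entry.image_name >> count) || count < 0) return false;

    entry.objects.clear();
    for (int i = 0; i < count; ++i) {
        ObjectRect r;
        if (!(in >> r.x >> r.y >> r.width >> r.height)) return false;
        if (r.width <= 0 || r.height <= 0) return false;
        entry.objects.push_back(r);
    }
    return true;
}
```

Running such a check over the whole description file before calling createsamples is a cheap way to catch empty lines and missing rectangles early.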

Do I always need a description file, even if I have only one object per image?

Yes, with createsamples you need a description file. If you have only one object, its bounding rectangle may be the bounding rectangle of the whole image. If you want, write your own tool for vec file generation =)

Should lighting conditions and background vary across positive images?

Yes, it’s very important. On each positive image, besides the object, there is background. Try to fill this background with random noise and avoid a constant background.

How much background should there be on a positive image?

If your positive images have many background pixels compared with the object’s pixels, it’s bad, since haartraining could learn the background as a feature of the positive image.

If you don’t have background pixels at all, it’s also bad. There should be a small background frame on a positive image.

Should all original positive images have the same size?

No, the original images can have any size. But it’s important that the width and height of the object’s bounding rectangle have the same aspect ratio as -w and -h.

What -w and -h should I pass to createsamples? Should it always be square?

You can pass any values to -w and -h, depending on the aspect ratio of the target object you want to detect. But objects smaller than this size will not be detected! For faces, commonly used values are 24×24 and 20×20. But you may use 24×20, 20×24, etc.

Errors during vec file generation: Incorrect size of input array, 0 kb vec file

- First check your description file: positive_image_name should be an absolute path without spaces, like “C:\content\image.jpg” and not “C:\con tent\image.jpg”, or a relative path.

- Avoid empty lines in the description file.

- The resolution of an original positive image should not be less than the -w and -h parameters you pass.

- Check that the positive images are available in your file system and are not corrupted.

- The format may be unsupported. JPEG, BMP and PPM are supported!

Example of vec file generation

Let the working directory be C:\haartraining. It contains createsamples.exe and the folder C:\haartraining\positives. Create the description file positive_desc.txt with relative paths:

positives\image1.jpg 1 10 10 20 20
positives\image2.jpg 2 30 30 50 50 60 60 70 70

or with absolute paths:

C:\haartraining\positives\image1.jpg 1 10 10 20 20
C:\haartraining\positives\image2.jpg 2 30 30 50 50 60 60 70 70

Avoid empty lines and spaces in the image paths. Then run:

createsamples -info positive_desc.txt -vec samples.vec -w 20 -h 20

Negative images

What negative images should I take?

You can use any images in formats supported by OpenCV that do not contain the target objects (those present on the positive images). But they should be varied, which is important! A good enough database is here

Should negative images have the same size?

No. But their size should not be less than the -w and -h values used during vec file generation.

What’s the description file for negative images?

It’s just a text file, often called negative.dat, which contains the full paths to the negative images, like:

image_name1.jpg

image_name2.jpg

Avoid empty lines in it.

How many negative/positive images should I take?

It depends on your task. For real cascades there should be, e.g., about 1000 positive images and 2000 negative images.

A good enough proportion is positive:negative = 1:2, but it’s not a hard rule! I would recommend first using a small number of samples, generating a cascade, testing it, and then enlarging the number of samples.

Launch haartraining.exe (OpenCV\apps\HaarTraining\src)

Example of launching

The working directory is C:\haartraining, with the haartraining.exe tool and the samples.vec file.

Let the negative images be in C:\haartraining\negative; in this case negative.dat should look like this:

negative\neg1.jpg

negative\neg2.jpg

…

So in C:\haartraining launch this: haartraining -data haarcascade -vec samples.vec -bg negative.dat -nstages 20 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 1000 -nneg 2000 -w 20 -h 20 -nonsym -mem 1024

w, h: the same values you used during vec file generation
npos, nneg: number of positive and negative samples
mem: amount of RAM, in MB, that the program may use
maxfalsealarm: maximum false alarm rate that a stage may have. A high false alarm rate can mean a bad detection system
minhitrate: minimum hit rate that each stage must reach
nstages: number of stages in the cascade

What are the falsealarm and hitrate of a stage?

You should read the theory of AdaBoost about strong classifiers. A stage is a strong classifier. In short:

For example, you have 1000 positive samples and you want your system to detect 900 of them. Then the desired hitrate = 900/1000 = 0.9. Commonly, minhitrate = 0.999 is used.
For example, you have 1000 negative samples. Because they are negative, you don’t want your system to detect them. But your system, because it makes errors, will detect some of them. Let the error be about 490 samples; then false alarm = 490/1000 = 0.49. Commonly, maxfalsealarm = 0.5 is used.

Do falsealarm and hitrate depend on each other?

Yes, there is a dependency. You cannot set minhitrate = 1.0 and maxfalsealarm = 0.0.

First, the system builds a classifier with the desired hitrate, then it calculates its falsealarm. If the false alarm is higher than maxfalsealarm, the system rejects this classifier and builds the next one. During haartraining you may see output like this:

|  N |%SMP|F|  ST.THR |    HR   |    FA   | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
|   0| 25%|-|-1423.312590| 1.000000| 1.000000| 0.876272|

HR – hitrate

FA – falsealarm

What are the falsealarm and hitrate of the whole cascade?

A cascade is a linked list (or tree) of stages. That’s why:

False alarm of cascade = false alarm of stage 1 * false alarm of stage 2 * …
Hit rate of cascade = hitrate of stage 1 * hitrate of stage 2 * …
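As a small worked example of these products, the sketch below assumes that every stage ends up exactly at the same per-stage rate, so the product collapses to a power. The function name cascade_rate is made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Overall rate of a cascade in which every stage reaches the same per-stage
// rate: the per-stage rates multiply, so the result is stage_rate^stages.
double cascade_rate(double stage_rate, int stages) {
    assert(stage_rate >= 0.0 && stage_rate <= 1.0 && stages >= 0);
    return std::pow(stage_rate, stages);
}
```

With -nstages 20, -minhitrate 0.999 and -maxfalsealarm 0.5, this gives a cascade hit rate of about 0.999^20 ≈ 0.98 and a cascade false alarm rate of about 0.5^20 ≈ 0.00000095. Real stages usually do better than the limits, so these numbers are bounds rather than predictions.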

How many stages should be used?

If you set a large number of stages, you will achieve a better false alarm rate, but generating the cascade will take more time.
If you set a large number of stages, detection could be slower.
If you set a large number of stages, the hitrate will be worse (0.99*0.99*… etc). Commonly 14-25 stages are enough.
It’s useless to set many stages if you have a small number of positive and negative samples.

What are the weighttrimming, eqw, bt, nonsym options?

All these parameters are related to AdaBoost; read the theory. In short:

nonsym: if your positive samples are not X or Y symmetric, pass -nonsym; -sym is the default!
eqw: if you have different numbers of positive and negative images, it’s better not to use eqw
weighttrimming: a calculation optimization. It can reduce calculation time a little, but quality may be worse
bt: which AdaBoost algorithm to use: Real AB, Gentle AB, etc.

What are the minpos, nsplits, maxtreesplits options?

These parameters are related to clustering. In AdaBoost different weak classifiers may be used: stump-based or tree-based. If you choose nsplits > 0, tree-based classifiers will be used and you should set minpos and maxtreesplits.

nsplits: minimum number of nodes in the tree
maxtreesplits: maximum number of nodes in the tree. If maxtreesplits < nsplits, the tree will not be built
minpos: number of positive images that can be used by one node during training. All positive images are split between nodes. Generally minpos should be not less than npos/nsplits.

Errors and oddities during haartraining!

Error (valid only for Discrete and Real AdaBoost): misclass. It’s a warning, not an error. Some options are specific to Discrete and Real AdaBoost, so your haartraining is OK.
The screen is filled with lines like | 1000 |25%|-|-1423.312590| 1.000000| 1.000000| 0.876272|. Your training is stuck in a loop; restart it. The first column should have a value < 100.
cvAlloc fails. Out of memory: you gave too many negative images or samples.vec is too big. All these pictures are loaded into RAM.
Make sure you pass the same -w and -h as during vec file generation.
Make sure that the numbers of positive and negative samples you pass in -npos and -nneg are really available.
Avoid empty lines in the negative.dat file.
Required leaf false alarm rate achieved. Branch training terminated: it’s impossible to build a classifier with a good false alarm rate on these negative images. Check that your negative images are really negative =); maxfalsealarm should be in [0.4-0.5].

OpenCV XML haarcascade

During haartraining, txt files are written to the haarcascade folder. How can we get XML from them?

There is OpenCV/samples/c/convert_cascade.c. Use it like this:

convert_cascade --size="20x20" haarcascade haarcascade.xml

How can I test the generated XML cascade?

There is OpenCV/apps/HaarTraining/src/performance.cpp. You need positive images (not used during training) and a positive description file. Use it like this:

performance -data haarcascade -w 20 -h 20 -info positive_description.txt -ni

or

performance -data haarcascade.xml -info positive_description.txt -ni

Time and Speed of haar cascade generation

What is the average time to generate a cascade on a PC?

It depends on the task and your machine. I generated a cascade for face detection with these parameters: -nstages 20 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 4000 -nneg 5000 -w 20 -h 20 -nonsym -mem 1024. It took 6 days on a 2.7 GHz Pentium with 2 GB RAM.

What is OpenMP?

“The OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++ and Fortran on many architectures, including Unix and Microsoft Windows platforms”. If you have a multi-core or multi-threaded processor, you can use it. In the code you should add OpenMP defines and set the compile options. For example, in Visual Studio 2005: Properties->C/C++->Language->OpenMP support

Is it possible to improve speed of haartraining?

Yes, one possible way is to use parallel programming. We have implemented OpenCV haartraining using MPI for a Linux cluster. You can read about it here

Object detection with OpenCV XML cascades

Is it possible to detect rotated faces?

Yes. It is impossible to generate a single cascade that can detect faces in all orientations, but you can generate a cascade for each orientation separately. For this you need positive content with rotated faces. You can also try generating a cascade with OpenCV using -mode ALL, in which case tilted Haar features will be used. But it’s badly implemented, at least in OpenCV 1.1. If you want, you can add your own features to OpenCV haartraining; it’s not too hard.

Another approach is to write a head pose estimator, then rotate your pictures so that you get a frontal face, and detect it with the default OpenCV face cascade.

Is it possible to recognize gender, attention, race with Haar features?

We tried, but could not do it with OpenCV haartraining. That’s why, for such classification, we used our own gender and attention classifiers. Of course you can use the AdaBoost implemented in haartraining for this task, but we did not get good results.

Is it possible to detect faces in real time?

Yes. On a PC the default OpenCV face detector takes about 200 ms for a 640×480 picture, i.e. about 5 fps, which is not real time. We have modified the face detector and get about 15 fps, which is real time. You can see the results here and here

Detect attention, please!

rhondasw — Mon, 09 Nov 2009 01:13:35 +0000

Nowadays, audience measurement systems are becoming more and more popular. They are used in active advertising, for gathering statistics, etc. One of the key features of these smart systems is attention detection. For advertisers, for instance, it is very important to know how much attention a commercial attracts. In this article, I will describe the attention detector module used in our Audience Measurement system.

We started by trying to understand what kind of attention we want to detect. On the one hand, it seems very easy to say whether a person is paying attention or not. On the other hand, it’s very difficult to formalize what attention is. Some articles propose detecting attention from eye information. But what if the person wears sunglasses? Another criterion is to use nose information: where the nose points! For our business case nose information is not enough either. Moreover, the nose can also be “hidden”.

That’s why our attention detection is based on head pose information. We collected face images which, in our opinion, show attention and which don’t. To be honest, most of the images with attention were frontal faces, and vice versa. These two sets define the task.

To teach a machine to detect attention, we need a machine learning algorithm. We have the Viola-Jones one, and if it can distinguish face/non-face, why not use it to distinguish attention/non-attention? We have the learning samples… So with AdaBoost, we chose 100 Haar-like features. With them, each image is converted to a 100-dimensional vector. To classify it, we used C4.5. The self-test was very good: 97% accuracy. But when we started testing on real video, we got a bad result: 60% accuracy. The problems began when lighting conditions changed or the face shifted by a few pixels, even though we used normalization as in the OpenCV Viola-Jones algorithm. The point is that face and non-face images are very different, while faces with and without attention are very similar.

Thus, we needed a lighting-invariant method that is not so sensitive to XY shifts. We developed our own template-matching method. First, using PCA, we get templates of faces. With these templates, each face is converted to an N-dimensional vector, which is classified with an SVM. The accuracy of our attention system is about 90%. You can see it at work in our Audience Measurement system.

Object Recognition (Nike logo)

Aleksey Kodubets — Thu, 22 Oct 2009 09:43:40 +0000

nike.avi

This is a demo video of a fast object detection algorithm that is invariant to orientation and scale. The algorithm is robust in cases when the object is slightly deformed.

The algorithm is a cross-platform solution.

Performance:

on ARM11 530 MHz, the algorithm gives 1 fps for a 640×480 frame;
on Intel P4 3 GHz, the algorithm gives 12 fps and more for a 640×480 frame.

Quality: 86%.

Audience Measurement (face tracker, gender recognition, attention recognition, etc)

Aleksey Kodubets — Mon, 05 Oct 2009 08:23:13 +0000

am-3.avi

This is a demo video of Rhonda Audience Measurement system (MyAudience product, www.MyAudience.com).

The system is able to recognize gender of a person. So, red rectangle is for a woman, dark blue rectangle is for a man. The quality of gender recognition algorithm is 90%.

The system works on Intel Atom at 10 fps and higher, and on Intel Pentium 4 at 15 fps and higher. Besides, it is a cross-platform solution. It was tested on both Windows XP and Linux, and also on ARM Cortex-A8.

No accelerations (CUDA, fixed point, etc) are used! So, the solution has great potential for improvements.

Cross-platform solution for getting MJPEG stream from AXIS ip-camera (AXIS 211M)

Andrew Chen — Sat, 29 Aug 2009 07:55:42 +0000

This article describes how to get an MJPEG stream from an AXIS IP camera in your C++ application. My approach is cross-platform and much better than the solution from http://www.computer-vision-software.com/blog/2009/04/how-to-get-mjpeg-stream-from-axis-ip-cameras-axis-211m-and-axis-214-ptz-as-camera-device-in-opencv-using-directshow/.

Dependencies

We used the Boost library (boost/asio, http://www.boost.org). It provides very useful network interfaces:

boost::asio::io_service: provides the core I/O functionality for users of the asynchronous I/O objects;
boost::asio::ip::tcp::socket: provides TCP stream socket functionality (connect, read, write);
boost::asio::ip::tcp::resolver: resolves a host name and service into a list of endpoints to connect to.

Network interface implementation

I created a few useful network functions on top of boost::asio: connecting, sending and receiving packets.

The network interface is implemented via Boost objects. The definitions are:

#include <cassert>
#include <cstdio>
#include <boost/asio.hpp>

boost::asio::io_service m_ios;
boost::asio::ip::tcp::socket m_socket(m_ios);
boost::asio::ip::tcp::resolver m_resolver(m_ios);

With these objects, the functions look like this:

void connect(void)
{
    boost::asio::ip::tcp::resolver::iterator   end_point;
    boost::system::error_code                  ecode;
    // Resolve the camera address (IPv4, HTTP port 80).
    boost::asio::ip::tcp::resolver::query      query(boost::asio::ip::tcp::v4(),
                                                     "192.168.0.1", "80");

    end_point = m_resolver.resolve(query);
    // Connect to the first resolved endpoint.
    m_socket.connect(*end_point, ecode);

    assert(!ecode);
}

int send(void *buffer, int buf_size)
{
    boost::system::error_code   ecode;
    boost::uint32_t             sent_size = 0;

    // Block until the whole buffer has been written or an error occurs.
    sent_size = boost::asio::write(m_socket,
                                   boost::asio::buffer(buffer, buf_size),
                                   boost::asio::transfer_all(),
                                   ecode);
    if(ecode)
        printf("socket write operation failed\n");

    return sent_size;
}

int receive(void *buffer, int buf_size)
{
    boost::system::error_code   ecode;
    int                         received_size = 0;

    // Return as soon as at least one byte is available (up to buf_size bytes).
    received_size = boost::asio::read(m_socket,
                                      boost::asio::buffer(buffer, buf_size),
                                      boost::asio::transfer_at_least(1),
                                      ecode);

    if(ecode)
        printf("socket read operation failed\n");

    return received_size;
}

There is also a disconnect function, written in the same manner.

Note: this is a simplified version of the code for better readability.

Getting a JPEG frame from the IP camera

We have to send a request to the camera to get the MJPEG stream. The command looks like:

"GET /axis-cgi/mjpg/video.cgi?resolution=<width>x<height>&fps=<fps>\r\n\r\n",

where:

<width>: width of the requested frame;
<height>: height of the requested frame;
<fps>: the requested number of frames per second;
"\r\n\r\n": end marker of the request.

Example:

"GET /axis-cgi/mjpg/video.cgi?resolution=640x480&fps=15\r\n\r\n"
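Assembling the request string can be sketched as follows. The function name make_mjpeg_request is made up for this illustration; the request format is the one shown above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build the MJPEG request for the given frame size and frame rate.
// The trailing "\r\n\r\n" is the end marker of the request.
std::string make_mjpeg_request(int width, int height, int fps) {
    assert(width > 0 && height > 0 && fps > 0);
    std::ostringstream out;
    out << "GET /axis-cgi/mjpg/video.cgi?resolution=" << width << 'x' << height
        << "&fps=" << fps << "\r\n\r\n";
    return out.str();
}
```

The resulting string can be passed to the send function from the previous section, using its data and length as the buffer and size.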

The device sends a response of 183 bytes. We need to check the response to make sure that it is OK.

E.g. a successful response:

"HTTP/1.0 200 OK\r\nCache-Control: no-cache\r\nPragma: no-cache\r\nExpires: Thu, 01 Dec 1994 16:00:00 GMT\r\nConnection: close\r\nContent-Type: multipart/x-mixed-replace; boundary=--myboundary"

E.g. an unsuccessful response:

"HTTP/1.0 501 Not implemented\r\nDate: Sat, 29 Aug 2009 14:00:10 GMT\r\nAccept-Ranges: bytes\r\nConnection: close\r\n…"

As you can see, a successful response must contain the “HTTP/1.0 200 OK” string at the beginning.

The value of the “boundary” field (“--myboundary”) is the most important part, because this string is used as the separator later on.

The frame is received part by part. The first packet, 67 bytes long, is meta-information. E.g.:

"--myboundary\r\nContent-Type: image/jpeg\r\nContent-Length: 56296\r\n\r\n"

The “--myboundary” string at the beginning confirms the start of a new frame. We should also read and save the value of the “Content-Length” field (56296). It is the size of the compressed JPEG frame. The second and following packets contain the JPEG picture itself. We receive several packets with JPEG data, save each packet to a memory buffer and keep track of the total size of the received packets. When the total size equals the value of “Content-Length”, the full picture has been received and the memory buffer contains it. Now you can save the memory buffer to disk (e.g. into a “123.jpg” file), open it with any image viewer and make sure that it is an ordinary JPEG image.
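The two checks described above (the status line of the first response and the Content-Length of each frame header) can be sketched like this. The function names is_success_response and parse_frame_length are made up for this illustration, and the field name is matched case-sensitively for simplicity.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// A successful response must start with the status line "HTTP/1.0 200 OK".
bool is_success_response(const std::string& response) {
    const std::string ok = "HTTP/1.0 200 OK";
    return response.compare(0, ok.size(), ok) == 0;
}

// Extract the value of the Content-Length field from a frame header.
// Returns -1 if the header does not start with the boundary string or the
// field is missing.
long parse_frame_length(const std::string& header, const std::string& boundary) {
    if (header.compare(0, boundary.size(), boundary) != 0) return -1;

    const std::string field = "Content-Length:";
    const std::string::size_type pos = header.find(field);
    if (pos == std::string::npos) return -1;

    const char* start = header.c_str() + pos + field.size();
    char* end = nullptr;
    const long value = std::strtol(start, &end, 10);  // skips the leading space
    return (end == start || value < 0) ? -1 : value;
}
```

The returned length tells the receiving loop how many bytes of JPEG data to accumulate before the frame is complete.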

Note: the JPEG picture is followed by the end marker “\r\n” (2 bytes), so I recommend receiving 2 bytes more.

The next frame can be received with the same approach: the first packet is 67 bytes long…

Note: the symbols “\r\n” are the 2 bytes 0x0D and 0x0A respectively.

Also, I recommend using a separate thread for getting JPEG frames from the IP camera in order to avoid losing the connection…

We tested this solution under MS Windows and Linux on Intel P4, and also on Intel Atom and ARM Cortex-A8. Everything works fine.

We use this approach in my module, which decodes each JPEG frame from the Axis camera and converts it to an OpenCV IplImage (BGR frame).

P.S. We have also developed solutions for Arecont and ACTi IP cameras. The implementations are different, of course, but the general idea is the same.

Fast & Furious face detection with OpenCV

rhondasw — Thu, 18 Jun 2009 04:06:49 +0000

In the OpenCV samples there is a facedetect program. It can detect faces in images and video. It’s great fun, but its speed leaves much to be desired =(. Of course with OpenMP it works faster, and on an Intel Core Duo 2.7 GHz it works fast; but will it work fast on ARM? I have big doubts. I compiled facedetect without OpenMP, and on average it takes 600 ms to find one face at 640×480 resolution. I wanted to find out whether it’s possible to improve this time by software means… After some investigation, code refactoring and improvements, facedetect started to work 2.5 times faster, even on ARM. Of course, without a big quality loss =)

I started the investigation by profiling cvHaarDetectObjects on a 640×480 image. The function cvRunHaarClassifierCascade took 70% of the computation time. But cvRunHaarClassifierCascade itself is not that heavy, so why does it take so much time? The 20×20 scanning window is moved in the X direction, the Y direction and the scale direction, and cvRunHaarClassifierCascade is called for each scanning window. In total we have 160000 calls!
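To see where such a number of calls comes from, here is a minimal sketch that counts the window positions of a sliding-window scan. It is not the code of cvHaarDetectObjects: the function name count_scan_windows and the fixed step in pixels are assumptions of this sketch, and OpenCV chooses its step differently, so the exact count will differ.

```cpp
#include <cassert>
#include <cmath>

// Count the scanning-window positions visited by a sliding-window detector:
// the window starts at base_window x base_window pixels, is moved over the
// image in steps of step_px pixels, and is then enlarged by scale_factor
// until it no longer fits into the image.
long count_scan_windows(int image_w, int image_h, int base_window,
                        double scale_factor, int step_px) {
    assert(image_w > 0 && image_h > 0 && base_window > 0);
    assert(scale_factor > 1.0 && step_px > 0);

    long total = 0;
    for (double factor = 1.0;; factor *= scale_factor) {
        const int window = static_cast<int>(std::lround(base_window * factor));
        if (window > image_w || window > image_h) break;
        const long positions_x = (image_w - window) / step_px + 1;
        const long positions_y = (image_h - window) / step_px + 1;
        total += positions_x * positions_y;  // one classifier call per position
    }
    return total;
}
```

For a tiny 22×22 image with a 20×20 window, scale factor 1.1 and a step of 1 pixel, the function counts 9 positions at the first scale and 1 at the second, 10 in total. On a 640×480 image the same triple loop visits far more positions, which is why the per-window cost dominates.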

So to reduce the time, we need to optimize this triple loop. I know several ways:

change the parameters of the cvHaarDetectObjects function. Sometimes it really helps, but let’s leave such shamanism for another time. I used the “default” parameters: 1.1 scale factor, 20×20 window.
use fixed point in the algorithm. We did it here
optimize the default OpenCV frontal face cascade. Cascade generation takes a lot of time and, who knows, will the result be good or not =)
somehow reduce the number of cvRunHaarClassifierCascade calls. The image contains only a few real faces, not 160000, so this makes sense.

We researched many approaches and combinations of the ways above and got the following results (Intel Core Duo 2.7 GHz):

Test sets: ColorFERET frontal (512×768), Labeled Faces in the Wild, LFW (250×250), NoFaces, negative images (up to 1280×1024).

                           Original face detect             Fast face detect
                           ColorFERET  LFW      NoFaces     ColorFERET  LFW      NoFaces
Total                      5444        1872     1748        5276        1872     1748
Found                      5420        1765     -           5191        1685     -
Hit rate                   99.6%       94.3%    -           98.4%       90.0%    -
FP (incorrectly found)     57          12       37          18          10       13
False alarm rate           -           -        2.1%        -           -        0.7%
FN (not found)             23          107      -           85          187      -

Average time, ms
not found                  623.98      85.43    775.23      139.07      39.48    287.98
one face found             629.53      87.07    1053.49     248.26      42.99    455.31
two or more faces found    632.32      88.04    -           245.12      43.39    -

Parallel world of OpenCV (HaarTraining)

rhondasw — Wed, 03 Jun 2009 01:25:37 +0000

If you want to generate a cascade with the OpenCV training tools, be ready to wait a long time. For example, with a training set of 3000 positives / 5000 negatives, it takes about 6 days (!) to get a cascade for face detection. I wanted to generate many cascades with different training sets; I also added my own features to the standard OpenCV ones and refactored the algorithms a little. So waiting 6 days only to find out that the cascade is no good =) was really annoying. To reduce the time, I turned to parallelization.

OpenMP.

The OpenCV code supports OpenMP. OpenMP is an API that allows a program to run in several threads. This makes sense if you have an appropriate processor, such as an Intel Core Duo, or at least one with Hyper-Threading support.

The advantage of this method is that it’s already implemented and, I really believe, debugged in OpenCV. OpenMP speeds up cascade generation: 4 days instead of 6 on my Intel Core2 1.8 GHz with 2 GB RAM.

MPI.

We built a Linux-based cluster of 11 machines, each with a 2.7 GHz processor and 2 GB RAM. The computers are linked via a 100 Mbit Ethernet LAN.

OpenCV’s internal data structures are matrices and vectors, which are really good for parallelization. So we decided to add MPI API calls in the places with OpenMP defines, i.e. just clone the OpenMP schemas (who knows why we did so =), in a hurry I suppose). In this way we mostly parallelized loops. But MPI, unlike OpenMP, does not have shared memory, so the time for synchronizing data (MBs of traffic) over the Ethernet LAN cancelled out the computation speed-up. We understood that for MPI we needed a parallel schema in which data synchronization would be small.

First, I wanted to find out which functions take the most time; printf profiling in cvCreateTreeCascadeClassifier helped me. And what do you think? The function icvGetHaarTrainingDataFromBG is the hero of the occasion: its computation time was 9 hours on the 11th cascade stage! In contrast, icvGetHaarTrainingDataFromVec took about 10 minutes. The point is that positive samples are resized to 20×20 when the training vec file is made, and each picture is just run through the cascade. Negative samples have their original resolution, which varies; that’s why each picture is scanned with a scaling 20×20 window to find false positives, like haardetect does. The process stops when we have the required number of false-positive pictures. To reduce the time, we needed to parallelize icvGetHaarTrainingDataFromBG while avoiding large data synchronization.

icvGetHaarTrainingDataFromBG works in the following way:

it gets negative samples
it looks for false positives until the required number is reached
it returns the false positives

If we shuffle the negative samples and then call icvGetHaarTrainingDataFromBG, what will happen? Anything bad? The output will contain different false-positive pictures, but the algorithm as a whole will work correctly and generate a valid cascade. So we decided to split the negative samples into 11 parts (11 machines in the cluster); each machine calls icvGetHaarTrainingDataFromBG on its own negative set, and then the outputs of the machines are joined together.
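The split-and-join idea can be sketched without any MPI calls as follows. This is only an illustration of the data partitioning, not our cluster code: the function names split_negatives and merge_false_positives are made up, file names stand in for images, and the round-robin distribution is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Distribute the negative image list over `nodes` machines round-robin, so
// that every machine scans its own disjoint subset for false positives.
std::vector<std::vector<std::string>> split_negatives(
        const std::vector<std::string>& negatives, std::size_t nodes) {
    assert(nodes > 0);
    std::vector<std::vector<std::string>> parts(nodes);
    for (std::size_t i = 0; i < negatives.size(); ++i)
        parts[i % nodes].push_back(negatives[i]);
    return parts;
}

// Join the false positives found by the individual machines and keep only
// the number of samples required for the current stage.
std::vector<std::string> merge_false_positives(
        const std::vector<std::vector<std::string>>& per_node, std::size_t required) {
    std::vector<std::string> merged;
    for (const auto& found : per_node)
        for (const auto& sample : found)
            if (merged.size() < required) merged.push_back(sample);
    return merged;
}
```

Only the list of negatives goes out and only the found false positives come back, so the amount of data exchanged between machines stays small compared with synchronizing whole matrices inside loops.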

The computation was accelerated a lot: instead of 6 days, the cascade was generated within 21 hours! With the performance tool we compared our cascade with one generated on a single machine with the same training set. The results are very similar.