{"id":70,"date":"2009-06-03T11:25:37","date_gmt":"2009-06-03T01:25:37","guid":{"rendered":"http:\/\/www.computer-vision-software.com\/blog\/?p=70"},"modified":"2009-12-28T18:12:56","modified_gmt":"2009-12-28T08:12:56","slug":"parallel-world-of-opencv","status":"publish","type":"post","link":"http:\/\/www.computer-vision-software.com\/blog\/2009\/06\/parallel-world-of-opencv\/","title":{"rendered":"Parallel world of OpenCV (HaarTraining)"},"content":{"rendered":"<p style=\"text-align: justify; padding-left: 30px;\">If you want to generate a cascade with the OpenCV training tools, be prepared to wait a long time. For example, with a training set of 3000 positive \/ 5000 negative samples, it takes about <strong>6 days<\/strong> (!) to get a cascade for face detection.\u00a0 I wanted to generate many cascades with different training sets; I had also added my own features to the standard OpenCV ones and refactored the algorithms a little.\u00a0 So waiting 6 days only to find out that the cascade does nothing good =) was really annoying.\u00a0 To reduce the training time, I turned to parallelization.<\/p>\n<p style=\"text-align: justify; padding-left: 30px;\">\n<p style=\"text-align: justify;\"><!--more--><\/p>\n<h3 style=\"text-align: justify;\"><strong>OpenMP.<\/strong><\/h3>\n<p style=\"text-align: justify;\">The OpenCV code supports OpenMP.\u00a0 OpenMP is an API that lets a program run in several threads over shared memory.\u00a0 This only makes sense if you have a suitable processor, such as an Intel Core Duo, or at least one with Hyper-Threading support.<\/p>\n<p style=\"text-align: justify;\">The advantage of this method is that it&#8217;s already implemented and, I really believe, debugged in OpenCV.\u00a0 OpenMP speeds up cascade generation: 4 days instead of 6 on my Intel Core2\u00a0 1.8GHz machine with 2GB of RAM.<\/p>\n<h3 style=\"text-align: justify;\"><strong>MPI.<\/strong><\/h3>\n<p style=\"text-align: justify;\">We built a Linux-based cluster of 11 machines, each with the following configuration:\u00a0 2.7GHz processor with 2GB 
RAM.\u00a0 The computers are linked via a 100 Mbit\/s Ethernet LAN.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-73\" title=\"2\" src=\"http:\/\/www.computer-vision-software.com\/blog\/wp-content\/uploads\/2009\/06\/2.jpg\" alt=\"2\" width=\"921\" height=\"365\" srcset=\"http:\/\/www.computer-vision-software.com\/blog\/wp-content\/uploads\/2009\/06\/2.jpg 921w, http:\/\/www.computer-vision-software.com\/blog\/wp-content\/uploads\/2009\/06\/2-300x118.jpg 300w\" sizes=\"auto, (max-width: 921px) 100vw, 921px\" \/><\/p>\n<p style=\"text-align: justify;\">OpenCV&#8217;s internal data structures are matrices and vectors, which are well suited to parallelization.\u00a0 So we decided to add MPI API calls in the places marked with OpenMP defines, simply cloning the OpenMP schemes. Who knows why we did so =) We were in a hurry, I suppose.\u00a0 In this way we mostly parallelized loops. <span style=\"text-decoration: underline;\">But MPI does not have shared memory<\/span>, unlike OpenMP,\u00a0 so the time spent synchronizing data (MBs of traffic) over the Ethernet LAN wiped out the computation speed-up.\u00a0 We realized that for MPI we needed a parallel scheme in which data synchronization would be small.<\/p>\n<p style=\"text-align: justify;\">First, I wanted to find out which functions take the most time;\u00a0 printf\u00a0 profiling in cvCreateTreeCascadeClassifier helped me.\u00a0 And what do you think?\u00a0 The function icvGetHaarTrainingDataFromBG is the hero of the occasion: its computation time was 9 hours on the 11th cascade stage! By contrast, icvGetHaarTrainingDataFromVec\u00a0 took about 10 minutes. The reason is that positive samples are already resized to 20&#215;20 when the training vec file is created, so each picture is just run through the cascade. 
Negative samples keep their original resolution, which varies, so each picture is scanned with a scaling 20&#215;20 window to find false positives, as haardetect does.\u00a0 The process stops when we have the required number of false-positive pictures.\u00a0 To reduce the time,\u00a0 we needed to parallelize icvGetHaarTrainingDataFromBG while <span style=\"text-decoration: underline;\">avoiding large data synchronization<\/span>.<\/p>\n<p>icvGetHaarTrainingDataFromBG\u00a0works as follows:<\/p>\n<ul>\n<li>it gets the negative samples<\/li>\n<li>it searches for false positives until the required number is reached<\/li>\n<li>it returns the false positives<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">What happens if we shuffle the negative samples and then call icvGetHaarTrainingDataFromBG?\u00a0 Anything bad? The output will contain different false-positive pictures, but the algorithm as a whole will still work correctly and generate a valid cascade.\u00a0 So we decided to split the negative samples into 11 parts (one per machine in the cluster); each machine calls icvGetHaarTrainingDataFromBG on its own negative set, and then the machines&#8217; outputs are joined together.<\/p>\n<p style=\"text-align: justify;\">This sped up the computation considerably: instead of 6 days, the cascade was generated within <span style=\"color: #ff0000;\">21 hours<\/span>!\u00a0 Using the performance tool, we compared our cascade with one generated on a single machine from the same training set.\u00a0 The results are very similar.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you want to generate a cascade with the OpenCV training tools, be prepared to wait a long time. For example, with a training set of 3000 positive \/ 5000 negative samples, it takes about 6 days (!) 
to get a cascade for face detection.\u00a0 I wanted to generate many cascades with different training sets; I had also added my own [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[29,31,49,47,6,48,46,16],"class_list":["post-70","post","type-post","status-publish","format-standard","hentry","category-opencv","tag-face-detection","tag-haar","tag-haartraining","tag-mpi","tag-opencv","tag-openmp","tag-parallel","tag-profiling"],"_links":{"self":[{"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/posts\/70","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/comments?post=70"}],"version-history":[{"count":0,"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/posts\/70\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/media?parent=70"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/categories?post=70"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.computer-vision-software.com\/blog\/wp-json\/wp\/v2\/tags?post=70"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}