{"id":156,"date":"2015-09-28T00:39:31","date_gmt":"2015-09-28T00:39:31","guid":{"rendered":"https:\/\/carson.fenimorefamily.com\/?p=156"},"modified":"2016-01-01T17:33:47","modified_gmt":"2016-01-01T17:33:47","slug":"a-few-ways-to-quickly-and-automatically-binarize-an-image","status":"publish","type":"post","link":"https:\/\/carson.fenimorefamily.com\/?p=156","title":{"rendered":"A few ways to quickly and automatically binarize an image"},"content":{"rendered":"<p>For my wife&#8217;s Spell To Write and Read (SWR) homeschooling we have a bunch of scanned\u00a0worksheets. \u00a0A sample of the scanned image is shown below:<\/p>\n<p><a href=\"\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc1.jpeg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-158\" src=\"\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc1-232x300.jpeg\" alt=\"10letter_aandc\" width=\"232\" height=\"300\" srcset=\"https:\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc1-232x300.jpeg 232w, https:\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc1-791x1024.jpeg 791w\" sizes=\"auto, (max-width: 232px) 100vw, 232px\" \/><\/a><\/p>\n<p>As you can see, it&#8217;s entirely readable and fine for our purposes. \u00a0However, it is not gentle on\u00a0our laser printer toner budget. \u00a0What we really want is the background to be white and the foreground to be black &#8211; nothing in between. This process is called binarization &#8211; and scanner software often has a feature that lets you do this at scan time.<\/p>\n<p>We didn&#8217;t use that feature (or maybe our software didn&#8217;t support it) at scan time, so we need to resort to post-processing. I have a Master&#8217;s in computer graphics and vision, and every time I use something I learned, the value of that degree goes up. 
It therefore behoves me to use it every chance I get.<\/p>\n<p>As a good computer vision student, when I think binarization, my mind jumps straight to <a href=\"\/\/en.wikipedia.org\/wiki\/Otsu%27s_method\">Otsu<\/a>! \u00a0He came up with a great way of automatically determining a good threshold value from the image&#8217;s histogram (meaning, when we look at each pixel in the image, everything below that value turns black, all else turns white).<\/p>\n<p>My first thought is to check for an easy button somewhere. In GIMP, for example, I found you can load the image, click on &#8220;Image -&gt; Mode -&gt; Indexed&#8221; then select &#8220;Use black and white (1 bit)&#8221;. Looks OK!<\/p>\n<p>Now how to automate this, given I have 60+ images? Turns out there is a threshold option in ImageMagick. I could go through each image in the directory and manually threshold, but I might get the threshold wrong, and I don&#8217;t really want to train my wife on picking a threshold value. Plus I know Otsu is better!<\/p>\n<p>Turns out some guy named Fred has a bunch of <a href=\"http:\/\/www.fmwconcepts.com\/imagemagick\/otsuthresh\/index.php\">ImageMagick scripts<\/a>, including an Otsu one. I downloaded his script and ran it, yielding the following image:<\/p>\n<p><a href=\"\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc.jpeg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-157\" src=\"\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc-232x300.jpeg\" alt=\"10letter_aandc\" width=\"232\" height=\"300\" srcset=\"https:\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc-232x300.jpeg 232w, https:\/\/carson.fenimorefamily.com\/wp-content\/uploads\/2015\/09\/10letter_aandc-791x1024.jpeg 791w\" sizes=\"auto, (max-width: 232px) 100vw, 232px\" \/><\/a><\/p>\n<p>Pretty nice &#8211; just black and white. 
\u00a0Thanks Fred&#8230; sorry I cannot call him &#8220;Fast Freddy&#8221;, since\u00a0it took around 18 seconds per image. \u00a0I know we can do better! Time to dust off those Master&#8217;s-level computer vision skills.<\/p>\n<p>Here&#8217;s what I came up with using Python\/OpenCV:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n\r\n#!\/usr\/bin\/python\r\nimport sys\r\nimport cv2\r\n# Load the input as a single-channel grayscale image.\r\nimg = cv2.imread(sys.argv[1], cv2.IMREAD_GRAYSCALE)\r\n# With THRESH_OTSU the threshold argument (0) is ignored and computed automatically.\r\nret, imgThresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)\r\ncv2.imwrite(sys.argv[2], imgThresh)\r\n\r\n<\/pre>\n<p>Short and sweet! And performance is way better: about 3 seconds per image. \u00a0But it looks like\u00a0most of the program runtime is spent loading cv2. \u00a0Based on that assumption I decided to add\u00a0a bulk processing mode:<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport sys\r\nimport cv2\r\n\r\n# Need at least two arguments: -inplace plus an image, or an input and an output.\r\nif len(sys.argv) &lt; 3 or &quot;-h&quot; in sys.argv:\r\n    print(&quot;Usage: %s -inplace image1 [image2 [image3 ...]]&quot; % sys.argv[0])\r\n    print(&quot;       %s inImage outImage&quot; % sys.argv[0])\r\n    sys.exit(0)\r\nif &quot;-inplace&quot; == sys.argv[1]:\r\n    # Overwrite every listed image with its binarized version.\r\n    inOut = [(arg, arg) for arg in sys.argv[2:]]\r\nelse:\r\n    inOut = [(sys.argv[1], sys.argv[2])]\r\nfor inImage, outImage in inOut:\r\n    print(&quot;Converting %s to %s&quot; % (inImage, outImage))\r\n    img = cv2.imread(inImage, cv2.IMREAD_GRAYSCALE)\r\n    if img is None:  # imread returns None when the file cannot be read\r\n        print(&quot;Skipping %s: could not read image&quot; % inImage)\r\n        continue\r\n    ret, imgThresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)\r\n    cv2.imwrite(outImage, imgThresh)\r\n<\/pre>\n<p>When I run this script on the whole directory it takes an average of 2 seconds per image. Better, but longer than needed. What gives? It turns out I have all my data on a QNAP, and opening, reading, and writing lots of files is not its forte. When I copy the data to the local SSD on my Mac, the cost per image drops to 140 ms. Much better.<\/p>\n<p>Since, as often happens, I have found my assumptions totally flawed, can I vindicate Freddy? 
After rerunning the test, it appears he is still a &#8220;steady Freddy&#8221; at about 2.7 seconds when running straight on the hard drive. Sorry Fred; OpenCV just beat the pants off you.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For my wife&#8217;s Spell To Write and Read (SWR) homeschooling we have a bunch of scanned\u00a0worksheets. \u00a0A sample of the scanned image is shown below: As you can see, it&#8217;s entirely readable and fine for our purposes. \u00a0However, it is not gentle on\u00a0our laser printer toner budget. \u00a0What we really want is the background to &hellip; <a href=\"https:\/\/carson.fenimorefamily.com\/?p=156\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">A few ways to quickly and automatically binarize an image<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-156","post","type-post","status-publish","format-standard","hentry","category-computer-vision"],"_links":{"self":[{"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=\/wp\/v2\/posts\/156","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=156"}],"version-history":[{"count":9,"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=\/wp\/v2\/posts\/156\/revisions"}],"predecessor-version":[{"id":173,"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=\/wp\/v2\/posts\/156\/revisions
\/173"}],"wp:attachment":[{"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=156"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=156"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/carson.fenimorefamily.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=156"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}