Category Archives: Computer Vision

Fine-Tuned MobilenetV2 on MyriadX and Coral

Fine-tuning MobileNetV2 (MV2) on a GPU is not hard – especially if you use the TensorFlow Object Detection API in a Docker container. It turns out that deploying a quantization-aware model to a Coral Edge TPU is also not hard.

Doing the same thing for MyriadX devices is somewhat harder. There doesn’t appear to be a good way to take your quantized model and convert it to MyriadX. If you try to convert your quantized model to OpenVINO you get recondite errors; if you turn up logging on the tool you can see the errors happen when it hits “FakeQuantization nodes.”

Thankfully, for OpenVINO you can just retrain without quantization and things work fine. As such it seems like you end up training twice – which is less than ideal.

Right now my pipeline is as follows:

  • Annotate using LabelImg – installable via brew (OS X)
  • Train (using my NVidia RTX 2070 Super). For this I use TensorFlow 1.15.2 and a specific hash of the tensorflow models repo. Using a different version of 1.x might work – but using 2.x definitely did not, in my case.
  • For Coral
    • Export to a frozen graph
    • Export the frozen graph to tflite
    • Compile the tflite bin to the edgetpu
    • Write custom C++ code that interfaces with the Coral edgetpu libraries to run the net on images (in my case my code can grab live from a camera or from a file)
  • For DepthAI/MyriadX
    • Re-train without the quantization-aware portion of the pipeline config
    • Convert the frozen graph to openvino’s intermediate representation (IR)
    • Compile the IR to a MyriadX blob
    • Push the blob to a local depthai checkout

Let’s go through each stage of the pipeline:

Annotation

For my custom dataset I gathered a set of video clips. I then exploded those clips into one frame per second using ffmpeg:

ffmpeg -i $1 -r 1/1 $1_%04d.jpg

I then installed LabelImg and annotated all my classes. As I got more proficient in LabelImg I could do about one image every second or two – fast enough, but certainly not as fast as Tesla’s auto labeler!

Note that on my MacBook, I found the following worked for installing LabelImg:

conda create --name deeplearning
conda activate deeplearning
pip install pyqt5==5.15.2 lxml
pip install labelImg
labelImg

Training

In my case I found that MobileNetV2 meets all my needs. Specifically, it is fast, fairly accurate, and it runs on all my devices (both in software and on the Coral and OAK-D Lite).

There are gobs of tutorials on training MobileNetV2 in general. For example, you can grab one of the great TensorFlow Docker images. By no means should you assume that once you have the image all is done – unless you are literally just running someone else’s tutorial. The moment you throw in your own dataset you’re going to have to do a number of things. And most likely they will fail. Over and over. So script them.

But before we get to that, let’s talk versions. I found that the models produced by TensorFlow 2 didn’t work well with any of my HW accelerators. Version 1.15.2 worked well, however, so I went with that. I even tried other versions of TensorFlow 1.x and had issues. I would like to dive into the cause of those issues – but have not done so yet.

See my GitHub repo for an example Dockerfile. Note that for my GPU (an RTX 2070 Super) I had to work around memory-growth issues by patching the TensorFlow Object Detection API’s model_main.py. I also reduced the batch size in my pipeline config (to 6, from the default 24). Without these fixes the training would explode mysteriously with an unhelpful error message. If only those blasted bitcoin miners hadn’t made GPUs so expensive, perhaps I could upgrade to another GPU with more memory!

It is also worth noting that the version of MobileNet I used, and the pipeline config, were different for the Coral and Myriad devices, e.g.:

  • Coral: ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03
  • Myriad: ssd_mobilenet_v2_coco_2018_03_29

Update: it didn’t seem to matter which version of MobileNet I used – both work.
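
The meaningful difference between the two starting configs is the quantization-aware training block. The quantized pipeline config ships with a graph_rewriter section roughly like the one below (values are from the stock quantized config, so treat them as illustrative); this is the part I drop when retraining for OpenVINO:

graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}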

I found that I could get the loss down to about 0.3 using my dataset and pipeline configuration after about 100,000 iterations. On my GPU this took about 3 hours, which was not bad at all.
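
For reference, the training run itself is just the stock TF 1.x Object Detection API entry point (the patched model_main.py mentioned above); paths here are placeholders:

python object_detection/model_main.py \
    --pipeline_config_path=training/pipeline.config \
    --model_dir=training/output \
    --num_train_steps=100000 \
    --alsologtostderr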

Deploying to Coral

Coral requires quantization-aware training exported to tflite. Once exported to tflite you must compile it for the Edge TPU.
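
The export chain I used looked roughly like the following – checkpoint numbers and paths are placeholders, and the input/output array names are the standard ones for the SSD MobileNet tflite export:

# export the quantized checkpoint to a tflite-compatible frozen graph
python object_detection/export_tflite_ssd_graph.py \
    --pipeline_config_path=training/pipeline.config \
    --trained_checkpoint_prefix=training/output/model.ckpt-100000 \
    --output_directory=export_tflite \
    --add_postprocessing_op=true

# convert the frozen graph to a quantized .tflite
tflite_convert \
    --graph_def_file=export_tflite/tflite_graph.pb \
    --output_file=export_tflite/detect.tflite \
    --input_shapes=1,300,300,3 \
    --input_arrays=normalized_input_image_tensor \
    --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
    --inference_type=QUANTIZED_UINT8 \
    --mean_values=128 --std_dev_values=128 \
    --allow_custom_ops

# compile the quantized tflite model for the Edge TPU
edgetpu_compiler -o export_tflite export_tflite/detect.tflite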

On the Coral it is simple enough to switch from the default MobileNet model to your custom one – literally only the filenames change. You must point it at your new label map (so it can correctly map the class names) and your custom compiled model.

Deploying to MyriadX

Myriad was a lot more difficult. To deploy a trained model one must first convert it to the OpenVINO IR format, as follows:

source /opt/intel/openvino_2021/bin/setupvars.sh && \
    python /opt/intel/openvino_2021/deployment_tools/model_optimizer/mo.py \
    --input_model learn_tesla/frozen_graph/frozen_inference_graph.pb \
    --tensorflow_use_custom_operations_config /opt/intel/openvino_2021/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json \
    --tensorflow_object_detection_api_pipeline_config source_model_ckpt/pipeline.config \
    --reverse_input_channels \
    --output_dir learn_tesla/openvino_output/ \
    --data_type FP16

Then the IR can be converted to blob format by running the compile_tool command. I had significant problems with compile_tool, mainly because it didn’t like something about my trained output. In the end I found the cause was that OpenVINO simply doesn’t like the FakeQuantization nodes that quantization-aware training puts into the graph. Removing that portion from the pipeline config solved this. However, this cuts the other way for Coral – it ONLY accepts quantization-aware training (caveat: in some cases you can use post-training quantization, but Coral explicitly states this doesn’t work in all cases).
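
For completeness, the IR-to-blob step was along these lines – paths are from my setup, and the SHAVE/CMX counts are the values the DepthAI docs commonly suggest, so adjust for your device:

source /opt/intel/openvino_2021/bin/setupvars.sh && \
    /opt/intel/openvino_2021/deployment_tools/tools/compile_tool/compile_tool \
    -m learn_tesla/openvino_output/frozen_inference_graph.xml \
    -ip U8 \
    -d MYRIAD \
    -VPU_NUMBER_OF_SHAVES 4 \
    -VPU_NUMBER_OF_CMX_SLICES 4 \
    -o frozen_inference_graph.blob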

Once the blob file was ready I put the blob, and a JSON snippet, in the resources/nn/<custom> folder inside my depthai checkout. I was able to see full framerate (at 300×170), or about 10 fps at full resolution (from a file). Not bad!
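
For reference, the layout I ended up with inside the depthai checkout was along these lines (the <name>/<name>.blob plus <name>.json naming mirrors the sample networks already under resources/nn – crib the JSON keys from one of those samples, since they vary between depthai versions):

resources/nn/custom_mobilenet/
    custom_mobilenet.blob    # compiled MyriadX blob
    custom_mobilenet.json    # label map / detection config snippet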

Runtime performance

On the Google Coral I am able to create an extremely low-latency pipeline – from capture to inference in only a few milliseconds. Inference itself completes in about 20 ms with my fine-tuned model – so I am able to operate at full speed.

I have not fully characterized the MyriadX-based pipeline latency. Running from a file I was able to keep up at 30 fps. My Coral pipeline involved pulling the video frames down with my own C/C++ code – including the YUV->BGR colorspace conversion. Since the MyriadX can grab and process on the device itself, it has the potential to be low latency – but this has yet to be tested.

Object detection using the MyriadX (DepthAI OAK-D Lite) and the Google Coral produced similar results.

Architecture For DIY AI-Driven Home Security

The Raspberry Pi’s ecosystem makes it a tempting platform for building computer vision projects such as home security systems. For around $80, one can assemble a Pi (with SD card, camera, case, and power) that can capture video and suppress dead scenes using motion detection – typically with motion or MotionEye. This low-effort, low-cost solution seems attractive until one considers some of its shortfalls:

  • False detection events. The algorithm used to detect motion is susceptible to false positives – tree branches waving in the wind, clouds, etc. A user works around this by tweaking motion parameters (how BIG must an object be?) or masking out regions (don’t look at the sky, just the road).
  • Lack of high-level understanding. Even after tweaking the motion parameters, anything that moves is deemed of concern. There is no way to discriminate between a moving dog and a moving person.

The net result of these flaws – which all stem from a lack of real understanding – is wasted time. At a minimum the user is annoyed. Worse, they become fatigued and miss events, or stop responding entirely.

By applying current state-of-the-art AI techniques such as object detection and facial detection/recognition, one can vastly reduce the load on the user. To do this at full frame rate, one needs to add an accelerator such as the Coral Edge TPU.

In testing we’ve found fairly good accuracy at almost full frame rate. Although Coral claims “400 fps” of speed, that figure is inference only – not the full cycle of loading the image, running inference, and then examining the results. In real-world testing we found the full-cycle results closer to 15 fps. This is still significantly better than the 2-3 fps one obtains by running in software.

In terms of scalability, running inference on each Pi means the system scales with the number of cameras. The server’s job is simply to log the video and metadata (object information, motion masks, etc.).

Here’s a rough sketch of such a system: each Pi camera node handles capture, motion detection, and accelerated inference locally, and streams video plus metadata to a central archiving server.

This approach is currently working successfully to provide the following, per rpi camera:

  • moving / static object detection
  • facial recognition
  • 3d object mapping – speed / location determination

This is all done at around 75% CPU utilization on a 2GB Raspberry Pi 4B. The imagery and metadata are streamed to a central server, which performs no processing other than archiving the data from the cameras and serving it to clients (connected via an app or web page).

Installing OpenCV 4.1.1 on Raspberry Pi 4

Recently I purchased a Raspberry Pi 4 to see how it performs at basic computer vision tasks. I largely followed this guide on building OpenCV: https://www.learnopencv.com/install-opencv-4-on-raspberry-pi/

However, the guide is missing several dependencies needed on a fresh install of Raspbian Buster. There are also some apparent errors in the CMakeLists.txt file, which other users have already discovered. After applying these fixes and adding the missing dependencies, I now have OpenCV running on my Pi. Here’s my complete script below.

IMPORTANT: You must READ the script, don’t just run it! For example, you should check the version of Python you have at the time you run this script. When I ran it I was at Python 3.7. Also, feel free to bump to a later version (or master) of OpenCV.

Oh, also, during the numpy step it hung and I was too lazy to look into it. It didn’t seem to affect my ability to use OpenCV – so I didn’t go back and dig. My bad.

#!/bin/bash

cvVersion="4.1.1"
pythonVersion="3.7"
opencvDirRoot=/home/pi/opencv

sudo apt-get -y purge wolfram-engine
sudo apt-get -y purge libreoffice*
sudo apt-get -y clean
sudo apt-get -y autoremove

mkdir -p $opencvDirRoot
cd $opencvDirRoot

# Clean build directories
rm -rf opencv/build
rm -rf opencv_contrib/build

# Create directory for installation
rm -fr installation
mkdir -p installation
mkdir installation/OpenCV-"$cvVersion"


sudo apt -y update
sudo apt -y upgrade
sudo apt-get -y remove x264 libx264-dev
 
## Install dependencies
sudo apt-get -y install libblas-dev liblapack-dev
sudo apt-get -y install libeigen3-dev
sudo apt-get -y install qtbase5-dev qtdeclarative5-dev
sudo apt-get -y install build-essential checkinstall cmake pkg-config yasm
sudo apt-get -y install git gfortran
sudo apt-get -y install libjpeg8-dev libjasper-dev libpng12-dev

 
sudo apt-get -y install libtiff5-dev
 
sudo apt-get -y install libtiff-dev

sudo apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev
sudo apt-get -y install libxine2-dev libv4l-dev
cd /usr/include/linux
sudo ln -s -f ../libv4l1-videodev.h videodev.h
cd $opencvDirRoot

sudo apt-get -y install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev
sudo apt-get -y install libgtk2.0-dev libtbb-dev qt5-default
sudo apt-get -y install libatlas-base-dev
sudo apt-get -y install libmp3lame-dev libtheora-dev
sudo apt-get -y install libvorbis-dev libxvidcore-dev libx264-dev
sudo apt-get -y install libopencore-amrnb-dev libopencore-amrwb-dev
sudo apt-get -y install libavresample-dev
sudo apt-get -y install x264 v4l-utils
sudo apt-get -y install libmesa-dev
sudo apt-get -y install freeglut3-dev



# Optional dependencies
sudo apt-get -y install libprotobuf-dev protobuf-compiler
sudo apt-get -y install libgoogle-glog-dev libgflags-dev
sudo apt-get -y install libgphoto2-dev libeigen3-dev libhdf5-dev doxygen

sudo apt-get -y install python3-dev python3-pip
sudo -H pip3 install -U pip numpy
sudo apt-get -y install python3-testresources

cd $opencvDirRoot
# Install virtual environment
python3 -m venv OpenCV-"$cvVersion"-py3
echo "# Virtual Environment Wrapper" >> ~/.bashrc
echo "alias workoncv-$cvVersion=\"source $opencvDirRoot/OpenCV-$cvVersion-py3/bin/activate\"" >> ~/.bashrc
source "$opencvDirRoot"/OpenCV-"$cvVersion"-py3/bin/activate
#############

############ For Python 3 ############
# temporarily bump the swap size (the dlib build needs it), then install
# python libraries within this virtual environment
sudo sed -i 's/CONF_SWAPSIZE=100/CONF_SWAPSIZE=1024/g' /etc/dphys-swapfile
sudo /etc/init.d/dphys-swapfile stop
sudo /etc/init.d/dphys-swapfile start
pip install numpy dlib
# quit virtual environment
deactivate

git clone https://github.com/opencv/opencv.git
cd opencv
git checkout $cvVersion
cd ..

git clone https://github.com/opencv/opencv_contrib.git
cd opencv_contrib
git checkout $cvVersion
cd ..

cd opencv
mkdir build
cd build


# Use eigen3/Eigen/Core instead of Eigen/Core in private.hpp
sed -i 's,Eigen/Core,eigen3/Eigen/Core,g' ../modules/core/include/opencv2/core/private.hpp

# NOTE: the stock samples build needs OpenGL/GLUT; add these two lines to
# opencv/samples/cpp/CMakeLists.txt before running cmake:
#   find_package(OpenGL REQUIRED)
#   find_package(GLUT REQUIRED)

cmake .. -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=$opencvDirRoot/installation/OpenCV-"$cvVersion" \
    -D INSTALL_C_EXAMPLES=ON \
    -D INSTALL_PYTHON_EXAMPLES=ON \
    -D WITH_TBB=ON \
    -D WITH_V4L=ON \
    -D OPENCV_PYTHON3_INSTALL_PATH=$opencvDirRoot/OpenCV-$cvVersion-py3/lib/python$pythonVersion/site-packages \
    -D WITH_QT=ON \
    -D WITH_OPENGL=ON \
    -D OPENCV_EXTRA_EXE_LINKER_FLAGS=-latomic \
    -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
    -D BUILD_EXAMPLES=ON


make -j$(nproc)
make install


sudo sed -i 's/CONF_SWAPSIZE=1024/CONF_SWAPSIZE=100/g' /etc/dphys-swapfile
sudo /etc/init.d/dphys-swapfile stop
sudo /etc/init.d/dphys-swapfile start

echo "sudo modprobe bcm2835-v4l2" >> ~/.profile
                                                                                                                                        

A few ways to quickly and automatically binarize an image

For my wife’s Spell To Write and Read (SWR) homeschooling we have a bunch of scanned worksheets.  A sample of the scanned image is shown below:

[Image: 10letter_aandc – original grayscale scan]

As you can see it’s entirely readable and fine for our purposes.  However it is not gentle on our laser printer toner budget.  What we really want is the background to be white and the foreground to be black – nothing in between. This process is called binarization – and scanner software often has a feature that lets you do this at scan time.

We didn’t use that feature (or maybe our software didn’t support it) at scan time. As such we need to resort to postprocessing. I have a Master’s in computer graphics and vision, and every time I use something I learned, the value of that degree goes up. It therefore behooves me to use it every chance I get.

As a good computer vision student, when I think binarization my mind jumps straight to Otsu!  He came up with a great way of automatically determining a good threshold value (meaning, when we look at each pixel in the image, everything below that value turns black, and everything else turns white).
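
For the record, Otsu’s method picks the threshold t that maximizes the between-class variance of the dark and light pixel populations in the grayscale histogram:

    σ_b²(t) = ω₀(t) · ω₁(t) · [μ₀(t) − μ₁(t)]²

where ω₀/ω₁ are the fractions of pixels below/above t and μ₀/μ₁ are their mean intensities. Sweep t over all 256 gray levels and keep the winner.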

My first thought is to check for an easy button somewhere. In gimp, for example, I found you can load the image, click on “Image -> Mode -> Indexed” then select “Use black and white (1 bit)”. Looks ok!

Now how do I automate this, given I have 60+ images? It turns out there is a threshold option in ImageMagick. I could go through each image in the directory and manually threshold, but I might get the threshold wrong, and I don’t really want to train my wife on picking a threshold value. Plus I know Otsu is better!
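
The manual route would have been a one-liner along these lines (ImageMagick’s -threshold takes a fixed percentage – exactly the knob I didn’t want to hand-tune; the 60% below is a made-up value):

mogrify -threshold 60% *.jpg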

Turns out some guy named Fred has a bunch of ImageMagick scripts, including an Otsu one. I downloaded his script and ran it, yielding the following image:

[Image: 10letter_aandc – binarized output from Fred’s Otsu script]

Pretty nice – just black and white.  Thanks Fred… sorry I cannot call him “Fast Freddy” since it took around 18 seconds per image.  I know we can do better! Time to dust off those Master’s-level computer vision skills.

Here’s what I came up with using python/opencv:


#!/usr/bin/python
import cv2
import sys

# load the input image as grayscale
img = cv2.imread(sys.argv[1], 0)
# the 0 threshold is ignored with THRESH_OTSU; Otsu picks the threshold automatically
ret, imgThresh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)
cv2.imwrite(sys.argv[2], imgThresh)

Short and sweet! And performance is way better: about 3 seconds per image.  But it looks like most of the program runtime is spent loading cv2.  Based on that assumption I decided to add a bulk processing mode:

#!/usr/bin/python
import cv2
import sys

if len(sys.argv) == 1 or "-h" in sys.argv:
    print("Usage: %s -inplace image1 [image2 [image3 ...]]" % sys.argv[0])
    print("       %s inImage outImage" % sys.argv[0])
    sys.exit(0)
if "-inplace" == sys.argv[1]:
    # overwrite each image with its binarized version
    inOut = [(arg, arg) for arg in sys.argv[2:]]
else:
    inOut = [(sys.argv[1], sys.argv[2])]
for inImage, outImage in inOut:
    print("Converting %s to %s" % (inImage, outImage))
    img = cv2.imread(inImage, 0)  # load as grayscale
    ret, imgThresh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)  # Otsu picks the threshold
    cv2.imwrite(outImage, imgThresh)

When I run this script on the whole directory it takes an average of 2 seconds per image. Better, but longer than needed. What gives? It turns out I have all my data on a QNAP, and opening, reading, and writing lots of files is not its forte. When I copy the data to my local SSD on the Mac, the cost per image drops to 140 ms. Much better.

Since, as often happens, I have found my assumptions totally flawed, can I vindicate Freddy? After rerunning the test it appears he is still a “steady Freddy” at about 2.7 seconds per image when running straight off the hard drive. Sorry Fred; OpenCV just beat the pants off you.