Armchair Deep Learning Enthusiast: Object Detection Tips #1

Recently I’ve been using the TensorFlow Object Detection API (TFOD). Much of my approach follows material provided on pyimagesearch.com. Rather than rehash that material, I just wanted to give a few pointers that I found helpful.

  • Don’t limit yourself to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset. This dataset is credited with helping really juice up the state of the art in image recognition; however, there are some serious problems for armchair DL enthusiasts like myself:
    1. It isn’t easy to get the actual images for training. You have to petition for access to the dataset, and your email must not look “common”. Uh-huh – so what are your options? Sure, you can get links to the images on the ImageNet website and download them yourself. Sure, you can get the bounding boxes – but how do you match them up with the images you manually downloaded? I don’t want any of this – I just want the easy button: download it and use it. Well, the closest thing you’ll get to that is to grab the images from Academic Torrents: http://academictorrents.com/collection/imagenet-lsvrc-2015. But even that doesn’t feel satisfying – if ImageNet is about moving the state of the art forward, and assuming I even had the capability to do so, they sure aren’t making it easy for me to do that!
    2. It seems outdated. The object detection dataset hasn’t changed since 2012. That is probably good for stability, but the total image count (~1M images) no longer seems big. People’s hairstyles, clothing, etc. are all changing – time for an update!
    3. Oh, that’s right – there is no official “person” synset inside the ILSVRC image set! So don’t worry about those out-of-date hairstyles or clothes!
    4. There are better datasets out there. Bottom line – people are moving to other datasets, and you should too.
      1. Open Images is one of the best.
      2. Oh, and you can download subsets of it easily using a tool like https://github.com/harshilpatel312/open-images-downloader.git (a rough sketch of what such a subset download involves follows this list).
  • The TFOD workflow is easy to follow – provided you use the right TensorFlow version.
    • TFOD is not compatible with TensorFlow 2.0. You have to use the 1.x series.
    • I am using Anaconda to install tensorflow-gpu version 1.15.0. To do this, type “conda install tensorflow-gpu=1.15.0” (inside an activated conda environment).
    • You then grab the TFOD library, as per the website (a quick install sanity check is sketched after this list).
  • Make sure you actually feed TFOD data, or else you get weird hanging-like behavior.
    • At some point I found that TensorFlow was crashing because a bounding box was outside the image.
    • In the process of fixing that, I introduced a bug that caused zero records to be sent to TensorFlow.
    • When I then ran a training loop, I saw TensorFlow progress as usual until it reported it had loaded libcublas (“Successfully opened dynamic library libcublas.so.10.2”) – and then it just sat there.
    • I thought this was a TensorFlow issue, and even found a GitHub issue for it (“Successfully opened dynamic library libcublas.so.10.0”) – however, this was all a red herring. It was NOT because of TensorFlow; it was just because my bounding box fix had eliminated all bounding boxes. Once I fixed that, all was well (see the validation sketch below).
  • Make sure you provide enough discriminatory data. E.g., if you want to find squirrels, don’t just train on the squirrel set; otherwise your detector will think almost anything that looks like an object is a squirrel. Add in a few other datasets and you will find that squirrels are squirrels and everything else is hit or miss.
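
For what it’s worth, here is a minimal sketch of what a subset download amounts to if you’d rather roll it yourself than use the tool above. The CSV path and column names here are illustrative assumptions (Open Images ships its image lists as CSVs, but check the actual headers) – this is not the downloader tool’s interface:

import csv
import os
import urllib.request

# Hypothetical Open Images-style CSV listing image IDs, URLs, and labels.
CSV_PATH = "open_images_subset.csv"   # assumed columns: ImageID, OriginalURL, LabelName
TARGET_LABEL = "Squirrel"             # whatever class you care about
OUT_DIR = "images"

os.makedirs(OUT_DIR, exist_ok=True)
with open(CSV_PATH, newline="") as f:
    for row in csv.DictReader(f):
        if row["LabelName"] != TARGET_LABEL:
            continue
        dest = os.path.join(OUT_DIR, row["ImageID"] + ".jpg")
        if not os.path.exists(dest):              # skip images we already have
            urllib.request.urlretrieve(row["OriginalURL"], dest)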
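
Once tensorflow-gpu is installed, a ten-second sanity check saves a lot of grief – verify you really got a 1.x build and that it can actually see your GPU before pointing TFOD at it:

import tensorflow as tf

print(tf.__version__)              # should print 1.15.0, not 2.x
print(tf.test.is_gpu_available())  # True if the GPU build and drivers are working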
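
And since the hang above boiled down to bad (and then missing) bounding boxes, here is the sort of check I mean when preparing training records. TFOD expects boxes as normalized [0, 1] corner coordinates; the function names here are my own, not part of the API:

def box_is_valid(xmin, ymin, xmax, ymax):
    # Reject boxes outside the image or with zero width/height.
    return 0.0 <= xmin < xmax <= 1.0 and 0.0 <= ymin < ymax <= 1.0

def filter_boxes(boxes):
    # Keep only valid (xmin, ymin, xmax, ymax) tuples. Fail loudly if nothing
    # survives - zero records makes training appear to hang rather than error out.
    kept = [b for b in boxes if box_is_valid(*b)]
    if not kept:
        raise ValueError("all bounding boxes were filtered out - check your data")
    return kept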

How I Set Up DLIB

It is fairly easy to check out and build dlib. Getting it to work in a performance-optimized manner – Python bindings included – takes a little more work.

Per the dlib GitHub page, one can build the bindings by simply issuing:

python setup.py install

The first problem I found was that the setup process decided to latch on to an old version of CUDA. That was my bad – fixed by updating my PATH variable to point to the new CUDA’s bin directory.

The second problem was that during compilation I saw the following:

Invoking CMake build: 'cmake --build . --config Release -- -j12'
[  1%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o
[  2%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o
/home/carson/code/2020/facenet/dlib/dlib/cuda/cuda_dlib.cu(1762): error: calling a constexpr __host__ function("log1p") from a __device__ function("cuda_log1pexp") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

As the message suggests, this is resolved by passing a flag to the compiler. To do this, modify the setup.py invocation:

python setup.py install --set USE_AVX_INSTRUCTIONS=1 --set DLIB_USE_CUDA=1 --set CUDA_NVCC_FLAGS="--expt-relaxed-constexpr"

Everything went just peachy from there, except that when I attempted to use dlib from within Python I got an error (something like):

dlib 19.19.99 is missing cblas_dtrsm symbol

After which I tried importing face_recognition and got a segfault.

I fixed this by installing openblas-devel, then re-running the setup.py script as above. Magically, this fixed everything.
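
If you want to confirm the rebuild actually picked up CUDA (rather than silently falling back to the CPU), dlib exposes a couple of flags you can check from Python – and importing face_recognition is the real test that the segfault is gone:

import dlib

print(dlib.DLIB_USE_CUDA)           # True if dlib was compiled with CUDA support
print(dlib.cuda.get_num_devices())  # number of CUDA devices dlib can see

import face_recognition             # should now import cleanly, no segfault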

Again, not bad – dlib seems cool – just normal troubleshooting stuff.

GPUs Are Cool

Expect no witty sayings or clever analyses here – I just think GPUs are cool. And here are a few reasons why:

Exhibit A: Machine learning. Training a standard feed-forward neural net on CIFAR-10 progresses at 50 µs/sample on the GPU; my 2.4 GHz i7 takes almost 500 µs/sample. The full dataset takes around 5 minutes to train on the GPU vs. over 35 minutes on my CPU. On long tasks, a gap like that means a difference of days versus weeks.

Exhibit B: Video transcoding. In order to make backups of all my Blu-ray discs, I rip and transcode them using ffmpeg or HandBrake. Normally I’m lucky to get a few dozen frames per second – completely maxing out my CPU in the process. By compiling ffmpeg to include nvenc/CUDA support, I get 456 fps (roughly 19x faster). As the screenshots show, my average CPU usage was below 20% – and even GPU usage stayed under 10%. Video quality was superb (I couldn’t tell the difference).

ffmpeg -vsync 0 -hwaccel cuvid -i 00800.m2ts -c:a copy -c:v h264_nvenc -b:v 5M prince_egypt.mp4
(Screenshots: a raw frame from the Blu-ray, and the same frame after ffmpeg/nvenc transcoding.)

My setup:

  • GPU: RTX 2070 Super (8GB RAM)
  • CPU: i7-8700K (6 cores w/ HT @ 3.7GHz)
  • RAM: 32GB
  • Disk: 1TB PM981 (NVMe)