It is fairly easy to check out and build dlib. Getting it to work in a performance-optimized manner, Python bindings included, takes a little more work.
Per the dlib GitHub repo, one can build the bindings by simply issuing:
python setup.py install
The first problem I found was that the setup process latched onto an old version of CUDA. That was my bad; I fixed it by updating my PATH variable to point to the newer CUDA's bin directory.
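Concretely, the fix is just making sure the newer toolkit's bin directory comes first in PATH, something like the following (assuming CUDA is installed under /usr/local/cuda-10.2; adjust the path for your install):
export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH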
The second problem was that during compilation I saw the following:
Invoking CMake build: 'cmake --build . --config Release -- -j12'
[ 1%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cusolver_dlibapi.cu.o
[ 2%] Building NVCC (Device) object dlib_build/CMakeFiles/dlib.dir/cuda/dlib_generated_cuda_dlib.cu.o
/home/carson/code/2020/facenet/dlib/dlib/cuda/cuda_dlib.cu(1762): error: calling a constexpr __host__ function("log1p") from a __device__ function("cuda_log1pexp") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
As the error message suggests, this is resolved by passing a flag through to the CUDA compiler. To do this, modify the setup.py invocation:
python setup.py install --set USE_AVX_INSTRUCTIONS=1 --set DLIB_USE_CUDA=1 --set CUDA_NVCC_FLAGS="--expt-relaxed-constexpr"
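Once that finishes, a quick sanity check is to confirm the bindings actually built with CUDA enabled. Something along these lines should work (assuming dlib installed into the active Python environment; the bindings expose a DLIB_USE_CUDA flag and a cuda submodule):
python -c "import dlib; print(dlib.DLIB_USE_CUDA); print(dlib.cuda.get_num_devices())"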
Everything went just peachy from there, except that when I attempted to use dlib from within Python I got an error (something like):
dlib 19.19.99 is missing cblas_dtrsm symbol
After that I tried importing face_recognition and got a segfault.
I fixed this by installing openblas-devel and then re-running the setup.py command as above. Magically, this fixed everything.
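For completeness, the install step was something along these lines (openblas-devel is the Fedora/RHEL package name; on Debian/Ubuntu the equivalent is libopenblas-dev), followed by the same build command as before:
sudo dnf install openblas-devel
python setup.py install --set USE_AVX_INSTRUCTIONS=1 --set DLIB_USE_CUDA=1 --set CUDA_NVCC_FLAGS="--expt-relaxed-constexpr"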
Again, not bad – dlib seems cool – just normal troubleshooting stuff.