I use my RTX for heavier ML workflows, like training and high-quality inference (say, CenterNet). Additionally, my security system has edge-based (Coral TPU) cameras, one TPU per camera, that perform their own inference workloads. Thus all my compute resources are busy burning any solar power I get, and then some.
This poses a problem: it means I have no spare compute to handle any remaining workloads. For example, during regression testing of new models I need to re-run inference against old data, and neither the camera TPUs nor the RTX is available. Thankfully, one can just add extra TPUs to the main workhorse server to gain additional inference capacity. Each of these TPUs is capable of roughly 90 inferences per second on a MobileNetV2-based net, all on the same server that is running inference and training on the RTX, without impacting its performance.
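As a rough sketch of what that looks like from the host (this is not the container's script), pycoral can enumerate the attached Edge TPUs and pin an interpreter to a specific one. The model filename and the timing loop are placeholders for a quick throughput sanity check against that ~90 inferences/s figure; any Edge-TPU-compiled MobileNetV2 model will do:

```python
# Minimal sketch, assuming pycoral is installed and an Edge-TPU-compiled
# model is on disk (the filename below is a placeholder).
import time
import numpy as np
from pycoral.utils.edgetpu import list_edge_tpus, make_interpreter
from pycoral.adapters import common

# See what the host has attached, e.g. two USB Accelerators.
print(list_edge_tpus())

# Pin an interpreter to the first Edge TPU explicitly so it never competes
# with work sent to the other one.
interpreter = make_interpreter("mobilenet_v2_edgetpu.tflite", device=":0")
interpreter.allocate_tensors()

# Rough throughput check with a random frame standing in for real data.
width, height = common.input_size(interpreter)
frame = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
start = time.perf_counter()
for _ in range(300):
    common.set_input(interpreter, frame)
    interpreter.invoke()
print("inferences/s:", 300 / (time.perf_counter() - start))
```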
Each TPU operates independently. Currently, I have two TPUs in addition to the RTX. CPU utilization sits around 25% while the TPUs and GPU are pegged. A busy box is a happy box.
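Because the TPUs are independent, the simplest way to use both is one worker process per device, each with its own interpreter pinned to ":0" or ":1". A minimal sketch, with a placeholder model path and random frames standing in for real camera data:

```python
# One process per Edge TPU; no coordination needed between them.
import multiprocessing as mp
import numpy as np
from pycoral.utils.edgetpu import make_interpreter
from pycoral.adapters import common

MODEL = "mobilenet_v2_edgetpu.tflite"  # placeholder model path

def worker(device_spec: str, n_frames: int) -> None:
    # Each worker gets its own interpreter, pinned to a distinct TPU.
    interpreter = make_interpreter(MODEL, device=device_spec)
    interpreter.allocate_tensors()
    width, height = common.input_size(interpreter)
    for _ in range(n_frames):
        frame = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
        common.set_input(interpreter, frame)
        interpreter.invoke()
    print(device_spec, "done")

if __name__ == "__main__":
    procs = [mp.Process(target=worker, args=(spec, 1000)) for spec in (":0", ":1")]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```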
For reference, a good Docker container for running Coral: https://github.com/pklinker/coral-container.git – the scripts provided for objdet are good – just remember you can pass “@:0” or “@:1” to reference your first, second, or nth TPU.
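For what it's worth, the stock pycoral example scripts handle that suffix by splitting the model argument on "@" and passing the remainder to make_interpreter as the device spec; assuming the container's objdet scripts do the same, the selection boils down to something like:

```python
# Sketch of how an "@:N" suffix maps onto pycoral's device argument
# (assumption: the container's scripts follow the pycoral examples here).
from pycoral.utils.edgetpu import make_interpreter

model_arg = "ssd_mobilenet_v2_edgetpu.tflite@:1"       # placeholder model, second TPU
interpreter = make_interpreter(*model_arg.split("@"))  # equivalent to device=":1"
interpreter.allocate_tensors()
```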