The raspberry pi’s ecosystem makes it a tempting platform for building computer vision projects such as home security systems. For around $80, one can assemble a pi (with sd card, camera, case, and power) that can capture video and suppress dead scenes using motion detection – typically with motion or MotionEye. This low-effort, low-cost solution seems attractive until one considers some of its shortfalls:
False detection events. The algorithm used to detect motion is suseptible to false positives – tree branches waving in the wind, clouds, etc. A user works around this by tweeking motion parameters (how BIG must an object be) or masking out regions (don’t look at the sky, just the road)
Lack of high level understanding. Even in tweeking the motion parameters anything that is moving is deemed of concern. There is no way to discriminate between a moving dog and a moving person.
The net result of these flaws – which all stem from a lack of real understanding – is wasted time. At a minimum the user is annoyed. Worse they are fatigue and miss events or neglect responding entirely.
By applying current state of the art AI techniques such as object detection, facial detection/recognition, one can vastly reduce the load on the user. To do this at full frame rate one needs to add an accelerator, such as the Coral TPU.
In testing we’ve found fairly good accuracy at almost full frame rate. Although Coral claims “400 fps” of speed – this is inference, not the full cycle of loading the image, running inference, and then examining the results. In real-world testing we found the full-cycle results closer to 15fps. This is still significantly better than the 2-3 fps one obtains by running in software.
In terms of scalability, running inference on the pi means we can scale endlessly. The server’s job is simply to log the video and metadata (object information, motion masks, etc.).
Here’s a rough sketch of such a system:
This approach is currently working successfully to provide the following, per rpi camera:
moving / static object detection
3d object mapping – speed / location determination
This is all done at around 75% CPU utilization on a 2GB Rpi 4B. The imagery and metadata are streamed to a central server which performs no processing other than to archive the data from the cameras and serve it to clients (connected via an app or web page).
Carson’s Web Development Maxim: If you don’t like any of the web toolkits that are available just wait a few months: there will be ten new ones waiting for you.
I like, and still use, Angular, but I wanted to try something new. Enter React: a framework from the Facebook side of the fence. What better way to learn than with a project! I decided to make an app to help my kids clean up.
The concept around the application is to track the kids progress as they pick up toys. Specifically, they are asked to pick up six toys at a time, take a picture of each toy, and put the toy away. Thankfully I had an old Nexus 7 tablet laying around that nobody was using. Also, chrome support for HTML5 video lets a web app gain access to your tablet. Looks like I am set!
After reading through the “comments” tutorial for React I whipped up the following (shown running on the Nexus 7):
The user interaction is designed as follows: each kid has a row for their “pickup tiles.” For each toy, they click a new tile and the app takes a picture. They can retake on a square as many times as they want. When they get all six squares a tantilizing reward appears: an animated gif of their choosing (yes, yes — the 90s are calling and they want their webpage back!) This is shown below:
This app was interesting from a HCI perspective. The following observations were made:
Kids had a hard time using the tablet to “click” on a tile. I think their little fingers don’t register too well on a tablet.
There was no way to preview: maybe clicking should have shown a live preview which snapped the picture after a 2-second delay.
The kids were more interested in taking pictures of toys than putting them away. This resulted in toys being piled up near the tablet, rather than being put away, as kids ran around finding new toys to take pictures of.
Pickup actually took longer than without the app. Even when the kids put things away they took a lot longer staring at the screen.
Originally I was going to back this app with a server-side component that would store all pictures. The thought being that it would be fun to see the silly things the kids picked up. In the end I decided to skip this because of my observations from some trial runs, which basically indicates that the app makes pickup worse. I’ll just stick the traditional approach of motivating kids (e.g. “Pick up your toys or I’ll give them to Chuck at work”).
From a programming perspective here are a few observations:
React is awesome. I was amazed at Angular’s novel approach that effectively extended html to allow you to embed display logic into your markup. My mind was blown again with JSX, which essentially goes the opposite route, letting you embed markup naturally inside your code. I liked the way you could easily form components and interact with their data models.
Also in terms of html 5 video in Chrome: this worked perfectly. Only annoyance is that you must use https to get at the camera, which doesn’t seem necessary on a local net. Also the aspect ratio on my Nexus 7 is weird when in portrait view.
There is a ton of wasted space on chrome on the tablet. I couldn’t find any currently-supported method for making a web app full screen. There had been support in the past, but when i looked in chrome’s settings it appears the options are no longer available. Wouldn’t it be nice to see your app in full screen, especially on such a space-constrained device?