I’m working on a small project that will eventually involve object detection and sorting using a robot arm. In terms of objects, I considered various items, from small plastic bottles to figurines, and eventually settled on dice given their size, weight and suitability for what is basically a $20 plastic robot arm. Easy, I figured: grab some training images, train a tiny-yolo model for detection and we should be good to go, right? Not really.
This is what we are trying to detect, below. We want to know the color of the dice as well as the number of pips showing, from 1 to 6. Ideally we want to detect this from video too, fast, in real time.
Training a yolo model, or really any CNN, requires a fair number of training images. In my previous post I trained a yolov3 model to detect rats, and that took 600 carefully labelled images; I’ll be the first to admit that labelling hundreds of images is not my idea of a good time. It worked well, and I even managed to retrain it on tiny-yolo to fit on a Raspberry Pi 3 and was happy with the result. The FPS rate wasn’t great, but it worked well enough. So I figured I’d give it a go first with 40-odd images of a white dice, divided into 6 classes to denote the number of pips. Several hours later I had a model which detected pretty much nothing. Zero. Maybe I messed up the parameters or whatnot, but it made me consider alternatives.
Searching around, I found a number of promising examples using OpenCV, which I tried with mixed results. OpenCV is fast, doesn’t require 60 MB+ trained models, and lets you break the problem down nicely into separate parts. So I started from scratch and assembled a fairly good model that detects not only dice of different colors but the pip count on each as well. Keep in mind I have specific requirements: dice location, rotation, color and camera-to-dice distance all fall within a specific, fixed range. So don’t expect this to work for your casino tables right off the bat.
Key to making this work, and also the most painful part, is choosing an HSV color mask to extract the dice from the background. I assume that most of the time, and certainly for what I need to do, you will have an idea what the background is going to be: flat black, a green gaming board, or whatever. You also have an idea of the distance between camera and dice.
So the first step is to figure out the HSV color mask (lower and upper bounds for hue, saturation and value), as shown below on a white dice, within your own parameter constraints. It turns out green and red are easy; white is quite a pain to get right.
You will notice in the screenshot above that I tuned the parameters using the trackbars to isolate the dice as much as possible. This won’t be 100%; there will be residual noise, but the goal is to be able to detect the pips as circles, which you count using OpenCV’s HoughCircles method. Essentially, if we detect pips within one of our three defined HSV color masks, we know which color dice we have. For bonus points you can detect the dice itself using contours too, if you wish.
Having done the above for all three dice we get some very good results as shown below:
The full Python code is included below. You’ll need to tune the HSV masks and the parameters for the HoughCircles method (notably minRadius and maxRadius) to your own requirements.