R Web API from Dynamics 365 FinOps

Microsoft gives us a fair number of options to seamlessly connect machine learning models to our production code, and I honestly love using them all. AzureML is fantastic for many use cases, and with the Data Factory, Databricks and Data Lakes combo virtually every possible scenario can be covered really nicely.

Except of course if the model you need to use is hosted by a 3rd party which does not support any of these services. Then again, you might want to quickly test a few models first in a POC context before committing to “productizing” these into AzureML. Perhaps you just don’t want all your eggs in one vendor basket, or all your raindrops in one single cloud.

Worse, you might have a requirement to call an R API from D365 FinOps. In this blog post I’ll show you how.

First things first, let’s build a simple R model using the Prophet library from Facebook to do forecasting. Prophet expects a data frame with two columns, ds and y, where ds holds the dates and y holds the time-series values. Prophet supports a lot of parameters for seasonality and the like, and I suggest reading up on it.

For our example I’ll keep things simple and assume the R script won’t be doing any munging or wrangling as such. A clean data frame goes in and Prophet predicts, but instead of returning the y-hat values (Ŷ) we’ll make it interesting and return a set of base64 encoded PNG plots containing the forecast and seasonality trends.

So there are a number of challenges for us:

  • Host the R model as an API
  • Grab the resulting forecast plots created by Prophet
  • Encode the plots to base64 and return them from the API as JSON
  • Call the API and display the plots on a D365 form

The best way I’ve found to host R as an API is to use the Plumber library. So I’ve deployed a Linux environment in my cloud of choice and installed all the required R libraries, including Plumber, and set up NGINX to route incoming traffic on port 80 to Plumber which listens on port 8000. To call this API from D365 you’ll need to install a certificate as only HTTPS will do between D365 and our Linux box.

The R code is shown below, detailing how we grab the plots and encode them to base64. We also receive our data frame as part of the call, so we need to URL-decode it. This will do for small data sets; if you want to tackle a large data set, use a different mechanism, perhaps a POST call with the data in the body as JSON, or pass a reference to the data instead. In our case the API returns JSON containing three base64 encoded plots.

library(prophet)
library(dplyr)
library(ggplot2)
library(png)
library(plumber)
library(urltools)

# Render a plot to a temporary PNG file and return its contents as a base64 string
encodeGraphic <- function(g) {
  png(tf1 <- tempfile(fileext = ".png"))
  print(g)
  dev.off()
  encoded <- RCurl::base64Encode(readBin(tf1, "raw", file.info(tf1)[1, "size"]), "txt")
  return(encoded)
}

#* Do a forecast
#* @param data a CSV containing ordered, orderdate
#* @get /forecast
function(data="")
{
  json = '{"forecast":"'
  tmp<-URLdecode(data)
  stats <- read.csv(text=tmp, header=TRUE, sep=',',colClasses = c('numeric','Date'))
  names(stats) <- c("y","ds")
  stats$ds <- as.Date(stats$ds) # coerce to ensure date type

  m <- prophet(stats, yearly.seasonality=TRUE)
  future <- make_future_dataframe(m, periods = 4, freq="m")
  forecast <- predict(m, future)

  g<-plot(m, forecast) +
    xlab("Date") +
    ylab("Data") +
    theme_grey() +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          axis.line = element_line(colour = "black")) +
    ggtitle("Sales Forecast");

  # encode each plot and assemble the JSON response by hand
  encodedForecast<-encodeGraphic(g)
  json <- paste(json, encodedForecast,sep='')
  g<-prophet_plot_components(m, forecast)
  json <- paste(json, '","trend":"', sep='')
  encodedTrend <- encodeGraphic(g[1])
  json<-paste(json, encodedTrend,sep='')
  json<-paste(json,'","yearly":"', sep='')
  encodedYearly <- encodeGraphic(g[2])
  json<-paste(json, encodedYearly,sep='')
  json<-paste(json, '"}', sep='')
  return(json)
}
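
Before wiring this into D365, it’s worth sanity-checking the endpoint from outside the browser. Below is a minimal smoke test in Python, a sketch that assumes the same host name used later in this post and a tiny hand-made two-column CSV; it lets requests URL-encode the data, calls /forecast and writes the three returned plots to disk.

import base64
import json

import requests

# Tiny two-column sample in the ordered,orderdate shape the endpoint expects
csv_data = "ordered,orderdate\n120,2015-01-05 00:00:00\n95,2015-02-02 00:00:00\n"
url = "https://yourboxhere.australiaeast.cloudapp.azure.com/forecast"

# requests URL-encodes the query string for us, matching the URLdecode() call in the R code
resp = requests.get(url, params={"data": csv_data}, timeout=120)
resp.raise_for_status()

# The hand-built JSON string may come back boxed by the Plumber serializer;
# unwrap until we are left with a plain dictionary of plots.
payload = json.loads(resp.text)
if isinstance(payload, list):
    payload = payload[0]
if isinstance(payload, str):
    payload = json.loads(payload)

for name in ("forecast", "trend", "yearly"):
    with open(name + ".png", "wb") as f:
        f.write(base64.b64decode(payload[name]))

If the three PNG files open correctly, the Plumber and NGINX plumbing is in place and we can move on to the D365 side.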

 

Next up we’ll create an extensible control in D365 to host our plots. I like wrapping things in extensible controls as it gives me the ability to obfuscate the JavaScript to protect any commercial IP. So I try to keep as little as possible in X++ and as much as possible in JavaScript.

Here is the code for our BuildControl; just a single CSV property is defined:

[FormDesignControlAttribute("Forecast")]
class ForecastControlBuild extends FormBuildControl
{
    str csv = "";

    [FormDesignPropertyAttribute("CSV","Forecast")]
    public str parmCSV(str _csv=csv)
    {
        if (prmIsDefault(_csv))
        {
            csv = _csv;
        }
        return csv;
    }
}

 

Next is the code for our Control class, which contains the CSV property we will populate from our X++ form.

[FormControlAttribute('Forecast','',classstr(ForecastControlBuild))]
class ForecastControl extends FormTemplateControl
{
    FormProperty csv;

    public void new(FormBuildControl _build, FormRun _formRun)
    {
        super(_build, _formRun);
        this.setTemplateId('Forecast');
        this.setResourceBundleName('/resources/html/Forecast');
        csv = properties.addProperty(methodStr(ForecastControl, parmCSV), Types::String);
    }

    [FormPropertyAttribute(FormPropertyKind::Value, "CSV")]
    public str parmCSV(str _value = csv.parmValue())
    {
        if (!prmIsDefault(_value))
        {
            csv.setValueOrBinding(_value);
        }
        return csv.parmValue();
    }

    public void applyBuild()
    {
        super();
        ForecastControlBuild build = this.build();

        if (build)
        {
            this.parmCSV(build.parmCSV());
        }
    }
}

 

We’ll add a minimal control HTML file to host our image placeholders: three simple DIVs, each containing an image element with its ID set to forecastImage, trendImage and yearlyImage respectively, so we can get hold of them from our JavaScript code.

Finally, here is the JavaScript for our control, containing the actual Ajax call to our R API.

(function () {
    'use strict';
    $dyn.controls.Forecast = function (data, element) {
        $dyn.ui.Control.apply(this, arguments);
        $dyn.ui.applyDefaults(this, data, $dyn.ui.defaults.Forecast);
    };

    $dyn.controls.Forecast.prototype = $dyn.ui.extendPrototype($dyn.ui.Control.prototype, {
        init: function (data, element) {
            var self = this;
            $dyn.ui.Control.prototype.init.apply(this, arguments);
            $dyn.observe(data.CSV, function (csv)
            {
                document.getElementById('forecastImage').style.display = "none";
                document.getElementById('trendImage').style.display = "none";
                document.getElementById('yearlyImage').style.display = "none";
                if (csv.length>0)
                {
                    var url = 'https://yourboxhere.australiaeast.cloudapp.azure.com/forecast?data=' + csv;
                    $.ajax({
                        crossOrigin: true,
                        url: url,
                        success: function (data) {
                            var obj = JSON.parse(data);
                            var forecast = obj.forecast;
                            var trend = obj.trend;
                            var yearly = obj.yearly;

                            document.getElementById('forecastImage').src = 'data:image/png;base64,' + forecast;
                            document.getElementById('trendImage').src = 'data:image/png;base64,' + trend;
                            document.getElementById('yearlyImage').src = 'data:image/png;base64,' + yearly;
                            document.getElementById('forecastImage').style.display = "block";
                            document.getElementById('trendImage').style.display = "block";
                            document.getElementById('yearlyImage').style.display = "block";
                        }
                    });
                }
            })
        }
    });
})();

 

So far it’s all fairly simple, and we can add a demo form in X++ to use our extensible control. We’ll grab some sales orders from D365, URL-encode them manually, and then send them off to our extensible control, which passes them to our R API sitting somewhere outside the D365 cloud.

class ForecastFormClass
{
    /// <summary>
    /// Handles the command button click: builds a URL-encoded CSV of ordered quantities
    /// by requested ship date and passes it to the extensible control.
    /// </summary>
    /// <param name = "sender">The control that fired the event.</param>
    /// <param name = "e">The event arguments.</param>
    [FormControlEventHandler(formControlStr(ForecastForm, FormCommandButtonControl1), FormControlEventType::Clicked)]
    public static void FormCommandButtonControl1_OnClicked(FormControl sender, FormControlEventArgs e)
    {
        FormCommandButtonControl callerButton = sender as FormCommandButtonControl; 
        FormRun form = callerButton.formRun();
        ForecastControl forecastControl;
        forecastControl = form.control(form.controlId("ForecastControl1"));

        SalesLine   SalesLine;
        date        varStartPeriodDT    = mkdate(1, 1, 2015);
        date        varEndPeriodDT      = mkDate(1,7,2016);
        str         csv                 = "ordered%2Corderdate%0D%0A";

        while select sum(QtyOrdered), ShippingDateRequested  from SalesLine group by ShippingDateRequested
            where SalesLine.ShippingDateRequested >= varStartPeriodDT && SalesLine.ShippingDateRequested <= varEndPeriodDT &&  SalesLine.ItemId == 'T0001'
        {
            csv = csv + int2str(SalesLine.QtyOrdered) + "%2C" + date2str(SalesLine.ShippingDateRequested,321,2,3,2,3,4) + "+00%3A00%3A00%0D%0A";
        }
        forecastControl.parmCSV(csv);
    }
}

 

A second or two later and we receive our plots.

AXForecast

Pretty simple stuff. We can extend this further by passing various parameters to the R API, for example, which time-series model we would like to use, whether to return the predicted values (Ŷ) or not, seasonality parameters and anything else we need.
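
For example, a call that also chooses the forecast horizon and asks for the predicted values back might look like the Python sketch below. The periods and return_yhat parameters are hypothetical; the Plumber function above would need matching arguments before they do anything.

import requests

csv_data = "ordered,orderdate\n120,2015-01-05 00:00:00\n95,2015-02-02 00:00:00\n"
url = "https://yourboxhere.australiaeast.cloudapp.azure.com/forecast"

# periods and return_yhat are illustrative only and must be added to the R function signature
response = requests.get(url, params={"data": csv_data, "periods": 12, "return_yhat": "true"}, timeout=120)
print(response.text)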

Visualize 3D Models in D365 FinOps

In this short blog post I’m going to show you how to build a 3D extensible control using the Extensible Control Framework in Dynamics 365 FinOps (AX). This can come in handy for ISVs working in the manufacturing or additive manufacturing (3D printing) space.

Being able to fully visualize and interact with 3D models of parts within D365 brings us one step closer to having a full end-to-end ERP > 3D printing interface, which is a side project I am working on.

Extensible controls allow us to build self-contained visual controls that we can share and that other developers can simply drop onto a form. There are basically three main parts: the HTML, optional JavaScript, and the X++ classes for the control itself, which let us communicate between the web-browser front end and the X++ back end.

For this post I’ll focus on the STL file format, arguably the most popular of the 3D formats, and widely used by 3D printers. We’ll add some basic properties to the control, including the URL of the STL file we want to visualize, object color and control height and width. This can be extended further of course, but we’ll keep things simple for a start.

We’ll start with the X++ side, which consists of the Control and BuildControl classes. The BuildControl class is where we define our control’s public properties, which, once the control is dropped on a form, can be set by the X++ developer and maintained at runtime. The source for our build class is shown below.

/// <summary>
/// Build Control for 3D STL Viewer
/// </summary>
[FormDesignControlAttribute("XalSTL")]
class XalSTLControlBuild extends FormBuildControl
{
    str url = "";
    int innerHeight = 540;
    int innerWidth = 1024;
    int objectColor = 925765; //dark blue
    int objectShininess = 200;

}

[FormDesignPropertyAttribute("URL","XalSTL")]
public str parmURL(str _url=url)
{
    if (prmIsDefault(_url))
    {
        url = _url;
    }
    return url;
}

[FormDesignPropertyAttribute("InnerHeight","XalSTL")]
public int parmInnerHeight(int _innerHeight=innerHeight)
{
    if (prmIsDefault(_innerHeight))
    {
        innerHeight = _innerHeight;
    }
    return innerHeight;
}

[FormDesignPropertyAttribute("InnerWidth","XalSTL")]
public int parmInnerWidth(int _innerWidth=innerWidth)
{
    if (prmIsDefault(_innerWidth))
    {
        innerWidth = _innerWidth;
    }
    return innerWidth;
}

[FormDesignPropertyAttribute("ObjectColor","XalSTL")]
public int parmObjectColor(int _objectColor=objectColor)
{
    if (prmIsDefault(_objectColor))
    {
        objectColor = _objectColor;
    }
    return objectColor;
}

[FormDesignPropertyAttribute("ObjectShininess","XalSTL")]
public int parmObjectShininess(int _objectShininess=objectShininess)
{
    if (prmIsDefault(_objectShininess))
    {
        objectShininess = _objectShininess;
    }
    return objectShininess;
}

 

Next up is our main control class, with source below. Not much happening here, just basic framework stuff.

/// <summary>
/// Defines a 3D STL Viewer Control
/// </summary>
[FormControlAttribute('XalSTL','',classstr(XalSTLControlBuild))]
class XalSTLControl extends FormTemplateControl
{
    FormProperty url;
    FormProperty innerHeight;
    FormProperty innerWidth;
    FormProperty objectColor;
    FormProperty objectShininess;

}

protected void new(FormBuildControl _build, FormRun _formRun)
{
    super(_build, _formRun);
 
    this.setTemplateId('XalSTL');
    this.setResourceBundleName('/resources/html/XalSTL');

    url = properties.addProperty(methodStr(XalSTLControl, parmURL), Types::String);
    innerHeight = properties.addProperty(methodStr(XalSTLControl, parmInnerHeight), Types::Integer);
    innerWidth = properties.addProperty(methodStr(XalSTLControl, parmInnerWidth), Types::Integer);
    objectColor = properties.addProperty(methodStr(XalSTLControl, parmObjectColor), Types::Integer);
    objectShininess = properties.addProperty(methodStr(XalSTLControl, parmObjectShininess), Types::Integer);
}

[FormPropertyAttribute(FormPropertyKind::Value, "URL")]
public str parmURL(str _value = url.parmValue())
{
    if (!prmIsDefault(_value))
    {
        url.setValueOrBinding(_value);
    }
    return url.parmValue();
}

[FormPropertyAttribute(FormPropertyKind::Value, "InnerHeight")]
public int parmInnerHeight(int _value = innerHeight.parmValue())
{
    if (!prmIsDefault(_value))
    {
        innerHeight.setValueOrBinding(_value);
    }
    return innerHeight.parmValue();
}

[FormPropertyAttribute(FormPropertyKind::Value, "InnerWidth")]
public int parmInnerWidth(int _value = innerWidth.parmValue())
{
    if (!prmIsDefault(_value))
    {
        innerWidth.setValueOrBinding(_value);
    }
    return innerWidth.parmValue();
}

[FormPropertyAttribute(FormPropertyKind::Value, "ObjectColor")]
public int parmObjectColor(int _value = objectColor.parmValue())
{
    if (!prmIsDefault(_value))
    {
        objectColor.setValueOrBinding(_value);
    }
    return objectColor.parmValue();
}

[FormPropertyAttribute(FormPropertyKind::Value, "ObjectShininess")]
public int parmObjectShininess(int _value = objectShininess.parmValue())
{
    if (!prmIsDefault(_value))
    {
        objectShininess.setValueOrBinding(_value);
    }
    return objectShininess.parmValue();
}

public void applyBuild()
{
    super();
 
    XalSTLControlBuild build = this.build();
 
    if (build)
    {
        this.parmURL(build.parmURL());
        this.parmInnerHeight(build.parmInnerHeight());
        this.parmInnerWidth(build.parmInnerWidth());
    }
}

 

We also add the control HTML which contains little more than a DIV which we will use as a canvas for our 3D viewer. I reference four additional files containing a modified version of the THREEJS library, which I’ll share upon request.

<meta name="viewport" content="width=1024, user-scalable=no, initial-scale=0.5, minimum-scale=0.2, maximum-scale=0.5">
<script src="/resources/scripts/three.js"></script>
<script src="/resources/scripts/STLLoader.js"></script>
<script src="/resources/scripts/Detector.js"></script>
<script src="/resources/scripts/OrbitControls.js"></script>
<div id="XalSTL" style="max-height:400px;" data-dyn-bind="visible: $data.Visible">
</div>
<script src="/resources/scripts/XalSTL.js"></script>

 

Finally, our control JavaScript contains the nuts and bolts that tie all of this together and make it work, fast and efficiently, in D365. You’ll notice that our control has a URL parameter, which allows us to store our (large) 3D models in Azure Blob Storage, or in the dedicated storage account available within D365, and point the control at them via X++ code.

(function () {
    'use strict';
    $dyn.controls.XalSTL = function (data, element) {
        $dyn.ui.Control.apply(this, arguments);
        $dyn.ui.applyDefaults(this, data, $dyn.ui.defaults.XalSTL);
    };
 
    $dyn.controls.XalSTL.prototype = $dyn.ui.extendPrototype($dyn.ui.Control.prototype, {
        init: function (data, element) {
            var self = this;

            var _url = "";
            var _innerHeight = 540;
            var _innerWidth = 1024;
            var _objectColor = 0x0e2045;
            var _objectShininess = 200;

            $dyn.ui.Control.prototype.init.apply(this, arguments);
 
            if (!Detector.webgl) Detector.addGetWebGLMessage();
            var camera, scene, renderer;
            scene = new THREE.Scene();
            scene.add(new THREE.AmbientLight(0x999999));
            camera = new THREE.PerspectiveCamera(35, _innerWidth / _innerHeight, 1, 500);
            camera.up.set(0, 0, 1);
            camera.position.set(0, -9, 6);
            camera.add(new THREE.PointLight(0xffffff, 0.8));
            scene.add(camera);
            var grid = new THREE.GridHelper(25, 50, 0xffffff, 0x555555);
            grid.rotateOnAxis(new THREE.Vector3(1, 0, 0), 90 * (Math.PI / 180));
            scene.add(grid);
            renderer = new THREE.WebGLRenderer({ antialias: true });
            renderer.setClearColor(0x999999);
            renderer.setPixelRatio(window.devicePixelRatio);
            renderer.setSize(_innerWidth, _innerHeight);
            $(".XalSTL").context.activeElement.appendChild(renderer.domElement)

            $dyn.observe(data.URL, function (url) {
                if (url.toString().length > 0) {
                    _url = url;
                    RefreshModel();
                }
            });

            $dyn.observe(data.InnerHeight, function (innerHeight) {
                _innerHeight = innerHeight;
                RefreshModel();
            });

            $dyn.observe(data.InnerWidth, function (innerWidth) {
                _innerWidth = innerWidth;
                RefreshModel();
            });

            $dyn.observe(data.ObjectColor, function (objectColor) {
                _objectColor = objectColor;
                RefreshModel();
            });

            $dyn.observe(data.ObjectShininess, function (objectShininess) {
                _objectShininess = objectShininess;
                RefreshModel();
            });

            function RefreshModel()
            {
                if (_url.toString().length > 0) {
                    var loader = new THREE.STLLoader();
                    var material = new THREE.MeshPhongMaterial({ color: _objectColor, specular: 0x111111, shininess: _objectShininess });
                    var controls = new THREE.OrbitControls(camera, renderer.domElement);
                    loader.load(_url, function (geometry) {
                        var mesh = new THREE.Mesh(geometry, material);
                        mesh.position.set(0, 0, 0);
                        mesh.rotation.set(0, 0, 0);
                        mesh.scale.set(.02, .02, .02);
                        mesh.castShadow = true;
                        mesh.receiveShadow = true;
                        scene.add(mesh);
                        render();
                        controls.addEventListener('change', render);
                        controls.target.set(0, 1.2, 2);
                        controls.update();
                        window.addEventListener('resize', onWindowResize, false);
                    });
                }
            }

            function onWindowResize() {
                camera.aspect = _innerWidth / _innerHeight;
                camera.updateProjectionMatrix();
                renderer.setSize(_innerWidth, _innerHeight);
                render();
            }

            function render() {
                renderer.render(scene, camera);
            }
        }
    });
})();

 

We can construct a basic demo dialog form as shown below, hit the Reload button and wait for the magic. Using the mouse, we can zoom in and out, and rotate the object in 3D.

XalSTL

Adding a Fabricate button can trigger an event to kick-start our 3D printing process. To tie this all together we can use the rest of the services in D365 for a proper end to end manufacturing pipeline containing CRM, BOM, Invoicing, Projects and everything else normally involved in manufacturing, without leaving the D365 UI.

For a demo video showing this control in action, click here

Gesture Controls for Robotics

In my previous post I built a dice detection library via OpenCV, the idea being that using a small camera I can detect the dice and maneuver a robotic arm to pick it up and move it around, sorting it by color. Well it turns out that was way too easy and a bit lame to take up a whole blog post. Suffice it to say it works unbelievably well.

Instead, I figured maybe I can train a model to recognize hand gestures and have the robotic arm respond to commands made via these gestures. Turns out that is fairly easy too but let’s do it anyway.

Hand gesture recognition is really, really hard. I started off with Haar cascades I found on the web, and some, like palm and fist, worked really well. However, I needed at least four gestures, and finding cascades for the remaining two turned out to be harder than expected. There are plenty of posts with photos showing it working, but for some reason recognizing an “okay” or “victory” sign just failed for me.

Instead I pulled out the trusty multi-label Keras model I used previously for X-ray detection. Using a few dozen video clips with their frames split out into folders, I managed to put together around 2,000 training images, 500 for each gesture I want to respond to, split into four folders, one per class. These are shown below.

Gestures

We have flat palm for forward motion, flipped backhand for backward motion of the robotic arm, and then one each to open and close the claw for grabbing.

The Keras model training code in Python is shown below; it’s a very simple model.

import numpy as np
import keras
from keras.preprocessing.image import img_to_array
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.callbacks import ModelCheckpoint
from imutils import paths
import random
import pickle
import cv2 as cv2
import os

# set our parameters
BatchSize = 32
Classes = 4
Epochs = 15 
InputShape = (64, 64, 3)
data = []
labels = []

print("Loading training data...")
imagePaths = sorted(list(paths.list_images('/home/gideon/Pictures/Hands')))
random.seed(42)
random.shuffle(imagePaths)

# loop over the input images
for imagePath in imagePaths:
    image = cv2.imread(imagePath)
    image = cv2.resize(image,(64,64)) # larger than 64x64 results in a model too big for RPi, this yields 86MB
    image = img_to_array(image)
    data.append(image)
    # augment the data here if required
    # rotate or swap on hor & ver axis

    # train images are spread across four folders based on their classes
    label = imagePath.split(os.path.sep)[-2].split("_")[0]
    labels.append(label)

data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
mlb = LabelBinarizer()
labels = mlb.fit_transform(labels)

# partition the data into training and test splits (80/20)
(x_train, x_test, y_train, y_test) = train_test_split(data, labels, test_size=0.20, random_state=42)

# construct our model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=InputShape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(Classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])

# train
model.fit(x_train, y_train, batch_size=BatchSize, epochs=Epochs, verbose=1, validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# save final model
model.save('hands.model')
f = open('hands.pickle', "wb")
f.write(pickle.dumps(mlb))
f.close()

print("Done...")

 

Fifteen epochs later we had really good results for accuracy and loss, as shown here. The model is not going to give us bounding boxes, only a detected class, but that is good enough. If you want bounding boxes, use YOLO3 instead.

Training

Assembling the robotic arm is much less enjoyable. For $20 you get a box of screws, some acrylic parts and instructions not even an IKEA engineer could make sense of. If you do buy one of these, make sure to center your servos prior to assembly. You do NOT want to disassemble this thing and start again, trust me. A sense of humor and patience are truly virtues in this department.

If you are thinking of buying a robotic arm I highly recommend spending more and getting one that is aluminum, with 6 degrees of freedom, a decent claw, and preferably already assembled. Make sure the servos are high-quality with good torque too.

The servos run off 5 volts and ideally need 1,500 to 2,000 milliamps from a separate power supply. Connecting the data pins directly to your Raspberry Pi is not advised, so I built a small circuit to protect the Pi from any malfunctioning servo, using four 100K resistors as shown below. You could use one of the more expensive servo driver boards available as well; I opted to just make my own.

Circuit

The final assembly with Pi and circuit board is shown below mounted on a heavy board. The arm moves really fast and makes a lot of noise, so make sure you add weight to the floor portion to keep things steady when it’s in motion.

Arm

Using the fantastic servoblaster library I wrote a couple of functions to control the arm movements, and then connected it all together with the trained model and image detection code.

My model works off 64×64 input images which keeps the final model under 90MB. Bigger than that and it won’t run on the Pi. If you want to use Yolo3 instead, tiny-yolo is the way to go for sure.

from keras.preprocessing.image import img_to_array
from keras.models import load_model
import numpy as np
import cv2 as cv2
import pickle
import time
import os

model = load_model("hands.model")
mlb = pickle.loads(open("hands.pickle", "rb").read())

state_open = False
state_forward = False
state_changed=False

#
# robotic arm movement functions
#
def ClawOpen():
  os.system("echo 3=2000us > /dev/servoblaster")
  time.sleep(1.5)

def ClawClose():
  os.system("echo 3=500us > /dev/servoblaster")
  time.sleep(1.5)

def ArmForward():
  os.system("echo 4=2000us > /dev/servoblaster")
  time.sleep(1.5)

def ArmBack():
  os.system("echo 4=1100us > /dev/servoblaster")
  time.sleep(1.5)

def ArmMiddle():
  os.system("echo 4=1400us > /dev/servoblaster")
  time.sleep(1.5)

def ArmUp():
  os.system("echo 0=2000us > /dev/servoblaster")
  time.sleep(1.5)

def ArmDown():
  os.system("echo 0=300us > /dev/servoblaster")
  time.sleep(1.5)

def BaseMiddle():
  os.system("echo 1=1300us > /dev/servoblaster")
  time.sleep(1.5)

def BaseLeft():
  os.system("echo 1=2500us > /dev/servoblaster")
  time.sleep(1.5)

def BaseLeftHalf():
  os.system("echo 1=1900us > /dev/servoblaster")
  time.sleep(1.5)

def BaseRight():
  os.system("echo 1=500us > /dev/servoblaster")
  time.sleep(1.5)

def BaseRightHalf():
  os.system("echo 1=900us > /dev/servoblaster")
  time.sleep(1.5)

# Init arm to default starting position and start video capture
ClawClose()
BaseMiddle()
ArmBack()
video_capture =  cv2.VideoCapture(0)

while True:
    ret, frame = video_capture.read()
    if ret == True:
        image = cv2.resize(frame, (64, 64))
        image = img_to_array(image)
        image = image.astype("float") / 255.0
        image = np.expand_dims(image, axis=0)
        proba = model.predict(image)[0]
        idxs = np.argsort(proba)[::-1][:2]

        for (i, j) in enumerate(idxs):
            if ((proba[j] * 100) > 90.00): # 90% or higher certainty before we react
                detected = mlb.classes_[j]
                if (detected == "close"):
                    if (state_open==True):
                        state_open=False
                        state_changed=True
                if (detected == "open"):
                    if (state_open==False):
                        state_open=True
                        state_changed=True
                if (detected =="forward"):
                    if (state_forward==False):
                        state_forward=True
                        state_changed=True
                if (detected =="back"):
                    if (state_forward==True):
                        state_forward=False
                        state_changed=True
            break # only care about the top prediction

        state=""
        if (state_forward==True):
            state=state+"F"
            if (state_changed==True):
                ArmForward()
        if (state_forward==False):
            state=state+"B"
            if (state_changed==True):
                ArmBack()
        if (state_open==True):
            state=state+"O"
            if(state_changed==True):
                ClawOpen()
        if (state_open==False):
            state=state+"C"
            if (state_changed==True):
                ClawClose()
        state_changed=False

        # display current state on lcd as a sanity check
        cv2.putText(frame, state, (10, (i * 30) + 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

        cv2.imshow("Output", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

video_capture.release()
cv2.destroyAllWindows()

 

So the idea is that using gestures you should be able to perform basic movements and grab hold of something, like a small plastic bottle. Keep in mind this is a $20 robotic arm so don’t expect it to lift anything heavier than a dry teabag (actually part of their online demo, which now makes perfect sense).

To see the system in action I’ve uploaded a 6 second clip over here: YouTube

It’s very basic, but considering the simplicity of the model and the low cost of the parts, this could make a nice attraction at a trade show, if nothing else.

Dice Detection using OpenCV

I’m working on a small project that will eventually involve object detection and sorting using a robot arm. In terms of objects I considered various items, from small plastic bottles to figurines, and eventually settled on dice given their size, weight and suitability for what is basically a $20 plastic robot arm. Easy, I figured: grab some training images, train a tiny-yolo model for detection and we should be good to go, right? Not really.

This is what we are trying to detect, below. We want to know the color of the dice, as well as the number of pips showing, from 1 to 6. Ideally we want to detect this from video too, fast and in real time.

Dice

Training a YOLO model, or really any CNN, requires a fair number of training images. In my previous post I trained a YOLOv3 model to detect rats; that took 600 carefully labelled images, and I’ll be the first to admit that labeling hundreds of images is not my idea of a good time. It worked well, and I even managed to retrain it on tiny-yolo to fit on a Raspberry Pi 3 and was happy with the result. The FPS rate wasn’t great but it worked well enough. So I figured I’d give it a go first with 40-odd images of a white dice, divided into 6 classes to denote the number of pips. Several hours later I had a model which detected pretty much nothing. Zero. Maybe I messed up the parameters or whatnot, but it made me consider alternatives.

Searching around I found a number of promising examples using OpenCV, which I tried with mixed results. OpenCV is fast, doesn’t require 60MB-plus trained models, and lets you break the problem down nicely into separate parts. So I started from scratch and assembled a fairly good detector that picks up not only the different dice colors but the pip count on each as well. Keep in mind I have specific requirements, with dice location, rotation, color and the distance from dice to camera all within a specific, fixed range. So don’t expect this to work for your casino tables right off the bat.

Key to making this work, and also the most painful part, is choosing an HSV color mask to extract the dice from the background. Now I assume that most of the time, and certainly for what I need to do, you will have an idea of what the background is going to be: flat black, a green gaming board or whatever. You will also have an idea of the distance between the camera and the dice.

So the first step is to figure out the HSV color mask (lower and upper bounds for each color), as shown below on a white dice, within your own parameter constraints. It turns out green and red are easy; white is quite a pain to get right.

HSVConfig

You will notice in the screenshot above that I tuned the parameters using the trackbars to isolate the dice as much as possible. This won’t be 100%; there will be residual noise, but you want to be able to detect the pips as circles, which you count using OpenCV’s HoughCircles method. Basically, we know which color dice we have if we detect pips within one of our three defined HSV color masks. For bonus points you can detect the dice outline using contours too if you wish.
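
The trackbar tool itself isn’t included in this post, so here is a minimal sketch of that kind of HSV tuner, assuming a camera on index 0. Drag the six sliders until only the dice (or its pips) remains white in the mask window, then copy the values into your lower and upper bounds.

import cv2

def nothing(_):
    pass

cv2.namedWindow("mask")
# one trackbar per HSV bound (OpenCV uses H: 0-179, S and V: 0-255)
for name, maximum in (("H lo", 179), ("S lo", 255), ("V lo", 255),
                      ("H hi", 179), ("S hi", 255), ("V hi", 255)):
    cv2.createTrackbar(name, "mask", 0 if "lo" in name else maximum, maximum, nothing)

cap = cv2.VideoCapture(0)  # adjust the index for an external USB camera
while True:
    ret, frame = cap.read()
    if not ret:
        continue
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower = tuple(cv2.getTrackbarPos(n, "mask") for n in ("H lo", "S lo", "V lo"))
    upper = tuple(cv2.getTrackbarPos(n, "mask") for n in ("H hi", "S hi", "V hi"))
    cv2.imshow("mask", cv2.inRange(hsv, lower, upper))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()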

Having done the above for all three dice we get some very good results as shown below:

Red Dice:

Red3

Green Dice:

Green6

White Dice:

White3

The full Python code is included below. You’ll need to tune the HSV masks and the parameters for the HoughCircles method (minRadius, maxRadius) depending on your own requirements.

from imutils.video import VideoStream
import numpy as np
import cv2 as cv2
import imutils

# dice color in HSV
# measure these while on a typical expected background
greenLower = (43, 83, 103)
greenUpper = (99, 115, 182)
redLower = (137,26,149)
redUpper = (202,59,208)
whiteLower = (0,0,0) 
whiteUpper = (191, 160, 150) 

font = cv2.FONT_HERSHEY_SIMPLEX
topLeftCornerOfText = (10,30)
fontScale = 1
fontColor = (0,0,0)
lineType = 2

vs = VideoStream(src=1).start() #1=external USB cam

while True:
	frame = vs.read()
	if frame is None:
		continue

	frame = imutils.resize(frame, width=600)
	blurred = cv2.GaussianBlur(frame, (11, 11), 0)
	hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
	
	#try red?
	mask = cv2.inRange(hsv, redLower, redUpper)
	mask = cv2.bitwise_not(mask) # invert
	circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, 1, 20, param1=30, param2=15, minRadius=6, maxRadius=30)

	if circles is not None:
		circles = np.round(circles[0, :]).astype("int")
		if ((len(circles) > 0) and (len(circles) <=6)): # no point guessing
			cv2.putText(mask,"RED: " + str(len(circles)), topLeftCornerOfText, font, fontScale,fontColor,lineType)
	else:
		# try green?
		mask = cv2.inRange(hsv, greenLower, greenUpper)
		mask = cv2.bitwise_not(mask) # invert
		circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, 1, 20, param1=30, param2=15, minRadius=6, maxRadius=30)
		if circles is not None:
			output = mask.copy()
			circles = np.round(circles[0, :]).astype("int")
			if ((len(circles) > 0) and (len(circles) <=6)):
				cv2.putText(mask,"GREEN: " + str(len(circles)), topLeftCornerOfText, font, fontScale, fontColor, lineType)
		else:
			# try white
			mask = cv2.inRange(hsv, whiteLower, whiteUpper)
			mask = cv2.bitwise_not(mask) # for white, depending on background color, remark this out
			circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, 1, 20, param1=30, param2=15, minRadius=6, maxRadius=30)
			if circles is not None:
				output = mask.copy()
				circles = np.round(circles[0, :]).astype("int")
				if ((len(circles) > 0) and (len(circles) <=6)):
					cv2.putText(mask,"WHITE: " + str(len(circles)), topLeftCornerOfText, font, fontScale,fontColor,lineType)

	cv2.imshow("Preview", mask)
	key = cv2.waitKey(1) & 0xFF
	if key == ord("q"):
		break

vs.stop()  # imutils VideoStream is stopped with stop(), not release()
cv2.destroyAllWindows()

 

From AI Model to Production in Azure

Problem Description (courtesy of DataDriven.com):

When a patient has a CT scan taken, a special device uses X-rays to take measurements from a variety of angles which are then computationally reconstructed into a 3D matrix of intensity values. Each layer of the matrix shows one very thin “slice” of the patient’s body.

This data is saved in an industry-standard format known as DICOM, which saves the image matrix in a set binary format and then wraps this data with a huge variety of metadata tags.

Some of these fields (e.g. hardware manufacturer, device serial number, voltage) are usually correct because they are automatically read from hardware and software settings.

The problem is that many important fields must be added manually by the technician and are therefore subject to human error factors like confusion, fatigue, loss of situational awareness, and simple typos.

A doctor scrutinising image data will usually be able to detect incorrect metadata, but in an era when more and more diagnoses are being carried out by computers it is becoming increasingly important that patient record data is as accurate as possible.

This is where Artificial Intelligence comes in. We want to improve the error checking for one single but incredibly important value: a field known as Image Orientation (Patient) which indicates the 3D orientation of the patient’s body in the image.

For this challenge we’re given 20,000 CT scan images, sized 64×64 pixels and labelled correctly for training. The basic premise is given an image, the AI model needs to predict the correct orientation as explained graphically below. The red arrow shows the location of the spine, which our AI model needs to find to figure out the image orientation.

Capstone

We’ll use Tensorflow and Keras to build and train an AI model in Python and validate against another 20,000 unlabelled images. The pipeline I used had three parts to it, but the core is shown in Python below and achieved 99.98% accuracy on the validation set. The second and third parts (not shown) pushed this to 100%, landing me a #6 ranking on the leader board. A preview of the 20,000 sample training images is shown below.

Sample

Our model in Python:

import pickle

import keras
from keras.callbacks import ModelCheckpoint
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential
from sklearn.model_selection import train_test_split

# data, labels and the label binarizer (mlb) are prepared beforehand, much like the
# image-loading loop shown earlier: images read with OpenCV, scaled to [0, 1], labels binarized.
# InputShape, Classes, BatchSize and Epochs are set to match the 64x64x3 images and four classes.
(x_train, x_test, y_train, y_test) = train_test_split(data, labels, test_size=0.15, random_state=42)

# construct our model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=InputShape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(Classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])

checkpoint = ModelCheckpoint("model.h5", monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# start training
model.fit(x_train, y_train, batch_size=BatchSize, epochs=Epochs, verbose=1, validation_data=(x_test, y_test), callbacks=callbacks_list)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# save the model and multi-label binarizer to disk
model.save('capstone.model')
f = open('capstone.pickle', "wb")
f.write(pickle.dumps(mlb))
f.close()

 

I split the sample images into four folders according to their labels and I used ZERO, ONE, TWO and THREE as the class labels. So, given a test image the model will do a prediction and return one of those class labels to assign.

First things first, we’ll construct our model and start the training. On my dual-K80 GPU server this took about an hour. The model is saved at various stages, and once we are happy with the accuracy we’ll save the resulting model and pickle file (capstone.model and capstone.pickle in the code).

To deploy this as an API in Azure we’ll create a new web app with default Azure settings. Once deployed, we’ll add the Python 3.6 extension, switch to the console and use pip to install the additional libraries we need, including Flask, OpenCV, Tensorflow and Keras. Modify the web.config to look like the one shown below; note that our Python server script will be named run_keras_server.py.

<configuration>
  <appSettings>
    <add key="PYTHONPATH" value="D:\home\site\wwwroot"/>
    <add key="WSGI_HANDLER" value="run_keras_server.app"/>
    <add key="WSGI_LOG" value="D:\home\LogFiles\wfastcgi.log"/>
  </appSettings>
  <system.webServer>
    <handlers>
      <add name="PythonHandler" path="*" verb="*" modules="FastCgiModule" scriptProcessor="D:\home\Python364x64\python.exe|D:\home\Python364x64\wfastcgi.py" resourceType="Unspecified" requireAccess="Script"/>
    </handlers>
  </system.webServer>
</configuration>

 

Our Python run_keras_server.py script:

import numpy as np
from keras.preprocessing.image import img_to_array
from keras.applications import imagenet_utils
from keras.models import load_model
import cv2
import flask
import io
import pickle

app = flask.Flask(__name__)

model = load_model("capstone.model")
mlb = pickle.loads(open('capstone.pickle', "rb").read())

def _grab_image(stream=None):
	if stream is not None:
		data = stream.read()
		image = np.asarray(bytearray(data), dtype="uint8")
		image = cv2.imdecode(image, cv2.IMREAD_COLOR)
	return image
	
@app.route("/predict", methods=["POST"])
def predict():
    
    data = {"success": False, "label":"None"}

    if flask.request.method == "POST":
        if flask.request.files.get('image'):
            image = _grab_image(stream=flask.request.files["image"])
            image = image.astype("float") / 255.0
            image = img_to_array(image)
            image = np.expand_dims(image, axis=0)
            proba = model.predict(image)[0]
            idxs = np.argsort(proba)[::-1][:2]
            label = mlb.classes_[idxs[0]]
            
            if label == "ZERO":
                label = "Spine at bottom, patient facing up."
            if label == "ONE":
                label = "Spine at right, patient facing left."
            if label == "TWO":
                label = "Spine at top, patient facing down."
            if label == "THREE":
                label = "Spine at left, patient facing right."
            
            data["label"] = label
            data["success"] = True

    return flask.jsonify(data)

if __name__ == "__main__":
    app.run()

 

Using your FTP tool of choice, upload the run_keras_server.py script, along with capstone.model and capstone.pickle, into the D:\home\site\wwwroot folder. Restart the web app from within Azure.

We can test our API using Postman, or the C# script shown below, which takes a sample image and performs a prediction.

using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace CallPythonAPI
{
    class Program
    {
        private static readonly HttpClient client = new HttpClient();

        static void Main(string[] args)
        {
            string responsePayload = Upload().GetAwaiter().GetResult();
            Console.WriteLine(responsePayload);
        }

        private static async Task<string> Upload()
        {
            var request = new HttpRequestMessage(HttpMethod.Post, "http://mywebappdemo.azurewebsites.net/predict");
            var content = new MultipartFormDataContent();
            byte[] byteArray = System.IO.File.ReadAllBytes("20.png");
            content.Add(new ByteArrayContent(byteArray), "image", "20.png");
            request.Content = content;
            var response = await client.SendAsync(request);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
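
If you would rather test from Python than from Postman or C#, an equivalent sketch using the requests library (same URL and sample file name as above) looks like this:

import requests

url = "http://mywebappdemo.azurewebsites.net/predict"
with open("20.png", "rb") as f:
    response = requests.post(url, files={"image": ("20.png", f, "image/png")})

response.raise_for_status()
print(response.json())  # e.g. {"label": "...", "success": true}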

 

Our sample image looks like this:

20

Running the prediction on this image yields the following result:

Prediction

That’s it. We can incorporate the API call into a web site, desktop client app or even a Raspberry PI device, since all the heavy lifting is done on the server-side.

Forensic Analysis with Python & Benford’s Law

Early in my career I specialised in Computer Security and more specifically Data Security. On one particular engagement I was confronted with a system that had virtually no audit log capability and very limited access control (mainframe based), and the suspicion was that staff was being paid off to alter transactional data.

The tools I had at my disposal were Microsoft Access, a basic CSV transaction log and a copy of Borland Delphi, and I focussed on analysing and detecting changes in the processing volume of data operators as an indication of suspicious activity, with some good success. Looking back, I wish I had known about Benford’s Law, as that would certainly have made my life much easier. Now, 20 years later, I work extensively in global payroll within the Microsoft Dynamics 365 ERP market, and while the threat of fraud remains, the tools and processing capability have advanced and improved dramatically.

From Wikipedia: “Benford’s law, also called Newcomb-Benford’s law, law of anomalous numbers, and first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford’s law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.”
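
As a quick illustration of what the law predicts, and of how small the core of such a check really is, here is a minimal Python sketch (with a short, made-up list of amounts) that compares observed leading-digit frequencies against the Benford expectation of log10(1 + 1/d) for each digit d:

import math
from collections import Counter

def leading_digit(amount):
    # First significant digit of a positive amount, e.g. 0.042 -> 4
    return int(("%e" % abs(amount))[0])

def benford_check(amounts):
    digits = [leading_digit(a) for a in amounts if a]
    counts = Counter(digits)
    total = len(digits)
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts.get(d, 0) / total
        print("digit %d: observed %5.1f%%  expected %5.1f%%" % (d, observed * 100, expected * 100))

# made-up allowance amounts; in practice this column comes from the payroll extract
benford_check([1250.00, 60.00, 1987.50, 305.75, 60.00, 1423.10, 912.40, 188.00])

The library used further below does the same kind of comparison, but adds the proper statistical tests and plotting on top of it.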

Payroll data, as with any ERP financial data, can consist of thousands or hundreds of thousands of transactions per pay run. Consider a typical worker with 10 to 15 different payments (or allowances) across a workforce of 5,000 workers. That generates 75,000 or more transactions per pay run, and since runs can be weekly, fortnightly or monthly (so up to 75,000 x 4 per month), auditing that volume presents a significant workload problem. Spot-checking becomes unfeasible unless you can reduce your focus to the transactions that may require further scrutiny.

Consider a policy requiring approval of expenses that exceed $300. As long as you submit expenses totalling no more than $290 odd you might be able to sneak this through every so often, and while this is no heist, this amount can still add up over time. Anti-Money Laundering systems often utilize hundreds of rules, one typically detects money transfers exceeding a cut-off of $10,000 before raising a flag requiring bank approval. If you travel internationally often enough, you’ll see that $10,000 amount on arrival and departure cards all the time.

Let’s take a few thousand rows of allowance data, which includes salary and miscellaneous allowances, and sanitize it by removing identifying columns, leaving only the amount column.

Our test data is shown below.

DataNotFake

I’ll be using a Python library available here that implements Benford’s Law by testing our null hypothesis and displaying a graph showing the digit distribution. A screenshot of the script is shown below, running in Visual Studio Code on Ubuntu Linux.

CodeView

I’ve modified the script and ran it against our clean, non-modified data and the resulting digit distribution is shown below.

NotFake

We can see a fairly good match to the expected distribution curve, with a slight elevation for digit ‘6’ and ‘5’ coming in a bit low, but still close to what Benford predicts. You need to understand the data fairly well to explain deviations such as these; here it could be that all employees receive a single allowance fixed at $60, producing the elevation. We are experimenting here, after all. Don’t assume you can load a bunch of numbers from a spreadsheet and this script will become your magic fraud-detection silver bullet.

Let’s manually modify our test data by replacing some allowances with random numbers. An extract is shown below; notice the numerous 4xx amounts now occurring (my manually modified values).

DataFaked

Running our script again produces the plot below, clearly indicating that digit ‘4’ now occurs far more often than its natural expectation. Other digits are also off as a consequence, especially ‘7’.

Fakes

With this in hand, we can isolate these occurrences in our data and perform a deeper inspection and validation of the transactions, the associated workers and, if required, the workflow approver. Spot-checking, but across a much narrower area of focus.

For further reading I recommend the work done by Mark Nigrini on the subject.

Near-perfect YOLO3 Object Detection from scratch

I recently completed the Microsoft Professional Program in Artificial Intelligence and have been really impressed by some of the many computer vision examples I’ve seen. It’s a great course and if you are interested in AI I highly recommend it, along with the fantastic blog and training offered by Adrian Rosebrock at pyimagesearch.com.

There are a number of key technologies and platforms that will continually come up as you learn AI – Tensorflow, CNTK, OpenCV and of course Keras. Once you start exploring computer vision, and specifically Convolutional Neural Networks, you are bound to run into numerous examples of real-time object detection from video, whether it’s a car, person, dog or street sign. Most of these examples use a pre-built model, laboriously created to detect dozens or even thousands of classes of objects out of the box, ready for you to use in your own models with little to no effort required.

That’s all great, but what if you wanted to detect something that is not included in the pre-built model? The solution lies in building and training your own from scratch, which is what I did for this post.

I’ve found YOLO3 to be really fantastic, and since I’m a Windows user my focus was on being able to build and train a model without having to struggle with code or tutorials designed for Linux. I found a pretty good set of scripts on GitHub and started off by getting it all running locally and training their example detector which detects raccoons.

Sometimes I use a laptop with Intel HD5000 GPU and PlaidML sitting between Keras and Tensorflow. This works well in most cases but for training a YOLO3 model you’ll need a better setup, and I used an Azure Windows 2016 Server VM I deployed and loaded it with Python 3.6, Tensorflow and Keras.

The VM comes with 112GB of RAM and dual Nvidia K80 GPUs. It’s not cheap to operate, so I do all my prep work locally, making sure the model starts training without obvious errors, and then copy it all over to the VM for the training run.

For this post I decided that while raccoons are cool, rats would be more interesting. Rats are fast, come in all shapes, sizes and colours, and can unfortunately cause problems when not kept as pets. They nest, chew through electrical wiring, and cause havoc in agriculture and food manufacturing. They are also used for neuroscience research with the classic example being a rat running a maze.

Because of the speed at which they move and the ways they can contort their bodies, they should, in theory, be pretty hard to detect and classify using a CNN. Let’s give it a try.

I started off by collecting 200 assorted images of rats and mice using both Google and Bing, then did the annotation using LabelImg as shown below.

LabelImg

This presents the first major decision we need to make. Do we include the tail in the annotation or not? So, we need to take a step back and think carefully what it is we are trying to achieve.

  • We want to detect rats (and mice), and detecting their bodies or heads is good enough
  • Sometimes all you see is a tail, no body, and yet it’s still a rat!
  • Including the tail also introduces the visual environment around the tail, which could throw off our training

Consider for a moment if our task was to build a model that detects both rats and earthworms. Suddenly a rat tail can (and likely will) be detected as an earthworm, or the other way around since they are both similar in shape and colour. I don’t really have an answer here, and I’ve opted to ignore tails completely, except for maybe a stump or an inch of the tail, no more. Let’s see how that works out. We don’t have a lot of training images so our options are limited.

I modified the config.json file as shown below to include our single class (rodent) and generated the anchors as recommended, updating them in the config as well. I am not using the YOLO3 pre-trained weights file, as I want to train completely from scratch. (Tip: I did a run with pre-trained weights as a test and the results were disappointing.)

{
    "model" : {
        "min_input_size":       128,
        "max_input_size":       872,
        "anchors":              [76,100, 94,201, 139,285, 188,127, 222,339, 234,225, 317,186, 323,281, 331,382],
        "labels":               ["rodent"]
    },

    "train": {
        "train_image_folder":   "C:/Users/xalvm/Documents/Projects/keras-yolo3/data/rodent_dataset/images/",
        "train_annot_folder":   "C:/Users/xalvm/Documents/Projects/keras-yolo3/data/rodent_dataset/anns/",      
        "cache_name":           "rodent_train.pkl",
        "train_times":          10,             
        "pretrained_weights":   "",             
        "batch_size":           4,             
        "learning_rate":        1e-4,           
        "nb_epochs":             30,             
        "warmup_epochs":        3,              
        "ignore_thresh":        0.5,
        "gpus":                 "0,1",
        "grid_scales":          [1,1,1],
        "obj_scale":            5,
        "noobj_scale":          1,
        "xywh_scale":           1,
        "class_scale":          1,
        "tensorboard_dir":      "logs",
        "saved_weights_name":   "rodent.h5",
        "debug":                false            
    },

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",
        "cache_name":           "",
        "valid_times":          1
    }
}

 

A typical training run in-progress is shown below, and I stopped the training at around 27 epochs since there was no loss reduction after epoch 24.

Training

Using a sample video off YouTube I ran predict.py and viewed the results frame by frame, noticing some good results and a fair amount of missed predictions. The best way to improve prediction is with more training data, so back we go to Google and Bing for more images, and we also grab some frames from random rat videos for more annotation.
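
Grabbing extra frames from video clips for annotation takes only a few lines of OpenCV. A minimal sketch is shown below, with hypothetical file names, keeping every 10th frame so consecutive near-duplicates don’t dominate the set.

import cv2

def extract_frames(video_path, out_dir, every_n=10):
    # Save every n-th frame of a clip as a JPEG, ready for annotation in LabelImg
    cap = cv2.VideoCapture(video_path)
    index, saved = 0, 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if index % every_n == 0:
            cv2.imwrite("%s/frame_%06d.jpg" % (out_dir, index), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

print(extract_frames("rat_clip.mp4", "data/rodent_dataset/images"))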

My resulting set now contains 560 annotated training images, which the script will split into train and test sets for me. With more training images come longer training runs, and the next run took 20 hours before I stopped it at epoch 30. This time the results were a lot more impressive.

There were still some failures, so let’s look at those first.

0127

Here are three consecutive frames: the first is a hit, the nearly identical second frame is missed, and the third is a hit again. This is quite bizarre, as our predictor works frame by frame; it never sees the video clip as a whole, and yet on the middle frame it failed.

0601

Again we see three frames where the first was missed, and we would assume the low quality of the frame is to blame. However, notice the following sequence:

0273

Here we barely have the silhouette of a head appearing and yet we get a 98% probability on what is a small, very fuzzy image.

1730

The final sequence above is quite impressive though, a good hit on what is no more than a ball of white fur. If you watch the full clip you will see a few more misses that should have been obvious, and then some pretty incredible hits.

All in all really impressive results, and we only had 560 training images.

Watch the clip here: (I removed 10 seconds from the clip to protect privacy)

YOLO3 Results