From AI Model to Production in Azure

Problem Description (courtesy of

When a patient has a CT scan taken, a special device uses X-rays to take measurements from a variety of angles which are then computationally reconstructed into a 3D matrix of intensity values. Each layer of the matrix shows one very thin “slice” of the patient’s body.

This data is saved in an industry-standard format known as DICOM, which saves the image matrix in a set binary format and then wraps this data with a huge variety of metadata tags.

Some of these fields (e.g. hardware manufacturer, device serial number, voltage) are usually correct because they are automatically read from hardware and software settings.

The problem is that many important fields must be added manually by the technician and are therefore subject to human error factors like confusion, fatigue, loss of situational awareness, and simple typos.

A doctor scrutinising image data will usually be able to detect incorrect metadata, but in an era when more and more diagnoses are being carried out by computers it is becoming increasingly important that patient record data is as accurate as possible.

This is where Artificial Intelligence comes in. We want to improve the error checking for one single but incredibly important value: a field known as Image Orientation (Patient) which indicates the 3D orientation of the patient’s body in the image.

For this challenge we’re given 20,000 CT scan images, sized 64×64 pixels and labelled correctly for training. The basic premise is given an image, the AI model needs to predict the correct orientation as explained graphically below. The red arrow shows the location of the spine, which our AI model needs to find to figure out the image orientation.


We’ll use Tensorflow and Keras to build and train an AI model in Python and validate against another 20,000 unlabelled images. The pipeline I used had three parts to it, but the core is shown in Python below and achieved 99.98% accuracy on the validation set. The second and third parts (not shown) pushed this to 100%, landing me a #6 ranking on the leader board. A preview of the 20,000 sample training images is shown below.


Our model in Python:

(x_train, x_test, y_train, y_test) = train_test_split(data, labels, test_size=0.15, random_state=42)

# construct our model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=InputShape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dense(128, activation='relu'))
model.add(Dense(Classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])

checkpoint = ModelCheckpoint("model.h5", monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# start training, y_train, batch_size=BatchSize, epochs=Epochs, verbose=1, validation_data=(x_test, y_test), callbacks=callbacks_list)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# save the model and multi-label binarizer to disk'capstone.model')
f = open('capstone.pickle', "wb")


I split the sample images into four folders according to their labels and I used ZERO, ONE, TWO and THREE as the class labels. So, given a test image the model will do a prediction and return one of those class labels to assign.

First things first, we’ll construct our model and start the training. On my dual-K80 GPU server this took about an hour. The model is saved at various stages, and once we are happy with the accuracy we’ll save the resulting model and pickle file (capstone.model & capstone.pickle in the code)

To deploy this as an API in Azure we’ll create a new web app with default Azure settings. Once deployed, we’ll add the Python 3.6 extension. Switch to the console mode and use pip to install any additional libraries we need, including Flask, OpenCV, Tensorflow and Keras. Modify the web.config to look like the one shown below. Note that our Python server script will be named

    <add key="PYTHONPATH" value="D:\home\site\wwwroot"/>
    <add key="WSGI_HANDLER" value=""/>
    <add key="WSGI_LOG" value="D:\home\LogFiles\wfastcgi.log"/>
      <add name="PythonHandler" path="*" verb="*" modules="FastCgiModule" scriptProcessor="D:\home\Python364x64\python.exe|D:\home\Python364x64\" resourceType="Unspecified" requireAccess="Script"/>


Our Python script:

import numpy as np
from keras.preprocessing.image import img_to_array
from keras.applications import imagenet_utils
from keras.models import load_model
import cv2
import flask
import io
import pickle

app = flask.Flask(__name__)

model = load_model("capstone.model")
mlb = pickle.loads(open('capstone.pickle', "rb").read())

def _grab_image(stream=None):
	if stream is not None:
		data =
		image = np.asarray(bytearray(data), dtype="uint8")
		image = cv2.imdecode(image, cv2.IMREAD_COLOR)
	return image
@app.route("/predict", methods=["POST"])
def predict():
    data = {"success": False, "label":"None"}

    if flask.request.method == "POST":
        if flask.request.files.get('image'):
            image = _grab_image(stream=flask.request.files["image"])
            image = image.astype("float") / 255.0
            image = img_to_array(image)
            image = np.expand_dims(image, axis=0)
            proba = model.predict(image)[0]
            idxs = np.argsort(proba)[::-1][:2]
            label = mlb.classes_[idxs[0]]
            if label == "ZERO":
                label = "Spine at bottom, patient facing up."
            if label == "ONE":
                label = "Spine at right, patient facing left."
            if label == "TWO":
                label = "Spine at top, patient facing down."
            if label == "THREE":
                label = "Spine at left, patient facing right."
            data["label"] = label
            data["success"] = True

    return flask.jsonify(data)

if __name__ == "__main__":


Using your FTP tool of choice, upload the script, along with capstone.model and capstone.pickle, into the D:\home\site\wwwroot folder. Restart the web app from within Azure.

We can test our API using Postman, or the C# script shown below, which takes a sample image and performs a prediction.

using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace CallPythonAPI
    class Program
        private static readonly HttpClient client = new HttpClient();

        static void Main(string[] args)
            string responsePayload = Upload().GetAwaiter().GetResult();

        private static async Task<string> Upload()
            var request = new HttpRequestMessage(HttpMethod.Post, "");
            var content = new MultipartFormDataContent();
            byte[] byteArray = System.IO.File.ReadAllBytes("20.png");
            content.Add(new ByteArrayContent(byteArray), "image", "20.png");
            request.Content = content;
            var response = await client.SendAsync(request);
            return await response.Content.ReadAsStringAsync();


Our sample image looks like this:


Running the prediction on this image yields the following result:


That’s it. We can incorporate the API call into a web site, desktop client app or even a Raspberry PI device, since all the heavy lifting is done on the server-side.

Forensic Analysis with Python & Benford’s Law

Early in my career I specialised in Computer Security and more specifically Data Security. On one particular engagement I was confronted with a system that had virtually no audit log capability and very limited access control (mainframe based), and the suspicion was that staff was being paid off to alter transactional data.

The tools I had at my disposal was Microsoft Access, a basic CSV transaction log and a copy of Borland Delphi and I focussed on analysing and detecting changes in processing volume of data operators as an indication of suspicious activity, with some good success. Looking back, I wish I knew about Benford’s Law, as that would have certainly made my life much easier. Now 20 years later I work extensively in global payroll within the Microsoft Dynamics 365 ERP market, and while the threat of fraud remains, the tools and processing capability have advanced and improved dramatically.

From Wikipedia: “Benford’s law, also called Newcomb-Benford’s law, law of anomalous numbers, and first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford’s law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.”

Payroll data as with any ERP financial data can consist of thousands or hundreds of thousands of transactions per pay run. Consider a typical worker with 10 to 15 different payments (or allowances) across a workforce of 5000 workers. This often generates 75,000 or more transactions per pay run and auditing of this volume, which can then be run weekly, fortnightly or monthly (thus 75,000 x 4 per month) presents a significant workload problem. Spot-checking becomes unfeasible unless you could reduce your focus to transactions that may require further scrutiny.

Consider a policy requiring approval of expenses that exceed $300. As long as you submit expenses totalling no more than $290 odd you might be able to sneak this through every so often, and while this is no heist, this amount can still add up over time. Anti-Money Laundering systems often utilize hundreds of rules, one typically detects money transfers exceeding a cut-off of $10,000 before raising a flag requiring bank approval. If you travel internationally often enough, you’ll see that $10,000 amount on arrival and departure cards all the time.

Let’s take a few thousand rows of allowance data, which includes salary and miscellaneous allowances and sanitize it by removing identifying columns, leaving only the amount column.

Our test data is shown below.


I’ll be using a Python library available here that implements Benford’s Law by testing our null hypothesis and displaying a graph showing the digit distribution. A screenshot of the script is shown below, running in Visual Studio Code on Ubuntu Linux.


I’ve modified the script and ran it against our clean, non-modified data and the resulting digit distribution is shown below.


We can see a fairly good expected distribution curve with slight elevation of digit ‘6’ and ‘5’ being a bit low, but still within a fairly normal distribution. You need to understand the data fairly well to explain any deviations such as this. Here it could be that all employees receive a single allowance fixed at $60, producing the elevation. We are experimenting here after all, don’t assume you can load a bunch of numbers from a spreadsheet and this script will become your magic fraud detection silver bullet.

Let’s manually modify our test data by replacing some allowances with random numbers. An extract is shown below and notice the numerous 4xx digit amounts now occurring (my manually modified amount).


Running our script again produces the plot below, clearly indicating an elevation of digit ‘4’ occurring when the natural expectation of occurrence was much less. Other figures are also off as a consequence, especially ‘7’.


With this in hand, we can now isolate these occurrences in our data and perform a deeper inspection and validation of these transactions, the associated workers and approver of the workflow, if that was required. Spot-checking, but across a more narrow area of focus.

For further reading I recommend the work done by Mark Nigrini on the subject.

Near-perfect YOLO3 Object Detection from scratch

I recently completed the Microsoft Professional Program in Artificial Intelligence and have been really impressed by some of the many computer vision examples I’ve seen. It’s a great course and if you are interested in AI I highly recommend it, along with the fantastic blog and training offered by Adrian Rosebrock at

There are a number of key technologies and platforms that will continuously come up in AI as you learn – Tensorflow, CNTK, OpenCV and of course Keras. Once you start exploring computer vision and specifically Convoluted Neural Networks you are bound to run into numerous examples of real-time object detection from video, whether it’s a car, person, dog or street-sign, and most of these examples will use a pre-built model, laboriously created to detect dozens or even thousands of classes of objects out of the box, and ready for you to use in your own models with little to no effort required.

That’s all great, but what if you wanted to detect something that is not included in the pre-built model? The solution lies in building and training your own from scratch, which is what I did for this post.

I’ve found YOLO3 to be really fantastic, and since I’m a Windows user my focus was on being able to build and train a model without having to struggle with code or tutorials designed for Linux. I found a pretty good set of scripts on GitHub and started off by getting it all running locally and training their example detector which detects raccoons.

Sometimes I use a laptop with Intel HD5000 GPU and PlaidML sitting between Keras and Tensorflow. This works well in most cases but for training a YOLO3 model you’ll need a better setup, and I used an Azure Windows 2016 Server VM I deployed and loaded it with Python 3.6, Tensorflow and Keras.

The VM comes with 112GB of RAM and dual Nvidia K80 GPU’s. It’s not cheap to operate so I do all my prep work locally, making sure the model starts training without obvious errors and then copy that all over to the VM for the training run.

For this post I decided that while raccoons are cool, rats would be more interesting. Rats are fast, come in all shapes, sizes and colours, and can unfortunately cause problems when not kept as pets. They nest, chew through electrical wiring, and cause havoc in agriculture and food manufacturing. They are also used for neuroscience research with the classic example being a rat running a maze.

Because of the speed they move and ways they can contort their bodies it should, in theory, be pretty hard to detect and classify using a CNN. Let’s give it a try.

I started off by collecting 200 assorted images of rats and mice using both Google and Bing, then did the annotation using LabelImg as shown below.


This presents the first major decision we need to make. Do we include the tail in the annotation or not? So, we need to take a step back and think carefully what it is we are trying to achieve.

  • We want to detect rats (and mice), and detecting their bodies or heads is good enough
  • Sometimes all you see is a tail, no body, and yet it’s still a rat!
  • Including the tail also introduces the visual environment around the tail, which could throw our training

Consider for a moment if our task was to build a model that detects both rats and earthworms. Suddenly a rat tail can (and likely will) be detected as an earthworm, or the other way around since they are both similar in shape and colour. I don’t really have an answer here, and I’ve opted to ignore tails completely, except for maybe a stump or an inch of the tail, no more. Let’s see how that works out. We don’t have a lot of training images so our options are limited.

I modified the config.json file as shown below to include our single class (rodent) and generated the anchors as recommended and changed that in the config file. I am not using the YOLO3 pre-trained weights file as I want to train from scratch completely. (Tip: I did a run with pre-trained weights as a test and the results were disappointing)

    "model" : {
        "min_input_size":       128,
        "max_input_size":       872,
        "anchors":              [76,100, 94,201, 139,285, 188,127, 222,339, 234,225, 317,186, 323,281, 331,382],
        "labels":               ["rodent"]

    "train": {
        "train_image_folder":   "C:/Users/xalvm/Documents/Projects/keras-yolo3/data/rodent_dataset/images/",
        "train_annot_folder":   "C:/Users/xalvm/Documents/Projects/keras-yolo3/data/rodent_dataset/anns/",      
        "cache_name":           "rodent_train.pkl",
        "train_times":          10,             
        "pretrained_weights":   "",             
        "batch_size":           4,             
        "learning_rate":        1e-4,           
        "nb_epochs":             30,             
        "warmup_epochs":        3,              
        "ignore_thresh":        0.5,
        "gpus":                 "0,1",
        "grid_scales":          [1,1,1],
        "obj_scale":            5,
        "noobj_scale":          1,
        "xywh_scale":           1,
        "class_scale":          1,
        "tensorboard_dir":      "logs",
        "saved_weights_name":   "rodent.h5",
        "debug":                false            

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",
        "cache_name":           "",
        "valid_times":          1


A typical training run in-progress is shown below, and I stopped the training at around 27 epochs since there was no loss reduction after epoch 24.


Using a sample video off YouTube I ran and viewed the results frame by frame, noticing some good results and a fair amount of missed predictions. The best way to improve prediction is with more training data, so back we go to Google and Bing for more images, and we also grab some frames from random rat videos for more annotation.

My resulting set now contains 560 annotated training images which the script will split into a train/test set for me. With more training images comes longer training runs, and the next run took 20 hours before I stopped it at Epoch 30. This time the results were a lot more impressive.

There were still some failures, so let’s look at those first.


Here are three consecutive frames where the first we have a hit, the second nearly identical frame was missed, while the third again got a hit. This is quite bizarre, as our predictor does a frame by frame prediction. It’s not seeing the video clip as a whole, it literally detects frame by frame and yet in the middle frame we failed.


Again we see three frames where the first was missed, and we would assume the low quality of the frame is to blame. However, notice the following sequence:


Here we barely have the silhouette of a head appearing and yet we get a 98% probability on what is a small, very fuzzy image.


The final sequence above is quite impressive though, a good hit on what is no more than a ball of white fur. If you watch the full clip you will see a few more misses that should have been obvious, and then some pretty incredible hits.

All in all really impressive results, and we only had 560 training images.

Watch the clip here: (I removed 10 seconds from the clip to protect privacy)

YOLO3 Results