Luminoth’s Web API: Hackathon-Speed Integration of Neural Networks
One of the more challenging aspects of developing SDKs with machine learning models is deployment and productionization. TensorFlow in particular can be difficult to set up, and large models realistically need a GPU to evaluate. This post shares how I sidestepped that process entirely to quickly evaluate a FasterRCNN-based model during a hackathon last year, in a way that works from any office or CI machine.
During this hackathon, I implemented a model from an ICDAR 2017 paper and trained it on one of our physical machines equipped for machine learning. To deliver quickly, rather than try to get the trained model and data off that machine, I simply ran a tool called Luminoth on it to expose the model’s prediction functionality. This also allowed anybody on my team to continue developing the model afterward with minimal friction, and required only a small networking shim in our codebase.
Luminoth is a Python-based tool that I like to refer to as “a command line wrapper around TensorFlow.” Its main use is to quickly set up and train popular networks such as FasterRCNN from a YAML file, but it also ships a Flask-based server that allows prediction queries via a web page. As it turns out, that server also exposes an (undocumented) API which is usable programmatically.
My codebase is in C++ with a C# assembly wrapping it. That being the case, I had to get my model’s predictions (a number of bounding boxes) into C++ code, and fast. Figuring out TensorFlow’s shaky C++ API (or even using Python-based TensorFlow) wasn’t an option. The model was already trained on our machine-learning computer, and evaluating it anywhere else would have meant a large setup cost and data duplication for every person doing so. I had also had my eye on a particular C++ networking library, CPR, that I had been meaning to use; so I thought, why not tackle all of these problems at once?
Let’s start by figuring out Luminoth’s API from the source and web page itself.
First, running Luminoth’s server as per the documentation and using its web page shows requests being made to an endpoint named `api/fasterrcnn/predict`. We can see it returns some JSON. Great, we now know it’s probably possible to invoke programmatically!
Digging into Luminoth’s web.py, around line 31 at the time of writing, the handler for the corresponding route, `/api/<model_name>/predict/`, is our ticket.
The first thing we see is an attempt to retrieve the image data from the request to predict:
    try:
        image_array = get_image()
    except ValueError:
        return jsonify(error='Missing image'), 400
    except OSError:
        return jsonify(error='Incompatible file type'), 400
What is get_image()? Well, it shows that the server expects a POSTed file by the name of 'image'.
    def get_image():
        image = request.files.get('image')
        if not image:
            raise ValueError
        image = Image.open(image.stream).convert('RGB')
        return image
This is a Flask web server. The Flask documentation for the files property of the Request object shows that it is only populated (for our purposes) by a POST request from a <form> whose encoding is given as enctype="multipart/form-data". Right, so we now know how to use the endpoint programmatically. Now, how can we call this from C++ using CPR?
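Before turning to CPR, it may help to see roughly what such a request looks like on the wire. The sketch below builds a multipart/form-data body by hand using only the standard library. BuildMultipartBody and MultipartContentType are hypothetical helpers written for illustration (they are not part of Luminoth, Flask, or CPR), and the boundary is an arbitrary string chosen by the caller that must not occur in the file bytes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: builds the body of a
// multipart/form-data request carrying a single file part named "image",
// which is the field name get_image() looks up in request.files.
std::string BuildMultipartBody(const std::string& boundary,
                               const std::string& filename,
                               const std::vector<unsigned char>& bytes)
{
    std::string body;
    // Each part starts with "--" followed by the boundary.
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"image\"; filename=\"" + filename + "\"\r\n";
    body += "Content-Type: application/octet-stream\r\n\r\n";
    // Raw file bytes follow the blank line that ends the part headers.
    body.append(bytes.begin(), bytes.end());
    // The final boundary carries a trailing "--" to close the body.
    body += "\r\n--" + boundary + "--\r\n";
    return body;
}

// Value of the Content-Type request header that must accompany the body.
std::string MultipartContentType(const std::string& boundary)
{
    return "multipart/form-data; boundary=" + boundary;
}
```

In practice an HTTP library generates all of this for you; the point is only that the server identifies the upload by the part name "image".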
Let’s start with the POST request. Using CPR, this is very straightforward. The required multipart/form-data encoding is handled by the cpr::Multipart object. At the time of writing, there is a bug in cpr::Multipart when sending in-memory data buffers; so in order to proceed with the hackathon, the image was first written to a file, reloaded, and then sent. Avoid that round trip through the disk if you can.
    extern "C" __declspec(dllexport) void* SendImagePostRequest(const char* url, unsigned char* data, int data_size)
    {
        // Workaround: write the image to disk so it can be sent as a cpr::File.
        std::string filename = WriteTemporaryFile(data, data_size);
        auto response = cpr::Post(
            cpr::Url{ url },
            cpr::Multipart{{ "image", cpr::File{ filename } }});
        // The temporary file is no longer needed once the request has completed.
        std::remove(filename.c_str());
        return ParseJsonResults(response.text);
    }
Here, url is the URL of the Luminoth endpoint we found, and data and data_size describe the image bytes we want FasterRCNN to run prediction on. When used, it looks like this:
void* resultsHandle = predictTables("http://beast-pc:5000/api/fasterrcnn/predict/", image.data(), (int)image.size());
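The WriteTemporaryFile helper is not shown in this post. A minimal sketch of what such a helper could look like follows, assuming a C++17 toolchain for std::filesystem. The file name pattern is made up for illustration, and the counter only avoids collisions within a single process.

```cpp
#include <atomic>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch of a temporary-file helper; not the original code.
// Writes the buffer to a new file in the system temp directory and
// returns the path so it can be handed to an upload call.
std::string WriteTemporaryFile(const unsigned char* data, int data_size)
{
    // Per-process counter so consecutive calls get distinct file names.
    static std::atomic<unsigned> counter{0};
    const std::filesystem::path path =
        std::filesystem::temp_directory_path() /
        ("luminoth_upload_" + std::to_string(counter++) + ".img");

    // Binary mode matters on Windows, where text mode rewrites newlines.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("could not open temporary file");
    out.write(reinterpret_cast<const char*>(data), data_size);
    out.close();
    if (!out)
        throw std::runtime_error("could not write temporary file");
    return path.string();
}
```

The caller remains responsible for deleting the file, as the std::remove call in SendImagePostRequest does.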
The POST request returns a JSON string, which we need to decode. Luckily, there is a superb header-only JSON library, nlohmann/json (which I think has the potential to become part of the C++ standard library; by all means use it), that we can drop right in to get a vector of RECTs and their confidences back:
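    using json = nlohmann::json;

    // Each result pairs a bounding box with the model's confidence in it.
    static std::vector<std::pair<RECT, float>>* ParseJsonResults(const std::string& response)
    {
        auto parsed = json::parse(response);
        auto* results = new std::vector<std::pair<RECT, float>>();
        for (const auto& object : parsed["objects"])
        {
            const auto& bbox = object["bbox"];
            float confidence = object["prob"];
            results->emplace_back(RECT { bbox[0], bbox[1], bbox[2], bbox[3] }, confidence);
        }
        // Ownership passes to the caller, which sees this as an opaque handle.
        return results;
    }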
Note that the boxes are returned in an X/Y/Right/Bottom format. If you need an X/Y/Width/Height format, it’s easily convertible. From then on, the bounding boxes can be passed on throughout the codebase, and any improvement of the new model over the current methods can be measured.
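The conversion mentioned above is a pair of subtractions. A minimal sketch, where BoxXYWH and ToXYWH are hypothetical names introduced here and plain integers stand in for the RECT fields:

```cpp
// Hypothetical X/Y/Width/Height box type, for illustration only.
struct BoxXYWH
{
    int x;
    int y;
    int width;
    int height;
};

// Converts a box given as X/Y/Right/Bottom into X/Y/Width/Height.
BoxXYWH ToXYWH(int left, int top, int right, int bottom)
{
    return BoxXYWH{ left, top, right - left, bottom - top };
}
```

For example, a box with corners (10, 20) and (110, 70) becomes x = 10, y = 20, width = 100, height = 50.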
You’ll have to excuse the use of void pointers, pointers to vector, new, and other frowned-upon items. Using CPR also introduced an additional problem: the C++ codebase is built with MSVC 11.0, and CPR requires MSVC 14.0 or later. To integrate it, I built a separate DLL that the main source loads dynamically via LoadLibrary, which is why a C API was needed. But these are implementation details, and again, it was simply the quickest way to get results.
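One reason a plain C surface with an opaque handle suits this setup: when two modules are built with different MSVC versions, a std::vector cannot safely be read or freed across the DLL boundary. The sketch below shows one way a caller could read results through the handle without ever touching the vector itself. GetResultCount, GetResult, FreeResults, and the Box struct (standing in for RECT so the code compiles anywhere) are hypothetical names, not the code from the hackathon, and the __declspec(dllexport) decoration is omitted for portability.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the Windows RECT type.
struct Box { long left, top, right, bottom; };
using Results = std::vector<std::pair<Box, float>>;

extern "C" {

// Returns the number of detections behind the opaque handle (0 if null).
int GetResultCount(void* handle)
{
    auto* results = static_cast<Results*>(handle);
    return results ? static_cast<int>(results->size()) : 0;
}

// Copies detection `index` into the out-parameters.
// Returns 1 on success, 0 on a null pointer or out-of-range index.
int GetResult(void* handle, int index, Box* box, float* confidence)
{
    auto* results = static_cast<Results*>(handle);
    if (!results || !box || !confidence || index < 0 ||
        static_cast<std::size_t>(index) >= results->size())
        return 0;
    *box = (*results)[index].first;
    *confidence = (*results)[index].second;
    return 1;
}

// Frees the vector inside the module that allocated it.
void FreeResults(void* handle)
{
    delete static_cast<Results*>(handle);
}

} // extern "C"
```

The main codebase would look these up with GetProcAddress after LoadLibrary and only ever pass the handle back, so allocation and deallocation both happen on the DLL's side.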
That’s about it for this post. All in all, I believe Luminoth is an underrated, but also unfinished, machine learning tool. It’s a good choice when you want a quick way to train, save state, and evaluate neural networks. The API allows high-speed integration of a model into existing code in any language, after which an analysis of the results can determine whether to productionize the model further.