Data Science From Scratch To Production MVP Style: API

April 10, 2020
data science production flask api
Estimated Reading Time: 2 minute(s)

This is a post in the Data Science From Scratch To Production MVP Style series


Serializing

If you’re not familiar with serialization and deserialization: “serialization” is the process of taking an object in memory and converting it into a form that can be written to a file or sent over the network, while “deserialization” reverses that process, turning the file back into an object that can be used in code again.
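As a minimal illustration of that round trip, the sketch below uses the standard library's pickle module rather than dill (dill exposes the same `dump`/`load`/`dumps`/`loads` interface). The dictionary is a made-up stand-in for a real model object.

```python
import pickle

# Hypothetical stand-in for an in-memory object such as a fitted model
original = {"name": "temperature_model", "coefficients": [0.5, 1.25], "intercept": -3.0}

# Serialize: in-memory object -> bytes that can be written to a file or sent over the network
payload = pickle.dumps(original)

# Deserialize: bytes -> a new object with the same contents
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == original)    # True: same contents
print(restored is original)    # False: a separate object was rebuilt
```

The restored object is equal to the original but is a distinct object, which is what lets a different process (here, the API) rebuild the model from a file.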

We leverage this to package our model so that it can be deployed alongside our API without being married to it, which lets us keep the API and the model separate.

import dill

# Serialize the fitted pipeline and model to files the API can load later
with open("api/artifact/pipe.dill", 'wb') as f:
    dill.dump(pipe, f)
with open("api/artifact/model.dill", 'wb') as f:
    dill.dump(model, f)
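Because the artifacts live apart from the API code, the API only needs to rely on two methods: `transform` on the pipeline and `predict` on the model. The sketch below illustrates that idea with typing protocols; `predict_one`, `CelsiusToFeatures`, and `ThresholdModel` are hypothetical names made up for this example, not part of the series' code.

```python
from typing import Any, Protocol, Sequence


class Transformer(Protocol):
    def transform(self, rows: Sequence[dict]) -> Any: ...


class Predictor(Protocol):
    def predict(self, features: Any) -> Sequence[Any]: ...


def predict_one(pipe: Transformer, model: Predictor, record: dict) -> Any:
    """Run one record through any pipeline/model pair exposing transform/predict."""
    features = pipe.transform([record])
    return model.predict(features)[0]


# Hypothetical stand-ins; any objects with the same methods could be swapped in
class CelsiusToFeatures:
    def transform(self, rows):
        return [[row["temperature_celsius"]] for row in rows]


class ThresholdModel:
    def predict(self, features):
        return ["cold" if row[0] < 10 else "warm" for row in features]


result = predict_one(CelsiusToFeatures(), ThresholdModel(), {"temperature_celsius": 5.004})
print(result)  # cold
```

Swapping in a retrained model then means replacing a file, not editing the API.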

API

Now we’re ready to write a small generic API that can be passed a pipeline and model to be interacted with over HTTP using REST.

import dill
import pandas as pd
from flask import Flask, request


app = Flask(__name__)

# Deserialize the pipeline and model once, when the API starts
with open("api/artifact/pipe.dill", 'rb') as f:
    pipe = dill.load(f)

with open("api/artifact/model.dill", 'rb') as f:
    model = dill.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Parse the body as JSON even if the Content-Type header is missing
    raw_json = request.get_json(force=True)
    # Flatten the (possibly nested) JSON into a one-row DataFrame
    flat_table_df = pd.json_normalize(raw_json)
    processed = pipe.transform(flat_table_df)
    # Return the single prediction as plain text
    return str(model.predict(processed)[0])

One thing to point out is that our API only deserializes pipe.dill and model.dill once, at launch. This is a benefit because requests are answered faster after the initial boot, but the API must be restarted whenever a newer pipe.dill or model.dill file is provided.
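One way to picture this trade-off is the sketch below, which uses the standard library's pickle and `functools.lru_cache` as stand-ins for dill and the module-level load. The `load_artifact` and `save_artifact` helpers and the temporary file are assumptions made for the example; clearing the cache plays the role of restarting the API.

```python
import os
import pickle
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def load_artifact(path):
    """Read and deserialize an artifact; cached, so the disk is hit once per path."""
    with open(path, "rb") as f:
        return pickle.load(f)


def save_artifact(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")

    save_artifact({"version": 1}, path)
    first = load_artifact(path)      # reads from disk

    save_artifact({"version": 2}, path)
    stale = load_artifact(path)      # served from memory: the new file is not seen

    load_artifact.cache_clear()      # stands in for restarting the API
    fresh = load_artifact(path)      # reads the newer file

print(first, stale, fresh)
```

The second call never touches the disk, which is why it is fast, and also why the newer file goes unnoticed until the "restart".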

The above code creates a single endpoint that can be called over HTTP by making a POST request with a JSON body. An example of that using curl would look like:

!curl --request POST -H "Content-Type: application/json" --data '{"temperature_celsius": 5.004}' "127.0.0.1:5000/predict"
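The same request can be sketched in Python using only the standard library. The `build_predict_request` helper is a hypothetical name introduced here, and the URL assumes the API is running locally on Flask's default port, as in the curl example.

```python
import json
import urllib.request


def build_predict_request(record, url="http://127.0.0.1:5000/predict"):
    """Build (but do not send) a POST request carrying `record` as a JSON body."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request({"temperature_celsius": 5.004})
print(req.get_method(), req.full_url)

# To actually send it (requires the API to be running locally):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before the API is up.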
