Data Science From Scratch To Production MVP Style: API

This post is a part of the Data Science From Scratch To Production MVP Style series.

If you’re not familiar with serialization and deserialization: “serialization” means taking an object in memory and converting it into a form that can be written to a file or sent over the network, while “deserialization” reverses that process, turning the file back into an object that can be used in code again.
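The round trip described above can be sketched in a few lines. This is only an illustration under some assumptions: it uses the standard library's `pickle` module (dill exposes the same `dump`/`load` interface), and the `model_params` dictionary is a hypothetical stand-in for a real trained model.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: any picklable object works.
model_params = {"coef": [0.5, 1.5], "intercept": 2.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Serialization: in-memory object -> bytes written to a file.
    with open(path, "wb") as f:
        pickle.dump(model_params, f)

    # Deserialization: bytes read from the file -> a new in-memory object.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object has the same contents but is a separate object.
same_contents = restored == model_params
same_object = restored is model_params
```

The restored object is a copy rebuilt from the file, which is what lets the process that trained the model and the process that serves it be entirely separate.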

We leverage this so we can package our model in a way that can be deployed alongside our API without being married to it, allowing us to keep the API and the model separate.

import dill

# Serialize the fitted pipeline and model so the API can load them at startup.
with open("api/artifact/pipe.dill", 'wb') as f:
    dill.dump(pipe, f)

with open("api/artifact/model.dill", 'wb') as f:
    dill.dump(model, f)

Now we’re ready to write a small, generic API that loads a pipeline and a model and exposes them over HTTP as a REST endpoint.

import dill
import pandas as pd
from flask import Flask, request


app = Flask(__name__)

# Deserialize the pipeline and model once, when the API starts.
with open("api/artifact/pipe.dill", 'rb') as f:
    pipe = dill.load(f)

with open("api/artifact/model.dill", 'rb') as f:
    model = dill.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Parse the body as JSON even if the Content-Type header is missing.
    raw_json = request.get_json(force=True)
    # Flatten the (possibly nested) JSON into a one-row DataFrame.
    flat_table_df = pd.json_normalize(raw_json)
    processed = pipe.transform(flat_table_df)
    # Return the single prediction as plain text.
    return str(model.predict(processed)[0])

One thing to point out is that our API deserializes pipe.dill and model.dill only at launch. This is a benefit, since requests are answered faster after the initial boot, but the API must be restarted whenever a newer pipe.dill or model.dill file is provided.
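If restarting on every new artifact becomes inconvenient, one possible alternative is to cache the deserialized object and re-read it only when the file's modification time changes. The sketch below is not part of the API above: `load_artifact` and `_cache` are hypothetical names, and `pickle` stands in for dill so the example runs with the standard library alone.

```python
import os
import pickle
import tempfile

_cache = {}  # path -> (modification time, deserialized object)


def load_artifact(path):
    """Return the object stored at path, re-reading only if the file changed."""
    mtime = os.path.getmtime(path)
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # file unchanged: reuse the in-memory object
    with open(path, "rb") as f:
        obj = pickle.load(f)
    _cache[path] = (mtime, obj)
    return obj


# Usage with a throwaway file standing in for model.dill.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump({"version": 1}, f)

    first = load_artifact(path)
    second = load_artifact(path)  # served from memory, no deserialization

    # Simulate a newer artifact being dropped in place.
    with open(path, "wb") as f:
        pickle.dump({"version": 2}, f)
    newer = os.path.getmtime(path) + 10
    os.utime(path, (newer, newer))  # force a visibly different timestamp

    third = load_artifact(path)  # file changed, so it is re-read
```

The trade-off is one extra file-system check per call in exchange for picking up new artifacts without a restart; for an MVP, loading once at launch as above is simpler.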

The above code creates a single endpoint that can be interacted with over HTTP by making a POST request with a JSON body. An example of that using curl would look like:

!curl --request POST -H "Content-Type: application/json" --data '{"temperature_celsius": 5.004}' "127.0.0.1:5000/predict"
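The same request can also be built from Python. The following is a minimal sketch using only the standard library; `build_predict_request` is a hypothetical helper, and the URL assumes the API is running locally on port 5000 as in the curl command.

```python
import json
import urllib.request


def build_predict_request(payload, url="http://127.0.0.1:5000/predict"):
    """Build (but do not send) a POST request carrying payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request({"temperature_celsius": 5.004})

# With the API running, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     prediction = resp.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before anything goes over the network.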
