Data Science From Scratch To Production MVP Style: Deploy To AWS Lambda

Push Pipeline and Model To AWS S3 Create Buckets AWS S3 is a managed service that lets us upload files and access them later. We’ll leverage this by uploading our pickled files so our API can fetch them when it runs. First we’ll create two “buckets” (think of each one as a folder): one for our pipelines and one for our models. import boto3 session = boto3.Session(profile_name="personal") s3 = session.client("s3") s3.create_bucket(Bucket="data-science-from-scratch-pipeline") s3.create_bucket(Bucket="data-science-from-scratch-model") If you get any errors like:...

April 11, 2020 · 6 min · Greg Hilston
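The "Deploy To AWS Lambda" excerpt above mentions uploading pickled files to S3 so the API can fetch them later. A minimal sketch of that round trip might look like the following. The helper names `upload_pickled` and `download_pickled`, the key `model.pkl`, and the `InMemoryS3` stand-in are all hypothetical and not from the original post; the stand-in only mimics the `put_object` and `get_object` calls of a boto3 S3 client so the example runs without AWS credentials. With a real client you would pass `session.client("s3")` instead.

```python
import io
import pickle


def upload_pickled(s3_client, bucket, key, obj):
    """Pickle `obj` and store the bytes in `bucket` under `key`."""
    body = pickle.dumps(obj)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)


def download_pickled(s3_client, bucket, key):
    """Fetch the bytes stored under `key` and unpickle them."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # Only unpickle data you trust: pickle can execute code on load.
    return pickle.loads(response["Body"].read())


class InMemoryS3:
    """Hypothetical stand-in for a boto3 S3 client, for local dry runs only."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket, Key):
        # boto3 returns a streaming body; BytesIO offers the same .read().
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


fake_s3 = InMemoryS3()
model_params = {"coef": [635.1964214], "intercept": 5810.549007860056}

upload_pickled(fake_s3, "data-science-from-scratch-model", "model.pkl", model_params)
restored = download_pickled(fake_s3, "data-science-from-scratch-model", "model.pkl")
```

Passing the client in as an argument keeps the helpers easy to exercise locally before pointing them at a real bucket.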

Data Science From Scratch To Production MVP Style: API

Serializing If you’re not familiar with serialization and deserialization: “serialization” is the act of taking an object in memory and converting it into a form that can be written to a file or sent over the network, while “deserialization” reverses that process, turning the file back into an object that can be used in code again. We’ll leverage this to package up our model so it can be deployed alongside our API without being married to it, which lets us keep the API and the model separate....

April 10, 2020 · 2 min · Greg Hilston
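To make the serialization idea from the "API" excerpt above concrete, here is a small sketch using Python's standard `pickle` module. The `model_state` dictionary and the file name `model.pkl` are illustrative assumptions; the post itself presumably pickles the fitted model object, which is not shown in the excerpt.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: just its learned parameters.
model_state = {
    "name": "ice_cream_sales",
    "coef": [635.1964214],
    "intercept": 5810.549007860056,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Serialize: in-memory object -> bytes on disk.
    with open(path, "wb") as f:
        pickle.dump(model_state, f)

    # Deserialize: bytes on disk -> a new, equal in-memory object.
    with open(path, "rb") as f:
        loaded_state = pickle.load(f)
```

The loaded object is an equal copy rather than the same object, which is what lets a separate process such as an API load a model it did not train.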

Data Science From Scratch To Production MVP Style: Model

Modeling This section is the least important, so we’re going to be incredibly sloppy here. We’ll perform a simple train/test split and create a simple Linear Regression model. X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20) model = LinearRegression().fit(X_train, y_train) model.score(X_test, y_test) 0.498875410976377 model.coef_ array([635.1964214]) model.intercept_ 5810.549007860056 pyplot.scatter(X_test, y_test, color = 'red') pyplot.plot(X_train, model.predict(X_train), color = 'blue') pyplot.title('temperature_fahrenheight vs ice_cream_sales_usd (Test set)') pyplot.xlabel('temperature_fahrenheight') pyplot.ylabel('ice_cream_sales_usd') pyplot.show() While our model isn’t great, let’s pretend we’re satisfied with it and move on to wrapping it in an API and getting it into production....

April 9, 2020 · 1 min · Greg Hilston
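For readers wondering what `model.coef_`, `model.intercept_`, and `model.score` in the "Model" excerpt above represent, the sketch below computes the same three quantities for a single feature using only the standard library. It is not scikit-learn's implementation, and the `temps` and `sales` values are made-up illustrative numbers, not the post's data. Unlike the post, it also scores on the same points it was fitted on rather than on a held-out test set.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination, the quantity a regressor's score reports."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data that is exactly linear: sales = 2 * temp - 20.
temps = [60.0, 70.0, 80.0, 90.0]
sales = [100.0, 120.0, 140.0, 160.0]

slope, intercept = fit_line(temps, sales)          # analogous to coef_[0], intercept_
score = r_squared(temps, sales, slope, intercept)  # analogous to model.score(...)
```

A score near 0.5, like the one in the excerpt, means the fitted line explains roughly half of the variance in the test targets.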

Data Science From Scratch To Production MVP Style: Pipeline

Goal Of This “Data Science From Scratch To Production MVP Style” Series? The goal of this series is to walk through four steps: (1) create a pipeline to process raw data; (2) create a simple model to make predictions on future raw data; (3) wrap the pipeline and model in a simple API; (4) deploy the API to production, using AWS Lambda. Each step above will have its own blog post. This is the first of the four....

April 8, 2020 · 6 min · Greg Hilston