Once the model is finalized, how do we get it into production (i.e., store predictions in a data warehouse/S3)? Use Amazon SageMaker.

1 SageMaker Custom Algorithm: Creating an Inference Handler

1.1 How to make trained model and code available for production

Docker: packages up the code and its dependencies (software, packages, etc.) into a Docker container image, a standalone executable package of software containing everything needed to run an application (i.e., the model)

  • Process: create Docker container, tell SageMaker which container to use for predicting

    • SageMaker starts an EC2 instance and downloads the Docker container (see the inference handler sketch below)
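As a rough sketch (not this project's actual code): a custom SageMaker container is expected to answer GET /ping (health check) and POST /invocations (prediction requests) on port 8080, with model artifacts mounted under /opt/ml/model. The artifact name, payload format, and use of Flask below are assumptions for illustration.

    # inference_handler.py - minimal sketch of a SageMaker serving handler.
    # Assumes a pickled scikit-learn-style model and a JSON payload with an
    # "instances" key; both are illustrative, not this project's actual format.
    import json
    import pickle

    from flask import Flask, Response, request

    app = Flask(__name__)

    # SageMaker places model artifacts in /opt/ml/model inside the container.
    with open("/opt/ml/model/model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/ping", methods=["GET"])
    def ping():
        # Health check: 200 means the container is up and the model loaded.
        return Response(status=200)

    @app.route("/invocations", methods=["POST"])
    def invocations():
        # Parse the request body, predict, and return JSON predictions.
        payload = json.loads(request.data)
        predictions = model.predict(payload["instances"]).tolist()
        return Response(json.dumps({"predictions": predictions}),
                        status=200, mimetype="application/json")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)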

1.1.1 Creating Docker Container

  • Start with a base image, set environment variables, install necessary software, copy code from the directory into the container, copy/install Python package requirements, set variables/run tests

  • Built into CI/CD (continuous integration/continuous delivery) pipeline in Buildkite

    • A container is created for every commit in the repo

    • The git sha is used by SageMaker to reference the Docker container to use for the model run (see the sketch below)
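A hedged sketch of how a git-sha-tagged image can be referenced when registering a SageMaker model and running a batch transform job so that predictions land back in S3. The account ID, ECR repository, role ARN, bucket paths, and instance type are placeholders, not values from this project.

    # run_batch_predictions.py - illustrative only; all identifiers are fake.
    import boto3

    GIT_SHA = "abc1234"  # hypothetical commit sha produced by the CI pipeline
    IMAGE_URI = f"123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:{GIT_SHA}"

    sm = boto3.client("sagemaker")

    # Register a SageMaker Model that pins the container image by git sha.
    sm.create_model(
        ModelName=f"my-model-{GIT_SHA}",
        PrimaryContainer={
            "Image": IMAGE_URI,
            "ModelDataUrl": "s3://my-bucket/artifacts/model.tar.gz",
        },
        ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    )

    # Batch transform: read input from S3, write predictions back to S3.
    sm.create_transform_job(
        TransformJobName=f"my-model-predictions-{GIT_SHA}",
        ModelName=f"my-model-{GIT_SHA}",
        TransformInput={
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix",
                                 "S3Uri": "s3://my-bucket/input/"}
            },
            "ContentType": "application/json",
        },
        TransformOutput={"S3OutputPath": "s3://my-bucket/predictions/"},
        TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    )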

1.1.2 Input Data Sourcing

  • Each query is given 4 files:

    • _ddl.sql - Defines schema

    • _etl.sql - Actual query to run

    • _test.sql - Tests

    • upload_<>_script.rb - Script to run the above files
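For illustration only, assuming the query name fills the blanks, a hypothetical query called churn_scores would be laid out as:

    churn_scores_ddl.sql
    churn_scores_etl.sql
    churn_scores_test.sql
    upload_churn_scores_script.rb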

2 Testing

A nice overview of Pytest
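For reference, a minimal pytest sketch; the function under test is a hypothetical stand-in, not this project's code. Run it with pytest:

    # test_predict.py - illustrative only.
    import pytest

    def predict(features):
        # Hypothetical stand-in for the model's predict function.
        if not features:
            raise ValueError("features must be non-empty")
        return [1 if x > 0 else 0 for x in features]

    def test_predict_returns_one_label_per_row():
        assert predict([0.3, -0.2, 1.5]) == [1, 0, 1]

    def test_predict_rejects_empty_input():
        # Hypothetical contract: empty input should raise a ValueError.
        with pytest.raises(ValueError):
            predict([])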