Do I need Docker to deploy an ML pipeline that only does scheduled training and batch prediction?

I am trying to deploy a machine learning workflow on AWS. The most common approach is probably to automate it in SageMaker and deploy the model as an endpoint for real-time inference. For my project, though, training is scheduled and runs roughly once a week, and prediction happens exactly once after each retraining finishes, and that's it. SageMaker therefore feels like overkill. The simpler option is probably to train the model and run prediction on an EC2 instance by piecing together a few Python scripts. Someone suggested containerizing the pipeline and running it on ECS. In general, the main advantage of Docker is a fixed, reproducible environment, which I could also get with an Anaconda virtual environment. A Docker image also makes it easy to serve real-time inference behind a REST API, but that is not required in my case. So the question is: is Docker really necessary for deploying my ML workflow?
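For reference, the "few Python scripts on EC2" option can be as small as a single entry point that retrains and then immediately runs the one batch prediction, triggered by a weekly cron job. This is only a sketch under assumptions: the least-squares model, the `run_pipeline` function name, and the `model.pkl` artifact path are placeholders, not anything prescribed by the question.

```python
import pickle
from pathlib import Path

import numpy as np

MODEL_PATH = Path("model.pkl")  # hypothetical artifact location on the EC2 instance


def train(X, y):
    # Stand-in for the real training step: fit ordinary least squares.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Persist the fitted model so each weekly run leaves an artifact behind.
    MODEL_PATH.write_bytes(pickle.dumps(coef))
    return coef


def batch_predict(coef, X_new):
    # One-shot batch prediction, run right after retraining.
    return X_new @ coef


def run_pipeline(X, y, X_new):
    # Single scheduled entry point (e.g. a weekly cron job): retrain,
    # predict once on the new batch, then exit.
    return batch_predict(train(X, y), X_new)
```

A crontab entry such as `0 2 * * 1 python pipeline.py` (every Monday at 02:00) would then be the whole scheduler; environment pinning can come from a Conda environment or a `requirements.txt` instead of a Docker image.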

Tags: aws, docker, python

Replies to This Discussion

I would say yes. It gives you a repeatable batch process, version control of the environment, and proper governance over the models.

© 2020   Data Science Central ®