What is Docker for ML?
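Docker packages your ML code with its exact dependencies (Python version, libraries, CUDA user-space libraries) into an image that runs the same way on any host with a compatible CPU architecture and, for GPU work, a compatible NVIDIA driver, which stays on the host rather than in the image. For ML, where environments are notoriously fragile, it is how you make training and serving reproducible across your laptop, a GPU server, and the cloud.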
Why it matters
ML environments break constantly: a CUDA mismatch, an incompatible library version, a missing system package. "It trained on my machine" is a real and painful problem. Containers freeze the environment so a model that runs today runs the same next month and in production. It is the bridge from notebook to deployment.
What to learn
- Why ML environments are so fragile
- Writing a Dockerfile for a Python ML project
- Base images with the right Python and CUDA
- Pinning dependencies for reproducibility
- Keeping large model files and data out of the image
- GPU access from containers
- Separate images for training and serving
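To make the dependency-pinning item concrete, here is a minimal Python sketch of a check that could run before an image build and flag any requirement not pinned to one exact version. The function name `unpinned_requirements`, the sample package list, and the rule that only `name==version` counts as pinned are assumptions for illustration, not part of Docker or pip.

```python
import re

# Accepts "name==version", optional extras, optional environment marker,
# e.g. "torch==2.3.1" or "uvicorn[standard]==0.30.1". Wildcards are rejected.
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._ -]+\])?\s*==\s*[^\s;*]+(\s*;.*)?$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("-"):
            continue  # pip options such as -r or --index-url
        if not _PINNED.match(line):
            problems.append(line)
    return problems


# Illustrative requirements file content (hypothetical project).
example = """\
numpy==1.26.4
torch>=2.0
scikit-learn
pandas==2.2.2  # exact pin
"""

print(unpinned_requirements(example))  # prints ['torch>=2.0', 'scikit-learn']
```

A check like this only covers the packages you list directly. Transitive dependencies still float unless they are pinned too, for example by committing a fully resolved lock file.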
Common pitfall
Baking large datasets or model weights into the image, producing multi-gigabyte images that are slow to build and push. Data and weights belong in mounted volumes or object storage, not in image layers. Keep the image to code and dependencies, and bring the heavy artifacts in at run time.
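One way to keep heavy artifacts out of the image is to have the code look them up at run time from a mounted path. The following is a minimal sketch under assumptions: the environment variable names `DATA_DIR` and `MODEL_DIR`, the default mount points `/mnt/data` and `/mnt/models`, and both helper functions are hypothetical choices, not a Docker convention.

```python
import os
from pathlib import Path


def resolve_artifact_dir(env_var: str, default: str) -> Path:
    """Return the directory named by env_var, or the default mount point."""
    return Path(os.environ.get(env_var, default))


def require_file(directory: Path, name: str) -> Path:
    """Return directory/name, failing early if the file is not there."""
    path = directory / name
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; is the volume mounted?")
    return path


# Hypothetical defaults: the image contains neither directory's contents.
DATA_DIR = resolve_artifact_dir("DATA_DIR", "/mnt/data")
MODEL_DIR = resolve_artifact_dir("MODEL_DIR", "/mnt/models")

if __name__ == "__main__":
    print(f"reading data from {DATA_DIR}, weights from {MODEL_DIR}")
```

With this layout the same image works wherever the artifacts live: mount a host directory with `docker run -v` at the expected path, or override the location with `docker run -e`. Failing early with a clear message saves a long wait when the mount was forgotten.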
Resources
Primary (free):
- Docker — Get started · docs
- Docker — GPU support · docs
- NVIDIA — Container toolkit · docs
Practice
Write a Dockerfile for a small training script with pinned dependencies, keeping the dataset outside the image via a mounted volume. Build it and run the training in the container. If you have a GPU, confirm the container can use it. Done when the same image trains the model on a different machine.
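To check the "done when" condition, it helps to have evidence that two machines really ran the same environment. The sketch below, in Python, records a fingerprint of the interpreter and installed packages and reports whether a GPU is visible. The function names and the fingerprint format are assumptions for illustration. The GPU check assumes `nvidia-smi` is on the container's `PATH`, which is typical when the NVIDIA Container Toolkit exposes a GPU but is not guaranteed.

```python
import hashlib
import json
import platform
import shutil
import subprocess
from importlib import metadata


def environment_fingerprint() -> dict:
    """Summarize the Python version and installed packages, plus a digest."""
    packages = sorted(
        {
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata["Name"]  # skip distributions with broken metadata
        }
    )
    summary = {"python": platform.python_version(), "packages": packages}
    digest = hashlib.sha256(
        json.dumps(summary, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {**summary, "digest": digest}


def visible_gpus() -> list[str]:
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


if __name__ == "__main__":
    fingerprint = environment_fingerprint()
    print("environment digest:", fingerprint["digest"])
    print("visible GPUs:", visible_gpus() or "none")
```

Run it inside the container on each machine: matching digests indicate the same Python version and package set. The digest says nothing about the host driver or hardware, so differing training results can still come from those.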
Outcomes
- Containerize an ML project with pinned dependencies.
- Keep data and weights out of the image.
- Give a container GPU access.
- Reproduce a training environment across machines.