KServe on Kubernetes: Serving ML Models in Production
Learn how to install and configure KServe on Kubernetes for production ML model serving — InferenceService, autoscaling, and canary deployments.
February 23, 2026 · Luca Berton
What Is KServe?
KServe (formerly KFServing) is a Kubernetes-native platform for serving ML models. It provides:
- Serverless inference — scale to zero when idle
- Autoscaling — handle traffic spikes automatically
- Canary deployments — roll out new model versions safely
- Multi-framework support — TensorFlow, PyTorch, scikit-learn, XGBoost
Installing KServe
Prerequisites
- A running Kubernetes cluster (Kind, Minikube, or a cloud provider)
- kubectl configured for that cluster
- cert-manager installed (the KServe manifests use admission webhooks that need certificates)

For serverless features such as scale-to-zero, Knative Serving and a networking layer (for example Istio) must also be installed.
Install with kubectl
```bash
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.12.0/kserve.yaml
```
Verify the installation:
```bash
kubectl get pods -n kserve
kubectl get pods -n kserve | grep kserve-controller
```
Creating an InferenceService
Here's a minimal InferenceService for an MLflow model:
```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: wine-quality-model
spec:
  predictor:
    model:
      modelFormat:
        name: mlflow
      storageUri: "gs://your-bucket/mlflow-model"
```
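The `modelFormat` block is where KServe's multi-framework support shows up: switching frameworks usually means changing only that section. As a sketch, a scikit-learn model would look like this (the name and bucket path are hypothetical):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: wine-quality-sklearn
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn           # swap the format, keep the rest of the spec
      storageUri: "gs://your-bucket/sklearn-model"
```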
Deploy it:
```bash
kubectl apply -f inference-service.yaml
```
Check readiness:
```bash
kubectl get inferenceservice wine-quality-model
```
Sending Inference Requests
Look up the service hostname and send a V2 inference request (this assumes the cluster's ingress gateway has been port-forwarded to localhost:8080):

```bash
MODEL_NAME=wine-quality-model
SERVICE_HOSTNAME=$(kubectl get inferenceservice $MODEL_NAME \
  -o jsonpath='{.status.url}' | cut -d"/" -f3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  http://localhost:8080/v2/models/$MODEL_NAME/infer \
  -d '{
    "inputs": [{
      "name": "input",
      "shape": [1, 13],
      "datatype": "FP32",
      "data": [7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5.0, 6.0]
    }]
  }'
```
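The same request can be sent from application code. Here is a minimal Python sketch using only the standard library; it builds the V2 inference payload shown above, and `infer` assumes the same port-forwarded gateway and Host-header routing as the curl example:

```python
import json
import urllib.request


def build_v2_payload(features):
    """Build a KServe V2 inference request body for a single row of features."""
    return {
        "inputs": [{
            "name": "input",
            "shape": [1, len(features)],
            "datatype": "FP32",
            "data": features,
        }]
    }


def infer(host, model_name, features, gateway="http://localhost:8080"):
    """POST the payload to the model's V2 endpoint via the ingress gateway."""
    body = json.dumps(build_v2_payload(features)).encode()
    req = urllib.request.Request(
        f"{gateway}/v2/models/{model_name}/infer",
        data=body,
        headers={"Host": host, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


features = [7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0,
            0.9978, 3.51, 0.56, 9.4, 5.0, 6.0]
payload = build_v2_payload(features)
print(payload["inputs"][0]["shape"])  # [1, 13]
```

Calling `infer(SERVICE_HOSTNAME, "wine-quality-model", features)` mirrors the curl command one-for-one.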
Autoscaling
KServe automatically scales based on traffic:
```yaml
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 10
    model:
      modelFormat:
        name: mlflow
      storageUri: "gs://your-bucket/model"
```
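Scale-to-zero is configured the same way. A sketch, assuming the Knative-backed serverless mode (`scaleMetric` and `scaleTarget` are optional tuning knobs):

```yaml
spec:
  predictor:
    minReplicas: 0            # allow the predictor to scale to zero when idle
    maxReplicas: 10
    scaleMetric: concurrency  # autoscale on in-flight requests
    scaleTarget: 10           # target concurrent requests per replica
    model:
      modelFormat:
        name: mlflow
      storageUri: "gs://your-bucket/model"
```

Note that the first request after a scale-to-zero period incurs a cold-start delay while a replica spins up.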
Canary Deployments
Roll out a new model version to 20% of traffic:
```yaml
spec:
  predictor:
    canaryTrafficPercent: 20
    model:
      modelFormat:
        name: mlflow
      storageUri: "gs://your-bucket/model-v2"
```
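Once the canary looks healthy, promotion is just another spec change. One way to do it (a sketch, not the only workflow) is to raise the percentage until the new revision takes all traffic:

```yaml
spec:
  predictor:
    canaryTrafficPercent: 100  # send all traffic to the new revision
    model:
      modelFormat:
        name: mlflow
      storageUri: "gs://your-bucket/model-v2"
```

Rolling back is the mirror image: set `canaryTrafficPercent: 0` and traffic returns to the previous revision.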
Monitoring
Check pod health and logs:
```bash
kubectl get pods -l serving.kserve.io/inferenceservice=wine-quality-model
kubectl logs -l serving.kserve.io/inferenceservice=wine-quality-model
```
Learn More
Get hands-on experience deploying models with KServe in our MLflow for Kubernetes course — from training to production inference.