Kubeflow is an open source MLOps platform that makes it easy to deploy and manage an ML stack on Kubernetes. In this tutorial, I will demonstrate how to create an ML pipeline in Kubeflow. We will train and serve an image classification model using the MNIST dataset. The goal is to create a pipeline that fetches the data, pre-processes it, trains the model, and finally serves it for inference.

Install Kubeflow

Make sure your system has at least 8 vCPUs and 16 GB RAM. You also need to tweak kernel parameters so the node can support many pods. More info in the Kubeflow installation guide here.
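
For example, on a Linux host the Kubeflow manifests guide suggests raising the inotify limits so the many Kubeflow pods don't exhaust them (treat the exact values as indicative and check the guide for your setup):

sudo sysctl fs.inotify.max_user_instances=2280
sudo sysctl fs.inotify.max_user_watches=1255360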

Make sure to install all Kubeflow components and wait several minutes. Eventually you should see that all the pods are healthy:

$ kubectl get pods -n kubeflow
NAME                                                     READY   STATUS    RESTARTS        AGE
admission-webhook-deployment-67fd864794-hhlr7            1/1     Running   12 (100m ago)   29d
cache-server-5945b96448-ptgk6                            2/2     Running   11 (100m ago)   29d
centraldashboard-5f49c896c7-v8nxz                        2/2     Running   11 (100m ago)   29d
jupyter-web-app-deployment-555bd4c7f6-s7sq5              2/2     Running   11 (100m ago)   29d
katib-controller-5674c8b4d6-kbx6m                        1/1     Running   15 (100m ago)   29d
katib-db-manager-85987474b8-7sfks                        1/1     Running   12 (100m ago)   29d
katib-mysql-c688997bd-4mhgr                              1/1     Running   12 (100m ago)   29d
katib-ui-585dc5766-s4xdl                                 2/2     Running   11 (100m ago)   29d
kserve-controller-manager-5fbbbcdd64-vjfrk               2/2     Running   24 (100m ago)   29d
kserve-localmodel-controller-manager-5fcbb75c44-gp5r2    2/2     Running   11 (100m ago)   29d
kserve-models-web-app-678949ffdd-lfhv2                   2/2     Running   11 (100m ago)   29d
kubeflow-pipelines-profile-controller-699dc67f96-wxwgl   1/1     Running   12 (100m ago)   29d
metacontroller-0                                         1/1     Running   13 (100m ago)   29d
metadata-envoy-deployment-78dc9bd89-8x7j9                1/1     Running   12 (100m ago)   29d
metadata-grpc-deployment-6786fdf748-rwqx2                2/2     Running   21 (100m ago)   29d
metadata-writer-b74948545-l6hrd                          2/2     Running   15 (100m ago)   29d
minio-6d486b66cd-4wb5x                                   2/2     Running   11 (100m ago)   29d
ml-pipeline-65ff55599d-gk4gp                             2/2     Running   15 (100m ago)   29d
ml-pipeline-persistenceagent-c58647ff5-qmqsl             2/2     Running   11 (100m ago)   29d
ml-pipeline-scheduledworkflow-6d8dc9b889-x4kl6           2/2     Running   11 (100m ago)   29d
ml-pipeline-ui-5f96555b97-p82nn                          2/2     Running   11 (100m ago)   29d
ml-pipeline-viewer-crd-5745b89f8f-4hvhb                  2/2     Running   11 (100m ago)   29d
ml-pipeline-visualizationserver-64dbbb8d96-vxcwk         2/2     Running   11 (100m ago)   29d
mysql-6868b5b465-hfnws                                   2/2     Running   11 (100m ago)   29d
notebook-controller-deployment-c4f4fb986-vtqk9           2/2     Running   11 (100m ago)   29d
profiles-deployment-d675596d7-rpdfs                      3/3     Running   22 (100m ago)   29d
pvcviewer-controller-manager-556b9c9586-zn7p2            3/3     Running   22 (100m ago)   29d
spark-operator-controller-6dfb845b84-4bk6v               1/1     Running   14 (100m ago)   29d
spark-operator-webhook-5746fc6666-h9chf                  1/1     Running   13 (100m ago)   29d
tensorboard-controller-deployment-76d7f8f55-zp6cq        3/3     Running   22 (100m ago)   29d
tensorboards-web-app-deployment-66d4b74977-c78hr         2/2     Running   11 (100m ago)   29d
training-operator-7597cf8fcc-dkjjj                       1/1     Running   17 (100m ago)   29d
volumes-web-app-deployment-5698b6c5c9-nwf2x              2/2     Running   11 (100m ago)   29d
workflow-controller-75b848f885-gl57x                     2/2     Running   11 (100m ago)   29d

Explore Kubeflow

To access the Kubeflow dashboard, forward the ingress gateway port:

kubectl port-forward -n istio-system svc/istio-ingressgateway 8000:80

Visit localhost:8000. You should see the login form. The default email is user@example.com and the password is 12341234.

Preparing the notebook

Kubeflow Notebooks provides web-based development environments that run inside Pods on your Kubernetes cluster. Before we create a notebook in Kubeflow, we need to allow notebooks to access the Kubeflow Pipelines API. Apply the manifest below:

apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: kubeflow-user-example-com  #YOUR_USER_PROFILE_NAMESPACE
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org      
  volumeMounts:
    - mountPath: /var/run/secrets/kubeflow/pipelines
      name: volume-kf-pipeline-token
      readOnly: true
  env:
    - name: KF_PIPELINES_SA_TOKEN_PATH
      value: /var/run/secrets/kubeflow/pipelines/token
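
Save the manifest to a file (the filename below is just an example) and apply it:

kubectl apply -f access-ml-pipeline.yaml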

Now, while you are in the kubeflow-user-example-com namespace, create a notebook from the dashboard via Notebooks > New Notebook.

Name it anything and make sure to select the jupyter-tensorflow-full:v1.10.0 image.

Select at least 2 CPUs and 4 GB RAM.

Under Advanced Options, make sure to select the Allow access to Kubeflow Pipelines configuration. It won't show up if you haven't applied the manifest above. With this configuration, we will have access to Kubeflow Pipelines directly from the notebook.

Now select Launch and wait a few minutes. Once the notebook is ready, click Connect.
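
Once connected, you can verify from a notebook cell that the PodDefault took effect; a small check based on the env var and mount path defined in the manifest above:

import os

# KF_PIPELINES_SA_TOKEN_PATH is injected by the PodDefault we applied earlier.
print(os.environ.get("KF_PIPELINES_SA_TOKEN_PATH"))
# The projected ServiceAccount token should be mounted at this path.
print(os.path.exists("/var/run/secrets/kubeflow/pipelines/token"))

The first line should print the token path and the second should print True.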

Explore the notebook

While in the notebook, open a terminal and clone the following repo:

git clone https://github.com/k4mrul/kubeflow-mnist
cd kubeflow-mnist

Also, make sure to install the following packages

pip install minio==7.2.15
pip install kserve==0.15.2

We will use the minio package to upload the model to MinIO storage (which ships with the Kubeflow components) and the kserve package to test our model against test data.
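
As a taste of what the minio package does for us, here is a minimal upload sketch. The endpoint and credentials are the stock defaults of a Kubeflow manifests install, and the bucket and object names are just examples; adjust them if your setup differs:

from minio import Minio

# Connect to the in-cluster MinIO that ships with Kubeflow.
client = Minio(
    "minio-service.kubeflow:9000",  # default in-cluster MinIO endpoint
    access_key="minio",             # stock default credentials; change in production
    secret_key="minio123",
    secure=False,
)

bucket = "mlpipeline"  # bucket created by default for Kubeflow Pipelines
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a locally saved model file into the bucket.
client.fput_object(bucket, "models/digits-recognizer/model.keras", "model.keras")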

Open digits_recognize.ipynb in Jupyter. The notebook includes the following steps (a condensed sketch of the training code follows the list):

  • Importing the MNIST handwritten digits dataset
  • Exploring and preparing the data
  • Building and training a model to recognize digits
  • Evaluating model accuracy and visualizing results with a confusion matrix
  • Saving and exporting the trained model to MinIO storage for deployment
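
To give a feel for the training step, here is a condensed sketch of the kind of Keras code the notebook runs; the architecture and hyperparameters are illustrative, not the repo's exact values:

from tensorflow import keras

# Load and normalize the MNIST images to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# A small convolutional classifier for the ten digit classes.
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)

# Evaluate on the held-out test set and save the model for upload to MinIO.
print(model.evaluate(x_test, y_test))
model.save("model.keras")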

Running the pipeline

When you are done exploring, open the digits_recognize_pipeline.ipynb notebook. It triggers a Kubeflow pipeline that builds a machine learning pipeline for digit recognition using the MNIST dataset: it loads the data and uploads it to MinIO storage, reshapes and normalizes it, trains a deep learning model, evaluates its performance, and then deploys the trained model for serving with KServe.
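
For orientation, here is a minimal KFP v2 sketch of what such a pipeline definition looks like; the component body, base image, and names are illustrative rather than the repo's actual code:

import kfp
from kfp import dsl

@dsl.component(base_image="tensorflow/tensorflow:2.16.1")
def train(epochs: int) -> float:
    # ...load data from MinIO, train the model, upload it back to MinIO...
    return 0.0  # placeholder for the achieved test accuracy

@dsl.pipeline(name="digits-recognizer")
def digits_pipeline(epochs: int = 3):
    train(epochs=epochs)

# Inside a Kubeflow notebook, kfp.Client() authenticates with the projected
# ServiceAccount token mounted by the PodDefault we applied earlier.
client = kfp.Client()
client.create_run_from_pipeline_func(digits_pipeline, arguments={"epochs": 3})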

(Note: if you get a 401 authorization error even after allowing the notebook to access Kubeflow Pipelines, apply this manifest.)

If you go to Pipelines > Runs, you should see that the pipeline has been triggered and is running.

If you click on it, you should see that all the steps passed successfully.

Testing the model inference

Go to KServe Endpoints. You should see that the digits-recognizer inference service is ready and healthy.

Now we will test the model. Open the kserve-test.ipynb notebook. It tests the deployed KServe model for MNIST digit recognition: it sends an example image of the digit “5” to the model’s prediction endpoint, receives the predicted probabilities, and prints the predicted digit along with its one-hot encoding.
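
As a rough sketch of what that test looks like (the service name and namespace come from this tutorial, and the request format assumes a TensorFlow V1 predict endpoint; depending on your mesh configuration you may need the cluster-local address instead of the external URL):

import numpy as np
import requests
from kserve import KServeClient
from tensorflow import keras

# Look up the deployed inference service and its URL via the KServe SDK.
kclient = KServeClient()
isvc = kclient.get("digits-recognizer", namespace="kubeflow-user-example-com")
url = isvc["status"]["url"] + "/v1/models/digits-recognizer:predict"

# Send one normalized test image as a V1 predict request.
(_, _), (x_test, y_test) = keras.datasets.mnist.load_data()
image = (x_test[0].astype("float32") / 255.0).reshape(1, 28, 28, 1)
resp = requests.post(url, json={"instances": image.tolist()})
probs = resp.json()["predictions"][0]

print("Actual Number:", y_test[0])
print("Predicted digit:", int(np.argmax(probs)))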

Run the cell (make sure you have installed the kserve package).

You should see output like this

Actual Number: 5
One-hot: [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
Predicted digit: 5

Our trained model can predict the number accurately.

And that’s it: we have successfully configured a Kubeflow pipeline for training and deploying a model.