Cilium is an advanced, eBPF-based CNI plugin for Kubernetes that offers direct routing, network policies, load balancing, and observability (real-time network traffic monitoring and tracing with Hubble), and can eliminate the need for kube-proxy and iptables.

One of the notable features of Cilium is its deep integration with eBPF, which enables advanced networking capabilities, such as direct routing, without the performance penalties associated with overlay networks. This results in better scalability and improved resource usage, especially for large-scale clusters.

Cilium also removes the need for separate solutions like MetalLB, a bare-metal load-balancing solution. In this tutorial, we will walk you through setting up Cilium’s LB IPAM feature to expose a local LoadBalancer-type service, working in conjunction with BGP.

BGP is used to exchange routing information. It allows networks to learn and advertise routes to reach different destinations. We will set up BGP on our router so that it can peer with Cilium and learn the routes Cilium advertises.

Prerequisites

  • A local Kubernetes cluster without a CNI network plugin installed (see the sketch after this list)
  • The Cilium CLI installed on your host machine
  • A single host subnet; in our case, 192.168.8.0/24 is used for all the nodes
  • A router based on OpenWrt or MikroTik
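
If you don’t have such a cluster yet, one common way to bootstrap one is with kubeadm, which does not install a CNI plugin by default. This is only a rough sketch; the advertise address below matches my control-plane node and should be adjusted to your environment.

# On the control-plane node (adjust the address to your own node IP)
sudo kubeadm init --apiserver-advertise-address=192.168.8.220
# Then run the 'kubeadm join ...' command printed by init on each worker node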

Install Cilium & manifests

In my setup, I don’t have a CNI plugin. Without it, pod networking doesn’t work, so the nodes show as NotReady.

$ kubectl get nodes -owide
NAME       STATUS      ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kmaster    NotReady    control-plane   158m   v1.31.1   192.168.8.220   <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.7.22
kworker1   NotReady    <none>          157m   v1.31.1   192.168.8.221   <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.7.22
kworker2   NotReady    <none>          156m   v1.31.1   192.168.8.222   <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.7.2

Some pods are also stuck in the Pending state.

$ kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
coredns-7c65d6cfc9-hgsgb          0/1     Pending   0          3m6s
coredns-7c65d6cfc9-hmgtx          0/1     Pending   0          3m6s
etcd-kmaster                      1/1     Running   0          3m12s
kube-apiserver-kmaster            1/1     Running   0          3m12s
kube-controller-manager-kmaster   1/1     Running   0          3m12s
kube-proxy-59mgv                  1/1     Running   0          3m6s
kube-proxy-hsffx                  1/1     Running   0          116s
kube-proxy-qpdt6                  1/1     Running   0          39s
kube-scheduler-kmaster            1/1     Running   0          3m12s

Let’s fix this by installing the Cilium CNI.

Install Cilium CNI

git clone git@github.com:cilium/cilium.git
cd cilium
cilium install --chart-directory ./install/kubernetes/cilium --set bgpControlPlane.enabled=true

The above command installs Cilium and enables the BGP control plane.
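
Optionally, you can wait for Cilium to report a healthy status with the Cilium CLI before moving on:

cilium status --wait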

Wait a few minutes and recheck the node status.

$ kubectl get nodes -owide
NAME       STATUS   ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kmaster    Ready    control-plane   258m   v1.31.1   192.168.8.220   <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.7.22
kworker1   Ready    <none>          257m   v1.31.1   192.168.8.221   <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.7.22
kworker2   Ready    <none>          256m   v1.31.1   192.168.8.222   <none>        Ubuntu 22.04.1 LTS   5.15.0-58-generic   containerd://1.7.2

All the nodes are now in the Ready state, and the pods are Running.

$ kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
cilium-6x9k8                       1/1     Running   0          7m36s
cilium-envoy-4d96r                 1/1     Running   0          7m36s
cilium-envoy-fk25s                 1/1     Running   0          7m36s
cilium-envoy-h78k6                 1/1     Running   0          7m36s
cilium-jqt7b                       1/1     Running   0          7m36s
cilium-operator-864795745c-xqr64   1/1     Running   0          7m36s
cilium-vbpcd                       1/1     Running   0          7m36s
coredns-7c65d6cfc9-58d8z           1/1     Running   0          14m
coredns-7c65d6cfc9-tlwmv           1/1     Running   0          14m
etcd-kmaster                       1/1     Running   0          14m
kube-apiserver-kmaster             1/1     Running   0          14m
kube-controller-manager-kmaster    1/1     Running   0          14m
kube-proxy-mpq5d                   1/1     Running   0          12m
kube-proxy-mxzlt                   1/1     Running   0          14m
kube-proxy-snhqm                   1/1     Running   0          11m
kube-scheduler-kmaster             1/1     Running   0          14m

Install BGP manifests

Create a CiliumBGPAdvertisement resource, which specifies which CIDRs and services should be advertised via BGP.

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPAdvertisement 
metadata:
  name: pod-cidr
  labels:
    advertise: pod-cidr
spec:
  advertisements:
    - advertisementType: "PodCIDR"
    - advertisementType: "Service"
      service:
        addresses:         
          - LoadBalancerIP
      selector:            
        matchExpressions:
         - {key: somekey, operator: NotIn, values: ['never-used-value']}   #Select all the LoadBalancer services to be available externally   
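
Save the manifest to a file and apply it with kubectl. The filename below is arbitrary; use whatever name you prefer.

kubectl apply -f bgp-advertisement.yaml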

Create a CiliumBGPPeerConfig resource, which configures how a peering behaves and can refer to one or multiple CiliumBGPAdvertisement resources (using selectors).

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeerConfig 
metadata:
  name: peer-config
spec:
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: "pod-cidr"

Finally, create a CiliumBGPClusterConfig resource, which sets up the peering endpoints and can refer to one or multiple CiliumBGPPeerConfig resources (by name). In nodeSelector, we specify the node label: BGP peering will be applied to the nodes that carry the rack=rack0 label. Also, set peerAddress to your router’s IP address.

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPClusterConfig 
metadata:
  name: bgp-cluster-config
spec:
  nodeSelector:
    matchLabels:
      rack: rack0  #Should match the label with nodes
  bgpInstances:
    - name: "instance-64512"
      localASN: 64512
      peers:
        - name: "peer-64512-rack0"
          peerASN: 64512
          peerAddress: "192.168.8.1"  #Router IP address
          peerConfigRef:
            name: "peer-config"

Label the worker nodes

Next, we will label the nodes to which our BGP configuration will be applied:

kubectl label nodes kworker1 rack=rack0
kubectl label nodes kworker2 rack=rack0
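
You can verify that the label was applied by listing nodes with a label selector:

kubectl get nodes -l rack=rack0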

Check the peering status

Run cilium bgp peers. You should see the peering information:

$ cilium bgp peers
Node       Local AS   Peer AS   Peer Address   Session State   Uptime   Family         Received   Advertised
kworker1   64512      64512     192.168.8.1    active          0s       ipv4/unicast   0          0    
kworker2   64512      64512     192.168.8.1    active          0s       ipv4/unicast   0          0 

The nodes’ session state is active; however, since we haven’t configured our upstream router to peer with the nodes, the connections haven’t been established yet.

LB IPAM

In our bare-metal Kubernetes cluster, we use Cilium to assign IP addresses to LoadBalancer services. Cilium’s BGP Control Plane then advertises the IP addresses assigned by LB IPAM to the upstream router.

Let’s apply Load balancer IPAM manifest:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "lb-pool"
spec:
  blocks:
    - cidr: "20.0.10.0/24"

Cilium will allocate LoadBalancer IPs from the 20.0.10.0/24 CIDR. IP pools are not allowed to have overlapping CIDRs.

$ kubectl get CiliumLoadBalancerIPPool lb-pool
NAME      DISABLED   CONFLICTING   IPS AVAILABLE   AGE
lb-pool   false      False         256             10s

The CONFLICTING status should be False.

Install the quagga package on the OpenWrt router

Quagga is an open-source routing software suite that provides BGPv4 for Unix platforms. Let’s install the quagga package on our upstream router. I will set up quagga on my GL-AX1800 router, which is based on OpenWrt. SSH into the router and run the following commands:

opkg update
opkg install quagga quagga-bgpd quagga-zebra

Modify the BGP config file

Edit /etc/quagga/bgpd.conf and replace the config with the following. Make sure to set the router-id (the router’s IP address) and the neighbor IPs, which are the nodes’ internal IP addresses.

password zebra
router bgp 64512
 bgp router-id 192.168.8.1
 neighbor 192.168.8.221 remote-as 64512
 neighbor 192.168.8.222 remote-as 64512

Save and exit, then restart and enable the quagga service:

/etc/init.d/quagga restart
/etc/init.d/quagga enable
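
To inspect the session from the router itself, one option is bgpd’s built-in VTY, which Quagga exposes on localhost port 2605 and protects with the password from bgpd.conf (this assumes a telnet client is available on the router):

telnet localhost 2605
# at the password prompt, enter the VTY password ("zebra" in the config above)
# then at the bgpd> prompt, run:
show ip bgp summary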

Test

BGP peering status

Let’s check the BGP peering status again. After a few minutes, running the cilium bgp peers command should show the session state as established.

$ cilium bgp peers
Node       Local AS   Peer AS   Peer Address   Session State   Uptime   Family         Received   Advertised
kworker1   64512      64512     192.168.8.1    established     1m17s    ipv4/unicast   0          1
kworker2   64512      64512     192.168.8.1    established     25s      ipv4/unicast   0          1

On the router side, running ip route, you should see the corresponding routes in the routing table.

root@GL-AX1800:~# ip route
default via 172.16.100.39 dev pppoe-wan proto static metric 10
default via 192.168.5.1 dev pppoe-Zoom proto static metric 20
10.0.1.0/24 via 192.168.8.222 dev br-lan proto zebra metric 20
10.0.2.0/24 via 192.168.8.221 dev br-lan proto zebra metric 20
20.0.10.0 via 192.168.8.221 dev br-lan proto zebra metric 20
172.16.100.39 dev pppoe-wan proto kernel scope link src 103.181.42.28
192.168.5.1 dev pppoe-Zoom proto kernel scope link src 10.28.33.116
192.168.8.0/24 dev br-lan proto kernel scope link src 192.168.8.1

Expose a pod service with type LoadBalancer

Now let’s create a pod with a LoadBalancer-type service. Apply the following manifest:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  namespace: default
  labels:
    app: nginx-pod
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    ports:
    - containerPort: 80
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: nginx-service
spec:
  selector:
    app: nginx-pod
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
  type: LoadBalancer
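
Save the manifest and apply it (again, the filename is arbitrary):

kubectl apply -f nginx.yaml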

After applying the above manifest, you should see that the service gets a LoadBalancer IP.

$ kubectl get svc -n default
NAME            TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
kubernetes      ClusterIP      10.96.0.1        <none>        443/TCP             1h
nginx-service   LoadBalancer   10.105.189.103   20.0.10.0     8080:31586/TCP      1h

Cilium assigned an external IP for our load balancer from the IP pool and advertised the route through BGP. Wait a few minutes for the route to propagate.

If you visit http://20.0.10.0:8080 from your network, you should see the famous Welcome to nginx page!
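
You can also check it from the command line; a plain curl against the LoadBalancer IP should return an HTTP 200 with the default nginx page:

curl -I http://20.0.10.0:8080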

Thanks for reading 😀