Untangling the Kubernetes Networking Puzzle: Mastering Load Balancing, BGP, IPVS, and More

A Guide to Navigating the Complexities of Kubernetes Network Infrastructure

Ever wonder how a bustling city keeps traffic flowing smoothly? Imagine Kubernetes as a city where applications (pods) are constantly on the move, needing efficient routes to communicate with each other and the outside world. Without a solid transportation system, chaos ensues — and the same is true for Kubernetes networking.

Think of roads as the network, traffic rules as routing protocols, bus stops as services, and dispatch centers as load balancers. Each piece plays a crucial role in making sure data gets where it needs to go.

At the heart of this system lies the Container Network Interface (CNI) — the city planning department. The CNI sets the rules for how everything connects, ensuring pods can communicate seamlessly, no matter which node they’re running on.

Choosing a CNI implementation like Calico, Cilium, or Flannel is like deciding on your city’s infrastructure. Will you build wide highways for high-speed traffic, or a secure network of monitored lanes? Your choice impacts performance, security, and scalability — just like a city’s road layout affects traffic flow and accessibility.

A poorly planned city leads to gridlock; a poorly planned Kubernetes network leads to bottlenecks. Let's dive in and design the network that keeps your cluster running like a well-oiled machine.

The Big Picture of Kubernetes Networking

Kubernetes networking can feel like a maze, with countless websites and articles making it tough to piece everything together. I've done that work for you, and now I want to simplify it.

In this guide, we’ll quickly and clearly explore how these key concepts connect and integrate within Kubernetes:

  • Load balancing

  • IPVS

  • iptables

  • BGP

  • Bridge

  • CNI

  • PureLB

  • Endpoints

  • Services (svc)

  • Overlay and underlay networks

  • IP-in-IP (ipip)

  • kube-proxy

  • Ingress controllers

Let's break it down — fast and step-by-step.

1. The Relationship Between CNI, Load Balancer Controller, and Kube-Proxy

Container Network Interface (CNI)

The CNI is responsible for configuring Kubernetes networking: it creates and configures a network interface for each pod and assigns it an IP address. When a pod is launched, the kubelet (through the container runtime) calls the CNI plugin to set up that interface.

CNI operates under two primary networking models: Encapsulated (Overlay) and Unencapsulated (Underlay).

Encapsulated (Overlay) Model

This model uses technologies like VXLAN and IPSec to create a Layer 2 network over a Layer 3 infrastructure. Essentially, it's a way of spanning the network across multiple Kubernetes nodes without needing to distribute routing information.

  • The virtual Layer 2 segment is isolated inside the tunnel, so pod routes do not need to be distributed to the underlying network.

  • An additional IP header is created to encapsulate the original IP packet.

  • This approach creates a virtual bridge connecting worker nodes and pods.

  • The Container Runtime Interface (CRI) is the API through which the kubelet drives the container runtime, which in turn invokes the CNI plugin during pod setup.

Use Case: Overlays are ideal for environments where simplicity in routing and isolation are priorities.
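
To make the encapsulation cost concrete, here is a minimal arithmetic sketch in plain Python (no cluster required): it adds up the extra headers a VXLAN overlay wraps around every pod packet and shows why overlay CNIs usually lower the pod-facing MTU. The header sizes are standard protocol values, not taken from any specific CNI.

```python
# Rough arithmetic for VXLAN overlay overhead (illustrative only).
OUTER_IP = 20    # outer IPv4 header added by the encapsulating node
OUTER_UDP = 8    # UDP header carrying the VXLAN payload
VXLAN = 8        # VXLAN header with the virtual network identifier (VNI)
INNER_ETH = 14   # the pod's original Ethernet header, now carried as payload

overhead = OUTER_IP + OUTER_UDP + VXLAN + INNER_ETH  # 50 bytes per packet
physical_mtu = 1500                                  # typical underlay Ethernet MTU
pod_mtu = physical_mtu - overhead                    # what the pod interface should use

print(f"Encapsulation overhead: {overhead} bytes per packet")
print(f"Pod-facing MTU on a {physical_mtu}-byte underlay: {pod_mtu}")
```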

Unencapsulated (Underlay) Model

In the underlay model, the network operates at Layer 3 (L3), where packets are routed directly between containers.

  • Worker nodes manage route distribution using BGP (Border Gateway Protocol) to share pod routing information dynamically.

  • Unlike overlays, no additional encapsulation happens — packets are routed natively between nodes.

  • This model extends the physical network's routing domain to the worker nodes, which participate directly in routing inter-pod traffic.

Use Case: Underlays are beneficial when performance and direct routing are critical, and BGP-based routing can be effectively managed.
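
To see what "sharing pod routing information dynamically" amounts to, here is a small, cluster-free Python sketch: every node advertises its pod CIDR, and each peer installs a route for that CIDR with the advertising node as the next hop, which is roughly what a BGP daemon like BIRD programs into the kernel. The node names, IPs, and CIDRs are invented for illustration.

```python
import ipaddress

# Hypothetical BGP advertisements: node name -> (node IP, pod CIDR it owns).
advertisements = {
    "node-1": ("10.0.0.1", "192.168.1.0/24"),
    "node-2": ("10.0.0.2", "192.168.2.0/24"),
    "node-3": ("10.0.0.3", "192.168.3.0/24"),
}

def build_routing_table(local_node):
    """One route per peer: pod CIDR -> next-hop node IP, with no encapsulation."""
    return {
        ipaddress.ip_network(pod_cidr): node_ip
        for node, (node_ip, pod_cidr) in advertisements.items()
        if node != local_node
    }

def next_hop(table, dst_pod_ip):
    """Find which peer node a packet should be sent to, based on its pod IP."""
    dst = ipaddress.ip_address(dst_pod_ip)
    for cidr, gateway in table.items():
        if dst in cidr:
            return gateway
    return None  # destination is local or unknown

table = build_routing_table("node-1")
print(next_hop(table, "192.168.2.17"))  # -> 10.0.0.2, i.e. routed via node-2
```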

How CNI, LB Controller, and Kube-Proxy Work Together

  1. CNI sets up the fundamental network interface and IP addresses for each container.

  2. Kube-Proxy manages routing rules and ensures that network traffic is efficiently forwarded to the appropriate pods.

  3. Load Balancer Controllers distribute external traffic among multiple pods to ensure high availability and scalability.

Together, these components form the backbone of Kubernetes networking, ensuring seamless communication within and outside the cluster.

Load Balancer Controllers: MetalLB, PureLB, and Their Role in Kubernetes

Load Balancer Controllers like MetalLB and PureLB provide the functionality behind the Kubernetes LoadBalancer service type. When a LoadBalancer service is created, the controller assigns it an external IP, which is added as a secondary address to a host network interface. From there, a BGP daemon such as BIRD can pick up the external IP and advertise it, dynamically updating routes, addresses, and other network settings.

Behavior When a New IP is Assigned

  1. Overlay Network (Encapsulated):

    • IPVS or iptables handles traffic distribution and forwarding.

  2. Underlay Network (Unencapsulated):

    • In addition to IPVS or iptables, a routing table is updated to manage direct routes between nodes.

In both cases, the network configuration involves elements like network policies, NAT (Network Address Translation), and forwarding rules to ensure proper traffic flow.

Example:

When you create a Service (svc), the kube-proxy component adds the necessary rules to iptables for traffic routing.
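
For a feel of what those rules look like, the sketch below generates simplified, kube-proxy-style iptables NAT rules for one ClusterIP Service: a jump from the Service address into a per-service chain, then probability-weighted random jumps into per-endpoint chains that DNAT to pod IPs. The KUBE-SVC-*/KUBE-SEP-* chain layout and the statistic-mode probabilities mirror kube-proxy's iptables mode, but the concrete names, IPs, and rule text here are invented and much abbreviated.

```python
# Illustrative generator of kube-proxy-style iptables NAT rules for one
# ClusterIP Service. The chain layout (KUBE-SERVICES -> KUBE-SVC-* -> KUBE-SEP-*)
# reflects kube-proxy's approach; the concrete names, IPs, and ports are made up.
service_ip, port = "10.96.0.15", 80
pod_ips = ["192.168.1.10", "192.168.2.11", "192.168.3.12"]

rules = [
    f"-A KUBE-SERVICES -d {service_ip}/32 -p tcp --dport {port} -j KUBE-SVC-DEMO"
]
remaining = len(pod_ips)
for i, pod_ip in enumerate(pod_ips):
    sep_chain = f"KUBE-SEP-{i}"
    if remaining > 1:
        # Match with probability 1/remaining so traffic spreads evenly overall.
        rules.append(
            f"-A KUBE-SVC-DEMO -m statistic --mode random "
            f"--probability {1.0 / remaining:.5f} -j {sep_chain}"
        )
    else:
        # The last endpoint catches whatever the earlier rules did not match.
        rules.append(f"-A KUBE-SVC-DEMO -j {sep_chain}")
    rules.append(f"-A {sep_chain} -p tcp -j DNAT --to-destination {pod_ip}:{port}")
    remaining -= 1

print("\n".join(rules))
```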

Key Points to Know

  • netfilter (the packet filtering framework behind iptables) can be replaced by eBPF for more efficient and dynamic packet processing.

  • iptables can be replaced by IPVS for more advanced and scalable load balancing.

Summary

  • Kube-Proxy: Maintains routing rules using iptables, IPVS, and other mechanisms to manage service traffic.

  • CNI: Provides a consistent interface to the underlying network, routes traffic to the correct destinations, and performs network configuration tasks.

  • Load Balancer Controllers: Assign external IPs for LoadBalancer services, handle load balancing functionality, and update the host network interface with secondary IP addresses.

2. POD to POD / Container to Container — single node (IP address based)

In Kubernetes, networking between pods or containers on the same node is facilitated by a combination of components: a custom bridge (CBR), veth (virtual Ethernet) pairs, and the node's Ethernet interface (eth0). The networking setup is driven by container runtimes such as containerd, CRI-O, and Mirantis Container Runtime. These runtimes leverage Container Network Interface (CNI) plugins like Calico, Flannel, and Cilium to handle the actual network configuration.

Key Concepts

  1. Shared Network Namespace:
    All containers within a pod share the same network namespace. This allows them to communicate with each other using localhost and share the same IP address.

  2. Pause Container:
    The pause container is a foundational container in Kubernetes responsible for managing the pod’s network setup and inter-process communication (IPC). It serves as the parent container holding the network namespace for all other containers within the pod.

  3. Veth (Virtual Ethernet):
    Each pod is assigned a Veth pair — a virtual network interface. One end of the Veth pair is attached to the pod, and the other end is connected to the Custom Bridge (CBR).

  4. Custom Bridge (CBR):
    The CBR functions as a Layer 2 (L2) bridge, forwarding packets between pods on the same node. When pod1 communicates with pod2 on the same node, the traffic passes through the CBR and no Network Address Translation (NAT) occurs.

Flow Example

  • Pod1 sends a packet to Pod2 on the same node.

  • The packet travels via the Veth interface to the CBR.

  • The CBR performs Layer 2 forwarding, delivering the packet directly to Pod2.

This setup ensures efficient, low-latency communication between pods on the same node.
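
A toy model helps show why this path needs neither NAT nor routing. The Python sketch below imitates the Layer 2 learning and forwarding the custom bridge performs: it remembers which veth "port" each pod MAC was seen on and forwards frames straight between them. This is a conceptual sketch only, not how the Linux bridge is actually implemented.

```python
# Toy model of the custom bridge (CBR): Layer 2 learning and forwarding between
# veth ports on the same node. Purely conceptual, not the Linux bridge itself.
class Bridge:
    def __init__(self):
        self.fdb = {}  # forwarding database: MAC address -> port (veth end)

    def receive(self, in_port, src_mac, dst_mac, payload):
        self.fdb[src_mac] = in_port      # learn which port the sender is on
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            return f"flood {payload!r} to all ports except {in_port}"
        return f"forward {payload!r} out {out_port}"  # direct L2 delivery, no NAT

cbr = Bridge()
# pod1 (on veth1) talks first, so the bridge learns its MAC...
print(cbr.receive("veth1", "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02", "ping pod2"))
# ...and pod2's reply is forwarded straight back out veth1.
print(cbr.receive("veth2", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01", "pong pod1"))
```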

3. POD to POD / Container to Container — multi node (IP address based)

In a multi-node Kubernetes cluster, communication between pods on different nodes is achieved through a well-orchestrated network setup managed by the Container Network Interface (CNI). Here’s a breakdown of how it works and the key components involved.

How a Pod’s IP Address is Routable Across Nodes

  1. Nodes on the Same Network:

    • For communication to work seamlessly, all nodes in the cluster are typically part of the same network and can reach each other directly.

  2. Routes Created by CNI:

    • The CNI plugin (e.g., Calico, Flannel, Cilium) dynamically creates and maintains routes for each pod on each node. These routes allow traffic to be directed to the appropriate node based on the pod’s IP address.

  3. Node-Specific CIDRs:

    • Each node is assigned a unique CIDR block (a range of IP addresses) for its pods. This segmentation makes it possible to determine which node a particular pod belongs to by looking at the pod’s IP address.

Routing Traffic Between Pods on Different Nodes

  1. Routing Table:

    • When a pod on node-1 (e.g., pod1) wants to communicate with a pod on node-2 (e.g., pod4), node-1’s routing table directs the packet to the appropriate interface.

  2. No MAC Address in CBR:

    • Since the Custom Bridge (CBR) on node-1 doesn’t know the MAC address of pod4 (on node-2), the packet is sent out through the specified interface. This interface could be:

      • A network tunnel (e.g., VXLAN or IP-in-IP)

      • eth0 (the primary network interface)

      • Another network interface depending on the CNI configuration

  3. Tunneling and Encapsulation:

    • For encapsulated (overlay) networks, protocols like VXLAN or IPsec are used to create tunnels between nodes.

    • For unencapsulated (underlay) networks, routing is handled directly at Layer 3 (L3), often with BGP (Border Gateway Protocol) managing the distribution of routing information.

Example Flow

  1. pod1 (on node-1) sends a packet to pod4 (on node-2).

  2. The routing table on node-1 directs the packet to the correct interface (tunnel or eth0).

  3. The packet travels across the network to node-2.

  4. On node-2, the packet is delivered to pod4 via its Veth interface and CBR.
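
The flow above boils down to one routing decision per packet: if the destination pod IP falls inside the local node's CIDR, hand it to the bridge; otherwise find the node that owns that CIDR and send the packet toward it, through a tunnel in the overlay case. The Python sketch below models that decision with made-up node addresses and CIDRs.

```python
import ipaddress

# Hypothetical node addresses and per-node pod CIDRs (illustration only).
node_cidrs = {
    "node-1": ("10.0.0.1", ipaddress.ip_network("192.168.1.0/24")),
    "node-2": ("10.0.0.2", ipaddress.ip_network("192.168.2.0/24")),
}

def route(local_node, dst_pod_ip):
    dst = ipaddress.ip_address(dst_pod_ip)
    _, local_cidr = node_cidrs[local_node]
    if dst in local_cidr:
        return "deliver locally via the veth pair / CBR"
    for node, (node_ip, cidr) in node_cidrs.items():
        if dst in cidr:
            # Overlay case: encapsulate (IPIP/VXLAN) and send toward that node.
            return f"encapsulate and forward to {node} at {node_ip}"
    return "no route: destination is outside every known pod CIDR"

print(route("node-1", "192.168.1.5"))  # same node  -> local delivery
print(route("node-1", "192.168.2.9"))  # other node -> tunnel to node-2
```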

Summary

  • Same Network: Nodes must be able to communicate directly.

  • CNI: Creates and manages pod routes across nodes.

  • Node CIDR: Each node has a unique IP range for its pods.

  • Routing Mechanism: Uses tunnels or direct interfaces depending on the CNI setup.

This architecture ensures efficient and scalable communication between pods, regardless of which node they reside on.

4. POD to POD / Container to Container — multi node (Service IP-based)

In Kubernetes, services play a crucial role in abstracting and load-balancing communication between pods. When pods across multiple nodes communicate via a Service IP, components like kube-proxy, iptables/IPVS, and CoreDNS work together to handle traffic efficiently.

How It Works: The Role of iptables/IPVS and Kube-Proxy

  1. Service IP Translation:

    • In Kubernetes, when a request is sent to a Service IP, it needs to be routed to one of the associated pod IP addresses.

    • Netfilter (via iptables or IPVS) handles this translation by selecting a destination pod IP: iptables mode uses probability-weighted random selection, while IPVS supports algorithms such as round-robin and least connections.

  2. Kube-Proxy's Role:

    • Kube-proxy is responsible for updating Netfilter rules, ensuring that Service IP addresses are mapped to the correct pod IP addresses dynamically.

    • When a node receives a packet with a Service IP as the destination, Netfilter matches the rule for that service and routes the packet to one of the backend pod IPs.

Endpoint Slices and Service Selectors

  • Endpoint Slices contain the IP addresses of pods that match a service’s selector (defined by labels).

  • When a Service is created, Kubernetes monitors pods that match the service selector and updates the endpoint slices with relevant pod information, such as:

    • Pod IP addresses

    • Ports

    • Protocols

Example Flow:

  1. A Service selects pods based on matching labels.

  2. The endpoint slice is updated with the selected pod IP addresses.

  3. Kube-proxy uses this information to create/update iptables/IPVS rules.
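
If you have cluster access, you can inspect this bookkeeping directly. The sketch below uses the official Python kubernetes client (assumed to be installed and pointed at a valid kubeconfig) to list the EndpointSlices behind a Service via the standard kubernetes.io/service-name label; the service name and namespace are placeholders.

```python
# Minimal sketch: list the pod addresses tracked in the EndpointSlices of a
# Service. Assumes the kubernetes Python client is installed and a kubeconfig
# is available; "my-service" and "default" are placeholder names.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod
discovery = client.DiscoveryV1Api()

slices = discovery.list_namespaced_endpoint_slice(
    namespace="default",
    label_selector="kubernetes.io/service-name=my-service",
)

for endpoint_slice in slices.items:
    ports = [p.port for p in (endpoint_slice.ports or [])]
    for endpoint in endpoint_slice.endpoints or []:
        # Pod IPs behind the Service, and the ports they serve.
        print(endpoint.addresses, ports)
```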

How DNS Resolves Service Names

  1. Service Discovery:

    • In addition to IP-based routing, Kubernetes supports name-based resolution for services.

    • CoreDNS (or the older kube-dns) acts as the cluster's DNS server and is exposed through a Service with a well-known, static ClusterIP.

  2. Name Resolution Inside Containers:

    • Each container is configured with this DNS server at startup (via its /etc/resolv.conf).

    • When a pod tries to reach a service by name (e.g., my-service.default.svc.cluster.local), the DNS query is handled by CoreDNS, which resolves the name to the correct Service IP.

Example:

  • When a container calls a service by its name, CoreDNS resolves it to the corresponding Service IP.

  • Netfilter then routes the request to an appropriate pod IP based on the rules set by kube-proxy.
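
From inside a pod, this resolution is just an ordinary DNS lookup against the cluster DNS server listed in the pod's /etc/resolv.conf. The short Python example below (standard library only) would print the Service's ClusterIP when run in a pod; the name is a placeholder and the lookup fails outside the cluster.

```python
import socket

# Run inside a pod: the cluster DNS server from /etc/resolv.conf resolves the
# Service name to its ClusterIP. The name below is a placeholder; outside the
# cluster this call raises socket.gaierror.
print(socket.gethostbyname("my-service.default.svc.cluster.local"))
```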

Summary

  • iptables/IPVS: Translates Service IPs to backend pod IPs using Netfilter rules.

  • Kube-proxy: Updates Netfilter rules to keep Service-to-Pod mappings current.

  • Endpoint Slices: Track pod IPs for services based on label selectors.

  • CoreDNS: Resolves service names to Service IPs for seamless name-based communication.

This dynamic system allows Kubernetes services to efficiently load-balance traffic, abstract pod IPs, and support both IP-based and name-based routing within a cluster.

Are you ready for the BIG picture? It’s time.

Integrating Calico (CNI), IPVS, PureLB, IPIP (Overlay), and Ingress Controller: Here’s a complete breakdown of how these components interact within a Kubernetes cluster, forming a cohesive networking architecture.

1. Role of Calico (CNI)

  • Calico handles all networking operations within the cluster, supporting both:

    • Overlay Networking (IPIP): Uses IP-in-IP tunneling to connect pods across different nodes.

    • Underlay Networking (BGP): Uses Border Gateway Protocol (BGP) to manage routing distribution between nodes.

  • kube-ipvs0: In this setup, Service IP addresses are bound to the kube-ipvs0 dummy interface so that IPVS can accept traffic addressed to them.

  • Each pod has a virtual (veth) interface connecting it to the node's network.

  • tunl0: A virtual interface for IPIP tunnels, allowing pods’ IP traffic to traverse nodes seamlessly.
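
On a node running Calico you can observe both pieces, the BIRD-learned routes and the tunl0 device, in the kernel routing table itself. The small sketch below simply shells out to ip route and filters for those markers; it is meant to be run on a cluster node and prints nothing interesting elsewhere.

```python
import subprocess

# On a Calico node, routes learned over BGP are tagged "proto bird", and
# cross-node pod routes use the tunl0 device when the IPIP overlay is enabled.
result = subprocess.run(["ip", "route"], capture_output=True, text=True, check=True)

for line in result.stdout.splitlines():
    if "bird" in line or "tunl0" in line:
        print(line)
```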

2. PureLB (Load Balancer Controller)

  • PureLB provides the LoadBalancer service type functionality in Kubernetes.

  • It creates a virtual interface called kube-lb0 to manage LoadBalancer IPs.

  • Secondary IP Addresses: PureLB adds external LoadBalancer IPs to the primary network interface.

  • BGP Compatibility: Since Calico already runs a BIRD BGP daemon, PureLB integrates with it seamlessly and doesn't need to deploy another BGP router.

  • Routing: BGP dynamically updates routing tables with IPs assigned to interfaces.

3. IPVS (IP Virtual Server)

  • IPVS replaces iptables for efficient load balancing and routing.

  • Services, Endpoints, NodePorts, and LoadBalancer IPs are all represented as virtual-server and real-server entries (rules) in IPVS.

  • When a service IP is called, IPVS translates it to the corresponding backend pod IP based on load-balancing rules.
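
Under the hood, IPVS keeps a table of virtual servers (Service IP and port) and real servers (pod IP and port), and picks a real server per connection with a scheduling algorithm such as round-robin or least connections. The Python sketch below simulates those two schedulers to show the difference; it does not talk to IPVS itself (that would require ipvsadm or the kernel's netlink interface).

```python
import itertools

# Simulated IPVS virtual server: one Service backed by three pod endpoints.
real_servers = ["192.168.1.10:80", "192.168.2.11:80", "192.168.3.12:80"]

# Round-robin: hand out backends in a fixed cycle.
round_robin = itertools.cycle(real_servers)
print([next(round_robin) for _ in range(5)])

# Least connections: always pick the backend with the fewest active connections.
active = {server: 0 for server in real_servers}

def least_connections():
    backend = min(active, key=active.get)
    active[backend] += 1  # the new connection is now counted against it
    return backend

print([least_connections() for _ in range(5)])
```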

4. Packet Flow Across Nodes

  1. Pod-to-Pod Communication (Across Nodes):

    • When pod-1 on node-1 wants to reach pod-5 on node-2:

      • If the destination is a Service IP, IPVS first translates it to a backend pod IP.

      • Since that pod IP is not local, the node's routing table (populated via BGP) provides the route to node-2.

      • Traffic is forwarded via the tunl0 (IPIP) interface because the overlay network is in use.

  2. Service-Based Routing:

    • When a Service IP is called, IPVS routes the request to the correct pod IP based on the service’s endpoint slice.

  3. Ingress Controller:

    • An Ingress Controller is exposed via a LoadBalancer service managed by PureLB.

    • When a request hits the external LoadBalancer IP:

      • PureLB forwards it to the appropriate node.

      • IPVS routes the traffic to the associated NodePort.

      • The NodePort maps to a ClusterIP, which resolves to the Ingress Controller pods.

      • The Ingress Controller then forwards the request to the target service and ultimately to the correct pod.

End-to-End Workflow

  1. External Request → Hits the LoadBalancer IP (PureLB).

  2. PureLB forwards the request to a node.

  3. IPVS directs the request to the NodePort.

  4. NodePort maps to the ClusterIP of the Ingress Controller.

  5. Ingress Controller routes the request to the appropriate service.

  6. Service forwards the request to a backend pod IP.

  7. BGP and tunl0 handle pod-to-pod communication if the pod is on another node.
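
From a client's point of view, the whole chain collapses into a single HTTP request against the LoadBalancer IP, with the Host header telling the Ingress Controller which rule, and therefore which Service and pods, should receive it. The standard-library sketch below illustrates that; the IP address and hostname are placeholders for whatever PureLB assigned and whatever your Ingress rule declares.

```python
import urllib.request

# Placeholders: substitute the external IP PureLB assigned to the ingress
# controller's LoadBalancer Service and a hostname from one of your Ingress rules.
LOADBALANCER_IP = "203.0.113.10"
INGRESS_HOST = "app.example.com"

request = urllib.request.Request(
    f"http://{LOADBALANCER_IP}/",
    headers={"Host": INGRESS_HOST},  # the Ingress Controller routes on this header
)
with urllib.request.urlopen(request, timeout=5) as response:
    print(response.status, response.read(200))
```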

Summary of Components

  • Calico: CNI handling pod networking via overlay (IPIP) and underlay (BGP).

  • tunl0: IPIP tunnel interface connecting nodes.

  • PureLB: LoadBalancer controller managing LoadBalancer IPs and integrating with BGP.

  • IPVS: Efficient service routing and load balancing.

  • Ingress Controller: Manages external HTTP/HTTPS traffic into the cluster.

This architecture ensures smooth communication, efficient routing, and load balancing across pods, services, and nodes within the Kubernetes cluster.

Conclusion

In this article, we explored the intricate networking mechanisms within Kubernetes, focusing on key components like CNI, Calico, IPVS, PureLB, IPIP, and the Ingress Controller. These elements work together to ensure seamless networking, routing, and load balancing across pods, services, and nodes in a Kubernetes cluster.

  • CNI (Container Networking Interface), exemplified by Calico, provides a unified interface for pod-to-pod communication, supporting both overlay (IPIP) and underlay (BGP) networking models.

  • IPVS enables efficient service routing, replacing traditional iptables for better scalability and performance in handling service requests.

  • PureLB, the LoadBalancer controller, integrates with BGP to manage external IPs and ensures smooth load balancing across nodes and services.

  • IPIP creates a virtual network tunnel, facilitating overlay networking and ensuring communication across different nodes without complex configuration.

  • Ingress Controllers, exposed via LoadBalancer services, manage external HTTP/HTTPS traffic, ensuring that it reaches the correct services and pods efficiently.

Together, these technologies form a robust, scalable, and efficient networking system in Kubernetes. Understanding how they interact allows Kubernetes clusters to handle large-scale, dynamic workloads, enabling developers to focus on application logic while Kubernetes handles networking and traffic management efficiently.