How to troubleshoot gRPC deployment on Google Kubernetes Engine


Deploying a Golang gRPC application on Google Kubernetes Engine (GKE) requires some additional modifications to the service and its Kubernetes configurations. Depending on the scenario, these configurations are reasonably simple to put in place; however, because there is little useful logging at the ingress or application level, it can take real effort to find out what is causing an error when the gRPC application is deployed.

I was migrating my services from a GCP VM-based deployment to one hosted on GKE. I assumed it would be mostly a lift-and-shift, but things took a different turn when I started receiving 404 errors from the ingress for all of my gRPC calls. In this post, I will share my insights on the issue and how I resolved it.

TL;DR

To serve HTTP/2-based gRPC applications on GKE behind an L7 load balancer, the backend must be TLS-enabled; if that is not possible, you can consider using an L4 load balancer instead.

Introduction

The Kubernetes configurations required to deploy a gRPC Golang application are no different from those of any other microservice. At the most superficial level, we create configs for the Deployment, Service, Ingress, and so on, and apply them using kubectl commands, pipelines, or platforms such as Argo CD.

For the most part, there are no additional requirements to deploy your application. However, when using a Google L7 load balancer, some specific configuration is required so the load balancer can recognize and route gRPC HTTP/2 traffic rather than throwing errors. In the failure case, your client will see 5xx errors with a message such as "Unhealthy Upstream", or a timeout exception indicating that the ingress cannot recognize and route the traffic.
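For context, GKE reads the protocol for each Service port from the cloud.google.com/app-protocols annotation; marking a port as HTTP2 tells the L7 load balancer to speak HTTP/2 over TLS towards the backend. The following is a minimal sketch, with the service name grpc-app and the port name grpc as placeholder values, not taken from my actual setup:

apiVersion: v1
kind: Service
metadata:
  name: grpc-app
  annotations:
    # Map each named port to the protocol the load balancer should use
    # towards the backend; HTTP2 implies TLS between the LB and the pods.
    cloud.google.com/app-protocols: '{"grpc": "HTTP2"}'
spec:
  type: ClusterIP
  ports:
    - name: grpc
      port: 8443
      targetPort: 8443
      protocol: TCP
  selector:
    app.kubernetes.io/name: grpc-app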


In my situation, the ingress could understand the traffic but kept throwing 404s on all requests. Even the gRPCurl utility I used to test my setup threw timeout and connection errors.

Tip: gRPCurl is a powerful utility for testing everything related to gRPC; think of it as cURL, but for gRPC. This small write-up covers most of the significant options this unique utility offers.

While going through a lot of Google Cloud documentation on this error, I stumbled upon a piece explaining why the ingress behaves this way: every gRPC application behind a Google L7 load balancer must have a TLS-enabled backend. If your backend serves plaintext, the GCP L7 LB will fail to route requests to your application correctly.


Now, there are two ways to resolve this. The first is to fix our application and enable TLS on the backend; the second is to avoid using the L7 LB altogether. As an alternative to the GCP L7 LB, we can configure Nginx or a proxy such as Envoy.

Another way to fix this issue is to use Google's internal passthrough load balancer, i.e., an L4 LB. This approach has many caveats and is generally not recommended for production applications; however, it also depends on your application's purpose and the traffic it is expected to handle.

Using L4 Load Balancers

The very nature of an L4, or internal passthrough, load balancer is to route traffic from the same source to the same backend. Since gRPC clients multiplex their calls over a single long-lived HTTP/2 connection, each client effectively stays pinned to one pod; a surge from any one client therefore produces an uneven distribution of traffic, because only the pod attached to that client handles the extra load.

Considering the above points, if this trade-off still works in your favor, the following snippet can help you expose the service through an L4 load balancer:


---
# Source: grpc-app/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: "grpc-app"
  namespace: grpc-app
  annotations:
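    # "Internal" provisions a GCP internal passthrough (L4) load balancer;
    # the subnet annotation places its forwarding rule in the given subnet.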
    networking.gke.io/load-balancer-type: "Internal"
    networking.gke.io/internal-load-balancer-subnet: grpc-app-subnet
  labels:
    helm.sh/chart: grpc-app-0.0.1
    app.kubernetes.io/name: grpc-app
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: Helm
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster
  loadBalancerIP: <static_ip>
  ports:
    - port: 8443
      targetPort: 8443
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: grpc-app
    app.kubernetes.io/instance: release-name
---

Since we need to specify the IP address in the config, reserve a static IP from the pool beforehand; otherwise, every time you re-create the service for any reason, you may or may not get the same IP address.

Implementing an L7 Load Balancer

We can use a GCP L7 load balancer with almost the same configuration, as long as the application in question deploys server-side TLS at a minimum. Unlike the L4, the L7 LB can help you avoid overloading one or more pods in the deployment. Because we now have a TLS-enabled backend, we do not need to reserve an IP address unless required, and we can manage the SSL certificate as a Kubernetes secret.
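As an illustration, here is a minimal Ingress sketch for this setup, assuming the Service carries the cloud.google.com/app-protocols annotation shown earlier; the secret name grpc-app-tls and the host grpc.example.com are placeholders, not from my actual setup:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-app
  namespace: grpc-app
  annotations:
    # Use the external GCP L7 (Application) load balancer.
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
    - secretName: grpc-app-tls   # certificate presented to clients
      hosts:
        - grpc.example.com
  rules:
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc-app
                port:
                  number: 8443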

You can configure your application to use TLS by supplying the certificate pair as a credential in the Golang server options. There is more than one configuration you can choose from to enable TLS in your application, but for our use case, TLS on the server side is good enough. I have written a post on integrating and testing it using the gRPCurl utility.
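For reference, a minimal sketch of server-side TLS with the standard grpc-go server options; the certificate paths here are placeholder values matching the secret mount sketched below:

package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// Load the server certificate and key; NewServerTLSFromFile builds
	// transport credentials for server-side TLS only.
	creds, err := credentials.NewServerTLSFromFile("/certs/tls.crt", "/certs/tls.key")
	if err != nil {
		log.Fatalf("failed to load TLS credentials: %v", err)
	}

	// Attach the credentials as a server option; every connection now
	// terminates TLS inside the application, which is what the L7 LB expects.
	srv := grpc.NewServer(grpc.Creds(creds))

	// Register your gRPC services on srv here.

	lis, err := net.Listen("tcp", ":8443")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}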

Note: As a best practice, we should not commit the certificate pair to our application repository on GitHub. Instead, we should fetch those certificates from a secret store at run time.
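In Kubernetes terms, one common approach is to store the pair in a TLS-type secret and mount it into the pod at run time; the following is a sketch of the relevant Deployment fragment, assuming a secret named grpc-app-tls (a placeholder) that holds tls.crt and tls.key:

# Fragment of the Deployment's pod template
spec:
  containers:
    - name: grpc-app
      image: grpc-app:1.0.0
      volumeMounts:
        - name: tls-certs
          mountPath: /certs
          readOnly: true
  volumes:
    - name: tls-certs
      secret:
        secretName: grpc-app-tls   # provides tls.crt and tls.key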

Conclusion:

  • The backend must be TLS-enabled for the Google L7 load balancer to route gRPC traffic.
  • Otherwise, we need to switch to an L4 load balancer and reserve a static IP.
  • The L4 load balancer is not recommended in this case, as it can distribute gRPC traffic unevenly across pods.
Tushar Sharma