Kubernetes is one of the best-known container management tools in the world, with millions of developers relying on it as their primary environment for building and running applications. Given that it was originally developed at Google, it’s not difficult to see why: strong foundations and thorough documentation make it one of the best tools available.
Over 64% of Kubernetes developers say that monitoring is the key to running a successful environment. Within monitoring, the most important metrics to follow and understand are the four golden signals.
Within the Kubernetes ecosystem, there are four general signals that relay information about your nodes back to the developer. As even one cluster could potentially have hundreds, if not thousands, of nodes, it isn’t practical to raise an alert every single time something goes wrong. Instead, teams turn to the golden signals, a term taken from Google’s Site Reliability Engineering (SRE) book.
Each of the four golden signals gives you a better idea of the performance of individual nodes, as well as the application as a whole. Instead of having to track every metric your system produces, you can use these four to gauge overall performance quickly.
In short, these are the four most important areas to monitor, hence the name ‘golden signals’:
- Latency
- Traffic
- Errors
- Saturation
Let’s break these down further.
Latency is the amount of time that it takes for a service request to be fulfilled. If latency is particularly high, it is a quick indication that your service or application is degrading.
By comparing each part of the system against the overall average, your monitoring can flag any element of the application that is running too slowly. If a specific part of the system registers a disproportionately high latency, this golden signal shows which requests are slowing the system down.
From there, you’re able to spring into action, fixing your system to ensure that everything continues to run smoothly.
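As a rough illustration of the comparison described above, the sketch below groups request durations by route and reports the mean and 95th percentile for each. It assumes latency samples are already available as `(route, seconds)` pairs; the function names, the route names, and the choice of a nearest-rank p95 are all assumptions made for the example, not part of Kubernetes.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(samples):
    """Group (route, seconds) samples by route and report mean and p95 for each route."""
    by_route = defaultdict(list)
    for route, seconds in samples:
        by_route[route].append(seconds)
    return {
        route: {"mean": sum(vals) / len(vals), "p95": percentile(vals, 95)}
        for route, vals in by_route.items()
    }


def slowest_route(samples):
    """Return the route whose p95 latency is highest."""
    summary = latency_summary(samples)
    return max(summary, key=lambda route: summary[route]["p95"])


if __name__ == "__main__":
    # Hypothetical samples: (route, request duration in seconds).
    samples = [
        ("/cart", 0.12), ("/cart", 0.15), ("/cart", 0.11),
        ("/checkout", 0.90), ("/checkout", 1.40),
    ]
    print(latency_summary(samples))
    print("slowest:", slowest_route(samples))
```

A percentile is reported alongside the mean because an average on its own can hide a small number of very slow requests.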
The traffic signal covers any measure of how much demand is being placed on your system. It can include a range of different metrics, like the number of sessions, total transactions per second, or the number of HTTP requests.
While these traffic metrics differ, they all have one thing in common: they directly measure the total traffic on the system, helping you to work out whether an area of your system is being overused or underused.
With this in mind, you can calculate the relative strain on each of your nodes or pods, helping you to gain a more concrete understanding of consumption within your Kubernetes application.
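To make the idea of relative strain concrete, here is a minimal sketch that turns a request log into a request rate and a per-pod share of traffic. It assumes you already have request timestamps and the name of the pod that served each request; the function names and pod names are hypothetical.

```python
from collections import Counter


def requests_per_second(timestamps, window_seconds):
    """Average request rate over a fixed observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return len(timestamps) / window_seconds


def traffic_share(pod_per_request):
    """Fraction of all requests handled by each pod."""
    counts = Counter(pod_per_request)
    total = sum(counts.values())
    return {pod: n / total for pod, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical log: when each request arrived, and which pod served it.
    timestamps = [0.1, 0.5, 0.9, 1.2, 1.8, 2.4]
    pods = ["web-a", "web-a", "web-b", "web-a", "web-a", "web-b"]
    print(requests_per_second(timestamps, window_seconds=3))
    print(traffic_share(pods))
```

A pod whose share is far above its peers is a candidate for being overused, and one far below may be underused.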
Errors are usually the first signal that comes to mind when people mention the golden signals, and they are often the most pressing, requiring immediate attention when they spike within your Kubernetes environment. This signal shows the rate of errors, both explicit and implicit, within your system, reflecting the general health of your application.
- Explicit errors – Explicit errors are obvious to spot and understand, and are often directly categorized, with documentation that can be followed to resolve them. HTTP errors are a good example, as each is categorized by its type of error (e.g., 404 Not Found).
- Implicit errors – An implicit error, on the other hand, is any error that does not generate an HTTP error code, such as incorrect content delivered through a seemingly successful response. These are harder to spot, which makes tracking them as part of this signal all the more useful.
As errors are produced fairly continually, this golden signal is expressed as a rate, with the overall goal being to lower that rate as much as possible. Starting from the overall rate, you can then trace individual errors and get more information.
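The distinction between explicit and implicit errors, and the idea of reporting them as a rate, can be sketched as follows. The example assumes each response is recorded as a status code plus a `body_ok` flag that says whether the returned content passed an application-level check; that flag and the function names are hypothetical, and how you detect bad content will depend on your application.

```python
def classify(status_code, body_ok):
    """Label one response as 'explicit', 'implicit', or 'ok'."""
    if status_code >= 400:
        return "explicit"  # the status code itself reports the failure
    if not body_ok:
        return "implicit"  # looks successful, but the content is wrong
    return "ok"


def error_rates(responses):
    """Return explicit, implicit, and total error rates as fractions of all responses."""
    if not responses:
        return {"explicit": 0.0, "implicit": 0.0, "total": 0.0}
    counts = {"explicit": 0, "implicit": 0, "ok": 0}
    for status_code, body_ok in responses:
        counts[classify(status_code, body_ok)] += 1
    n = len(responses)
    return {
        "explicit": counts["explicit"] / n,
        "implicit": counts["implicit"] / n,
        "total": (counts["explicit"] + counts["implicit"]) / n,
    }


if __name__ == "__main__":
    # Hypothetical responses: (HTTP status code, did the body pass validation?).
    responses = [(200, True), (404, True), (200, False), (500, False), (200, True)]
    print(error_rates(responses))
```

Keeping the two rates separate makes it easier to see when a rise in errors is hidden behind successful status codes.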
Saturation is the metric that allows you to assess overall performance in terms of how much capacity is being used. At 100% capacity, your system is fully saturated, meaning that it has no headroom left to absorb the requests it is receiving.
Saturation metrics are among the easiest to understand, as they are fairly easy to trace once established. Often, this golden signal derives its figures from memory usage or total CPU use. Because these are collected directly from the underlying system, they are straightforward to gather.
The best approach when it comes to saturation is to only select the specific saturation metrics that are most important to your particular application. If you have an app that is incredibly processor intensive, then CPU would be the most appropriate metric to select. On the other hand, pick memory if you’re creating a memory-intensive environment.
Aim for under 80% saturation on the metric you choose; that way, you’re able to push your system without driving it to the breaking point.
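A minimal sketch of the 80% guideline is shown below. It assumes you already know how much of each resource is in use and how much is available; the resource names, the figures, and the function names are made up for the example.

```python
def saturation(used, capacity):
    """Fraction of capacity currently in use (1.0 means fully saturated)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def saturated_resources(usage, threshold=0.80):
    """Names of resources at or above the threshold, sorted alphabetically."""
    return sorted(
        name
        for name, (used, capacity) in usage.items()
        if saturation(used, capacity) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical node usage: resource name -> (amount used, total capacity).
    usage = {"cpu_cores": (3.4, 4.0), "memory_gib": (10.0, 16.0)}
    for name, (used, capacity) in usage.items():
        print(f"{name}: {saturation(used, capacity):.0%}")
    print("over threshold:", saturated_resources(usage))
```

The threshold is a parameter so that it can be adjusted for whichever saturation metric matters most to your application.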
Kubernetes is a powerful platform for running containerized applications, helping developers around the world create efficient ecosystems. Part of what makes it so useful is how well it supports monitoring, maintenance, and automation. The four golden signals make Kubernetes’ roots in Google’s engineering practice even clearer, and keeping these four in focus is integral to a system’s success.
By learning what each of the golden signals is and why you should be monitoring them, you’ll be well on your way to keeping your application ecosystem as healthy as can be.