What is observability?
Observability is the technical term for understanding the behavior of a system from its externally observable outputs. Microservices pose a particular challenge here, because collecting and analyzing the data is complicated.
The term observability goes back to the engineer Rudolf E. Kalman, who introduced it in the context of his work on control theory for linear dynamic systems. He defined it as a measure of how well the internal states of a system can be inferred from its external outputs.
To put it simply: can external measurements determine whether there is an error? Can you possibly even tell what it is? The advantage is obvious: the system does not have to be taken apart “blindly” to eliminate a problem.
Observability and software
This basic definition still applies in all areas today. Yuri Shkuro added a simple distinction from monitoring: monitoring measures things that were defined in advance. Observability, on the other hand, is about discovering things about your own system that were previously unknown.
Or, to put it briefly and simply: monitoring reports that something is misbehaving. Observability asks about the “why” behind the problem. In software it is therefore a quality attribute, similar to usability, for example. In monolithic systems, it is also relatively easy to handle.
The SLIs (service level indicators, which measure the degree of functionality of a service) can be identified easily. If a value drops, the cause is usually found at the same time. As an example: if a word processing program no longer provides all of its functions, there is an error in the code that leads to the failure. Since one of the SLOs (service level objectives, the targets for the degree of functionality of a service) is that all functions work, the error is eliminated.
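The relationship between an SLI and an SLO can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the names `ServiceLevelObjective`, `compute_sli` and `meets_slo`, the 99 % target and the call counts are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevelObjective:
    name: str
    target: float  # required fraction of successful calls, e.g. 0.99


def compute_sli(successful: int, total: int) -> float:
    """Return the SLI as the fraction of successful calls."""
    if total == 0:
        return 1.0  # assumption: with no calls, nothing has failed
    return successful / total


def meets_slo(sli: float, slo: ServiceLevelObjective) -> bool:
    """An SLO is met when the measured SLI reaches the target."""
    return sli >= slo.target


# Hypothetical figures: 1000 function calls, 985 of which succeeded
slo = ServiceLevelObjective(name="all functions work", target=0.99)
sli = compute_sli(successful=985, total=1000)
print(f"SLI = {sli:.3f}, SLO met: {meets_slo(sli, slo)}")
# prints "SLI = 0.985, SLO met: False"
```

The SLI is the measurement and the SLO is the threshold it is compared against. When the SLI falls below the target, as in this example, that is the signal to look for the underlying error.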
Observability and Microservices
However, monolithic systems are becoming increasingly rare. Microservices are taking their place. These are often provided jointly by different cloud services, which produces a flood of data. It is often difficult to determine where this data comes from and what exactly it is for. As a result, too much information is often collected for observability. In other cases, far too little is collected.
An example is a smartphone app that is connected to a cloud service. At the same time, the provider's cloud relies on the operating system to authenticate the user and to interact with the rest of the device. The following questions are not uncommon, and they cause major difficulties when deciding which data is relevant for observability:
- Should the login time be measured even though the user is typically not logged out?
- Certain functions provided by the operating system (e.g. copy and paste) do not work in the app – how should this be taken into account?
- The loading times of the app differ greatly between users, even though they use the same device model – what does this mean?
- Log files are not collected in full or do not contain the relevant information – how should this be handled?
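One way to approach a question like the differing loading times is to break the same measurements down by further attributes. The following sketch assumes that each sample also records a network type. That attribute, the sample values and the helper `median_by` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical samples: (device_model, network_type, load_time_ms)
samples = [
    ("model-a", "wifi", 420),
    ("model-a", "wifi", 450),
    ("model-a", "cellular", 1900),
    ("model-a", "cellular", 2100),
    ("model-a", "wifi", 480),
]


def median_by(samples, key_index):
    """Group load times by the attribute at key_index and return each group's median."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key_index]].append(sample[2])
    return {key: median(values) for key, values in groups.items()}


print(median_by(samples, 0))  # grouped by device model
print(median_by(samples, 1))  # grouped by network type
```

Grouped by device model, all samples fall into a single group and the spread stays hidden. Grouped by network type, they separate into a fast and a slow group. Which attribute actually explains a difference is not known in advance, which is why it has to be collected before the question arises.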
Observability tools in microservices
To address these problems, observability tools rest on three pillars that also work in microservice environments. In practice they flow together, but they are described separately here for clarity.
First, metrics are collected: everything that can be measured is measured. Second, traces are recorded as part of application performance management. Put simply, a trace follows the path that a function call takes through the system, with the measuring points noted along the way. Third, logs are evaluated. These are the records of previous actions.
Within a microservice landscape, these pillars make it possible to evaluate how well a service works, where it runs, how it has behaved in the past and with whom it interacts. However, running this process continuously requires so much computing power that, in day-to-day practice, it can realistically only be done by the developers themselves.
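How the three pillars fit together can be sketched with nothing but the Python standard library. This is a simplified illustration under stated assumptions, not how a particular observability tool works: the service name `checkout`, the operations `authenticate` and `load_cart`, and the helper `traced` are invented for the example, and the in-memory `metrics` and `spans` stand in for a real backend.

```python
import logging
import time
import uuid
from collections import defaultdict

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")  # hypothetical service name

metrics = defaultdict(float)  # pillar 1: numeric measurements
spans = []                    # pillar 2: measuring points along a request's path


def traced(trace_id, operation, func):
    """Run func, record a span for it and update the metrics."""
    start = time.perf_counter()
    try:
        return func()
    finally:
        duration = time.perf_counter() - start
        spans.append({"trace_id": trace_id, "operation": operation,
                      "duration_s": duration})
        metrics[f"{operation}.calls"] += 1
        metrics[f"{operation}.seconds_total"] += duration


def handle_request():
    trace_id = uuid.uuid4().hex  # one ID ties the three signals together
    log.info("request started trace_id=%s", trace_id)  # pillar 3: logs
    traced(trace_id, "authenticate", lambda: time.sleep(0.01))  # simulated work
    traced(trace_id, "load_cart", lambda: time.sleep(0.02))     # simulated work
    log.info("request finished trace_id=%s", trace_id)
    return trace_id


trace_id = handle_request()
path = [s["operation"] for s in spans if s["trace_id"] == trace_id]
print(path)  # prints ['authenticate', 'load_cart']
```

The shared `trace_id` is the design choice that matters here: it lets a log line, a span and a metric be related to the same request, which is how the pillars flow together in practice.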