Log Analysis: What Is It and How Does It Work?
If you work in Information Technology, you have doubtless encountered logs; in fact, depending on your area of expertise, you may be inundated with them on a daily basis. Nearly every piece of digital technology produces some kind of log, from complex web applications to the drivers that power your mouse and keyboard. As such, the definition of a “log” is necessarily loose; almost any output from a piece of software could be considered one. For the purposes of this article, we’ll define a log as a structured piece of information produced in real time by a piece of software that informs the user of that software’s current status.
Log analysis, then, is the act of analyzing that software’s status over time. More specifically, it is the process of aggregating, categorizing, and deriving new information from the logs that you receive from the software you use. There are endless tools and methodologies that are used for log analysis, so in this article, we will go over a few of the most common strategies used in the industry and attempt to give readers the knowledge they need to make an informed decision for their use case.
The Fundamentals of Log Analysis
As mentioned above, log analysis can be broken down into three basic components regardless of context: aggregation, categorization, and derivation. In this section, I will break down those three concepts and explain how they translate into the real world.
Note: the next few sections are largely for those new to log analysis. If you are only looking for advice on establishing a log review process or deciding on which tools to use, you may consider skipping ahead to the Popular Log Analysis Tools section.
Aggregation
In order to analyze any set of data, you must first get all of it in one place. In the case of logs, this usually means storing the logs you receive in real time in some kind of database that you can reference later. Even when analyzing logs in close to real time, as in network traffic analysis, you must have somewhere for the logs to go so that they can be accessed.
The first step to aggregation is to find out where your logs are going by default. Since almost all software produces logs, part of a software developer’s job is to decide where those logs will go. The most common locations tend to be a combination of log files, databases, and standard output.
Log files can be in a variety of places depending on the software, but some operating systems, such as Linux, do a bit of the aggregation for you: the /var/log directory tends to be the default logging location for software made for that platform. For other platforms, you’ll have to consult the software’s documentation.
Databases are a popular option because they don’t have to exist on the same device that is running the software. If the software and device you’re using are connected to the internet, there’s a good chance that logs are being sent to a central database server. Unfortunately, these databases often aren’t accessible to the end user, since they can be expensive to maintain and are designed for use by the software manufacturer, not you.
Standard output is probably the option you see most often, because standard output is usually displayed right in your terminal. For example, if you run a program on Linux and see messages in the terminal, you have most likely just received logs via standard output (or its sibling stream, standard error).
Once you know where the logs are going by default, the next step is to decide where you want them to go. Usually, this will be either a database (if you want easy remote access) or a directory on your file system. Consider carefully where you want your logs to be stored: it should be somewhere that is easy for you to access, but secure from prying eyes.
The final step to aggregation is to redirect the logs from their default location to the place where you want them to go. This could involve modifying the software to change the default location, copying all or portions of log files, redirecting data from standard output, or making requests to a log database and storing the result.
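As a minimal sketch of the standard-output case, the Python snippet below runs a program and appends everything it prints (both standard output and standard error) to a log file of your choosing. The function name `capture_output_to_log` and any paths you pass it are illustrative, not part of any particular tool.

```python
import subprocess
from pathlib import Path


def capture_output_to_log(command: list[str], log_path: Path) -> int:
    """Run a command and append its stdout and stderr to log_path.

    Returns the command's exit code. (Illustrative helper, not a real tool's API.)
    """
    # Make sure the destination directory exists before opening the file.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode, so repeated runs accumulate history instead of overwriting it.
    with log_path.open("a", encoding="utf-8") as log_file:
        result = subprocess.run(
            command,
            stdout=log_file,           # redirect standard output into the log file
            stderr=subprocess.STDOUT,  # merge standard error into the same file
            check=False,               # a failing program is still worth logging
        )
    return result.returncode
```

For example, `capture_output_to_log(["python3", "my_service.py"], Path("/srv/logs/my_service.log"))` would collect a hypothetical script’s output in one place you control.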
Categorization
Once you have established a method for capturing and storing your logs, the next step is categorization. While you could technically analyze your logs at face value, breaking them down into categories and assigning labels to your data will not only make your analysis more effective, it will also greatly expedite the process. There is a nearly infinite number of ways to categorize data, but I find it helpful to break categorization down into two rough types: intrinsic and comprehensive categorization.
Intrinsic categorization is assigning labels to data based on that data’s intrinsic characteristics. In the case of logs, that could be things like the time or time period in which a log arrived, where it came from, or whether it is part of a set. Knowing this kind of information about your data allows you to be more granular in your analysis down the line: a statistic about your data may be useful, but that statistic in relation to a time period and locality is probably more useful.
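To make intrinsic labels concrete, here is a small sketch that labels a record using only when and where it arrived, never what it says. The record layout (`received_at`, `host`) and the label names are assumptions for illustration; your aggregation step determines what metadata is actually available.

```python
from datetime import datetime


def intrinsic_labels(raw: dict) -> dict:
    """Attach labels derived from when and where a log arrived, not from its message.

    Assumes each raw record has hypothetical keys "received_at" (a datetime)
    and, optionally, "host" (the machine that sent it).
    """
    received: datetime = raw["received_at"]
    return {
        **raw,
        "hour_bucket": received.strftime("%Y-%m-%d %H:00"),  # time period it arrived in
        "weekday": received.strftime("%A"),                 # coarser time grouping
        "source": raw.get("host", "unknown"),               # where it came from
    }
```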
Comprehensive categorization is assigning labels based on what the data says, that is, what information the data is trying to convey. A comprehensive breakdown of a log could include determining whether it is an error, its severity or importance, or the specific action that generated it. The comprehensive type of categorization is also where you would apply any labels specific to your use case. These comprehensive labels are typically what you really want to know about the log, and the intrinsic labels help you navigate the data effectively.
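Comprehensive labels, by contrast, come from the message itself. The sketch below assigns a severity with simple keyword matching and extracts an action name; the keyword lists and the `action=<name>` convention are hypothetical, and if your software already emits explicit severity levels, those are a better source than guessing from wording.

```python
import re

# Hypothetical keyword rules, ordered from most to least severe.
SEVERITY_PATTERNS = [
    ("critical", re.compile(r"\b(fatal|panic|critical)\b", re.IGNORECASE)),
    ("error", re.compile(r"\b(error|exception|failed)\b", re.IGNORECASE)),
    ("warning", re.compile(r"\b(warn|warning|deprecated)\b", re.IGNORECASE)),
]
# Hypothetical convention: the software writes "action=<name>" into its messages.
ACTION_PATTERN = re.compile(r"\baction=(\w+)")


def comprehensive_labels(message: str) -> dict:
    """Label a log by what it says: severity, whether it is an error, and its action."""
    severity = "info"  # default when no keyword matches
    for level, pattern in SEVERITY_PATTERNS:
        if pattern.search(message):
            severity = level
            break  # first (most severe) match wins
    action = ACTION_PATTERN.search(message)
    return {
        "severity": severity,
        "is_error": severity in ("error", "critical"),
        "action": action.group(1) if action else None,
    }
```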
Deriving New Information
Finally, once you have collected and categorized your logs, you can begin the process of discovering new information about the software you’re using. This usually comes in the form of a statistical analysis of the labels you have assigned to the logs in relation to one another. The possibilities for derived information are limited only by your ability to categorize the data; some examples could be success rates over a time period, the distribution of traffic over a network, or which servers are operating most efficiently.
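As a hedged illustration of deriving new information, the function below combines an intrinsic label (the hour a log arrived) with a comprehensive one (whether it is an error) to compute a success rate per hour. The label names `hour_bucket` and `is_error` are assumptions chosen for illustration.

```python
from collections import defaultdict


def success_rate_by_period(labelled_logs: list[dict]) -> dict[str, float]:
    """Derive the share of non-error logs for each time period.

    Assumes each log carries an intrinsic "hour_bucket" label and a
    comprehensive "is_error" label (hypothetical names).
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for log in labelled_logs:
        bucket = log["hour_bucket"]
        totals[bucket] += 1
        if not log["is_error"]:
            successes[bucket] += 1
    # Periods are returned in sorted order so the result reads as a time series.
    return {bucket: successes[bucket] / totals[bucket] for bucket in sorted(totals)}
```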
Log Analysis in Practice
The previous three sections have been very general in nature, and for good reason: in practice, the steps for implementing a log analysis system are going to be different for everyone. There are a variety of tools to choose from for each step (some of which are discussed below) and plenty of opportunities for do-it-yourself solutions. One thing you can count on when establishing a log analysis process in practice, however, is an extra step: normalization. Normalization is needed in all kinds of real-world statistical analyses, and log analysis is no exception. While it’s possible that all of your logs will be perfectly uniform in format and structure, it’s not very likely. Almost all log analysis systems have an intermediate step between aggregation and categorization in which you decide on a common format for storing your data and enforce that format on all incoming logs. The formatting requirements for storing logs vary from case to case, but I will go over some best practices in a later section.
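A minimal normalization sketch, assuming your sources disagree mainly on field names: each incoming record is mapped onto one common schema, missing fields become `None`, and anything unrecognized is kept aside under `extra` rather than dropped. The alias table is hypothetical; you would build yours from the sources you actually collect.

```python
# Hypothetical mapping from source-specific field names to one common schema.
FIELD_ALIASES = {
    "timestamp": ("timestamp", "ts", "time", "@timestamp"),
    "message": ("message", "msg", "text"),
    "level": ("level", "severity", "lvl"),
    "source": ("source", "host", "hostname"),
}


def normalize(raw: dict) -> dict:
    """Coerce a raw log record from any source into the common schema."""
    normalized = {}
    used = set()
    for field, aliases in FIELD_ALIASES.items():
        normalized[field] = None  # every stored record gets the same keys
        for alias in aliases:
            if alias in raw:
                normalized[field] = raw[alias]
                used.add(alias)
                break  # first matching alias wins
    if isinstance(normalized["level"], str):
        normalized["level"] = normalized["level"].lower()  # "ERROR" and "error" become one value
    # Keep unrecognized fields instead of silently discarding them.
    normalized["extra"] = {k: v for k, v in raw.items() if k not in used}
    return normalized
```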
What are the Benefits of Log Analysis?
Before delving further into how you can go about establishing your own log analysis process, it’s important to know what benefits you will reap from the investment. The benefits you get depend largely on your role in the software development process, so this section will be broken down into three parts: benefits to the developer, benefits to the manager, and benefits to the entrepreneur.
The benefits of log analysis to the developer are the most straightforward: with streamlined log collection, categorization, and review, you can immediately see how changes to your codebase affect your product’s performance and stability. A good log review system can also help pinpoint elusive errors that would otherwise have gone unnoticed.
Log Analysis Best Practices
Keep It Consistent
Especially when digesting logs from multiple sources, it’s important to make sure that commonly indexed data fields are formatted consistently. Put simply, some parts of your processed logs should contain the same kind of information formatted the same way every time. A timestamp is an excellent example: if you want to find logs from a specified period of time, or some change in log output over time, your job will be far easier if you pick one date format and stick to it.
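For timestamps specifically, one common approach is to parse every input format you know about and re-emit a single canonical one, such as ISO 8601 in UTC. The list of input formats below is only an example, and treating timestamps that lack a UTC offset as UTC is an assumption you should verify for each source.

```python
from datetime import datetime, timezone

# Hypothetical set of input formats seen across different log sources.
INPUT_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with offset, e.g. 2023-05-01T16:37:00+0200
    "%d/%b/%Y:%H:%M:%S %z",  # access-log style without brackets, e.g. 01/May/2023:14:37:00 +0000
    "%Y-%m-%d %H:%M:%S",     # no offset; assumed here to already be UTC
)


def to_canonical_timestamp(value: str) -> str:
    """Convert a timestamp in any known format to ISO 8601 in UTC."""
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # not this format; try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
        return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unrecognized timestamp format: {value!r}")
```

Failing loudly on an unknown format, rather than guessing, keeps a bad timestamp from quietly landing in the wrong time period.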
Look For Patterns
This is particularly useful in the IT world, and rather than explaining it outright, I’ll give an example of what it means and why it’s important. Years ago I worked as an intern in the IT networking department at my university, and we were experiencing regular interference on our wireless network. I sat with the wireless engineer for hours trying to find the source, until one day I noticed something that we’d overlooked: the outages moved across campus throughout the day. Our logging software provided us with a map of the physical access points and, plotting their status over time, we could clearly see that this interference swept across campus at a set time every day. Using that observation and the rest of our logging information, we deduced that the culprit must be the large Doppler radar located not far from campus. Case closed!
Know Your Logs
Depending on the software, you could be receiving thousands or tens of thousands of logs a day, and, to be frank, it’s extremely unlikely that all of them are important. That is why you need to know your logs and what an important log looks like. If your software produces a standard log message during routine operation that you know will never be relevant to your use case, it’s best to simply leave it out of your aggregated logs. This concept is commonly referred to as artificial ignorance, and it saves both time and computing resources.
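Artificial ignorance can be as simple as a list of patterns for messages you have decided you never need, with everything else passed through. The patterns below are hypothetical; the important design choice is that unknown messages are kept by default, so a new kind of log is never silently discarded.

```python
import re

# Hypothetical routine messages judged irrelevant for one use case.
# Build this list from your own logs; anything that does not match is kept.
IGNORE_PATTERNS = [
    re.compile(r"^health check (ok|passed)$", re.IGNORECASE),
    re.compile(r"^heartbeat from \S+$"),
]


def keep(message: str) -> bool:
    """Return False for known-boring messages, True for everything else."""
    return not any(pattern.search(message) for pattern in IGNORE_PATTERNS)


def filter_logs(messages: list[str]) -> list[str]:
    """Drop what you know is irrelevant and pass through the rest."""
    return [m for m in messages if keep(m)]
```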
Tagging: Do’s and Don’ts
Tagging your logs is extremely important: it turns a mountain of hard-to-read reports and error codes into an easily searchable set of data. Poorly chosen tags, however, can greatly reduce how relevant and useful your log analyses are. Over-tagging, or assigning too many specific tags to your logs, can leave you with logs that are hard to read and difficult to derive meaning from. After all, statistical analysis is all about large numbers, so if your hyper-specific tags only have a few logs apiece, it’s going to be difficult to come to any conclusions about the software they came from.
On the other hand, it is important to avoid under-tagging. Being too general in your tags will result in fundamentally different types of logs being lumped together, which not only bogs down search results but also makes any information derived from this data less precise.
In practice, it tends to be better to have too few tags than too many, but both extremes come with consequences. It is best to come up with a list of attributes you know you want to identify in your logs and move from there: don’t spend too much time thinking up tags that might be useful; just add the ones you know you need and add more as necessary.
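One way to keep an eye on both extremes is to count how many logs each tag actually receives. The sketch below flags tags with very few logs (a possible sign of over-tagging), tags outside an agreed list, and agreed tags that are never used; the tag list and the `min_count` threshold are arbitrary placeholders.

```python
from collections import Counter

# Start with only the tags you know you need (hypothetical examples).
KNOWN_TAGS = {"auth", "database", "payment", "network"}


def tag_report(tagged_logs: list[set[str]], min_count: int = 50) -> dict[str, list[str]]:
    """Flag tags that may be too specific, unplanned, or unused.

    min_count is an arbitrary threshold; pick one that fits your log volume.
    """
    counts = Counter(tag for tags in tagged_logs for tag in tags)
    return {
        "sparse": sorted(tag for tag, n in counts.items() if n < min_count),
        "unknown": sorted(tag for tag in counts if tag not in KNOWN_TAGS),
        "unused": sorted(KNOWN_TAGS - set(counts)),
    }
```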
Popular Log Analysis Tools
The following log analysis tools are all capable of aggregating your logs and providing dashboards, data visualization, common statistical analysis functions, and incident reporting. Some come with additional features such as machine-learning-based tagging and analysis or integrations with popular development tools, while others only handle certain use cases.
LogEntries is great because it will accept just about any log under the sun. This is particularly useful if you have a lot of plaintext or JSON log files sitting around without a clear way to aggregate or analyze them. LogEntries also has the benefit of being hosted centrally, so you can send all of your logs to the same place without worrying about server maintenance.
Splunk is similar to LogEntries in that it will accept any kind of data. It also offers a wide variety of direct integrations with enterprise applications and a polished UI for all of its dashboards. Splunk is also probably the most popular log analysis tool among large companies, serving 91 of the Fortune 100. This comes at a price, however: Splunk is easily the most expensive service on this list.
My favorite part of Logz.io is its transparency. At its core, Logz.io is a hosted version of the ELK stack (mentioned below) with additional visualizations, analysis, and exporting tools built on top. While it may seem odd that they are selling what is otherwise an open-source product, their business model actually makes a lot of sense. They essentially eliminate the work of setting up your own log management server while still maintaining the level of freedom you have when self-hosting. Their pricing structure is also different from others on this list, as they charge based on usage as opposed to an upfront charge.
Unfortunately, the world of truly free log analysis software is… sparse. The tools listed below have very usable free offerings, but also offer enterprise or hosted versions of their software for a fee.
“ELK” stands for Elasticsearch®, Logstash®, and Kibana®. Together, these three open-source services neatly fulfill all of the roles required for an effective log analysis system: Logstash aggregates (and can parse and tag) logs, Elasticsearch stores and indexes them for search, and Kibana provides visualization and analysis on top. The only downside of ELK is that it takes some work and resources to get going. You’ll need a dedicated server to host it on, and while you don’t necessarily need to be a developer to set it up, the process is quite involved.
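To give a sense of what flows between the pieces, the sketch below builds a request body in the newline-delimited JSON format accepted by Elasticsearch’s `_bulk` endpoint, where each document is preceded by an action line. In a typical ELK setup Logstash does this for you, so treat this as an illustration of the data format rather than a recommended way to ship logs; the index name and URL are assumptions, and authentication is omitted.

```python
import json
import urllib.request


def to_bulk_body(index: str, docs: list[dict]) -> str:
    """Build a newline-delimited JSON body for Elasticsearch's _bulk endpoint.

    Each document is preceded by an action line, and the body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # the log document itself
    return "\n".join(lines) + "\n"


def send_bulk(es_url: str, body: str) -> dict:
    """POST the body to <es_url>/_bulk and return the parsed JSON response.

    es_url (e.g. "http://localhost:9200") is an assumption about your setup;
    a secured cluster will also need authentication, which is omitted here.
    """
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```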
Google Cloud Metrics
If your project uses a Google service in any capacity, you should take advantage of Google Cloud Metrics. Technically this service isn’t “free,” but the free tier covers the first 50 GiB of logs per project each month plus unlimited standard metrics, so you can get a lot of use out of it whether or not you pay. The log analysis product itself is fairly standard; the real benefit is in aggregation. Any Google Cloud project can have its logs sent to Cloud Metrics automatically at no cost, so I would recommend using it if any part of your operation runs on Google Cloud Platform (GCP).
Log analysis is a staple of the IT world, and I would recommend that every developer and IT professional have a plan for it. Using the tools and strategies outlined in this article, you can (and should) regularly produce actionable insights into the day-to-day operations of your software tools.