Canary Testing - What It Is And Why You Should Use It

There are many testing methods that DevOps teams employ to gauge, accurately or nearly so, whether their work succeeds. Every company that builds software knows it has to run a series of tests on its product before releasing it to the masses.

Software testing is an essential part of the development lifecycle. Tests evaluate the functionality of the application and show whether the software meets the requirements set out by the developers and the clients. Testing also catches defects in the program before users run into them. Companies that test their products often show that they are proactive and customer-centric: user experience matters to them. The goal is software that improves the end user’s experience or makes things more convenient.

Key terms getting a lot of attention in software development these days are Continuous Integration (CI), Continuous Delivery (CD), and Continuous Deployment (CD). Though they differ, they share a core idea. With CI, developers frequently merge code changes into a central repository, and each merge automatically triggers a build and a test run. Continuous delivery keeps the code that passes those tests ready to release at any time, and continuous deployment goes one step further by releasing it to production automatically. These practices let developers work faster, ship small batches of code that are safer and easier to fix, and show users how the application improves over time.

Canary testing is one such method that DevOps teams rely on to find defects in a software application. It works like a litmus test: the new version is released to a small percentage of end users, and the team studies how it behaves for them. This is similar to sampling in statistical analysis, where a predetermined number of observations is drawn from a much larger population, and the sample provides an estimate of how the whole population is likely to respond. In the same way, canary testing lets DevOps teams collect data that shows whether their code behaves the way they want it to. To understand how developers use the canary test, let’s take a closer look at what canary testing is.


What is Canary Testing?

Programmers use canary testing to push small changes to a group of end users, usually a small percentage of the larger user base. Distributing the new code to this sample group lets the DevOps team see where the problems in the code are.

Because developers deploy code incrementally, canary testing is a powerful and practical method: it tests new features and functionality in production while keeping the impact on the application’s users to a minimum.

In many places, canary testing is used interchangeably with canary release and canary deployment. Canary testing, however, specifically means releasing code to evaluate new features or versions with real users in the live production environment.

Where Does Canary Testing Get Its Name?

Canary testing gets its name from an old mining practice. Mines could fill up with dangerous gases such as carbon monoxide, and at high enough levels these gases could poison the miners or even cause an explosion. Miners had no sensors like the ones we have today, so they took a canary in a small cage into the mine with them. As long as the canary kept singing (stayed alive), they knew they were safe. If the level of carbon monoxide or another toxic gas rose, the bird would be affected first. Because of its low tolerance to these gases, the poor thing would die, and the miners knew to evacuate as quickly as possible.

In a way, canary testing pays homage to this old practice. When a canary stopped singing, the miners knew they had to leave before the problem got worse, and that early warning saved many lives. Software developers use canary testing the same way: as an early warning that reaches only a few users. This makes companies more efficient, which leads to better customer satisfaction, lower maintenance costs, and smoother deployment of software at a larger scale. It also improves product quality and shows that the company takes problems, including security issues, seriously. Canaries helped miners in the past, and now they help developers ship better software.

How Do You Perform a Canary Test?

Performing a canary test is relatively simple on paper. You partition your users into subgroups and roll out a subset of your features to each subgroup. You need solid, proactive monitoring so that you learn about the issues each subgroup runs into as they occur. Depending on the results, you can fine-tune the size and composition of the subgroups. For a more detailed look at implementing canary testing, skip ahead to the implementation section later in this guide.
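A minimal sketch of the partitioning step, assuming each user has a stable string ID: hashing the ID together with a per-release salt gives every user a fixed bucket, so the same user always sees the same version. The function name `cohort_for` and the salt value are hypothetical, not part of any particular tool.

```python
import hashlib


def cohort_for(user_id: str, canary_percent: float, salt: str = "release-42") -> str:
    """Return "canary" or "baseline" for a user, deterministically.

    The salt ("release-42" is a hypothetical placeholder) lets each release
    draw a different sample of users.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 32 bits of the hash into a bucket from 0 to 9999.
    bucket = int(digest[:8], 16) % 10_000
    # canary_percent=5 selects buckets 0..499, i.e. roughly 5% of users.
    return "canary" if bucket < canary_percent * 100 else "baseline"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    in_canary = sum(cohort_for(u, 5) == "canary" for u in users)
    print(f"{in_canary} of {len(users)} users are in the canary cohort")
```

Changing the salt for the next release reshuffles who lands in the canary, so the same users are not always the first to receive new code.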

Benefits of Canary Testing

Testing in general keeps many issues from reaching end users. Canary testing in particular saves time and effort at the release stage and helps produce software that makes life more convenient. It also protects the company’s reputation and goodwill: with fewer issues annoying users, sales efforts are not impeded, and that keeps the developers happy too.

This section will break down all the advantages of canary testing so you know what you can leverage to produce the best possible software out there. 

Simplicity

Canary tests and deployments are simple. They do not demand much of the application, because most of the infrastructure is already in place. Development teams only have to deploy the new code to the selected partition. If any issues arise, they can reverse the changes without difficulty. Either way, whether the change succeeds or needs fixing, the wider user base is not affected.

Low Maintenance

Canary testing is low maintenance. It runs only for a short period, and once the results are analyzed, the developers move on to the next phase. Because it involves only a small subset of end users, it also takes fewer resources to monitor. Best of all, you stay in control of the test: you get the information directly, without relying on a third party, and the shorter chain of command means less overhead. It also reduces the risk of unwanted outcomes reaching the larger user base.

Low Cost

Canary tests are run by internal development teams and need only a small amount of extra infrastructure. They also reduce the cost of fixing issues. Because only a small percentage of users is affected, the usual remedy is simply to roll back to the previous version. This way, teams avoid a service-wide outage, most customers never notice the problem, and the company does not have to deal with a wave of angry customers. With the impact kept small, the fix is cheaper.

Zero Production Downtime

Because canary testing affects only a small group of users, it avoids production downtime for the system as a whole. If an error occurs, the canary’s traffic can be routed back to the baseline version, and for most users it is as if nothing happened. Meanwhile, developers can work out the cause of the error and fix it.
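Many teams automate this re-routing. The sketch below assumes a hypothetical `TrafficSplit` object that holds the canary’s share of traffic, plus a rollback rule based on the canary’s error rate; the 2% threshold and 100-request minimum are illustrative values, not recommendations. In a real system, the weight would live in a load balancer or service mesh configuration.

```python
from dataclasses import dataclass


@dataclass
class TrafficSplit:
    """Hypothetical holder for the share of requests sent to the canary."""
    canary_weight: float  # 0.0 to 1.0

    def rollback(self) -> None:
        # Route everything back to the baseline. The canary stays deployed
        # so developers can investigate the failure.
        self.canary_weight = 0.0


def guard(split: TrafficSplit, canary_errors: int, canary_requests: int,
          max_error_rate: float = 0.02, min_requests: int = 100) -> bool:
    """Roll back if the canary's error rate exceeds max_error_rate.

    Returns True if a rollback happened. Waits for min_requests so that a
    handful of early failures cannot trigger a rollback on their own.
    """
    if canary_requests < min_requests:
        return False
    if canary_errors / canary_requests > max_error_rate:
        split.rollback()
        return True
    return False
```

For example, `guard(TrafficSplit(0.05), canary_errors=9, canary_requests=200)` sees a 4.5% error rate, sets the canary weight to 0.0, and returns `True`.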

Flexibility

Canary tests encourage developers to experiment and innovate. Because a test affects only a small group of users, developers can try new code with less risk. They can also adjust the number of users the test runs on. Best practice suggests starting with a small percentage and increasing it gradually as confidence grows. If the first 5% goes well, developers may roll the update out to 10%, 25%, 50%, and finally 100% of users.
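The stepwise rollout described above can be sketched as a loop over stages. `stage_is_healthy` is a hypothetical callback standing in for whatever monitoring check a team uses; the stage percentages are the ones from the text.

```python
from typing import Callable, Iterable

# Stage percentages taken from the text above.
DEFAULT_STAGES = (5, 10, 25, 50, 100)


def progressive_rollout(stage_is_healthy: Callable[[int], bool],
                        stages: Iterable[int] = DEFAULT_STAGES) -> int:
    """Widen the canary one stage at a time.

    stage_is_healthy(percent) is a hypothetical check that would, in
    practice, run the canary at that percentage for a while and inspect
    the metrics. Returns the percentage of users left on the new version:
    the last stage reached, or 0 if any check failed (full rollback).
    """
    reached = 0
    for percent in stages:
        if not stage_is_healthy(percent):
            return 0  # send every user back to the baseline
        reached = percent
    return reached


if __name__ == "__main__":
    print(progressive_rollout(lambda p: True))    # 100: every stage passes
    print(progressive_rollout(lambda p: p < 50))  # 0: fails at the 50% stage
```

Returning 0 on failure models a full rollback; a team could instead choose to hold at the last healthy stage.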

All Deployment Sizes

Although a canary test has a low impact, it still vets the system and the new code, and it works for many kinds of systems. It does not matter whether the environment serves users in a specific geographical region or users worldwide. This flexibility supports deployments of all sizes, which makes canary testing practical for all kinds of DevOps teams.

Involvement of Users in Development

Another advantage of canary testing is that it works much like a beta program: teams can invite users to take part in testing. Involving existing users in trying out the new code gives developers immediate feedback on how the latest version of the application is working. DevOps teams can spot areas for improvement and find errors while building a relationship with their users.

Developers get a direct window into what their audience needs from an application. This gives them the users’ perspective and helps make the application more user-centric. Canary testing also shows the real-world effects of the new code, and using it reflects strong risk management on the part of the development team.

Potential Drawbacks of Canary Testing

You should be aware that despite canary testing having a lot of advantages, some drawbacks could affect your development processes. Knowing them can help you plan and execute the test better, allowing you to leverage it to your advantage. It is essential to know everything you can about canary testing so that you can use it strategically. 

Affected Users

One crucial drawback developers should be aware of is that if the new feature has an issue, it will affect the sample group of users. However, this group is only a small percentage of the user base. Even for these users, developers can roll back to the baseline quickly, and thanks to them, the problem never reaches everyone else.

Time-Consuming and Error-Prone

Without automation, canary testing can become error-prone. While it helps with gathering information, companies still have to assign a DevOps engineer to oversee the collection of data and logs, and once the data is collected, the engineer has to analyze it manually. In most cases, a single engineer is tasked with this, and handling all that data alone, without help from a colleague, can get overwhelming.

This process takes time, and combing through the data alone can be daunting. One wrong decision can affect the entire release and lead to a large-scale rollback. The manual work involved increases the chance of errors, but automation can counter this.
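As a hedged illustration of that automation, the sketch below compares canary metrics against the baseline and lists any checks that fail, replacing a manual pass through the data. The metric names and tolerance values are assumptions chosen for the example; real thresholds come from the team’s planning.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    error_rate: float      # errors / requests
    p95_latency_ms: float  # 95th-percentile latency
    memory_mb: float       # average memory use per instance


# Hypothetical tolerances: how much worse than the baseline the canary may be.
MAX_EXTRA_ERROR_RATE = 0.005  # absolute: at most 0.5 percentage points higher
MAX_LATENCY_RATIO = 1.10      # relative: at most 10% slower
MAX_MEMORY_RATIO = 1.20       # relative: at most 20% more memory


def compare(baseline: Metrics, canary: Metrics) -> List[str]:
    """Return the names of failed checks; an empty list means the canary passed."""
    failures = []
    if canary.error_rate > baseline.error_rate + MAX_EXTRA_ERROR_RATE:
        failures.append("error_rate")
    if canary.p95_latency_ms > baseline.p95_latency_ms * MAX_LATENCY_RATIO:
        failures.append("p95_latency_ms")
    if canary.memory_mb > baseline.memory_mb * MAX_MEMORY_RATIO:
        failures.append("memory_mb")
    return failures


if __name__ == "__main__":
    baseline = Metrics(error_rate=0.010, p95_latency_ms=200.0, memory_mb=512.0)
    canary = Metrics(error_rate=0.030, p95_latency_ms=260.0, memory_mb=520.0)
    print(compare(baseline, canary))  # ['error_rate', 'p95_latency_ms']
```

Comparing against the live baseline rather than fixed numbers means that a problem affecting both versions equally, such as a traffic spike, is not blamed on the canary.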

On-Premise/Thick Client Applications

Canary testing is not well suited to offline, standalone applications on personal devices such as users’ mobile phones or laptops, because the team cannot simply redirect traffic for them. One way to work around this is to set up auto-update for the application, or to encourage end users to turn it on. This way, users receive updates regularly, and developers can roll out a canary or an updated release more smoothly.

Mobile Applications

Mobile applications are usually distributed through an app store: the App Store for Apple devices and, most often, Google Play for Android. Because the store delivers each release, it is harder for developers to choose which users receive the new version. To work around this, developers can run canary tests with feature flags.
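A minimal sketch of such a flag check, written in Python for consistency with the other examples even though a mobile app would use its own platform language. It assumes the app downloads a small configuration (the hypothetical `FlagConfig` below) from the team’s own server at startup. Every user installs the same binary from the store, but only devices whose hashed ID falls under `rollout_percent`, plus any allowlisted testers, see the new feature; setting `enabled` to `False` acts as a kill switch.

```python
import zlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FlagConfig:
    """Hypothetical remote config the app downloads at startup."""
    enabled: bool = False       # kill switch: False hides the feature from everyone
    rollout_percent: int = 0    # share of devices that get the new feature
    allowlist: Set[str] = field(default_factory=set)  # e.g. internal testers


def is_feature_on(flag: FlagConfig, device_id: str) -> bool:
    if not flag.enabled:
        return False
    if device_id in flag.allowlist:
        return True
    # crc32 is stable across runs (unlike Python's built-in hash() for str),
    # so a device keeps the same answer every time the app starts.
    return zlib.crc32(device_id.encode("utf-8")) % 100 < flag.rollout_percent


if __name__ == "__main__":
    flag = FlagConfig(enabled=True, rollout_percent=5, allowlist={"qa-device-1"})
    print(is_feature_on(flag, "qa-device-1"))  # True: allowlisted tester
```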

Adding Complexity

Implementing canary testing can become complicated. Running a canary version of the application alongside the current one is comparatively easy, because application code can be swapped and rolled back independently. A database is harder, since both versions typically share it. It gets even trickier when a developer changes how the application uses the database or changes the database schema, because the baseline and the canary must both keep working against the same data. Developers can head off these issues by identifying in advance which parts of the database a change will affect.
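As an illustration of the kind of compatibility work involved, suppose (hypothetically) that the canary splits a `full_name` column into `first_name` and `last_name`. One common approach is for the canary to keep writing the old column as well, and for the reading code to accept rows in either shape, so the baseline and the canary can share the database during the test. The sketch below shows that idea with rows as plain dictionaries; the column names are invented for the example.

```python
def canary_write(first: str, last: str) -> dict:
    """Row written by the (hypothetical) canary version.

    It fills the new columns and keeps the old one, so baseline
    instances reading the same table still find 'full_name'.
    """
    return {"first_name": first, "last_name": last, "full_name": f"{first} {last}"}


def display_name(row: dict) -> str:
    """Read a name from a row written by either version."""
    if "first_name" in row or "last_name" in row:
        return f"{row.get('first_name', '')} {row.get('last_name', '')}".strip()
    # Older rows written by the baseline only have 'full_name'.
    return row.get("full_name", "")


if __name__ == "__main__":
    old_row = {"full_name": "Ada Lovelace"}     # written by the baseline
    new_row = canary_write("Ada", "Lovelace")   # written by the canary
    print(display_name(old_row), "|", display_name(new_row))
```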

Dealing With Different Variants

When using canary testing, developers are essentially dealing with multiple variants of the same software, and they have to handle these variants simultaneously, which adds complexity. One or two variants are manageable, but as the system and the software grow, the number of features and demands grows as well. Planning a canary test therefore has to be done meticulously.

Note that how much the challenges above matter depends largely on the capabilities of the organization. Despite these drawbacks, there are ways to counter them. One is to reduce the number of versions running at a time: keeping things simple lets teams focus on one aspect before improving the next. Tighter monitoring, good data collection practices for analytics, and a well-planned strategy will also support developers through the process.

How To Implement Canary Testing?

Software testing is integral to keeping code quality in check. It validates the developers’ work by checking that the code behaves as intended. If you have put hours into coding, you want to know whether it works correctly, and testing tells you. Many developers will be familiar with A/B testing or blue-green deployments for this purpose.

During a canary test, a subset of end users is selected to receive the new version of the software application, which makes it easy for developers to monitor how the new code performs. This subset is often about 5% to 10% of users, but the exact percentage is up to the developers: they can raise it or keep it low depending on how much risk they are willing to accept. By studying this small group, developers assess the new code’s output, and the results indicate how it is likely to perform for everyone else.

What Does a Canary Test Do?

The canary test checks for the following issues to ensure that there are no problems: 

If things do not work out, you can always roll it back and send the users back to the original infrastructure.  

Three Phases of Canary Testing

The process for canary testing and deployment is simple and has only three stages:

Plan and Create

This phase can be the longest and most arduous of the three. The first step of canary testing is planning, and here you define the output you intend to get. Knowing what you are looking for tells you what to watch once the canary is deployed to test the new features. Above all, you should be looking at metrics. Some examples of what you might want to monitor are as follows:

You will also decide on your thresholds and identify the random subset of users who will test the new features through the canary. Will you route the canary to 5% or 10% of your user base? You could also select users by region. Once these points are settled, you and your team can start building the canary infrastructure; once the canary exists and the user base is partitioned, you can route the new code to the selected users. This is where you prepare everything, starting with a deployment on a staging server. You will have to prepare the following:

Next, you create a canary node. This means cloning your production environment: building infrastructure that mirrors the software environment that is already live. One of the two environments stays on the original version as your baseline; this is the one you rely on, and roll back to, if the new code does not work. A load balancer then splits incoming traffic between the baseline and the canary. You can create more clones depending on how many features you want to test, but the minimum is, of course, two.
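The traffic split a load balancer performs can be sketched as a weighted random choice between pools. The pool names and the 95/5 weights below are assumptions for illustration; production load balancers implement this in their own configuration. This per-request version is the simplest form, and it means one user can reach both versions; many setups instead pin each user to one version.

```python
import random
from collections import Counter
from typing import Dict


def pick_pool(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick a pool with probability proportional to its weight."""
    pools = list(weights)
    return rng.choices(pools, weights=[weights[p] for p in pools], k=1)[0]


if __name__ == "__main__":
    # Hypothetical split: 95% of requests to the baseline, 5% to the canary.
    weights = {"baseline": 95, "canary": 5}
    rng = random.Random(7)  # fixed seed so the run is repeatable
    counts = Counter(pick_pool(weights, rng) for _ in range(10_000))
    print(counts)

    # Rolling back is just setting the canary's weight to zero.
    weights["canary"] = 0
    assert all(pick_pool(weights, rng) == "baseline" for _ in range(1_000))
```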

In the planning phase, you also want to set limits on the duration of the test. Canary tests usually run anywhere from minutes to hours, so it is important to watch them closely.
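One way to keep the planning decisions explicit is to record them in a small, validated configuration object. Every field name below is hypothetical, and the example values (5% of traffic, a two-hour window, a single region) are placeholders rather than recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class CanaryPlan:
    """Decisions from the planning phase; all field names are illustrative."""
    traffic_percent: float        # share of users routed to the canary
    duration: timedelta           # how long the canary runs before a decision
    max_error_rate: float         # rollback threshold agreed before launch
    max_p95_latency_ms: float     # latency threshold agreed before launch
    region: Optional[str] = None  # optionally limit the canary to one region
    metrics: Tuple[str, ...] = ("latency", "memory", "error_count", "volume")

    def __post_init__(self) -> None:
        # Catch obviously broken plans before any traffic is routed.
        if not 0 < self.traffic_percent <= 100:
            raise ValueError("traffic_percent must be in (0, 100]")
        if self.duration <= timedelta(0):
            raise ValueError("duration must be positive")
        if not 0 <= self.max_error_rate <= 1:
            raise ValueError("max_error_rate must be a fraction between 0 and 1")


# Placeholder values, not recommendations; "eu-west" is a made-up region name.
EXAMPLE_PLAN = CanaryPlan(
    traffic_percent=5,
    duration=timedelta(hours=2),
    max_error_rate=0.01,
    max_p95_latency_ms=300.0,
    region="eu-west",
)
```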

Analyze

Once the code is routed to the selected users, you will start seeing traffic sent to both the baseline and the canary nodes. In this phase, your team tests the new version by gathering data for the metrics you chose in the previous stage. You want to know whether the latest version performs consistently and whether the system is healthy, so look at data on latency, memory usage, error count, and request volume. Your logs will also show where the bottlenecks are.
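A sketch of how raw request records could be reduced to these metrics per group, assuming a hypothetical log format of `(group, latency_ms, status_code, memory_mb)` tuples and treating HTTP 5xx responses as errors. The 95th-percentile latency uses the nearest-rank method.

```python
import math
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (group, latency_ms, status_code, memory_mb).
Record = Tuple[str, float, int, float]


def p95(values: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize(records: Iterable[Record]) -> Dict[str, dict]:
    """Group records by version and compute the metrics named in the text."""
    by_group: Dict[str, list] = {}
    for group, latency, status, memory in records:
        by_group.setdefault(group, []).append((latency, status, memory))

    summary = {}
    for group, rows in by_group.items():
        errors = sum(1 for _, status, _ in rows if status >= 500)  # 5xx = error
        summary[group] = {
            "volume": len(rows),
            "error_count": errors,
            "error_rate": errors / len(rows),
            "p95_latency_ms": p95([latency for latency, _, _ in rows]),
            "avg_memory_mb": mean(memory for _, _, memory in rows),
        }
    return summary


if __name__ == "__main__":
    logs = [
        ("baseline", 110.0, 200, 500.0),
        ("baseline", 130.0, 200, 510.0),
        ("canary", 150.0, 503, 600.0),
        ("canary", 140.0, 200, 620.0),
    ]
    for group, stats in summarize(logs).items():
        print(group, stats)
```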

Roll

The information from your monitoring setup helps you make an informed decision about the next step. If problems are identified, the data will help your team fix them. If there are no issues, you can roll the version out to the whole user base or run another test with a different subset of users.
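The decision itself can be written down as a small rule, sketched below. It assumes the analysis step produces a list of failed checks; the 50% cut-off that separates “widen the test” from “roll out to everyone” is a hypothetical choice for illustration.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    ROLL_BACK = "roll back to the baseline"
    EXPAND = "test again with a larger subset of users"
    PROMOTE = "roll out to the whole user base"


def next_step(failed_checks: Sequence[str], current_percent: float,
              expand_until: float = 50) -> Decision:
    """Choose the next step after the analysis phase.

    failed_checks: names of metrics that breached their thresholds.
    expand_until: hypothetical cut-off; below it, a healthy canary is
    widened to another subset first instead of being promoted at once.
    """
    if failed_checks:
        return Decision.ROLL_BACK
    if current_percent < expand_until:
        return Decision.EXPAND
    return Decision.PROMOTE


if __name__ == "__main__":
    print(next_step(["error_rate"], 5).value)  # roll back to the baseline
    print(next_step([], 5).value)              # test again with a larger subset of users
    print(next_step([], 50).value)             # roll out to the whole user base
```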

Here are some possibilities to choose from:

When To Use Canary Testing

As a rule of thumb, run a canary test when the development team wants to evaluate how a new version performs in production. Even so, code should still be tested before deployment to avoid issues later. Running a canary test is good practice for understanding what the code does before updating the entire environment, and it is especially useful for applications that rely on continuous integration or continuous deployment.

With software updates shipped so regularly and technology moving so fast, canary testing helps keep downtime low. Additionally, for applications built on legacy or third-party systems and infrastructure, code can often only be tested realistically in live production, because replicating those systems for testing would be more complex and expensive. The same applies to applications made up of many independently running microservices, whose interactions are hard to reproduce outside production.

Implementing canary testing is simple and beneficial, but only if you consciously plan all of the steps above. Then you can run canary tests and deployments successfully and have the information you need to take the next step in improving your application.

Summary

Just as miners once relied on a canary for their safety, software developers today rely on canary tests to monitor the behavior of their code. The method is simple and borrows the idea of sampling from statistical analysis to predict how the code will behave. Developers release the new code to a small group of users and study its performance and the user experience to find any issues. If the test shows errors, developers go back to the drawing board. If no errors show up, they can safely roll the software out to the rest of the users and update the infrastructure.

Canary testing gives developers the freedom to see how new code behaves without affecting the entire user base. Only the share of users the team designates, often between 5% and 10%, sees the new code. Once the developers are satisfied that the code works as planned, they can roll it out to the wider user base by updating the application environment with the new code.