Identifying the right moment
For e-commerce businesses in the process of scaling up, keeping applications updated to their newest versions is critical.
This article shares best practices for upgrading a framework without frustration. The case study is based on the Spree Commerce platform, but regardless of the platform, most of the problems and challenges are similar. Applications are often built as a PoC or an MVP, and over the years they evolve into an integral piece of one or more large systems that provide significant economic value to the app’s home organization.
Unfortunately, it is also not uncommon that the amount of feature work required to support the application’s growth leaves little or no space for serious maintenance. With a dearth of development capacity, upgrades to the major frameworks used by the application can become postponed.
Once the application reaches a phase of relative stability, the topic of upgrades often crops up. It becomes pressing as the end of the long-term support period for a specific version of the framework approaches, because security patches stop being available for that version. At the same time, the development team loses its ability to use new features of the framework or add modern libraries. This severely and negatively impacts the dev team’s overall productivity.
While the upgrade becomes an urgent topic of concern, there’s often some ongoing development that simply can’t be put on hold. In response to these challenges, our Upside team has identified a set of practices that make these processes seamless. These practices allow necessary upgrades to be performed in a way that reduces the impact on the feature team. As you read through the article, you will gain insights into the specific strategies we used and the challenges we addressed.
Identify an Upgrade’s Milestones
While an attractive upgrade goal may be to move the framework in question to the latest, greatest, state-of-the-art version, it is more pragmatic to take a measured, step-by-step approach. This is especially true if current versions of a framework’s libraries are outdated.
The best approach, we have found, is to define a set of milestones for the upgrade, where each milestone relies on as few dependency upgrades as possible. This greatly reduces the risks of production deployment and of developers getting stuck in a “dependency hell.”
As an example, while upgrading a Spree framework, you may simultaneously be upgrading its underlying Ruby on Rails version and Bootstrap (which comes with some versions of Spree). Tackling all these upgrades at once can be a nightmare. So, if possible, an ideal upgrade scenario would involve first upgrading Spree to a version that comes with a newer version of Rails, deploying it, and then performing another upgrade that requires upgrading Bootstrap.
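To make the milestone idea concrete, here is a minimal sketch that lists which dependencies change at each step of a plan. The dependency names and version labels are placeholders invented for illustration, not real compatibility data for Spree, Rails, or Bootstrap.

```ruby
# Each milestone is a snapshot of dependency versions.
# Names and version labels are illustrative placeholders only.
MILESTONES = [
  { name: "current",     deps: { "framework" => "v1", "rails" => "v1", "bootstrap" => "v1" } },
  { name: "milestone 1", deps: { "framework" => "v2", "rails" => "v2", "bootstrap" => "v1" } },
  { name: "milestone 2", deps: { "framework" => "v3", "rails" => "v2", "bootstrap" => "v2" } }
].freeze

# For each consecutive pair of milestones, list the dependencies whose version changes.
def changes_per_step(milestones)
  milestones.each_cons(2).map do |from, to|
    changed = to[:deps].select { |dep, version| from[:deps][dep] != version }.keys
    { step: "#{from[:name]} -> #{to[:name]}", changed: changed }
  end
end

# Flag steps that bundle more changes than the team is comfortable deploying at once.
def risky_steps(milestones, max_changes: 2)
  changes_per_step(milestones).select { |step| step[:changed].size > max_changes }
end

changes_per_step(MILESTONES).each do |step|
  puts "#{step[:step]}: #{step[:changed].join(', ')}"
end
```

A plan in which every step changes only one or two dependencies is usually easier to deploy and to roll back than one that changes everything at once.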
Migrating from Spree 3.0 to 4.4
Regression Test Planning
A framework upgrade usually won’t bring new features; the desired outcome is for the external behavior of the application to remain unchanged. This is exactly why QA is so important to such projects – QA staff will be critical to supporting the rest of the team and ensuring that no major issues are introduced along with the upgrade.
Their work usually starts with the creation of an in-depth regression checklist, which they will be able to execute after each stage of the upgrade. It’s also important that QA pros capture existing bugs in the system so that there’s clarity regarding what present issues look like. Once the application gets to a point where it appears to be stable with the upgraded framework, QA staff will then execute a detailed regression test of the whole application (which also covers all its integrations). This way, QA can effectively verify the details that weren’t (or can’t be) covered by automated tests.
“Even the slightest change introduced to the software needs to be verified in the context of the entire application. Therefore, while executing the upgrades and migrations, it is crucial to ensure that the application maintains its integrity and that every feature works as expected.” – Wojtek Wiśniewski, QA Lead at Upside
It’s important that QA is aware of the internals of each upgrade – this allows them to direct their attention to areas that are likely to break due to underlying changes, which in turn means they can design more thorough test cases.
Managing ongoing work: Shopify’s dual-booting approach
Coordinating upgrades with ongoing feature work can be a major challenge. A common practice among companies is to perform an upgrade on a separate, long-lived branch of the codebase, periodically merge the main branch to it, and resolve any potential issues “on the go.”
Then, there’s usually a longer code-freeze period, during which feature work is halted and everyone on the team prepares for the upgraded version’s upcoming release. This halt and preparation period needs to be planned well in advance, and there needs to be clear communication with the business teams so they can set proper expectations for the timelines of other features.
To offer an industry example, GitHub spent a few years working on a single Rails upgrade, due to multiple conflicts between new syntax in Rails 3 and GitHub’s ongoing feature work, which was based on Rails 2.
But there’s an approach that can result in better timelines and a smoother development workflow. It’s based on a concept called “dual booting,” which was mastered by Shopify and GitHub and based on hard-learned lessons from their upgrades of large Rails applications.
Dual booting explained
The dual-booting approach is based on having a single codebase with two configurations (old dependencies and new dependencies) that can be toggled by, for example, setting an environment variable.
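A minimal sketch of such a toggle might look like the following. The variable name `DEPENDENCIES_NEXT`, the `DualBoot` module, and the `Gemfile.next` file name are assumptions made for this example; your setup may use different names.

```ruby
# Hypothetical dual-boot switch. The environment variable name and the
# Gemfile names are assumptions made for this sketch.
module DualBoot
  ENV_VAR = "DEPENDENCIES_NEXT"

  # True when the process should boot with the upgraded dependency set.
  def self.next?(env = ENV)
    env[ENV_VAR] == "1"
  end

  # Each configuration gets its own Gemfile, and therefore its own lockfile.
  def self.gemfile(env = ENV)
    next?(env) ? "Gemfile.next" : "Gemfile"
  end
end

puts "Booting with #{DualBoot.gemfile}"
```

Bundler reads the `BUNDLE_GEMFILE` environment variable to locate the Gemfile, so a value chosen this way can be exported before the application starts.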
Parts of the codebase that need to differ depending on the version of the underlying framework (e.g., different syntax for Active Record) can then be hidden behind a feature flag. This avoids long-lived branches, which are difficult to maintain. What is more, the CI environment can run tests for both versions of the code at the same time. That makes it possible to observe the status of the upgrade, and it also prevents developers working on new features from introducing code that’s not compatible with the new version of the framework (since they have to maintain compatibility with both versions from the start).
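As an illustration of hiding version-dependent code behind a flag, the sketch below runs exactly one of two code paths. `FrameworkSwitch`, the `DEPENDENCIES_NEXT` variable, and the placeholder strings are hypothetical; in a real application the two lambdas would contain the old and new framework syntax.

```ruby
# Minimal sketch of a flag that selects between two implementations.
# The class name and the environment variable are assumptions.
class FrameworkSwitch
  def initialize(next_enabled)
    @next_enabled = next_enabled
  end

  # Calls exactly one of the two code paths, depending on the active configuration.
  def choose(current:, upcoming:)
    (@next_enabled ? upcoming : current).call
  end
end

switch = FrameworkSwitch.new(ENV["DEPENDENCIES_NEXT"] == "1")

query_style = switch.choose(
  current:  -> { "query written for the old framework API" },
  upcoming: -> { "query written for the new framework API" }
)
puts query_style
```

Keeping every such branch behind one helper makes the divergences easy to find and delete once the upgrade is complete.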
Now, with the plan and basic setup done, let’s take a deeper look at the tactical steps that need to be taken in order to successfully perform an upgrade.
Get it to boot first, then make tests pass
After configuring dual-booting, the next step is to make the application boot in the new configuration – regardless of whether it actually works as expected. Being able to start a Rails environment allows for the setup of further automation (e.g., CI responsible for running tests). It also creates a test environment skeleton that will allow the team to verify their fixes on the new codebase.
At this stage, depending on your test coverage, it may be a good idea to review the output of automated tests run on the upgraded version. They provide a good guide to the obvious problems that occur across the application after upgrading core frameworks. The majority of these problems will be easy fixes, and if you’re able to turn all of the tests green, the suite becomes just as valuable to the team that ends up doing feature work.
Configure dual booting on CI to enforce that new features are compatible with the upgraded codebase
After test suites start passing on the upgraded codebase, we reach a point when we’re able to start enforcing the compatibility of new features with the upgraded codebase.
It is costly to force developers to boot the application with the new version and perform two sets of regression tests before every deployment. But we can take advantage of some low-hanging fruit here by building an additional step into the pipeline. The goal is to run automated test suites based on the old and new configurations. Assuming that new features will be covered by unit tests at the very least, these tests will act as an assertion that new features can also execute correctly with the new configuration. Even without the hassle of dual-booting locally, developers will have a chance to confirm this and, if needed, adjust their features to the newer version of the framework.
There is of course a small price to pay for all this – your build time will likely double. Yet this is usually still far more efficient than having developers spend weeks resolving pesky conflicts that arise after the codebases have diverged.
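The extra pipeline step can be sketched roughly as follows. The configuration names, environment variables, Gemfile names, and test command are assumptions for illustration; the runner is injectable so the logic can be tried without a real test suite.

```ruby
# Sketch of a CI step that runs the same test command under both configurations.
# Variable names, Gemfile names, and the test command are assumptions.
CONFIGURATIONS = {
  "current" => { "DEPENDENCIES_NEXT" => "0", "BUNDLE_GEMFILE" => "Gemfile" },
  "next"    => { "DEPENDENCIES_NEXT" => "1", "BUNDLE_GEMFILE" => "Gemfile.next" }
}.freeze

TEST_COMMAND = ["bundle", "exec", "rspec"].freeze

# Runs the command once per configuration and records whether it succeeded.
# By default the command is executed with Kernel#system and the given environment.
def run_matrix(configurations, command, runner: ->(env, cmd) { system(env, *cmd) })
  configurations.to_h do |name, env|
    [name, runner.call(env, command) == true]
  end
end

# The build is green only when every configuration passes.
def build_green?(results)
  results.values.all?
end

# Demonstration with a stand-in runner that pretends only the current configuration passes.
fake_runner = ->(env, _cmd) { env["DEPENDENCIES_NEXT"] == "0" }
results = run_matrix(CONFIGURATIONS, TEST_COMMAND, runner: fake_runner)
results.each { |name, ok| puts "#{name}: #{ok ? 'passed' : 'failed'}" }
puts build_green?(results) ? "build passed" : "build failed"
```

Early in the upgrade, a team might report the result of the new configuration without failing the build, and make it blocking once the suite passes on the upgraded codebase.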
Handling database migrations
Database migrations can be the most difficult part of a system upgrade, especially when dealing with large-scale systems. While you won’t encounter migration issues while upgrading basic frameworks, such as Ruby on Rails, you will come across plenty of issues when upgrading frameworks containing business logic, such as Spree.
Reviewing and assessing migrations is a critical step for the development team. It is important to recognize that migrations may end up running long – indeed, they may need to be run well before “flipping the switch” to the upgraded version. This means planning and coordination with other actions are vital. Migration issues are usually specific to a certain application, as run time will often depend on the utilization of certain tables or on the indices that were created.
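One way to capture the outcome of such a review is a small categorization script. The sketch below is hypothetical: the migration names, row counts, and the threshold are invented, and real estimates would come from your own measurements, for example a dry run against a production-sized copy of the database.

```ruby
# Hypothetical description of a pending migration.
PendingMigration = Struct.new(:name, :table, :estimated_rows, :rewrites_table, keyword_init: true)

# Assumed threshold above which a table-rewriting migration counts as long-running.
LONG_RUNNING_ROWS = 1_000_000

def long_running?(migration, threshold: LONG_RUNNING_ROWS)
  migration.rewrites_table && migration.estimated_rows >= threshold
end

# Splits migrations into those to run ahead of the switch and those to run at deploy time.
def plan(migrations, threshold: LONG_RUNNING_ROWS)
  ahead, at_deploy = migrations.partition { |m| long_running?(m, threshold: threshold) }
  { run_ahead: ahead.map(&:name), run_at_deploy: at_deploy.map(&:name) }
end

migrations = [
  PendingMigration.new(name: "backfill_order_totals", table: "orders",
                       estimated_rows: 25_000_000, rewrites_table: true),
  PendingMigration.new(name: "add_flag_to_settings", table: "settings",
                       estimated_rows: 40, rewrites_table: true),
  PendingMigration.new(name: "add_nullable_column", table: "orders",
                       estimated_rows: 25_000_000, rewrites_table: false)
]

puts plan(migrations).inspect
```

Anything placed in the run-ahead group also has to be compatible with the old version of the code, since the old version will still be serving traffic while the migration runs.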
When reviewing database migrations, don’t forget about external dependencies that may have direct access to the database – especially BI tools. Quite often, BI teams have their own set of reports that aren’t directly in the application code but that nevertheless need to be updated after migrations are executed.
As important as it is to have the new version of the dependencies running, QA will very often encounter bugs or inconsistencies when executing a regression on the new configuration. It’s very useful to have a reference environment available.
A reference environment allows for the quick verification of whether a specific bug was just introduced or whether it had already existed in the old version and simply didn’t surface while regressions were being prepared.
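The comparison can be as simple as lining up checklist results from both environments. In this sketch the check names and results are hypothetical, and `true` means the check passed.

```ruby
# Sketch: classify each failure seen on the upgraded environment by looking at
# the result of the same check on the reference (old) environment.
def triage(reference_results, upgraded_results)
  upgraded_results.each_with_object({}) do |(check, passed_on_upgraded), report|
    next if passed_on_upgraded

    report[check] =
      case reference_results[check]
      when true  then :introduced_by_upgrade
      when false then :pre_existing
      else            :unknown_no_reference_result
      end
  end
end

reference = { "checkout" => true, "promo codes" => false, "search" => true }
upgraded  = { "checkout" => false, "promo codes" => false, "search" => true, "wishlist" => false }

triage(reference, upgraded).each { |check, verdict| puts "#{check}: #{verdict}" }
```

Failures with no reference result are a prompt to run the same check on the old version before deciding whether the upgrade caused them.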
This is where dual booting will come in handy – it makes it easy to set up multiple staging environments with different versions of the dependencies. During the upgrade process, it’s best practice to have two staging environments: one running the current dependencies as a reference, and one running the upgraded dependencies.
Having a simple process to deploy the latest version of the codebase to these environments greatly improves both workflow and development speed. If you’re using CI and Docker-based deployments, it also significantly reduces hassles during ongoing work.
It’s also great to have a process for easily replicating staging data. In a best-case scenario, these would be anonymized production data, which would make it possible to test the upgrade in a state that’s as close to the real production environment as possible.
Identifying breaking changes in an underlying framework and adapting new code to them
When it comes to code, even after you fix the obvious issues, you may still encounter some changes in the logic of the underlying framework. These can range from changes in the ActiveRecord query interface between Rails 2 and 3 to more complex business logic changes, especially if your application is based on a more complex framework, such as Spree Commerce.
Sometimes, an upgrade in a core framework may also require changes to other libraries used in the project. Spree upgrades often require the upgrading of tools like Paperclip. This needs to be planned for as well.
And still other times, a library used for a project may not be compatible with a newer version of the desired framework. In such cases, you may need to either look for a replacement library or fork the library and upgrade its internals as well.
A framework’s release notes are a great source that can help you pinpoint possible trouble spots that may require more attention. It’s important to plan ahead for dealing with those. Before executing the upgrade, prepare a specific plan for each such area. Some pain points may require technical workarounds, while others may require some sort of change in business logic.
When executing an upgrade, dual booting and feature flagging will make it easy to manage dependencies and divergences in your application’s code.
Deployment of a framework upgrade is usually riskier than deploying the majority of features.
Let’s face it: the deployment of a framework upgrade is usually riskier than deploying the majority of features. It involves changes in underlying technology that can have impacts across the application and may be difficult to isolate. Framework deployments also cause changes to business logic and, in some cases, to the underlying database.
While regression tests can give you a great deal of confidence in your upgrade, it’s important to plan carefully in order to avoid any major disruptions to business operations. There are several measures you can take to avoid such significant disruptions:
- Review the calendar:
Avoid deploying during intensive times for a business, e.g., the period around Black Friday when dealing with e-commerce applications.
- Plan development team availability:
Pick a time when the development team will be able to quickly react to any issues. The holiday season is not the best time for a deployment, nor are Fridays (unless the team is scheduled to monitor the deployment’s status over the weekend).
- Review time investment for migrations:
If an upgrade includes database migrations, make sure that they’re categorized – long-running migrations will need to be run ahead of the upgrade.
With dual booting, the deployment itself is as simple as changing an environment variable and restarting the application, or updating the Kubernetes config to use another Docker image. After the deployment, it’s once again a good time for QA to execute a basic regression in production to verify that everything is working as intended.
When the upgrade is finished and verified in production, it’s important to clean up the feature flags. Otherwise, they’ll add unnecessary confusion and complexity to future upgrades.
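A small script can help find flags that were left behind. The sketch below assumes the flag is spelled `DEPENDENCIES_NEXT` or `DualBoot.next?`; both spellings and the sample sources are hypothetical, so adjust the pattern to however the flag appears in your codebase.

```ruby
# Sketch: list lines that still reference the dual-boot flag after an upgrade.
# The flag pattern is an assumption made for this example.
FLAG_PATTERN = /DEPENDENCIES_NEXT|DualBoot\.next\?/

# `sources` maps a file path to its contents, which keeps the sketch
# independent of the file system.
def leftover_flags(sources, pattern: FLAG_PATTERN)
  sources.flat_map do |path, content|
    content.each_line.with_index(1)
           .select { |line, _number| line.match?(pattern) }
           .map { |line, number| "#{path}:#{number}: #{line.strip}" }
  end
end

sources = {
  "app/models/order.rb" => "class Order\n  def total\n    DualBoot.next? ? new_total : old_total\n  end\nend\n",
  "app/models/product.rb" => "class Product\nend\n"
}

puts leftover_flags(sources)
```

On a real project, the `sources` hash could be built with something like `Dir.glob("**/*.rb").to_h { |path| [path, File.read(path)] }`.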
After executing the last milestone of the overall upgrade, you may also want to remove the dual booting setup. Just remember to make it easy to enable it again, as it will probably be useful in the future.
Upgrading frameworks is a difficult and complex process. However, preparing a solid plan that involves multiple milestones, covers the details of each upgrade, and ensures proper handling of data will make upgrades much smoother. The use of modern techniques that leverage dual booting will also reduce the effort required to coordinate work between upgrade and feature teams while also reducing the risk of incorrect code merges. Quality Assurance will also play an important role in the process – they’ll map out the state of the current system and give you confidence in the quality of the upgraded version.
If you’re planning an upgrade of your systems and are looking for support from an expert team that has successfully completed tough upgrades in the past, don’t hesitate to contact us. We’d be happy to help you plan, test, and execute your upgrade. Upside’s team has migrated multiple systems for our clients. Projects have ranged in scope from upgrading a Rails framework to updating enterprise B2B tools to bringing Spree-based stores up to the latest framework version.
CTO at Upside
Technology leader with 10+ years of experience in managing technical projects – from large corporations to promising and scaling startups. In 2017, Rafal founded Upside, becoming responsible for overseeing major projects.