Keep a pulse on drift in production, and verify weekly releases
The Customer
A large insurance company has automated weekly releases to a broad range of devices. One of its teams is particularly concerned with gaining visibility into the causes of drift, and with verifying what falls within the team’s scope of responsibility versus what is really the responsibility of other groups in the company.
The Situation
The company is in the midst of a massive development project: a complete rewrite of all its production systems, from infrastructure all the way up to applications like CRM. The effort has been underway for two and a half years, involves 25 high-level groups across the company, and has a scope in the hundreds of millions of dollars. All 25 teams will push their code to production on the same “big day.”
The project leader explained the issue this way: “We have multiple teams pushing code on the same day, and then we’ll have all this trouble as people figure out problems. I want to be able to say, ‘My application is working, and here’s why: I’m using an independent verification platform that proves all my code made it.’”
Drift is another critical problem the customer wants visibility into. Every week, mid-week production issues arise, along with the question, “What changed?” There are two classes of drift. First, people change things – some changes are even approved modifications, like a hot fix – and all of a sudden they’ve impacted production. The second class is when the build system doesn’t get it right.
“We’ve written all our deploy scripts by hand,” says the project leader. “We have to constantly tweak the scripts to accommodate new feature sets and new elements of the system, so it doesn’t always get it right. Sometimes the scripts don’t deploy. Sometimes the scripts don’t push change sets out appropriately. Sometimes the scripts push the change sets out, but don’t move it anywhere – in other words, the script drops the change set down on a file system, but the file system doesn’t apply the change set – it just sits there.”
The Challenge
The company has a very structured release process: they build the source code, deploy it to a particular environment, and in between run an elaborate process of scrubbing and staging the code. They manage five environments from development through production: development, base verification test, staging, UAT, and production. This structure was necessary in their distributed server environment, where two-thirds of production sits in a DMZ. Deployment challenges arose from the sheer number of devices and their device-specific configurations.
As a result, almost every device to which they pushed code had its own unique configuration. The scrubbing process added the information needed to make the code appropriate for its target environment – IP addresses, database names, server names, and so on.
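The scrubbing idea described above – injecting per-device values such as IP addresses and database names into a generic build – can be sketched minimally as a template substitution. All names and values below are illustrative assumptions, not the company’s actual tooling:

```python
# Minimal sketch of a "scrubbing" step: substituting environment-specific
# values (IP addresses, database names, server names) into a generic build
# artifact before it is staged for a particular device. Device names and
# values here are hypothetical.
from string import Template

# Generic configuration as produced by the build, with placeholders.
generic_config = Template(
    "db.host=$db_host\n"
    "db.name=$db_name\n"
    "app.server=$app_server\n"
)

# Per-device values that the scrubbing process injects.
device_overrides = {
    "uat-web-01":  {"db_host": "10.0.1.5", "db_name": "crm_uat",
                    "app_server": "uat-web-01"},
    "prod-web-01": {"db_host": "10.9.1.5", "db_name": "crm_prod",
                    "app_server": "prod-web-01"},
}

def scrub(device: str) -> str:
    """Render the device-specific configuration for one target device."""
    return generic_config.substitute(device_overrides[device])

print(scrub("uat-web-01"))
```

Because each of the many devices gets its own substitution, the staged artifact that reaches a device differs from the raw build output – which is exactly why verification has to harvest the code *after* scrubbing, as the next section describes.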
SignaCert’s Solution: Integrate Verification into Automated Release Process
This middle-tier scrubbing process posed a challenge for harvesting the code for verification, which a more tightly integrated harvest process could accommodate. One of the company’s big requirements was the ability to automate. To solve the problem, they ran SignaCert’s Enterprise Trust Server (ETS) within their automated system, where almost nothing is manual. Within this automated framework, they wanted an independent validation of the code: scrub it, harvest it to make it available within the ETS, perform a deployment, and then measure that deployment back against the ETS. Because ETS has open APIs designed for easy integration, the company was able to integrate the ETS appliance into its automated release management process so that it remains completely hands-off.
The automated setup works as follows: builds go through the scrubbing process, which makes them appropriate for the devices they will be released to; scrubbed and staged builds are then placed on a file system. Next, the ETS client harvests them and submits them to the ETS mid-deployment. The production deployment takes place, and the ETS immediately begins measuring against what is expected to be in production. Nobody has to do anything except approve the submissions to the ETS, which they do every morning. The process is fully automated – no intervention is needed – and the baselines of the production servers are updated automatically from the new code pushed out of staging.
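The harvest-then-measure idea above can be illustrated with a generic hash-manifest sketch: record a fingerprint of every staged file (the “harvest”), then compare what is actually on a production file system against that baseline (the “measure”). This is a conceptual illustration only – it is not SignaCert’s ETS API, and all function names are assumptions:

```python
# Hedged sketch of harvest/measure-style verification: build a baseline of
# file hashes from the staged build, then report per-file drift against it.
# This shows the concept, not any vendor's actual implementation.
import hashlib
from pathlib import Path

def harvest(staging_dir: Path) -> dict[str, str]:
    """Baseline: relative path -> SHA-256 digest of each staged file."""
    return {
        str(p.relative_to(staging_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in staging_dir.rglob("*") if p.is_file()
    }

def measure(prod_dir: Path, baseline: dict[str, str]) -> dict[str, str]:
    """Compare production against the baseline; report drift per file."""
    report = {}
    for rel, expected in baseline.items():
        target = prod_dir / rel
        if not target.exists():
            # The change set never made it out of staging.
            report[rel] = "missing"
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            # The file is present but its content differs: drift.
            report[rel] = "modified"
    return report
```

Run nightly, a comparison like `measure()` is what lets the team answer “Did the change set make it to production, and is it actually in place?” rather than discovering drift only when something breaks.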
Results
With a short implementation timeframe – a matter of hours – the company easily got ETS in place ahead of the “big day.” Now each night they monitor the state of their production servers and run comparisons so they can see drift over time. Their goal was to be able to answer yes to questions like “Did the change set make it out to production? Is it in the right place, or is it just sitting on a file system?” With SignaCert, they can.
The first priority was to nail the date so that the project leader had proof his deployments worked. Going forward, as they stabilize this huge push, the project leader will use ETS to verify that his team is doing everything right, and isn’t wasting its resources figuring out other teams’ problems.
Now that the project leader has proven the value of ETS’s independent verification to his own team, he plans to recommend it to all the other release management teams. “It’s somewhat selfish, saying, ‘I’m tired of fixing your problems, tired of supporting issues that turn out to not be mine.’ So if everyone here used ETS, we’d all have lower support costs because we’d all know exactly where things were breaking,” says the project leader.