Unlocking the Power of CI/CD
We celebrated Remote’s 18th birthday today, 1st September 2017. It makes me smile to remember our early days: sat in the corner of our bedroom with a 586 PC, Jeannie designing our websites, and me putting them together with raw HTML. We couldn’t have imagined that all these years later we’d be creating all kinds of digital sorcery as a team of seven, delivering magic for brands like Volvo, Sony and Volkswagen Group. The tools and best practices in the industry have changed dramatically, too.
In the first year or so of Remote, I used Dreamweaver to piece together sites page-by-page. When I was done, I’d FTP the results to my hosting provider via my 56k modem, and would let them do whatever it was that they needed to do to make the site run on the web.
I didn’t need to know how it worked; it just worked. The files I uploaded wrote directly over the old files. And if I needed to change anything, I just wrote directly over those files too. And if I made a mistake and needed to go back, I just had to hope I’d remembered to back up to that external hard drive fairly recently, or I’d have a late night ahead!
Fast forward three or four years, and we were a team of three, co-locating our own server network down in Telehouse in the London Docklands. Our sites were database-driven using SQL Server 2000, with the HTML generated by ASP and VBScript. When we upgraded a site, we had to remember which files we’d changed as part of the release, and upload just those.
After some time we learned to create a new folder for each upgrade, so that we could use it as a staging site until we were ready to go live; then we’d work with IIS, domain bindings and DNS to make the staging site live.
It worked, but it was prone to all sorts of errors. We might forget to change a configuration file during the go-live process. Or something that worked previously would suddenly stop working because of a change we’d made somewhere else in the project, and we hadn’t thought to test that aspect of the site. And if a very quick fix was needed, we would often give in to temptation and simply upload the fix right onto the live site, instead of using a new staging folder.
We installed, upgraded and managed the servers ourselves. If a component in the server broke down, it was up to us to fix it.
I remember one time when one of our web servers went down. I called up an engineer on site, who became my eyes, opening up the machine and telling me what he could see. I deduced that the power supply had blown, and searched online for computer supply stores near the data center that stocked this near-obsolete power supply (the server was our first, and it was getting on a bit). When I found one, I paid a London motorcycle courier to pick up the supply and drive it to the data center, where the engineer was waiting to install it. The sites were all back up again after three very tense hours.
Fast forward another eight or so years: there were five of us in the team, virtual servers had become a viable proposition, and, given the advantages, we dived in. If I needed a new server, I no longer needed to order it, install the software, drive down to London and screw it into the racks myself (although I always felt pretty space-age whenever I worked in the data center). I’d just press a few buttons on my VMware vCloud Director console, and 20 minutes later the server would be spun up and ready to roll.
If I needed more hard drive space, it no longer meant downtime while I took the machine out of its racks and screwed the drive into place, configuring RAID (terrifying) and hoping the whole thing booted up again cleanly. Instead, now I’d just click a few buttons on the online console, do a quick reboot, and the drive would be available to me.
If a power supply failed, the server instance would simply move itself to another of the multiple blade servers available, and would pop back online again within minutes. It was still a little clunky to operate, and I still needed to manage software updates and stay on top of resources such as CPU, RAM and disk space, but it was a huge improvement.
Our deployment processes didn’t change much during that transition. We became more vigilant about our staging folders. Instead of using folders called ‘dev’ and ‘live’, which would immediately become confused when we needed a new version (‘dev 2’ or ‘new live’, later becoming ‘live final’, ‘live final 2’, etc.), we used release-version numbering for the folders instead: ‘v1’, ‘v1.5’, ‘v2’ and so on. The process was more robust, and our growing knowledge of how to avoid the common pitfalls helped keep our sites live and error-free, but human error was still possible, and even small updates were still a fairly major operation if we were to do them properly, without the error-prone shortcuts.
As we moved into more enterprise-level applications and mobile apps, our deployment processes became more sophisticated. Online applications required multiple configuration files, and mobile apps had many stages of build and deployment, with batch files, server scripts, and all sorts of manual configuration to get to the finished product. A lot of the knowledge that we needed for these processes was kept in the heads of the developers who used it, and the concern with that was that if those developers moved on, so would that knowledge.
But technology evolves at a dizzying pace, and at Remote we love to keep up with that pace.
Fast forward to 2017: we’re now an Agile team of developers, designers and project managers, the company is unrecognizable from its humble beginnings, and the Cloud is everything. We’ve moved most of our applications over to Microsoft Azure, and the flexibility of the App Service is astonishing. We can fire up a new service, tell it whether it’s .NET, .NET Core, PHP, or whichever flavor of code we’re working with, and set up the usual things we’d set up in IIS, but in a simple form.
We can add an application to the service plan, where all apps share the allocated memory, CPU and so on, giving resources back to the pool when they’re not needed. The App Service connects with Azure SQL, which behaves in a similar way, sharing resources with its pool and reducing them when they’re not needed. A customizable dashboard provides graphs of resource utilization across all the applications and services, and we can isolate a service and move it into its own resource pool in a couple of clicks, with no downtime.
If an application finds itself under heavier-than-usual demand, Azure can ‘Scale-out’, and create a new, copied instance of the application in real-time, sharing visitor traffic between the two, three, four, or ten copies of the application as needed — when the demand drops again, so do the copied instances. All while I sleep.
All of our version control is managed with Git. No more scrabbling through backups to find a previous version of a file. We just roll our version of the source code back to a time when the file was in place, and make whatever changes necessary. New changes are merged smoothly and easily with the ‘master’ trunk, often following a ‘pull request’, where peers can review and comment on new sections of code before they’re allowed to join the main source. And when that code is committed to the master, oh sweet delight…
Continuous Integration and Deployment is, to me, the most exciting development of all in recent years. I commit my code to the master trunk, and VSTS recognizes that the code has changed. Immediately, it runs my task list: it takes the full code, downloads any relevant NuGet packages, builds the solution, runs our unit tests to make sure that everything still does what it should, and then, once all is well, publishes the whole project to a staging slot and warms it up, so that when we visit the site for the first time we don’t have to wait for it to load. Once we’ve given the staging site a round of manual QA tests to be sure we’re happy with it, a single mouse click swaps the staging and production slots, making our new updates live without any downtime at all for the website visitor. And the previous production version is now held in staging, just in case we need to roll back quickly.
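To give a flavor of the unit-test gate in that task list, here’s a minimal sketch, assuming an xUnit test project that the build’s test task picks up; PriceCalculator and its behavior are invented purely for illustration.

```csharp
using Xunit;

// Hypothetical class under test; our real projects’ tests follow the same shape.
public class PriceCalculator
{
    // Applies a percentage discount to a unit price.
    public decimal Discounted(decimal price, decimal percent)
        => price - (price * percent / 100m);
}

public class PriceCalculatorTests
{
    [Fact]
    public void Discounted_AppliesPercentageReduction()
    {
        var calc = new PriceCalculator();

        // 10% off 200.00 should be 180.00. If a later refactor breaks this,
        // the build fails and nothing is published to the staging slot.
        Assert.Equal(180m, calc.Discounted(200m, 10m));
    }
}
```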
It’s brilliant, it’s simple, it removes many, many possibilities for human error, and it’s so much faster. With more complicated builds, those scripts and batch files can be programmed into the deployment: no more checklists and things to remember, just a Git check-in, and the project is deployed. We have fine-grained control over how the build happens and what happens during it, and the order of tasks can be altered as we desire. It feels like we’ve reached a kind of technological utopia.
Of course, the setup isn’t always completely straightforward and out-of-the-box. An online application that uses Session IDs, for example, will need to be re-coded to use something like the Azure Redis Cache to hold those states, so that they’re accessible from multiple instances and not lost when an App Service switches servers.
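As a sketch of what that re-coding can look like, here’s the general shape using the StackExchange.Redis client. The connection string, key names and SharedSessionStore class are placeholders I’ve invented; a real ASP.NET app would more likely swap in the Redis session-state provider via configuration.

```csharp
using System;
using StackExchange.Redis;

public static class SharedSessionStore
{
    // One multiplexer shared across the app; the host name and access key
    // are placeholders for a real Azure Redis Cache instance.
    private static readonly ConnectionMultiplexer Redis =
        ConnectionMultiplexer.Connect(
            "example.redis.cache.windows.net:6380,password=<access-key>,ssl=True,abortConnect=False");

    // Store a value against the user’s session ID in the shared cache, with a
    // 20-minute expiry, so any App Service instance can read it back.
    public static void Save(string sessionId, string key, string value)
    {
        IDatabase db = Redis.GetDatabase();
        db.StringSet($"session:{sessionId}:{key}", value, TimeSpan.FromMinutes(20));
    }

    public static string Load(string sessionId, string key)
    {
        IDatabase db = Redis.GetDatabase();
        return db.StringGet($"session:{sessionId}:{key}");
    }
}
```

Because the state lives in the cache rather than on any one server, it survives both scale-out and the App Service quietly moving the application between machines.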
In Azure, everything is volatile, and a site that holds user files such as images or documents needs to be reprogrammed to keep those files on shared virtual storage, so that when the CD process swaps deployment slots, the latest files aren’t swapped out along with the old code. But it’s easy to do, and the benefits are immense.
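For example, here’s a minimal sketch of writing uploads to Azure Blob Storage with the WindowsAzure.Storage client instead of the instance’s local disk; the container name, connection string and UserFileStore class are illustrative assumptions, not our exact code.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class UserFileStore
{
    // Write an uploaded file to a shared blob container rather than the local
    // disk, so every deployment slot and scaled-out instance sees the same files.
    public static async Task SaveAsync(string connectionString, string fileName, Stream content)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("user-uploads");

        // Create the container on first use; a no-op if it already exists.
        await container.CreateIfNotExistsAsync();

        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
        await blob.UploadFromStreamAsync(content);
    }
}
```

Since every instance talks to the same container, a slot swap changes the code that serves the files, not the files themselves.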
Staying within the Microsoft stack for the entire development cycle has huge benefits too. We develop using Visual Studio and host our Git repositories on VSTS, which is also where we’ve migrated our project management, so we can work on an issue, relate Git commits to it, and see which build the issue was released in, all within the same software. Pull requests mean that other developers can review and comment on the code from within Visual Studio, which can also manage Git pulls and commits. We can set up CI/CD in Azure and then make more granular changes to that setup within VSTS, all very quickly and transparently.
I can even get a Slack message when a build completes or, perhaps more importantly, when it fails. We create a new Git branch for each issue we work on and then merge that branch back to the trunk, so we get immediate feedback from the CI system if something has broken, and immediate feedback from the client, who can access the staging site. No more waiting two weeks for an official release and then merging all of our development branches into the master at once (sheer hell); instead, a steady, gradual and safe deployment. Because clients can see which element of a project we’re working on and respond straight away, we can implement any adjustments early in the life-cycle.
It really feels like we’re at the point where this kind of automation provides the juicy layer of immediacy, interaction and intelligence that has always been needed in managing these processes, and the result is a huge improvement in reliability, speed and quality. I’ll go more deeply into the details of all these technologies in future articles, but for now, here’s to the brave new world; I’m a very happy resident.