Client
The National Institute for Health and Care Excellence (NICE)
Project Information
Topic: Infrastructure as Code for building environments
Technology: AWS Services, Terraform and Ansible
Challenges
- Unplanned maintenance.
- Deployments are manual and prone to human error when installing packages; installations are unmanaged and not repeatable.
- The deployment process is difficult: failures are hard to pinpoint, rolling back to a known state is difficult, and deploying new components takes a long time.
The Solution
The solution is based on HashiCorp’s reference architecture for AWS. AWS modules are used to build:
- VPC and its subnets, route tables and Internet Gateway
- NAT Instances and Elastic IPs.
- EC2 Instances and databases for compute and storage.
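As a rough illustration of the network layout listed above, the following sketch splits a VPC address range into public and private subnets and notes which default route each tier would use. The CIDR range, prefix length, subnet counts and the function name `plan_subnets` are assumptions made for illustration only; the actual values used in the project are not stated.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr, new_prefix, public_count, private_count):
    """Split a VPC CIDR block into public and private subnet CIDRs.

    Hypothetical helper: public subnets are assumed to route through the
    Internet Gateway and private subnets through a NAT instance.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = public_count + private_count
    # subnets() yields equal-sized blocks in ascending address order.
    blocks = [str(b) for b in islice(vpc.subnets(new_prefix=new_prefix), needed)]
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")
    return {
        "public": {
            "cidrs": blocks[:public_count],
            "default_route": "internet-gateway",
        },
        "private": {
            "cidrs": blocks[public_count:],
            "default_route": "nat-instance",
        },
    }


# Example values only; not taken from the project.
plan = plan_subnets("10.0.0.0/16", 24, 2, 2)
for tier, details in plan.items():
    print(tier, details["cidrs"], "->", details["default_route"])
```

In a Terraform-based build, values like these would typically be passed to the VPC module as input variables rather than computed in Python.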
Tools Used:
- Git – SCM Repository
- Jenkins – Orchestrator
- Terraform – Infrastructure Build
- Ansible – Configuration Management and Software Deployment
- AWS Services & CLI.
The architecture below shows how the resources are built and configured in AWS using available open-source tools.
The solution steps are as follows:
- Engineers develop Terraform modules, a Jenkinsfile and Ansible playbooks on their local systems.
- Code is committed to a private Git repository, and a webhook triggers Jenkins.
- Jenkins runs the pipeline, which executes the stages for downloading the code onto the Terraform server and initiating the build on AWS.
- Terraform builds the AWS services and updates the details on the Ansible host to deploy the software library.
- Ansible triggers the build, configures the environment as per the playbooks and smoke-tests the system for availability.
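The hand-off between Terraform and Ansible in the steps above could look roughly like the sketch below. It lists the CLI commands a pipeline might run in order and turns the JSON printed by `terraform output -json` into an Ansible inventory. The output name `instance_ips`, the group name `app`, and the file names `tfplan`, `inventory.ini` and `site.yml` are hypothetical; the project's real Jenkinsfile and playbooks are not shown in this document.

```python
import json


def pipeline_commands(inventory_path="inventory.ini", playbook="site.yml"):
    """Return the CLI commands for each stage, in execution order."""
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", "-out=tfplan"],
        ["terraform", "apply", "-input=false", "tfplan"],
        ["terraform", "output", "-json"],
        ["ansible-playbook", "-i", inventory_path, playbook],
    ]


def render_inventory(terraform_output_json, group="app"):
    """Build an INI-style Ansible inventory from `terraform output -json`.

    Assumes a Terraform output named `instance_ips` (hypothetical) that
    holds a list of host addresses.
    """
    outputs = json.loads(terraform_output_json)
    hosts = outputs["instance_ips"]["value"]
    return "\n".join([f"[{group}]", *hosts]) + "\n"


# Sample data in the shape `terraform output -json` uses; addresses are made up.
sample = json.dumps(
    {
        "instance_ips": {
            "sensitive": False,
            "type": ["list", "string"],
            "value": ["10.0.2.10", "10.0.2.11"],
        }
    }
)
inventory = render_inventory(sample)
print(inventory)
```

Each command list could be passed to `subprocess.run(cmd, check=True)` so that a failing stage stops the pipeline, which makes it easier to see where a deployment broke.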
Results and Benefits
- No planned downtime required for maintenance.
- Deployments are automated to remove the risk of human error when installing packages and to ensure installations are managed in a consistent and repeatable manner.
- Easy rollback to a known state, and quicker, more reliable deployment of new and existing components.
- Average cost was reduced to one-third of the original, resulting in savings and better use of human resources.
- In the first month of operation no major incidents occurred, and an uptime of nearly 100% was achieved.