Keptn is an event-based control plane for continuous delivery and automated operations. Unleash is a feature toggling framework. It sounds like these two would play well together, even in non cloud-native environments…
Overview
Self healing software is a cornerstone in software automation maturity. Tooling comes second to the ability and willingness to change the way you create software.
Keptn already has an excellent tutorial on running Keptn & Unleash in Kubernetes or OpenShift based environments. The focus of this post is how we can leverage these components in a non-Kubernetes environment.
This post will create an application that is coded to include feature flag capability. While disabled, the application will deliver traffic in the normal operating mode ie. serving the content from the application itself. When the feature flag is enabled, the application will instead serve traffic from a static resource (a file hosted on a CDN).
The feature flag engine that makes this possible will be Unleash.
Keptn will be the orchestration layer responsible for reacting to “problem events” and self healing the application by enabling the feature flag.
This scenario is used in real-world scenarios when websites wish to include a safety switch that they can toggle in case of emergencies (traffic overload, DDoS attack etc.)
Rounding out the system will be a monitoring provider. The monitoring providers job is to inform Keptn whenever there is a problem with the system (in our demo, an increased server-side error rate).
Keptn can use metrics from any third party tool, the two most common being Prometheus or Dynatrace. For this demo, we will use Dynatrace.
Fast Developer Feedback
Not only is it important to have self healing software, but it’s imperative that the developers get instant feedback on their code. For this reason, Keptn automatically pushes a stream of comments on to the Dynatrace problem ticket whenever a remediation step is attempted / performed. In this way, the developers can understand exactly what led up to the issue & how Keptn resolved the issue.
Architecture & Sequence Diagram
Here is the system architecture and sequence of events:
- Application is monitored by Dynatrace.
- We will manufacture an increase in error rate to simulate a production issue.
- Dynatrace identifies the issue and sends a notification to Keptn.
- Keptn triggers the remediation workflow and informs Unleash to toggle the feature flag to “on”. Traffic is now served from the CDN rather than the app.
- Keptn notifies the monitoring solution (Dynatrace) of the remediation attempt.
Prerequisites
- A Dynatrace tenant (free trial available here)
- A full Keptn installation
- Dynatrace monitoring and the Dynatrace SLI Provider installed and configured on the Keptn machine.
- An Ubuntu VM used to host your website and the Unleash Feature Flag service.
Any linux based VM will work, but you will have to modify the instructions to suit your distro.
Networking Prerequisites
For this demo:
- The
keptn
VM will need to allow inbound HTTPS traffic from Dynatrace SaaS. - The
application
VM (runningproxy
,app
,unleash
andpostgres
) will need to allow inbound HTTP traffic from thekeptn
machine.
Containers?
I have chosen to deploy these workloads as containers purely for ease during the demo setup. There is nothing here that could not be achieved without using containers.
In terms of the demo and outputs, you may entirely ignore the fact that the application and Unleash service are running in containers. It makes no difference to the end result.
Install & Configure Git and Docker
On the application VM, run the following:
sudo apt update && sudo apt install git docker.io -y
sudo usermod -aG docker $USER
Launch a new terminal window to pick up the new user permissions. Validate it works with docker ps
You should see:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Clone Repo
git clone https://github.com/agardnerit/unleashtutorial
cd unleashtutorial && chmod +x loadGenErrors.sh
Create New Docker Network
This allows containers to talk to each other via their container name.
On the application VM, run the following:
docker network create agardner-net
Run a PostGresDB for Unleash
On the application VM, run the following:
docker run -d --name postgres --network agardner-net -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_DB=unleash postgres
Database = unleash
Username = postgres
Password = mysecretpassword
Build & Run the Unleash Container
On the application VM, run the following:
docker build -t unleash ./unleash && docker run -d --name unleash --network agardner-net -e DATABASE_URL=postgres://postgres:mysecretpassword@postgres:5432/unleash unleash
Build and Run the App
On the application VM, run the following:
docker build -t app . && docker run -d --name app --network agardner-net app
Build and Run the NGINX Reverse Proxy
On the application VM, run the following:
docker build -t proxy ./proxy && docker run -d -p 80:80 --name proxy --network agardner-net -e DT_CUSTOM_PROP="keptn_project=website keptn_service=front-end keptn_stage=production" proxy
Validate Containers
Running docker ps
should show 4x containers: proxy
, app
, unleash
and postgres
.
CONTAINER ID IMAGE ... PORTS NAMES
c1344de4e69c proxy 0.0.0.0:80->80/tcp proxy
676935d87028 app app
be6937f7641c unleash 4242/tcp unleash
fee962f54612 postgres 5432/tcp postgres
Validate User Interfaces
- The Unleash UI should now be available on
http://<APP-VM-IP>/unleash
- The app should now be available on
http://<APP-VM-IP>
Validate that both of these are available by visiting them in a browser.
You can login to unleash with any username & password.
Validate Dynatrace Tags
In your Dynatrace tenant, open the Transactions and Services
page, select the Keptn website production
management zone to filter your services and navigate to the unleash-demo
service.
Ensure that your service is tagged with the following:
keptn_project:website
, keptn_service:front-end
and keptn_stage:production
These tags are created when you installed the Dynatrace service on Keptn . If you do not see these tags, please STOP and ensure you follow this instructions linked above.
This tutorial WILL NOT WORK without these tags.
Validate Problem Notification Integration
Keptn automatically configures the problem notification integration when you onboard the Dynatrace Service.
Validate that it’s available now. In Dynatrace, go to Settings > Integration > Problem Notifications
and you should see an entry for Keptn. If you do not see this problem notification, STOP and ensure you’ve installed Dynatrace on the keptn box.
This tutorial WILL NOT WORK without this integration.
Configure Problem Sensitivity
For demo purposes, we will set Dynatrace to be extremely sensitive to failures.
Find the unleash-demo:80
nginx service, edit the anomaly detection settings and adjust the settings to match these:
Create Feature Flag
- Go to
http://<APP-VM-IP>/unleash
and login (use any fake values you like to login) - Create a feature flag called
EnableStaticContent
(case sensitive and must be called this). - Set the flag to
disabled
- Refresh the application UI and you should still see the standard (blue bar) page.
Manually Test Flag
Prove that the feature flag works:
- Go to the app (
http://<APP-VM-IP>
) and refresh the page. You should still see the blue banner. This page is served from theapp
container. - Enable the feature flag and refresh the app. Notice the green banner, this page is served from GitHub.
Set the flag back to disabled
so that traffic is being served by the app (blue banner).
Clone Repo to Keptn Machine, Create Keptn Project & Service
Execute these commands on the Keptn cluster:
cd ~
git clone http://github.com/agardnerit/unleashtutorial
cd unleashtutorial
keptn create project website --shipyard=shipyard.yaml
keptn create service front-end --project=website
keptn add-resource --project=website --service=front-end --stage=production --resource=remediations.yaml --resourceUri=remediation.yaml
The values in the remediations.yaml
file tell Keptn how to respond when it sees a failure rate increase problem for this project (website
), service (front-end
) and stage (production
).
Create Secret & Bounce Remediation Service
Note that the username
and token
can be set to anything.
The remediation-service
pod must be recreated so that it picks up this new secret.
Execute these commands on the Keptn cluster:
kubectl create secret -n keptn generic unleash --from-literal="UNLEASH_SERVER_URL=http://<APP-VM-IP>/unleash/api" --from-literal="UNLEASH_USER=me" --from-literal="UNLEASH_TOKEN=whatever"
kubectl scale deployment -n keptn remediation-service --replicas=0 && kubectl scale deployment -n keptn remediation-service --replicas=1
Load Generator
Run the load generator which will create errors. In another tab, keep refreshing the page and in a few minutes (when DT raises a problem) you’ll see the website failover to the green static hosted content.
Run this on the application VM hosting the website:
cd ~/unleashtutorial
./loadGenErrors.sh
- You will see
HTTP 500
errors being shown. The failure rate registered by Dynatrace on theunleash-demo:80
service will also increase.
$ ./loadGenErrors.sh
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
-----------------
- After about 10 minutes, Dynatrace will register a problem and push a notification to Keptn.
- The Keptn
remediation-service
will activate and toggle the feature based on theremediations.yaml
file. - The feature flag will be
enabled
and the CDN page will be served (from GitHub) (this is the actual page)
Bonus: Synced JIRA Tickets
I have the JIRA Service and the Dynatrace for JIRA Cloud plugin installed into the Keptn cluster. Combined, these two do the following:
- Create a JIRA ticket whenever a problem event occurs. This JIRA ticket contains all the details of the problem.
- Sync comments bi-directionally between Dynatrace and JIRA.
- Link directly to the Keptn’s bridge and the Dynatrace problem from the ticket.
My developers now have an automatically created record of the problem and all remediation actions, right within their normal JIRA workflow. They can reply to the ticket from within Dynatrace or the ticket itself, everyone has all relevant details instantly, whichever tool they use.
Don’t use JIRA? Keptn is plug & play so just switch the service out for a different one such as Slack or ServiceNow (or write your own service for any other tool).
Conclusion
Although Keptn and Unleash are primarily designed for Kubernetes-based environments, there is no reason they (and the automation concepts described) cannot be used in a non-containerised environment.
Willingness to change the way you design and write software. Willingness to empower developers. Ability to quickly get relevant, actionable feedback to developers. Three key ingredients to a successful automation strategy.
Tooling helps, but culture is much more important on your software automation journey. The three big ones are:
- A willingness to rethink the way you design and write software.
- A willingness to empower development teams to take responsibility for the code they produce.
- The ability to get relevant, actionable feedback to developers - quickly.