Keptn + Unleash = Automated Feature Flag Self-Healing for Non-Kubernetes Environments

04/05/2020

Keptn is an event-based control plane for continuous delivery and automated operations. Unleash is a feature toggling framework. It sounds like these two would play well together, even in non-cloud-native environments…

Overview

Self-healing software is a cornerstone of software automation maturity. Tooling, however, comes second to the ability and willingness to change the way you create software.

Keptn already has an excellent tutorial on running Keptn and Unleash in Kubernetes- or OpenShift-based environments. This post focuses on how to use these components in a non-Kubernetes environment.

In this post, we will create an application with built-in feature flag capability. While the flag is disabled, the application operates normally, i.e. it serves content from the application itself. When the flag is enabled, the application instead serves a static resource (a file hosted on a CDN).
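The routing decision at the heart of this pattern is small. The following is a hypothetical shell sketch of it, not the app's actual code (that lives in the tutorial repo and may work differently); APP_ORIGIN and CDN_URL are placeholder values chosen for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the app's routing decision; not the repo's code.
# Both origins below are placeholders chosen for illustration.
APP_ORIGIN="http://app:8080/"                         # hypothetical: page rendered by the app
CDN_URL="https://cdn.example.com/static/index.html"   # hypothetical: static copy on a CDN

# pick_origin <flag_enabled>
# Prints the URL whose content should be returned to the visitor.
# Anything other than "true" is treated as "flag off", so an unreachable
# or misconfigured flag service falls back to normal operation.
pick_origin() {
  if [ "$1" = "true" ]; then
    printf '%s\n' "$CDN_URL"     # flag on: safety switch engaged
  else
    printf '%s\n' "$APP_ORIGIN"  # flag off: normal operating mode
  fi
}
```

Defaulting to the app whenever the flag state is unknown keeps the safety switch strictly opt-in: only an explicit "on" moves traffic to the CDN.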

The feature flag engine that makes this possible will be Unleash.

Keptn will be the orchestration layer, responsible for reacting to “problem events” and self-healing the application by enabling the feature flag.

This pattern is used in the real world when websites want a safety switch that they can flip in an emergency (traffic overload, DDoS attack, etc.).

Rounding out the system is a monitoring provider, whose job is to inform Keptn whenever there is a problem with the system (in our demo, an increased server-side error rate).

Keptn can use metrics from third-party tools, the two most common being Prometheus and Dynatrace. For this demo, we will use Dynatrace.

Fast Developer Feedback

Self-healing software is important, but it is just as important that developers get instant feedback on their code. For this reason, Keptn automatically pushes a stream of comments onto the Dynatrace problem ticket whenever a remediation step is attempted or performed. This way, developers can understand exactly what led up to the issue and how Keptn resolved it.
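Comments like these can be posted by any tool through the Dynatrace Problems API v1 (`POST /api/v1/problem/details/{problemId}/comments`, with a JSON body containing `comment`, `user` and `context`). The sketch below illustrates that API; it is not the code Keptn's Dynatrace service runs, and `DT_TENANT`, `DT_API_TOKEN` and the problem ID are placeholders you would supply:

```bash
#!/usr/bin/env bash
# Sketch: post a comment to a Dynatrace problem via the v1 Problems API.
# DT_TENANT (e.g. abc123.live.dynatrace.com) and DT_API_TOKEN are placeholders.

# json_escape <text>: escape backslashes and double quotes for a JSON string.
# Single-line text only; newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# dt_comment_payload <comment> <user> <context>: print the JSON request body.
dt_comment_payload() {
  printf '{"comment":"%s","user":"%s","context":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# dt_post_comment <problem_id> <comment>: send the comment (needs network access).
dt_post_comment() {
  curl -sS -X POST \
    "https://${DT_TENANT}/api/v1/problem/details/$1/comments" \
    -H "Authorization: Api-Token ${DT_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(dt_comment_payload "$2" "keptn" "Keptn remediation")"
}

# Usage (hypothetical problem ID):
#   dt_post_comment "-1234567890123456789_1586000000000V2" "Feature flag EnableStaticContent set to on"
```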

Architecture & Sequence Diagram

Here is the system architecture and sequence of events:

  1. The application is monitored by Dynatrace.
  2. We manufacture an increase in the error rate to simulate a production issue.
  3. Dynatrace detects the issue and sends a problem notification to Keptn.
  4. Keptn triggers the remediation workflow and instructs Unleash to turn the feature flag on. Traffic is now served from the CDN rather than the app.
  5. Keptn notifies the monitoring solution (Dynatrace) of the remediation attempt.

[Figure: architecture and sequence diagram]

Prerequisites

Any Linux-based VM will work; the commands below assume an apt-based distribution such as Ubuntu, so adjust them to suit your distro.

Networking Prerequisites

For this demo:

  • The Keptn VM will need to allow inbound HTTPS traffic from Dynatrace SaaS.
  • The application VM (running proxy, app, unleash and postgres) will need to allow inbound HTTP traffic from the Keptn machine.
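If your VMs use ufw, the two rules could look like the following sketch. KEPTN_IP is a placeholder, and the functions only print the commands so you can review them before running them with sudo; translate them to firewalld or cloud security groups as needed. The validation steps later also open the app in a browser, so you may need to allow your own address too.

```bash
#!/usr/bin/env bash
# Sketch: print ufw rules for the networking prerequisites (dry run only).

# app_vm_ufw_rules <keptn_ip>: rule for the application VM.
# Lets the Keptn machine reach the proxy (and Unleash behind it) on port 80.
app_vm_ufw_rules() {
  printf 'ufw allow from %s to any port 80 proto tcp\n' "$1"
}

# keptn_vm_ufw_rules: rule for the Keptn VM.
# Accepts HTTPS from Dynatrace SaaS; restrict the source range further
# if you know your tenant's outbound IP addresses.
keptn_vm_ufw_rules() {
  printf 'ufw allow 443/tcp\n'
}

# Usage: review the output, then apply it, e.g.
#   app_vm_ufw_rules "$KEPTN_IP" | sudo sh
```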

Containers?

I have chosen to deploy these workloads as containers purely to simplify the demo setup. Everything here could be achieved without containers.

In terms of the demo and outputs, you may entirely ignore the fact that the application and Unleash service are running in containers. It makes no difference to the end result.

Install & Configure Git and Docker

On the application VM, run the following:

sudo apt update && sudo apt install git docker.io -y
sudo usermod -aG docker $USER

Log out and back in (or open a new SSH session) so that the new docker group membership takes effect, then verify that Docker works by running docker ps.

You should see:

CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES

Clone Repo

On the application VM, run the following from your home directory:

git clone https://github.com/agardnerit/unleashtutorial
cd unleashtutorial && chmod +x loadGenErrors.sh

Create New Docker Network

A user-defined Docker network lets the containers reach each other by container name.

On the application VM, run the following:

docker network create agardner-net

Run a PostgreSQL Database for Unleash

On the application VM, run the following:

docker run -d --name postgres --network agardner-net -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_DB=unleash postgres

This creates a database named unleash, accessible with username postgres and password mysecretpassword.
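Postgres can take a few seconds to accept connections after its container starts, so you may want to wait for it before starting Unleash. A minimal retry helper, assuming the official postgres image (which ships the `pg_isready` utility):

```bash
#!/usr/bin/env bash
# retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are exhausted.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0                            # success: stop retrying
    [ "$i" -lt "$attempts" ] && sleep "$delay"  # wait before the next try
  done
  return 1                                      # every attempt failed
}

# Usage on the application VM: wait up to ~60 seconds for the database.
#   retry 30 2 docker exec postgres pg_isready -U postgres -d unleash
```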

Build & Run the Unleash Container

On the application VM, run the following:

docker build -t unleash ./unleash && docker run -d --name unleash --network agardner-net -e DATABASE_URL=postgres://postgres:mysecretpassword@postgres:5432/unleash unleash

Build and Run the App

On the application VM, run the following:

docker build -t app . && docker run -d --name app --network agardner-net app

Build and Run the NGINX Reverse Proxy

On the application VM, run the following:

docker build -t proxy ./proxy && docker run -d -p 80:80 --name proxy --network agardner-net -e DT_CUSTOM_PROP="keptn_project=website keptn_service=front-end keptn_stage=production" proxy

Validate Containers

Running docker ps should show four containers: proxy, app, unleash and postgres.

CONTAINER ID    IMAGE   ...  PORTS                NAMES
c1344de4e69c    proxy        0.0.0.0:80->80/tcp   proxy
676935d87028    app                               app
be6937f7641c    unleash      4242/tcp             unleash
fee962f54612    postgres     5432/tcp             postgres
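To script this check, `docker ps --format '{{.Names}}'` prints one container name per line. A small helper (a sketch, not part of the repo) then reports any of the four expected containers that are not running:

```bash
#!/usr/bin/env bash
# missing_containers: read container names (one per line) on stdin and print
# any of the four expected demo containers that are not in the list.
missing_containers() {
  local running name
  running=$(cat)
  for name in proxy app unleash postgres; do
    # -x: whole-line match, -F: literal string, -q: no output
    printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
  done
}

# Usage on the application VM (no output means all four are running):
#   docker ps --format '{{.Names}}' | missing_containers
```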

Validate User Interfaces

  • The Unleash UI should now be available on http://<APP-VM-IP>/unleash
  • The app should now be available on http://<APP-VM-IP>

Validate that both of these are available by visiting them in a browser.

You can log in to Unleash with any username and password.

[Screenshot: Unleash UI]

[Screenshot: application UI]
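If you would rather check both endpoints from a terminal, the status code alone is a reasonable first signal. A sketch using curl's `-w '%{http_code}'` option, treating any 2xx or 3xx response as "up" (redirects are not followed):

```bash
#!/usr/bin/env bash
# is_up <http_code>: succeed for 2xx/3xx status codes.
# curl reports 000 when it cannot connect, which counts as down.
is_up() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# check_ui <url>: print the status code and whether the endpoint looks healthy.
check_ui() {
  local code
  # -o /dev/null discards the body; -w prints only the status code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  if is_up "$code"; then echo "OK   $code $1"; else echo "FAIL $code $1"; fi
}

# Usage:
#   check_ui "http://<APP-VM-IP>/"
#   check_ui "http://<APP-VM-IP>/unleash"
```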

Validate Dynatrace Tags

In your Dynatrace tenant, open the Transactions and Services page, select the Keptn website production management zone to filter your services and navigate to the unleash-demo service.

Ensure that your service is tagged with the following:

keptn_project:website, keptn_service:front-end and keptn_stage:production

[Screenshot: Dynatrace service tags]

These tags are created when you install the Dynatrace service on Keptn. If you do not see them, STOP and make sure you have followed the Keptn instructions for installing the Dynatrace service.

This tutorial WILL NOT WORK without these tags.
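In this setup, the expected tag values mirror the key=value pairs passed to the proxy container in DT_CUSTOM_PROP. As a quick cross-check, this hypothetical helper converts that string into the key:value form you should see on the service in Dynatrace:

```bash
#!/usr/bin/env bash
# props_to_tags <"k1=v1 k2=v2 ...">: print the matching key:value tags,
# one per line, e.g. keptn_project=website -> keptn_project:website.
props_to_tags() {
  local -a pairs
  local pair
  read -ra pairs <<< "$1"           # split on whitespace without globbing
  for pair in "${pairs[@]}"; do
    # key = text before the first "=", value = text after it
    printf '%s:%s\n' "${pair%%=*}" "${pair#*=}"
  done
}

# Usage:
#   props_to_tags "keptn_project=website keptn_service=front-end keptn_stage=production"
```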

Validate Problem Notification Integration

Keptn automatically configures the problem notification integration when you onboard the Dynatrace Service.

Validate that it is in place now: in Dynatrace, go to Settings > Integration > Problem Notifications, where you should see an entry for Keptn. If you do not see this problem notification, STOP and ensure you have installed the Dynatrace service on the Keptn machine.

[Screenshot: Keptn problem notification in Dynatrace]

This tutorial WILL NOT WORK without this integration.

Configure Problem Sensitivity

For demo purposes, we will make Dynatrace extremely sensitive to failures. Find the unleash-demo:80 NGINX service, edit its anomaly detection settings and adjust them to match the settings shown here:

[Screenshot: anomaly detection settings for the unleash-demo:80 service]

Create Feature Flag

  • Go to http://<APP-VM-IP>/unleash and log in (any made-up username and password will do).
  • Create a feature flag called EnableStaticContent (case sensitive and must be called this).
  • Set the flag to disabled
  • Refresh the application UI; you should still see the standard (blue banner) page.

[Screenshot: Unleash UI with the EnableStaticContent feature flag]

[Screenshot: application standard UI (blue banner)]
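If you prefer to script this step, Unleash 3.x exposes an admin endpoint for creating toggles (`POST /api/admin/features`). The sketch below assumes that endpoint and payload shape (check the API docs for the Unleash version the repo builds), and assumes the demo's accept-any-login setup does not enforce credentials on the API; add authentication if yours does. UNLEASH_URL is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: create a feature toggle through the Unleash 3.x admin API (assumed).
# UNLEASH_URL is a placeholder, e.g. http://<APP-VM-IP>/unleash

# feature_payload <name> <true|false>: print the JSON body for a new toggle
# with the built-in "default" strategy (on for everyone once enabled).
feature_payload() {
  printf '{"name":"%s","enabled":%s,"strategies":[{"name":"default","parameters":{}}]}\n' "$1" "$2"
}

# create_feature <name>: create the toggle in the disabled state (needs network).
create_feature() {
  curl -sS -X POST "${UNLEASH_URL}/api/admin/features" \
    -H "Content-Type: application/json" \
    -d "$(feature_payload "$1" false)"
}

# Usage:
#   export UNLEASH_URL="http://<APP-VM-IP>/unleash"
#   create_feature EnableStaticContent
```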

Manually Test Flag

Prove that the feature flag works:

  • Go to the app (http://<APP-VM-IP>) and refresh the page. You should still see the blue banner. This page is served from the app container.
  • Enable the feature flag and refresh the app. Notice the green banner: this page is served from GitHub.

[Screenshot: application CDN UI (green banner)]


Set the flag back to disabled so that traffic is being served by the app (blue banner).


[Screenshot: application standard UI (blue banner)]
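To check the flag from the command line instead of the UI, the Unleash client API returns a single toggle as JSON (`GET /api/client/features/<name>` in Unleash 3.x, assumed here). The sketch extracts the `enabled` field with standard text tools, assuming that field appears once in the response, as in a 3.x toggle document; use jq if it is available:

```bash
#!/usr/bin/env bash
# flag_state: read a feature-toggle JSON document on stdin and print
# "true" or "false" for its "enabled" field (assumes it appears once).
flag_state() {
  tr -d ' \n' |                              # drop spaces/newlines so the match is exact
    grep -oE '"enabled":(true|false)' |      # pull out the enabled field
    head -n 1 |
    cut -d: -f2                              # keep only the value
}

# Usage on any machine that can reach the proxy:
#   curl -s "http://<APP-VM-IP>/unleash/api/client/features/EnableStaticContent" | flag_state
```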

Clone Repo to Keptn Machine, Create Keptn Project & Service

Execute these commands on the Keptn cluster:

cd ~
git clone https://github.com/agardnerit/unleashtutorial
cd unleashtutorial
keptn create project website --shipyard=shipyard.yaml
keptn create service front-end --project=website
keptn add-resource --project=website --service=front-end --stage=production --resource=remediations.yaml --resourceUri=remediation.yaml

The values in the remediations.yaml file (stored in Keptn as remediation.yaml) tell Keptn how to respond when it sees a failure rate increase problem for this project (website), service (front-end) and stage (production).
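Because the flag name is case sensitive, a quick pre-flight check before uploading can save a failed demo. This hypothetical helper assumes, as the instruction to name the flag exactly EnableStaticContent suggests, that the remediation file refers to the flag by name:

```bash
#!/usr/bin/env bash
# check_remediation <file> <flag_name>
# Assumption: the remediation file names the toggle it flips. Fails (and says
# why) if the file is missing or does not mention the flag with that exact case.
check_remediation() {
  local file=$1 flag=$2
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  if ! grep -qF -- "$flag" "$file"; then
    echo "$file does not mention $flag (names are case sensitive)" >&2
    return 1
  fi
  echo "ok: $file references $flag"
}

# Usage in ~/unleashtutorial on the Keptn machine:
#   check_remediation remediations.yaml EnableStaticContent
```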

Create Secret & Bounce Remediation Service

The secret tells the remediation service where to find the Unleash API. For this demo, the username and token can be set to anything.

The remediation-service pod must then be recreated so that it picks up the new secret.

Execute these commands on the Keptn cluster:

kubectl create secret -n keptn generic unleash --from-literal="UNLEASH_SERVER_URL=http://<APP-VM-IP>/unleash/api" --from-literal="UNLEASH_USER=me" --from-literal="UNLEASH_TOKEN=whatever"
kubectl scale deployment -n keptn remediation-service --replicas=0 && kubectl scale deployment -n keptn remediation-service --replicas=1
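For context, flipping a flag in Unleash 3.x comes down to one admin API call, `POST <api>/admin/features/<name>/toggle/on` (or `/off`). The sketch below builds that URL from the same UNLEASH_SERVER_URL value stored in the secret, which is also handy for turning the flag back off by hand after the demo. The endpoint is an assumption about the Unleash version, not a description of the remediation service's internals.

```bash
#!/usr/bin/env bash
# toggle_url <server_url> <flag> <on|off>: print the admin API toggle endpoint.
# server_url is the API base, e.g. http://<APP-VM-IP>/unleash/api
toggle_url() {
  local base=${1%/}              # drop a trailing slash, if any
  case "$3" in
    on|off) printf '%s/admin/features/%s/toggle/%s\n' "$base" "$2" "$3" ;;
    *) echo "state must be on or off" >&2; return 1 ;;
  esac
}

# toggle_flag <server_url> <flag> <on|off>: send the request (needs network).
toggle_flag() {
  local url
  url=$(toggle_url "$@") || return 1
  curl -sS -X POST "$url"
}

# Usage:
#   toggle_flag "http://<APP-VM-IP>/unleash/api" EnableStaticContent off
```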

Load Generator

Run the load generator, which will create errors. In another browser tab, keep refreshing the page; once Dynatrace raises a problem (after about 10 minutes), you will see the website fail over to the green, statically hosted content.

Run this on the application VM hosting the website:

cd ~/unleashtutorial
./loadGenErrors.sh

You will see HTTP 500 errors being returned, and the failure rate that Dynatrace registers on the unleash-demo:80 service will increase:

$ ./loadGenErrors.sh
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
-----------------

  • After about 10 minutes, Dynatrace will register a problem and push a notification to Keptn.
  • The Keptn remediation-service will activate and toggle the feature flag according to the remediations.yaml file.
  • The feature flag will be enabled and the CDN page will be served (from GitHub).

[Screenshot: Unleash with the EnableStaticContent flag enabled]

[Screenshot: application CDN UI (green banner)]
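The exact requests the load generator sends are defined in loadGenErrors.sh in the repo. To reproduce the idea against another service, a hypothetical equivalent requests a failing URL at a steady pace and tallies the status codes; ERROR_URL and the one-second pacing are assumptions, not values from the repo:

```bash
#!/usr/bin/env bash
# tally_codes: read HTTP status codes (one per line) on stdin and print
# "<total> requests, <n> server errors (5xx)".
tally_codes() {
  awk '{ total++ } /^5[0-9][0-9]$/ { errors++ }
       END { printf "%d requests, %d server errors (5xx)\n", total, errors }'
}

# gen_errors <url> <count>: request the URL <count> times, printing each status code.
gen_errors() {
  local i
  for ((i = 0; i < $2; i++)); do
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
    sleep 1   # pace the traffic so Dynatrace sees a sustained error rate
  done
}

# Usage (ERROR_URL is a hypothetical endpoint that returns HTTP 500):
#   gen_errors "$ERROR_URL" 600 | tally_codes
```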

Bonus: Synced JIRA Tickets

[Screenshot: JIRA ticket]

I have the Keptn JIRA service installed in the Keptn cluster and the Dynatrace for JIRA Cloud app installed in JIRA. Combined, these two do the following:

  1. Create a JIRA ticket whenever a problem event occurs. This JIRA ticket contains all the details of the problem.
  2. Sync comments bi-directionally between Dynatrace and JIRA.
  3. Link directly to Keptn's Bridge and to the Dynatrace problem from the ticket.

My developers now have an automatically created record of the problem and all remediation actions, right within their normal JIRA workflow. They can reply from within Dynatrace or from the ticket itself, and everyone has all the relevant details instantly, whichever tool they use.

Don’t use JIRA? Keptn is plug and play, so just swap the service for another one, such as Slack or ServiceNow (or write your own service for any other tool).

Conclusion

Although Keptn and Unleash are primarily designed for Kubernetes-based environments, there is no reason they (and the automation concepts described here) cannot be used with applications running outside Kubernetes.

Tooling helps, but culture is much more important on your software automation journey. The three key ingredients of a successful automation strategy are:

  1. A willingness to rethink the way you design and write software.
  2. A willingness to empower development teams to take responsibility for the code they produce.
  3. The ability to get relevant, actionable feedback to developers quickly.