How to Set Up Flask and Apache on an Ubuntu VM in DigitalOcean with a Custom Domain

In this video I show how to set up Flask and Apache on an Ubuntu VM in DigitalOcean with a custom domain. I made it after someone in the comments on my other DigitalOcean video requested it. If there is something else anyone would like to see, just let me know; I am happy to provide these walkthroughs.

Note: I hit a number of challenges with DNS in this one, and I think it’s fun to watch me struggle. Enjoy!


Setting up Kubernetes to manage containers on the Google Cloud Platform

These days the pace of innovation in DevOps can leave you feeling like you’re jogging on a treadmill programmed to run faster than Usain Bolt. Mastery requires hours of practice, and the last decade in DevOps has not allowed for it. Before you could gain 10 years of experience running virtual machines with VMware in private data centers, private cloud software like OpenStack and CloudStack came along, and just when you and your team had painfully achieved a stable install, you were told that running virtual machines in public clouds like AWS, GCP, and Azure was the way forward. By the time you got there it was time to switch to containers, and before you could fully appreciate those, serverless functions appeared on the horizon, but I digress. If you want to know more about serverless functions, see my previous article on AWS Lambda. Instead, this article will focus on running Docker containers inside a Kubernetes cluster on Google’s Cloud Platform.

Linux containers, recently popularized by Docker, need something to manage them. While there are many choices, Kubernetes, the container management system open-sourced by Google, is the undisputed king at this time. Given that Kubernetes was started by Google, it should be no surprise that the easiest way to install it is on Google’s Cloud Platform (GCP). However, OpenShift from Red Hat provides a nice batteries-included abstraction if you need to get up and running quickly, as does kops.


The main prerequisites for this article are a Google Cloud Platform account and the gcloud utility, installed via the Google Cloud SDK.

In addition, you need some form of computer with Internet connectivity, some typing skills, a brain that can read, and a determination to finish. For now I will give you the benefit of the doubt and assume you have all of these. It is also nice to have your beverage of choice while you do this: a fine tea, an ice-cold beer, or a glass of wine will work, but for cancer’s sake please skip the sugar.

Here is where I would normally insert a link to facts about sugar and cancer, but I literally just learned I would be spreading rumors… Fine, drink your Kool-Aid, but don’t blame me for your calories.

The Build Out of our Self Healing IRC Server Hosting Containers

I lied, dude. IRC is so 1995, ICQ’s dead, and unfortunately Slack won’t let me host their sexy chat application with its game-like spirit and better jokes than Kevin Hart. So… sorry to excite you… but I guess I will fall back to the docs here and install Nginx like us newbs are supposed to.

Numero Uno (Step 1 dude)

As part of the installation of the gcloud SDK you should have run gcloud init, which requires you to log in with your Google account via a web browser.

You must log in to continue. Would you like to log in (Y/n)?  Y

Your browser has been opened to visit:

You are logged in as: [].

This account has no projects.

Would you like to create one? (Y/n)?  Y

After clicking Allow in your browser you will be logged in… and asked about creating an initial project. Say yes (type Y and hit Enter).

Enter a Project ID. Note that a Project ID CANNOT be changed later.
Project IDs must be 6-30 characters (lowercase ASCII, digits, or
hyphens) in length and start with a lowercase letter. tuxlabsdemo
Your current project has been set to: [tuxlabsdemo].

Not setting default zone/region (this feature makes it easier to use
[gcloud compute] by setting an appropriate default value for the
--zone and --region flag).
See section on how to set
default compute region and zone manually. If you would like [gcloud init] to be
able to do this for you the next time you run it, make sure the
Compute Engine API is enabled for your project on the page.

Your Google Cloud SDK is configured and ready to use!
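Since a project ID can’t be changed later, it can be worth checking a candidate name before typing it at the prompt. A minimal Python sketch that encodes only the rule as quoted by gcloud init above (6-30 characters, lowercase ASCII letters, digits, or hyphens, starting with a lowercase letter); GCP may enforce further constraints not shown in that prompt, and `is_valid_project_id` is a hypothetical helper, not part of the SDK.

```python
import re

# Rule as printed by `gcloud init`: 6-30 characters total, lowercase ASCII
# letters, digits, or hyphens, and the first character is a lowercase letter.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{5,29}")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id satisfies the rule quoted by gcloud init."""
    # fullmatch anchors the pattern at both ends of the string.
    return PROJECT_ID_RE.fullmatch(project_id) is not None


for candidate in ["tuxlabsdemo", "TuxLabs", "1tuxlabs", "tux"]:
    print(candidate, is_valid_project_id(candidate))
```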

Sweet, your project is now created. In order to use the Google Cloud APIs, you must first enable access by visiting the APIs page for your project in the console and clicking Enable.

That will take a minute. Once completed you will be able to run gcloud commands against your Project. We can set the default region for our project like so:

tuxninja@tldev1:~/google-cloud-sdk$ gcloud compute project-info add-metadata --metadata google-compute-default-region=us-west1
Updated [].

If you get an error here, stop being cheap and link your project to your billing account in the console.

Additionally, we want to set the default region/zone for gcloud commands like so:

tuxninja@tldev1:~$ gcloud config set compute/region us-west1
Updated property [compute/region].
tuxninja@tldev1:~$ gcloud config set compute/zone us-west1-a
Updated property [compute/zone].
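The two settings above need to agree: a GCE zone name is its region name plus a one-letter suffix, so us-west1-a lives in us-west1. A small sketch that derives the region from a zone and checks the pair, assuming the `<region>-<letter>` naming seen in this walkthrough; both helper names are hypothetical.

```python
def region_of_zone(zone: str) -> str:
    """Strip the trailing '-<letter>' suffix from a GCE zone name."""
    region, sep, suffix = zone.rpartition("-")
    # Expect something like "us-west1" + "-" + "a".
    if not sep or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"unexpected zone name: {zone!r}")
    return region


def zone_matches_region(zone: str, region: str) -> bool:
    """True if the configured zone belongs to the configured region."""
    return region_of_zone(zone) == region


print(zone_matches_region("us-west1-a", "us-west1"))  # True
print(zone_matches_region("us-east1-b", "us-west1"))  # False
```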

Numero Dos Equis

We need to install kubectl so we can interact with Kubernetes.

tuxninja@tldev1:~$ gcloud components install kubectl

Your current Cloud SDK version is: 175.0.0
Installing components from version: 175.0.0

│               These components will be installed.                │
│         Name        │       Version       │         Size         │
│ kubectl             │               1.7.6 │             16.0 MiB │
│ kubectl             │                     │                      │

For the latest full release notes, please visit:

Do you want to continue (Y/n)?  Y

╠═ Creating update staging area                             ═╣
╠═ Installing: kubectl                                      ═╣
╠═ Installing: kubectl                                      ═╣
╠═ Creating backup and activating new installation          ═╣

Performing post processing steps...done.                                                                                                                      

Update done!


Once that is done, take a moment to realize that someone spent an obscene amount of time making that install output as pretty as it was without using ncurses. Shout-out to that geek.

Numero Tres Deliquentes

Time to create our Kubernetes cluster. Run this command and “it’s going to be LEGEND….Wait for it….

tuxninja@tldev1:~$ gcloud container clusters create tuxlabs-kubernetes                           
Creating cluster tuxlabs-kubernetes...done.                                                   
Created [].
kubeconfig entry generated for tuxlabs-kubernetes.
tuxlabs-kubernetes  us-west1-a  1.7.6-gke.1  n1-standard-1  1.7.6         3          RUNNING

And I hope you’re not lactose intolerant cause the second half of that word is DAIRY.” – NPH

Numero (Audi) Quattro

Now you should be able to see all running Kubernetes services in your cluster like so:

tuxninja@tldev1:~$ kubectl get --all-namespaces services
NAMESPACE     NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes             ClusterIP     <none>        443/TCP         15m
kube-system   default-http-backend   NodePort    <none>        80:31154/TCP    14m
kube-system   heapster               ClusterIP   <none>        80/TCP          14m
kube-system   kube-dns               ClusterIP    <none>        53/UDP,53/TCP   14m
kube-system   kubernetes-dashboard   ClusterIP   <none>        80/TCP          14m

And we can see the pods like so:

tuxninja@tldev1:~$ kubectl get --all-namespaces pods
NAMESPACE     NAME                                                           READY     STATUS    RESTARTS   AGE
kube-system   event-exporter-1421584133-zlvnd                                2/2       Running   0          16m
kube-system   fluentd-gcp-v2.0-1nb9x                                         2/2       Running   0          16m
kube-system   fluentd-gcp-v2.0-bpqtv                                         2/2       Running   0          16m
kube-system   fluentd-gcp-v2.0-mntjl                                         2/2       Running   0          16m
kube-system   heapster-v1.4.2-339128277-gxh5g                                3/3       Running   0          15m
kube-system   kube-dns-3468831164-5nn05                                      3/3       Running   0          15m
kube-system   kube-dns-3468831164-wcwtg                                      3/3       Running   0          16m
kube-system   kube-dns-autoscaler-244676396-fnq9g                            1/1       Running   0          16m
kube-system   kube-proxy-gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg   1/1       Running   0          16m
kube-system   kube-proxy-gke-tuxlabs-kubernetes-default-pool-6ede7d6a-pr82   1/1       Running   0          16m
kube-system   kube-proxy-gke-tuxlabs-kubernetes-default-pool-6ede7d6a-w6p8   1/1       Running   0          16m
kube-system   kubernetes-dashboard-1265873680-gftnz                          1/1       Running   0          16m
kube-system   l7-default-backend-3623108927-57292                            1/1       Running   0          16m
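Eyeballing the READY column gets tedious as a cluster grows. A hedged sketch that parses the plain-text table printed by `kubectl get --all-namespaces pods` and reports pods whose ready count is below their container count or whose STATUS isn’t Running. It assumes the whitespace-separated column layout shown above; for real scripting, kubectl’s `-o json` output is a sturdier input than this table.

```python
from typing import List


def unhealthy_pods(table: str) -> List[str]:
    """Return 'namespace/name' for pods that are not fully ready or not Running.

    Expects the plain-text output of `kubectl get --all-namespaces pods`,
    whose columns are NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE.
    """
    rows = [line for line in table.strip().splitlines() if line.strip()]
    bad = []
    for row in rows[1:]:  # skip the header row
        namespace, name, ready, status = row.split()[:4]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total or status != "Running":
            bad.append(f"{namespace}/{name}")
    return bad


# First data row is copied from the output above; the second is made up.
sample = """\
NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE
kube-system   heapster-v1.4.2-339128277-gxh5g   3/3       Running   0          15m
kube-system   example-broken-pod                2/3       Running   0          15m
"""
print(unhealthy_pods(sample))  # ['kube-system/example-broken-pod']
```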

Numero Cinco (de Mayo)

You now have an active Kubernetes cluster. That is pretty sweet, huh? Make sure you take the time to check out what’s running under the hood in Google Compute Engine as well:

tuxninja@tldev1:~$ gcloud compute instances list
NAME                                               ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg  us-west1-a  n1-standard-1        RUNNING
gke-tuxlabs-kubernetes-default-pool-6ede7d6a-pr82  us-west1-a  n1-standard-1         RUNNING
gke-tuxlabs-kubernetes-default-pool-6ede7d6a-w6p8  us-west1-a  n1-standard-1       RUNNING

Ok, for our final act, I promised Nginx…sigh…Let’s get this over with!

Step 1, create this nifty YAML file:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template: # create pods using pod definition in this template
    metadata:
      # unlike pod-nginx.yaml, the name is not included in the meta data as a unique name is
      # generated from the deployment name
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

Save it as deployment.yaml, then apply it!

tuxninja@tldev1:~$ kubectl apply -f deployment.yaml 
deployment "nginx-deployment" created
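The key relationship in that manifest is that the selector’s matchLabels must match the labels on the pod template; the API server rejects a Deployment whose selector doesn’t match its template labels. A small sketch that builds the same manifest as a Python dict, checks that relationship, and prints it as JSON (kubectl apply -f also accepts JSON files). `selector_matches_template` is a hypothetical helper, not part of any Kubernetes client library.

```python
import json

# The same Deployment as deployment.yaml, expressed as a Python dict.
deployment = {
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {"name": "nginx-deployment"},
    "spec": {
        "selector": {"matchLabels": {"app": "nginx"}},
        "replicas": 2,
        "template": {
            "metadata": {"labels": {"app": "nginx"}},
            "spec": {
                "containers": [
                    {
                        "name": "nginx",
                        "image": "nginx:1.7.9",
                        "ports": [{"containerPort": 80}],
                    }
                ]
            },
        },
    },
}


def selector_matches_template(manifest: dict) -> bool:
    """True if every matchLabels pair also appears in the pod template labels."""
    spec = manifest["spec"]
    wanted = spec["selector"]["matchLabels"]
    labels = spec["template"]["metadata"]["labels"]
    return all(labels.get(key) == value for key, value in wanted.items())


assert selector_matches_template(deployment)
print(json.dumps(deployment, indent=2))  # could be saved as deployment.json
```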

We can describe our deployment like this:

tuxninja@tldev1:~$ kubectl describe deployment nginx-deployment
Name:                   nginx-deployment
Namespace:              default
CreationTimestamp:      Sun, 15 Oct 2017 07:10:52 +0000
Labels:                 app=nginx
Selector:               app=nginx
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=nginx
    Image:        nginx:1.7.9
    Port:         80/TCP
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   nginx-deployment-431080787 (2/2 replicas created)
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  3m    deployment-controller  Scaled up replica set nginx-deployment-431080787 to 2

And we can take a gander at the pods created for this deployment:

tuxninja@tldev1:~$ kubectl get pods -l app=nginx
NAME                               READY     STATUS    RESTARTS   AGE
nginx-deployment-431080787-7131f   1/1       Running   0          4m
nginx-deployment-431080787-cgwn8   1/1       Running   0          4m
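The `-l app=nginx` flag above is an equality-based label selector: a pod matches when it carries every key=value pair listed. A rough Python sketch of that rule, covering only comma-separated `key=value` terms; kubectl also accepts `!=`, `in`, `notin`, and bare-key existence checks, which this sketch deliberately ignores. The function names are hypothetical.

```python
from typing import Dict


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict (equality terms only)."""
    terms = {}
    for term in selector.split(","):
        key, value = term.split("=", 1)
        terms[key.strip()] = value.strip()
    return terms


def matches(pod_labels: Dict[str, str], selector: str) -> bool:
    """True if the pod carries every key=value pair in the selector."""
    wanted = parse_selector(selector)
    return all(pod_labels.get(k) == v for k, v in wanted.items())


print(matches({"app": "nginx", "pod-template-hash": "431080787"}, "app=nginx"))  # True
print(matches({"app": "heapster"}, "app=nginx"))  # False
```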

To see info about a specific pod, run:

tuxninja@tldev1:~$ kubectl describe pod nginx-deployment-431080787-7131f
Name:           nginx-deployment-431080787-7131f
Namespace:      default
Node:           gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg/
Start Time:     Sun, 15 Oct 2017 07:10:52 +0000
Labels:         app=nginx
       plugin set: cpu request for container nginx
Status:         Running
Created By:     ReplicaSet/nginx-deployment-431080787
Controlled By:  ReplicaSet/nginx-deployment-431080787
    Container ID:   docker://ce850ea012243e6d31e5eabfcc07aa71c33b3c1935e1ff1670282f22ac1d0907
    Image:          nginx:1.7.9
    Image ID:       docker-pullable://nginx@sha256:e3456c851a152494c3e4ff5fcc26f240206abac0c9d794affb40e0714846c451
    Port:           80/TCP
    State:          Running
      Started:      Sun, 15 Oct 2017 07:11:01 +0000
    Ready:          True
    Restart Count:  0
      cpu:        100m
    Environment:  <none>
      /var/run/secrets/ from default-token-gw047 (ro)
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gw047
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations: for 300s
        for 300s
  Type    Reason                 Age   From                                                        Message
  ----    ------                 ----  ----                                                        -------
  Normal  Scheduled              5m    default-scheduler                                           Successfully assigned nginx-deployment-431080787-7131f to gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg
  Normal  SuccessfulMountVolume  5m    kubelet, gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg  MountVolume.SetUp succeeded for volume "default-token-gw047"
  Normal  Pulling                5m    kubelet, gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg  pulling image "nginx:1.7.9"
  Normal  Pulled                 5m    kubelet, gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg  Successfully pulled image "nginx:1.7.9"
  Normal  Created                5m    kubelet, gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg  Created container
  Normal  Started                5m    kubelet, gke-tuxlabs-kubernetes-default-pool-6ede7d6a-nvfg  Started container

Finally, it’s time to expose Nginx to the Internet:

tuxninja@tldev1:~$ kubectl expose deployment/nginx-deployment --port=80 --target-port=80 --name=nginx-deployment --type=LoadBalancer
service "nginx-deployment" exposed

Check the status of our service:

tuxninja@tldev1:~$ kubectl get svc nginx-deployment
NAME               TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
nginx-deployment   LoadBalancer   <pending>     80:31867/TCP   20s

Note that EXTERNAL-IP is in a pending state; once the load balancer is created, it will show an IP address.
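Rather than re-running kubectl get svc by hand until the address appears, you can poll for it. A minimal sketch, assuming a caller-supplied `fetch_external_ip` function (hypothetical; for example, one that shells out to `kubectl get svc nginx-deployment -o jsonpath='{.status.loadBalancer.ingress[0].ip}'`) that returns an empty string or `<pending>` until the load balancer is ready.

```python
import time
from typing import Callable, Optional


def wait_for_external_ip(
    fetch_external_ip: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
) -> Optional[str]:
    """Poll until the service reports an external IP or the timeout expires.

    Returns the IP string, or None if it never appeared in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        ip = fetch_external_ip().strip()
        if ip and ip != "<pending>":
            return ip
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


# Usage with a fake fetcher: pending twice, then a documentation-range IP.
answers = iter(["", "<pending>", "203.0.113.10"])
print(wait_for_external_ip(lambda: next(answers), timeout_s=5, interval_s=0))
```

Injecting the fetcher keeps the polling logic testable without a cluster.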

tuxninja@tldev1:~$ kubectl get svc nginx-deployment
NAME               TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
nginx-deployment   LoadBalancer   80:31867/TCP   1m
tuxninja@tldev1:~$ curl
<!DOCTYPE html>
<title>Welcome to nginx!</title>

And we’re all done, congratulations! 🙂

In Closing…

Kubernetes is cool as a fan, and setting it up on GCP is almost as easy as pressing the big EASY button. We have barely scratched the surface here, so for continued learning I recommend buying Kubernetes: Up & Running by Kelsey Hightower, Brendan Burns, and Joe Beda. I would follow these folks on Twitter, and also follow Kubernetes co-founder Tim Hockin and Jessie Frazelle, former Docker and Google (now Microsoft) employee and guru of all things containers.

After you are done following these inspirational leaders in the community, go to YouTube and watch every Kelsey Hightower video you can find. Kelsey Hightower is perhaps the tech community’s best presenter, and no one has done more to educate people about Kubernetes and bring it to the mainstream than Kelsey. So a quick shout-out and thank you to Kelsey for his contributions to the community. In his honor, here are two of my favorite videos from Kelsey: [ one ] [ two ].


How To: Create An AWS Lambda Function To Backup/Snapshot Your EBS Volumes

AWS Lambda functions are a great way to run some code on a trigger/schedule without needing a whole server dedicated to it. They can be cost-effective, but be careful: depending on how long they run and the number of executions per hour, they can be quite costly as well.
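To make that trade-off concrete: Lambda bills per request plus per GB-second of execution (allocated memory times duration). A back-of-the-envelope sketch; the default prices are assumptions based on published us-east-1 x86 rates at one point in time ($0.20 per million requests, about $0.0000166667 per GB-second), the 730 hours/month is an average, and the free tier is ignored, so check current pricing before relying on the numbers.

```python
def monthly_lambda_cost(
    executions_per_hour: float,
    avg_duration_s: float,
    memory_mb: int,
    price_per_million_requests: float = 0.20,   # assumed; check current pricing
    price_per_gb_second: float = 0.0000166667,  # assumed; check current pricing
    hours_per_month: float = 730.0,
) -> float:
    """Rough monthly USD cost of one Lambda function, ignoring the free tier."""
    invocations = executions_per_hour * hours_per_month
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


# A once-a-day, 30-second snapshot job at 128 MB is nearly free...
print(round(monthly_lambda_cost(1 / 24, 30, 128), 4))
# ...while 3,600 two-second runs per hour at 1 GB adds up.
print(round(monthly_lambda_cost(3600, 2, 1024), 2))
```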

For my use case, I wanted to create daily snapshot backups of the EBS volumes for a MongoDB database. I originally implemented this using only CloudWatch, which is primarily a monitoring service, but because it also handles scheduling, AWS uses it for other things that require cron-like features. Unfortunately, the CloudWatch implementation of snapshot backups was very limited. I could not ‘tag’ the backups, which was certainly something I needed for easy finding and cleanup later (past a retention period).

Anyway, there were a couple of pitfalls I ran into when creating this function.


  1. Make sure your security group allows you to communicate with the Internet for any AWS APIs you need to talk to.
  2. Make sure your timeout is set to 1 minute or greater, depending on your use case. The default is only a few seconds and is likely not high enough.
  3. “The Lambda function execution role must have permissions to create, describe and delete ENIs. AWS Lambda provides a permissions policy, AWSLambdaVPCAccessExecutionRole, with permissions for the necessary EC2 actions (ec2:CreateNetworkInterface, ec2:DescribeNetworkInterfaces, and ec2:DeleteNetworkInterface) that you can use when creating a role.”
    1. Personally, I used inline permissions and included the specific actions.
  4. Upload your zip file and make sure your handler setting is configured with the exact file_name.method_in_your_code_for_the_handler.
  5. This one is more of an FYI: Lambda functions had a maximum execution time of 5 minutes (300 seconds) when this was written; AWS has since raised the limit to 15 minutes.
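Because of that timeout ceiling, a handler that loops over many volumes can check how much time it has left and stop cleanly instead of being killed mid-loop. A sketch using the Lambda context object’s `get_remaining_time_in_millis()` method; `process_volume`, the `volume_ids` event key, and the 10-second safety margin are hypothetical placeholders, and a fake context stands in for Lambda so the example runs locally.

```python
SAFETY_MARGIN_MS = 10_000  # hypothetical: stop with roughly 10 s to spare


def process_volume(volume_id: str) -> None:
    """Hypothetical placeholder for per-volume work (e.g. creating a snapshot)."""
    print("processing", volume_id)


def lambda_handler(event, context):
    done, skipped = [], []
    for volume_id in event.get("volume_ids", []):
        # Leave headroom so the function returns before Lambda kills it.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            skipped.append(volume_id)
            continue
        process_volume(volume_id)
        done.append(volume_id)
    return {"done": done, "skipped": skipped}


class FakeContext:
    """Local stand-in whose remaining time drops by 20 s on every call."""

    def __init__(self, remaining_ms: int):
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        self.remaining_ms -= 20_000
        return self.remaining_ms


print(lambda_handler({"volume_ids": ["vol-a", "vol-b", "vol-c"]}, FakeContext(50_000)))
```

Skipped volumes are returned rather than silently dropped, so a follow-up run (or an alarm) can pick them up.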

I think that was it; after that, everything worked fine. To finish this short article off, here are the screenshots and the code!




And finally the code…

Function Code

# Backup cis volumes

import boto3

REGION = 'us-east-1'


def lambda_handler(event, context):
    # Connect to the region once and reuse the client and resource below
    ec2 = boto3.client('ec2', region_name=REGION)
    ec2resource = boto3.resource('ec2', region_name=REGION)

    # Running instances that have a Name tag and a tag value matching cis-mongo*
    response = ec2.describe_instances(Filters=[{'Name': 'instance-state-name', 'Values': ['running']},
                                               {'Name': 'tag-key', 'Values': ['Name']},
                                               {'Name': 'tag-value', 'Values': ['cis-mongo*']}])

    for r in response['Reservations']:
        for i in r['Instances']:
            for mapping in i['BlockDeviceMappings']:
                volId = mapping['Ebs']['VolumeId']

                # Create snapshot
                result = ec2.create_snapshot(VolumeId=volId,
                                             Description='Created by Lambda backup function ebs-snapshots')

                # Get snapshot resource
                snapshot = ec2resource.Snapshot(result['SnapshotId'])

                # Tag the snapshot so the cleanup function can find it by Name
                snapshot.create_tags(Tags=[{'Key': 'Name', 'Value': 'cis-mongo-snapshot-backup'}])
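The nested loops above walk the describe_instances response shape: Reservations, then Instances, then BlockDeviceMappings. Pulling that walk into a pure function makes it testable without AWS credentials. A sketch assuming the response dict has the shape boto3 returns; as a defensive extra, it skips any mapping without an `Ebs` key, where the handler above would raise a KeyError.

```python
from typing import Dict, List


def ebs_volume_ids(response: Dict) -> List[str]:
    """Collect EBS volume IDs from a describe_instances-style response dict."""
    volume_ids = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for mapping in instance.get("BlockDeviceMappings", []):
                ebs = mapping.get("Ebs")
                if ebs:  # defensive: skip mappings without EBS details
                    volume_ids.append(ebs["VolumeId"])
    return volume_ids


# Hand-written sample in the boto3 response shape (IDs are made up).
sample = {
    "Reservations": [
        {"Instances": [{"BlockDeviceMappings": [
            {"DeviceName": "/dev/sda1", "Ebs": {"VolumeId": "vol-111"}},
            {"DeviceName": "/dev/sdc", "Ebs": {"VolumeId": "vol-222"}},
        ]}]}
    ]
}
print(ebs_volume_ids(sample))  # ['vol-111', 'vol-222']
```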

And here is an additional function to add for cleanup:

import boto3
from datetime import datetime, timedelta, timezone

def lambda_handler(event, context):
    # If a snapshot is older than this many days, delete it
    days = 14

    filters = [{'Name': 'tag:Name', 'Values': ['cis-mongo-snapshot-backup']}]

    client = boto3.client('ec2', region_name='us-east-1')
    # OwnerIds=['self'] limits the search to snapshots owned by this account
    snapshots = client.describe_snapshots(Filters=filters, OwnerIds=['self'])

    # StartTime comes back timezone-aware (UTC), so compare against an aware datetime
    delete_time = datetime.now(timezone.utc) - timedelta(days=days)

    for snapshot in snapshots["Snapshots"]:
        start_time = snapshot["StartTime"]

        if start_time < delete_time:
            print('Deleting {id}'.format(id=snapshot["SnapshotId"]))
            client.delete_snapshot(SnapshotId=snapshot["SnapshotId"], DryRun=False)
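The cleanup decision itself is just date arithmetic: anything whose StartTime is older than now minus the retention period goes. Separated from the AWS calls, it can be tested locally. A sketch assuming snapshot dicts shaped like describe_snapshots output, with StartTime as a timezone-aware UTC datetime (which is what boto3 returns); `expired_snapshot_ids` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def expired_snapshot_ids(
    snapshots: List[Dict], days: int, now: Optional[datetime] = None
) -> List[str]:
    """Return IDs of snapshots that started more than `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


now = datetime(2017, 10, 15, tzinfo=timezone.utc)
snaps = [
    {"SnapshotId": "snap-old", "StartTime": now - timedelta(days=20)},
    {"SnapshotId": "snap-new", "StartTime": now - timedelta(days=3)},
]
print(expired_snapshot_ids(snaps, days=14, now=now))  # ['snap-old']
```

Passing `now` explicitly keeps the function deterministic under test; in the Lambda you would let it default to the current UTC time.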

The end. Happy server-lessing (ha!)



How To: Use Spinnaker to deploy into AWS

Spinnaker is a tool created by Netflix (of which I have always been a big fan) that succeeded Asgard, a tool I used in my past at PayPal. Not to digress, but my favorite companies when it comes to DevOps tools are HashiCorp and Netflix. Obviously, that’s a bit of apples and oranges, but they both make solid DevOps tools… Moving on…

Here is a quick overview of Spinnaker’s GUI:

If you are familiar with Jenkins, then Spinnaker will make a lot of sense to you. Spinnaker is all about configuring a pipeline with stages that automate/orchestrate a number of steps for ‘continuously’ deploying your application code to an environment. Spinnaker puts the CD in CI/CD 😉 Corny, but I had to say it…

Moving on…

To use Spinnaker effectively, you need to use Jenkins with it. Jenkins is responsible for your git/build phase, where code is downloaded and packaged. The package is then installed (“baked”) on a VM in your environment, i.e. AWS. At the end of that launch, assuming everything works out, a snapshot is taken and an AMI is created to be used in subsequent steps for install. A typical pipeline looks like this…

  1. Grab the latest check-in from the Git repo
  2. Build a package (rpm/deb) of your application/code (+ dependencies)
  3. Install the package above, aka bake the code on a VM in AWS, then take a snapshot (create an AMI)
  4. Deploy your code to compute, including LB setup if there is any.
    • I should also mention that Spinnaker automatically sets up ASGs (auto-scaling groups), which is a nice feature: if a machine dies, it is re-created based on the capacity you set/require.
  5. And finally, if you need a cleanup task such as “destroy the boxes running the old code”, that runs.

Visually, the pipeline configuration looks like this…

Notice the Shrink Cluster step. This is one of Spinnaker’s built-in stages that you can use. However, there is a better way to handle this than manually creating ‘cleanup’ phases after a deploy: you can instead employ what are called strategies…

For example, click Deploy in the pipeline to go to its configuration, and then click Edit on a server group you have created under Deploy Configuration…

You will get a popup window that looks like this…

As you can see here, I have clicked into Strategy and am currently using Highlander, which ‘Destroys all previous server groups in the cluster as soon as new server group passes health checks’.

The Highlander strategy is extremely useful for rolling out new builds. Essentially, your old code keeps running until your new code is healthy, at which point the old server groups are destroyed. Assuming the new build is healthy (i.e. your health checks and all previous build tests etc. have good coverage), you should be good to go in the shortest amount of time possible. I tried all sorts of customizations to my pipeline to emulate this behavior without using the strategy and found that the Highlander strategy is the fastest.
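The Highlander rule described above boils down to very little logic. A toy model in Python, not Spinnaker’s implementation: given a cluster’s server groups ordered oldest to newest and whether the newest passed its health checks, it returns which groups would be destroyed. The `ServerGroup` type and the `-v00N` names are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerGroup:
    name: str
    healthy: bool


def highlander_targets(groups: List[ServerGroup]) -> List[str]:
    """Names of groups to destroy once the newest group is healthy (toy model).

    `groups` is ordered oldest to newest; the last one is the new deploy.
    """
    if not groups:
        return []
    newest = groups[-1]
    if not newest.healthy:
        return []  # keep the old code serving until the new code is healthy
    return [g.name for g in groups[:-1]]


cluster = [
    ServerGroup("ciscollector-prod-v001", True),
    ServerGroup("ciscollector-prod-v002", True),
    ServerGroup("ciscollector-prod-v003", True),
]
print(highlander_targets(cluster))  # ['ciscollector-prod-v001', 'ciscollector-prod-v002']
```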

Anyway, your use case may be different depending on the type of pipeline you are configuring, so take some time to familiarize yourself with the available strategies.

Now, I kinda jumped ahead a few chapters, but I did that to make the point quickly, before you lose interest: Spinnaker will deploy your code, it will make sure your nodes are always running, and it can do this all automatically once configured…

To see what I mean, let’s go back to the first step in the pipeline, Configuration…

Under the section called “Automated Triggers” I have added some configuration to listen to a Jenkins server for changes on a job that I have defined in Jenkins. I am not going to go into much detail on Jenkins because this article would get too long, but here is that job in Jenkins very quickly, for a full understanding.

This is a screenshot of a job that polls Git every minute for changes. With this configured, Spinnaker will listen for changes to our specified repo via Jenkins, and if changes are detected, Spinnaker will proceed with the rest of the pipeline. The next step is Build.

The first step, Configuration, detects changes to our Git repo. The second step, Build, turns those changes into an installable package, again using Jenkins to do this. Here we tell Spinnaker to run our build job in Jenkins… let’s take a look at that job in Jenkins. First we do our Git configuration, and we leave Build Triggers empty.

Then, if necessary, we can inject files at build time for packaging…

To make use of the injected files, we must copy them from a variable to the host using a build step that executes a shell…

We then run our actual packaging script that turns it into an rpm/deb etc. Mine is called ‘’ and looks like this…

And finally, there is magic… the last line of our shell build step uploads our package to S3…

deb-s3 upload --bucket global-s3-prod-spinnaker --prefix spinnaker-ubuntu-mirror --arch amd64 --codename trusty --preserve-versions true ./collector/*.deb

And the last step in our Jenkins job is to archive the artifacts.

The end of the output of this Job when run looks like this.

Next, Spinnaker moves on to Bake. This is the step where the package gets installed on a VM (“baked”), which is then snapshotted and turned into an AMI for the Deploy step.

The Bake step in Spinnaker is the most straightforward…

Essentially, as long as the package name shown here, ‘ciscollector’, matches the base name of the package you are creating, Spinnaker (which is already configured to look in our S3 bucket) will find it and install it on a VM. At the end, it will snapshot that VM and create an AMI to install in the subsequent Deploy stage of the pipeline.
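Debian package files conventionally follow `<name>_<version>_<arch>.deb`, and Debian package names can’t contain underscores, so the base name is everything before the first underscore. A small sketch that checks a built file against the package name configured in the Bake stage; the example filename and version are made up, and it assumes your packaging script follows that naming convention.

```python
import os


def deb_base_name(path: str) -> str:
    """Return the package name from a '<name>_<version>_<arch>.deb' filename."""
    filename = os.path.basename(path)
    if not filename.endswith(".deb") or filename.count("_") < 2:
        raise ValueError(f"not a conventional .deb filename: {filename!r}")
    # Debian package names cannot contain '_', so the first field is the name.
    return filename.split("_", 1)[0]


def matches_bake_package(path: str, package: str) -> bool:
    """True if the built .deb's base name equals the Bake stage package name."""
    return deb_base_name(path) == package


print(matches_bake_package("./collector/ciscollector_1.0.3_amd64.deb", "ciscollector"))  # True
```

A check like this could run at the end of the Jenkins build, before the deb-s3 upload, to catch a renamed package early.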

Finally, we are back to the Deploy step. The Deploy step depends on the Bake step, just as Bake depends on Build and Build depends on Configuration. This is how we create the workflow of our pipeline, using ‘depends on’ (sorry for not explaining that sooner). So when Bake completes successfully, we start our Deploy step, which, as shown before, already has a server group in my pipeline. Yours will not, so you must create a new server group, which is how Spinnaker manages/groups servers and load balancers for deployments.
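Those ‘depends on’ links make the stages a dependency graph: a stage can start only after everything it depends on has finished. A sketch using Python’s standard-library `graphlib` (Python 3.9+) to compute a valid run order for the pipeline in this article; the stage names come from the text, but the dictionary format is a hypothetical illustration, not Spinnaker’s pipeline JSON.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
pipeline = {
    "Configuration": set(),
    "Build": {"Configuration"},
    "Bake": {"Build"},
    "Deploy": {"Bake"},
}

order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['Configuration', 'Build', 'Bake', 'Deploy']
```

If two stages accidentally depend on each other, `static_order()` raises `graphlib.CycleError`, which makes this a cheap sanity check for more complex pipelines.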

The Deploy step requires that you fill out the following sections with your specific configuration…

As discussed previously, under Basic Settings it is best to pick a strategy as part of the Deploy stage rather than trying to do many custom/complicated things yourself. Rather than screenshot my way through this, I’ll just say it’s pretty self-explanatory to configure load balancers, security groups, instance type, etc. The things I would highlight are the Capacity section and tagging under Advanced Settings.

The Capacity section is important for two reasons. First, whatever you set for the number of instances turns into an auto-scaling group requirement: if a node is killed or lost for any reason, your auto-scaling group in AWS will make sure you always have that number of nodes running. This is helpful if nodes are dying for whatever reason, although not super helpful if they continually die quickly due to misconfiguration, etc. Second, the setting that considers a deployment successful when a given % of instances are healthy is very important for the speed of your pipeline. Say you have a pool of 16 servers: if they all run the same image, the chances that you need to wait for 100% here are slim. For example, let’s say you run 4 servers and you only need 3 of them to serve 100% of your request load. In this case it would be acceptable to move on from the Deploy step at 75%, because waiting for the last 25% isn’t really necessary to serve requests, and you are 99% sure that last host is going to come up. So feel free to tweak.
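The arithmetic behind that percentage: with N instances and a threshold of P%, the stage waits for roughly N × P / 100 healthy instances. A sketch that assumes the requirement is rounded up, so a fractional count never lets the stage pass early; how Spinnaker itself rounds is not covered in this article.

```python
import math


def min_healthy(instances: int, percent: float) -> int:
    """Healthy instances needed to pass a P% threshold, rounded up (assumed)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return math.ceil(instances * percent / 100)


print(min_healthy(4, 75))   # 3: the 4-server example, one host may still be starting
print(min_healthy(16, 75))  # 12
print(min_healthy(5, 75))   # 4 (3.75 rounded up)
```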

Last, as mentioned, make sure you take the opportunity to tag things in the Advanced Settings section.

Another highlight: Advanced Settings is also where you would inject user data into EC2 instances. This is helpful when you want to pass at-boot-time configuration to a system. For example, you might want to override a configuration file at boot time with logic like: if I get this value in user data, override the config; if I don’t, use the existing config. Just remember to pass your user data as base64.
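Encoding user data as base64 takes one call to Python’s standard library. A sketch; the boot script contents and the `/etc/myapp/override.conf` path are hypothetical placeholders for whatever boot-time override you need.

```python
import base64

# Hypothetical boot script: drop an override file the app reads on start-up.
user_data = """#!/bin/bash
echo 'log_level=debug' > /etc/myapp/override.conf
"""

encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
print(encoded)

# Round-trip check: decoding gives back the original script.
assert base64.b64decode(encoded).decode("utf-8") == user_data
```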

I want to leave you with some final comments. Spinnaker is a great deployment tool, and its pipeline/stage workflow is helpful for reproducing deployments. However, Spinnaker has many limitations, and it will always lag behind providers like AWS on feature parity. As an example, ALB support is quite limited at the time of this writing, so I had to add a custom Jenkins step that runs a script on the Jenkins server to manually add nodes to target groups for an ALB. If you are interested or need more details on that solution, email me at

I hope you found this brief overview on Spinnaker useful. It’s a great tool that can be used to easily reproduce application deployments.


MongoDB data loss avoided courtesy of AWS EBS & Snapshots

Cross Region MongoDB Across A Slow Network (Napster) Bad, AWS Snapshots (Metallica) Good!

I recently found myself in a bit of a pickle. My team and I had deployed a 3-node MongoDB cluster configured as two nodes in us-east-1 and one node in us-west-2 to maximize our availability while minimizing cost. Ultimately, there were two problems with this approach. The first is that, for reasons mostly outside of our control, the rest of our application stack above the database was deployed in us-east-1, drastically reducing any availability benefit the tertiary node in us-west-2 was buying us. The second is that, although we were not aware of it when we made this choice, our cluster/replication traffic was going across a VPN with very limited bandwidth that frequently suffered network partitions due to network maintenance and a lack of redundancy. Our MongoDB cluster failed over frequently because it lost communication with its members, and when it did, the cluster had a difficult time recovering because replication couldn’t catch up across the VPN.

We restarted Mongo several times, including removing the data directory and starting over fresh, but replication was going to take days to sync, and we could not afford to wait that long. We needed to restore the cluster’s health ASAP so we could move all nodes to us-east-1 and get away from the VPN issues that were introducing so much pain.
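A quick way to see why “days” is plausible: the time for an initial sync is roughly the data size divided by the usable bandwidth. A back-of-the-envelope sketch; the 300 GB and 10 Mbit/s inputs are hypothetical (the article does not state the data size or the VPN’s bandwidth), and real replication adds protocol overhead on top.

```python
def transfer_days(data_gb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Days to move data_gb gigabytes over a link_mbit_s megabit/s link.

    efficiency < 1.0 models protocol overhead and link contention.
    """
    bits = data_gb * 1e9 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 86_400


# Hypothetical inputs: 300 GB of data over a 10 Mbit/s VPN.
print(round(transfer_days(300, 10), 2))       # 2.78
print(round(transfer_days(300, 10, 0.5), 2))  # 5.56
```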

The system I am referring to is production: it cannot lose data, and it cannot take downtime or a maintenance window. Given these constraints, I started googling ways to catch up a MongoDB member when it will not catch up on its own. I tried a few things I found, like rsync, before realizing they weren’t any faster across that slow VPN link. Ultimately, I decided to try a snapshot. The document I read warned me that a live snapshot may result in inconsistent data, but given the constraints I mentioned before, I had few options and had to try it. As it turns out, it worked perfectly, and in under an hour I had my entire cluster healthy. Using the AWS CLI utility, here is how I did it…

Step 1, take a snapshot of the healthy node

I actually took the snapshot in the GUI at first, so it’s not shown here. For the record: to create a snapshot, go to your volume under EC2 > Volumes, click Actions, then Create Snapshot, and save the snapshot ID. (Or, alternatively, do it with the CLI like I did for everything else.)
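For the CLI route, here is a minimal sketch of what that could look like. It assumes the AWS CLI is already configured with credentials allowed to create snapshots; the function name `snapshot_volume` and the volume ID in the example are hypothetical placeholders for the EBS volume behind the healthy member's data directory. `--query SnapshotId --output text` trims the JSON reply down to just the new ID, and the `snapshot-completed` waiter blocks until the snapshot is usable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: snapshot an EBS volume and wait until the snapshot completes.
# Prints the new snapshot ID on stdout; returns non-zero if any AWS call fails.
snapshot_volume() {
  local region="$1" volume_id="$2" description="$3"
  local snap_id
  # Request a point-in-time snapshot; --query extracts only the ID from the JSON reply
  snap_id=$(aws --region "$region" ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "$description" \
    --query SnapshotId --output text) || return 1
  # Block until the snapshot state is "completed". The waiter stops polling after a
  # fixed number of attempts, so a very large snapshot may need the wait re-run.
  aws --region "$region" ec2 wait snapshot-completed --snapshot-ids "$snap_id" || return 1
  echo "$snap_id"
}

# Example (placeholder volume ID):
# snap_id=$(snapshot_volume us-west-2 vol-0123456789abcdef0 "cis-mongo-prod-3-snapshot-05-19-2017")
```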

Step 2, copy the snapshot from your source region to your destination region

aws --region us-east-1 ec2 copy-snapshot --source-region us-west-2 --source-snapshot-id snap-01f185929341abd3b --description "cis-mongo-prod-3-snapshot-05-19-2017"

Make sure you copy the snapshot ID it returns to your clipboard; that is the ID of the new copy in us-east-1, which the next step uses.
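Instead of copying the ID by hand, it can be captured in a shell variable. This is a sketch assuming a configured AWS CLI; `copy_snapshot` is a hypothetical name. Note that the request is sent to the destination region, while `--source-region` names where the original snapshot lives, and the copy cannot be used to create a volume until it reaches the `completed` state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a snapshot across regions, wait for it, and print the new ID.
copy_snapshot() {
  local src_region="$1" dst_region="$2" src_snap="$3" description="$4"
  local new_snap
  # The call goes to the *destination* region; --source-region says where to copy from
  new_snap=$(aws --region "$dst_region" ec2 copy-snapshot \
    --source-region "$src_region" \
    --source-snapshot-id "$src_snap" \
    --description "$description" \
    --query SnapshotId --output text) || return 1
  # Cross-region copies take a while; wait until the copy is usable in the destination
  aws --region "$dst_region" ec2 wait snapshot-completed --snapshot-ids "$new_snap" || return 1
  echo "$new_snap"
}

# Example using the IDs from this post:
# new_snap=$(copy_snapshot us-west-2 us-east-1 snap-01f185929341abd3b "cis-mongo-prod-3-snapshot-05-19-2017")
```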

Step 3, create a new volume from the copied snapshot

aws ec2 create-volume --size 300 --region us-east-1 --availability-zone us-east-1c --volume-type gp2 --snapshot-id snap-085b986dae85dfed1


{
    "AvailabilityZone": "us-east-1c",
    "Encrypted": false,
    "VolumeType": "gp2",
    "VolumeId": "vol-0fa49fde34e88a1c6",
    "State": "creating",
    "Iops": 900,
    "SnapshotId": "snap-085b986dae85dfed1",
    "CreateTime": "2017-05-19T20:37:33.304Z",
    "Size": 300
}

Step 4, attach the volume to the instance

aws ec2 attach-volume --volume-id vol-0fa49fde34e88a1c6 --instance-id i-0717cd609275fdbef --device /dev/sdc

Oh no, we got an error!

An error occurred (InvalidVolume.ZoneMismatch) when calling the AttachVolume operation: The volume 'vol-0fa49fde34e88a1c6' is not in the same availability zone as instance 'i-0717cd609275fdbef'

Ah, OK, simple fix: we created the volume in a different AZ than the node we were attaching it to. An EBS volume can only be attached to an instance in the same Availability Zone.

(delete the old volume) Then…

Step 5, create a new volume from the snapshot, but this time specify the same AZ (us-east-1b instead of us-east-1c) as the node we wish to attach it to

aws ec2 create-volume --size 300 --region us-east-1 --availability-zone us-east-1b --volume-type gp2 --snapshot-id snap-085b986dae85dfed1
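One way to avoid the zone mismatch altogether is to ask EC2 which Availability Zone the target instance lives in and create the volume there. A hedged sketch, assuming a configured AWS CLI; `create_volume_near_instance` is a hypothetical name, and gp2 and the size simply mirror the commands in Steps 3 and 5.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a volume from a snapshot in the same AZ as an instance.
# Prints the new volume ID once the volume is "available" and ready to attach.
create_volume_near_instance() {
  local region="$1" instance_id="$2" snap_id="$3" size_gb="$4"
  local az vol_id
  # Look up the instance's AZ so the volume lands next to it
  az=$(aws --region "$region" ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].Placement.AvailabilityZone' \
    --output text) || return 1
  vol_id=$(aws --region "$region" ec2 create-volume \
    --availability-zone "$az" \
    --size "$size_gb" \
    --volume-type gp2 \
    --snapshot-id "$snap_id" \
    --query VolumeId --output text) || return 1
  # A new volume starts in "creating"; wait until it can be attached
  aws --region "$region" ec2 wait volume-available --volume-ids "$vol_id" || return 1
  echo "$vol_id"
}

# Example with the IDs from this post:
# create_volume_near_instance us-east-1 i-0717cd609275fdbef snap-085b986dae85dfed1 300
```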

Step 6, try attaching the new volume (cross your fingers)

aws ec2 attach-volume --volume-id vol-095cc214c8a5e74e0 --instance-id i-0717cd609275fdbef --device /dev/sdc


{
    "AttachTime": "2017-05-19T21:58:07.586Z",
    "InstanceId": "i-0717cd609275fdbef",
    "VolumeId": "vol-095cc214c8a5e74e0",
    "State": "attaching",
    "Device": "/dev/sdc"
}

Sweet, it worked… Now it’s time to do some work on the node we attached this volume to.
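The response above still says `attaching`, so before touching the disk on the instance it can help to wait until EC2 reports the attachment as `attached`. A minimal polling sketch; `wait_for_attachment` is a hypothetical name, and the 30 tries at 5-second intervals are arbitrary defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until the volume's first attachment reports "attached".
# Returns 0 on success, 1 if it never gets there within the retry budget.
wait_for_attachment() {
  local region="$1" volume_id="$2" tries="${3:-30}" interval="${4:-5}"
  local i state
  for ((i = 0; i < tries; i++)); do
    state=$(aws --region "$region" ec2 describe-volumes \
      --volume-ids "$volume_id" \
      --query 'Volumes[0].Attachments[0].State' --output text) || return 1
    if [ "$state" = "attached" ]; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

# Example: wait_for_attachment us-east-1 vol-095cc214c8a5e74e0
```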

Step 7, check if the new attachment is visible to the system

[root@ip-10-5-0-149 mongo]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   10G  0 disk
└─xvda1 202:1    0   10G  0 part /
xvdb    202:16   0  300G  0 disk /mnt
xvdc    202:32   0  300G  0 disk
[root@ip-10-5-0-149 mongo]#

Yup, sure is. We can see our device ‘xvdc’, a 300G disk with no mount point. We can also see ‘xvdb’, our original Mongo data volume, mounted under /mnt.
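Why `xvdc` when we attached `/dev/sdc`? On Xen-based instance types like this one, the kernel typically exposes a volume attached as `/dev/sdX` under the name `/dev/xvdX`. On newer Nitro-based instance types, EBS volumes appear as NVMe devices (for example `/dev/nvme1n1`) instead, so this mapping does not apply there. A tiny hypothetical helper capturing the Xen naming rule:

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate the device name given to `aws ec2 attach-volume`
# into the name a Xen-based instance usually shows (/dev/sdc -> /dev/xvdc).
xen_device_name() {
  local dev="$1"
  case "$dev" in
    /dev/sd*) echo "/dev/xvd${dev#/dev/sd}" ;;
    *) echo "$dev" ;;  # already an xvd/nvme/other name; leave it alone
  esac
}

# Example: an empty result means the device has no mount point yet
# lsblk -n -o MOUNTPOINT "$(xen_device_name /dev/sdc)"
```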

Step 8, create mount point and mount the new device

[root@ip-10-5-0-149 mongo]# mkdir /mnt/mongo2
[root@ip-10-5-0-149 mongo]# mount /dev/xvdc /mnt/mongo2
[root@ip-10-5-0-149 mongo]#
[root@ip-10-5-0-149 mongo]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      9.8G  5.6G  3.9G  60% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  1.6G   15G  10% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/xvdb       296G   65M  281G   1% /mnt
none             64K  4.0K   60K   7% /.subd/tmp
tmpfs           3.2G     0  3.2G   0% /run/user/11272
/dev/xvdc       296G   12G  269G   5% /mnt/mongo2
[root@ip-10-5-0-149 mongo]#

Step 9, shut down Mongo if it’s running

[root@ip-10-5-0-149 mongo]# service mongod stop

Step 10, copy the snapshot data into the existing MongoDB data directory

[root@ip-10-5-0-149 mongo]# pwd
[root@ip-10-5-0-149 mongo]# ls
[root@ip-10-5-0-149 mongo]# cp -r /mnt/mongo2/* .

Step 11, fix ownership of the copied data

[root@ip-10-5-0-149 mongo]# chown mongod:mongod /mnt/mongo -R

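NOTE: Do not forget this step, or you will get errors starting the MongoDB service. The files were copied as root, so they end up owned by root, while the service runs as the mongod user.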
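Steps 10 and 11 could also be folded into one guarded helper. This is a sketch, not what I ran: `restore_data_dir` is a hypothetical name, `/mnt/mongo` is assumed to be the data directory (per the `chown` above), and `mongod:mongod` is assumed to be the service account. Copying `src/.` with `cp -a` includes hidden files (a bare `*` glob skips them) and preserves modes and timestamps; the `chown` is still needed because the numeric owner IDs on the snapshot came from another host and may not match this one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy restored MongoDB files into the data directory and fix ownership.
# Usage: restore_data_dir SRC DEST [OWNER]
restore_data_dir() {
  local src="$1" dest="$2" owner="${3:-mongod:mongod}"
  # Never copy under a running mongod; it would be writing to the same files
  if pgrep -x mongod >/dev/null 2>&1; then
    echo "mongod is still running; stop it first" >&2
    return 1
  fi
  # "src/." copies the directory's contents, including dotfiles, into dest
  cp -a "$src"/. "$dest"/ || return 1
  # Hand everything to the account mongod runs as
  chown -R "$owner" "$dest"
}

# Example (after `service mongod stop`):
# restore_data_dir /mnt/mongo2 /mnt/mongo
```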

Step 12, start Mongo back up

[root@ip-10-5-0-149 mongo]# service mongod start
Starting mongod (via systemctl): [ OK ]
[root@ip-10-5-0-149 mongo]#
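Right after `service mongod start`, the member may spend a little time in a startup or recovering state before it reports SECONDARY. A hedged sketch that polls the legacy `mongo` shell until this member's `myState` is 1 (PRIMARY) or 2 (SECONDARY); `wait_for_member_state` is a hypothetical name, and it assumes `mongo` can reach the local instance without extra connection or auth flags.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the local replica set member until it is PRIMARY (1) or SECONDARY (2).
# Prints the state on success; returns 1 after the retry budget is spent.
wait_for_member_state() {
  local tries="${1:-30}" interval="${2:-5}"
  local i state
  for ((i = 0; i < tries; i++)); do
    # print() emits just the number; stderr noise (e.g. mongod not up yet) is discarded
    state=$(mongo --quiet --eval 'print(rs.status().myState)' 2>/dev/null)
    case "$state" in
      1|2) echo "$state"; return 0 ;;
    esac
    sleep "$interval"
  done
  return 1
}

# Example: wait_for_member_state 60 5
```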

Step 13, check the Mongo cluster status

[root@ip-10-5-0-149 mongo]# mongo
cisreplset:SECONDARY> rs.status()
{
    "set" : "cisreplset",
    "date" : ISODate("2017-05-19T22:17:00.483Z"),
    "myState" : 2,
    "term" : NumberLong(111),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "ip-10-5-0-149:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 6,
            "optime" : {
                "ts" : Timestamp(1495222912, 3),
                "t" : NumberLong(110)
            },
            "optimeDate" : ISODate("2017-05-19T19:41:52Z"),
            "configVersion" : 3,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 4,
            "optime" : {
                "ts" : Timestamp(1495230671, 1033),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T21:51:11Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:16:56.263Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T22:16:59.620Z"),
            "pingMs" : NumberLong(1),
            "syncingTo" : "",
            "configVersion" : 3
        },
        {
            "_id" : 2,
            "name" : "",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 3,
            "optime" : {
                "ts" : Timestamp(1495232212, 24),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T22:16:52Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:16:56.516Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T22:16:58.751Z"),
            "pingMs" : NumberLong(84),
            "electionTime" : Timestamp(1495225273, 1),
            "electionDate" : ISODate("2017-05-19T20:21:13Z"),
            "configVersion" : 3
        }
    ],
    "ok" : 1
}

For contrast, here is what it looked like before. Pay close attention to member 0 (ip-10-5-0-149:27017), the node we just restored:

cisreplset:PRIMARY> rs.status()
{
    "set" : "cisreplset",
    "date" : ISODate("2017-05-19T22:00:07.185Z"),
    "myState" : 1,
    "term" : NumberLong(111),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "ip-10-5-0-149:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:00:06.570Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T21:49:39.839Z"),
            "pingMs" : NumberLong(82),
            "lastHeartbeatMessage" : "Connection refused",
            "configVersion" : -1
        },
        {
            "_id" : 1,
            "name" : "",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2138,
            "optime" : {
                "ts" : Timestamp(1495228916, 7491),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T21:21:56Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:00:05.507Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T22:00:05.358Z"),
            "pingMs" : NumberLong(83),
            "syncingTo" : "",
            "configVersion" : 3
        },
        {
            "_id" : 2,
            "name" : "",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1200895,
            "optime" : {
                "ts" : Timestamp(1495231207, 1111),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T22:00:07Z"),
            "electionTime" : Timestamp(1495225273, 1),
            "electionDate" : ISODate("2017-05-19T20:21:13Z"),
            "configVersion" : 3,
            "self" : true
        }
    ],
    "ok" : 1
}
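One way to read these two outputs: before the restore, member 0 was unreachable with an optime of Timestamp(0, 0); afterwards it is SECONDARY with an optime of 1495222912 while the primary is at 1495232212. The first number in a `Timestamp` is seconds since the epoch, so a little shell arithmetic shows how much history the restored member still had to replay from the primary's oplog, which only works if the oplog still covers that window. `optime_lag` is a hypothetical helper; the legacy shell's `rs.printSlaveReplicationInfo()` reports similar information directly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn two optime "ts" seconds values into a readable lag.
# Usage: optime_lag PRIMARY_SECONDS MEMBER_SECONDS
optime_lag() {
  local lag=$(( $1 - $2 ))
  printf '%dh %dm %ds\n' $(( lag / 3600 )) $(( lag % 3600 / 60 )) $(( lag % 60 ))
}

# Values from the rs.status() output after the restore:
optime_lag 1495232212 1495222912   # member 0 vs. primary -> 2h 35m 0s
optime_lag 1495232212 1495230671   # member 1 vs. primary -> 0h 25m 41s
```

Both results agree with the `optimeDate` fields (19:41:52 and 21:51:11 against the primary's 22:16:52).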

Now that our DB is verified healthy, it’s time to clean up.

Step 14, clean up our now-unnecessary waste (and thank the gods)

Unmount & Delete

[root@ip-10-5-0-149 mongo]# umount /mnt/mongo2/
[root@ip-10-5-0-149 mongo]# rm -rf /mnt/mongo2/

Detach Volume

(env) ➜ ~ aws ec2 detach-volume --volume-id vol-0fa49fde34e88a1c6
{
    "AttachTime": "2017-05-19T20:43:29.000Z",
    "InstanceId": "i-0d535ee1cdfd79073",
    "VolumeId": "vol-0fa49fde34e88a1c6",
    "State": "detaching",
    "Device": "/dev/sdc"
}
(env) ➜ ~

Delete Volume & Snapshots


(env) ➜ ~ aws ec2 delete-volume --volume-id vol-095cc214c8a5e74e0
(env) ➜ ~ aws ec2 delete-snapshot --snapshot-id snap-085b986dae85dfed1
(env) ➜ ~ aws ec2 delete-snapshot --snapshot-id snap-01f185929341abd3b --region us-west-2
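EC2 rejects `delete-volume` while a volume is still attached, so a scripted cleanup needs to detach, wait for the volume to return to `available`, and only then delete it. A sketch, assuming a configured AWS CLI and a filesystem that has already been unmounted on the instance; `remove_volume` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: detach an EBS volume, wait until it is free, then delete it.
# Unmount the filesystem on the instance first; detaching a mounted volume risks corruption.
remove_volume() {
  local region="$1" volume_id="$2"
  aws --region "$region" ec2 detach-volume --volume-id "$volume_id" >/dev/null || return 1
  # The delete is refused until the detach has finished and the volume is "available"
  aws --region "$region" ec2 wait volume-available --volume-ids "$volume_id" || return 1
  aws --region "$region" ec2 delete-volume --volume-id "$volume_id"
}

# Example: remove_volume us-east-1 vol-095cc214c8a5e74e0
```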

When I ran into this issue and googled around a bit, I really didn’t find anyone with a detailed account of how they got out of it. So I was inspired by the opportunity to help others in the future, and the result is this post. I hope it finds someone, someday, facing a similar scenario and graciously lifts them out of the depths! Godspeed, and happy clouding.
