2016

Storing passwords securely using Pass (GPG)

Today we live in an endless sea of passwords, which are a very inefficient and ineffective means of securing our data & environments. Many companies are trying to solve this problem using a variety of techniques that all revolve around various forms of multi-factor authentication.

However, in the meantime we're all screwed 😉

Just kidding. Quick PSA though: use two-factor authentication at a minimum everywhere you can, ESPECIALLY on your email, since it's used for password recovery on other sites. Ok then, moving on…

There are many password managers like LastPass and 1Password, which do a fairly effective job at providing convenience and preventing you from scribbling down your passwords on paper (STOP IT!!!). However, I personally can't get past the whole 'store all my passwords in one super secure vault on the Internet' thing. To be fair, some of these password managers can be downloaded and run locally on your machine, but there are two other drawbacks I found with those.

  1. Some of them are not free and…
  2. Some of them have ugly and clunky UI’s

So what do I like/use then? I use something called 'pass', which is a command line utility that wraps GPG. The reason I use it is because…

  1. I love using command line utilities over GUIs, I find them far more convenient and…
  2. I was going to write this exact utility (a GPG wrapper) until I found out someone else did and…
  3. Because I like GPG.

At most of the organizations I have worked at, password management was done poorly, i.e. everyone used different approaches and there was no governance or oversight. I hope this article makes folks aware of what I feel is a simple, effective method that every Unix-savvy administrator should use.

FYI, Pass provides migration scripts for the most popular password manager tools on its website.

Introducing Pass

From the Pass site “Password management should be simple and follow Unix philosophy. With pass, each password lives inside of a gpg encrypted file whose filename is the title of the website or resource that requires the password. These encrypted files may be organized into meaningful folder hierarchies, copied from computer to computer, and, in general, manipulated using standard command line file management utilities.”

Where Can You Get or Learn More About Pass ? 

https://www.passwordstore.org/

Installing Pass

Depending on your operating system, there are various ways to install it.

Ubuntu/Debian

sudo apt-get install pass

Fedora / RHEL

sudo yum install pass

Mac

brew install pass
echo "source /usr/local/etc/bash_completion.d/password-store" >> ~/.bashrc

Since I already installed pass on my Mac a while back I will be installing it on a Docker container with Ubuntu 16.04.

root@0b415380eb80:/# apt-get install -y pass

After pass successfully installs, try running it

root@0b415380eb80:/# pass
Error: password store is empty. Try "pass init".
root@0b415380eb80:/#

Well, that is pretty straightforward; it appears we need to initialize the DB.

root@0b415380eb80:/# pass init
Usage: pass init [--path=subfolder,-p subfolder] gpg-id...
root@0b415380eb80:/#

Looks like we need to provide a 'key'… can that be just anything?

root@0b415380eb80:/# pass init "tuxlabs Password Key"
mkdir: created directory '/root/.password-store/'
Password store initialized for tuxlabs Password Key
root@0b415380eb80:/# pass
Password Store
root@0b415380eb80:/#

Now our password store is initialized! Let's try inserting a password into the DB!

root@0b415380eb80:/# pass insert Gmail/myemail
mkdir: created directory '/root/.password-store/Gmail'
Enter password for Gmail/myemail:
Retype password for Gmail/myemail:
gpg: tuxlabs Password Key: skipped: No public key
gpg: [stdin]: encryption failed: No public key
root@0b415380eb80:/#

Uh oh, what happened? Well, remember I said it uses GPG; we not only don't have a GPG key set up in our Docker container, but we also initialized our Pass DB without using a GPG key ID (the whole point)!

root@0b415380eb80:/# gpg --list-keys
root@0b415380eb80:/#

To remedy this we need to create a GPG key.

Creating your GPG Key

root@0b415380eb80:/# gpg --gen-key
gpg (GnuPG) 1.4.20; Copyright (C) 2015 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Tuxninja
Email address: tuxninja@tuxlabs.com
Comment: TuxLabs
You selected this USER-ID:
    "Tuxninja (TuxLabs) <tuxninja@tuxlabs.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

gpg: gpg-agent is not available in this session
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
............+++++
...................+++++
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
...+++++
+++++
gpg: key 5B2F89A5 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   4096R/5B2F89A5 2016-12-14
      Key fingerprint = 5FF6 1717 4415 03FF D455  7516 CF8E 1BDC 5B2F 89A5
uid                  Tuxninja (TuxLabs) <tuxninja@tuxlabs.com>
sub   4096R/EF0F232F 2016-12-14

root@0b415380eb80:/#

To view your GPG key run

root@0b415380eb80:/# gpg --list-keys
/root/.gnupg/pubring.gpg
------------------------
pub   4096R/5B2F89A5 2016-12-14
uid                  Tuxninja (TuxLabs) <tuxninja@tuxlabs.com>
sub   4096R/EF0F232F 2016-12-14

root@0b415380eb80:/#

Now we can see we have one GPG key, with the ID 5B2F89A5

Let’s try re-initializing Pass. 

root@0b415380eb80:/# pass init "5B2F89A5"
Password store initialized for 5B2F89A5
root@0b415380eb80:/#

But we have a problem: re-initializing Pass doesn't get rid of our previous insert into the DB. As you can see here, our Pass DB is effectively corrupt.

root@0b415380eb80:~# pass
Password Store
`-- Gmail
root@0b415380eb80:~# pass rm Gmail
Are you sure you would like to delete Gmail? [y/N] y
rm: cannot remove '/root/.password-store/Gmail': Is a directory
root@0b415380eb80:~# pass rm Gmail/myemail
Error: Gmail/myemail is not in the password store.
root@0b415380eb80:~#

Hmmm, what’s a guy to do….

root@0b415380eb80:~# rm -rf .password-store/Gmail/
root@0b415380eb80:~# pass
Password Store
root@0b415380eb80:~#

Yes it really was that simple, and that is one more reason why I love pass.

You can also initialize your password store using git for version control, see the passwordstore.org website for more info !
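As a minimal sketch of what that looks like (the remote URL below is just a placeholder; I'm assuming you have already created an empty remote repository):

pass git init
pass git remote add origin git@github.com:youruser/password-store.git
pass git push -u origin master

Once the store is a git repository, pass automatically commits every insert, edit, and removal for you.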

Now let’s insert some good stuff.

Inserting A Password into Pass

root@0b415380eb80:~# pass insert Gmail/myemail
Enter password for Gmail/myemail:
Retype password for Gmail/myemail:
root@0b415380eb80:~# pass
Password Store
`-- Gmail
    `-- myemail
root@0b415380eb80:~#

That seems to have worked. Let's try to retrieve the password.

Retrieving A Password In Pass

root@0b415380eb80:~# pass Gmail/myemail
gpg: starting migration from earlier GnuPG versions
gpg: porting secret keys from '/root/.gnupg/secring.gpg' to gpg-agent
gpg: migration succeeded
testpass
root@0b415380eb80:~# pass Gmail/myemail
testpass
root@0b415380eb80:~#

Note, I retrieve the password twice. The first time I am prompted for my GPG passphrase (through a curses interface) and you can see the initial GPG migration messages; then I run it again to show how it would normally work after you've used GPG once with Pass.

Now let’s say someone is standing over your shoulder, you want to access your passsword, but you don’t want them to see it. You can get it straight to your clipboard by using -c.

Copying Passwords To Your Clipboard

pass -c Gmail/myemail
Copied Gmail/myemail to clipboard. Will clear in 45 seconds.

Docker Issue ?

Notice the prompt is not included in the above example? That is because it didn't actually work. Apparently, it doesn't work in Docker because the display dependencies are not installed/configured. So what I show above is the output from my Mac… my actual Docker-related error was:

root@0b415380eb80:~# pass -c Gmail/myemail
Error: Can't open display: (null)
Error: Could not copy data to the clipboard
root@0b415380eb80:~#

There might be an easy way to fix this (like installing X), but I don't usually use Docker for storing my passwords; I just happen to be using it for this tutorial, so moving on!
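For the curious, here is a rough sketch of a workaround, assuming the xvfb package to provide a virtual X server for xclip to talk to. Note the clipboard then lives inside the container's virtual display, so this mostly just makes the command succeed rather than giving you a clipboard you can paste from on the host:

apt-get install -y xvfb xclip
Xvfb :99 &
export DISPLAY=:99
pass -c Gmail/myemail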

Folders

It’s also important to note that Pass supports folder structures, as shown in my example I am creating a ‘Gmail’ folder and placing a password file called ‘myemail’ with my password in it. In reality I recommend not naming the file after your account/email and using the multiline version to encrypt those details as well. That way you can just stick to the site name for the name of the encrypted file in whatever folder or in the top level of Pass.

Multiline Encrypted Files with Pass

A common use case with Pass is adding an entire encrypted file so you can store more than just a password…

root@0b415380eb80:~# pass insert -m tuxlabs/databases
mkdir: created directory '/root/.password-store/tuxlabs'
Enter contents of tuxlabs/databases and press Ctrl+D when finished:

this is an example of a multiline
encrypted file
this way you can store more than just a password you can store user/pass/url etc
root@0b415380eb80:~# pass
Password Store
|-- Gmail
|   `-- myemail
`-- tuxlabs
    `-- databases
root@0b415380eb80:~#

Again retrieving it is as easy as..

root@0b415380eb80:~# pass tuxlabs/databases
this is an example of a multiline
encrypted file
this way you can store more than just a password you can store user/pass/url etc
root@0b415380eb80:~#

Finally, if you no longer want the info to be stored in Pass, you can delete it (see below). But first, one important note: if you want to copy your password to the clipboard from a multiline file, you must store your password on the first line of the file!
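For example, a hypothetical multiline entry laid out so -c still works (the login and url lines are just free-form text I chose; only the first line is treated as the password):

pass insert -m Gmail/myemail
Enter contents of Gmail/myemail and press Ctrl+D when finished:
s3cr3tp4ss
login: myemail@gmail.com
url: https://mail.google.com

pass -c Gmail/myemail   # copies only 's3cr3tp4ss' to the clipboard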

Deleting An Entry In Pass

root@0b415380eb80:~# pass rm Gmail/myemail
Are you sure you would like to delete Gmail/myemail? [y/N] y
removed '/root/.password-store/Gmail/myemail.gpg'
root@0b415380eb80:~# pass rm tuxlabs/databases
Are you sure you would like to delete tuxlabs/databases? [y/N] y
removed '/root/.password-store/tuxlabs/databases.gpg'
root@0b415380eb80:~# pass
Password Store
root@0b415380eb80:~#

Another thing: the tree output on my Mac is much prettier than the `-- characters I am getting in the Ubuntu Docker container (likely a locale issue, with tree falling back to ASCII line-drawing). I am not sure if that's an Ubuntu issue or a Docker one, but the prettier output can be seen on the passwordstore.org home page.

So that’s it, Pass is pretty straight forward, easy to work with, depends on GPG security and that is why I like it.

Stay secure, until next time !

 


Setting up Netflix’s Edda (CMDB) in AWS on Ubuntu

If you are running any kind of environment with greater than 10 servers, then you need a CMDB (Configuration Management DataBase). CMDBs are the brain of your fleet & its environment. You can store anything in a CMDB, but commonly the metadata in CMDBs consists of any of the following: physical & digital asset inventory, software licenses, software configuration data, policy information, relationships (i.e. this VM –> Compute –> Rack –> Availability Zone –> Datacenter), automation metadata, and more… they also commonly provide change history for changes in your environment.

In the world of infrastructure as code, CMDB is king.

CMDB’s enable endless automation possibilities, without them you are stuck gathering and collecting ‘current’ configuration state about your infrastructure every time you want perform an automated change or run an audit/report . In my career I have built or been a part of CMDB efforts at nearly every company I have worked for. They are simply necessary, and by their nature they tend to require the choice of ‘built by us’ vs ‘buy or run’.

However, if you have the luxury of only running in AWS, you are in luck, because Netflix (the AWS poster child) open sourced Edda in 2012 for this purpose!

Rather than talk about the specific features of Edda (refer to the blog post or documentation for those), I want to keep this article short and jump right into setting up Edda, which is a bit tricky because the documentation is out of date!

Setting Up Edda (2016)

First, in AWS you need to set up an EC2 VM that has at least 6G of disk for the OS + dependencies (including Mongo), and then however much disk you need to store the metadata for your environment (keep in mind it keeps change history). Personally I just created a root partition with 100G to keep things simple. For the instance type I used 'm4.xlarge' and the Ubuntu version is 14.04.

After booting the VM, SSH to it and create a directory, wherever your storage is allocated partition-wise, to store Edda & its dependencies. I will be using /cmdb/ in my example.

Initial Install Steps

mkdir /cmdb
cd /cmdb
export JAVA_OPTS="-Xmx1g -XX:MaxPermSize=256M"
git clone https://github.com/Netflix/edda.git
sudo add-apt-repository -y ppa:webupd8team/java &> /dev/null
sudo apt-get update
sudo debconf-set-selections <<< 'oracle-java8-installer shared/accepted-oracle-license-v1-1 boolean true'
sudo apt-get install -y oracle-java8-installer
sudo apt-get install -y scala
sudo apt-get install make

cd /cmdb/edda
make build

For the record, the Edda Wiki has the build steps wrong; it appears they are no longer using Gradle, but have switched to SBT… which reminds me, be aware that Edda is written in Scala, which isn't as popular as Java, Python, etc. In addition it's functional programming, which I don't personally know a lot about, but I hear it's got quite the learning curve… so beware if you need to make custom code changes; I would not recommend it unless you know Scala! 🙂

After the build of Edda succeeds, install Mongo

apt-get install -y mongodb

That’s it for dependencies

Configuring Mongo

For Edda to use Mongo all we need to do is ‘use’ the database we want to use for Edda & create an associated user. (Mongo will auto-create DB’s upon insert).

mongo

> use edda
> db.addUser({user:'edda',pwd:'t00t0ri4l',Roles: { edda: ['readWrite']}, roles: []})

You can test the user is working by doing… 

$ mongo edda -u edda -p
MongoDB shell version: 2.4.9
Enter password:
connecting to: edda
Server has startup warnings:
Sat Dec 10 00:53:21.093 [initandlisten]
Sat Dec 10 00:53:21.094 [initandlisten] ** WARNING: You are running on a NUMA machine.
Sat Dec 10 00:53:21.094 [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
Sat Dec 10 00:53:21.094 [initandlisten] **              numactl --interleave=all mongod [other options]
Sat Dec 10 00:53:21.094 [initandlisten]
>

Configuring Edda

Under /cmdb/edda/src/main/resources we need to modify ‘edda.properties’ with valid config values for accounts, regions & mongo access.

Relevant Mongo Values

edda.mongo.address=127.0.0.1:27017
edda.mongo.database=edda
edda.mongo.user=edda
edda.mongo.password=t00t0ri4l

Account & Region Values 

edda.accounts=dev.us-east-1
edda.dev.us-east-1.region=us-east-1
edda.dev.us-east-1.aws.accessKey=fakeaccesskey
edda.dev.us-east-1.aws.secretKey=fakesecret

The above example uses one account and only one region. The Edda configuration uses generic labels; they are very flexible, but when using them you might be confused about a label's name versus its intent. Don't fall into that trap. I did, and then I found this post on Google Groups… check it out to gain more insight on how the configuration works and how it can be tweaked for your needs. There is also the standard documentation, but it's a little light IMO.
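If you want Edda to crawl more than one account/region, my understanding (extrapolating from the single-account example above; the account names and keys here are made up) is that you add more labels to edda.accounts and repeat the per-account properties:

edda.accounts=dev.us-east-1,prod.us-west-2
edda.dev.us-east-1.region=us-east-1
edda.dev.us-east-1.aws.accessKey=fakeaccesskey
edda.dev.us-east-1.aws.secretKey=fakesecret
edda.prod.us-west-2.region=us-west-2
edda.prod.us-west-2.aws.accessKey=otherfakekey
edda.prod.us-west-2.aws.secretKey=otherfakesecret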

Running Edda

Congrats, you made it; time to run Edda! Again the documentation has this wrong (listed as Gradle & Jetty)… instead we're using SBT + Jetty.

$ cd /cmdb/edda/
$ ./project/sbt
> jetty:start

If everything goes smoothly you will start to see logs about crawling AWS API’s spewing to your screen 🙂 After about 2 minutes you should see data. You can check by doing a curl.

curl http://127.0.0.1:8080/api/v2/view/instances

This API URL should return a JSON object with instance ID’s for the account & region specified.
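If I recall the Edda REST matrix arguments correctly (_pp pretty-prints the JSON and _expand returns the full documents instead of just the IDs), you can also poke around by hand like this:

curl 'http://127.0.0.1:8080/api/v2/view/instances;_pp'
curl 'http://127.0.0.1:8080/api/v2/view/instances;_expand;_pp'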

Additionally, Edda is listening on whatever private IP address you have set up; you will just need to modify the default security group to allow port 8080 from your machine.
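If you prefer the CLI over the console for that, a quick sketch (the security group ID and CIDR below are placeholders; scope the CIDR to your own network):

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8080 --cidr 10.0.0.0/8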

I get a bit frustrated with out-of-date documentation… so I hope this helps! Happy automating!


AWS, Google Cloud, Azure, and the centralized future of the Internet

I have just left AWS re:Invent and I wanted to give my brief thoughts on the future of cloud computing. I believe in the next few years the shift we have been witnessing will be completed. That is to say that the thousands of enterprises and small businesses alike will finish their migrations to public clouds, simply because the benefits are far too great: fewer people, less hardware, less glue code, more functionality, more value, etc. AWS will be the dominant public cloud for the next couple of years minimum due to their first-to-market advantage, and if you look at the announcements at re:Invent 2016, you see a series of products that solve common problems. In fact a lot of the "innovations" AWS announced today replace many SaaS solutions which, ironically (or maybe not so much), are hosted on AWS. None of this concerns me; this is great disruption. AWS is teaching these businesses to move even further up and away from creating DevOps tooling as products (they will take care of that) and to focus on products that provide value differently, like sifting through massive amounts of data, increasing its quality and providing intelligence from that data. This is all great, and it's definitely where things are headed; kudos to Amazon for guiding folks.

BUT here is what is disturbing to me, and it seems like no one talks about it. It's as if they can't see the elephant in the room.

The elephant in the room is that every company in the world is converging on a smaller number of types of physical devices, paths, datacenters, etc. This means the global failure domains that should be distributed in nature are actually becoming more centralized; therefore the risk of massive security or availability (outage) events is higher.

If you really think about it, cloud was always the return of utility computing (mainframes, etc.), and as we go down this journey it's becoming more evident that it's simply a more distributed version of the mainframe, and in my opinion, at the moment, that is giving people a false sense of comfort.

Early in my career almost 20 years ago, I was working at an ISP, and one of the core services for the Internet (DNS) was directly attacked. There was of course a widespread failure and what we soon realized was the Internet had more of a shared fate than most believed.

Fast forward to present day, and it just happened again with Dyn, which hosted DNS for some very critical companies. This problem hasn't been solved; it is getting worse.

This is the same problem we are going to have with AWS, Google Cloud and Azure.

As companies & governments converge on datacenters, and those datacenters connect to common interconnected fabrics (aka the Internet itself) and resources, the Internet is becoming far more grouped… far more shared & central… and thus the shared fate of the Internet will lie solely on the shoulders of giants, or as we like to call them in our industry, monoliths.

Public cloud monoliths, monopolies…etc

Perhaps my worries will be mitigated by fantastic diversification and investment in truly distributed, distinct network paths, independent power plants, etc… BUT my fear is that the convergence is happening so fast the providers won't be able to make that a reality fast enough, and what's the incentive for them? They would have to invest a tremendous amount of capital when they are already successful, and this problem has not publicly and visibly humiliated us yet. But I fear that it will in the next few years…

So Godspeed to journeymen of the cloud, as we enjoy the luxuries that AWS, Azure, & Google Cloud offer us. We are entering a beautiful and dangerous time. Beware, and hedge your company & product by distributing them as much as you can to avoid these central dependencies. Avoid these massive, shared, global failure domains and ensure you are diversified to avoid increased security risk.


Python & The Jira REST API

Recently, while having to work a lot more in Jira than normal, I got annoyed with the Jira web GUI. So I wrote a script to do simple management of our Jira issues.

Here is the basic usage

(env) ➜  jira git:(master) ✗ ./jira-ctl.py
usage: jira-ctl.py [-h] [-lp] [-li LIST_PROJECT_ISSUES] [-ua UPDATE_ASSIGNEE]
                   [-ud UPDATE_DATE] [-usts UPDATE_STATUS]
                   [-usum UPDATE_SUMMARY]

optional arguments:
  -h, --help            show this help message and exit
  -lp, --list-projects  List all projects
  -li LIST_PROJECT_ISSUES, --list-issues LIST_PROJECT_ISSUES
                        List Issues for a specific project
  -ua UPDATE_ASSIGNEE, --update-assignee UPDATE_ASSIGNEE
                        Update an issues assignee format: <issue
                        number>,<first_last>
  -ud UPDATE_DATE, --update-date UPDATE_DATE
                        Update an issues due date format: <issue number
                        >,<yyyy-mm-dd>
  -usts UPDATE_STATUS, --update-status UPDATE_STATUS
                        Update an issues status format: <issue
                        number>,'<Open|In Progress|Resolved>'
  -usum UPDATE_SUMMARY, --update-summary UPDATE_SUMMARY
                        Update an issues summary format: <issue
                        number>,'<summary>'
(env) ➜  jira git:(master) ✗

Here is the code…

from jira.client import JIRA
import os, sys
import prettytable
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-lp', '--list-projects', action="store_true", dest="list_projects", required=False, help="List all projects")
parser.add_argument('-li', '--list-issues', action="store", dest="list_project_issues", required=False, help="List Issues for a specific project")
parser.add_argument('-ua', '--update-assignee', action="store", dest="update_assignee", required=False, help="Update an issues assignee format: <issue number>,<first_last>")
parser.add_argument('-ud', '--update-date', action="store", dest="update_date", required=False, help="Update an issues due date format: <issue number>,<yyyy-mm-dd>")
parser.add_argument('-usts', '--update-status', action="store", dest="update_status", required=False, help="Update an issues status format: <issue number>,'<Open|In Progress|Resolved>'")
parser.add_argument('-usum', '--update-summary', action="store", dest="update_summary", required=False, help="Update an issues summary format: <issue number>,'<summary>'")
args = parser.parse_args()

def get_pass():
    if os.environ.get('JIRA_PASS') == None:
        print "you must first export your JIRA_PASS in the shell by running: source getpass.sh"
        print 'or "export JIRA_PASS=<jira_password>'
        sys.exit()
    jira_password = str(os.environ['JIRA_PASS'])

    return jira_password

def connect_jira(jira_server, jira_user, jira_password):
    '''
    Connect to JIRA. Return None on error
    '''
    try:
        #print "Connecting to JIRA: %s" % jira_server
        jira_options = {'server': jira_server}
        jira = JIRA(options=jira_options,
                    # Note the tuple
                    basic_auth=(jira_user,
                                jira_password))
        return jira
    except Exception,e:
        print "Failed to connect to JIRA: %s" % e
        return None

def list_projects():
    projects = jira.projects()
    header = ["Projects"]
    table = prettytable.PrettyTable(header)
    for project in projects:
        row = [str(project)]
        table.add_row(row)
    print table

def list_project_issues(project):
    query = 'project = %s and status != Resolved order by due asc' % (project)
    #query = 'project = %s and status != Resolved order by due desc' % (project)
    #query = 'project = %s order by due desc' % (project)
    issues = jira.search_issues(query, maxResults=100)

    if issues:
        header = ["Ticket", "Summary", "Assignee", "Status", "Due Date"]
        table = prettytable.PrettyTable(header)

        for issue in issues:
                summary = issue.fields.summary
                st = str(issue.fields.status)
                dd = issue.fields.duedate
                assignee = issue.fields.assignee
                row = [issue, summary, assignee, st, dd]
                table.add_row(row)
        print table

def update_issue_assignee(issue_number, assignee):
    issue = jira.issue(issue_number)
    issue.update(assignee=assignee)
    print "updated %s with new Assignee: %s" % (issue_number, assignee)

def update_issue_duedate(issue_number, dd):
    issue = jira.issue(issue_number)
    issue.update(duedate=dd)
    print "updated %s with new Due Date: %s" % (issue_number, dd)

def update_issue_summary(issue_number, summary):
    issue = jira.issue(issue_number)
    issue.update(summary=summary)
    print "updated %s with new Summary: %s" % (issue_number, summary)

def update_issue_status(issue_number, status):
    # To figure out transitions view the following URL
    # https://jira.yourcompany.com/rest/api/2/issue/XXXXXX-22/transitions?expand=transitions.fields

    transitions = {
                    'Resolved': 11,
                    'Open': 41,
                    'In Progress': 21,
                    'Testing': 71,
                    'Transition': 81,
                    'Closed': 91,
    }

    issue = jira.issue(issue_number)
    jira.transition_issue(issue, transitions[status])
    print "updated %s with new Status: %s" % (issue_number, status)

if __name__ == "__main__":
    # args is an argparse Namespace (always truthy), so check argv to detect "no arguments"
    if len(sys.argv) == 1:
        parser.print_help()
    else:
        jira_user = '<your_login>'
        jira_password = get_pass()
        jira_server = 'https://jira.yourcompany.com'

        jira = connect_jira(jira_server, jira_user, jira_password)

        if args.list_projects:
            list_projects()
        elif args.list_project_issues:
            project = args.list_project_issues
            list_project_issues(project)
        elif args.update_assignee:
            (issue_number, assignee) = args.update_assignee.split(',')
            update_issue_assignee(issue_number, assignee)
        elif args.update_date:
            (issue_number, dd) = args.update_date.split(',')
            update_issue_duedate(issue_number, dd)
        elif args.update_status:
            (issue_number, status) = args.update_status.split(',')
            update_issue_status(issue_number, status)
        elif args.update_summary:
            (issue_number, summary) = args.update_summary.split(',')
            update_issue_summary(issue_number, summary)
        else:
            parser.print_help()

Later, I found out that there is a Jira command line utility; I didn't check what functionality it provides, but I enjoyed writing this anyway.

Happy coding !


How To: Maximize Availability Efficiently Using AWS Availability Zones

For the TL;DR version, skip straight to the Cassandra Examples

Intro & Background

During my years at PayPal I was fortunate enough to be a part of a pioneering architecture & engineering team that designed & delivered a new paradigm for how we deployed & operated applications, using a model that included 5 Availability Zones per Region (multiple regions) & Global Traffic Management. This new deployment pattern increased our cost efficiency and capacity while providing high availability to our production application stack. The key to increasing cost efficiency while not losing availability is how you manage your capacity. Failure detection and global traffic management go the rest of the way to make use of all your Availability Zones & Regions, giving you better availability. There is a little more to it, in terms of how you deploy your applications advantageously to this design… but we will get into that with our Cassandra/AWS examples later.

Prior to this model our approach was the traditional 3-datacenter model: 2 active + 1 DR, all with 100% of the capacity required to operate independently, and they required manual failover in the event of an outage.

Oh the joys of operating your own private cloud & datacenters at scale 🙂

Most companies, and a recent article suggests PayPal as well, are thinking about or are already moving to public cloud. Public cloud gives most companies cheaper and faster access to the economy of scale as well as an immediate tap into a global infrastructure.

Amazon Web Services (AWS) is the market share leader in public cloud today. They operate tens of Regions & Availability Zones all around the world. Deploying your application(s) on these massive scale public clouds means opportunity for operating them in more efficient ways.

Recently I came across a few articles that felt like reading a history book on going from a Failover to an Always On model.  I’ll just leave these here for inquiring minds…

http://highscalability.com/blog/2016/8/23/the-always-on-architecture-moving-beyond-legacy-disaster-rec.html
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44686.pdf

Ok, enough with the introduction; let's move into the future, starting with the Principles of High Availability.

Principles for High Availability – Always

  • Use Multiple Regions ( Min 2, Max as many as you need) 
  • Use Multiple Availability Zones ( Min 2, Max as many as is prudent to optimize availability and capacity with cost ) 
  • Design for an Active/Active architecture
  • Deploy at least N+1 capacity per cluster when Active/Active is not easily accomplished
    • Note: N=100% of capacity required to run your workload
  • Eliminate Single Points of Failure/Ensure redundancies in all components of the distributed system(s)
  • Make services fault tolerant/Ensure they continue in the event of various failures
  • Make services resilient by implementing timeout/retry patterns (circuit breaker, exponential back-off algorithm, etc.; see the sketch after this list)
  • Make services implement graceful failure, such as degrading the quality of a response while still providing a response
  • Use an ESB (Enterprise Service Bus) to make calls to your application stack asynchronous where applicable
  • Use caching layers to speed up responses
  • Use Auto-Scaling for adding dynamic capacity for bursty workloads (This can also save money)
  • Design deployments to be easily reproduced & self-healing (i.e. using Spinnaker(Netflix), Kubernetes (Google) for containers, Scalr etc)
  • Design deployments with effective monitoring visibility –  Synthetic Transactions, Predictive Analytics, event triggers, etc.
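To make the retry/back-off bullet above concrete, here is a trivial shell sketch of the idea (the health URL and retry limits are made up; real services would do this in application code, add jitter, and open a circuit breaker after repeated failures):

for attempt in 1 2 3 4 5; do
  curl -sf --max-time 2 http://myservice.internal/health && break
  sleep $(( 2 ** attempt ))   # exponential back-off: 2, 4, 8, 16, 32 seconds
done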

Principles for High Availability – Never’s

  • Deploy to a single availability zone
  • Deploy to a single region
  • Depend on either a single availability zone or region to always be UP.
  • Depend on capacity in a single availability zone in a region to be AVAILABLE.
  • Give up too easily on an Active/Active design/implementation
  • Rely on datacenter failover to provide HA (Instead prefer active/active/multi-region/multi-az)
  • Make synchronous calls across regions (this negatively affects your failure domain cross-region)
  • Use sticky load balancing to pin requests to a specific node for a response to succeed (i.e. if you are relying on it for ‘state’ purposes this is really bad)

Summarizing

  • YOU MUST deploy your application components to a minimum of 2 Regions.
  • Because failures of an entire region in AWS happen frequently…
  • YOU MUST deploy your application components to a minimum of 2 Availability Zones within a Region. 
  • Because an AWS AZ can go down for maintenance, outage or be out of new capacity frequently…
  • YOU MUST aim to maximize the use of Availability Zones and Regions, but balance that with cost requirements and be aware of diminishing returns
  • YOU MUST use global traffic routing (Route53) and health-check monitoring / automated markdowns to route around failures to healthy deployments (see the sketch after this list)
  • YOU MUST follow the rest of the Principles of High Availability Always directives to the best of your ability
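As a hypothetical sketch of that Route53 piece, this is roughly what a latency-based, health-checked record for one region looks like via the CLI. The hosted zone ID, health check ID, domain, and IP are placeholders, and you would create one such record set per region so unhealthy regions are routed around automatically:

aws route53 change-resource-record-sets --hosted-zone-id ZEXAMPLE123 --change-batch '{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "app.example.com",
      "Type": "A",
      "SetIdentifier": "us-east-1",
      "Region": "us-east-1",
      "TTL": 60,
      "ResourceRecords": [{ "Value": "203.0.113.10" }],
      "HealthCheckId": "11111111-2222-3333-4444-555555555555"
    }
  }]
}'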

Active/Active Request Routing (AWS)

Diagram Components

  • Global Traffic Management: Using AWS Route53 as a Global Traffic Manager (configuring Traffic Flows + geolocation)
  • 3 Regions and their AZ counts: us-east-1 = 4, us-west-2 = 3, eu-central-1 = 2

Additional Info on Active/Active Request Routing

  • Route53 load balances requests to the application stack within each availability zone and across 3 regions.
  • This is an Active/Active design, i.e. Route53 is configured to route requests to all available zones until one fails. (Traffic Flow + Geolocation)

The Design Above Is Actually Wasteful (Cost & Capacity must be optimized for HA)

  • This example shows all application deployments (& components) in every AZ deployed with 100% capacity required to run without another AZ
  • This means you could lose ALL, BUT ONE availability zone and still service requests from your one remaining AZ

While this would achieve the highest availability, it is not the most optimal approach because it is wasteful

  • Each application team must decide the right mix of regions, availability zones & capacity to provide HA at the lowest cost for their consumers based on their application requirements
  • Depending on how much capacity you deploy for your application per AZ it is possible to deploy too much/many and have diminishing returns on availability while wasting a lot of $$$
  • You must find the right balance for your application & business contexts ( see below examples with Cassandra for more details on how this approach can be wasteful )

Understanding Availability Zones

The crux of achieving a highly available architecture lies within the proper usage & understanding of Availability Zones. If you ignore everything else, read this section !

What Do AZ’z Provide ?

In Amazon’s own words: 

“Each region contains multiple distinct locations called Availability Zones, or AZs. Each Availability Zone is engineered to be isolated from failures in other Availability Zones, and to provide inexpensive, low-latency network connectivity to other zones in the same region. By launching instances in separate Availability Zones, you can protect your applications from the failure of a single location.” Source: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html

How to think of an AZ

  • In the AWS World it is helpful to use the analogy of an Availability Zone being a Rack in a traditional datacenter
  • I.E. instead of striping a Cassandra cluster across Racks, you would do it across Availability Zones.

What about using an Availability Zone as a complete application failure domain ?

  • In a traditional private datacenter an Availability Zone is a separate datacenter container, with shared network (although it can be isolated as well).
  • It is built/populated with the required infrastructure & capacity to run the applications and it has all the physical characteristics mentioned above that the AWS AZ has
  • However, the way it is used in a private datacenter can be different, because YOU have full visibility & control of capacity
  • For example, you could limit service calls so that they are to/from the same AZ, effectively creating a distinct application failure domain, piggy-backing on the already distinct infrastructure failure domain (provided by an Availability Zone)

Why not use an AWS Availability Zone as a distinct application failure domain then ?

  • You can, but this fails to be robust in the public cloud context because capacity might not always be available in the AZ and you don’t know when it will run out
  • BEING CLEAR – You can run out of capacity in an Availability Zone FOREVER. The AZ will still be available to manage existing infrastructure/capacity, but new capacity cannot be added
  • This would negatively affect auto-scaling & new provisioning, and ultimately implies you cannot rely on a single AZ to be treated as a datacenter container
  • To continue operating when an AZ runs out of capacity in public cloud you have to utilize a new AZ for new capacity
  • That new AZ would follow the same isolated service calls principles except now 2 AZ’s would be called as part of a failure domain
  • Effectively multiple AZ’s would become a single failure domain organically
  • Therefore, it is not advantageous in the public cloud world to try to treat a single AZ as a failure domain
  • Instead stripe applications across as many AZ’s as necessary to provide up to N+2 capacity per Region while minimizing waste.
  • The region becomes the failure domain of your application and this is a more efficient use of AZs to provide availability in the public cloud context
  • This is why N+N regions is paramount in managing your availability in public cloud

What is the latency between Availability Zones ?

  • Tests with micro instances (they have the worst network profile) show that the latency between AZ's within a region is on average 1ms
  • Most workloads will have no challenge dealing with 1ms latency and taking full advantage of availability zones to augment their availability
  • However, some applications are very chatty, making hundreds to thousands of calls in series
  • These workloads often must be re-architected to work well in a public cloud environment

What happens if I do not use multiple Availability Zones ?

Truths…

  • An Availability Zone can become unavailable at any time, for any reason
  • An Availability Zone can become unavailable due to planned maintenance or an outage
  • An Availability Zone can run out of capacity for long periods of time and indefinitely (forever)

Thus the result of not deploying to multiple availability zones is…

  • Provisioning outages for new capacity and the inability to self-heal deployments

The reality is… 

  • If you only deploy to a single AZ in a Region, you guarantee that you will experience a 100% failure in that Region, when the AZ you rely on becomes unavailable
  • You can failover or continue (if Active/Active) to service transactions in another Region, but you can achieve higher availability per Region by striping your application deployment across multiple AZ’s in a region
  • And you have less latency to deal with when failing over to another AZ locally vs. an AZ in a remote region

Finding The Right Balance Between High Availability & Cost

Oftentimes availability decisions come with a cost. This section will describe a more practical way to achieve highly available infrastructure for your application while keeping the costs within reason. It is important to note that cost optimization heavily depends on application context, because you must first figure out the lowest common denominator of AZ's / Regions you can use and still achieve your target availability. This typically depends on the number of AZ's required to achieve quorum for stateful services. For example, if we were configuring Cassandra for eventual consistency + high availability, we would want 5 nodes in the cluster and a replication factor of 3. We would want, to the best of our ability, to spread those nodes across AZ's.
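To make the "spread across AZ's" part concrete, a minimal sketch (the keyspace name is made up): with Cassandra's Ec2Snitch, the AWS region is treated as the data center and each Availability Zone as a rack, so a NetworkTopologyStrategy keyspace with a replication factor of 3 will place replicas on distinct racks (AZ's) where possible:

cqlsh -e "CREATE KEYSPACE tuxlabs WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};"

(With Ec2Snitch the data center name for us-east-1 typically shows up as 'us-east'; check 'nodetool status' to confirm the name before creating the keyspace.)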

100% of application capacity deployed to every Availability Zone is bad

  • The traffic routing example in the beginning shows 100% capacity being deployed for every application component in every AZ
  • As covered previously and shown in the later sections, this is a sub-optimal deployment strategy.
  • Instead you must stripe your applications capacity across AZ’s, keeping in mind a target of at least N+1 capacity or N+2 where possible

What if I deploy 100% of application capacity to only two Availability Zones per Region?

  • Let’s assume we are talking about MySQL or another RDBMs with ACID properties.
  • You might want to deploy MySQL as a master/master where each master is in a separate AZ and can support 100% of the required capacity
  • Given this configuration you have met the N+1 requirement per Region
  • This might be acceptable to you from the availability perspective, however keep in mind you do not control when those AZ’s become unavailable or worse run out of new capacity.
  • Always make sure an application like this is able to be redeployed through automation (i.e. self-healing) in another availability zone

Stateless/Stateful Applications & Capacity Define the number of AZ's per Region you need

  • Stateless applications are much easier to provide high availability for. You simply maximize redundant instances across AZ’s within reason (cost being the reason/limiting factor)
  • So effectively with stateless applications you tend to deploy to as many AZ’s as you have available
  • But what should determine the number of AZ’s you utilize per region is your stateful applications requirements for quorum or availability and how you choose to manage capacity

Examples of balancing High Availability & Cost using Apache Cassandra in AWS

This Cassandra calculator is a useful reference for this section
http://www.ecyrd.com/cassandracalculator/

Cassandra N+1 ( 4 AZ’s ) – Wasteful Version

The following assumes Cassandra is configured for eventual consistency with the following parameters:

  • Cluster Size=5, Replication Factor=3, Write/Read=1
  • Region=us-east-1, Available AZ’s=4

Diagram Explanation

  • In this configuration you can lose 2 nodes without data availability impact
  • If you lose a 3rd node you would lose access to 60% of your data
  • Your node availability is N+2. HOWEVER, because you have only 4 AZ’s in us-east-1 that can be used, 2 nodes are required to be in 1 AZ
  • The number of available AZ's in the region reduces your overall availability as a whole to N+1 in the event of 'us-east-1e' becoming unavailable
  • Thus the 5th node is completely unnecessary and wasteful ! 

[Diagram: Cassandra N+1 (4 AZ's) – wasteful configuration]

Cassandra N+1 Cost Optimized (4 AZ’s) Version

The following assumes Cassandra is configured for eventual consistency with the following parameters:

  • Set Cluster Size=4, Replication Factor=2
  • Region=us-east-1, Available AZ’s=4

Diagram Explanation

  • You can now lose 1 node or one availability zone and still service 100% of requests
  • Losing a 2nd node results in a 50% impact to data accessibility

[Diagram: Cassandra N+1 cost-optimized (4 AZ's) configuration]

Cassandra N+2 Cost Optimized (4 AZ’s) Version

  • Set Cluster Size=4, Replication Factor=3
  • Region=us-east-1, Available AZ’s=4

Diagram Explanation

  • In order to optimize for cost and achieve N+2 you must have at least 4 AZs available
  • Otherwise, 100% of data per node is required to achieve N+2 with only 3 AZ’s
  • A better scenario is having 4 AZ’s or more where you can play with Replication Factor (aka data availability thus capacity needed to run the service)
  • To achieve cost optimized N+2 with 4 AZ’s we will set cluster size to 4 and Replication Factor to 3 resulting in this diagram/outcome

[Diagram: Cassandra N+2 cost-optimized (4 AZ's) configuration]

This achieves the following

  • You can lose any 2 nodes or AZ’s above, and continue to run. If you lose a 3rd node you would have 75% impact to your data availability and need to fail/markdown the region for another region to takeover.
  • Also if you have a 4 node cluster with a replication factor=4 you have 100% of data on each node…this is not cost optimized for availability, but allows you to lose 3/4 nodes.
  • However, N+3 is considered too many eggs in one region and thus is not the right approach.
  • Instead if you architect for N+2 in each region + have global traffic management routing to active/active deployments, you will maximize your availability.

Cassandra With Only 2 Availability Zones

  • us-east-1 has 4 AZ's; this is actually a high number of available AZ's for an AWS region. In regions like us-west-2 and eu-central-1 you only have 3 and 2 AZ's, respectively.
  • To clearly show the risk of ignoring availability zones & capacity while architecting for availability let’s look at the extreme case of deploying in eu-central-1 changing nothing else
  • Again we use a Cluster Size=5,Replication Factor=3 as our initial example
  • eu-central-1 has only 2 AZ’s

[Diagram: Cassandra with only 2 AZ's (eu-central-1), Cluster Size=5, Replication Factor=3]

  • As you can see if you lose eu-central-1a you lose 3 nodes at one time and 60% of your data becomes inaccessible.
  • If you lose eu-central-1b you are ok because you have > 100% of the required data available provided by your remaining nodes, but you cannot lose another node.
  • You cannot and should not ever deploy a service this way. Instead you need to tune the Cassandra cluster for the number of AZ’s.
  • Setting Cluster Size=2 and Replication Factor=2 for eu-central-1 allows an N+1 design
  • Unfortunately, cost optimization is not possible with only 2 AZ’s

[Diagram: Cassandra with 2 AZ's (eu-central-1), Cluster Size=2, Replication Factor=2 – N+1]

  • You can lose either node or AZ, and still have access to 100% of your data. You must lose access to the region or both AZs/nodes to have a service impact
  • You cannot achieve true N+2 with only 2 AZ’s in a region (You can achieve it via node redundancy only)
  • What I recommend in this case is have multi-region deployments (kudos if they happen to be in different public clouds), but within AWS deploy to eu-central-1 + eu-west-1 using N+1 deployments & global traffic management to achieve higher availability than a single region deployment would allow.

I hope this lengthy overview helps someone somewhere struggling with how to deploy applications to Availability Zones in AWS.
Feel free to contact me with any questions or corrections, thanks for reading !
