TuxLabs LLC

All things DevOps


How To: Interact with AWS S3 Using the Go SDK and not lose your mind

Published / by tuxninja

After these messages we will carry on with our regularly scheduled programming…

Yesterday (during the writing of this article) AWS suffered one of its worst outages in history in the us-east-1 region. A reminder to us all to be multi-region and, more importantly, multi-cloud. Please see my other articles on HA deployments using AWS and my perspective and caution on the path to centralization, or singularity, we appear to be on (though the outage may help people wake up).

Now back to your regularly scheduled program…

My team and I are building a CMDB for AWS, which provides us with everything happening in our AWS environment, plus OS-level metadata and change history. There will be a separate article on the CMDB journey, but today I want to focus on a specific AWS service: S3, their object store. S3 is a bit of a special snowflake among AWS services, and because of that I ran into challenges structuring my code. Up until S3 (the last service I wrote code for), every service had been very similar and easily modularized. We will get to more detail, but let’s start by covering how to use the Go SDK for AWS.


This article assumes you already program in Go and have Go installed on your machine. To get started you will need a couple of additional items.

  1. Download and install the SDK: https://github.com/aws/aws-sdk-go
  2. Bookmark the SDK documentation, you will need it: http://docs.aws.amazon.com/sdk-for-go/api/
  3. It is extremely helpful when working with the APIs to have aws-shell installed: https://github.com/awslabs/aws-shell
    • This lets you interact with the AWS APIs on the fly, so you can see the output of a command while you are still working out which call you need.

The Collector Structure

The collector is the component in my CMDB architecture that does all the work of collecting the metadata that we shove into our CMDB. The collector is heavily concurrent, using goroutines for performance. The basic structure looks like this.

  • Call a goroutine for each service you want to collect
    • Pass in all accounts, regions (from the config file) and pre-established AWS sessions for each account you are collecting
    • Inside a service’s goroutine, loop over accounts & regions
      • Launch a goroutine for each account & region
        • Inside those goroutines, make your AWS API call(s), for example DescribeInstances
        • Store the response (I loop through mine and store it in a map using the resource ID as the key)
        • Finally, kick off another goroutine to write to our API and store the data.
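The inner part of that structure can be sketched with nothing but the standard library. This is a minimal illustration, not the collector's actual code: the `resource` type, `fetchFunc`, `collect` and `fakeFetch` are hypothetical names, and `fakeFetch` stands in for a real SDK call such as DescribeInstances. The final "write to our API" step is left out.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// resource is a hypothetical, simplified record for one AWS resource.
type resource struct {
	ID      string
	Account string
	Region  string
}

// fetchFunc stands in for a real AWS API call such as DescribeInstances.
type fetchFunc func(account, region string) []resource

// collect launches one goroutine per account/region pair, stores every
// returned resource in a map keyed by resource ID, and returns the map
// once all goroutines have finished.
func collect(accounts, regions []string, fetch fetchFunc) map[string]resource {
	var (
		mu    sync.Mutex // guards found, which is written from many goroutines
		wg    sync.WaitGroup
		found = make(map[string]resource)
	)
	for _, account := range accounts {
		for _, region := range regions {
			wg.Add(1)
			go func(account, region string) {
				defer wg.Done()
				for _, r := range fetch(account, region) {
					mu.Lock()
					found[r.ID] = r
					mu.Unlock()
				}
			}(account, region)
		}
	}
	wg.Wait()
	return found
}

// fakeFetch returns one made-up resource per account/region pair.
func fakeFetch(account, region string) []resource {
	id := "i-" + account + "-" + region
	return []resource{{ID: id, Account: account, Region: region}}
}

func main() {
	found := collect([]string{"dev", "prod"}, []string{"us-east-1", "us-west-2"}, fakeFetch)
	ids := make([]string, 0, len(found))
	for id := range found {
		ids = append(ids, id)
	}
	sort.Strings(ids) // map iteration order is random, so sort for stable output
	for _, id := range ids {
		fmt.Println(id)
	}
}
```

The mutex matters here: Go maps are not safe for concurrent writes, so any shared map filled from several goroutines needs a lock (or a channel feeding a single writer).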

OK, so hopefully that seems straightforward as a basic structure… let’s get to why S3 threw me for a loop.

S3 Challenges

It is best if I show you what I tried first. Basically, I tried to marry my existing pattern to S3, and that was a bad idea from the start. Here was the structure of the S3 part of the code.

  • The S3 goroutine gets called from main.go
  • All accounts, regions and AWS sessions are passed into the next goroutine
    • Inside the S3 goroutine, loop over accounts & regions
      • Launch a goroutine for each account & region
        • Inside those goroutines, list S3 buckets
          • For each S3 bucket returned
            • Call additional APIs such as GetBucketTagging()

OK, so what happened? I got a lot of errors, that’s what 🙂 Ones like this…

At first, I thought maybe my code wasn’t thread safe, but that didn’t make much sense given the other services had no issues like this.

As I debugged my code, I began to realize that the bucket list I was getting wasn’t limited to the region I was passing in and establishing a session for.

Naturally, I googled “can I list buckets for a single region?”

https://github.com/aws/aws-sdk-java/issues/920 (this is the Java SDK, but it still applies).

Ah, OK. The bucket list returned on an AWS session established with a specific account and region ignores the region. S3 buckets are global to an account, so all buckets under the account are returned by the ListBuckets() call. I knew S3 buckets were global per account, but I did not expect the output to stay global when a specific region is passed into the SDK/API.

Ok so how then can I distinguish where a bucket actually lives?

As spfink says above, I needed to run GetBucketLocation() per bucket. Thus my code structure started to look like this…

  • For each account, region
    • ListBuckets
      • For each bucket returned in that account, region
        • GetBucketLocation
        • If a LocationConstraint (region) is returned, set the new region (otherwise if response is null, do nothing)
        • Get tags for the bucket in account, region

With this code I was still getting errors about region, but why?

Well, I made the mistake of thinking a ‘null’ LocationConstraint in the API response had no meaning (or meant the bucket could be queried from any region). Wrong: null actually means us-east-1 (see what Google turned up below). Because GetBucketLocation() returned null, the IF condition evaluated false and the region from the outer loop was used, which resulted in many errors.
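A small helper makes that rule hard to forget. This is a minimal sketch under one assumption: the LocationConstraint arrives as a `*string` that is nil (or empty) for us-east-1. The name `regionFromLocationConstraint` is made up for illustration.

```go
package main

import "fmt"

// defaultS3Region is what a null/empty LocationConstraint stands for.
const defaultS3Region = "us-east-1"

// regionFromLocationConstraint turns the LocationConstraint returned for a
// bucket into a usable region name. A nil or empty value does NOT mean
// "any region"; it means us-east-1.
func regionFromLocationConstraint(lc *string) string {
	if lc == nil || *lc == "" {
		return defaultS3Region
	}
	return *lc
}

func main() {
	west := "us-west-2"
	fmt.Println(regionFromLocationConstraint(nil))   // us-east-1
	fmt.Println(regionFromLocationConstraint(&west)) // us-west-2
}
```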

Here’s what Google turned up…


So let’s clarify my mistakes…

  1. The S3 ListBuckets call returns all buckets under an account globally.
    • It does not abide by the region configured in an API session.
    • Thus I/you should not loop over regions from a config file for the S3 service.
    • Instead I/you need to find a bucket’s ‘real’ location using GetBucketLocation.
    • Then set the region for actions other than ListBuckets (which is global per account and ignores the region passed).
  2. GetBucketLocation returning null doesn’t mean the bucket is global or that you can interact with the bucket from any endpoint you please… it actually means us-east-1: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
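Put together, the corrected flow loops over accounts only and takes each bucket's region from the bucket itself. The sketch below is an illustration of that shape, not the collector's real code and not the SDK's API: `bucketAPI`, `bucketInfo`, `collectBuckets` and `fakeS3` are hypothetical, and the three interface methods merely stand in for ListBuckets, GetBucketLocation and GetBucketTagging. `BucketRegion` is assumed to have already mapped a null LocationConstraint to us-east-1, per point 2.

```go
package main

import (
	"errors"
	"fmt"
)

// bucketAPI is a hypothetical stand-in for the S3 calls named in the text.
type bucketAPI interface {
	// ListBuckets returns every bucket in the account, whatever the region.
	ListBuckets(account string) []string
	// BucketRegion returns the bucket's real region (null already mapped
	// to us-east-1).
	BucketRegion(account, bucket string) string
	// BucketTags must be called against the bucket's own region.
	BucketTags(account, region, bucket string) (map[string]string, error)
}

// bucketInfo is what gets stored per bucket.
type bucketInfo struct {
	Account string
	Region  string
	Tags    map[string]string
}

// collectBuckets loops over accounts only. There is no region loop: the
// region for every follow-up call comes from the bucket's own location.
func collectBuckets(api bucketAPI, accounts []string) (map[string]bucketInfo, error) {
	found := make(map[string]bucketInfo)
	for _, account := range accounts {
		for _, name := range api.ListBuckets(account) {
			region := api.BucketRegion(account, name)
			tags, err := api.BucketTags(account, region, name)
			if err != nil {
				return nil, fmt.Errorf("tags for bucket %s: %w", name, err)
			}
			found[name] = bucketInfo{Account: account, Region: region, Tags: tags}
		}
	}
	return found, nil
}

// fakeS3 is a made-up in-memory implementation used to exercise the flow.
type fakeS3 struct {
	buckets []string
	regions map[string]string
}

func (f fakeS3) ListBuckets(account string) []string { return f.buckets }

func (f fakeS3) BucketRegion(account, bucket string) string { return f.regions[bucket] }

// BucketTags fails when asked in the wrong region, mimicking the kind of
// region errors described above.
func (f fakeS3) BucketTags(account, region, bucket string) (map[string]string, error) {
	if region != f.regions[bucket] {
		return nil, errors.New("bucket is in a different region")
	}
	return map[string]string{"owner": "demo"}, nil
}

func main() {
	api := fakeS3{
		buckets: []string{"logs", "assets"},
		regions: map[string]string{"logs": "us-east-1", "assets": "eu-west-1"},
	}
	found, err := collectBuckets(api, []string{"dev"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range api.buckets {
		fmt.Println(name, found[name].Region)
	}
}
```

Hiding the calls behind an interface is a design choice for this sketch only; it lets the flow be exercised without AWS credentials. With the real SDK, the region-specific calls would go through a client or session configured for the bucket's region.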

The Working Code

So in the end the working code for S3 looks like this…

  • collector/main.go fires off a bunch of goroutines per service we are collecting for.
  • It passes in accounts and regions from a config file.
  • For the S3 service/file under the ‘services’ package, the entry point is a function called StoreS3Resources.

Everything in the code should be self-explanatory from that point on. You will note a function call to ‘writeToCis’… CIS is the name of our internal CMDB project/service. Again, I will blog about the entire system in detail once we open source the code. Please keep in mind this code is an MVP and will change a lot (optimization, modularization, bug fixes, etc.) before and after we open source it, but for now here is the quick and dirty, but hopefully functional, code 🙂 Use at your own risk!

Setting up Netflix’s Edda (CMDB) in AWS on Ubuntu

Published / by tuxninja

If you are running any kind of environment with more than 10 servers, then you need a CMDB (Configuration Management DataBase). CMDBs are the brain of your fleet and its environment. You can store anything in a CMDB, but the metadata commonly consists of any of the following: physical & digital asset inventory, software licenses, software configuration data, policy information, relationships (e.g. This VM -> Compute -> Rack -> Availability Zone -> Datacenter), automation metadata, and more. They also commonly provide change history for your environment.

In the world of infrastructure as code, CMDB is king.

CMDBs enable endless automation possibilities. Without them you are stuck gathering the ‘current’ configuration state of your infrastructure every time you want to perform an automated change or run an audit/report. In my career I have built or been a part of CMDB efforts at nearly every company I have worked for. They are simply necessary, and by their nature they tend to force a choice between ‘built by us’ and ‘buy or run’.

However, if you have the luxury of only running in AWS, you are in luck, because Netflix (the AWS poster child) open sourced Edda in 2012 for this purpose!

Rather than talk about the specific features of Edda (refer to the blog post or documentation for those), I want to keep this article short and jump right into setting up Edda, which is a bit tricky because the documentation is out of date!

Setting Up Edda (2016)

First, in AWS you need to set up an EC2 VM that has at least 6G of disk for the OS + dependencies including Mongo, plus however much disk you need to store the metadata for your environment (keep in mind it keeps change history). Personally I just created a root partition with 100G to keep things simple. For instance type I used ‘m4.xlarge’, and the Ubuntu version is 14.04.

After booting the VM, SSH to it and create a directory, on whichever partition your storage is allocated, to store Edda & its dependencies. I will be using /cmdb/ in my example.

Initial Install Steps

For the record, the Edda wiki has the build steps wrong: it appears they are no longer using Gradle and have switched to SBT. Which reminds me, be aware that Edda is written in Scala, which isn’t as popular as Java, Python, etc. In addition it’s functional programming, which I don’t personally know a lot about, but I hear it has quite the learning curve. So beware if you need to make custom code changes; I would not recommend it unless you know Scala! 🙂

After the build of Edda succeeds, install Mongo

That’s it for dependencies

Configuring Mongo

For Edda to use Mongo, all we need to do is ‘use’ the database we want for Edda and create an associated user. (Mongo will auto-create DBs upon insert.)

You can test the user is working by doing… 

Configuring Edda

Under /cmdb/edda/src/main/resources we need to modify ‘edda.properties’ with valid config values for accounts, regions & mongo access.

Relevant Mongo Values

Account & Region Values 

The above example uses one account and only one region. The Edda configuration uses generic labels. They are very flexible, but when using them you might be confused into reading the name of a label as its intent. Don’t fall into that trap. I did, and then I found this post on Google Groups… Check it out to gain more insight into how the configuration works and can be tweaked for your needs. There is also the standard documentation, but it’s a little light IMO.

Running Edda

Congrats, you made it, time to run Edda! Again the documentation has this wrong (listed as Gradle & Jetty)… instead we’re using SBT + Jetty…

If everything goes smoothly you will start to see logs about crawling AWS APIs spewing to your screen 🙂 After about 2 minutes you should see data. You can check by doing a curl.

This API URL should return a JSON object with instance IDs for the account & region specified.

Additionally, Edda is listening on whatever private IP address you have set up; you will just need to modify the default security group to allow port 8080 for your machine.

I get a bit frustrated with out-of-date documentation, so I hope this helps! Happy automating!