{"id":490,"date":"2017-03-01T21:24:09","date_gmt":"2017-03-01T21:24:09","guid":{"rendered":"http:\/\/tuxlabs.com\/?p=490"},"modified":"2017-03-03T07:32:05","modified_gmt":"2017-03-03T07:32:05","slug":"how-to-interact-with-aws-s3-using-the-go-sdk-and-not-lose-your-mind","status":"publish","type":"post","link":"https:\/\/tuxlabs.com\/?p=490","title":{"rendered":"How To: Interact with AWS S3 Using the Go SDK and not lose your mind"},"content":{"rendered":"<p><a href=\"http:\/\/tuxlabs.com\/wp-content\/uploads\/2017\/03\/aws-sdk-go-golang.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-505\" src=\"http:\/\/tuxlabs.com\/wp-content\/uploads\/2017\/03\/aws-sdk-go-golang.jpg\" width=\"600\" height=\"392\" \/><\/a><\/p>\n<p><em><strong>After these messages we will carry on with our regularly scheduled programming&#8230;<\/strong><\/em><\/p>\n<p><em>Yesterday (while I was writing this article) <a href=\"https:\/\/techcrunch.com\/2017\/02\/28\/amazon-aws-s3-outage-is-breaking-things-for-a-lot-of-websites-and-apps\/\">AWS suffered one of its worst outages in history in the us-east-1 region<\/a>. A reminder to us all to <strong>be multi-region and more importantly multi-cloud<\/strong>. Please see my other articles on <a href=\"http:\/\/tuxlabs.com\/?p=380\">HA deployments using AWS<\/a> and my perspective &amp; caution on <a href=\"http:\/\/tuxlabs.com\/?p=430\">the path to centralization or singularity<\/a> we appear to be on (though the outage may help people wake up).<\/em><\/p>\n<p><em><strong>Now back to your regularly scheduled program&#8230;<\/strong><\/em><\/p>\n<hr \/>\n<p>My team and I are building a CMDB for AWS, which provides us with everything happening in our AWS environment + OS level metadata + change history. There will be a separate article on the CMDB journey, but today I want to focus on a specific service in AWS called S3, which is their object store. 
S3 is a bit of a special snowflake among AWS services, and because of that I ran into challenges structuring my code: up until S3 (which was the last service I wrote code for), every service had been\u00a0very similar and easily modularized. We will get to more detail, but let&#8217;s start this article by covering how to use the Go SDK for AWS.<\/p>\n<h3>Dependencies<\/h3>\n<p>This article assumes you already program in Go and have Go installed on your machine.\u00a0To get started you will need a couple of additional items.<\/p>\n<ol>\n<li>Download and install the SDK here: <a href=\"https:\/\/github.com\/aws\/aws-sdk-go\">https:\/\/github.com\/aws\/aws-sdk-go<\/a><\/li>\n<li>This is the documentation for the SDK. You will need it, so bookmark it: <a href=\"http:\/\/docs.aws.amazon.com\/sdk-for-go\/api\/\">http:\/\/docs.aws.amazon.com\/sdk-for-go\/api\/<\/a><\/li>\n<li>It is extremely helpful when working with the APIs to have aws-shell installed:\u00a0<a href=\"https:\/\/github.com\/awslabs\/aws-shell\">https:\/\/github.com\/awslabs\/aws-shell<\/a>\n<ul>\n<li>This enables you to interact with AWS APIs on the fly, so you can inspect the output of commands while you work out which calls you need.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3>The Collector Structure<\/h3>\n<p>The collector is the component in my CMDB architecture that does all the work of collecting the metadata that we shove into our CMDB. The collector is heavily concurrent, using goroutines for performance. 
The basic structure looks like this.<\/p>\n<ul>\n<li>Launch a goroutine for each service you want to collect\n<ul>\n<li>\/\/pass in all accounts, regions (from config-file) and pre-established awsSessions to each account you are collecting<\/li>\n<li>Inside of a service&#8217;s goroutine, loop over accounts &amp; regions\n<ul>\n<li>Launch a goroutine for each account &amp; region\n<ul>\n<li>Inside of those goroutines make your AWS API call(s), for example DescribeInstances<\/li>\n<li>Store the response (I loop through mine and store it in a map using the resource-id as the key)<\/li>\n<li>Finally, kick off another goroutine to write to our API and store the data.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>OK, so hopefully that seems straightforward as a basic structure&#8230;let&#8217;s get to why S3 threw me for a loop.<\/p>\n<h3>S3 Challenges<\/h3>\n<p>It will be best if I show you what I tried first. Basically, I tried to marry my existing pattern to S3, and that was a bad idea from the start. Here was the structure of the S3 part of the code.<\/p>\n<ul>\n<li>The S3 goroutine gets called from main.go<\/li>\n<li>\/\/all accounts, regions and AWS Sessions are passed into the next goroutine\n<ul>\n<li>Inside of the S3 goroutine, loop over accounts &amp; regions\n<ul>\n<li>Launch a goroutine for each account &amp; region\n<ul>\n<li>Inside of those goroutines, list S3 buckets\n<ul>\n<li>For each S3 bucket returned\n<ul>\n<li>Call additional APIs such as GetBucketTagging()<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>OK, so what happened? 
I got a lot of errors, that&#8217;s what \ud83d\ude42 Errors like this one&#8230;<\/p>\n<pre class=\"nums:false lang:default decode:true \">BucketRegionError: incorrect region, the bucket is not in 'us-west-2' region\r\nstatus code: 301, request id:<\/pre>\n<blockquote><p>At first, I thought maybe my code wasn&#8217;t thread safe&#8230;but that didn&#8217;t make much sense given the other services had no issues like this.<\/p><\/blockquote>\n<p>So as I debugged my code, I began to realize <strong>the bucket list I was getting wasn&#8217;t limited to the region I was passing in and establishing a session for<\/strong>.<\/p>\n<blockquote><p>Naturally, I googled &#8220;can I list buckets for a single region?&#8221;<\/p><\/blockquote>\n<p class=\"p1\"><span class=\"s1\"><a href=\"https:\/\/github.com\/aws\/aws-sdk-java\/issues\/920\">https:\/\/github.com\/aws\/aws-sdk-java\/issues\/920<\/a><\/span><span class=\"s2\">\u00a0(even though this is the Java SDK, it still applies).<\/span><\/p>\n<pre class=\"nums:false lang:default decode:true\">\"spfink commented on Nov 16, 2016\r\nIt is not possible to list the buckets in a single region. Regardless of the endpoint or region that you set, when calling list buckets you will get buckets from all regions.\r\n\r\nIn order to determine the region of a bucket you can use getBucketLocation(String bucketName).\r\n\r\nhttps:\/\/github.com\/aws\/aws-sdk-java\/blob\/master\/aws-java-sdk-s3\/src\/main\/java\/com\/amazonaws\/services\/s3\/AmazonS3.java#L1026\"<\/pre>\n<p>Ah, OK. The bucket list returned on an AWS session established with a specific account and region <strong>ignores the region, because S3 buckets are global to an account<\/strong>; thus all buckets under an account are returned by the ListBuckets() call. 
I knew S3 buckets were global per account, but <strong>I failed to anticipate that the output would stay global even when a specific region is passed into the SDK\/API<\/strong>.<\/p>\n<blockquote><p>OK, so how then can I distinguish where a bucket actually lives?<\/p><\/blockquote>\n<p>As spfink says above, I needed to run <strong>GetBucketLocation() per bucket<\/strong>. Thus my code structure started to look like this&#8230;<\/p>\n<ul>\n<li>For each account, region\n<ul>\n<li>ListBuckets\n<ul>\n<li>For each bucket returned in that account, region\n<ul>\n<li>GetBucketLocation<\/li>\n<li>If a LocationConstraint (region) is returned, set the new region (otherwise if response is null, do nothing)<\/li>\n<li>Get tags for the bucket in account, region<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>With this code\u00a0I was still getting errors about the region, but why?<\/p>\n<p>Well, I made the mistake of thinking a &#8216;null&#8217; response from the API for LocationConstraint had no meaning (or meant the bucket could be queried from any region). <strong>Wrong<\/strong>: null actually means us-east-1 (see what my googling turned up below). Because GetBucketLocation() returned null for those buckets, the IF condition evaluated to false and the existing region from the outer loop was\u00a0used, which resulted in many errors.<\/p>\n<blockquote><p>Here&#8217;s what the search turned up&#8230;<\/p><\/blockquote>\n<p><a href=\"https:\/\/github.com\/aws\/aws-cli\/issues\/564\">https:\/\/github.com\/aws\/aws-cli\/issues\/564<\/a><\/p>\n<pre class=\"nums:false lang:default decode:true \">\"kyleknap commented on Mar 16, 2015\r\n@xurume\r\n\r\nFor buckets located in US Standard, the location constraint will be null. For S3, here is the list of region names with their corresponding regions: http:\/\/docs.aws.amazon.com\/general\/latest\/gr\/rande.html#s3_region. Notice that the location constraint is none for US Standard.\r\n\r\nThe CLI uses the values in the region column for the --region parameter. 
So for S3's US Standard, you need to use us-east-1 as the region.\"<\/pre>\n<p>So let&#8217;s clarify my mistakes&#8230;<\/p>\n<ol>\n<li>The S3 ListBuckets call returns all buckets under an account globally.\n<ul>\n<li>It does not abide by a region configured in an API session.<\/li>\n<li>Thus I\/you\u00a0<strong>should not loop over regions<\/strong> from a config file for the S3 service.<\/li>\n<li>Instead I\/you need to find a bucket&#8217;s &#8216;real&#8217; location using GetBucketLocation.<\/li>\n<li>Then set the region for actions\u00a0other than ListBuckets (which is global per account and ignores the region passed).<\/li>\n<\/ul>\n<\/li>\n<li>GetBucketLocation returning null doesn&#8217;t mean the bucket is global or that you can interact with\u00a0the bucket from any endpoint\u00a0you please&#8230;it actually means us-east-1:\u00a0<a href=\"http:\/\/docs.aws.amazon.com\/general\/latest\/gr\/rande.html#s3_region\">http:\/\/docs.aws.amazon.com\/general\/latest\/gr\/rande.html#s3_region<\/a><\/li>\n<\/ol>\n<h3>The Working Code<\/h3>\n<p>So in the end the working code for S3 looks like this&#8230;<\/p>\n<ul>\n<li>collector\/main.go fires off a goroutine for each service we are collecting.<\/li>\n<li>It passes in accounts and regions from a config file.<\/li>\n<li>For the S3 service\/file under the &#8216;services&#8217; package, the entry point is a function called StoreS3Resources.<\/li>\n<\/ul>\n<p>Everything in the code should be self-explanatory from that point on. You will note a function call to &#8216;writeToCis&#8217;&#8230; CIS\u00a0is the name of our internal CMDB project\/service. Again, I will later be blogging about the entire system in detail once we open source the code. 
Please keep in mind this code is an MVP. It will change a lot (optimization, modularization, bug fixes, etc.) before &amp; after we open source it, but for now here is the quick and dirty, but hopefully functional, code \ud83d\ude42 Use at your own risk!<\/p>\n<pre class=\"lang:default decode:true \">package services\r\n\r\nimport (\r\n\t\"github.com\/aws\/aws-sdk-go\/service\/s3\"\r\n\t\"github.com\/aws\/aws-sdk-go\/aws\/session\"\r\n\t\"github.com\/aws\/aws-sdk-go\/aws\"\r\n\t\"sync\"\r\n\t\"fmt\"\r\n\t\"time\"\r\n\t\"encoding\/json\"\r\n\t\"strings\"\r\n)\r\n\r\nvar wgS3BucketList sync.WaitGroup\r\nvar wgS3GetBucketDetails sync.WaitGroup\r\nvar accountRegionsMap = make(map[string]map[string][]string)\r\nvar accountToBuckets = make(map[string][]string)\r\nvar bucketToAccount = make(map[string]string)\r\nvar defaultRegion string = \"us-east-1\"\r\n\r\nfunc writeS3ResourceToCis(resType string, resourceData map[string]interface{}, account string, region string){\r\n\tb, err := json.Marshal(resourceData)\r\n\tcheck(err)\r\n\r\n\terr, status, url := writeToCisBulk(resType, region, b)\r\n\tcheck(err)\r\n\t\/\/len(b) is the number of bytes in the payload; cap(b) would report the slice's capacity\r\n\tfmt.Printf(\"%s - %s - %s - %s - Bytes: %d\\n\", status, url, account, region, len(b))\r\n}\r\n\r\nfunc StoreS3Resources(awsSessions map[string]*session.Session, accounts []string, configuredRegions []string) {\r\n\ts3Start := time.Now()\r\n\r\n\twgS3BucketList.Add(1)\r\n\tgo func () {\r\n\t\tdefer wgS3BucketList.Done()\r\n\t\tfor _, account := range accounts {\r\n\t\t\tawsSession := awsSessions[account]\r\n\t\t\tgetS3AccountBucketList(awsSession, account)\r\n\t\t}\r\n\t}()\r\n\twgS3BucketList.Wait()\r\n\r\n\tgetS3BucketDetails(awsSessions, configuredRegions)\r\n\r\n\ts3Elapsed := time.Since(s3Start)\r\n\tfmt.Printf(\"S3 completed in: %s\\n\", s3Elapsed)\r\n}\r\n\r\nfunc getS3AccountBucketList(awsSession *session.Session, account string) {\r\n\tsvcS3 := s3.New(awsSession, &amp;aws.Config{Region: aws.String(defaultRegion)})\r\n\r\n\t\/\/list returned is for all buckets in 
an account ( no regard for region )\r\n\tresp, err := svcS3.ListBuckets(nil)\r\n\tcheck(err)\r\n\r\n\tvar buckets []string\r\n\r\n\tfor _,bucket := range resp.Buckets {\r\n\t\tbuckets = append(buckets, *bucket.Name)\r\n\r\n\t\t\/\/reverse mapping needed for lookups in other funcs\r\n\t\tbucketToAccount[*bucket.Name] = account\r\n\t}\r\n\r\n\t\/\/a list of buckets per account\r\n\taccountToBuckets[account] = buckets\r\n}\r\n\r\n\/\/guards the shared maps, which are written from many getS3BucketLocation goroutines at once\r\nvar s3MapsMutex sync.Mutex\r\n\r\nfunc getS3BucketLocation(awsSession *session.Session, bucket string, bucketToRegion map[string]string, regionToBuckets map[string][]string)  {\r\n\twgS3GetBucketDetails.Add(1)\r\n\tgo func() {\r\n\t\tdefer wgS3GetBucketDetails.Done()\r\n\t\tsvcS3 := s3.New(awsSession, &amp;aws.Config{Region: aws.String(defaultRegion)}) \/\/ default\r\n\r\n\t\tvar requiredRegion string\r\n\r\n\t\tlocationParams := &amp;s3.GetBucketLocationInput{\r\n\t\t\tBucket: aws.String(bucket),\r\n\t\t}\r\n\t\trespLocation, err := svcS3.GetBucketLocation(locationParams)\r\n\t\tcheck(err)\r\n\r\n\t\t\/\/We must query the bucket based on the location constraint\r\n\t\tif strings.Contains(respLocation.String(), \"LocationConstraint\") {\r\n\t\t\trequiredRegion = *respLocation.LocationConstraint\r\n\t\t} else {\r\n\t\t\t\/\/if getBucketLocation is null us-east-1 used\r\n\t\t\t\/\/http:\/\/docs.aws.amazon.com\/general\/latest\/gr\/rande.html#s3_region\r\n\t\t\trequiredRegion = \"us-east-1\"\r\n\t\t}\r\n\r\n\t\t\/\/Go maps are not safe for concurrent writes, so serialize the updates\r\n\t\ts3MapsMutex.Lock()\r\n\t\tbucketToRegion[bucket] = requiredRegion\r\n\t\tregionToBuckets[requiredRegion] = append(regionToBuckets[requiredRegion], bucket)\r\n\t\taccountRegionsMap[bucketToAccount[bucket]] = regionToBuckets\r\n\t\ts3MapsMutex.Unlock()\r\n\t}()\r\n}\r\n\r\nfunc getS3BucketsTags(awsSession *session.Session, buckets []string, account string, region string) {\r\n\twgS3GetBucketDetails.Add(1)\r\n\tgo func() {\r\n\t\tdefer wgS3GetBucketDetails.Done()\r\n\t\tsvcS3 := s3.New(awsSession, &amp;aws.Config{Region: aws.String(region)})\r\n\r\n\t\tvar resourceData = make(map[string]interface{})\r\n\r\n\t\tfor _, bucket 
:= range buckets {\r\n\t\t\ttaggingParams := &amp;s3.GetBucketTaggingInput{\r\n\t\t\t\tBucket: aws.String(bucket),\r\n\t\t\t}\r\n\t\t\trespTags, err := svcS3.GetBucketTagging(taggingParams)\r\n\t\t\tcheck(err)\r\n\r\n\t\t\tresourceData[bucket] = respTags\r\n\t\t}\r\n\t\twriteS3ResourceToCis(\"buckets\", resourceData, account, region)\r\n\t}()\r\n}\r\n\r\n\r\nfunc getS3BucketDetails(awsSessions map[string]*session.Session, configuredRegions []string) {\r\n\r\n\tfor account, buckets := range accountToBuckets {\r\n\t\t\/\/reset regions for each account\r\n\t\tvar bucketToRegion = make(map[string]string)\r\n\t\tvar regionToBuckets = make(map[string][]string)\r\n\t\tfor _,bucket := range buckets {\r\n\t\t\tawsSession := awsSessions[account]\r\n\t\t\tgetS3BucketLocation(awsSession, bucket, bucketToRegion, regionToBuckets)\r\n\t\t}\r\n\t}\r\n\twgS3GetBucketDetails.Wait()\r\n\r\n\t\/\/Preparing configured regions to make sure we only write to CIS for regions configured\r\n\tvar configuredRegionsMap = make(map[string]bool)\r\n\tfor _,region := range configuredRegions {\r\n\t\tconfiguredRegionsMap[region] = true\r\n\t}\r\n\r\n\tfor account := range accountRegionsMap {\r\n\t\tawsSession := awsSessions[account]\r\n\t\tfor region, buckets := range accountRegionsMap[account] {\r\n\t\t\t\/\/Only proceed if it's a configuredRegion from the config file.\r\n\t\t\tif _, ok := configuredRegionsMap[region]; ok {\r\n\t\t\t\tfmt.Printf(\"%s %s has %d buckets\\n\", account, region, len(buckets))\r\n\t\t\t\tgetS3BucketsTags(awsSession, buckets, account, region)\r\n\t\t\t} else {\r\n\t\t\t\tfmt.Printf(\"Skipping buckets in %s because is not a configured region\\n\", region)\r\n\t\t\t}\r\n\t\t}\r\n\t}\r\n\twgS3GetBucketDetails.Wait()\r\n}\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<a href=\"https:\/\/tuxlabs.com\/?p=490\" rel=\"bookmark\" title=\"Permalink to How To: Interact with AWS S3 Using the Go SDK and not lose your mind\"><p>After these messages we will carry on with our 
regularly scheduled programming&#8230; Yesterday ( during the scribbling of this article ) AWS suffered one of it&#8217;s worst outages in history in the us-east-1 region. A reminder to us all to be multi-region and more importantly multi-cloud. Please see my other articles on HA deployments using [&hellip;]<\/p>\n<\/a>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[131,142,1,8,12],"tags":[170,23,132,155,167,166,169,165,168],"class_list":{"0":"post-490","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-aws","7":"category-go","8":"category-howtos","9":"category-programming","10":"category-systems-administration","11":"tag-amazon","12":"tag-aws","13":"tag-cloud","14":"tag-cmdb","15":"tag-go","16":"tag-golang","17":"tag-outage","18":"tag-s3","19":"tag-sdk","20":"h-entry","21":"hentry"},"_links":{"self":[{"href":"https:\/\/tuxlabs.com\/index.php?rest_route=\/wp\/v2\/posts\/490","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tuxlabs.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tuxlabs.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tuxlabs.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tuxlabs.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=490"}],"version-history":[{"count":22,"href":"https:\/\/tuxlabs.com\/index.php?rest_route=\/wp\/v2\/posts\/490\/revisions"}],"predecessor-version":[{"id":623,"href":"https:\/\/tuxlabs.com\/index.php?rest_route=\/wp\/v2\/posts\/490\/revisions\/623"}],"wp:attachment":[{"href":"https:\/\/tuxlabs.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=490"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tuxlabs.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=490"},{"taxonomy":"post_tag","embeddabl
e":true,"href":"https:\/\/tuxlabs.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=490"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}