Cross Region MongoDB Across A Slow Network (Napster) Bad, AWS Snapshots (Metallica) Good!
I recently found myself in a bit of a pickle. My team and I had deployed a three-node MongoDB replica set with two nodes in us-east-1 and one node in us-west-2, aiming to maximize availability while minimizing cost. Ultimately, there were two problems with this approach. First, for reasons mostly outside of our control, the rest of our application stack above the database was deployed in us-east-1, drastically reducing any availability benefit the tertiary node in us-west-2 was buying us. Second, although we did not know it when we made this choice, our replication traffic was crossing a VPN with very limited bandwidth that frequently suffered network partitions due to maintenance and a lack of redundancy. Our MongoDB cluster was failing over frequently because it kept losing contact with its members, and when it did, it had a difficult time recovering because replication couldn't catch up across the VPN.
After restarting Mongo several times, including wiping the data directory and starting over fresh, it became clear that an initial sync over the VPN was going to take days, and we could not afford to wait that long. We needed to restore cluster health ASAP so we could move all the nodes to us-east-1 and get off the VPN link that was causing so much pain.
Now, the system I am referring to is production: it cannot lose data, and it cannot take downtime or a maintenance window. Given these constraints, I started googling for ways to catch up a MongoDB node when it will not catch up on its own. I tried a few of the suggestions I found, such as rsyncing the data files, before realizing none of them were any faster across that slow VPN link. Ultimately, I decided to try an EBS snapshot. The documentation I read warned that snapshotting a live volume may result in inconsistent data, but given the constraints above I had few options and had to try it anyway. As it turns out, it worked perfectly, and in under an hour my entire cluster was healthy. Using the AWS CLI, here is how I did it…
Step 1, take the snapshot of the healthy node
I actually took this snapshot in the console, so it isn't shown here, but for the record: go to your volume under EC2 > Volumes, click Actions > Create Snapshot, and save the snapshot ID. (Or do it with the CLI like I did for everything else.)
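If you prefer the CLI route, the equivalent is roughly the following. The volume ID here is a placeholder for the healthy node's data volume (in us-west-2, since that is where the snapshot gets copied from in the next step); the description matches the one used later.
# the volume ID below is a placeholder for your healthy node's data volume
aws --region us-west-2 ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "cis-mongo-prod-3-snapshot-05-19-2017"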
Step 2, copy the snapshot from your source region to your destination region
aws --region us-east-1 ec2 copy-snapshot --source-region us-west-2 --source-snapshot-id snap-01f185929341abd3b --description "cis-mongo-prod-3-snapshot-05-19-2017"
Make sure you copy the snapshot ID returned in the response to your clipboard…
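Cross-region snapshot copies are not instantaneous, so before building a volume from the copy it's worth waiting for it to finish. A minimal sketch, assuming the copied snapshot ID returned above:
# block until the copied snapshot reaches the "completed" state
aws --region us-east-1 ec2 wait snapshot-completed --snapshot-ids snap-085b986dae85dfed1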
Step 3, Create a new volume from the copied snapshot
aws ec2 create-volume --size 300 --region us-east-1 --availability-zone us-east-1c --volume-type gp2 --snapshot-id snap-085b986dae85dfed1
Response:
{ "AvailabilityZone": "us-east-1c", "Encrypted": false, "VolumeType": "gp2", "VolumeId": "vol-0fa49fde34e88a1c6", "State": "creating", "Iops": 900, "SnapshotId": "snap-085b986dae85dfed1", "CreateTime": "2017-05-19T20:37:33.304Z", "Size": 300 }
Step 4, Attach the volume to the system
aws ec2 attach-volume --volume-id vol-0fa49fde34e88a1c6 --instance-id i-0717cd609275fdbef --device /dev/sdc
Oh No We Got An Error!
An error occurred (InvalidVolume.ZoneMismatch) when calling the AttachVolume operation: The volume 'vol-0fa49fde34e88a1c6' is not in the same availability zone as instance 'i-0717cd609275fdbef'
Ah, OK, simple fix: we created the volume in a different AZ than the instance we were attaching it to.
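To avoid guessing a second time, you can ask the API which AZ the target instance actually lives in; a quick sketch using the instance ID from the failed attach above (the query path is just the standard describe-instances output structure):
# read the AZ of the instance we want to attach to
aws --region us-east-1 ec2 describe-instances --instance-ids i-0717cd609275fdbef --query "Reservations[].Instances[].Placement.AvailabilityZone" --output text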
(Delete the old volume.) Then…
Step 5, create a new volume from the snapshot, but this time specify the same AZ as the node we wish to attach it to (us-east-1b instead of us-east-1c)
aws ec2 create-volume --size 300 --region us-east-1 --availability-zone us-east-1b --volume-type gp2 --snapshot-id snap-085b986dae85dfed1
Step 6, try attaching the new volume (cross your fingers)
aws ec2 attach-volume --volume-id vol-095cc214c8a5e74e0 --instance-id i-0717cd609275fdbef --device /dev/sdc
Response:
{ "AttachTime": "2017-05-19T21:58:07.586Z", "InstanceId": "i-0717cd609275fdbef", "VolumeId": "vol-095cc214c8a5e74e0", "State": "attaching", "Device": "/dev/sdc" }
Sweet, it worked… Now it's time to do some work on the node we attached this volume to.
Step 7, check if the new attachment is visible to the system
[root@ip-10-5-0-149 mongo]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   10G  0 disk
└─xvda1 202:1    0   10G  0 part /
xvdb    202:16   0  300G  0 disk /mnt
xvdc    202:32   0  300G  0 disk
[root@ip-10-5-0-149 mongo]#
Yup, sure is: we can see our new device 'xvdc' is a 300G disk with no mount point. We can also see 'xvdb', our original Mongo data volume, mounted under /mnt.
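If you want to double-check that the device built from the snapshot actually carries a filesystem before mounting it, either of these standard tools will tell you (device name taken from the lsblk output above):
# inspect the new device for an existing filesystem
file -s /dev/xvdc
blkid /dev/xvdc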
Step 8, create mount point and mount the new device
[root@ip-10-5-0-149 mongo]# mkdir /mnt/mongo2
[root@ip-10-5-0-149 mongo]# mount /dev/xvdc /mnt/mongo2
[root@ip-10-5-0-149 mongo]#
[root@ip-10-5-0-149 mongo]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      9.8G  5.6G  3.9G  60% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  1.6G   15G  10% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/xvdb       296G   65M  281G   1% /mnt
none             64K  4.0K   60K   7% /.subd/tmp
tmpfs           3.2G     0  3.2G   0% /run/user/11272
/dev/xvdc       296G   12G  269G   5% /mnt/mongo2
[root@ip-10-5-0-149 mongo]#
Step 9, shut down Mongo if it's running
[root@ip-10-5-0-149 mongo]# service mongod stop
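Before copying anything into the data directory, it's worth confirming mongod is really down; a quick check (nothing MongoDB-specific, just pgrep):
# verify no mongod process is still running before touching the data directory
pgrep -l mongod || echo "mongod is stopped"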
Step 10, copy the snapshot data to the existing MongoDB data directory
[root@ip-10-5-0-149 mongo]# pwd
/mnt/mongo
[root@ip-10-5-0-149 mongo]# ls
[root@ip-10-5-0-149 mongo]# cp -r /mnt/mongo2/* .
Step 11, fix permissions for the copied data
[root@ip-10-5-0-149 mongo]# chown mongod:mongod /mnt/mongo -R
NOTE: Do not forget this step or you will get errors starting the MongoDB service
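A quick sanity check that the ownership change took effect before starting the service:
# confirm the copied files are now owned by mongod
ls -l /mnt/mongo | head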
Step 12, start Mongo back up
[root@ip-10-5-0-149 mongo]# service mongod start
Starting mongod (via systemctl):                           [  OK  ]
[root@ip-10-5-0-149 mongo]#
Step 13, Check Mongo Cluster Status
[root@ip-10-5-0-149 mongo]# mongo
cisreplset:SECONDARY> rs.status()
{
    "set" : "cisreplset",
    "date" : ISODate("2017-05-19T22:17:00.483Z"),
    "myState" : 2,
    "term" : NumberLong(111),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "ip-10-5-0-149:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 6,
            "optime" : {
                "ts" : Timestamp(1495222912, 3),
                "t" : NumberLong(110)
            },
            "optimeDate" : ISODate("2017-05-19T19:41:52Z"),
            "configVersion" : 3,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "10.5.5.182:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 4,
            "optime" : {
                "ts" : Timestamp(1495230671, 1033),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T21:51:11Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:16:56.263Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T22:16:59.620Z"),
            "pingMs" : NumberLong(1),
            "syncingTo" : "10.100.0.17:27017",
            "configVersion" : 3
        },
        {
            "_id" : 2,
            "name" : "10.100.0.17:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 3,
            "optime" : {
                "ts" : Timestamp(1495232212, 24),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T22:16:52Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:16:56.516Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T22:16:58.751Z"),
            "pingMs" : NumberLong(84),
            "electionTime" : Timestamp(1495225273, 1),
            "electionDate" : ISODate("2017-05-19T20:21:13Z"),
            "configVersion" : 3
        }
    ],
    "ok" : 1
}
cisreplset:SECONDARY>
For contrast, here is what it looked like before. Pay close attention to node/member 10.5.0.149.
cisreplset:PRIMARY> rs.status()
{
    "set" : "cisreplset",
    "date" : ISODate("2017-05-19T22:00:07.185Z"),
    "myState" : 1,
    "term" : NumberLong(111),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "ip-10-5-0-149:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:00:06.570Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T21:49:39.839Z"),
            "pingMs" : NumberLong(82),
            "lastHeartbeatMessage" : "Connection refused",
            "configVersion" : -1
        },
        {
            "_id" : 1,
            "name" : "10.5.5.182:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2138,
            "optime" : {
                "ts" : Timestamp(1495228916, 7491),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T21:21:56Z"),
            "lastHeartbeat" : ISODate("2017-05-19T22:00:05.507Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-19T22:00:05.358Z"),
            "pingMs" : NumberLong(83),
            "syncingTo" : "10.100.0.17:27017",
            "configVersion" : 3
        },
        {
            "_id" : 2,
            "name" : "10.100.0.17:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1200895,
            "optime" : {
                "ts" : Timestamp(1495231207, 1111),
                "t" : NumberLong(111)
            },
            "optimeDate" : ISODate("2017-05-19T22:00:07Z"),
            "electionTime" : Timestamp(1495225273, 1),
            "electionDate" : ISODate("2017-05-19T20:21:13Z"),
            "configVersion" : 3,
            "self" : true
        }
    ],
    "ok" : 1
}
cisreplset:PRIMARY>
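If you just want to eyeball replication lag without reading the full rs.status() document, the shell helper from this era of MongoDB can be run non-interactively (the helper name is the 3.x one; later versions renamed it):
# print each member's lag behind the primary (MongoDB 3.x shell helper)
mongo --eval "rs.printSlaveReplicationInfo()"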
Now that our DB is verified healthy, it's time to clean up.
Step 14, clean up our now unnecessary waste (and thank the gods)
Unmount & Delete
[root@ip-10-5-0-149 mongo]# umount /mnt/mongo2/
[root@ip-10-5-0-149 mongo]# rm -rf /mnt/mongo2/
Detach Volume
(env) ➜ ~ aws ec2 detach-volume --volume-id vol-0fa49fde34e88a1c6
{
    "AttachTime": "2017-05-19T20:43:29.000Z",
    "InstanceId": "i-0d535ee1cdfd79073",
    "VolumeId": "vol-0fa49fde34e88a1c6",
    "State": "detaching",
    "Device": "/dev/sdc"
}
(env) ➜ ~
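Deleting a volume only succeeds once it has fully detached, so if the delete races the detach you can wait on it first; a small sketch using the volume ID from the detach above:
# block until the detached volume is back in the "available" state
aws ec2 wait volume-available --volume-ids vol-0fa49fde34e88a1c6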
Delete Volume & Snapshots
(env) ➜ ~ aws ec2 delete-volume --volume-id vol-095cc214c8a5e74e0
(env) ➜ ~ aws ec2 delete-snapshot --snapshot-id snap-085b986dae85dfed1
(env) ➜ ~ aws ec2 delete-snapshot --snapshot-id snap-01f185929341abd3b --region us-west-2
When I ran into this issue and googled around, I couldn't find a detailed account of how anyone had gotten out of it, so I took the opportunity to write one for those who come after me; the result is this post. I hope it finds someone, someday, facing a similar scenario and graciously lifts them out of the depths! Godspeed, and happy clouding.