Runner: Multi-threaded SSH with Sudo support using Python & Paramiko

Example of Runner

$ runner -r web1 -c "whoami" -s
RUNNER [INFO]: MATCHING HOSTNAMES WITH 'web1'
RUNNER: 1 HOSTS HAVE BEEN SELECTED
RUNNER [INFO]: LOGFILE SET - logs/runner.log.2015-01-17.03:10:00
RUNNER [INFO]: USER SET - tuxninja
RUNNER [INFO]: SSH CONNECT TIMEOUT is: 5 seconds
RUNNER [INFO]: THREADS SET - 20
RUNNER [INFO]: SUDO IS ON
RUNNER [INPUT]: Please Enter Site Pass: 
web1.tuxlabs.com: 
web1.tuxlabs.com: root
web1.tuxlabs.com: [tuxninja@web1 ~]$ 

RUNNER [RESULT]: Successfully logged into 1/1 hosts and ran your commands in 0:00:08 second(s)
RUNNER [RESULT]: There were 0 login failures.


Why Runner?

I have been working as a Systems & Network Administrator since 1999. In that time I have repeatedly needed to execute commands rapidly across thousands of servers. Many applications solve this problem in various ways. To name a few: pdsh, Ansible, Salt, Chef, Puppet (MCollective), even CFEngine, and more. Some require agents running on the machines; others use SSH but require keys, or come with a learning curve. Alternatively, you can write your own code to solve the problem, which is what I did, mostly for fun. I don’t recommend reinventing the wheel if you need this for your job: use what is already out there, or download Runner and hack it to your heart’s content.

Fabric vs. Paramiko

Because I use Python for most of my work code these days, I decided to write my multi-threaded SSH command runner in Python. That way I can use Runner for parallel SSH transport and easily bolt on my other Python scripts for additional functionality. Python has fantastic support for SSH via two libraries, Fabric & Paramiko. Fabric is built on top of Paramiko and provides a simpler interface for doing just about anything you can think of: create a fabfile, run it, and voilà, instant results from commands run via SSH. Fabric is really great for running and re-running a set of commands, for example to automate an install or to generate a report. All that being said, I still chose Paramiko over Fabric for three reasons.

  1. I don’t like abstraction. Fabric hides the ugliness of Paramiko, which I prefer to understand better.
  2. Paramiko lent itself better than Fabric to a command-line utility for ad hoc commands.
  3. I wasn’t sure whether Fabric’s abstraction would limit me later if I needed custom functionality.

So for Runner I chose Paramiko, but to be clear, 9 times out of 10 I think I would choose Fabric.

Bastions

A bastion, or jump box, is a machine that acts as the gatekeeper of access to the rest of the machines in your network. In secure environments where your corporate network is separate from your production network, you have to SSH into a bastion, which usually has some form of 2-factor authentication (at least it should!), and from there SSH into other hosts. A bastion can throw a real wrench into managing thousands of machines in seconds, because you would have to authenticate to the bastion 1,000 times! The way around this is to set up your SSH config so that connections are proxied through a single, already-authenticated session to the bastion.

ProxyCommand & Sconnect

Sconnect (built from connect.c) is a small binary commonly used as an SSH ProxyCommand. You can download it and read more about it here: https://bitbucket.org/gotoh/connect/wiki/Home, which also explains how to set up your SSH config. Runner requires a ProxyCommand, although it does not have to be sconnect. Briefly, here is what you need to do.

  1. Download and compile connect.c
  2. Copy the binary to /usr/local/bin/sconnect and make it executable
  3. In your SSH config (~/.ssh/config), add the two entries below. DynamicForward makes the bastion session listen on a local port as a SOCKS proxy; any unused port works, as long as the ProxyCommand uses the same one.
    1. Host <ssh-config-profile-name>
      User tuxninja
      ForwardAgent yes
      HostName <bastion_name>
      DynamicForward 8081
    2. Host *.tuxlabs.com
      User tuxninja
      ProxyCommand /usr/local/bin/sconnect -4 -w 4 -S localhost:8081 %h %p
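Paramiko does not read ~/.ssh/config on its own, so Runner builds its own ProxyCommand string (the proxy_command line in node_shell), and the port in that string has to match DynamicForward. A minimal sketch of keeping that port in one place, assuming sconnect is installed at /usr/local/bin/sconnect; SOCKS_PORT, SCONNECT, and build_proxy_command are hypothetical names, not part of Runner:

```python
# Hypothetical helper: keep the SOCKS port in one place so that the
# DynamicForward port in ~/.ssh/config and Runner's ProxyCommand agree.
SOCKS_PORT = 8081                      # must match DynamicForward in ~/.ssh/config
SCONNECT = "/usr/local/bin/sconnect"   # assumed install location


def build_proxy_command(hostname, port=22, socks_port=SOCKS_PORT):
    """Return the sconnect command line that reaches hostname:port through
    the local SOCKS proxy opened by the bastion session.

    -S tells sconnect which SOCKS server to use; -4 and -w 4 are copied
    from the ssh config entry above."""
    return "%s -4 -w 4 -S localhost:%d %s %d" % (SCONNECT, socks_port, hostname, port)


print(build_proxy_command("web1.tuxlabs.com"))
```

The returned string can be handed to paramiko.ProxyCommand(), which is what node_shell does with its hard-coded version.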

That is basically it. Next, start a screen session so you can background the SSH session to the bastion. You will leave that session open for other SSH sessions to proxy through, so you only have to go through 2-factor authentication once. Something like:

screen -S sshsession
ssh <ssh-config-profile-name>

After you authenticate, detach from the screen session with Ctrl-A then D. Now you can SSH to any host in your domain (in my case tuxlabs.com) and the connection will be forwarded through the bastion. You still have to authenticate to each host with a username and password, which is fine: Runner prompts for the password once and reuses it for every host.

Hosts

Runner requires a hosts file to run. By default it looks in hosts/hosts-all for a list of all hosts, one FQDN per line. I use a script called ‘update-runner-hosts.pl’, included in my GitHub repo, to gather hosts from a URL and update the required hosts file. Once you have populated hosts/hosts-all with the FQDNs of your hosts, you are ready to use Runner.
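Host selection amounts to reading one hostname per line and, with -r, keeping the lines where re.search finds the pattern. A small stdlib sketch of that step; load_hosts is a hypothetical helper, and skipping ‘#’ comment lines is an assumption, not something Runner’s get_hosts does:

```python
import re


def load_hosts(lines, pattern=None):
    """Return stripped hostnames from `lines`, optionally filtered by a regex.

    Blank lines and lines starting with '#' are skipped (an assumption for
    this sketch; Runner itself treats every line as a host)."""
    hosts = []
    for line in lines:
        host = line.strip()
        if not host or host.startswith("#"):
            continue
        # Same test as -r: re.search matches anywhere in the hostname
        if pattern is None or re.search(pattern, host):
            hosts.append(host)
    return hosts


# In Runner the lines come from open('hosts/hosts-all'); a list behaves the same.
sample = ["web1.tuxlabs.com\n", "db1.tuxlabs.com\n", "\n", "web2.tuxlabs.com\n"]
print(load_hosts(sample, pattern="web"))   # ['web1.tuxlabs.com', 'web2.tuxlabs.com']
```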

Note: You can use ‘-f’ to provide a custom location for your hosts file.
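As written, the script exits with an error when the -f path contains ‘~’. If you want to allow it, a small sketch assuming POSIX-style home directories; resolve_hosts_path is a hypothetical helper, not part of Runner:

```python
import os
import os.path


def resolve_hosts_path(path):
    """Expand a leading '~' (using $HOME on POSIX) and make the path absolute,
    so that something like `-f ~/hosts/hosts-all` would work."""
    return os.path.abspath(os.path.expanduser(path))


print(resolve_hosts_path("~/hosts/hosts-all"))
```

Calling this on args.file_path in __main__ would make the ‘~’ check unnecessary.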

Great Flags / Features

Some of Runner’s best features are threading (-T), sudo (-s), list-only mode (-l), and regular expression host matching (-r); -t sets the SSH connect timeout. -r pattern-matches against your hosts list. That is incredibly handy, and practically required, when you have hundreds to thousands of hosts and only want the ones with ‘web’ in their names (-r ‘web’).
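-T controls how many worker threads pull hostnames off a shared queue: ssh_to_host() starts the threads, feeds the queue, and waits on q.join(). A stdlib-only sketch of the same pattern, with a stand-in task in place of the SSH session so it runs anywhere; run_parallel is a hypothetical name:

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def run_parallel(hosts, task, workers=20):
    """Run task(host) for every host on a fixed pool of daemon threads and
    return {host: result}. A host whose task raises gets the exception as
    its result."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            host = q.get()
            try:
                result = task(host)
            except Exception as e:   # keep the thread alive if one host fails
                result = e
            with lock:
                results[host] = result
            q.task_done()            # q.join() returns once every host is done

    for _ in range(workers):
        t = threading.Thread(target=worker)
        t.daemon = True              # idle workers must not keep the process alive
        t.start()
    for host in hosts:
        q.put(host)
    q.join()
    return results


# A stand-in task instead of an SSH session.
print(run_parallel(["web1", "web2", "db1"], lambda h: h.upper(), workers=2))
```

Catching the exception inside the worker matters: if a thread dies on an exception before calling task_done(), its host is never marked done and q.join() can block forever.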

One-host-per-pool mode (-1) is another great feature, but it depends on your environment’s hostname pattern, so you may have to modify the regular expression in the code to make it work for you. It is currently set up to identify pools when hostnames follow a convention like apache1234.tuxlabs.com.
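The pool key used by -1 is the hostname with the digits in front of a dot removed. A sketch of that selection, using the same regex as get_hosts; one_per_pool is a hypothetical helper name:

```python
import re


def one_per_pool(hosts):
    """Keep the first host seen from each pool. The pool key is the hostname
    with the digits before a dot stripped, so apache1234.tuxlabs.com and
    apache1235.tuxlabs.com both map to apache.tuxlabs.com."""
    seen = set()
    picked = []
    for host in hosts:
        key = re.sub(r"\d+?\.", ".", host)
        if key not in seen:
            seen.add(key)
            picked.append(host)
    return picked


hosts = ["apache1234.tuxlabs.com", "apache1235.tuxlabs.com",
         "mysql01.tuxlabs.com", "mysql02.tuxlabs.com"]
print(one_per_pool(hosts))   # ['apache1234.tuxlabs.com', 'mysql01.tuxlabs.com']
```

Names such as web01-east.tuxlabs.com are not grouped by this regex, because the digits are followed by a hyphen rather than a dot, so every such host becomes its own pool. That is the kind of case where you would adjust the pattern.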

OK, I could go on and on about Runner, but it’s better to just share the code at this point and let you go! Note the statically defined proxy_command in the code: you may need to change it if you didn’t use sconnect or the same port (8081).

Note: by default Runner connects as the user you are logged in as. You can specify a different user with ‘-u <username>’.

All code and accessories are available for download on GitHub: https://github.com/tuxninja/tuxlabs-code/tree/master/runner

Email tuxninja@tuxlabs.com with any questions! Happy SSH’ing, admins!

Note: In various versions of this code I had a ‘-h’ flag for passing a CSV list of hosts. Somehow it dropped out of this version, sorry! Feel free to re-add it.
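If you do re-add it, one possible shape is below. argparse reserves -h for --help by default, so this sketch uses -H. The flag name and hosts_from_csv are hypothetical, not taken from any earlier version of Runner:

```python
import argparse

parser = argparse.ArgumentParser()
# '-h' is taken by argparse's built-in help, so this sketch uses '-H' (hypothetical flag).
parser.add_argument('-H', action="store", dest="host_list", required=False,
                    help="Comma separated list of hosts to run against")


def hosts_from_csv(csv_string):
    """Split 'web1,web2, db1' into ['web1', 'web2', 'db1'], dropping empty entries."""
    return [h.strip() for h in csv_string.split(',') if h.strip()]


args = parser.parse_args(['-H', 'web1.tuxlabs.com, db1.tuxlabs.com'])
print(hosts_from_csv(args.host_list))   # ['web1.tuxlabs.com', 'db1.tuxlabs.com']
```

In __main__, something like selected_hosts = hosts_from_csv(args.host_list) when args.host_list is set could stand in for get_hosts(file_path).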

The Runner Code

#!/usr/bin/env python
#Author: Jason Riedel

import paramiko
import getpass
try:
    import Queue                # Python 2
except ImportError:
    import queue as Queue       # Python 3 renamed the module
import threading
import argparse
import os.path
import time
import logging
import re
import datetime

## SETUP AVAILABLE ARGUMENTS ##
parser = argparse.ArgumentParser()
parser.add_argument('-f', action="store", dest="file_path", required=False, help="Specify your own path to a hosts file")
parser.add_argument('-l', action="store_true", dest="list_only", required=False, help="List all known hosts")
parser.add_argument('-q', action="store_true", dest="quiet_mode", required=False, help="Quiet mode: turns off RUNNER INFO messages.")
parser.add_argument('-qq', action="store_true", dest="super_quiet_mode", required=False, help="Super Quiet mode: turns off ALL RUNNER messages except [INPUT].")
parser.add_argument('-r', action="store", dest="host_match", required=False, help="Select Hosts matching supplied pattern")
parser.add_argument('-c', action="store", dest="command_string", required=False, help="Command to run")
parser.add_argument('-s', action="store_true", dest="sudo", required=False, help="Run the command with sudo (password is sent through an interactive shell)")
parser.add_argument('-t', action="store", dest="connect_timeout", required=False, help="ssh timeout to hosts in seconds")
parser.add_argument('-T', action="store", dest="threads", required=False, help="# of threads to run (don't get crazy)")
parser.add_argument('-u', action="store", dest="site_user", required=False, help="Specify a username (by default I use who you are logged in as)")
parser.add_argument('-1', action="store_true", dest="host_per_pool", required=False, help="One host per pool")
args = parser.parse_args()

##GLOBAL##
logging.getLogger('paramiko.transport').addHandler(logging.NullHandler())

stime = time.time()

## SET TIMEOUT ##
connect_timeout = 5
if args.connect_timeout:
    connect_timeout = int(args.connect_timeout)  # argparse returns a string; paramiko needs a number

## SET THREADS / WORKERS ##
workers = 20
if args.threads:
    workers = int(args.threads)

## SET USER / PASS ##
site_user = getpass.getuser()
site_passwd = ''
if args.site_user:
    site_user = args.site_user

failed_logins = []
successful_logins = []

tstamp = datetime.datetime.now().strftime("%Y-%m-%d.%H:%M:%S")
logfile_dir = 'logs'
if not os.path.exists(logfile_dir):
    os.makedirs(logfile_dir)
logfile_path = '%s/runner.log.%s' % (logfile_dir, tstamp)
logfile = open(logfile_path, 'w')

## END GLOBAL ##

def ssh_to_host(hosts, site_passwd):
    for i in range(workers):
        t = threading.Thread(target=worker, args=(site_user, site_passwd))
        t.daemon = True
        t.start()

    for hostname in hosts:
        hostname = hostname.rstrip()
        q.put(hostname)

    q.join()

def worker(site_user, site_passwd):
    while True:
        hostname = q.get()
        try:
            node_shell(hostname, site_user, site_passwd)
        finally:
            q.task_done()  # always mark the host done, or q.join() would hang


def node_shell(hostname, site_user, site_passwd):
    ssh = paramiko.SSHClient()
    # Tunnel through the bastion via the local SOCKS port opened by DynamicForward
    proxy_command = "sconnect -4 -w 4 -S localhost:8081 %s %s" % (hostname, '22')
    proxy_sock = paramiko.ProxyCommand(proxy_command)
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        ssh.connect(hostname, username=site_user, password=site_passwd, timeout=connect_timeout, sock=proxy_sock)
        transport = ssh.get_transport()
        transport.set_keepalive(1)

        cmd = args.command_string
        if args.sudo:
            try:
                ## have to use invoke_shell for sudo because the ssh config on the machines requires a TTY
                channel = ssh.invoke_shell()
                channel.settimeout(30)  # arbitrary; stops recv() blocking forever if sudo never prompts
                channel.send('sudo ' + cmd + '\n')

                ## wait for the sudo password prompt
                buff = ''
                while '[sudo] password' not in buff:
                    resp = channel.recv(9999).decode('utf-8', 'replace')
                    if not resp:  # channel closed
                        break
                    buff += resp

                channel.send(site_passwd + '\n')

                ## read until the shell prompt comes back
                buff = ''
                while not buff.endswith('$ '):
                    resp = channel.recv(9999).decode('utf-8', 'replace')
                    if not resp:
                        break
                    buff += resp

                for line in buff.split('\n'):
                    log_and_print("%s: %s" % (hostname, line.rstrip()))

            except Exception as e:
                log_and_print("%s: ERROR: Sudo failed: %s" % (hostname, e))

        else:
            (stdin, stdout, stderr) = ssh.exec_command(cmd)

            ## stdout
            for line in stdout.readlines():
                log_and_print("%s: %s" % (hostname, line.rstrip()))
            ## stderr
            for line in stderr.readlines():
                log_and_print("%s: %s" % (hostname, line.rstrip()))

        successful_logins.append(hostname)

    except Exception as e:
        log_and_print("%s: failed to login : %s" % (hostname, e))
        failed_logins.append(hostname)

    finally:
        ssh.close()

output_lock = threading.Lock()  # worker threads share stdout and the logfile

def log_and_print(message):
    with output_lock:
        if args.super_quiet_mode or args.list_only:
            if "RUNNER [INPUT]" in message or "RUNNER [ERROR]" in message or "RUNNER" not in message:
                print(message)
                logfile.write(message + '\n')
        elif args.quiet_mode:
            if "RUNNER [INFO]" not in message:
                print(message)
                logfile.write(message + '\n')
        else:
            print(message)
            logfile.write(message + '\n')

def get_hosts(file_path):
    if os.path.exists(file_path):
        selected_hosts = []
        with open(file_path) as hosts:
            if not args.host_match:
                selected_hosts = [host for host in hosts if host.strip()]  # skip blank lines
                log_and_print("RUNNER [INFO]: SELECTING ALL HOSTS")
            else:
                host_match = args.host_match
                for host in hosts:
                    if re.search(host_match, host):
                        selected_hosts.append(host)
                log_and_print("RUNNER [INFO]: MATCHING HOSTNAMES WITH '%s'" % (host_match))
    else:
        log_and_print("RUNNER [ERROR]: %s does not exist! Try running ./update-runner-hosts.pl" % (file_path))
        exit()

    ## Select one host per pool
    if args.host_per_pool:
        seen = {}
        host_per_pool = []
        for host in selected_hosts:
            # Strip the digits that make hostnames unique, so every host in a
            # pool maps to the same key: host1234.tuxlabs.com -> host.tuxlabs.com
            nhost = re.sub(r"\d+?\.", ".", host)
            if nhost not in seen:
                seen[nhost] = 1
                host_per_pool.append(host)
        selected_hosts = host_per_pool

    log_and_print("RUNNER: %s HOSTS HAVE BEEN SELECTED" % (len(selected_hosts)))
    return selected_hosts

if __name__ == "__main__":
    file_path = 'hosts/hosts-all' ## update-hosts-all creates the DIR 

    if args.file_path:
        file_path = args.file_path
        if '~' in file_path:
            print("RUNNER [ERROR]: -f does not support '~'")
            exit()

    if args.list_only or args.command_string:
        selected_hosts = get_hosts(file_path)
        if args.list_only:
            for host in selected_hosts:
                host = host.rstrip()
                log_and_print(host)
            log_and_print("\nThere were %s hosts listed." % (len(selected_hosts)))
            exit()

        else:
            log_and_print("RUNNER [INFO]: LOGFILE SET - %s" % (logfile_path))
            log_and_print("RUNNER [INFO]: USER SET - %s" % (site_user))
            log_and_print("RUNNER [INFO]: SSH CONNECT TIMEOUT is: %s seconds" % (connect_timeout))
            log_and_print("RUNNER [INFO]: THREADS SET - %s" % (workers))
            if args.sudo:
                log_and_print("RUNNER [INFO]: SUDO IS ON")

            site_passwd = getpass.getpass("RUNNER [INPUT]: Please Enter Site Pass: ")

            q = Queue.Queue()

            ssh_to_host(selected_hosts,site_passwd)

            etime=time.time()
            run_time = int(etime-stime)

            timestamp = str(datetime.timedelta(seconds=run_time))
            log_and_print("\nRUNNER [RESULT]: Successfully logged into %s/%s hosts and ran your commands in %s second(s)" % (len(successful_logins), len(selected_hosts), timestamp))
            log_and_print("RUNNER [RESULT]: There were %s login failures.\n" % (len(failed_logins)))
            if len(failed_logins) > 0:
                for failed_host in failed_logins:
                    log_and_print("RUNNER [RESULT]: Failed to login to: %s" % (failed_host))
    else:
        parser.print_help()
        output = "\nRUNNER [INFO]: Either -l (list hosts only) or -c (run a command string) is required.\n"
        log_and_print(output)
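Both sudo loops in node_shell follow the same read-until-marker pattern. Factored into a helper, it might look like the sketch below. read_until is a hypothetical name, and the overall timeout is an assumption added so that a host whose sudo never prompts (cached credentials or NOPASSWD) cannot hang a worker thread. The deadline is only checked between reads, so on a real paramiko Channel you would also call channel.settimeout() to stop a single recv() from blocking indefinitely.

```python
import socket
import time


def read_until(recv, done, timeout=30.0, chunk=9999):
    """Call recv(chunk) and accumulate decoded text until done(buff) is true.

    recv: any callable returning bytes, e.g. a paramiko Channel's recv.
    An empty read means the remote side closed the channel, so we stop.
    Raises socket.timeout if done() is still false after `timeout` seconds
    (checked between reads only)."""
    deadline = time.time() + timeout
    buff = ''
    while not done(buff):
        if time.time() > deadline:
            raise socket.timeout("no match after %s seconds" % timeout)
        data = recv(chunk)
        if not data:
            break
        buff += data.decode('utf-8', 'replace')
    return buff

# Inside node_shell this could replace both loops, e.g.:
#   channel.settimeout(30)
#   read_until(channel.recv, lambda s: '[sudo] password' in s)
#   channel.send(site_passwd + '\n')
#   output = read_until(channel.recv, lambda s: s.endswith('$ '))

# A fake recv that replays canned output stands in for paramiko here.
chunks = [b"[sudo] password for tuxninja: "]
fake_recv = lambda n: chunks.pop(0) if chunks else b""
buff = read_until(fake_recv, lambda s: '[sudo] password' in s)
print(repr(buff))
```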