6 Ways to Speed Up Ansible Playbook Execution

Ansible works by connecting to remote nodes and pushing out small programs called modules to them. The control node (also called the management node) controls the entire execution of the playbook. It’s the node from which the user runs the installation. Managed nodes, referred to as ‘hosts’, are the target devices (servers, network appliances, etc.) that Ansible manages.

The inventory file lists the hosts on which the Ansible modules need to run. The control node opens an SSH connection to each host, executes the modules there to carry out the work (for example, installing software), and removes the modules once they have finished. The library of modules can reside on any machine, and no daemons, servers, or databases are required.

How Ansible Works

If you’re working with Ansible regularly, especially with complex playbooks, you probably wonder if there is a way for playbooks to run more quickly.

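Strategies for reducing the time Ansible takes to execute playbooks are:

  • Disabling fact gathering
  • Increasing the number of forks
  • Enabling pipelining
  • SSH multiplexing
  • The free strategy
  • Ansible Mitogen

Below you can find how to use each of them, along with the results for every strategy and a comparison.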

Setting up Local Infrastructure

The two crucial things are SSH access and root privileges on Linux servers. The good thing is that today, it’s easy to get low-cost access to Linux virtual machines through public cloud services. You can choose what fits you best, Amazon EC2, Google Cloud Compute Engine, Microsoft Azure, Digital Ocean, etc. But, to avoid spending money on a public cloud, there are alternatives, like Docker and Vagrant.

You can find instructions on how to install Vagrant and set up a working environment in the GitHub Ansible Configuration repository. To compare multiple strategies, we will use simple Ansible roles for Kafka and Nginx, whose code you can find in the same GitHub repository.

Ansible Configuration File

There are multiple ways to configure Ansible’s behavior, such as environment variables, command-line options, and playbook keywords. One of the most common is an INI file called ansible.cfg.

We can refer to this file as the brain. It governs the behavior of all interactions performed by the control node.

Its location can be given through an environment variable, or the file can be placed in one of several directories. Ansible searches for it in the following order:

  • ANSIBLE_CONFIG (environment variable if set)
  • ansible.cfg (in the current directory)
  • ~/.ansible.cfg (in the home directory)
  • /etc/ansible/ansible.cfg

Ansible will process the above list, use the first file found, and ignore all others. 
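The lookup order above can be sketched in a few lines of Python. This is a simplified illustration, not Ansible's actual implementation (which performs additional checks); the function name find_ansible_config and its parameters are made up for this example.

```python
from pathlib import Path
from typing import Mapping, Optional


def find_ansible_config(
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
    system_cfg: Path = Path("/etc/ansible/ansible.cfg"),
) -> Optional[Path]:
    """Return the first existing file in the search order, or None."""
    candidates = []
    # 1. ANSIBLE_CONFIG environment variable, if set
    if env.get("ANSIBLE_CONFIG"):
        candidates.append(Path(env["ANSIBLE_CONFIG"]))
    # 2. ansible.cfg in the current directory
    candidates.append(cwd / "ansible.cfg")
    # 3. ~/.ansible.cfg in the home directory
    candidates.append(home / ".ansible.cfg")
    # 4. system-wide file
    candidates.append(system_cfg)

    for candidate in candidates:
        if candidate.is_file():
            return candidate  # first match wins, the rest are ignored
    return None
```

Calling find_ansible_config(os.environ, Path.cwd(), Path.home()) would report which file this simplified model picks on your machine. To see what Ansible itself selected, check the "config file" line in the output of ansible --version.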

The default configuration file (/etc/ansible/ansible.cfg) is huge and divided into multiple sections. Each section name is enclosed in square brackets, and each section is responsible for specific functionality. To list the available settings and their defaults, use the following command:

$ ansible-config list

Some of the sections in the configuration file are:

[defaults]
[inventory]
[privilege_escalation]
[paramiko_connection]
[ssh_connection]
[persistent_connection]
[sudo_become_plugin]

A basic configuration file can contain a couple of sections (depending on the need), but we can use a [defaults] section with a couple of parameters for test purposes.

An example of what ansible.cfg can contain:

[defaults]
inventory = hosts
remote_user = vagrant
private_key_file = .vagrant/machines/default/virtualbox/private_key
host_key_checking = False

The above file specifies the location of the inventory file (inventory) that Ansible uses to determine which hosts it can talk to. It also sets the default user for connecting to the hosts (remote_user) and the SSH private key used to authenticate (private_key_file). Finally, SSH host-key checking is disabled, which is convenient when dealing with Vagrant machines; otherwise, the ~/.ssh/known_hosts file needs to be edited every time the Vagrant machine is destroyed and recreated.

Warning: Disabling SSH host-key checking can be a security risk when connecting to other servers over the network.
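Because ansible.cfg is a plain INI file, the sample above can be inspected with Python's standard configparser module. The snippet below is only a sketch for checking what a file contains; the helper name load_defaults is made up here, and Ansible's own loader does considerably more (type handling, environment overrides, and so on).

```python
import configparser

# The sample configuration from above, inlined so the example is self-contained.
SAMPLE_CFG = """\
[defaults]
inventory = hosts
remote_user = vagrant
private_key_file = .vagrant/machines/default/virtualbox/private_key
host_key_checking = False
"""


def load_defaults(text: str) -> configparser.SectionProxy:
    """Parse INI text and return its [defaults] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["defaults"]


defaults = load_defaults(SAMPLE_CFG)
for key, value in defaults.items():
    print(f"{key} = {value}")

# Flag the risky setting mentioned in the warning above.
if not defaults.getboolean("host_key_checking", fallback=True):
    print("note: SSH host-key checking is disabled")
```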

Ansible Playbook Execution Strategies

Strategies are a way to control play execution. Ansible’s default strategy is linear, running tasks in order and starting the next one when all hosts finish the current one. It uses the number of forks (defaults to 5) to parallelize, and it works great most of the time.

Playbook Execution

Default Strategy

The configuration file for the default strategy (used in this demo) has the following content:

[defaults]
inventory = hosts
remote_user = vagrant
private_key_file = ~/ansible-playground/.vagrant/machines/default/virtualbox/private_key
host_key_checking = False
callback_whitelist = profile_tasks

As mentioned, the above code snippet uses the default strategy and default number of forks, so you don’t have to set those values explicitly.

Playbook execution time with this strategy:

PLAY RECAP ****************************************************************************
testserver : ok=27   changed=21   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
Wednesday 07 June 2023  11:18:01 +0200 (0:00:00.615)       0:00:32.886 
============================================================================
kafka : Download Apache Kafka --------------------------------------- 10.06s
nginx : Install Nginx -----------------------------------------------  6.28s
kafka : Unpack Apache Kafka -----------------------------------------  3.10s
Gathering Facts------------------------------------------------------  1.80s
nginx : Copy TLS files ----------------------------------------------  1.11s
kafka : Force systemd to re-read configs ----------------------------- 0.81s
kafka : Template configuration file to server.properties ------------- 0.73s
nginx : Restart Nginx ------------------------------------------------ 0.62s
nginx : Manage Nginx configuration file ------------------------------ 0.61s
kafka : Restart kafka systemd ---------------------------------------- 0.61s
kafka : Template configuration file to connect-standalone.properties - 0.60s
kafka : Template kafka systemd service ------------------------------- 0.59s
kafka : Install and start the kafka service -------------------------- 0.57s
kafka : Template configuration file to zookeeper.properties ---------- 0.57s
kafka : Create kafka user -------------------------------------------- 0.49s
kafka : Create symlink to kafka installation directory --------------- 0.47s
kafka : Create kafka group ------------------------------------------- 0.46s
kafka : Check if Kafka has already been downloaded and unpacked ------ 0.43s
nginx : Create a TLS certificate directory --------------------------- 0.36s
kafka : Create directory for kafka application logs ------------------ 0.36s

Disable Fact Gathering

If fact caching is enabled, Ansible stores facts in a cache the first time it connects to hosts. For subsequent playbook runs, Ansible reads the facts from the cache instead of fetching them from the remote host, until the cache expires. If the play doesn’t reference Ansible facts, you can disable fact gathering for that specific play or even disable it by default for all plays.

The first option is to disable fact gathering for a specific play by adding gather_facts: false to the Ansible playbook:

---
- name: Ansible Playbook
  hosts: webservers
  become: True
  gather_facts: false
  roles:
    - kafka
    - nginx

The second option is to disable fact gathering by default by modifying the ansible.cfg file:

[defaults]
gathering = explicit

There is one more option regarding facts: smart gathering. In this case, Ansible gathers facts only if they are not present in the cache or if the cached copy has expired. To use it, you have to set a fact_caching implementation in ansible.cfg explicitly, or Ansible will not cache facts between playbook runs.

Tip: To use fact caching, the playbook should not explicitly specify gather_facts: True or False. With smart gathering enabled in the configuration file, Ansible will gather facts only if they are not present in the cache.

[defaults]
gathering = smart
# 24-hour timeout, adjust if needed
fact_caching_timeout = 86400
# Specify a fact caching implementation
fact_caching = ...
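The idea behind smart gathering can be illustrated with a toy in-memory cache. This is a conceptual sketch only: FactCache and smart_gather are hypothetical names invented for this example and are not part of Ansible's API, and real cache plugins persist facts between runs.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FactCache:
    """Toy cache that forgets a host's facts after timeout_seconds."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, host: str) -> Optional[dict]:
        entry = self._entries.get(host)
        if entry is None:
            return None
        stored_at, facts = entry
        if self.clock() - stored_at >= self.timeout:
            del self._entries[host]  # entry expired, drop it
            return None
        return facts

    def set(self, host: str, facts: dict) -> None:
        self._entries[host] = (self.clock(), facts)


def smart_gather(host: str, cache: FactCache,
                 gather: Callable[[str], dict]) -> dict:
    """Use cached facts when present and fresh; otherwise gather and cache."""
    facts = cache.get(host)
    if facts is None:
        facts = gather(host)  # the slow step that caching avoids
        cache.set(host, facts)
    return facts
```

With a timeout of 86400 seconds, as in the configuration above, the expensive gather step in this model would run at most once per host per day.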

Playbook execution time without fact gathering:

PLAY RECAP ****************************************************************************
testserver : ok=26   changed=21   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
Wednesday 07 June 2023  11:31:24 +0200 (0:00:01.057)       0:00:27.341
============================================================================
kafka : Download Apache Kafka ---------------------------------------- 5.68s
nginx : Install Nginx ------------------------------------------------ 4.86s
kafka : Unpack Apache Kafka ------------------------------------------ 2.99s
kafka : Install and start the kafka service -------------------------- 1.32s
kafka : Create kafka group ------------------------------------------- 1.11s
nginx : Copy TLS files ----------------------------------------------- 1.08s
nginx : Restart Nginx ------------------------------------------------ 1.06s
kafka : Restart kafka systemd ---------------------------------------- 1.01s
kafka : Force systemd to re-read configs ----------------------------- 0.80s
kafka : Template configuration file to server.properties ------------- 0.70s
kafka : Create kafka user -------------------------------------------- 0.60s
kafka : Template configuration file to connect-standalone.properties - 0.59s
kafka : Template kafka systemd service ------------------------------- 0.58s
kafka : Template configuration file to zookeeper.properties ---------- 0.58s
nginx : Manage Nginx configuration file ------------------------------ 0.55s
kafka : Check if Kafka has already been downloaded and unpacked ------ 0.49s
kafka : Create symlink to kafka installation directory --------------- 0.43s
kafka : Delete the kafka archive file -------------------------------- 0.36s
kafka : Create directory for kafka data log files -------------------- 0.35s
nginx : Create a TLS certificate directory --------------------------- 0.33s

Increase the Number of Forks

For each task, Ansible connects to the hosts in parallel. It does not necessarily connect to all of them at once, though; the level of parallelism is controlled by a parameter called forks, which defaults to 5.

Forks set the maximum number of simultaneous connections Ansible makes for each task in a single run. In other words, Ansible executes a task on the first five hosts, waits, takes the next batch of five hosts, and so on. Raising the value will not significantly reduce execution time in this demo, since its primary purpose is to run a task on a large number of hosts quickly by working on many of them at the same time.
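The batching described above can be modelled with a short Python function. This is a simplified model for illustration, with a hypothetical function name and hypothetical host names; it is not how Ansible schedules its worker processes internally.

```python
from typing import List, Sequence


def fork_batches(hosts: Sequence[str], forks: int = 5) -> List[List[str]]:
    """Split hosts into consecutive groups of at most `forks` hosts."""
    if forks < 1:
        raise ValueError("forks must be at least 1")
    return [list(hosts[i:i + forks]) for i in range(0, len(hosts), forks)]


# Twelve hypothetical hosts: web01 ... web12
hosts = [f"web{n:02d}" for n in range(1, 13)]

for forks in (5, 10):
    sizes = [len(batch) for batch in fork_batches(hosts, forks)]
    print(f"forks={forks}: batch sizes {sizes}")
```

In this model, twelve hosts need three rounds per task with the default of five forks and two rounds with ten, while a single host always needs exactly one round, whatever the setting.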

The first way to change the number of forks is to set the ANSIBLE_FORKS environment variable using the following command:

$ export ANSIBLE_FORKS=10

The other way is to change the number of forks in the configuration file (used in the demo):

[defaults]
...
forks=10

Playbook execution time with the number of forks increased:

PLAY RECAP ****************************************************************************
testserver : ok=23   changed=2    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
Wednesday 07 June 2023  11:26:04 +0200 (0:00:00.512)       0:00:26.426 
============================================================================
kafka : Download Apache Kafka ---------------------------------------- 5.31s
nginx : Install Nginx ------------------------------------------------ 4.83s
kafka : Unpack Apache Kafka ------------------------------------------ 2.80s
Gathering Facts ------------------------------------------------------ 1.79s
nginx : Copy TLS files ----------------------------------------------- 1.12s
kafka : Force systemd to re-read configs ----------------------------- 0.86s
kafka : Template configuration file to server.properties ------------- 0.71s
kafka : Template kafka systemd service ------------------------------- 0.63s
kafka : Template configuration file to connect-standalone.properties - 0.62s
kafka : Restart kafka systemd ---------------------------------------- 0.61s
kafka : Install and start the kafka service -------------------------- 0.61s
nginx : Manage Nginx configuration file ------------------------------ 0.60s
kafka : Create kafka user -------------------------------------------- 0.58s
kafka : Template configuration file to zookeeper.properties ---------- 0.56s
nginx : Restart Nginx ------------------------------------------------ 0.51s
kafka : Create kafka group ------------------------------------------- 0.47s
kafka : Create symlink to kafka installation directory --------------- 0.46s
kafka : Check if Kafka has already been downloaded and unpacked ------ 0.42s
kafka : Delete the kafka archive file -------------------------------- 0.41s
kafka : Create directory for kafka data log files -------------------- 0.33s

Enable Pipelining

We can describe the simplified way of how Ansible executes a task with the following steps:

  • It generates a Python script based on the invoked module
  • It copies the Python script to the host
  • It executes the Python script

Using pipelining, Ansible executes the Python script by piping it to the SSH session instead of copying it to the host first. This saves time because each task needs one SSH session instead of two. Pipelining can be enabled by modifying the ansible.cfg file:

[defaults]
...
pipelining=True

When using sudo operations in the playbook, you must disable requiretty in /etc/sudoers on the managed hosts. Pipelining is disabled by default since requiretty is enabled for many distributions. Pipelining replaces the former Accelerated Mode.
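The difference between copying a script and piping it can be shown with a purely local analogy in Python. No SSH is involved here, and the function names and the MODULE_SCRIPT placeholder are invented for this sketch: one function writes the script to a file and then runs it (two steps), the other feeds it straight to the interpreter's standard input (one step).

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the Python script Ansible generates from a module.
MODULE_SCRIPT = 'print("module ran")\n'


def run_by_copying(script: str) -> str:
    """Step 1: write the script to a file. Step 2: execute that file."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(script)
        path = handle.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(path)  # clean up the copied script


def run_by_piping(script: str) -> str:
    """Single step: send the script to the interpreter through stdin."""
    result = subprocess.run([sys.executable, "-"], input=script,
                            capture_output=True, text=True, check=True)
    return result.stdout


print(run_by_copying(MODULE_SCRIPT), end="")
print(run_by_piping(MODULE_SCRIPT), end="")
```

Both functions produce the same output; the piped variant simply skips the intermediate file, which is the step pipelining removes on the remote side.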

Playbook execution time with pipelining enabled:

PLAY RECAP ****************************************************************************
testserver : ok=27   changed=21   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
Wednesday 07 June 2023  11:37:19 +0200 (0:00:00.631)       0:00:27.054
============================================================================
kafka : Download Apache Kafka ---------------------------------------- 5.42s
nginx : Install Nginx ------------------------------------------------ 4.80s
kafka : Unpack Apache Kafka ------------------------------------------ 2.93s
Gathering Facts ------------------------------------------------------ 1.75s
nginx : Copy TLS files ----------------------------------------------- 1.17s
kafka : Force systemd to re-read configs ----------------------------- 0.83s
kafka : Template configuration file to server.properties ------------- 0.77s
kafka : Template configuration file to zookeeper.properties ---------- 0.64s
nginx : Restart Nginx ------------------------------------------------ 0.63s
kafka : Install and start the kafka service -------------------------- 0.63s
kafka : Restart kafka systemd ---------------------------------------- 0.62s
kafka : Template configuration file to connect-standalone.properties - 0.60s
nginx : Manage Nginx configuration file ------------------------------ 0.59s
kafka : Template kafka systemd service ------------------------------- 0.58s
kafka : Create kafka user -------------------------------------------- 0.49s
kafka : Create symlink to kafka installation directory --------------- 0.47s
kafka : Create kafka group ------------------------------------------- 0.46s
kafka : Check if Kafka has already been downloaded and unpacked ------ 0.45s
kafka : Create directory for kafka application logs ------------------ 0.42s
kafka : Register '/opt/kafka/logs' directory status ------------------ 0.38s

Pipelining enables Ansible to send commands directly to STDIN through a persistent SSH connection. Since we use a single machine in this test environment, there is no significant playbook execution time decrease.

SSH Multiplexing

Ansible uses SSH as its primary transport mechanism for communicating with servers. Because SSH runs on top of TCP, each new SSH connection to a remote machine first requires a new TCP connection. Establishing it involves the standard client-server negotiation, which takes a small amount of time. When Ansible runs a playbook, it makes many SSH connections to copy files, run commands, and so on, and each new connection pays this negotiation penalty.

OpenSSH is the most common implementation of SSH, and it supports an optimization called SSH multiplexing. When using it, multiple SSH sessions to the same host will share the same TCP connection, so the TCP connection negotiation happens only the first time.

When SSH multiplexing is enabled, on the first SSH attempt to a host, OpenSSH starts a master connection, which is by default open for 60 seconds.

To enable SSH multiplexing, add the following to the ansible.cfg file:

[defaults]
...
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=2m

The ControlMaster=auto option enables SSH multiplexing: it tells SSH to create the master connection and the control socket if they do not exist yet. ControlPersist=2m tells SSH to close the master connection if there have been no SSH connections for 2 minutes.

Playbook execution time with SSH multiplexing enabled:

PLAY RECAP ****************************************************************************
testserver : ok=27   changed=21   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
Wednesday 07 June 2023  12:50:09 +0200 (0:00:00.591)       0:00:26.249
============================================================================
kafka : Download Apache Kafka ---------------------------------------- 5.64s
nginx : Install Nginx ------------------------------------------------ 4.99s
kafka : Unpack Apache Kafka ------------------------------------------ 2.98s
Gathering Facts ------------------------------------------------------ 1.18s
nginx : Copy TLS files ----------------------------------------------- 1.09s
kafka : Force systemd to re-read configs ----------------------------- 0.82s
kafka : Template configuration file to server.properties ------------- 0.71s
nginx : Manage Nginx configuration file ------------------------------ 0.66s
kafka : Restart kafka systemd ---------------------------------------- 0.63s
nginx : Restart Nginx ------------------------------------------------ 0.59s
kafka : Install and start the kafka service -------------------------- 0.58s
kafka : Template configuration file to connect-standalone.properties - 0.56s
kafka : Template configuration file to zookeeper.properties ---------- 0.55s
kafka : Template kafka systemd service ------------------------------- 0.55s
kafka : Create kafka user -------------------------------------------- 0.50s
kafka : Create kafka group ------------------------------------------- 0.47s
kafka : Check if Kafka has already been downloaded and unpacked ------ 0.45s
kafka : Create symlink to kafka installation directory --------------- 0.45s
kafka : Delete the kafka archive file -------------------------------- 0.37s
kafka : Create directory for kafka data log files -------------------- 0.32s

Configuring the SSH connection with the ControlMaster and ControlPersist parameters significantly decreases execution time when tasks run on multiple managed nodes. ControlMaster lets multiple simultaneous SSH sessions to a remote host share a single network connection, a benefit that cannot reach its full potential with a single managed node.

Free Strategy 

In contrast to the linear strategy, the free strategy does not make Ansible wait until a task has finished on all hosts. Instead, as soon as a host completes one task, Ansible runs the next task on that host.
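A small back-of-the-envelope model makes the difference concrete. The sketch below is an idealized illustration with made-up host names and task durations: it ignores forks and connection overhead and assumes every host can run at the same time.

```python
from typing import Dict, List


def linear_time(durations: Dict[str, List[float]]) -> float:
    """Linear: each task ends only when the slowest host has finished it."""
    per_task = zip(*durations.values())  # regroup durations task by task
    return sum(max(task) for task in per_task)


def free_time(durations: Dict[str, List[float]]) -> float:
    """Free: each host works through its own tasks without waiting."""
    return max(sum(tasks) for tasks in durations.values())


# Hypothetical per-host durations (seconds) for three tasks.
two_hosts = {"web1": [1.0, 4.0, 1.0], "web2": [3.0, 1.0, 2.0]}
one_host = {"testserver": [1.0, 4.0, 1.0]}

print(linear_time(two_hosts), free_time(two_hosts))  # 9.0 6.0
print(linear_time(one_host), free_time(one_host))    # 6.0 6.0
```

In this model the two strategies differ only when hosts are slow on different tasks; with a single host the totals are identical.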

To set up the free strategy, add strategy: free to the playbook file:

---
- name: Ansible Playbook
  hosts: webservers
  become: True
  strategy: free
  roles:
    - kafka
    - nginx

Playbook execution time using the free Ansible strategy:

PLAY RECAP ****************************************************************************
testserver : ok=27   changed=21   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
Wednesday 07 June 2023  12:42:08 +0200 (0:00:00.516)       0:00:26.217 
============================================================================
nginx : Install Nginx ------------------------------------------------ 5.16s
kafka : Download Apache Kafka ---------------------------------------- 4.73s
kafka : Unpack Apache Kafka ------------------------------------------ 3.03s
Gathering Facts ------------------------------------------------------ 1.74s
nginx : Copy TLS files ----------------------------------------------- 1.15s
kafka : Force systemd to re-read configs ----------------------------- 0.83s
kafka : Template configuration file to server.properties ------------- 0.72s
kafka : Install and start the kafka service -------------------------- 0.65s
kafka : Restart kafka systemd ---------------------------------------- 0.62s
kafka : Template configuration file to connect-standalone.properties - 0.58s
kafka : Template kafka systemd service ------------------------------- 0.56s
kafka : Template configuration file to zookeeper.properties ---------- 0.56s
nginx : Manage Nginx configuration file ------------------------------ 0.55s
nginx : Restart Nginx ------------------------------------------------ 0.52s
kafka : Create kafka user -------------------------------------------- 0.50s
kafka : Create symlink to kafka installation directory --------------- 0.46s
kafka : Create kafka group ------------------------------------------- 0.43s
kafka : Check if Kafka has already been downloaded and unpacked ------ 0.42s
kafka : Delete the kafka archive file -------------------------------- 0.39s
kafka : Create directory for kafka data log files -------------------- 0.36s

With this strategy, Ansible does not wait for other hosts to finish the current task. Because the test scenario uses only one host, the playbook execution time does not decrease significantly.

Ansible Mitogen

Mitogen for Ansible is a module runtime for Ansible. Requiring minimal configuration changes, it updates Ansible’s slow and wasteful shell-centric implementation with pure-Python equivalents, invoked via highly efficient remote procedure calls to persistent interpreters tunneled over SSH. No changes are required to target hosts.

You can install Mitogen in two ways. The first is to download the .tar.gz package from the Mitogen site. The second is to use pip (for example, if you’re using a Python virtual environment):

$ pip install mitogen

To use Ansible Mitogen, a change in the ansible.cfg file is required:

[defaults]
...
strategy_plugins = ~/ansible_playground_env/lib/python3.8/site-packages/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

Here, strategy_plugins is the path to the strategy directory, which depends on how you installed Mitogen. If you installed it using pip, you can find it under the /lib/python3.8/site-packages directory; if you used the .tar.gz package, it depends on where you extracted it.

Playbook execution time using Mitogen plugin:

PLAY RECAP ****************************************************************************
testserver : ok=27   changed=21   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
Wednesday 07 June 2023  12:56:34 +0200 (0:00:00.215)       0:00:23.640 
============================================================================
kafka : Download Apache Kafka ---------------------------------------- 8.56s
nginx : Install Nginx ------------------------------------------------ 4.94s
Gathering Facts ------------------------------------------------------ 4.22s
kafka : Unpack Apache Kafka ------------------------------------------ 2.43s
kafka : Install and start the kafka service -------------------------- 0.43s
kafka : Force systemd to re-read configs ----------------------------- 0.38s
kafka : Restart kafka systemd ---------------------------------------- 0.29s
nginx : Restart Nginx ------------------------------------------------ 0.22s
kafka : Template configuration file to server.properties ------------- 0.21s
kafka : Create kafka user -------------------------------------------- 0.19s
kafka : Create kafka group ------------------------------------------- 0.15s
kafka : Create symlink to kafka installation directory --------------- 0.15s
kafka : Check if Kafka has already been downloaded and unpacked ------ 0.14s
nginx : Manage Nginx configuration file ------------------------------ 0.13s
nginx : Copy TLS files ----------------------------------------------- 0.12s
kafka : Template configuration file to connect-standalone.properties - 0.11s
kafka : Template configuration file to zookeeper.properties ---------- 0.11s
kafka : Template kafka systemd service ------------------------------- 0.10s
kafka : Delete the kafka archive file -------------------------------- 0.10s
kafka : Create directory for kafka application logs ------------------ 0.08s

Ansible Strategies Comparison and Benchmark Results

All the methods mentioned above help decrease playbook execution time at least a bit. It is important to point out that execution time differs between initial and subsequent runs. The first run of a playbook on a remote host generally takes the longest, as it installs packages, creates directories and files, etc. Subsequent runs of the same playbook are faster due to Ansible’s idempotence checking.

The best way to measure the execution time of the playbook is to enable the profile_tasks callback in the ansible.cfg file:

[defaults]
...
callback_whitelist = profile_tasks

With this, the playbook execution output will contain timing information for each task and a sorted list of tasks at the end of the run.

Wednesday 07 June 2023  12:56:34 +0200 (0:00:00.215)       0:00:23.640 
============================================================================
kafka : Download Apache Kafka ---------------------------------------- 8.56s
nginx : Install Nginx ------------------------------------------------ 4.94s
Gathering Facts ------------------------------------------------------ 4.22s
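If you want to compare runs programmatically, the sorted task list can be parsed with a regular expression. The sketch below assumes the line layout shown above (task name, a run of dashes, then the duration in seconds); the pattern and function name are made up for this example and may need adjusting for other output formats.

```python
import re
from typing import List, Tuple

# "<task name> ----- <seconds>s", e.g. "nginx : Install Nginx ---- 4.94s"
TASK_LINE = re.compile(r"^(?P<name>.+?) -+ +(?P<seconds>\d+(?:\.\d+)?)s$")


def parse_task_timings(output: str) -> List[Tuple[str, float]]:
    """Extract (task name, seconds) pairs; lines that don't match are skipped."""
    timings = []
    for line in output.splitlines():
        match = TASK_LINE.match(line.strip())
        if match:
            timings.append((match["name"], float(match["seconds"])))
    return timings


SAMPLE = """\
============================================================================
kafka : Download Apache Kafka ---------------------------------------- 8.56s
nginx : Install Nginx ------------------------------------------------ 4.94s
Gathering Facts ------------------------------------------------------ 4.22s
"""

for name, seconds in parse_task_timings(SAMPLE):
    print(f"{seconds:6.2f}s  {name}")
```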

Initial run

As already mentioned, the first time a playbook runs on a host is the longest.

[Image: Initial run]

Every strategy we used decreased the execution time of the playbook, but at first glance there is no significant difference between them. If we calculate the percentages, however, the situation is slightly different, as shown below.

[Image: Initial run, percentage]

In the case of the Mitogen plugin, execution time decreased by almost 30%, which is significant (e.g., a playbook that takes 30 minutes in total could finish in roughly 21 minutes with the plugin).
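The percentages can be recomputed from the total run times printed in the PLAY RECAP outputs earlier in this article. The sketch below uses those logged totals; the dictionary keys are just labels chosen for this example.

```python
def percent_decrease(baseline: float, measured: float) -> float:
    """Relative reduction of `measured` compared with `baseline`, in percent."""
    return (baseline - measured) / baseline * 100


# Total execution times (seconds) taken from the PLAY RECAP outputs above.
TOTALS = {
    "default": 32.886,
    "no fact gathering": 27.341,
    "forks=10": 26.426,
    "pipelining": 27.054,
    "ssh multiplexing": 26.249,
    "free strategy": 26.217,
    "mitogen": 23.640,
}

baseline = TOTALS["default"]
for name, total in TOTALS.items():
    print(f"{name:<20}{total:>8.3f}s  {percent_decrease(baseline, total):5.1f}%")
```

For Mitogen this gives (32.886 - 23.640) / 32.886, or about 28.1%, which matches the "almost 30%" figure.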

Interestingly, the execution time of the longest task was not changed significantly, as can be seen in the following table.

[Image: Initial run, in seconds]

Subsequent run

After a playbook has already run on a host, subsequent runs of the same playbook are faster due to Ansible’s idempotence checking.

[Image: Subsequent run]

The big difference can be seen in the execution time of the longest task (usually installing the service, downloading the .tar.gz file, etc).

[Image: Subsequent run, in seconds]

You can see the visual representation of elapsed time (in seconds) in the first and subsequent runs below.

[Image: Initial vs. subsequent run]

And here, you can see the equivalent speed-up (in percentages):

[Image: Initial vs. subsequent run, percentage]

We can see the most significant difference when using the Mitogen plugin with repetitive tasks (in this case, creating and destroying 100 files in the same directory):

[Image: Repetitive tasks]

In terms of numbers, playbook execution time dropped from 1 minute 8 seconds to just a little over 8 seconds, a decrease of ~88%.

[Image: Repetitive tasks, in seconds]

Conclusion

All the mentioned strategies improved playbook execution times, with Ansible Mitogen emerging as the winner. SSH multiplexing and the free strategy also show good time decreases, which is a good result for such a small configuration change.

How much time any strategy saves depends on the playbook/role itself and the environment. Execution time depends on multiple configuration settings; the key is finding the best combination of them.

Regarding test results that don’t show considerable decreases, remember that we used only one host in the test scenario. In a real work environment, we would use multiple hosts, and then all the strategies mentioned will serve their purpose more effectively.

Besides the strategies mentioned above, there are other parameters for controlling and optimizing playbook execution, such as serial, run_once, and more.