Best practices to build great Ansible playbooks

I have been building playbooks for two years and I always try to make them understandable. However, one of our playbooks in a big project became very complex and is now very hard to understand and to maintain. I didn’t want to make that mistake again so I made a list with my personal good practices to prevent building complex playbook that I share with you here.

TL;DR: If you don’t want to build unintelligible playbooks that will cost you a lot in the long run, you should read my following suggestions.

[Edit]: I’m writing a book about Ansible, click here if want a free draft.

Make it simple

I am a web developer, not an OPS. I build Node.js/AngularJs and Symfony applications. I remember my first times doing some provisioning, it was running with Puppet. It was brilliant but also very complex. I had a really hard time understanding how it worked and debugging it wasn’t the funniest part of my life. I hated working with provisioning at that time. I still make those nightmares about it…

I was then introduced to Ansible and I completely changed my mind about provisioning \o/. Now I love it. And I want beginners to feel the same as I do. Therefore I really focus on making my playbooks truly simple so they can be understandable by everyone. So if you also believe that automated provisioning should be applied by everyone, then think about beginners when you write your code and make everything simple. Everybody should be able to use your playbooks.

Baby Ansible

(Thanks Justin Nemmers for the picture !)

Most of the time my playbooks are used to provision servers running a webapp and since I always run them on fewer than 5 servers at a time, I don’t try to make them quick (to execute), speed is not a big deal for me. But I guess execution time is an important factor when you have to run your playbook on thousands of servers.

Therefore, I think that my best practices fit only for people who use Ansible in the same way as I do.

Avoid using tags

no tags

I often read that we should tag everything, but I disagree.

TL;DR: It can be OK to tag a role in a playbook, but no one should use tag inside a role.

Why do tags exist in Ansible? Let’s see what the Ansible documentation says:

If you have a large playbook it may become useful to be able to run a specific part of the configuration without running the whole playbook.
Both plays and tasks support a “tags:” attribute for this reason.

There are ways to avoid tags

I use Ansible to provision my servers or to update a configuration on some servers. Playbooks have to be idempotent and it’s better to check idempotency at each execution.

So the only reason why I could be using tags is to gain execution time. But as I said earlier, time is not a big deal for me. Therefore I have no interest in using them.

If you are building your playbook and you want to run only some roles to gain time, just comment on the unnecessary roles in the playbook. You don’t need tags for that.

Sometimes, you may have to perform a specific action. Like during the “heartbleed crisis”, you had to update your bash on almost every server. But even in this kind of situation, you don’t use tags, you use a specific new playbook.

When you run a playbook using tags you are often targeting specifics roles like the example from the Ansible documentation

- name: be sure ntp is installed
  yum: pkg=ntp state=installed
  tags: ntp

- name: be sure ntp is configured
  template: src=ntp.conf.j2 dest=/etc/ntp.conf
  notify:
    - restart ntpd
  tags: ntp

- name: be sure ntpd is running and enabled
  service: name=ntpd state=running enabled=yes
  tags: ntp

In this case, I’d rather have a playbook “ntp” that run only the ntp role than using the tags.

Which one of the following commands tell us the most clearly what is going to happen?

ansible-playbook playbook.yml -i hosts/production --tags="ntp"

ansible-playbook ensure-ntp-is-working.yml -i hosts/production

The first one is talking about ntp. But we don’t know the purpose of running this command. With the second one, we know that it is to ensure that ntp is working.

Think about beginners! Make their life easier, don’t use tags.

not suited

Tags come from the dark side

So tags are not useful. But there is worse. They are harmful to your playbook because they add complexity. When you add a tag, this tag is here for a purpose. Therefore, every time they see a tag in your playbook, people have to understand the purpose of it. As a result, your playbook is becoming more difficult to understand and to maintain.

Here is a part of one of our nginx’s role.

- name: add repo file
  template: src=nginx.repo dest=/etc/yum.repos.d/nginx.repo mode=644 owner={{ app_user }}
  tags:
  - nginx
  - packages

- name: install
  yum: name=nginx enablerepo=nginx state=latest
  tags:
  - nginx
  - packages
  - yum

- name: create working directories
  file: path={{ item }} state=directory mode=755 owner={{ app_user }}
  with_items:
    - "{{ etc_path }}/nginx"
    - "{{ etc_path }}/nginx/conf.d"
    - "{{ log_path }}/nginx"
    - "{{ tmp_path }}/nginx"
  tags:
  - nginx
  - filesystem

- name: remove nginx conf file
  file: path=/etc/nginx/nginx.conf state=absent
  tags:
  - nginx
  - yum-cleaning

- name: disable service as sudo_user
  service: name=nginx enabled=no
  tags:
  - nginx
  - services
  - yum-cleaning

There are six different tags (nginx, packages, yum, filesystem, yum-cleaning, service) making at least 6 ways of using the role. If you want to test your playbooks (and you should), you will have to test every combination that your tags permit instead of one single run without tags. It’s already hard to test the playbooks because they depend on the Ansible version and the OS version. If you add tags, it becomes really annoying.

In this role, if we want to test every possible combination, we will have to test 2^6 = 64 combinations!

GOT maths

The same part of the role without the tags is pretty simple. We know that every task will be run anytime without any condition or specific purpose to understand.

- name: add repo file
  template: src=nginx.repo dest=/etc/yum.repos.d/nginx.repo mode=644 owner={{ app_user }}

- name: install
  yum: name=nginx enablerepo=nginx

- name: create working directories
  file: path={{ item }} state=directory mode=755 owner={{ app_user }}
  with_items:
  - "{{ etc_path }}/nginx"
  - "{{ etc_path }}/nginx/conf.d"
  - "{{ log_path }}/nginx"
  - "{{ tmp_path }}/nginx"

- name: remove nginx conf file
  file: path=/etc/nginx/nginx.conf state=absent

- name: disable service as sudo_user
  service: name=nginx enabled=no

It’s nearly impossible to remember all the tasks of your playbook that have a particular tag. So it’s never crystal clear what you are doing when you run your playbooks using tags.

random

One role, one goal

Avoid tasks within a role that are not related to each other. Don’t build “common role”. It’s ugly and it’s bad for the readability of your playbook.

Instead, try to make more little roles with explicit names.

See this playbook:

- name: My playbook
  hosts: all
  sudo: true
  vars_files:
    - vars/main.yml

  roles:
    - common #What the hell does this role do?
    - Stouts.nodejs
    - Stouts.mongodb

And this one:

- name: My playbook
  hosts: all
  sudo: true
  vars_files:
    - vars/main.yml

  roles:
    - ubuntu-apt
    - create-www-data-user
    - Stouts.nodejs
    - Stouts.mongodb

The last one is easier to understand.

Convention on naming your role.

I really like roles that can be run on many OSs and for many kinds of application. I try to make my roles as generic as I can. But, if in order to work on many OSs or for many applications, the role becomes too complex, then I prefer using a more specific but simpler role.

simplicity

If your role works only on some OSs or for a specific kind of application, you should explicitly say so in the role’s name. By making so, just by reading the role’s name we know how specific it is.

We often find the name of the author in the name of a role but having it doesn’t help us a lot to understand how a role works. So I think it shouldn’t be in the role’s name.

Taking into account what I just said, my convention on how to name a role is OS-Application-Purpose.

A few examples:
- debian-symfony-nginx: I can quickly understand that this role provision nginx for Symfony’s applications on every OS of the Debian’s family.
- ubuntu-mongodb: The role will install and configure MongoDB on Ubuntu.
- nodejs: Here the role will install Node.js on every kind of OS.

Make explicit the dependencies of a playbook

A playbook should be read like a great story in a book. You read the roles from top to bottom and you understand everything that is going on with the playbook, period.

If you want to quickly understand what a playbook does which solution do you prefer?

- name: My playbook with dependencies
  hosts: all
  sudo: true
  vars_files:
    - vars/main.yml

  roles:
    - elasticsearch
    - drupal

- name: My playbook without dependencies
  hosts: all
  sudo: true
  vars_files:
    - vars/main.yml

  roles:
    - java
    - elasticsearch
    - git
    - apache
    - mysql
    - php
    - php-mysql
    - composer
    - drush

When you read the second one, you see every role involved in the playbook and you don’t have to dig to get the information.

One server, one run to provision it the first time

You can have many playbooks to manage your server in your daily operations like the ensure-ntp-is-working.yml playbook.
But you should be able to provision it the first time with only one playbook.

You shouldn’t do something like this:

ansible-playbook -i hosts/myserver playbook_part1.yml
ansible-playbook -i hosts/myserver playbook_part2.yml
ansible-playbook -i hosts/myserver playbook_part3.yml

Instead, you should be able to provision your server with:

ansible-playbook -i hosts/myserver playbook.yml

one playbook

If you have more than one playbook, most of the time you will have to remember the order in which you have to run them. Testing and checking idempotency is also more practical having one playbook instead of many.

Explore Ansible Galaxy

There are many great roles over there. Instead of rewriting everything go forking! Look at how other people do, you will learn faster.

Don’t use requirements.txt

When you type ansible-galaxy install -r requirements.txt, it uses the tag of the Github repository to git clone it. A git tag doesn’t set a version of the code. The code behind a tag can be updated. So you can’t know for sure what you will end up downloading in this way. And this is bad for many reasons including security.

danger

Instead use GIT and add the roles as submodules. It allows you to update the roles while being confident about the version of the code you download.

Put the community’s roles in a separate folder

If you want to be able to update the roles you found on ansible-galaxy or directly on Github you should put it in a separate folder so you can quickly find your roles (that you can change) and the community role (that you shouldn’t change). You will end with something like:


├── group_vars
├── hosts
├── roles
│   ├── community
│   │   ├── composer
│   │   ├── ubuntu-apt
│   │   ├── ubuntu-mysql
│   │   └─ ubuntu-symfony-nginx
│   ├── my_app-monitoring
│   └─ ubuntu-rabbitmq
├── vars
└─ playbook.yml

If you really want to make a change to a community role, you have two options:
- You make a pull-request for this change.
- You make the change and you move the role in the directory with your own roles.

If you choose to separate your roles, you have to tell Ansible where it can find them with the ansible.cfg file like for example:


[defaults]
roles_path = devops/provisioning/roles/community
inventory = devops/provisioning/hosts

Don’t use ‘vagrant provision’

If you are a Vagrant user, you may be using the ansible provisioner. I know that the command is very convenient but it will confuse beginners a lot because there are differents concepts involved:
- Creating the VM (Virtualbox)
- Configuring the VM (Vagrant)
- Provisioning the VM (Ansible)

It’s already not easy to understand each role of Virtualbox and Vagrant during the creation of the VM. If you mix it with the role of Ansible, you are not helping…

not helping

And honestly, you won’t lose so much time using

ansible-playbook playbook.yml -i hosts/vagrant

instead of

vagrant provision

At Theodo, we use OpenStack to create VMs for our staging environment. There is also plugin to create OpenStack’s VM from the Vagrantfile. Same story here, don’t use this plugin because it’s highly confusing.

One day one brilliant trainee spent almost the whole day trying to create and provision one OpenStack VM via the Vagrantfile. Because we were doing vagrant up and vagrant provision for the dev environment, he wanted to do the same for staging one to have consistency.

true story

Keep it light

This is the same as any other part of your code: you should delete every file and directory that is not mandatory.

about perfection

Check out our open source projects

We have a small open source organization on Github, have a look at out our website. Our goal is to provide simple-to-use tools to make the provisioning with Ansible easier and easier. By doing this, we hope that we will help a lot of people learning and enjoying Ansible. If you like the spirit of this organization, we are looking for people to help us. Just make a nice PR and you will be welcome! =).

help hand

Other great tips

I really liked this article about “Ansible (Real Life) Good Practices”.
You will find other opinions about tags… ;).

It will explain why you should:
- Use a module when available
- Set a default for every variable
- Use the ‘state’ parameter
- Prefer scalar variables
- Tag every task and role

You can find some other tips here like “Beware of default” in these slides.

Thanks for reading, if you want to share your bests tips with us, they are very welcome! The fastest way is to ping me on Twitter.

If you need help to build a nice Agile/DevOps working environment, we will love helping you!