Introduction
In this tutorial we'll learn how to set up a Docker Swarm cluster using Ansible to orchestrate the basics.
Provision some nodes
First up, let's provision some machines to set this up on. I'm using DigitalOcean in this example, but you can use whichever cloud provider you like.
I've created three nodes:
- Manager
- Replica
- Node
I have the following inventory file to represent these:
./swarm_cluster
[node]
45.55.144.228 private_ip=10.132.48.106 ansible_ssh_user=root
[manager]
104.236.61.150 private_ip=10.132.44.190 ansible_ssh_user=root
[replica]
159.203.91.233 private_ip=10.132.48.105 ansible_ssh_user=root
(for simplicity, I'm using the actual IPs. Don't worry, this swarm no longer exists ;) ).
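Before going further, it's worth sanity-checking that Ansible can actually reach all three hosts. A quick check, assuming the inventory above is saved as `swarm_cluster`:

```shell
# Ping every host in the inventory over SSH; each should reply "pong"
ansible all -i swarm_cluster -m ping
```

If any host fails here, fix SSH access before running the playbook — nothing below will work otherwise.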
We'll basically be following the official tutorial: Build a Swarm cluster for production.
First up, let's bootstrap our nodes. We'll need Docker Engine and Consul on each node. I'm not going to go into detail on these roles, but you will find them in the accompanying GitHub repo for this tutorial; this tutorial assumes you have them.
Install the basics
swarm.yml
- hosts:
    - all
  vars:
    - initial_cluster_size: 3
  pre_tasks:
    - name: Install ansible requirements
      pip:
        name: "docker-py"
        state: present
      tags:
        - swarm
  roles:
    - ubuntubase
    - docker
    - consul
`initial_cluster_size` is used to help consul bootstrap its cluster. Set this to the size of your initial cluster.
and run the playbook:
ansible-playbook swarm.yml -i swarm_cluster
Set up the Consul cluster
Ok. So now we have docker and consul running on our servers. Let's ssh in and check what's what with consul:
ssh root@104.236.61.150
...
# see the members:
root@manager:~# consul members
Node Address Status Type Build Protocol DC
manager 10.132.44.190:8301 alive server 0.6.3 2 dc1
# only the local node is listed, so we need to manually join the other nodes:
root@manager:~# consul join 10.132.48.106 10.132.48.105
Successfully joined cluster by contacting 2 nodes.
# now we have a consul cluster
root@manager:~# consul members
Node Address Status Type Build Protocol DC
manager 10.132.44.190:8301 alive server 0.6.3 2 dc1
node 10.132.48.106:8301 alive server 0.6.3 2 dc1
replica 10.132.48.105:8301 alive server 0.6.3 2 dc1
You can also run `consul monitor` to view the logs. Hopefully you should see that a leader has been elected:
[INFO] consul: adding server foo (Addr: 127.0.0.2:8300) (DC: dc1)
[INFO] consul: adding server bar (Addr: 127.0.0.1:8300) (DC: dc1)
[INFO] consul: Attempting bootstrap with nodes: [127.0.0.3:8300 127.0.0.2:8300 127.0.0.1:8300]
...
[INFO] consul: cluster leadership acquired
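If you'd rather not read through logs, you can also ask the agent directly. On a server node, `consul info` reports a `leader` field in its `consul` section:

```shell
# "leader = true" on the elected leader, "leader = false" on the other servers
consul info | grep leader
```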
Notes:
- Consul is clustered, so it doesn't matter which node you log into.
- Running `consul join` joins our nodes into a cluster. This is supposed to happen automatically - but in my case it didn't.
- You can run `consul monitor` to check the logs of our cluster.
Important notes about running consul:
If you look at the upstart script we are using to run consul, you can see that the command we run is:

/usr/local/bin/consul agent \
  -data-dir="/tmp/consul" -ui -bind=10.132.48.106 -client=0.0.0.0 \
  -bootstrap-expect 3 \
  -server

`-client=0.0.0.0` is required because Swarm tries to communicate via the provided consul address (below) on port 8500. By default, Consul binds its client interfaces (HTTP, DNS) only to the loopback address (127.0.0.1), so only processes on the node itself can reach them. The effect is that swarm cannot reach the consul server and therefore cannot assign a leader - and basically it won't work. Providing `-client=0.0.0.0` means that swarm can communicate on <internal_ip>:8500. To quote from someone smarter than me:
Finally, the client_addr line tells Consul to listen on all interfaces (not just loopback, which is the default).
Check out the consul UI
The upstart script that runs with our consul setup sets all the nodes to also support the consul UI. To view the UI locally, we'll tunnel through one of our servers:
# create the tunnel
ssh -N -f -L 8500:localhost:8500 root@45.55.144.228
Now you can view the consul UI on your local machine at localhost:8500/ui
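With the tunnel up, the same port also serves Consul's HTTP API, which is handy for scripting. For example, using the standard v1 endpoints:

```shell
# Which server is the current leader?
curl http://localhost:8500/v1/status/leader

# List all nodes in the cluster
curl http://localhost:8500/v1/catalog/nodes
```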
Now that we have Consul installed and running, let's set up our swarm:
Provision the Swarm
Set up the master
First up, let's set up the swarm manager:
swarm.yml
- hosts:
    - manager
  tasks:
    # $ docker run -d -p 4000:4000 swarm manage -H :4000 --replication --advertise 172.30.0.161:4000 consul://172.30.0.161:8500
    - name: Run swarm manager
      docker:
        name: swarm
        image: swarm
        command: "manage -H :4000 --replication --advertise {{ private_ip }}:4000 consul://{{ private_ip }}:8500"
        state: started
        ports:
          - "4000:4000"
        expose:
          - 4000
      tags:
        - swarm
Notes:
- We're only running this on the manager node/s from our inventory (`hosts: manager`).
Run the playbook again:
ansible-playbook swarm.yml -i swarm_cluster
Now, if you log into the manager node and run `docker ps`, you should see that we have a swarm container running on our server:
root@manager:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3b753e4772b4 swarm "/swarm manage -H :40" 34 seconds ago Up 33 seconds 2375/tcp swarm
Set up the replica
Now, our replica is actually just another manager node. Docker Swarm will handle which manager acts as the primary. As such, we just add the replica to our hosts list above:
- hosts:
    - manager
    - replica
  ...
This deviates from the official tutorial, but when I tried it the official way I often got stuck. For example, my "manager" node would lose the primary election and my replica node would become the primary. In the official tutorial this is problematic because port 4000 is not exposed on the replica - therefore: one cannot communicate with the primary manager.
Notes:
- Note in each case we're using the `{{ private_ip }}` for consul. This tutorial is slightly different from the official one in that we have installed consul on all our nodes. Because consul is clustered, talking to consul on any node is equivalent.
- See the note above about the client address: without `-client=0.0.0.0` set up, our above swarm command will not be able to communicate with Consul on the provided IP. Further: it will also not be able to communicate with `localhost`, because in that instance `localhost` refers to localhost inside the docker container. This again deviates from the tutorial; however, without this change the tutorial did not work for me.
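With both managers running, you can check which one currently holds the primary role. The Swarm manager's `info` output includes `Role` and `Primary` lines (exact formatting may vary by Swarm version):

```shell
# Run against either manager; Role shows "primary" or "replica",
# and Primary shows the address of the current primary manager
docker -H :4000 info | grep -iE 'role|primary'
```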
Set up the node
To set up the node, add:
- hosts:
    - node
  tasks:
    # $ docker run -d swarm join --advertise=172.30.0.69:2375 consul://172.30.0.161:8500
    - name: Run swarm node
      docker:
        name: swarm
        image: swarm
        command: "join --advertise {{ private_ip }}:2375 consul://{{ private_ip }}:8500"
        state: started
      tags:
        - swarm
You can now ssh into the node server, and you should see a docker swarm container running there.
Note: The master tries to communicate with the node on port 2375. I needed to add `DOCKER_OPTS="-H tcp://0.0.0.0:2375"` to my config file in `/etc/default/docker`. I then also needed to pass `-H :2375` when talking to the local docker instance.
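To confirm the node actually registered, you can inspect the swarm container's logs on the node, then check the manager's view of the cluster:

```shell
# On the node: the join container should be logging periodic registrations
docker -H :2375 logs swarm

# On the manager: the node should now appear in the cluster summary
docker -H :4000 info
```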
Communicate with the Swarm
You now have a Docker Swarm running. Some things you can do:
Log into the swarm manager:
Check the status of the cluster:
docker -H :4000 info
Run a container on the cluster
docker -H :4000 run hello-world
Check which node the container ran on:
docker -H :4000 ps -a
...
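A quick way to see the scheduler at work is to start a few containers and watch where they land. A small sketch:

```shell
# Start a handful of containers; Swarm's default spread strategy
# should distribute them across the nodes
for i in 1 2 3; do
  docker -H :4000 run -d --name "test-$i" hello-world
done

# The NAMES column is prefixed with the node each container ran on
docker -H :4000 ps -a
```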
Issues I had that appear not to be documented:
- Swarm is unable to communicate with consul: you need to specify `-client=0.0.0.0` for the consul agent.
- The Docker master is unable to communicate with nodes on port 2375: you need to add `DOCKER_OPTS="-H tcp://0.0.0.0:2375"` to the config file in `/etc/default/docker`. This means that the normal `docker ...` command will no longer work on the node; you need to specify `docker -H :2375 ...`.
Next Steps
- Get the swarm working with Docker Compose
- Test the new (beta) on-node-failure re-scheduling feature
- Test with rolling over an entire node (Chaos Monkey style)
- Look into Registrator to automatically register the nodes with consul ... (or should that actually already be happening?)
References:
- Build a Swarm cluster for production
- Connection refused to Consul UI unless -client option is public address