Running N|Solid on Docker Swarm with NFS filesystem
Note: Running N|Solid on NFS is not fully supported.
Expected Result
Run a Docker Swarm environment with 3 nodes (1 manager and 2 workers). All 3 nodes share the same NFS directory mount. N|Solid Console is deployed on the manager node. If that node goes down, Swarm will fail over to the 2nd or 3rd node and will be able to recover N|Solid Console using the same shared NFS directory.
Steps to reproduce:
We need to create and provision 4 virtual machines using Vagrant, all of them running CentOS 7.3. The role of each machine:
- vm 1: hosts the NFS share and its configuration
- vm 2: Docker Swarm manager node; mounts the NFS share from vm 1
- vm 3 - 4: Docker Swarm worker nodes; mount the NFS share from vm 1
Create a Vagrantfile in a local folder with the following content:
$nfs_mount = <<SCRIPT
# configure hostname
hostnamectl set-hostname nfs-mount
# update
#yum update -y
# install nfs
yum install nfs-utils -y
# create the nfs folder
mkdir /var/nfsshare
# change the permissions of the folder
chmod -R 755 /var/nfsshare
chown nfsnobody:nfsnobody /var/nfsshare
# start the necessary services
systemctl enable rpcbind
systemctl enable nfs-server
systemctl enable nfs-lock
systemctl enable nfs-idmap
systemctl enable firewalld
systemctl start rpcbind
systemctl start nfs-server
systemctl start nfs-lock
systemctl start nfs-idmap
systemctl start firewalld
# share the nfs directory with all clients
echo "
/var/nfsshare *(rw,sync,no_root_squash,no_all_squash)
/home *(rw,sync,no_root_squash,no_all_squash)
" >> /etc/exports
# start the nfs service
systemctl restart nfs-server
# override centos firewall
firewall-cmd --permanent --zone=public --add-service=nfs
firewall-cmd --permanent --zone=public --add-service=mountd
firewall-cmd --permanent --zone=public --add-service=rpc-bind
firewall-cmd --reload
SCRIPT

$docker_swarm = <<SCRIPT
# make the manager and workers accessible
echo "
10.1.1.162 managernode
10.1.1.163 workernode1
10.1.1.164 workernode2
" >> /etc/hosts
# configure hostname
hostnamectl set-hostname $1
# install docker
wget https://download.docker.com/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker.repo
yum install docker-ce -y
# start docker services
systemctl enable docker
systemctl enable firewalld
systemctl start docker
systemctl start firewalld
# open ports for docker swarm
firewall-cmd --permanent --add-port=2376/tcp
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
# restart the services
firewall-cmd --reload
systemctl restart docker
# nfs needs to be available in all nodes
yum install nfs-utils -y
# create directories for the nfs mount
mkdir -p /mnt/nfs/home
mkdir -p /mnt/nfs/var/nfsshare
# mount the nfs directory into the system
mount -t nfs 10.1.1.161:/var/nfsshare /mnt/nfs/var/nfsshare/ -o nolock
# configuring the manager node
if [[ $1 == 'managernode' ]]; then
  # set this node as manager
  docker swarm init --advertise-addr 10.1.1.162
  # export worker token for swarm nodes
  docker swarm join-token worker -q > /vagrant/worker_token
fi
if [[ $1 == 'workernode1' || $1 == 'workernode2' ]]; then
  # join this node in swarm as a worker
  docker swarm join --token $(cat /vagrant/worker_token) 10.1.1.162:2377
fi
SCRIPT

Vagrant.configure("2") do |config|
  (1..4).each do |i|
    config.vm.define "vm_#{i}" do |s|
      s.vm.box = "bento/centos-7.3"
      s.vm.network "private_network", ip: "10.1.1.16#{i}", netmask: "255.255.255.0", auto_config: true
      # necessary to share token for swarm
      s.vm.synced_folder ".", "/vagrant"
      if i == 1
        s.vm.network :forwarded_port, guest: 8080, host: 8080
        s.vm.network :forwarded_port, guest: 5000, host: 5000
        # nfs mount config
        s.vm.provision :shell, inline: $nfs_mount, privileged: true
      end
      if i == 2
        # Docker Swarm manager node
        s.vm.provision :shell, inline: $docker_swarm, args: "'managernode'", privileged: true
      end
      if i == 3
        # Docker Swarm worker node 1
        s.vm.provision :shell, inline: $docker_swarm, args: "'workernode1'", privileged: true
      end
      if i == 4
        # Docker Swarm worker node 2
        s.vm.provision :shell, inline: $docker_swarm, args: "'workernode2'", privileged: true
      end
      s.vm.provider "virtualbox" do |vb|
        vb.name = "vm_#{i}"
        vb.memory = "1024"
        vb.cpus = "2"
        vb.gui = false
      end
    end
  end
end
Then create a Docker Compose stack file (docker-compose.yml) in the same folder as the Vagrantfile with the following content:
version: "3.1" services: nsolid-console: image: nodesource/nsolid-console:latest environment: - NSOLID_CONSOLE_LICENSE_KEY=93224ed2-edb0-4f98-af83-cd66b12adbea ports: - 6753:6753 networks: - nsolid volumes: - /mnt/nfs/var/nfsshare/console:/var/lib/nsolid/console deploy: replicas: 2 mode: replicated resources: limits: cpus: '2' memory: 2G restart_policy: condition: on-failure networks: nsolid:
Replace the NSOLID_CONSOLE_LICENSE_KEY value with the key that was provided to you.
Then run:
$ vagrant up
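Once provisioning completes, it can be worth verifying that the NFS export is visible and mounted on the Swarm nodes before deploying anything. A minimal check, assuming the IP addresses and VM names used above (showmount is installed as part of nfs-utils):

# list the exports published by the NFS server (vm_1)
$ vagrant ssh vm_3 -c "showmount -e 10.1.1.161"
# confirm the share is mounted on a worker node
$ vagrant ssh vm_3 -c "mount | grep /mnt/nfs/var/nfsshare"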
SSH into the manager node:
$ vagrant ssh vm_2
Start the nsolid-console service in Docker Swarm:
$ docker stack deploy -c /vagrant/docker-compose.yml ns
You can check the status of the service by running:
$ docker service ls
The output should look like:
ID     NAME                MODE        REPLICAS   IMAGE                              PORTS
xxx    ns_nsolid-console   replicated  2/2        nodesource/nsolid-console:latest   *:6753->6753/tcp
Check that the REPLICAS column matches the replicas value in the stack file (2/2); if not, check the container logs using:
$ sudo docker service logs [ ID ] > /vagrant/logs-console
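If the replicas never reach the running state, docker service ps (a standard Docker CLI command) also shows the per-task state and the error that caused each task to fail, which is often quicker than reading the full logs:

$ sudo docker service ps ns_nsolid-console --no-trunc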
Once finished, N|Solid Console should be reachable at the following IP addresses:
# manager node ip address
10.1.1.162:6753
# worker1 node ip address
10.1.1.163:6753
# worker2 node ip address
10.1.1.164:6753
All of them share the same NFS directory mount served from 10.1.1.161.
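A quick way to confirm that all three nodes really see the same console state through NFS is to list the console data directory (the path comes from the volumes entry in the stack file above) from each node; this is only a sanity check, assuming the VM names used in this guide:

$ vagrant ssh vm_2 -c "ls /mnt/nfs/var/nfsshare/console"
$ vagrant ssh vm_3 -c "ls /mnt/nfs/var/nfsshare/console"
$ vagrant ssh vm_4 -c "ls /mnt/nfs/var/nfsshare/console"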
Possible issues when using NFS
- Check that the mount options include the nolock flag
- Check that the services NFS needs (and the nolock mount itself) come back up on every reboot; one way to make the mount persistent is shown below
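A simple way to keep the mount and its options across reboots is an /etc/fstab entry on every Swarm node; this is only a sketch using the server IP and paths from this guide:

# /etc/fstab entry on each Swarm node (manager and workers)
10.1.1.161:/var/nfsshare  /mnt/nfs/var/nfsshare  nfs  defaults,nolock  0 0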
Blockers/Considerations:
- UDP is generally not a valid transport protocol for NFSv4. Early NFSv4.0 implementations still allowed UDP, so the UDP transport can be used in rare cases, but RFC 5661 explicitly states that UDP alone should not be used as the transport protocol for NFSv4.1. Errors caused by an unsupported transport protocol for a specific NFS version are not always clear, for example when attempting to mount with UDP against an NFSv4 server; see the mount example after this list for forcing TCP explicitly.
- The problem with NFS is that it is remote and shared: the server has to handle caching and locking (since other hosts accessing the same files could change them), which causes problems for applications that expect files to be local.
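Related to the transport point above, the NFS client can be told explicitly which protocol version and transport to use, so that a mismatch fails fast instead of with an unclear error. A sketch of the mount command from the provisioning script with explicit options (nfsvers and proto are standard NFS mount options; adjust the version to whatever the server exports):

# force TCP and a specific NFS version, e.g. NFSv4.1:
mount -t nfs -o nfsvers=4.1,proto=tcp 10.1.1.161:/var/nfsshare /mnt/nfs/var/nfsshare/
# or keep the NFSv3 + nolock combination used in the provisioning script, but over TCP:
mount -t nfs -o nfsvers=3,proto=tcp,nolock 10.1.1.161:/var/nfsshare /mnt/nfs/var/nfsshare/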
Other possible ways to use NFS:
- N|Solid isn't fully supported on NFS, but we think a disk image hosted on NFS could be a possible solution. A disk image is different because, although the image file is potentially shared, when you mount the image the local operating system sees it as a local filesystem and assumes that regular file access and locking are safe. The other side of this is that mounting the same image on multiple systems at the same time is likely to corrupt it because of those same assumptions. A rough sketch of the idea follows.
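This has not been validated for N|Solid; the image size, filesystem, and mount point below are arbitrary assumptions. The idea is to create an image file on the NFS share, format it, and loop-mount it on exactly one node at a time:

# create a 10 GB sparse image file on the NFS share (size is arbitrary)
dd if=/dev/zero of=/mnt/nfs/var/nfsshare/console.img bs=1 count=0 seek=10G
# put a local filesystem inside the image (-F skips the "not a block device" prompt)
mkfs.ext4 -F /mnt/nfs/var/nfsshare/console.img
# loop-mount it; the OS now treats it as a local ext4 filesystem
mkdir -p /var/lib/nsolid/console-image
mount -o loop /mnt/nfs/var/nfsshare/console.img /var/lib/nsolid/console-image
# IMPORTANT: mount the image on only one node at a time, otherwise it is likely to corrupt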