Everybody wants to grow up. And as our Internet services grow, we must build a bigger infrastructure. I’ll dedicate a few words to scaling services in future pages of this guide, but one of our options is to replicate our filesystem (or part of it).
This way, if we have two servers serving the same pages (load balanced; some day I’ll write about that), they may look independent while sharing the files they serve. If I update serverA, the change is replicated transparently to serverB. If a user uploads a file to serverB, serverA will see it (not instantly, but we don’t have to care much about that). And, best of all, if one of the servers fails, the other one keeps working and will resynchronize the broken server when it comes back up.
My recommendation: don’t use it for all your files. Page caches, logs and temporary stuff are not important and would only overload our systems. Think about statics (images, css, js), uploads and code (or you may prefer another way to distribute code: rsync, git, svn…). These files are not created, deleted or modified very often, so there is little replication traffic most of the time, but whenever one of them changes, it gets replicated across our servers.
Note: This guide is based on a Debian-based distribution (Debian, Ubuntu, Mint, etc.). I will use sudo to get privileges and apt-get to install programs. Some configuration files may also be distribution-specific (I hope they aren’t). With a little adaptation, you can use this guide to configure a replicated file system between servers running a different distribution.
About this approach: I’m doing it with just two servers, which act as clients at the same time, so they are both storage and web servers. You may prefer to keep the storage servers just for storage and mount the volumes on separate web nodes (I will add some notes for storage clients). It depends on how big your system is.
Set up hosts
Ok, let’s start. The first thing to do is to name all our servers, that is, to assign a host name to every IP in our cluster. You may have a domain name and your site may be on that domain, but it’s not safe to use Internet-facing IPs here, because these storage nodes are intended to sit behind a firewall, isolated from the Internet. If you have bought a simple VPS, you don’t get a choice and your VPS will have a beautiful external IP; in that case it’s recommended to use the internal network IP or even a VPN IP. If you have a large server farm you may want to install a DNS server, like bind9, but for this example we will just edit /etc/hosts on all our nodes and add these lines:
# In case you have a client...
Make sure you are able to ping each server from the other, and from the client (if applicable).
Next, install glusterfs. In Debian/Ubuntu/Mint… just do:
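As an illustration of what those /etc/hosts entries could look like, here is a small sketch. The 10.1.2.x addresses are made up for the example (they are not the ones from my setup), so replace them with your own internal or VPN IPs. The helper only prints the lines, so you can review them before appending them anywhere.

```shell
# Print example /etc/hosts entries for the two storage nodes.
# The IP addresses are hypothetical placeholders: replace them with yours.
hosts_entries() {
    cat <<'EOF'
10.1.2.1 storagenode1.mydomain.com
10.1.2.2 storagenode2.mydomain.com
EOF
}

# Show the entries.
hosts_entries

# After reviewing them, append them on every node, for example:
#   hosts_entries | sudo tee -a /etc/hosts
# and then check name resolution and connectivity:
#   ping -c 3 storagenode2.mydomain.com
```

Printing first and appending later is just a precaution: a wrong line in /etc/hosts can make a node resolve its peer to the wrong machine.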
$ sudo apt-get install glusterfs-server glusterfs-common glusterfs-client
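But I think that installing glusterfs-server pulls in the others as dependencies.
Now make sure the following TCP ports are open: 111, 24007, 24008, and 24009 through 24009+n, where n is the total number of bricks across all volumes. (This is the port scheme of the glusterfs version I’m using; newer releases moved the brick ports to 49152 and up, so check the documentation for your version.) If you use UFW, write: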
$ sudo ufw allow 111
$ sudo ufw allow 24007
$ sudo ufw allow 24008
$ sudo ufw allow 24009
$ sudo ufw allow 24010
$ sudo ufw allow 24011
That covers a glusterfs volume with two bricks.
Let’s introduce the servers to each other with a peer probe. That’s easy; on our first host (storagenode1.mydomain.com):
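If the number of bricks grows, typing the ports one by one gets tedious. The following sketch prints the port list for a given brick count, following the rule stated above (111, 24007, 24008 and 24009 through 24009+n). The function name gluster_ports is mine, and the port rule is the one from this guide, so check it against your glusterfs version before relying on it.

```shell
# Print the TCP ports to open for a given total number of bricks,
# following the rule above: 111, 24007, 24008 and 24009 .. 24009+n.
gluster_ports() {
    bricks=$1
    echo 111
    echo 24007
    echo 24008
    port=24009
    last=$((24009 + bricks))
    while [ "$port" -le "$last" ]; do
        echo "$port"
        port=$((port + 1))
    done
}

# Show the ports for a two-brick volume (the same six ports used above).
gluster_ports 2

# To open them with UFW, run on every node:
#   for p in $(gluster_ports 2); do sudo ufw allow "$p"; done
```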
$ sudo gluster peer probe storagenode2.mydomain.com
Now we have a storage pool. We can use the bricks however we want, but in this example we will use them for data replication across the network.
If we write (on storagenode1.mydomain.com):
$ sudo gluster peer status
Number of Peers: 1
State: Peer in Cluster (Connected)
Probing from one node is enough: the other node will recognise the first one too, so if we ask for peer status on storagenode2.mydomain.com, it will say that storagenode1.mydomain.com is in the cluster.
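If you want to check this from a script (for example, in a monitoring job), you can count the peers reported as connected. This is only a sketch: the function name connected_peers is mine, and it assumes the State line looks exactly like the one shown above.

```shell
# Count peers reported as connected.
# Reads the output of `gluster peer status` on standard input.
connected_peers() {
    grep -cF 'State: Peer in Cluster (Connected)'
}

# Example fed with the two output lines shown above.
printf 'Number of Peers: 1\nState: Peer in Cluster (Connected)\n' | connected_peers

# Real usage on a node:
#   sudo gluster peer status | connected_peers
```

With two storage nodes, each node should report exactly one connected peer.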
Set up the directories
Now we must create the data directories on our server nodes. On both nodes we will use /home/user/shared/volumes/statics to store the data, so we will run this on both servers:
$ sudo mkdir -p /home/user/shared/volumes/statics
This directory may be auto-created by glusterfs, but I like to create it by hand to assign the owner and permissions before anything else (chown/chmod the way you like).
Why inside /home? That’s just my case: on my VPS (where I’m testing all of these instructions) I have more storage space under this directory.
Create the replicated volume
We will create a replicated file system with two bricks (one on storagenode1.mydomain.com and the other on storagenode2.mydomain.com):
$ sudo gluster volume create statics replica 2 transport tcp storagenode1.mydomain.com:/home/user/shared/volumes/statics storagenode2.mydomain.com:/home/user/shared/volumes/statics
Creation of volume statics has been successful. Please start the volume to access data.
If the operation fails here, check that our servers can see each other: ping them, and look at
$ sudo gluster peer status
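to verify that they are both in the cluster. If they are not, you can make the nodes forget their peers by deleting the files in /etc/glusterd/peers/* (newer glusterfs versions keep them under /var/lib/glusterd/peers/ instead). You can delete all of them, but this is not recommended if you have running clusters. Then stop the daemons and start everything again.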
Our volume name is statics. Let’s get some information:
$ sudo gluster volume info statics
Volume Name: statics
Number of Bricks: 2
Let’s start the volume and enjoy:
$ sudo gluster volume start statics
Starting volume statics has been successful
Now both nodes are serving our volume. Running volume info again will say Status: Started.
Mount the volume on staticnode1.mydomain.com
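A script that mounts the volume may want to confirm this first. A minimal sketch, assuming the Status line is printed exactly as mentioned above (the function name volume_started is mine):

```shell
# Succeed when the `gluster volume info` output read from standard input
# reports the volume as started.
volume_started() {
    grep -q '^Status: Started'
}

# Example fed with sample lines; a real run would be:
#   sudo gluster volume info statics | volume_started
if printf 'Volume Name: statics\nStatus: Started\nNumber of Bricks: 2\n' | volume_started; then
    echo "volume is started"
else
    echo "volume is NOT started"
fi
```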
Create a directory, for example in /mnt/statics:
$ sudo mkdir -p /mnt/statics
And then, mount our replicated volume there:
$ sudo mount -t glusterfs staticnode1.mydomain.com:statics /mnt/statics
Here it is, we have the shared volume operating. We can try df:
$ df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 10G 6,9G 3,1G 69% /
/dev/xvdb1 100G 10G 90G 10% /home
udev 10M 0 10M 0% /dev
tmpfs 101M 148K 101M 1% /run
/dev/disk/by-label/DOROOT 30G 6,9G 22G 25% /
tmpfs 5,0M 0 5,0M 0% /run/lock
tmpfs 201M 0 201M 0% /run/shm
staticnode1.mydomain.com:statics 50G 10G 40G 20% /mnt/statics
The total size of our volume is that of the smallest brick. The df above was executed on staticnode1, where /home is a 100G device, but on staticnode2.mydomain.com /home is a 50G device, so 50G is the maximum space the replicated volume can use, even though staticnode1 has more room. Also, the 10G reported as used is NOT the space used by our statics volume alone: it is the space used on the underlying device, which will differ from what the replicated volume holds if the device is not exclusive to it.
Let’s play a little bit!
We can create/copy files into our volume (/mnt/statics):
$ touch /mnt/statics/hola
$ cp /boot/initrd* /mnt/statics/
If we ls /home/user/shared/volumes/statics, we will see the files there. On staticnode2.mydomain.com we can also ls /home/user/shared/volumes/statics and the files will be there too. They may lag a bit if the network is congested or the files are very large, but in my experience it behaves well.
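Seeing the file names is a good sign, but you can also check that the content really matches by comparing checksums. The sketch below uses a helper I named same_content and demonstrates it on temporary files. On the cluster you would point it at the mounted file and the brick copy, and to compare against the other node you would need to fetch the checksum over ssh or similar.

```shell
# Succeed when two files have identical content (compared by checksum).
same_content() {
    sum_a=$(cksum < "$1")
    sum_b=$(cksum < "$2")
    [ "$sum_a" = "$sum_b" ]
}

# Self-contained demo with temporary files standing in for the two copies.
tmp=$(mktemp -d)
echo "hola" > "$tmp/a"
cp "$tmp/a" "$tmp/b"
if same_content "$tmp/a" "$tmp/b"; then
    echo "copies match"
fi
rm -rf "$tmp"

# On a storage node you could compare the mounted file with the brick copy:
#   same_content /mnt/statics/hola /home/user/shared/volumes/statics/hola
```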
Connecting an external client
If we want to connect an external client to our cluster, all we have to do is mount the volume, as we did on our server. The client must be able to resolve the host that serves the files (edit /etc/hosts if it can’t), and the glusterfs-client package must be installed.
$ sudo mount -t glusterfs staticnode1.mydomain.com:statics /mnt/wherever
It’s also recommended to set the auth.allow option on our volume. By default all clients are allowed, which is not always desirable. Just write what follows:
$ sudo gluster volume set statics auth.allow 10.1.2.3
10.1.2.3 is our client address. We can also use wildcards (10.1.2.*) or comma-separated addresses without spaces (10.1.2.3,10.1.2.5).
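Since the address list must not contain spaces, a tiny helper can build it from separate arguments. This is just a convenience sketch; join_addresses is a name I made up.

```shell
# Join the given addresses with commas and no spaces, as auth.allow expects.
join_addresses() {
    ( IFS=,; echo "$*" )
}

# Prints: 10.1.2.3,10.1.2.5
join_addresses 10.1.2.3 10.1.2.5

# Real usage:
#   sudo gluster volume set statics auth.allow "$(join_addresses 10.1.2.3 10.1.2.5)"
```

The subshell keeps the IFS change local, so the rest of your script is not affected.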
Installing to fstab
The best way to automate all of this is to add an fstab line, so the volume is mounted automatically at system boot and is also easier to mount manually. Let’s edit /etc/fstab and include this line:
staticnode1.mydomain.com:statics /mnt/statics/ glusterfs defaults,_netdev 0 0
The _netdev option is interesting because it makes the system mount our network device only after the network is up and working (if we tried to mount it earlier, the system would boot slower waiting for the network and would probably fail to mount the device). Of course, we have to do this on every host that needs access to our cluster volume.
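If you add the line from a provisioning script, it’s worth making that step safe to run twice, so the entry doesn’t get duplicated. A minimal sketch (add_fstab_line is my own helper name), demonstrated against a temporary file instead of the real /etc/fstab:

```shell
# Append a line to an fstab-style file only if that exact line is not there yet.
add_fstab_line() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a temporary file standing in for /etc/fstab.
demo=$(mktemp)
entry='staticnode1.mydomain.com:statics /mnt/statics/ glusterfs defaults,_netdev 0 0'
add_fstab_line "$demo" "$entry"
add_fstab_line "$demo" "$entry"   # second call changes nothing
cat "$demo"
rm -f "$demo"

# On a real host, run as root (sudo does not work with shell functions):
#   add_fstab_line /etc/fstab "$entry"
# and test the new entry without rebooting:
#   mount -a
```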
Some more play
Glusterfs has lots of options and I’m not writing about all of them; the official documentation has much more information. But some interesting options (for me) are:
- $ sudo gluster volume set [volume_name] nfs.disable 1
Glusterfs can offer NFS as an option for file transport, and it’s fast, but sometimes it’s not the best option. This setting disables it for the volume.
- $ sudo gluster peer detach [peer]
The opposite of peer probe: it makes this server forget about [peer].
- $ sudo gluster volume set [volume_name] auth.allow/auth.reject [peer/ip]
Allows or rejects a client for this volume.
- $ sudo gluster volume set [volume_name] network.ping-timeout [seconds]
Seconds to wait before considering a host unavailable or disconnected. Use it wisely: re-establishing connections is expensive and sometimes there are micro-cuts in our network, but this value (it defaults to 42 seconds) must be low enough to avoid waiting too long for an unavailable resource. I think that on a local network it can be 1 or 2 seconds.
- $ sudo gluster volume set [volume_name] cluster.min-free-disk [percent]
Minimum free disk space to keep. If we’re not using a dedicated partition for our shared volumes, this is a great idea to avoid filling up the disk.
- $ sudo gluster volume stop [volume_name]
Stops a volume.
Feel free to comment with questions and suggestions. I’d like to make this guide more complete.