Update: this was a waste of time; AoE on Linux this way is far too slow, and it wasn't anything to do with the encryption or LUKS. I got the same result when I redid it with mdadm.
This year's project started out as an idea for building a performant SQL database. The size of the data I plan to process is larger than the system memory I have, so it will be I/O limited. I have a mix of SSD and mechanical drives available.
I have some ideas to explore so this blog will follow the success and failure of those ideas as I explore the space.
Hardware
I have 5x Western Digital WD RE2-GP WD1000FYPS 1 TB hard drives. I got those for £18 each from eBay. I made sure these were not shingled (SMR) drives.
I had a spare motherboard which had been declared unrepairable; the company I worked for paid £2000 for a replacement. I was tasked with disposing of it, so I took it home, along with a free case and a PSU. But when I booted it up, I noticed it started and then overheated. I spent £3 on a new CPU fan and it worked once again. It only had 2 GB of RAM, so I replaced that with 16 GB for £47.80.
Finding a case with enough hard drive bays was a challenge. I bought what I thought was one for £30, but it had a misleading description, so I've ended up swapping out my existing terminal instead.
Operating System
The plan is to serve AoE targets and combine them into a RAID. That way the storage capacity is expandable by adding more vblade servers and is not limited by the capacity of a single machine. It also offers the possibility of using different OSes to do the serving. I wanted to use OpenBSD but discovered they had removed AoE from their offering. Thanks, Theo. Plan 9 will serve AoE and I will add that via 9front at some point, but I'm not keen to mess about with that today. I shall use the old warhorse Debian for the moment.
A beauty of simply serving vblades is that the host stays simple. It just needs decent I/O (which might not be in Linux's favour), and you can swap the OS out for another and still just serve up the raw disk.
For some reason, USB storage devices seem to have complexified. Just dd-ing the ISO to the USB device doesn't reliably work any more. So one technique I like now is to use QEMU to boot the install disk and write straight to the drive.
With the target drive attached to a Linux box, where it shows up as /dev/sdf, start QEMU up like so:
qemu-system-x86_64 --enable-kvm -m 1024 -cdrom debian-10.4.0-amd64-netinst.iso -drive file=/dev/sdf,driver=raw
(make sure the -m 1024 is there; the Debian installer will crash with a cryptic message if it's short of memory)
and then run the Debian installer as normal.
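A sanity check worth doing before any of that: confirm /dev/sdf really is the drive to be overwritten, since QEMU writes the installation straight to it. Something like

lsblk -o NAME,SIZE,MODEL,TRAN /dev/sdf

should show the size, model and transport you expect.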
Next up, add the vblade package with apt install vblade, then a small script to export each drive:
#!/bin/sh
if=enp3s0
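# export each disk as an AoE target: vbladed [-d] shelf slot netif device
# -d opens the device with O_DIRECT; everything goes on shelf 1, slots 1-5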
/usr/sbin/vbladed -d 1 1 $if /dev/sdb
/usr/sbin/vbladed -d 1 2 $if /dev/sdc
/usr/sbin/vbladed -d 1 3 $if /dev/sdd
/usr/sbin/vbladed -d 1 4 $if /dev/sde
/usr/sbin/vbladed -d 1 5 $if /dev/sdf
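For this to survive a reboot the script needs to run at boot. A minimal sketch of a systemd unit for that, assuming the script above is saved as /usr/local/bin/serve-vblades.sh (the path and unit name are my own choices, nothing vblade ships):

[Unit]
Description=Export local disks as AoE targets
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/serve-vblades.sh

[Install]
WantedBy=multi-user.target

Drop it in /etc/systemd/system/vblade-exports.service and systemctl enable it; vbladed backgrounds itself, so oneshot with RemainAfterExit is enough.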
So that's the first setup. The next stage is to create the software RAID on another machine.
RAIDin time
Over to another machine, and we can test the vblades:
apt install aoetools
Then seek out the vblades; each target shows up as e<shelf>.<slot>:
root@hex:/ # aoe-discover; aoe-stat
e1.1 1000.204GB enp3s0 1024 up
e1.2 1000.204GB enp3s0 1024 up
e1.3 1000.204GB enp3s0 1024 up
e1.4 1000.204GB enp3s0 1024 up
e1.5 1000.204GB enp3s0 1024 up
All looks good. Now let's join them together, and for that we'll need LVM. At least I hope this is the right way; I've never done it before, and I'm copying this process from the Gentoo wiki.
apt install lvm2
Initialise the partitions as LVM physical volumes. The disks already had a partition table, so hopefully this works. I've only left the messages in for the first one.
# lvm pvcreate /dev/etherd/e1.1p1
WARNING: ntfs signature detected on e1.1p1 at offset 3. Wipe it? [y/n]: y
Wiping ntfs signature on e1.1p1.
Physical volume "e1.1p1" successfully created.
lvm pvcreate /dev/etherd/e1.2p1
lvm pvcreate /dev/etherd/e1.3p1
lvm pvcreate /dev/etherd/e1.4p1
lvm pvcreate /dev/etherd/e1.5p1
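As a quick check of my own (not part of the wiki's recipe), pvs should now list all five physical volumes:

# pvs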
Then join them together into a volume group:
# cd /dev/etherd
# vgcreate raid0vg0 e1.1p1 e1.2p1 e1.3p1 e1.4p1 e1.5p1
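Again just as a check, vgs should report one volume group built from five PVs, at roughly 4.5 TiB:

# vgs raid0vg0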
The instructions there use RAID1, so for this first experiment I'll do that. It isn't what I plan to use, but for now I'll go with it for learning's sake.
root@hex:/dev/etherd# lvcreate --mirrors 1 --type raid1 -l 100%FREE --nosync -n raid0lv0 raid0vg0
WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
Logical volume "raid0lv0" created.
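Something the wiki doesn't mention but which matters for the performance numbers later: lvs can show which physical volumes the mirror legs actually landed on.

# lvs -a -o name,devices raid0vg0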
The Wiki creates an EXT4 filesystem on this
# mkfs.ext4 /dev/raid0vg0/raid0lv0
mke2fs 1.43.4 (31-Jan-2017)
Creating filesystem with 488377344 4k blocks and 122101760 inodes
Filesystem UUID: 8c8a45f5-27fa-4964-befa-143d32365878
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
This took about 30 s, and two of the 5 vblades were at about 15% CPU during that time. It is now mountable:
root@hex:/dev# mkdir /mnt/raid
root@hex:/dev# mount /dev/raid0vg0/raid0lv0 /mnt/raid
root@hex:/dev# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/raid0vg0-raid0lv0 1.8T 77M 1.7T 1% /mnt/raid
Running fio on that got some terrible speeds.
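I didn't keep the exact invocation, but it was a single-job run waiting on fsync, something along these lines (the size and test file here are illustrative):

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=1g --end_fsync=1 --filename=/mnt/raid/test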
WRITE: io=1432.9MB, aggrb=3206KB/s, minb=3206KB/s, maxb=3206KB/s, mint=457632msec, maxt=457632msec
3 MB/s! I don't know if it was disk-constrained or what. Only e1.1 and e1.2 had any activity in iotop, so perhaps the RAID level is a factor: a two-leg mirror doesn't stripe across all the disks. I'm also going to turn on jumbo frames
# ip link set enp3s0 mtu 9000
and see if that makes a difference. I'm also going to run iftop as well to see what that says
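One assumption in that step: jumbo frames only help end to end, so the MTU presumably needs raising on the vblade server's interface as well, and any switch in the path has to pass them. It's the same command on the server:

# ip link set enp3s0 mtu 9000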
WRITE: io=1446.9MB, aggrb=6335KB/s, minb=6335KB/s, maxb=6335KB/s, mint=233857msec, maxt=233857msec
At least that doubled it. Interestingly, the traffic didn't show up on the server side. Ah, RTFM: fio was running a single process and waiting for an fsync. Try it multi-job and without waiting for fsync, like a real server:
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=16 --size=4g --iodepth=8 --runtime=60 --time_based --end_fsync=0 --filename=/dev/mapper/raid0vg0-raid0lv0
and the result is a saturated 1Gb network link, just like I originally expected
WRITE: io=66441MB, aggrb=1105.7MB/s, minb=58076KB/s, maxb=76711KB/s, mint=60001msec, maxt=60092msec
So what good is a 1 Gb disk? SATA on its own would be faster than that.... The answer is why I started this: I have a couple of 10GbE NICs and a dual-10GbE motherboard (I haven't got that going yet, I need a VGA cable!).
On top of that, we have encryption to do, and we can do that at 1 Gb.
# apt install cryptsetup
Need to reboot after this, and then get the LVM back into the system:
# vgchange -a y raid0vg0
# lvchange -a y raid0vg0 # (this might not be needed)
And then we can encrypt the volume
# cryptsetup luksFormat -c aes-xts-plain64:sha256 -s 256 /dev/raid0vg0/raid0lv0
And then map the partition; it appears in /dev/mapper:
cryptsetup luksOpen /dev/raid0vg0/raid0lv0 raid0lv0encrypted
And then we can make a filesystem on it
# mkfs.ext4 /dev/mapper/raid0lv0encrypted
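After that it mounts like any other filesystem (the mount point is just my choice):

# mkdir /mnt/raidenc
# mount /dev/mapper/raid0lv0encrypted /mnt/raidenc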
Once I got my ducks in a row with MTUs and was testing the correct volume, I got
WRITE: io=85259MB, aggrb=1420.5MB/s, minb=76220KB/s, maxb=103820KB/s, mint=60001msec, maxt=60023msec
Which isn't good - and 100% CPU on the client end doing the encryption - it certainly got the fans working!
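Something to check next: whether this CPU has AES-NI at all. cryptsetup benchmark measures the kernel's cipher throughput, which puts a ceiling on what LUKS can manage here:

# cryptsetup benchmark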