Record of some of the computer tech I deal with so that it's documented at least somewhere.

Sunday 14 June 2020

Building an Encrypted ATA over Ethernet (AoE) RAID Network for Serving Big Data? Don't Bother

This was a waste of time: AoE on Linux done this way is far too slow - and it wasn't anything to do with the encryption or LUKS. I got the same result with mdadm.

This year's project started out as an idea for building a performant SQL database. The size of the data I plan to process is larger than the system memory I have, so it will be I/O limited. I have a mix of SSD and mechanical drives available.

I have some ideas to explore so this blog will follow the success and failure of those ideas as I explore the space.

Hardware

I have 5x Western Digital WD RE2-GP WD1000FYPS 1 TB hard drives. I got them for £18 each from eBay, making sure they were not shingled (SMR) drives.

I had a spare motherboard which had been declared unrepairable - the company I worked for paid £2000 for a replacement and tasked me with disposing of the old one, so I took it home: a free case and PSU. When I booted it up, I noticed it started and then overheated. I spent £3 on a new CPU fan and it worked once again. It only had 2 GB of RAM, so I replaced that with 16 GB for £47.80.

Finding a case with enough hard drive bays was a challenge. I bought what I thought was one for £30, but it had a misleading description, so I've ended up swapping out my existing terminal.

Operating System

The plan is to serve AoE targets and combine them into a RAID. That way the storage capacity is expandable by adding more vblade servers and is not limited by the capacity of a single machine. It also offers the possibility of using different OSes to do the serving. I wanted to use OpenBSD but discovered they had removed AoE from their offering. Thanks Theo. Plan 9 will serve AoE and I will add that via 9front at some point, but I'm not keen to mess about with that today. I shall use the old warhorse Debian for the moment.

A beauty of simply serving vblades is that the host stays simple. It just needs decent I/O (which might not be in Linux's favour), and you can swap your OS out for another and just serve up the raw disk.

For some reason, USB storage devices seem to have become more complicated. Simply dd'ing the ISO to the USB device doesn't always work any more. So one technique I like now is to use QEMU to boot the install disc and write straight to the drive.

With the target drive attached to a Linux box, where it shows up as /dev/sdf, start QEMU like so:

qemu-system-x86_64 --enable-kvm -m 1024 -cdrom debian-10.4.0-amd64-netinst.iso -drive file=/dev/sdf,format=raw

(make sure the -m 1024 is there - the Debian installer will crash with a cryptic message without it)
and then run the Debian installer as normal.
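Since pointing QEMU at the wrong /dev node overwrites the wrong disk, a small guard is cheap insurance. A hedged sketch - the leading echo makes it a dry run (drop it to actually launch the installer), and /dev/sdf is just the example path from above:

```shell
#!/bin/sh
# Dry-run sketch: refuse to hand QEMU anything that isn't a block device.
# TARGET defaults to the example path; set it to wherever your USB stick appears.
TARGET=${TARGET:-/dev/sdf}
if [ -b "$TARGET" ]; then
  # Drop the leading echo to actually launch the installer.
  echo qemu-system-x86_64 --enable-kvm -m 1024 \
    -cdrom debian-10.4.0-amd64-netinst.iso \
    -drive file="$TARGET",format=raw
else
  echo "refusing: $TARGET is not a block device"
fi
```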

Next up, add the vblade package: apt install vblade

#!/bin/sh
if=enp3s0
/usr/sbin/vbladed -d 1 1 $if /dev/sdb
/usr/sbin/vbladed -d 1 2 $if /dev/sdc
/usr/sbin/vbladed -d 1 3 $if /dev/sdd
/usr/sbin/vbladed -d 1 4 $if /dev/sde
/usr/sbin/vbladed -d 1 5 $if /dev/sdf
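Each vbladed export is addressed by a shelf and slot number (the 1 1 through 1 5 arguments above), and initiators see them as /dev/etherd/e<shelf>.<slot>. A quick sketch just to make the mapping explicit:

```shell
#!/bin/sh
# Print the device nodes an initiator will see for shelf 1, slots 1-5,
# matching the five vbladed exports above.
shelf=1
for slot in 1 2 3 4 5; do
  echo "/dev/etherd/e${shelf}.${slot}"
done
```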

So that's the first setup; the next stage is to create the software RAID on another machine.

RAIDin time

Over to another machine and we can test the vblades

apt install aoetools

Then seek out the vblades

root@hex:/ # aoe-discover; aoe-stat
      e1.1      1000.204GB  enp3s0 1024 up
      e1.2      1000.204GB  enp3s0 1024 up
      e1.3      1000.204GB  enp3s0 1024 up
      e1.4      1000.204GB  enp3s0 1024 up
      e1.5      1000.204GB  enp3s0 1024 up

All looks good. Now let's join them together; for that we'll need LVM. At least I hope this is the right way - I've never done it before, and I'm copying this process from the Gentoo wiki.

apt install lvm2

Tag the partitions as LVM physical volumes. The disks already had a partition table, so hopefully this works - I've only left the output in for the first one.

# lvm pvcreate /dev/etherd/e1.1p1
WARNING: ntfs signature detected on e1.1p1 at offset 3. Wipe it? [y/n]: y
  Wiping ntfs signature on e1.1p1.
  Physical volume "e1.1p1" successfully created.
# lvm pvcreate /dev/etherd/e1.2p1
# lvm pvcreate /dev/etherd/e1.3p1
# lvm pvcreate /dev/etherd/e1.4p1
# lvm pvcreate /dev/etherd/e1.5p1
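The five pvcreate calls can be scripted rather than typed out. A sketch, printed as a dry run (remove the echo to execute for real):

```shell
#!/bin/sh
# Dry run: print the pvcreate command for each of the five AoE partitions.
for i in 1 2 3 4 5; do
  echo lvm pvcreate "/dev/etherd/e1.${i}p1"
done
```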

Then join them together

# cd /dev/etherd
# vgcreate raid0vg0 e1.1p1 e1.2p1 e1.3p1 e1.4p1 e1.5p1

The instructions there use RAID1, so for this first experiment I'll do that. It's not what I plan to use in the end, but I'll go with it for learning's sake.

root@hex:/dev/etherd# lvcreate --mirrors 1 --type raid1 -l 100%FREE --nosync -n raid0lv0 raid0vg0
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "raid0lv0" created.

The wiki creates an ext4 filesystem on this:

# mkfs.ext4 /dev/raid0vg0/raid0lv0
mke2fs 1.43.4 (31-Jan-2017)
Creating filesystem with 488377344 4k blocks and 122101760 inodes
Filesystem UUID: 8c8a45f5-27fa-4964-befa-143d32365878
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

This took about 30s, and two of the five vblades were at about 15% CPU during that time. It is now mountable.

root@hex:/dev# mkdir /mnt/raid
root@hex:/dev# mount /dev/raid0vg0/raid0lv0 /mnt/raid
root@hex:/dev# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/raid0vg0-raid0lv0  1.8T   77M  1.7T   1% /mnt/raid

Running fio on that got some terrible speeds:

WRITE: io=1432.9MB, aggrb=3206KB/s, minb=3206KB/s, maxb=3206KB/s, mint=457632msec, maxt=457632msec

3 MB/s! I don't know if it was disk constrained or what. Only e1.1 and e1.2 had any activity in iotop, so perhaps the RAID level is a factor - it's not striping across all the disks. I'm also going to turn on jumbo frames

# ip link set enp3s0 mtu 9000
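The reason jumbo frames matter for AoE: data moves as whole 512-byte sectors inside a single Ethernet frame, so the usable payload per frame is roughly floor((MTU - headers) / 512) sectors. A rough sketch - the 22-byte header figure for the AoE and ATA command headers inside the MTU is an approximation, and the driver may round the count down further:

```shell
#!/bin/sh
# Sectors of payload per Ethernet frame at each MTU, assuming ~22 bytes of
# AoE + ATA headers inside the MTU (approximate; the driver may round down).
hdr=22
for mtu in 1500 9000; do
  sectors=$(( (mtu - hdr) / 512 ))
  echo "MTU ${mtu}: ${sectors} sectors (~$(( sectors * 512 )) bytes) per frame"
done
```

The 2-sector (1024-byte) figure at the default MTU lines up with the 1024 payload column shown by aoe-stat earlier.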

and see if that makes a difference. I'm also going to run iftop to see what it says.

WRITE: io=1446.9MB, aggrb=6335KB/s, minb=6335KB/s, maxb=6335KB/s, mint=233857msec, maxt=233857msec

At least that doubled it. Interestingly, it didn't show up on the server side. Ah, RTFM: it was a single process waiting on fsync. Try it multi-threaded and don't wait for fsync, like a real server:

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=16 --size=4g --iodepth=8 --runtime=60 --time_based --end_fsync=0 --filename=/dev/mapper/raid0vg0-raid0lv0

and the result is a saturated 1Gb network link, just like I originally expected

WRITE: io=66441MB, aggrb=1105.7MB/s, minb=58076KB/s, maxb=76711KB/s, mint=60001msec, maxt=60092msec

So what good is a 1Gb disk? SATA on its own would be faster than that... the answer is why I started this: I have a couple of 10GbE NICs and a dual-10GbE motherboard (I haven't got that going yet - I need a VGA cable!).
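Rough numbers behind that remark - line-rate ceilings, ignoring protocol overhead (approximate figures for illustration):

```shell
#!/bin/sh
# 1Gb Ethernet: 1000 Mbit/s / 8 bits per byte = ~125 MB/s ceiling.
# SATA III: 6 Gbit/s with 8b/10b encoding (10 bits per byte on the wire)
#           = ~600 MB/s ceiling.
echo "1GbE:     $(( 1000 / 8 )) MB/s"
echo "SATA III: $(( 6000 / 10 )) MB/s"
```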

On top of that, we have encryption to do, and we can do that at 1Gb.

# apt install cryptsetup

A reboot is needed after this. Then get the LVM back into the system:

# vgchange -a y raid0vg0
# lvchange -a y raid0vg0    (this might not be needed)

And then we can encrypt the volume

# cryptsetup luksFormat -c aes-xts-plain64:sha256 -s 256 /dev/raid0vg0/raid0lv0

And then map the partition; it appears in /dev/mapper:

# cryptsetup luksOpen /dev/raid0vg0/raid0lv0 raid0lv0encripted

And then we can make a filesystem on it

# mkfs.ext4 /dev/mapper/raid0lv0encripted

Once I got my ducks in a row with MTUs and was testing the correct volume, I got

WRITE: io=85259MB, aggrb=1420.5MB/s, minb=76220KB/s, maxb=103820KB/s, mint=60001msec, maxt=60023msec

Which isn't good - and 100% CPU on the client end doing the encryption. It certainly got the fans working!
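With the client CPU pegged doing AES, it's worth checking what the processor can sustain in isolation: cryptsetup ships an in-memory cipher benchmark that touches no disks and needs no root. A sketch, guarded so it degrades gracefully where cryptsetup isn't installed:

```shell
#!/bin/sh
# In-memory cipher throughput test; no block devices involved.
if command -v cryptsetup >/dev/null 2>&1; then
  cryptsetup benchmark -c aes-xts-plain64
else
  echo "cryptsetup not installed"
fi
```

If the numbers here are far above what the fio run achieved, the bottleneck is elsewhere; if they match, the CPU (or lack of AES-NI) is the wall.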
