Record of some of the computer tech I deal with so that it's documented at least somewhere.

Wednesday 27 January 2010

Venti on Linux via p9p

The plan is to back up arenas to DVD.
DVD-Rs store 4707319808 bytes.
Four arenas should be enough to start with.

(when I posted this mycrotiv showed me his venti setup script)

% echo '4 * 4707319808 / 1024' | hoc
18387968
% time dd if=/dev/zero of=arenas bs=1024 count=18387968 # stupid linux dd and its =
real 14m0.207s


venti(8) recommends an index size that is 5% of the active data log.

It doesn't describe the difference between the active data log and any other sort, so I'll assume 5% of the arena size, which in this case is:
% echo '0.05 * 4 * 4707319808 / 1024' | hoc
919398.4
% dd if=/dev/zero of=isect bs=1024 count=919399
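
The same sizing can be sketched in plain shell integer arithmetic, which avoids rounding 919398.4 up by hand. This is only a sketch: the variable names are made up for illustration, and the 5% figure is applied to the whole arenas file, as assumed above.

```shell
# Assumed inputs from the text above: four arenas, one DVD-R each.
dvd_bytes=4707319808
narenas=4

# Size of the arenas file in 1024-byte blocks (what dd's count= expects).
arena_kb=$(( narenas * dvd_bytes / 1024 ))

# 5% of that, rounded up: ceil(x / 20) == (x + 19) / 20 in integer math.
isect_kb=$(( (arena_kb + 19) / 20 ))

echo "arenas: $arena_kb blocks, isect: $isect_kb blocks"
```

The two numbers it prints are the count= values used for the arenas and isect files.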


Seeing as I'm using it to back up all sorts of shizzle, I'm going to include the bloom filter.

The bloom filter thus has two parameters: nhash (maximum 32) and the total bitmap size (maximum 512MB, 2^32 bits). nhash × nblock <= 0.7 × b, where nblock is the expected number of blocks stored on the server and b is the bitmap size in bits.
b = 2^32, nhash = 32
32 × nblock <= 0.7 × 2^32
nblock <= 0.7 × 2^32 / 32
nblock <= 93,952,409

so nblock tops out at about 94 million; anyway the man page says the maximum size is 512MB
% echo '512 * 1024' | hoc
524288
% dd if=/dev/zero of=bloom bs=1024 count=524288
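
As a cross-check on the bloom filter bound, here is a small shell sketch that reads the maximum bitmap size as 2^32 bits (512MB) and nhash as 32, both taken from the man page text quoted above. The variable names are hypothetical.

```shell
# Assumed inputs: the maxima quoted from the man page.
bitmap_bits=$(( 1 << 32 ))   # 512MB bitmap = 2^32 bits
nhash=32

# nhash * nblock <= 0.7 * b, so nblock <= 7 * b / (10 * nhash).
nblock_max=$(( 7 * bitmap_bits / (10 * nhash) ))

# Bitmap size in 1024-byte blocks, i.e. the count= value for dd.
bloom_kb=$(( bitmap_bits / 8 / 1024 ))

echo "nblock <= $nblock_max, bloom file: $bloom_kb blocks"
```

With these inputs the bound comes out at roughly 94 million blocks, and the file size matches the 524288 blocks given to dd.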


That's the storage reserved, now for formatting.

I'm going to round down the actual arena size a bit
% time venti/fmtarenas -a 4705000000 -Z arenas0. /home/venti/arenas # I dd'd in zero
fmtarenas /home/venti/arenas: 5 arenas, 18,828,484,608 bytes storage, 524,288 bytes for index map
real 0m1.312s
% time venti/fmtisect -Z isect0. /home/venti/isect
fmtisect /home/venti/isect: 114,827 buckets of 215 entries, 524,288 bytes for index map
real 0m0.513s
% time venti/fmtbloom -N 32 /home/venti/bloom
fmtbloom: using 512MB, 32 hashes/score, best up to 95,443,717 blocks
real 0m26.742s


Ok, that looks like the formatting is done; time to set up venti.conf
% cat venti.conf
index main
isect /home/venti/isect
arenas /home/venti/arenas
bloom /home/venti/bloom
httpaddr tcp!127.0.0.1!808
% time /usr/local/plan9/bin/venti/fmtindex venti.conf
fmtindex: 5 arenas, 114,827 index buckets, 18,828,402,688 bytes storage
real 0m0.483s


hmm, ok, let's run this sucker

% /usr/local/plan9/bin/venti/venti
2010/0127 21:12:22 venti: conf...httpd tcp!127.0.0.1!808...init...icache 0 bytes = 1,000 entries; 4 scache
sync...announce tcp!*!venti...serving.
% wget -q -O - http://127.0.0.1:808/storage
index=main
total arenas=5 active=0
total space=18,828,402,688 used=0
clumps=0 compressed clumps=0 data=0 compressed data=0


as usual, depressingly straightforward, I didn't even need a loopback

1 comment:

ikrabbe said...

After some discussion on #plan9 @freenode.net yesterday, we came to the conclusion that your guide works here, but it has a problem when you give fmtarenas an arena size of 4705000000, as that's not a multiple of the block size.

The reserved bytes will just be unused, but this introduces another pitfall:

an rdarena/wrarena cycle fails, because rdarena cuts the image it reads down to a multiple of the block size; here: "venti/wrarena: arena is truncated: want 4705000000 bytes have 4704993280".

So it is much better to increase or reduce the arena size to a multiple of 8192, or whatever you want the block size to be.

Anyway, your blog entry is quite helpful, and we probably should fix that rdarena thing, or at least spit out a warning in fmtarenas.

cheers, ikrabbe
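
Following up on the comment, a minimal sketch of rounding the arena size down to a block boundary before formatting. It assumes the 8192-byte block size mentioned in the comment; the variable names are only for illustration.

```shell
# Assumptions: 8192-byte blocks (from the comment) and the arena size used in the post.
blocksize=8192
want=4705000000

# Integer division drops the remainder, so this rounds down to a multiple of blocksize.
arenasize=$(( want / blocksize * blocksize ))

echo "$arenasize"
```

The result is the same 4704993280 that wrarena reports having, so passing it to fmtarenas with -a instead of 4705000000 should keep the two in agreement.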