Record of some of the computer tech I deal with so that it's documented at least somewhere.

Saturday 20 June 2009

Serialising multiple writers in the shell

A few years ago I came across wait_on for monitoring file changes on FreeBSD (it's built on kqueue) from the shell. Linux has inotify-tools for using inotify the same way. Using these tools we can build a fifo daemon with multiple writers and one reader. I'm not sure if Plan 9 already has a way to do this in userland, maybe even by using snoopy.
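The core of the idea can be sketched in plain Bourne shell without inotify at all (every path and filename below is invented for the demo): many writers feed one fifo, and a single reader drains it, so the expensive step runs strictly one item at a time.

```shell
#!/bin/sh
# Minimal sketch of many-writers/one-reader over a plain fifo.
# All paths and names here are made up for the demonstration.
q=$(mktemp -u)
log=$(mktemp)
mkfifo "$q"

# the lone reader: handles one name at a time, in arrival order
( while read -r name; do
      echo "thumbnail $name" >> "$log"
  done < "$q" ) &

exec 3> "$q"     # blocks until the reader above has opened the fifo
for f in cat.png dog.png; do    # stand-ins for several writers
    echo "$f" >&3
done
exec 3>&-        # close the fifo: the reader sees EOF and exits
wait
rm -f "$q"
```

The real scripts below replace the echos with inotify events and the reader with a processing command, but the serialising property is the same: the fifo buffers the writers while exactly one reader works.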

Here's the Linux version I use.


% cat /usr/local/bin/manage_queue
#!/usr/local/plan9/bin/rc

# we assume that / will never appear in a filename and thus is a safe delimiter
DELIMETER=/

fn waiter {
	inotifywait -q $3 --format '%f'^$DELIMETER -e $2 $1
}

fn watch {
	if(~ $1 -1) # do it once
		waiter $*(2-)
	if not # do it forever
		waiter $1 $2 -m
}

fn process {
	read_until $DELIMETER | sed 's/.$//' | $*
	/usr/local/plan9/bin/dd -ibs 1 -count 1 >[2] /dev/null
}

# start the processing cmd in the queue directory
if(~ $1 -1) { # do it once
	cd $2
	watch -1 $2 $3 | process $*(4-)
}
if not { # ad infinitum
	cd $1
	watch $1 $2 | while() { process $*(3-) }
}


and you invoke it thus:
su -c 'manage_queue /tmp/uploaded_images/ moved_to make_thumbnails' uploadprocessor
although I use daemontools, so that line is the contents of /etc/service/uploads/run.

My other usual option is close_write instead of moved_to; it depends on how your files get there and how they are processed once they arrive.

The -1 option is for doing a spot of debugging: it processes one file and then exits.

The bit of script missing from there is read_until:


% cat /usr/local/bin/read_until
#!/usr/local/plan9/bin/rc

ifs=()

while() {
	c=`{/usr/local/plan9/bin/dd -ibs 1 -count 1 >[2] /dev/null}
	if(~ $#c 0)
		echo -n ' '
	if not
		switch($c) {
		case $1
			exit
		case *
			echo -n $c
		}
}


I had to write that because newline is a valid filename character, which defeats the various line-reading commands.
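Here's the failure mode, sketched in portable sh (the filename is invented for the demo): a name containing a newline splits in two under a line read, but survives a delimiter-based read of the kind read_until performs byte by byte.

```shell
#!/bin/sh
# Why line-oriented reads fail: a legal filename can contain a newline.
# (The name is invented for the demonstration.)
name=$(printf 'evil\nname.png')

# a naive line read sees only the first "line" of the name
half=$(printf '%s\n' "$name" | head -n 1)

# reading up to an explicit / delimiter, as read_until does, keeps it whole
whole=$(printf '%s/' "$name" | awk 'BEGIN { RS = "/" } NR == 1 { printf "%s", $0 }')
```

Any delimiter guaranteed absent from filenames works; / is the one character no Unix filename may contain, which is why the queue script picks it.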

Now you've got the thing going, you'll need a processor like make_thumbnails, which I've made up here as an example (gm is GraphicsMagick, which kept the older API when ImageMagick's changed):

#!/usr/local/plan9/bin/rc
ifs=()
fin=`{cat}
gm convert -resize 100x100 /tmp/uploads/$fin /var/www/thumbs/$fin.jpeg
gm convert -resize 350x500 /tmp/uploads/$fin /var/www/mids/$fin.jpeg



So, using this mechanism, the user uploads the image via a web browser; when the CGI is satisfied it can do 'mv /uploadir/$up /tmp/uploads/$up' and carry on sending HTML to the user, and the thumbnail will get made when it reaches the front of the queue.
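The hand-off works because rename within one filesystem is atomic, so the moved_to watcher never sees a half-written file, which a copy into the queue directory could expose. A sketch, with names and directories invented for the demo:

```shell
#!/bin/sh
# Sketch of the CGI's hand-off. Directories are throwaway stand-ins
# for the real upload and queue paths.
up=pic123.png
cgidir=$(mktemp -d)      # where the CGI assembled the upload
queue=$(mktemp -d)       # the watched queue directory
printf 'fake image bytes' > "$cgidir/$up"

# atomic within one filesystem: the file appears in the queue whole
mv "$cgidir/$up" "$queue/$up"
```

This is also why the upload directory and the queue directory should live on the same filesystem; across filesystems mv degrades to a copy plus unlink.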

The other issue this can solve is ring-fencing users from each other, communicating only via files and filenames (of which the above is an example, but it works for internal users too). www can write to /tmp/uploads but not to /var/www/*. By passing the uploaded image through gm it (hopefully) gets sanitised; by running gm as its own user you mitigate attacks against gm, and you can use the OS's limiters to let gm only have so much CPU and so on.
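The permission split can be sketched like this (paths and modes are illustrative, set up in a throwaway directory rather than the real system ones):

```shell
#!/bin/sh
# Sketch of the ring-fencing: the web user may only drop files into the
# queue, and only the processing user may write the published results.
root=$(mktemp -d)
mkdir "$root/uploads" "$root/thumbs"
chmod 0733 "$root/uploads"   # others may enter and write, but not list
chmod 0755 "$root/thumbs"    # world-readable, writable only by the processor
# resource limiting would go in the wrapper that launches gm,
# e.g. ulimit -t to cap its CPU seconds
```

With real system users you'd also chown the directories so the queue belongs to the processor and the published tree to root or the processor, never to www.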

By serialising like this it also protects against a DoS that uploads loads of massive images and tries to thumbnail them all in parallel. You've still got the problem of processing them, but at least only one process will be pegged at 100%, not one per upload.

Though I've not done it yet, you could also use it as a marshal which does some load balancing (perhaps using Xcpu), or just use forked threads.
