Record of some of the computer tech I deal with so that it's documented at least somewhere.

Friday, 26 March 2010

Go Slow

You've got cores up to your ears, so stop writing single-threaded code! You can even pick up multi-machine programming; you'll hardly know the difference, and some things can really benefit from it. Imagine image editors grabbing spare cycles from the machines on the LAN as required. It's here today and it's called Go.

In this entry I'm going to take someone's existing Go implementation of the smallpt global illumination renderer and make it concurrent. I hope to demonstrate how easy multi-threading is with CSP-based techniques, even spanning machines!

The single-threaded Go version gives me:
% time ./smallpt 8
Rendering (8 spp) 100.00
721.50 user 5.08 system 12:06.84 elapsed 99%CPU


Single-threaded C++ with -O3:
% time ./small-sthread 8
Rendering (8 spp) 100.00%
61.76u 0.27s 62.04r ./small-sthread 8


OpenMP multi-threaded code, 2 cores + HT:
% time ./small-omp 8
Rendering (8 spp) 100.00%
88.55u 0.90s 24.83r ./small-omp 8


Oh dear, the Go version is amazingly piss poor: 12 minutes against 62 seconds for the single-threaded C++. That's an order of magnitude to find!

OK, I added an I/O thread to gather the pixels:
time ./smallpt.io 8
Rendering (8 spp) 100.00
849.65 user 54.19 system 14:02.77 elapsed 107%CPU


And not surprisingly I added 50s of syscalls, some 10% overhead, but at least they seem to have ended up on one CPU.
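A minimal sketch of the "one goroutine gathers the pixels" idea, not the code I actually ran. The pixel type, shade and render are hypothetical names made up for illustration, and shade is just a stand-in for the real radiance calculation. The point is that only the collector goroutine ever writes to the image, and everything else talks to it over a channel.

```go
package main

import "fmt"

// pixel is a hypothetical result type: an index into the image plus a value.
type pixel struct {
	idx   int
	value float64
}

// shade stands in for the real per-pixel radiance computation.
func shade(idx int) float64 {
	return float64(idx%256) / 255.0
}

// render computes w*h pixels and hands each one to a collector goroutine.
func render(w, h int) []float64 {
	img := make([]float64, w*h)
	results := make(chan pixel, 64) // buffered so the renderer rarely blocks
	done := make(chan struct{})

	// Collector: the only goroutine that writes to img.
	go func() {
		for p := range results {
			img[p.idx] = p.value
		}
		close(done)
	}()

	for i := 0; i < w*h; i++ {
		results <- pixel{i, shade(i)}
	}
	close(results) // no more pixels; lets the collector's loop end
	<-done         // wait until every pixel has been stored
	return img
}

func main() {
	img := render(4, 4)
	fmt.Println(len(img)) // prints 16
}
```

Every pixel costs one channel send and one receive here, which is the sort of per-item overhead that shows up as extra system time when the work per pixel is small.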

Ultimate overload: 1 thread per pixel.

It crashed.


With a thread pool of 4 on a machine with 2 HT CPUs:
time ./smallpt 8
Submitting (8 spp) 100.00
1106.38 user 369.83 system 15:47.15 elapsed 155% CPU


This used more CPU time but the running time was much the same. All the SMP gain is used up by the overhead.
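For reference, a fixed pool of workers fed from a channel can be sketched roughly like this. It is an illustration under assumptions, not the program timed above: renderPool and shadeRow are hypothetical names, and shadeRow is a dummy in place of the real tracing. One common way to cut per-item channel overhead is to hand out coarser work units, so this sketch hands out whole rows rather than single pixels. Whether that would recover the SMP gain for this renderer is untested here.

```go
package main

import (
	"fmt"
	"sync"
)

// shadeRow is a dummy for the real work: it fills one row of the image.
func shadeRow(y, w int, row []float64) {
	for x := 0; x < w; x++ {
		row[x] = float64(y*w + x)
	}
}

// renderPool renders a w*h image using a fixed number of worker goroutines.
// Each job is a row index; rows are disjoint slices, so no locking is needed.
func renderPool(w, h, workers int) []float64 {
	img := make([]float64, w*h)
	rows := make(chan int)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for y := range rows {
				shadeRow(y, w, img[y*w:(y+1)*w])
			}
		}()
	}

	for y := 0; y < h; y++ {
		rows <- y // blocks until some worker is free
	}
	close(rows) // workers exit once the queue drains
	wg.Wait()
	return img
}

func main() {
	img := renderPool(8, 6, 4)
	fmt.Println(len(img), img[47]) // prints 48 47
}
```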

I manually inlined the function calls, getting rid of the Vec methods:
% time ./smallpt.un 8
Rendering (8 spp) 100.00
273.82 user 5.62 system 4:40.05 elapsed 99% CPU


Shit, now that I've fixed it, it is still 4x slower than the C++!
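To show what "getting rid of the Vec methods" means, here is a small before and after. The Vec type, its Add and Scale methods and the pointAt functions are assumptions for illustration; they are not copied from the smallpt port. The first version builds the result through method calls that each return a temporary Vec, the second writes the same arithmetic out by hand.

```go
package main

import "fmt"

// Vec is a hypothetical 3-component vector, similar in spirit to smallpt's.
type Vec struct{ X, Y, Z float64 }

// Add and Scale are the kind of small value methods being inlined away.
func (a Vec) Add(b Vec) Vec       { return Vec{a.X + b.X, a.Y + b.Y, a.Z + b.Z} }
func (a Vec) Scale(s float64) Vec { return Vec{a.X * s, a.Y * s, a.Z * s} }

// pointAt computes origin + dir*t using method calls.
func pointAt(o, d Vec, t float64) Vec {
	return o.Add(d.Scale(t))
}

// pointAtInlined computes the same point with the arithmetic written out,
// so there are no calls and no intermediate Vec values.
func pointAtInlined(o, d Vec, t float64) Vec {
	return Vec{o.X + d.X*t, o.Y + d.Y*t, o.Z + d.Z*t}
}

func main() {
	o := Vec{1, 2, 3}
	d := Vec{0.5, 0, -1}
	fmt.Println(pointAt(o, d, 2))        // prints {2 2 1}
	fmt.Println(pointAtInlined(o, d, 2)) // prints {2 2 1}
}
```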

One interesting question I have is: do goroutines that aren't currently on a CPU get scheduled in a non-pre-emptive, co-operative mode like Limbo? So if I spawn a goroutine to do slow I/O, will it be interleaved with the heavy CPU work, or will it sit there doing nothing until the CPU-bound code yields?

Goroutine version:
time ./smallpt_gr.go 8
1012.35 user 454.21 system 16:21.66 elapsed 149% CPU


You'll also need the vector code, which I split out into its own library.

Thursday, 4 March 2010

My arenas are too big

I thought having 4GB arenas would be a good match for dumping to DVD, but it's not. You have to read an arena out in one go (there's no offsetting), and although arenas only grow, the daily gpg-encrypted copy of the unsealed arena produces different output depending on the length as well as the contents. So the whole arena has to go offsite again every day, which maximizes the amount of offsite copying.

The ideal solution is to match the arena size to the daily growth and adjust when adding new arenas. I've not added new arenas to my venti yet so maybe there's even a way of automating it.

I'm going to go for 200MB arenas. Gpg manages some compression, so we might get 4 arenas on a CD or 25 on a DVD.
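The arithmetic behind that guess, as a quick sketch. The capacities are my assumptions: a nominal 700MB CD and a 4700MB single-layer DVD, with MB meaning 10^6 bytes and filesystem overhead ignored. The function names are made up for illustration. It shows how many uncompressed 200MB arenas fit, and how small each arena has to get after compression to reach 4 per CD or 25 per DVD.

```go
package main

import "fmt"

// arenasPerDisc returns how many whole arenas of arenaMB fit in discMB.
func arenasPerDisc(discMB, arenaMB int) int {
	return discMB / arenaMB
}

// maxArenaMB returns the largest size (rounded down to whole MB) each
// arena may have if n of them are to fit in discMB.
func maxArenaMB(discMB, n int) int {
	return discMB / n
}

func main() {
	const cdMB, dvdMB, arenaMB = 700, 4700, 200 // assumed nominal capacities

	// Uncompressed arenas per disc.
	fmt.Println(arenasPerDisc(cdMB, arenaMB), arenasPerDisc(dvdMB, arenaMB)) // prints 3 23

	// Size each arena must compress down to for 4 per CD and 25 per DVD.
	fmt.Println(maxArenaMB(cdMB, 4), maxArenaMB(dvdMB, 25)) // prints 175 188
}
```

So under these assumptions the 4 and 25 figures need gpg to shave roughly 6% to 13% off each arena; without any compression it's 3 per CD and 23 per DVD.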

So apart from the occasional massive burst of GB-sized files, the daily "email & logs" reports will be maxed out at 200MB instead of the just under 1.2GB I've got atm. Uploading that offsite is just too much.