Record of some of the computer tech I deal with so that it's documented at least somewhere.

Friday 26 March 2010

Go Slow

Cores up to your ears, so stop writing single-threaded code! You can even pick up multi-machine programming: you'll hardly notice the difference, and some workloads really benefit from it. Imagine image editors grabbing spare cycles from machines on the LAN as required. It's here today and it's called Go.

In this entry I'm going to take someone's existing Go implementation of the smallpt global illumination renderer and try to parallelise it. I hope to demonstrate how easy multi-threading is with CSP-based techniques, even spanning machines!

The single-threaded Go version gives me:
% time ./smallpt 8
Rendering (8 spp) 100.00
721.50 user 5.08 system 12:06.84 elapsed 99%CPU


Single-threaded C++ with -O3:
% time ./small-sthread 8
Rendering (8 spp) 100.00%
61.76u 0.27s 62.04r ./small-sthread 8


OpenMP multi-threaded code, 2 cores + HT:
% time ./small-omp 8
Rendering (8 spp) 100.00%
88.55u 0.90s 24.83r ./small-omp 8


Oh dear, that's amazingly piss poor. The Go version is roughly 12x slower than the single-threaded C++ (12:06 against 1:02 elapsed). An order of magnitude to find!

OK, I added an I/O thread to gather the pixels:
time ./smallpt.io 8
Rendering (8 spp) 100.00
849.65 user 54.19 system 14:02.77 elapsed 107%CPU


And, not surprisingly, I added about 50s of syscalls, some 10% overhead, but at least they seem to have ended up on one CPU.
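For anyone who wants to see the shape of that change, here is a minimal sketch of a gatherer goroutine fed over a channel. It is not the code I timed: the `pixel` type, the `shade` stand-in and the buffer size of 64 are all made up for illustration, and it is written in present-day Go syntax.

```go
package main

import "fmt"

// pixel is a hypothetical result type: an index into the image plus a value.
type pixel struct {
	idx int
	val float64
}

// shade is a stand-in for the real per-pixel radiance calculation.
func shade(idx int) float64 {
	return float64(idx%7) / 7.0
}

// render computes w*h pixels in the calling goroutine and hands each
// result to a separate gatherer goroutine over a channel.
func render(w, h int) []float64 {
	img := make([]float64, w*h)
	results := make(chan pixel, 64) // buffered so the renderer rarely blocks
	done := make(chan struct{})

	// Gatherer: the only goroutine that writes to img.
	go func() {
		for p := range results {
			img[p.idx] = p.val
		}
		close(done)
	}()

	for i := 0; i < w*h; i++ {
		results <- pixel{i, shade(i)}
	}
	close(results) // no more pixels; lets the gatherer's range loop end
	<-done         // wait until every pixel has been stored
	return img
}

func main() {
	img := render(4, 3)
	fmt.Println(len(img), img[5])
}
```

Every pixel costs one channel send here, which is one plausible place for extra overhead to come from, though I haven't profiled it.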

Ultimate overload: 1 thread per pixel.

Crashed.


With a thread pool of 4 on a machine with 2 HT CPUs:
time ./smallpt 8
Submitting (8 spp) 100.00
1106.38 user 369.83 system 15:47.15 elapsed 155% CPU


This used more CPU time (155% CPU), but the elapsed time was no better: 15:47, against 12:06 for the single-threaded version. All the SMP gain is used up by the overhead.
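A fixed pool like this can be sketched as below. Again, this is an illustration and not the code behind the timings: `shadeRow` is a made-up stand-in for the tracer, and handing out whole rows as jobs is my assumption about how to keep channel traffic down, not something I have measured.

```go
package main

import (
	"fmt"
	"sync"
)

// shadeRow is a stand-in for tracing one scanline; it fills row y of img.
func shadeRow(img []float64, w, y int) {
	for x := 0; x < w; x++ {
		img[y*w+x] = float64(x+y) / float64(w)
	}
}

// renderPool renders a w*h image using a fixed number of worker goroutines.
// Each job is a whole row, so there are h channel sends rather than w*h.
func renderPool(w, h, workers int) []float64 {
	img := make([]float64, w*h)
	rows := make(chan int)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Rows never overlap, so workers can write to img without a lock.
			for y := range rows {
				shadeRow(img, w, y)
			}
		}()
	}

	for y := 0; y < h; y++ {
		rows <- y
	}
	close(rows) // workers exit once the queue drains
	wg.Wait()
	return img
}

func main() {
	img := renderPool(8, 4, 4)
	fmt.Println(len(img), img[9])
}
```

Coarser jobs mean fewer trips through the channel per unit of rendering work; whether that would have rescued the numbers above is untested.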

I manually inlined the function calls, getting rid of the Vec methods:
% time ./smallpt.un 8
Rendering (8 spp) 100.00
273.82 user 5.62 system 4:40.05 elapsed 99% CPU


Shit, now that I've fixed it, it's still 4x slower than the C++ (4:40 against 1:02)!
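To show what "getting rid of the Vec methods" means in practice, here is a small sketch of the same ray-point calculation written both ways. The `Vec` type and the two function names are hypothetical, loosely modelled on smallpt, and are not lifted from the port I was using.

```go
package main

import "fmt"

// Vec is a hypothetical 3-component vector, similar in spirit to smallpt's.
type Vec struct{ X, Y, Z float64 }

// Add returns the component-wise sum a + b.
func (a Vec) Add(b Vec) Vec { return Vec{a.X + b.X, a.Y + b.Y, a.Z + b.Z} }

// Scale returns a with every component multiplied by s.
func (a Vec) Scale(s float64) Vec { return Vec{a.X * s, a.Y * s, a.Z * s} }

// pointMethods computes o + d*t through method calls.
func pointMethods(o, d Vec, t float64) Vec {
	return o.Add(d.Scale(t))
}

// pointInlined computes the same value with the arithmetic written out by hand,
// avoiding the two calls and the temporary Vec.
func pointInlined(o, d Vec, t float64) Vec {
	return Vec{o.X + d.X*t, o.Y + d.Y*t, o.Z + d.Z*t}
}

func main() {
	o := Vec{1, 2, 3}
	d := Vec{0.5, 0.25, 0.125}
	fmt.Println(pointMethods(o, d, 4), pointInlined(o, d, 4))
}
```

Current Go compilers can inline small methods like these on their own, so the gain I saw may not reproduce; measure before hand-inlining.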

One interesting question I have: do goroutines without a CPU operate in non-preemptive co-operative mode, like Limbo? If I spawn a goroutine to do slow I/O, will it be interleaved, or will it sit there doing nothing while the heavy CPU work co-ops?

Goroutine version:
time ./smallpt_gr.go 8
1012.35 user 454.21 system 16:21.66 elapsed 149% CPU


Libbed out vector you'll need
