With loading the setting (a plain word for the prelude) taking up to 9 seconds, I decided to focus on improving mildew's speed to a bearable level.
Since compiling the bytecode down to C code didn't make a substantial improvement, I decided to use a profiler.
Gprof seemed the most obvious choice, so I made scons add the -pg flag in all the required places (which makes gcc add profiling instrumentation).
It turned out that recent versions of gprof don't support shared libraries.
(Which should have been documented instead of being found as a result of googling through mailing lists :( )
I briefly tried OProfile, but I found the documentation incredibly dense, so I turned to sysprof.
Sysprof uses a kernel module to profile *all* the running programs, so I had to upgrade my kernel to get it working.
Using sysprof I found out that smop spent lots of time managing pthread locks.
Turning locking off made smop run nearly twice as fast.
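To get a feel for why locking can cost that much, here is a minimal sketch in C that times the same counter loop with and without a pthread mutex around each increment. It is not smop's code: `counted_t`, `counted_init`, `bump` and `time_bump` are made-up names for illustration, and the loop only measures uncontended locking on a single thread, so the ratio you see will probably differ from what smop showed.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical object guarded by a mutex; NOT smop's real layout. */
typedef struct {
    pthread_mutex_t lock;
    volatile long refcount;  /* volatile so the loop is not optimized away */
} counted_t;

void counted_init(counted_t *o) {
    pthread_mutex_init(&o->lock, NULL);
    o->refcount = 0;
}

/* Increment the counter n times, taking the lock around each
 * increment only when use_lock is non-zero. Returns the final count. */
long bump(counted_t *o, long n, int use_lock) {
    for (long i = 0; i < n; i++) {
        if (use_lock) pthread_mutex_lock(&o->lock);
        o->refcount++;
        if (use_lock) pthread_mutex_unlock(&o->lock);
    }
    return o->refcount;
}

/* Wall-clock seconds spent doing n increments on a fresh object. */
double time_bump(long n, int use_lock) {
    counted_t o;
    struct timespec t0, t1;

    counted_init(&o);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    bump(&o, n, use_lock);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_mutex_destroy(&o.lock);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling `time_bump(10000000, 1)` and `time_bump(10000000, 0)` from a small `main` and printing both results (built with something like `gcc -O2 -pthread`) gives a rough comparison. Skipping the lock is of course only safe as long as a single thread touches the object.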