Tuesday, 29 November 2011

[gsheldr: Re: [ccp4bb] phaser openmp]


Date: 9 November 2011 11:36





---------- Forwarded message ----------
From: "George M. Sheldrick" <gsheldr>
To: Pascal 
Date: Wed, 9 Nov 2011 11:51:40 +0100
Subject: Re: [ccp4bb] phaser openmp
In my experience, writing efficient multithreaded code is much harder
than writing efficient single-thread code, and some algorithms scale
up much better than others. It is important to avoid cache misses, but
because each CPU has its own cache, on rare occasions it is possible
to scale up by more than the number of CPUs, because by dividing up
the memory the number of cache misses can be reduced. In the case of
the multi-CPU version of SHELXD (part of the current beta-test,
available on email request) I was able - with some effort - to keep
the effects of Amdahl's law within limits (on a 32 CPU machine it is
about 29 times faster than with one CPU).

George

On Wed, Nov 09, 2011 at 11:21:11AM +0100, Pascal wrote:
> On Tue, 8 Nov 2011 16:25:22 -0800,
> Nat Echols wrote:
>
> > On Tue, Nov 8, 2011 at 4:22 PM, Francois Berenger 
> > wrote:
> > > In the past I have been quite badly surprised by
> > > the lack of acceleration I got when using OpenMP
> > > with some of my programs... :(
>
> You need big parallel jobs, and you need to avoid synchronisation, barriers
> and that kind of thing. Using data reduction is much more efficient. It works
> very well for structure factor calculations, for example.
>
> >
> > Amdahl's law is cruel:
> >
> > http://en.wikipedia.org/wiki/Amdahl's_law
>
> You can have much less than 5% of serial code.
>
> I have more problems with L2 cache miss events and memory bandwidth. A
> quad core means 4 times the bandwidth needed by a single process...
> If your code is already a bit greedy, the scale-up is not good.
>
> Pascal
>

--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
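
Editor's note: the figures quoted in this thread can be checked against Amdahl's law, speedup = 1 / (s + (1 - s) / N), where s is the serial fraction of the run time and N the number of CPUs. The sketch below is only a back-of-the-envelope illustration; the function names are made up for this note and are not taken from SHELXD or Phaser. The simple model also ignores the cache effects mentioned above, so the implied serial fraction is a rough estimate at best.

```python
def amdahl_speedup(serial_fraction, n_cpus):
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)


def implied_serial_fraction(speedup, n_cpus):
    """Invert Amdahl's law: the serial fraction that would give this speedup."""
    return (n_cpus / speedup - 1.0) / (n_cpus - 1.0)


if __name__ == "__main__":
    # The 5% serial code mentioned in the thread, on a 32 CPU machine.
    print(f"5% serial on 32 CPUs: {amdahl_speedup(0.05, 32):.1f}x")

    # The reported 29x on 32 CPUs, read through the same simple model.
    s = implied_serial_fraction(29.0, 32)
    print(f"29x on 32 CPUs implies about {100 * s:.2f}% serial")
```

Under this model, 5% serial code caps a 32 CPU run at roughly 12.5x, while a 29x speedup on 32 CPUs corresponds to a serial fraction of about 0.33%, consistent with the remark that one can have much less than 5% of serial code.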



