Date: Wed, 6 May 2015 10:56:00 +0800
From: Lei Zhang <>
Subject: Re: [GSoC] JtR SIMD support enhancements

> On May 5, 2015, at 8:48 PM, Solar Designer <> wrote:
> Thank you!  I think we'll want you to try VTune on other formats as
> well, and to see lower-level bottlenecks as well (cache misses, etc.)
> (But for phpass you'd proceed with that later - need to deal with the
> immediate bottleneck first.)

Do you mean profiling a normal x86 build or a MIC build? Profiling MIC programs is a bit of a pain: the older version of VTune has very limited support for MIC, while the latest version is incompatible with older MPSS installations. Our lab's server has a rather old MPSS, and I currently don't have permission to upgrade it, so I have to live with the older VTune.

>> Actually by setting OMP_NUM_THREADS to smaller values, I could get better results than the above.
> OK.  The first thing I'd try is tuning OMP_SCALE.  Perhaps the current
> value is way too low, resulting in the threads needing synchronization
> too often.  Please experiment with it.  Please stay at 240 threads.

I'm not sure where OMP_SCALE is defined for phpass; my guess is dynamic_types.h. Here are the results after changing it there:

OMP_SCALE   Result
1           18249 c/s real, 75.9 c/s virtual
4           18917 c/s real, 79.1 c/s virtual
16          18629 c/s real, 77.5 c/s virtual
64          18541 c/s real, 77.8 c/s virtual
256         18541 c/s real, 77.7 c/s virtual

It appears that tuning OMP_SCALE doesn't make much difference here. BTW, setting OMP_SCALE to 2048 or higher causes memory allocation to fail. Based on this experiment, I'm not sure whether OMP_SCALE has a significant impact on synchronization overhead.
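
As a rough sketch of why a large OMP_SCALE inflates memory use: assuming the format multiplies its keys-per-crypt count by the thread count and by OMP_SCALE (a common pattern in JtR's OpenMP-enabled formats, though I have not confirmed it for the dynamic format), the batch size grows linearly with OMP_SCALE. The helper names and the bytes-per-key figure used below are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a JtR function): number of candidate keys
 * processed per batch, assuming the base count is scaled by both the
 * thread count and OMP_SCALE.
 */
size_t keys_per_crypt(size_t base_keys, size_t threads, size_t omp_scale)
{
	assert(base_keys > 0 && threads > 0 && omp_scale > 0);
	return base_keys * threads * omp_scale;
}

/*
 * Hypothetical helper: total size of the per-key buffers, assuming each
 * key needs a fixed number of bytes (the real figure depends on the format).
 */
size_t buffer_bytes(size_t keys, size_t bytes_per_key)
{
	return keys * bytes_per_key;
}
```

With 240 threads and a base of one key per thread, OMP_SCALE=4 gives keys_per_crypt(1, 240, 4) = 960 keys per batch, while OMP_SCALE=2048 gives 491520. At an assumed 64 bytes per key, that is buffer_bytes(491520, 64) = 31457280 bytes for a single buffer, and a real format allocates several such buffers.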

