Date: Wed, 6 May 2015 10:56:00 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

> On May 5, 2015, at 8:48 PM, Solar Designer <solar@...nwall.com> wrote:
>
> Thank you! I think we'll want you to try VTune on other formats as
> well, and to see lower-level bottlenecks as well (cache misses, etc.)
> (But for phpass you'd proceed with that later - need to deal with the
> immediate bottleneck first.)

Do you mean profiling a normal x86 build or a MIC build?

Actually, it's a bit of a pain to profile MIC programs. The older version of VTune has very limited support for MIC, while the latest version is incompatible with an older MPSS installation. Our lab's server has a rather old MPSS, and I currently don't have permission to upgrade it, so I have to live with the older VTune.

>> Actually by setting OMP_NUM_THREADS to smaller values, I could get better results than the above.
>
> OK. The first thing I'd try is tuning OMP_SCALE. Perhaps the current
> value is way too low, resulting in the threads needing synchronization
> too often. Please experiment with it. Please stay at 240 threads.

I'm not sure where OMP_SCALE is defined for phpass. I guess it's in dynamic_types.h, and here are the results after tuning it:

OMP_SCALE    Result
-----------------------------------
1            18249 c/s real, 75.9 c/s virtual
4            18917 c/s real, 79.1 c/s virtual
16           18629 c/s real, 77.5 c/s virtual
64           18541 c/s real, 77.8 c/s virtual
256          18541 c/s real, 77.7 c/s virtual

It appears that tuning OMP_SCALE doesn't make much difference here. BTW, setting OMP_SCALE to 2048 or higher causes memory allocation to fail.

From this experiment, I'm uncertain whether OMP_SCALE has a significant impact on synchronization overhead.

Lei