Date: Sat, 25 Apr 2015 15:46:21 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

Lei,

On Thu, Apr 23, 2015 at 11:35:44PM +0800, Lei Zhang wrote:
> > Regarding OpenMP offload experiments:
> > 
> >> BF_std:
> >> Currently this is the only one that works.
> >> -----------------------------------------------------
> >> [zhanglei@...ter src]$ ../run/john --test --format=bcrypt
> >> Will run 12 OpenMP threads
> >> Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... DONE
> >> Raw:    1552 c/s real, 1555 c/s virtual
> >> -----------------------------------------------------
> > 
> > What exactly is benchmarked here?  Is this 12 threads running on MIC?
> > I guess 12 came from the host CPU's number of hardware threads, and as
> > we know it is way too low for MIC.  What will happen if you force
> > OMP_NUM_THREADS=240 in this test?  Anyway, we should have it run the
> > proper number of threads for the device it's offloading to - but only on
> > that device, obviously.
> > 
> > In fact, the performance you're seeing here is too good to be for 12
> > threads (out of 240 possible) on MIC, but too poor to be for 12 threads
> > on host.  So I am puzzled.  Can you figure this out?  Check "micsmc -a |
> > less" and "top" (on both host and MIC) while this is running, etc.
> 
> Actually, in BF_std.c, I only added a single line of pragma directive (plus a bunch of "__attribute__((target(mic)))"s):
> -----------------------------------------------------
> #pragma offload target(mic) inout(salt:length(1))
> #pragma omp parallel for ...
> -----------------------------------------------------
> The '12 OpenMP threads' reported should've been detected by host code. The default number of threads used by offloaded code for MIC should be 236. I tried adding a "printf("%d\n", omp_get_num_threads());" in the offloaded code, and the output confirmed my expectation. 
> 
> BTW, I did some experiments and found that the default number of threads is 240 in native mode, but 236 in offload mode. I guess that, in offload mode, one of MIC's 60 cores is reserved for communicating with the host.

Yes, I had read about that.  I think they similarly allocate the last
core for communication when using MIC via OpenCL.

So, any idea about the weird speed you got for bcrypt here?  Here's my
guess: maybe max_keys_per_crypt is set based on the host's number of
threads, and so is too low for MIC?

Alexander
