Date: Thu, 23 Apr 2015 23:35:44 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements


> On Apr 23, 2015, at 6:37 PM, Solar Designer <solar@...nwall.com> wrote:
> 
> Hi Lei,
> 
> On Thu, Apr 23, 2015 at 06:25:27PM +0800, Lei Zhang wrote:
>> I just finished adding MIC/AVX512 support to the remaining formats in JtR (great thanks to magnum's work). Now all formats with MIC intrinsics enabled passed self-tests on MIC.
> 
> Great.  What speeds are you getting?

Please see the attachment for a full report.

> Have you tried tuning the interleave factors already?  And simpler
> things such as OMP_SCALE?

I did tune a bunch of OMP_SCALEs. Some of them are too big by default and would exhaust MIC's memory if not tuned. There are too many formats to check them all thoroughly, so I picked out the formats whose OMP_SCALE is too big (e.g. > 4096) and tuned them experimentally, one by one.

I'm not sure what you mean by "interleave factors". Could you be more specific?


> Regarding OpenMP offload experiments:
> 
>> BF_std:
>> Currently this is the only one that works.
>> -----------------------------------------------------
>> [zhanglei@...ter src]$ ../run/john --test --format=bcrypt
>> Will run 12 OpenMP threads
>> Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... DONE
>> Raw:    1552 c/s real, 1555 c/s virtual
>> -----------------------------------------------------
> 
> What exactly is benchmarked here?  Is this 12 threads running on MIC?
> I guess 12 came from the host CPU's number of hardware threads, and as
> we know it is way too low for MIC.  What will happen if you force
> OMP_NUM_THREADS=240 in this test?  Anyway, we should have it run the
> proper number of threads for the device it's offloading to - but only on
> that device, obviously.
> 
> In fact, the performance you're seeing here is too good to be for 12
> threads (out of 240 possible) on MIC, but too poor to be for 12 threads
> on host.  So I am puzzled.  Can you figure this out?  Check "micsmc -a |
> less" and "top" (on both host and MIC) while this is running, etc.

Actually, in BF_std.c, I only added a single offload pragma line (plus a bunch of "__attribute__((target(mic)))"s):
-----------------------------------------------------
#pragma offload target(mic) inout(salt:length(1))
#pragma omp parallel for ...
-----------------------------------------------------
The '12 OpenMP threads' message comes from host code, which detects the host's thread count. The offloaded code on MIC should use 236 threads by default. I tried adding a "printf("%d\n", omp_get_num_threads());" to the offloaded code, and the output confirmed this.
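A minimal sketch of that kind of check, not the actual BF_std.c change: the helper name team_size() is hypothetical, and the offload pragma is only emitted when the Intel compiler defines __INTEL_OFFLOAD, so elsewhere the region simply runs on the host. It assumes the compiler's omp.h makes omp_get_num_threads() usable in offloaded code.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: report how many threads an OpenMP parallel
 * region actually gets. With Intel offload enabled the region runs
 * on the coprocessor; otherwise it runs on the host. */
int team_size(void)
{
    int n = 1; /* fallback when built without OpenMP */
#ifdef _OPENMP
#ifdef __INTEL_OFFLOAD
    /* Intel-compiler-specific: offload the next statement to MIC
     * and copy n back to the host afterwards. */
#pragma offload target(mic) inout(n)
#endif
#pragma omp parallel
    {
        /* Only one thread of the team writes the result. */
#pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Calling printf("%d\n", team_size()); on the host then shows the thread count of whichever device executed the region, which is a separate number from the "Will run N OpenMP threads" line printed at startup.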

BTW, I did some experiments and found that the default number of threads is 240 in native mode, but 236 in offload mode. My guess is that, in offload mode, one of MIC's 60 cores is reserved for communicating with the host.
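The arithmetic behind that guess, as a small sketch: 240 threads on 60 cores means 4 hardware threads per core, so setting one core aside leaves 59 * 4 = 236. The function name usable_threads() is made up for illustration, and the reserved-core explanation is still only a guess.

```c
/* Hypothetical helper: threads left for computation when some
 * cores are set aside (e.g. for host communication). */
int usable_threads(int cores, int threads_per_core, int reserved_cores)
{
    return (cores - reserved_cores) * threads_per_core;
}
```

With these numbers, usable_threads(60, 4, 0) gives the 240 seen in native mode and usable_threads(60, 4, 1) gives the 236 seen in offload mode.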


Lei


View attachment "log.txt" of type "text/plain" (44926 bytes)
