Date: Tue, 5 May 2015 19:30:03 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements


> On Apr 25, 2015, at 8:34 PM, Solar Designer <solar@...nwall.com> wrote:
> 
>> Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (240xOMP) DONE
>> Raw:	17976 c/s real, 75.5 c/s virtual
> 
> This is very poor speed.  Needs to be investigated.

Out of curiosity, I profiled this self-test with Intel VTune and got the following execution time distribution:

Function              CPU Time
------------------------------
[libiomp5.so]         248.451s
[vmlinux]              22.605s
[john]                  6.882s
[libc-2.14.90.so]       0.627s
[libcrypto.so.10]       0.171s
[Others]                0.067s

The program spends most of its time in libiomp5.so (the Intel OpenMP runtime), which I guess is where inter-thread synchronization happens. I think the poor speed results from high synchronization overhead.

Actually, setting OMP_NUM_THREADS to smaller values gives better results than the above:
--------------------------------------------------------
[zhanglei@...0 zhanglei]$ OMP_NUM_THREADS=120 jumbo/john --test --format=phpass
Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (120xOMP) DONE
Raw:	21576 c/s real, 179 c/s virtual

[zhanglei@...0 zhanglei]$ OMP_NUM_THREADS=60 jumbo/john --test --format=phpass
Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (60xOMP) DONE
Raw:	22494 c/s real, 374 c/s virtual

[zhanglei@...0 zhanglei]$ OMP_NUM_THREADS=30 jumbo/john --test --format=phpass
Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (30xOMP) DONE
Raw:	21530 c/s real, 714 c/s virtual

[zhanglei@...0 zhanglei]$ OMP_NUM_THREADS=15 jumbo/john --test --format=phpass
Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (15xOMP) DONE
Raw:	24237 c/s real, 1613 c/s virtual

[zhanglei@...0 zhanglei]$ OMP_NUM_THREADS=8 jumbo/john --test --format=phpass
Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (8xOMP) DONE
Raw:	24000 c/s real, 3000 c/s virtual

[zhanglei@...0 zhanglei]$ OMP_NUM_THREADS=4 jumbo/john --test --format=phpass
Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (4xOMP) DONE
Raw:	12166 c/s real, 3049 c/s virtual

[zhanglei@...0 zhanglei]$ OMP_NUM_THREADS=2 jumbo/john --test --format=phpass
Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (2xOMP) DONE
Raw:	8188 c/s real, 4094 c/s virtual
--------------------------------------------------------
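
To make the scaling loss easier to see, the "c/s real" figures above can be divided by the thread count. A small sketch with awk; the per_thread function name is only for illustration, and the numbers are copied from the runs above (including the original 240-thread run):

```shell
#!/bin/sh
# Columns: OMP_NUM_THREADS, "c/s real" as reported by john --test.
# per_thread is a hypothetical helper name, not part of JtR.
per_thread() {
    awk '{ printf "%d threads: %.1f c/s per thread\n", $1, $2 / $1 }' <<'EOF'
240 17976
120 21576
60 22494
30 21530
15 24237
8 24000
4 12166
2 8188
EOF
}

per_thread
```

Per-thread throughput holds up reasonably through 8 threads and then roughly halves with each doubling of the thread count, which is what you would expect if the extra threads mostly add waiting time.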

It appears that the default OMP_NUM_THREADS=240 isn't optimal for this format on MIC, as the synchronization overhead is too high: throughput peaks at around 24000 c/s with only 8 to 15 threads. Maybe we should tune OMP_NUM_THREADS individually for each format, just like OMP_SCALE.
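
As a rough sketch of how such per-format tuning could be explored by hand, the loop below reruns the self-test for a list of thread counts and reports the one with the highest "c/s real". The JOHN and FORMAT variables and the sweep/best function names are my own placeholders, not anything that exists in JtR, and a single run per setting is noisy, so treat the result as a hint only:

```shell
#!/bin/sh
# JOHN and FORMAT are hypothetical convenience variables for this sketch.
JOHN=${JOHN:-jumbo/john}
FORMAT=${FORMAT:-phpass}

# For each thread count given as an argument, run the self-test and
# print "<threads> <c/s real>" taken from the "Raw:" line.
sweep() {
    for n in "$@"; do
        OMP_NUM_THREADS=$n "$JOHN" --test --format="$FORMAT" |
            awk -v n="$n" '/^Raw:/ { print n, $2 }'
    done
}

# Read "<threads> <c/s real>" lines and keep the fastest one.
best() {
    sort -k2,2n | tail -n 1
}

# Example invocation (not run automatically):
#   sweep 240 120 60 30 15 8 4 2 | best
```

This only automates the manual runs shown above; it assumes the benchmark output keeps the "Raw:" line format seen in those runs.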


Lei

