Date: Mon, 27 Apr 2015 10:15:55 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements


> On Apr 27, 2015, at 1:30 AM, magnum <john.magnum@...hmail.com> wrote:
> 
> On Thu, Apr 23, 2015 at 11:35:44PM +0800, Lei Zhang wrote:
>> Benchmarking: nt2, NT [MD4 512/512 MIC 16x]... DONE
>> Raw:	4907K c/s real, 4907K c/s virtual
> 
> So this is one 1 GHz core running AVX-512. Here's my 2.3 GHz laptop running AVX:
> 
> Benchmarking: nt2, NT [MD4 128/128 AVX 4x3]... DONE
> Raw:	43748K c/s real, 43748K c/s virtual
> 
> Accounting only for clock and width, the MIC should do 83519K. What is it that makes it 17x slower than that figure?

Well, comparing a MIC with a laptop CPU on clock rate alone is rather imprecise. IIRC, the current-generation MIC uses a low-power core design based on the Atom series. It lacks advanced features such as out-of-order execution, which are commonplace in desktop CPUs.

A lab colleague of mine once ran a single-threaded algorithm on both MIC and his laptop (2.5 GHz), and MIC was 17x slower. Other factors, such as I/O, may have been involved though.

As we've seen many times in our lab, a naively ported multithreaded program is very likely to perform worse on MIC than on a CPU. You need to make efficient use of MIC's SIMD units and high memory throughput to approach ideal performance.


> Comparing a slow hash and threads, here's MIC's md5crypt:
> 
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 512/512 MIC 16x]... (240xOMP) DONE
> Raw:	864932 c/s real, 3627 c/s virtual
> 
> And here's mine (4 cores + HT):
> 
> $ ../run/john -test -form:md5crypt
> Will run 8 OpenMP threads
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (8xOMP) DONE
> Raw:	159360 c/s real, 22102 c/s virtual
> 
> MIC should do 9127K if only clock and width was different. The real figure is 10x slower than that. Why?

Ditto.
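
For reference, here is a minimal sketch of the linear scaling behind both estimates, so it's clear what "clock and width only" assumes. The function names are made up for illustration, and the clock values are just the rounded figures mentioned in this thread (1 GHz for MIC, 2.3 GHz for the laptop), not measured ones.

```python
def naive_scaled_rate(base_rate, base_clock_ghz, base_width_bits, base_threads,
                      target_clock_ghz, target_width_bits, target_threads):
    """Scale a benchmark rate linearly by clock, SIMD width and thread count.

    This deliberately ignores everything else (in-order vs. out-of-order
    execution, interleaving, memory behaviour), which is the whole point
    of the comparison.
    """
    return (base_rate
            * (target_clock_ghz / base_clock_ghz)
            * (target_width_bits / base_width_bits)
            * (target_threads / base_threads))


def shortfall(expected_rate, measured_rate):
    """How many times slower the measured rate is than the naive estimate."""
    return expected_rate / measured_rate


# NT, one core each: laptop AVX (128-bit) scaled to MIC (512-bit).
# Clock values are the rounded ones from this thread (an assumption).
nt_expected = naive_scaled_rate(43748e3, 2.3, 128, 1, 1.0, 512, 1)
nt_gap = shortfall(nt_expected, 4907e3)

# md5crypt: 8 OpenMP threads on the laptop scaled to 240 on MIC.
md5_expected = naive_scaled_rate(159360, 2.3, 128, 8, 1.0, 512, 240)
md5_gap = shortfall(md5_expected, 864932)

print(f"NT:       expected {nt_expected / 1e3:.0f}K c/s, gap {nt_gap:.1f}x")
print(f"md5crypt: expected {md5_expected / 1e3:.0f}K c/s, gap {md5_gap:.1f}x")
```

With these rounded clocks the estimates come out somewhat lower than the 83519K and 9127K quoted above, which presumably used more exact clock figures. Either way the gap is an order of magnitude, and that gap is everything the linear model leaves out.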


Lei
