john-dev - Re: Interleaving of intrinsics

Message-ID: <e1e6d5931a87053cc054bf726262cb88@smtp.hushmail.com>
Date: Fri, 05 Jun 2015 19:01:17 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

On 2015-06-05 17:07, Lei Zhang wrote:
> Hi,
>
> I haven't got useful info from viewing the assembly yet. But I tried
> to collect some statistics using VTune.
>
> Running PBKDF2-HMAC-SHA256 with various interleaving factors,
> OpenMP-disabled, on a Linux VM (Ivy Bridge):
>
> [x1]
> Function					CPU Time
> __memcpy_sse2_unaligned	0.094s
> memcpy					0.080s
> cfg_get_section			0.060s
> pbkdf2_sha256_sse		0.036s
> _mm_xor_si128			0.020s
> [Others]					1.140s
>
> [x2]
> Function					CPU Time
> SSESHA256body			0.276s
> cfg_get_section			0.042s
> _mm_add_epi32			0.028s
> pbkdf2_sha256_sse		0.028s
> _mm_add_epi32			0.024s
> [Others]					1.452s
>(...)

You should probably do much longer runs (e.g. --test=15 or more) to get
startup costs like cfg_get_section completely out of the way.

> '__memcpy_sse2_unaligned' might imply some overhead incurred from
> unaligned memcpy, which is irrelevant to this topic though.

If this still shows up on longer runs, we should look into it. Maybe we
should try callgrind with very long test runs (--test=60 or much more)
and see if it collects enough data for kcachegrind to show some info
within the hash functions. This might even help us see the relation
between the source and the resulting assembly.
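
A rough sketch of what such a callgrind run could look like. The binary
path (./john), the use of PBKDF2-HMAC-SHA256 as the --format name, the
helper name profile_john and the output file naming are all assumptions
for illustration, not something taken from an actual run:

```shell
#!/bin/sh
# Sketch only: binary path, format name and output naming are assumptions.
JOHN=${JOHN:-./john}                    # assumed location of the john binary
FORMAT=${FORMAT:-PBKDF2-HMAC-SHA256}    # format name as used in this thread

# profile_john SECS TAG  (hypothetical helper)
# Runs the self-test for SECS seconds under callgrind. TAG only labels the
# output file, e.g. the interleaving factor the binary was built with.
# With DRY_RUN=1 the command line is printed instead of executed.
profile_john() {
    secs=$1
    tag=$2
    # --dump-instr=yes records per-instruction costs, which kcachegrind
    # needs in order to show the assembly next to the source.
    set -- valgrind --tool=callgrind --dump-instr=yes \
        --callgrind-out-file="callgrind.out.$tag" \
        "$JOHN" --test="$secs" --format="$FORMAT"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

With that in place, something like "profile_john 60 x2" followed by
"kcachegrind callgrind.out.x2" would be the idea. Expect the run to be
far slower than native, since callgrind instruments every instruction.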

magnum