Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Sat, 6 Jun 2015 19:38:18 +0800
From: Lei Zhang <>
Subject: Re: Interleaving of intrinsics

> On Jun 6, 2015, at 1:01 AM, magnum < <>> wrote:
> You should probably do much longer runs (eg --test=15 or more) to get things like cfg_get_section completely out of the way.
>> '__memcpy_sse2_unaligned' might imply some overhead incurred from
>> unaligned memcpy, which is irrelevant to this topic though.
> If this is still seen on longer runs, we should look into it. Maybe we should try callgrind with very long test runs (--test=60 or much more) and see if it can sample enough for kcachegrind to show some info within the hash functions. This might even help see relations between the source and the resulting assembler.

Same settings as before, except with a longer run time (--test=20):

Function                    CPU Time
__memcpy_sse2_unaligned       1.475s
memcpy                        0.975s
pbkdf2_sha256_sse             0.383s
SSESHA256body                 0.224s
_mm_xor_si128                 0.088s
[Others]                     17.294s

Function                    CPU Time
SSESHA256body                 3.759s
__memcpy_sse2_unaligned       0.735s
pbkdf2_sha256_sse             0.280s
_mm_xor_si128                 0.108s
_mm_srli_epi32                0.056s
[Others]                     15.929s

Function                    CPU Time
SSESHA256body                 4.155s
__memcpy_sse2_unaligned       0.563s
pbkdf2_sha256_sse             0.426s
_mm_srli_epi32                0.108s
_mm_add_epi32                 0.076s
[Others]                     15.929s

Function                    CPU Time
SSESHA256body                 4.292s
__memcpy_sse2_unaligned       0.559s
pbkdf2_sha256_sse             0.340s
_mm_srli_epi32                0.148s
_mm_slli_epi32                0.083s
[Others]                     16.184s

Intrinsics are counted as separate function calls, so their run time shows up under their own names (or in the '[Others]' row) rather than in SSESHA256body. Even with the longer runs, '__memcpy_sse2_unaligned' is still there. It is most likely the SSE2 variant that glibc uses to speed up memcpy with SIMD instructions, so I think it simply reflects the normal overhead of the memcpy calls in JtR.

