Date: Fri, 8 May 2015 09:52:53 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5


> On May 7, 2015, at 3:35 AM, Solar Designer <solar@...nwall.com> wrote:
> 
> To add OpenMP to this, we may introduce an extra outer loop:
> 
> 	for each sub-group of candidates
> 		for each round
> 			for each candidate in the sub-group
> 				do some processing
> 			end for
> 		end for
> 	end for

Thanks for clearing that up. I've now got SunMD5 working with OpenMP, and here are the results from my laptop:

[Before]
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... DONE
Speed for cost 1 (iteration count) of 5000
Raw:	605 c/s real, 610 c/s virtual

[After]
Will run 4 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	1269 c/s real, 326 c/s virtual

So far I've only multiplied max_keys_per_crypt by the number of threads. There may be room for further improvement.
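
For reference, here is a minimal sketch of the loop structure quoted above, with the new outer loop parallelized. It is not the actual SunMD5 code: GROUP_SIZE, toy_round() and crypt_all_sketch() are hypothetical names, and the per-round work is a stand-in for the real MD5 processing. The sub-groups are independent of each other, so the outer loop can be split across threads while the round loop stays sequential within each sub-group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 4    /* hypothetical: candidates handled together */
#define ROUNDS     5000 /* matches the benchmark's iteration count */

/* Hypothetical stand-in for one round of per-candidate processing. */
static uint32_t toy_round(uint32_t state, uint32_t round)
{
    return state * 1664525u + 1013904223u + round;
}

/* Run every round for every candidate, one sub-group per loop iteration. */
void crypt_all_sketch(uint32_t *state, size_t count)
{
    long g;
    long groups = (long)((count + GROUP_SIZE - 1) / GROUP_SIZE);

    assert(state != NULL || count == 0);

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (g = 0; g < groups; g++) {              /* each sub-group */
        size_t first = (size_t)g * GROUP_SIZE;
        size_t last = first + GROUP_SIZE < count ? first + GROUP_SIZE : count;

        for (uint32_t r = 0; r < ROUNDS; r++)   /* each round */
            for (size_t i = first; i < last; i++) /* each candidate */
                state[i] = toy_round(state[i], r);
    }
}
```

Built without OpenMP, the pragma is skipped and the function runs serially with the same result.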

BTW, there's a minor issue I've encountered. See the following code (simplified):
--------------------------------------------------------------------
#if defined (_DEBUG)
static unsigned char input_buf[BLK_CNT*MD5_CBLOCK];
static unsigned char out_buf[BLK_CNT*MD5_DIGEST_LENGTH];
static unsigned char input_buf_big[25][BLK_CNT*MD5_CBLOCK];
#else
static unsigned char *input_buf;
static unsigned char *out_buf;
static unsigned char (*input_buf_big)[BLK_CNT*MD5_CBLOCK];
#endif
--------------------------------------------------------------------

Those three arrays have a fixed size, but in a non-debug build they are allocated dynamically. With OpenMP enabled, each thread needs its own private copy of these buffers. When they are statically allocated, a single directive is enough:
--------------------------------------------------------------------
#pragma omp threadprivate(out_buf, input_buf, input_buf_big)
--------------------------------------------------------------------

If they're allocated dynamically, though, I'd probably need to set up a separate copy for each thread explicitly. For now I've defined _DEBUG manually to make experimenting easier. I'm not sure what the point of allocating those arrays dynamically is. Is there some optimization involved? If dynamic allocation isn't necessary, I'd drop it to keep the OpenMP code simpler.
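
If the dynamic allocation does have to stay, one possible approach (a sketch only, not tested against the real format code) is to make the pointers themselves threadprivate and let every thread allocate its own buffers inside a parallel region. The helper names alloc_thread_buffers() and free_thread_buffers() are hypothetical, and the BLK_CNT value here is just a placeholder. Note that threadprivate values are only guaranteed to persist between parallel regions if dynamic thread adjustment is off and the thread count stays the same.

```c
#include <assert.h>
#include <stdlib.h>

#define BLK_CNT           12 /* placeholder value, for illustration only */
#define MD5_CBLOCK        64
#define MD5_DIGEST_LENGTH 16

static unsigned char *input_buf;
static unsigned char *out_buf;
static unsigned char (*input_buf_big)[BLK_CNT * MD5_CBLOCK];
#ifdef _OPENMP
/* Each thread gets its own copy of the three pointers. */
#pragma omp threadprivate(out_buf, input_buf, input_buf_big)
#endif

/* Hypothetical helper: every thread allocates its private buffers.
 * Returns 0 on success, -1 if any thread failed to allocate. */
int alloc_thread_buffers(void)
{
    int failed = 0;

#ifdef _OPENMP
#pragma omp parallel reduction(|:failed)
#endif
    {
        input_buf = calloc(BLK_CNT, MD5_CBLOCK);
        out_buf = calloc(BLK_CNT, MD5_DIGEST_LENGTH);
        input_buf_big = calloc(25, sizeof(*input_buf_big));
        if (!input_buf || !out_buf || !input_buf_big)
            failed = 1;
    }
    return failed ? -1 : 0;
}

/* Hypothetical helper: every thread releases its private buffers. */
void free_thread_buffers(void)
{
#ifdef _OPENMP
#pragma omp parallel
#endif
    {
        free(input_buf);
        free(out_buf);
        free(input_buf_big);
        input_buf = NULL;
        out_buf = NULL;
        input_buf_big = NULL;
    }
}
```

The static-array version with a single threadprivate directive is still shorter, so this only makes sense if there is a real reason to keep the buffers on the heap.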


Lei