john-dev - Re: autotune

Message-ID: <b9ba696a554e11ea3cbf732b33dff8ad@smtp.hushmail.com>
Date: Fri, 22 May 2015 02:56:18 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: autotune_run problem

On 2015-05-22 02:00, Agnieszka Bielec wrote:
> hi,
> I've fixed one bug in my parallel-opencl but I have a problem with
> autotune_run.
> I discovered that, depending on the set of arguments, a different
> algorithm determines when tuning for gws is stopped.
> If I call
> autotune_run(self, 1000, 0, 500);

As you probably noticed already, the last argument comes in two flavors:
If 1000 or below, it's parsed as milliseconds and limits a single kernel
invocation (intended for loop kernels). If above 1000 (it should not be
more than 10000000000UL), it's counted as nanoseconds of *total* time for
all kernels and loops (basically the total time for crypt_all()). This
syntax is awful but that's how it currently works.

For looped kernels, a value of 200 is sane (if not, something else
should be tweaked, e.g. the number of iterations per call). For
single-run do-it-all kernels, 10000000000UL is fine.
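
To make the two flavors concrete, here is a minimal sketch of how such
an argument could be interpreted. The struct and function names are
hypothetical (they are not taken from the autotune code); only the
1000 threshold and the units come from the description above.

```c
#include <stdint.h>

/* Hypothetical illustration; names are not from the autotune code. */
typedef struct {
    int      per_kernel; /* 1: limit applies to a single kernel invocation */
    uint64_t limit_ns;   /* the limit, normalized to nanoseconds */
} duration_limit;

static duration_limit interpret_max_duration(uint64_t arg)
{
    duration_limit d;

    if (arg <= 1000) {
        /* 1000 or below: milliseconds for one kernel invocation */
        d.per_kernel = 1;
        d.limit_ns = arg * 1000000ULL;
    } else {
        /* above 1000: nanoseconds of total time (all kernels and loops) */
        d.per_kernel = 0;
        d.limit_ns = arg;
    }
    return d;
}
```

With this reading, 200 means 200 ms per kernel call, and 10000000000
means 10 s in total. The sketch uses a 64-bit type because that value
does not fit in 32 bits.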


> the computed gws is optimal on my laptop and with --dev=1 but not with
> --dev=5: it prints "exceed" for the optimal value, and setting the
> highest duration_time doesn't work

Did you try setting it to 1000 instead of 500? If that works better you
should implement a split kernel though. A full second duration is way too
long.

> when my autotune_run call looks like:
> autotune_run(self, 1, 1000, 100000);
> the time when we stop computing is determined by:
> if (best_speed && speed < 1.8 * best_speed &&
>                  max_run_time && run_time > max_run_time) {
>              if (!optimal_gws)
>                  optimal_gws = num;
>
>              if (options.verbosity > 3)
>                  fprintf(stderr, " - too slow\n");
>              break;
>          }
>
> we stop computing new values for gws only when the new speed isn't 1.8
> times faster than the previous one
>
> and 1.8 is the wrong value for parallel; a change from 1.8 to 1.1 works
> well for --dev=1 and on my laptop, but for --dev=5 it stops at a
> suboptimal gws=4096.

It's very hard to make these functions good with all formats. Maybe we 
should introduce another parameter for that 1.1/1.8 figure.
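
Roughly what I have in mind, as a sketch only: the same test as in the
snippet you quoted, with the 1.8 replaced by a parameter. The function
name and the min_gain parameter are made up for illustration; nothing
like this exists in the code today.

```c
/*
 * Hypothetical helper mirroring the quoted stop condition, with the
 * hard-coded 1.8 turned into a caller-supplied factor (min_gain).
 * Returns 1 if tuning should stop at this gws, 0 otherwise.
 */
static int should_stop_tuning(double best_speed, double speed,
                              double min_gain,
                              unsigned long long max_run_time,
                              unsigned long long run_time)
{
    return best_speed > 0 && speed < min_gain * best_speed &&
           max_run_time && run_time > max_run_time;
}
```

A format could then pass 1.1 or 1.8 as needed. Note that, as in the
quoted code, the speed test only matters once run_time has exceeded
max_run_time.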

> it stops at 4096 because there is no difference in speed between
> gws=4096 and 8192
> for 32768 the speed is better

Perhaps you could try bumping your starting figure. You could make it
start at e.g. 16384 or 32768 by changing the SEED macro. Setting it too
high might break running on weak devices though: even if they are
slower than the CPU and utterly unusable, we should still behave.
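
To illustrate why the starting figure matters, here is a toy model of a
doubling search. Everything in it is an assumption for illustration: the
function names are made up, the run-time check is left out, and
toy_speed() returns invented numbers that merely imitate a plateau
between 4096 and 32768. It is not the real autotune loop and not a
measurement.

```c
#include <stddef.h>

/* Invented speed curve: flat from 4096 up to (not including) 32768. */
static double toy_speed(size_t gws)
{
    if (gws < 4096)
        return (double)gws;
    if (gws < 32768)
        return 4096.0;
    return 8192.0;
}

/*
 * Simplified doubling search (hypothetical): start at seed, double gws
 * each round, stop as soon as the speed gain drops below min_gain.
 * The real code also looks at run time; that part is omitted here.
 */
static size_t toy_tune_gws(size_t seed, size_t max_gws, double min_gain,
                           double (*bench)(size_t))
{
    size_t gws, optimal = seed;
    double best_speed = 0.0;

    for (gws = seed; gws <= max_gws; gws *= 2) {
        double speed = bench(gws);

        if (best_speed > 0 && speed < min_gain * best_speed)
            break; /* not enough gain over the best seen so far */
        if (speed > best_speed) {
            best_speed = speed;
            optimal = gws;
        }
    }
    return optimal;
}
```

With this toy curve and min_gain of 1.1, a seed of 256 gets stuck at
4096 on the plateau, while a seed of 16384 gets past it and settles on
32768.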

> any idea how I can set the optimal gws for --dev=5 as well?
> the results above might suggest that some hashes are not autotuned
> properly, but people with better knowledge of autotune_run should
> comment on this

You should run with -verb=5; this will better illustrate what happens.
Please try two runs on -dev=5 with -verb=5, one using 1000 and one using
10000000000UL as the last argument to autotune_run(), and post the
results.

magnum