Date: Mon, 30 Mar 2015 10:24:45 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

Hi Agnieszka,

On Mon, Mar 30, 2015 at 01:56:03AM +0200, Agnieszka Bielec wrote:
> I have added OpenCL and  I have fixed almost all things commented
> by magnumripper https://github.com/Lucife-r/JohnTheRipper

Thanks!

> I would like to know if, program with my adjustment is working
> on another machines.

Even on this same machine, pomelo-opencl fails on the NVIDIA GPU
(-dev=5).  Please test.

More importantly, though, please let us know which JtR source files you
used as templates for your format, both host and OpenCL.  At first
glance, it looks like you started from a fast hash's format, with hacks
that are unlikely to be of much relevance to slow hashes such as PHC's
(when invoked with reasonable settings).

Also, where did you obtain the test vectors for POMELO?  It looks
like they're for fairly low cost settings, perhaps lower than what
POMELO would normally be used with.  Here are the speeds I am getting:

[solar@...er src]$ OMP_NUM_THREADS=1 ../run/john -te -form=pomelo
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: pomelo, Generic pomelo [Pomelo]... DONE
Many salts:     10944 c/s real, 10944 c/s virtual
Only one salt:  10944 c/s real, 10944 c/s virtual

[solar@...er src]$ OMP_NUM_THREADS=16 ../run/john -te -form=pomelo
Will run 16 OpenMP threads
Benchmarking: pomelo, Generic pomelo [Pomelo]... (16xOMP) DONE
Many salts:     157184 c/s real, 9836 c/s virtual
Only one salt:  156672 c/s real, 9816 c/s virtual

[solar@...er src]$ export GOMP_CPU_AFFINITY=0-31
[solar@...er src]$ ../run/john -te -form=pomelo
Will run 32 OpenMP threads
Benchmarking: pomelo, Generic pomelo [Pomelo]... (32xOMP) DONE
Many salts:     167794 c/s real, 5305 c/s virtual
Only one salt:  168960 c/s real, 5284 c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Local worksize (LWS) 64, global worksize (GWS) 1024
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    9309 c/s real, 1024K c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=5
Device 5: GeForce GTX TITAN
Options used: -I ../run/kernels -cl-mad-enable -cl-nv-verbose -DDEVICE_INFO=4114 -D_OPENCL_COMPILER -DDEV_VER_MAJOR=319 -DDEV_VER_MINOR=60 
Build log: :162:15: error: 'long long' type is not supported
        state_size = 1ULL << (13 + m_cost);     //m_cost=3 is max
                     ^
[...]

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=2
Device 2: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Local worksize (LWS) 1, global worksize (GWS) 1024
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    163840 c/s real, 5414 c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=3
Device 3: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Build log: Compilation started
Compilation done
Linking started
Linking done
Kernel <pomelo_crypt_kernel> was not vectorized
Done.
Local worksize (LWS) 8, global worksize (GWS) 512
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    11043 c/s real, 11043 c/s virtual

[solar@...er src]$ ../run/john -te -form=pomelo-opencl -dev=4
Device 4: Intel(R) Many Integrated Core Acceleration Card
Build log: Compilation started
Compilation done
Linking started
Linking done
Build started
Kernel <pomelo_crypt_kernel> was successfully vectorized
Done.
Local worksize (LWS) 64, global worksize (GWS) 64
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... DONE
Raw:    7.3 c/s real, 6400 c/s virtual
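
To put the CPU figures above in perspective, here is a minimal sketch of
the scaling arithmetic.  The helper names speedup and efficiency are
made up for illustration; they are not part of JtR.

```c
/* Speedup of a multi-threaded run relative to the single-thread rate. */
static double speedup(double cps_parallel, double cps_single)
{
	return cps_parallel / cps_single;
}

/* Parallel efficiency: speedup divided by the number of threads used. */
static double efficiency(double cps_parallel, double cps_single, int threads)
{
	return speedup(cps_parallel, cps_single) / threads;
}
```

With the "Many salts" numbers, speedup(157184, 10944) is about 14.4 for
16 threads and speedup(167794, 10944) is about 15.3 for 32 threads, so
going from 16 to 32 threads adds little on this machine.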

So the speed of C code is maybe good - I say maybe because we don't know
yet how much better it can be made.  One of two OpenCL SDKs running on
the CPUs achieves about the same speed, which is a good sanity check.
The other fails to vectorize the code, resulting in much lower speed.
The speed on Xeon Phi via OpenCL is a joke, but that's not too
surprising given that OpenCL isn't currently a good way to program Xeon
Phi (Intel's OpenCL implementation for Xeon Phi is too poor).  On AMD
GPU, the performance is low - this needs to be looked into.  On NVIDIA,
the kernel fails to compile.
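
On the NVIDIA build failure: OpenCL C reserves 'long long', and the ULL
suffix gives the constant that type, which is presumably what this
compiler rejects.  Below is a minimal sketch of the same computation
with an explicit 64-bit type, written as host-side C so that it can be
compiled and checked on its own.  The function name pomelo_state_size is
hypothetical, not taken from the pomelo-opencl source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same value as 1ULL << (13 + m_cost), but without
 * relying on the 'long long' type. */
static uint64_t pomelo_state_size(unsigned int m_cost)
{
	/* Shifting a 64-bit value by 64 or more bits is undefined. */
	assert(13 + m_cost < 64);

	/* In an OpenCL C kernel the equivalent would be
	 * (ulong)1 << (13 + m_cost), since ulong is 64 bits wide there. */
	return (uint64_t)1 << (13 + m_cost);
}
```

For m_cost=3 this gives 65536, the same value the original expression
produces where 'long long' is accepted.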

> Curently the max m_cost in pomelo is 4 and I would like to get rid
> of this limit

Where does this limit come from?

> Is it possible to change 'count' variable which is passed to crypt_all,
> after the execution of opencl_ini_auto_setup() ?

magnum is right - please first describe the problem and how changing
'count' is meant to solve it.

> I have also a few another questions. I've found in pomelo code
> mentioned below.
> 
> //check the size of password, salt and output. Password is at most
> //256 bytes; the salt is at most 32 bytes.
>     if (inlen > 256 || saltlen > 64 || outlen > 256 || inlen < 0 ||
>         saltlen < 0 || outlen < 0)
> 
> 
> I'm not sure how I should set SALT_SIZE ?  I ought to set it with
> 32 or 64 ?

magnum already answered what this means for JtR - just set SALT_SIZE to
the maximum supported salt length.  However, there might be a bug in
POMELO's reference code here.  Did this inconsistent comment and code
come from there ("at most 32 bytes" vs. "saltlen > 64")?  If so, we
should report it on the PHC discussions list.  Can you join that list
and post in there, please?
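
One way to keep a comment and a check like the one quoted above from
drifting apart is to name each limit once and use the name everywhere.
This is only a sketch: the macro and function names are hypothetical,
and the value 64 for the salt simply follows the quoted code rather than
its comment, since we don't yet know which of the two is intended.

```c
#include <stddef.h>

/* Hypothetical names.  The salt limit below mirrors the quoted check
 * ("saltlen > 64"), not the comment ("at most 32 bytes"); which one is
 * right is still an open question. */
#define MAX_PASSWORD_LEN 256
#define MAX_SALT_LEN     64
#define MAX_OUTPUT_LEN   256

/* Returns 1 if all three lengths are within the limits, 0 otherwise.
 * size_t is unsigned, so no separate checks for negative values. */
static int pomelo_lengths_ok(size_t inlen, size_t saltlen, size_t outlen)
{
	return inlen <= MAX_PASSWORD_LEN &&
	    saltlen <= MAX_SALT_LEN &&
	    outlen <= MAX_OUTPUT_LEN;
}
```

With this arrangement, SALT_SIZE in the JtR format could be defined in
terms of the same constant, so that changing the limit later means
touching only one line.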

> magnumripper commented on src/pomelo_fmt_plug.c in c36a2ed
> >Is there any specific reason (eg. performance) to limit max length?
> >I would like to suggest you bump it to 125 which is the max of core john
> 
> did you mean "#define PLAINTEXT_LENGTH        100"   ?

I guess this is what magnum meant, yes.

Thanks,

Alexander
