Date: Tue, 4 Jan 2022 20:59:43 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: sha512crypt-opencl / Self test failed (cmp_all(1))

Hi Jason,

On Mon, Jan 03, 2022 at 03:15:17PM -0500, Jason Cooper wrote:
> I'm encountering the following error:
>
> ```
> $ john --wordlist=crackstation.txt --rules --format=sha512crypt-opencl
> passwd
> Device 1@...alhost: Intel(R) UHD Graphics [0x9bc4]
> Using default input encoding: UTF-8
> Loaded 1 password hash (sha512crypt-opencl, crypt(3) $6$ [SHA512 OpenCL])
> Cost 1 (iteration count) is 5000 for all loaded hashes
> Error creating binary cache file: No such file or directory
> Self test failed (cmp_all(1))
> ```
>
> It runs fine, but slow, when I remove `--format=..`

Unfortunately, OpenCL on Intel embedded GPUs is generally unreliable -
if it passes self-test for your desired JtR format, you're lucky.  In
this case, no luck for you, it seems.

As to performance, it is unlikely that embedded GPU would be any faster
than your CPU cores - more likely, it'd be slower yet.  However, if you
were lucky, you could use both simultaneously (preferably with separate
sessions and different attacks), so the cumulative performance would be
somewhat better.

> john version:
>
> ```
> $ john
> John the Ripper 1.9.0-jumbo-1 MPI + OMP [linux-gnu 64-bit x86_64 AVX AC]
> ```

I'm not familiar with the Arch Linux package, and I don't know your
laptop's exact specs, but moving from AVX to AVX2 or AVX-512 (if your
CPU supports those) will likely make a greater difference than trying to
use the poor little embedded GPU.

For example, here's the old i7-4770K's CPU cores vs. its embedded GPU:

$ ./john -test -form=sha512crypt
Will run 8 OpenMP threads
Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 256/256 AVX2 4x]... (8xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:    6493 c/s real, 813 c/s virtual

$ ./john -test -form=sha512crypt-opencl -dev=1
Device 1: Intel(R) HD Graphics
Benchmarking: sha512crypt-opencl, crypt(3) $6$ (rounds=5000) [SHA512 OpenCL]... Build log: fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.
LWS=8 GWS=640 (80 blocks) DONE
Speed for cost 1 (iteration count) of 5000
Raw:    1213 c/s

> My GPU:
>
> ```
> $ john --list=opencl-devices
> Platform #0 name: Intel(R) OpenCL HD Graphics, version: OpenCL 3.0
>     Device #0 (1) name:     Intel(R) UHD Graphics [0x9bc4]
>     Device vendor:          Intel(R) Corporation
>     Device type:            GPU (LE)
>     Device version:         OpenCL 3.0 NEO
>     Driver version:         21.49.21786
>     Native vector widths:   char 16, short 8, int 4, long 1
>     Preferred vector width: char 16, short 8, int 4, long 1
>     Global Memory:          12496 MB
>     Global Memory Cache:    512 KB
>     Local Memory:           64 KB (Local)
>     Constant Buffer size:   4095 MB
>     Max memory alloc. size: 4095 MB
>     Max clock (MHz):        1150
>     Profiling timer res.:   83 ns
>     Max Work Group Size:    256
>     Parallel compute cores: 24
>     Stream processors:      192  (24 x 8)
>     Speed index:            220800
> ```

Yours should be only a tiny bit faster than what I benchmarked above,
which was:

$ ./john --list=opencl-devices -dev=1
Platform #0 name: Intel(R) OpenCL, version: OpenCL 1.2
    Device #0 (1) name:     Intel(R) HD Graphics
    Device vendor:          Intel(R) Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 1.2
    Driver version:         126.96.36.199.39163
    Native vector widths:   char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          1630 MiB
    Global Memory Cache:    256 KiB
    Local Memory:           64 KiB (Local)
    Constant Buffer size:   64 KiB
    Max memory alloc. size: 407 MiB
    Max clock (MHz):        1250
    Profiling timer res.:   80 ns
    Max Work Group Size:    512
    Parallel compute cores: 20
    Stream processors:      160  (20 x 8)
    Speed index:            200000

As you can see, these are pretty slow (at least at running this kernel).
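
To put rough numbers on the comparison, here is a small Python sketch
that uses only figures copied from the listings and benchmarks above.
It rests on two assumptions that are not confirmed anywhere in this
message: that "Speed index" is simply stream processors times max clock
in MHz (both listings happen to be consistent with that), and that CPU
and GPU speeds would add up without contention when run side by side.
The second is an optimistic upper bound at best, and the variable and
function names are made up for illustration.

```python
# Figures copied from the two device listings quoted above.
hd_graphics = {"stream_processors": 160, "max_clock_mhz": 1250, "speed_index": 200000}
uhd_graphics = {"stream_processors": 192, "max_clock_mhz": 1150, "speed_index": 220800}


def estimated_index(dev):
    """Assumption: speed index ~= stream processors x max clock (MHz)."""
    return dev["stream_processors"] * dev["max_clock_mhz"]


# Benchmark results quoted above for the i7-4770K (sha512crypt, rounds=5000).
cpu_cs = 6493  # 8 OpenMP threads, AVX2
gpu_cs = 1213  # the same chip's embedded HD Graphics via OpenCL

# How much faster the newer UHD Graphics might be, judging by speed index alone.
gpu_ratio = uhd_graphics["speed_index"] / hd_graphics["speed_index"]

# CPU advantage over the embedded GPU in the benchmark above.
cpu_vs_gpu = cpu_cs / gpu_cs

# Best-case gain from running both at once, assuming no contention at all.
combined_gain = (cpu_cs + gpu_cs) / cpu_cs

print(f"UHD vs. HD Graphics by speed index: {gpu_ratio:.3f}x")
print(f"CPU vs. embedded GPU:               {cpu_vs_gpu:.2f}x")
print(f"CPU + GPU vs. CPU alone (at best):  {combined_gain:.2f}x")
```

In other words, the speed index suggests roughly a 10% edge for the
newer GPU, which is what "a tiny bit faster" refers to, and even a
perfectly working embedded GPU would add well under a quarter on top of
the CPU cores in this benchmark.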

> I've tried every way I can think of to increase the verbosity, without
> success.
> I even ran `strings $(which john) | grep ^OCL` with no results.

There's no need - it's a miscompile resulting in miscomputation, and
that's it.

> How do I debug this?  Or, better yet, how do I fix it?

Unfortunately, only by trial and error.  You could try different
versions of Intel OpenCL.  You could try modifying the OpenCL kernel
code to hopefully avoid triggering whatever miscompile there is.  To
guide this process, you could introduce debugging printf()s in there and
compare against a run that passes self-tests on another device.

However, I wouldn't bother.  Instead, I recommend that you build the
latest bleeding-jumbo off GitHub on your system, and use that.  It might
run faster than the package, and we've made various improvements since
the 1.9.0-jumbo-1 release.  And who knows, maybe it'd also avoid
whatever OpenCL kernel miscompile you ran into on the embedded GPU.

I hope this helps.

Alexander