Date: Thu, 31 May 2018 10:00:36 -0600 (MDT)
From: Leonard Rose <len@...itude.net>
To: Solar Designer <solar@...nwall.com>
Cc: john-users@...ts.openwall.com
Subject: Re: NVIDIA Jetson TK1 GPU & John

Hi! I was able to get this working easily! I got a link to the CUDA branch, https://github.com/magnumripper/JohnTheRipper/tree/CUDA , from someone at the NVIDIA developer forum and then proceeded to build the code. After installing a few additional packages on the Tegra, OpenMPI and CUDA support works well using the configure options.
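For reference, the whole build boils down to something like this (a sketch, not my exact session; --enable-mpi is jumbo's configure flag for MPI support, and I'm assuming the CUDA toolkit is picked up automatically from the environment):

git clone -b CUDA https://github.com/magnumripper/JohnTheRipper.git
cd JohnTheRipper/src
./configure --enable-mpi   # assumes nvcc and the CUDA libraries are already installed
make -sj4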

I have been using a cluster of four TK1 Kepler GPUs for about a week now, and I am very pleased with how easy it was. I have been using mpich-2 on another cluster I built (66 ARM A10 CPUs), so I have worked with MPI John for a long time. Building this was straightforward, and it simply works out of the box.
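Launching across the cluster is the usual OpenMPI invocation, along the lines of the following (a sketch; the hostfile name and hash file are placeholders):

mpirun -np 4 --hostfile nodes ./john hashes.txt   # one rank per TK1 board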

I had some issues with NVIDIA's Tegra packages; namely, there are some mistakes in their shared library installation that you have to fix. In order to get the code working with OpenMPI on multiple nodes, you need to add /usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib to the dynamic linker's search path. I share the John run directory via NFS to all nodes in the cluster.
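One way to do that system-wide is a drop-in file under /etc/ld.so.conf.d on each node (a sketch; the cuda-6.5.conf file name is my own choice):

echo /usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib > /etc/ld.so.conf.d/cuda-6.5.conf
ldconfig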

When I tried to run ldconfig with this new path, it produced an error. After a few minutes of study I found that NVIDIA's development package has an error in how it installs some of the libraries.

It turns out that instead of symlinking the two additional names (libcudnn.so.6.5 and libcudnn.so) to libcudnn.so.6.5.48, they made binary copies. All you have to do to resolve this is remove the copies and recreate the proper links.

For example:


root@...02:/etc/ld.so.conf.d# ldconfig
/sbin/ldconfig.real: /usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib/libcudnn.so.6.5 is not a symbolic link

root@...02:/etc/ld.so.conf.d# cd /usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib
root@...02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ls -l *cudnn*
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so.6.5
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so.6.5.48
-rwxr-xr-x 1 root root 9308614 Apr 26 21:49 libcudnn_static.a
root@...02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# rm libcudnn.so libcudnn.so.6.5
root@...02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ln -s libcudnn.so.6.5.48 libcudnn.so.6.5
root@...02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ln -s libcudnn.so.6.5.48 libcudnn.so
root@...02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ls -l *cudnn*
lrwxrwxrwx 1 root root      18 May 25 01:02 libcudnn.so -> libcudnn.so.6.5.48
lrwxrwxrwx 1 root root      18 May 25 01:02 libcudnn.so.6.5 -> libcudnn.so.6.5.48
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so.6.5.48
-rwxr-xr-x 1 root root 9308614 Apr 26 21:49 libcudnn_static.a
root@...02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ldconfig
root@...02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib#

I am always amazed by the advances from the days of standalone array processors and clunky Fortran to the current day, where we have orders of magnitude more power in tiny wafers of silicon lit up by the code that drives it all. For me this was one of those epiphanies when I realized the level of performance I was seeing (even without OpenCL support):

mpirun: Forwarding signal 10 to job
2 0g 5:09:20:40  3/3 0g/s 844.7p/s 844.7c/s 844.7C/s l0g12421..l0g10674
1 0g 5:09:18:53  3/3 0g/s 847.0p/s 847.0c/s 847.0C/s sitcr2d3..sitatziz
3 0g 5:09:35:26  3/3 0g/s 840.1p/s 840.1c/s 840.1C/s st.gearte..st.grucho
4 0g 5:09:20:16  3/3 0g/s 752.0p/s 752.0c/s 752.0C/s mhbcmlp..mhhurg9

(snip)

1 1:20:13:11 - Switching to distributing words
1 1:20:13:13 Proceeding with "incremental" mode: ASCII
1 1:20:13:13 - Lengths 0 to 13, up to 95 different characters
2 1:22:39:09 - Switching to distributing words
2 1:22:39:10 Proceeding with "incremental" mode: ASCII
2 1:22:39:10 - Lengths 0 to 13, up to 95 different characters
4 2:04:24:32 - Switching to distributing words
4 2:04:24:54 Proceeding with "incremental" mode: ASCII
4 2:04:24:54 - Lengths 0 to 13, up to 95 different characters



Thank you to everyone who worked on CUDA and GPU support; it's really incredible what you have done, placing such raw power at our fingertips.

Here is what it looks like as of today: http://t2k.wdfiles.com/local--files/jetson-tk1/2018-05-21-jetson-tk1-01.jpg . It hasn't quite made it off the bench yet.


----- Original Message -----
From: "Solar Designer" <solar@...nwall.com>
To: "Len Rose" <len@...itude.net>
Cc: "john-users" <john-users@...ts.openwall.com>
Sent: Thursday, May 31, 2018 8:50:55 AM
Subject: Re: [john-users] NVIDIA Jetson TK1 GPU & John

Hi Leonard,

On Thu, Apr 12, 2018 at 04:27:38PM -0600, Leonard Rose wrote:
> Has anyone used this development board successfully with JtR? I recently bought one used that I wanted to try JtR on, with 192 CUDA cores (up to 326 GFLOPS!). In the past I have built a small MPI cluster of 66 ARM CPUs, but was hoping to build something useful with these NVIDIA boards. If I can get this working I can see a lot of fun in the future learning about GPU and OpenCL....

I'm sorry no one seems to have replied to you so far.

Yes, people tried JtR on NVIDIA Jetson TK1 before:

http://www.openwall.com/lists/john-users/2014/07/17/4
http://www.openwall.com/lists/john-users/2015/10/29/1

These threads mention some old build issues, etc., but those are
supposed to be fixed or otherwise irrelevant in current bleeding-jumbo
(and yes, it'll be solely OpenCL now, including on NVIDIA, as we've
dropped CUDA support).  So I am referring to the threads not for the way
outdated advice/workarounds given there, but merely to answer your
question.  You'll actually want to use bleeding-jumbo, and just try to
build it in the usual way (e.g., "./configure && make -sj4") without any
tweaks first.
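
For instance (a sketch; this is the same repository as the CUDA tree mentioned above, just on the bleeding-jumbo branch):

git clone -b bleeding-jumbo https://github.com/magnumripper/JohnTheRipper.git
cd JohnTheRipper/src
./configure && make -sj4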

Please just give this a try and report any issues there might be - or
report success even if there are no issues.

Of course, these boards are actually very slow compared to modern large
GPUs that you'd plug into x86 boxes.  But that shouldn't stop you from
having fun.

Alexander
