john-users - John the ripper with Tesla GPUs on Debian

Message-ID: <CABJfu+Yeuga44jN9MMbXot_JTTab=J_bg5bS81nwEmBa_O1MRg@mail.gmail.com>
Date: Thu, 30 Jul 2015 23:28:33 +0200
From: Viktor Gazdag <woodspeed@...il.com>
To: john-users@...ts.openwall.com
Subject: John the ripper with Tesla GPUs on Debian

Hi all,


I've installed the latest Tesla driver from the NVIDIA site (not the
regular NVIDIA VGA driver, and not the one from the repo).
We downloaded and installed the Ubuntu 14.04 deb package. Its
dependencies came partly from aptitude and partly from Ubuntu packages,
because some versions were too old or not available at all. Maybe
the cuda*.run installer would have been better, but I didn't have enough
free space left to try it.
We've installed librexgen from GitHub, but john couldn't find one of its
libraries because it was in a different directory (I think it was the
librexgen/c/ directory).

Before running configure, I set the following environment variables
(they are needed for running john, too):
export CC=gcc-4.9
export PATH=$PATH:/usr/local/cuda-7.0/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda-7.0/lib64
export PATH=$PATH:/usr/local/cuda-7.0/targets/x86_64-linux/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.0/targets/x86_64-linux/lib
export PATH=$PATH:/usr/lib/gcc/x86_64-linux-gnu/4.9

The last export is needed because the system otherwise could not find
cc1plus when compiling.
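
For anyone repeating this setup, here is a minimal sketch of a helper that
appends a directory to a colon-separated variable without adding it twice
and without leaving an empty leading entry. The name append_path is
hypothetical (it is not part of john or CUDA), and the directories are
simply the ones from my system:

```shell
# append_path VAR DIR: append DIR to the colon-separated variable VAR
# unless it is already listed. Hypothetical helper, POSIX sh.
append_path() {
    eval "_cur=\${$1-}"                    # current value of VAR (may be empty)
    case ":$_cur:" in
        *":$2:"*) ;;                       # already present, nothing to do
        *) eval "export $1=\"\${_cur:+\$_cur:}\$2\"" ;;
    esac
}

# Directories taken from this post; adjust to the local CUDA install.
append_path PATH /usr/local/cuda-7.0/bin
append_path LD_LIBRARY_PATH /usr/local/cuda-7.0/lib64
append_path LD_LIBRARY_PATH /usr/local/cuda-7.0/targets/x86_64-linux/lib
```

Calling it repeatedly, for example from a shell startup file that gets
sourced more than once, should leave the variables unchanged after the
first call.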

./configure --enable-experimental-code
make clean && make -s
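
If configure or make complains that it cannot find the compiler pieces, a
check along these lines may help narrow it down. This is only a sketch:
check_tools is a hypothetical helper, and the tool names are the ones that
mattered on my system.

```shell
# check_tools NAME...: report whether each program can be found via PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools gcc-4.9 nvcc cc1plus || echo "fix PATH before running ./configure"
```

gcc can also report where it expects its C++ compiler proper to live, with
gcc-4.9 -print-prog-name=cc1plus; if that prints only the bare name
cc1plus, gcc did not find it in its own directories.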

After that, I could list the CUDA devices:

./john --list=cuda-devices
CUDA runtime version 7.0
CUDA driver version 7.0
8 CUDA devices found:

CUDA Device #0
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           0d:00.0
    NVML id:                       2
    Fan speed:                     n/a
    GPU temp:                      24°C
    Utilization:                   0%


CUDA Device #1
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           0e:00.0
    NVML id:                       3
    Fan speed:                     n/a
    GPU temp:                      24°C
    Utilization:                   0%


CUDA Device #2
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           09:00.0
    NVML id:                       0
    Fan speed:                     n/a
    GPU temp:                      23°C
    Utilization:                   0%


CUDA Device #3
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           0a:00.0
    NVML id:                       1
    Fan speed:                     n/a
    GPU temp:                      22°C
    Utilization:                   0%


CUDA Device #4
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           28:00.0
    NVML id:                       4
    Fan speed:                     n/a
    GPU temp:                      27°C
    Utilization:                   0%


CUDA Device #5
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           2b:00.0
    NVML id:                       5
    Fan speed:                     n/a
    GPU temp:                      27°C
    Utilization:                   0%


CUDA Device #6
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           30:00.0
    NVML id:                       6
    Fan speed:                     n/a
    GPU temp:                      27°C
    Utilization:                   0%


CUDA Device #7
    Name:                          Tesla K20Xm
    Type:                          discrete
    Compute capability:            3.5 (sm_35)
    Number of stream processors:   2688 (14 x 192)
    Clock rate:                    732 Mhz
    Memory clock rate (peak)       2600 Mhz
    Memory bus width               384 bits
    Peak memory bandwidth:         249 GB/s
    Total global memory:           5.0 GB (ECC)
    Total shared memory per block: 48.0 KB
    Total constant memory:         64.0 KB
    L2 cache size                  1.0 MB
    Kernel execution timeout:      No
    Concurrent copy and execution: Bi-directional
    Concurrent kernels support:    Yes
    Warp size:                     32
    Max. GPRs/thread block         65536
    Max. threads per block         1024
    Max. resident threads per MP   2048
    PCI device topology:           33:00.0
    NVML id:                       7
    Fan speed:                     n/a
    GPU temp:                      26°C
    Utilization:                   4%
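
With eight devices the listing is long, so a one-line-per-device summary
can be easier to read. The following is a rough sketch that assumes the
output layout shown above; summarize_cuda_devices is a hypothetical name,
and the field labels are matched exactly as they appear in this listing:

```shell
# Read "john --list=cuda-devices" output on stdin and print one line per
# device: device number, PCI address, temperature, utilization.
summarize_cuda_devices() {
    awk '
        /^CUDA Device #/        { id = substr($3, 2) }   # "#0" -> "0"
        /PCI device topology:/  { pci = $NF }
        /GPU temp:/             { temp = $NF }
        /Utilization:/          { print id, pci, temp, $NF }
    '
}

# Usage (requires the john binary built above):
#   ./john --list=cuda-devices | summarize_cuda_devices
```

The summary line is printed when the Utilization line is seen, because it
is the last field of each device block in this output.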

Best regards
woodspeed