Date: Mon, 4 Jan 2021 16:34:34 +0100
From: "Anton Berggren" <antonb@...e.se>
To: <john-users@...ts.openwall.com>
Subject: Re: Cracking rar password with rar-opencl

Thanks for all the help!
I'm new to this. I really don't know what I'm doing, but I have read and tried all
kinds of examples in the documentation.

Here are my results for the rar and rar-opencl benchmarks you suggested:

C:\Users\Anton\Downloads\john-1.9.0-jumbo-1-win64\run>john --test -format=rar
Will run 4 OpenMP threads
Benchmarking: rar, RAR3 (length 5) [SHA1 256/256 AVX2 8x AES]... (4xOMP) DONE
Raw:    555 c/s real, 139 c/s virtual


C:\Users\Anton\Downloads\john-1.9.0-jumbo-1-win64\run>john --test -format=rar-opencl
Will run 4 OpenMP threads
Device 3: GeForce GTX 760
Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (4xOMP) DONE
Raw:    8353 c/s real, 8336 c/s virtual
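For a quick comparison, the ratio of the two "real" figures gives the GTX 760's speedup over the CPU format. A minimal Python sketch; the helper name real_speed is hypothetical, and it assumes the number on the "Raw:" line is plain (no K or M suffix, which john uses for higher speeds):

```python
import re

# Matches the "Raw:" summary line printed by "john --test", e.g.
# "Raw:    555 c/s real, 139 c/s virtual"; captures the "real" figure.
RAW_RE = re.compile(r"Raw:\s+([\d.]+) c/s real")

def real_speed(line):
    """Return the 'real' candidates-per-second figure from a Raw: line."""
    m = RAW_RE.search(line)
    if m is None:
        raise ValueError(f"not a benchmark Raw: line: {line!r}")
    return float(m.group(1))

cpu = real_speed("Raw:    555 c/s real, 139 c/s virtual")    # rar, 4 OpenMP threads
gpu = real_speed("Raw:    8353 c/s real, 8336 c/s virtual")  # rar-opencl, GTX 760
print(f"GPU/CPU speed ratio: {gpu / cpu:.1f}x")  # prints "GPU/CPU speed ratio: 15.1x"
```

So on this machine the NVIDIA card is roughly 15 times faster than the CPU format for RAR3.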


The RAR archive is really small: 12.3 KB.
I will try everything you suggest.

clinfo.exe gave me this:

C:\Users\Anton\Downloads>clinfo.exe
Number of platforms                               2
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 11.2.66
  Platform Profile                                FULL_PROFILE
  Platform Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_device_uuid
  Platform Extensions function suffix             NV

  Platform Name                                   Intel(R) OpenCL
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 1.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_intel_dx9_media_sharing
cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing
cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_icd cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_spir
  Platform Extensions function suffix             INTEL

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 760
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Device UUID                                     054b42bb-0b83-fced-93da-ee6fe9337453
  Driver UUID                                     054b42bb-0b83-fced-93da-ee6fe9337453
  Valid Device LUID                               Yes
  Device LUID                                     5ba3-000000000000
  Device Node Mask                                0x1
  Driver Version                                  460.89
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               6
  Max clock frequency                             1150MHz
  Compute Capability (NV)                         3.0
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2147483648 (2GiB)
  Error Correction support                        No
  Max memory allocation                           536870912 (512MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        98304 (96KiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             4096x4096x4096 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_device_uuid

  Platform Name                                   Intel(R) OpenCL
Number of devices                                 2
  Device Name                                     Intel(R) HD Graphics 4600
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2
  Driver Version                                  20.19.15.5166
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               20
  Max clock frequency                             1200MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     by <unknown> (0x9400000000000000)
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             512
  Preferred work group size multiple (kernel)     32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              1708759450 (1.591GiB)
  Error Correction support                        No
  Max memory allocation                           427189862 (407.4MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144 (256KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            26699366 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4096 bytes
    Pitch alignment for 2D image buffers          64 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Number of simultaneous interops (Intel)         1
  Simultaneous interops                           GL WGL D3D11
  Profiling timer resolution                      80ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel
  Motion Estimation accelerator version (Intel)   2
  Device Extensions                               cl_intel_accelerator
cl_intel_advanced_motion_estimation cl_intel_ctz
cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing
cl_intel_motion_estimation cl_intel_simultaneous_sharing cl_intel_subgroups
cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing
cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing
cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_spir

  Device Name                                     Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 (Build 10094)
  Driver Version                                  5.2.0.10094
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               4
  Max clock frequency                             3400MHz
  Device Partition                                (core)
    Max number of sub-devices                     4
    Supported partition types                     by counts, equally, by names (Intel)
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             8192x8192x8192
  Max work group size                             8192
  Preferred work group size multiple (kernel)     128
  Preferred / native vector sizes
    char                                                 1 / 32
    short                                                1 / 16
    int                                                  1 / 8
    long                                                 1 / 4
    half                                                 0 / 0        (n/a)
    float                                                1 / 8
    double                                               1 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              17041711104 (15.87GiB)
  Error Correction support                        No
  Max memory allocation                           4260427776 (3.968GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144 (256KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             480
    Max size for 1D images from buffer            266276736 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 480
    Max number of write image args                480
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max number of constant args                     480
  Max constant buffer size                        131072 (128KiB)
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Local thread execution (Intel)                Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      100ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_icd
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes
cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing
cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing
cl_khr_fp64


NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

-----Original Message-----
From: Solar Designer <solar@...nwall.com>
Sent: 4 January 2021 15:15
To: john-users@...ts.openwall.com
Subject: Re: [john-users] Cracking rar password with rar-opencl

Hi Anton,

On Mon, Jan 04, 2021 at 11:55:52AM +0100, Anton Berggren wrote:
>     Device #0 (1) name:     Intel(R) HD Graphics 4600

>     Device #1 (2) name:     Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz

This embedded GPU is of comparable performance to the CPU.  Here's an i7-4770K
under Linux:

$ ./john -test -format=rar-opencl -dev=1
Will run 8 OpenMP threads
Device 1: Intel(R) HD Graphics
Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (8xOMP) Build log: fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

LWS=16 GWS=640 (40 blocks) DONE
Raw:    680 c/s real, 96000 c/s virtual

$ ./john -test -format=rar-opencl -dev=2
Will run 8 OpenMP threads
Device 2: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... (8xOMP) Build log: Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Kernel <RarInit> was not vectorized
Kernel <RarHashLoop> was successfully vectorized (8)
Kernel <RarFinal> was successfully vectorized (8)
Kernel <RarCheck> was not vectorized
Done.
LWS=128 GWS=1024 (8 blocks) DONE
Raw:    459 c/s real, 57.8 c/s virtual

$ ./john -test -format=rar              
Will run 8 OpenMP threads
Benchmarking: rar, RAR3 (length 5) [SHA1 256/256 AVX2 8x AES]... (8xOMP)
DONE
Raw:    512 c/s real, 64.5 c/s virtual

Please note that rar-opencl also makes some use of the CPU via OpenMP, even
when its target device is a GPU.

You'll probably want to run similar tests for all 3 of your devices, and
perhaps post the results in here.
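One way to script those tests, sketched in Python. It assumes john is run from its run directory and that devices 1, 2 and 3 are numbered as John reported them earlier in this thread (HD Graphics 4600, i5-4670K, GeForce GTX 760); check the numbering with "john --list=opencl-devices" first. The function names are hypothetical:

```python
import subprocess

# Assumed John device numbers, taken from the output quoted in this thread.
OPENCL_DEVICES = {1: "Intel HD Graphics 4600", 2: "Intel i5-4670K (OpenCL)", 3: "GeForce GTX 760"}

def benchmark_commands(john="john"):
    """Build one rar-opencl benchmark per OpenCL device, plus the non-OpenCL rar format."""
    cmds = [[john, "--test", "--format=rar-opencl", f"--devices={dev}"]
            for dev in sorted(OPENCL_DEVICES)]
    cmds.append([john, "--test", "--format=rar"])  # plain CPU format for comparison
    return cmds

def run_benchmarks(john="john"):
    """Run each benchmark in turn (raises FileNotFoundError if john is not found)."""
    for cmd in benchmark_commands(john):
        print(">", " ".join(cmd))
        subprocess.run(cmd, check=False)

if __name__ == "__main__":
    # Dry run: only print the commands; call run_benchmarks() to execute them.
    for cmd in benchmark_commands():
        print(" ".join(cmd))
```

On Windows, pass the path to john.exe as the john argument if it is not in the current directory.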

> And I resume with this command and get this output:
> C:\Users\Anton\Downloads\john-1.9.0-jumbo-1-win64\run>john --restore
> Device 3: GeForce GTX 760
> Loaded 1 password hash (rar-opencl, RAR3 [SHA1 OpenCL AES])
> Will run 4 OpenMP threads
> Proceeding with incremental:ASCII
> Press 'q' or Ctrl-C to abort, almost any other key for status
> 
> Is it only using my NVIDIA GPU? How can I utilize all my devices? Can
> I optimize my RAR password cracking for more effective usage?
> It seems that my GPU usage isn't constant. It goes up and down... up and
> down... up and down... about 10-30%. That is what Windows reports anyway.

Do you mean 10-30% utilization, or 10-30% left idle (so 70-90% load)?

The fluctuating utilization is possibly because of post-processing done on
the CPU.  How large is the RAR archive?

You might increase average GPU utilization by running more than one attack
on it - either start a second instance of JtR with a different "--session"
name and configured to test different candidate passwords (a non-overlapping
wordlist, etc.) or use "--fork=2" (yes, with just one NVIDIA GPU device).
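For the two-session approach, the candidates must not overlap. A Python sketch that deals a wordlist's lines round-robin into part files and builds one command per part, each with its own "--session" name. The file names (words.txt, hash.txt) and function names are hypothetical:

```python
from contextlib import ExitStack
from pathlib import Path

def split_wordlist(src, n=2):
    """Split a wordlist into n non-overlapping part files, dealing lines round-robin.

    Round-robin keeps each part in roughly the original (most-likely-first) order.
    Files are handled as bytes so non-UTF-8 candidates are copied unchanged.
    """
    src = Path(src)
    parts = [src.with_name(f"{src.stem}.part{k + 1}{src.suffix}") for k in range(n)]
    with ExitStack() as stack:
        outs = [stack.enter_context(open(p, "wb")) for p in parts]
        with open(src, "rb") as f:
            for i, line in enumerate(f):
                outs[i % n].write(line)
    return parts

def session_commands(parts, hash_file="hash.txt", john="john"):
    """One john command per wordlist part, each under its own --session name."""
    return [[john, f"--session=rar{k + 1}", "--format=rar-opencl", f"--wordlist={p}", hash_file]
            for k, p in enumerate(parts)]
```

Each command runs in its own console, and an interrupted session can be resumed with "john --restore=rar1". With "--fork=2" no splitting is needed, since John divides the work between the two processes itself.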

Using the CPU more directly and using its embedded GPU isn't necessarily a
good idea as it'd likely lower your NVIDIA GPU utilization, but feel free to
give this a try with separate sessions.  You'll likely want to set a lower
CPU thread count via the environment variable OMP_NUM_THREADS to reduce
competition for the CPU (competition can be very wasteful).
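A Python sketch of capping OpenMP threads per session by passing OMP_NUM_THREADS in each child process's environment. The 2 + 2 split of the 4-core CPU, the session names and the file names are assumptions for illustration only:

```python
import os

def session_command(session, fmt, wordlist, threads, hash_file="hash.txt", john="john"):
    """Return (argv, env) for one john session with a capped OpenMP thread count.

    OMP_NUM_THREADS limits the CPU format, and also the host-side OpenMP
    work that rar-opencl does even when its target device is a GPU.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))  # a copy; os.environ is untouched
    argv = [john, f"--session={session}", f"--format={fmt}", f"--wordlist={wordlist}", hash_file]
    return argv, env

# Hypothetical split: 2 threads feed the GTX 760 session,
# 2 threads run the CPU-only "rar" format on a different wordlist part.
gpu_argv, gpu_env = session_command("rar-gpu", "rar-opencl", "words.part1.txt", 2)
cpu_argv, cpu_env = session_command("rar-cpu", "rar", "words.part2.txt", 2)
```

Each session would then be started with subprocess.Popen(argv, env=env). In a Windows console the equivalent is running "set OMP_NUM_THREADS=2" before starting john in that window.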

Using all devices in one session (like you technically could with
"--devices=1,2,3 --fork=3") is almost certainly a bad idea since the devices
are so different and since the best way to use a CPU is generally by using
the non-OpenCL format, but feel free to try anyway.
(Maybe I'm over-estimating your NVIDIA GPU's performance, and it's actually
similar to your CPU and your embedded GPU?  I notice it's a Kepler era
device, and isn't large.)

Again regarding the fluctuating GPU utilization, see also the "rar-opencl
performance" thread we had in here in September:

https://www.openwall.com/lists/john-users/2020/09/

Windows might be under-reporting GPU utilization.  We recently had a thread
in here where this was found to be the case for AMD GPUs.  For more reliable
reporting, please use tools that come with the GPU driver.
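For the NVIDIA card, the driver ships nvidia-smi, which can report utilization directly. A Python sketch that samples it at a fixed interval and returns the average; the nvidia-smi query flags are real, while the function names and sampling defaults are hypothetical:

```python
import subprocess
import time

# One integer percentage per GPU, one GPU per line, no header or "%" unit.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]

def parse_utilization(output):
    """Parse nvidia-smi CSV output into a list of per-GPU utilization percentages."""
    return [int(line) for line in output.splitlines() if line.strip()]

def sample_utilization(samples=30, interval=1.0, gpu_index=0):
    """Take `samples` readings `interval` seconds apart; return the mean for one GPU."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        readings.append(parse_utilization(out)[gpu_index])
        time.sleep(interval)
    return sum(readings) / len(readings)
```

Averaging over 30 seconds or so smooths out the up-and-down pattern and gives a figure that can be compared with what Windows shows.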

Anyway, far more importantly than all of the above, you need to focus the
attack to test candidate passwords that are actually likely.  You might want
to share in here what you know/recall about the password in plain English,
and we'll help you encode that into options to "john".
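To see why focusing the attack matters, here is rough arithmetic for trying every password of a single length over the 95 printable ASCII characters at the 8353 c/s benchmarked on the GTX 760. This is an upper-bound sketch under those assumptions: actual cracking speed may differ from the benchmark, and incremental mode tries more likely candidates first, so the time to a hit depends on the password itself:

```python
SECONDS_PER_DAY = 86_400

def days_to_exhaust(charset_size, length, speed_cps):
    """Days needed to try all charset_size**length candidates at speed_cps candidates/second."""
    return charset_size ** length / speed_cps / SECONDS_PER_DAY

SPEED = 8353  # rar-opencl "real" c/s on the GeForce GTX 760 (benchmark above)
for length in range(4, 9):
    print(f"length {length}: {days_to_exhaust(95, length, SPEED):,.1f} days")
```

Length 5 comes to about 11 days and length 6 to nearly three years, which is why narrowing the candidates to what you remember about the password matters far more than squeezing out extra c/s.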

Alexander
