Date: Mon, 9 Mar 2015 23:49:12 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: john-devkit and sha512

I am happy to introduce my code generator for john.
https://github.com/AlekseyCherepanov/john-devkit

At the moment, it can produce raw-sha512 code that runs a bit faster
than the original. The produced code:
https://gist.github.com/AlekseyCherepanov/75b6621d3e5abb0c19d6

I compared the peak speeds of several 5-second tests:
the generated code:
Raw:	2237K c/s real, 2237K c/s virtual
the original Raw-SHA512:
Raw:	2180K c/s real, 2180K c/s virtual
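
To put "a bit faster" into a number, the two peak speeds above can be
compared directly. This is only arithmetic on the figures quoted in
this message; the variable names are made up for the illustration.

```python
# Peak speeds quoted above, in thousands of candidates per second.
generated_kcs = 2237
original_kcs = 2180

# Ratio of the generated code's speed to the original's speed.
speedup = generated_kcs / original_kcs
gain_percent = (speedup - 1.0) * 100.0

print("speedup: %.3fx (%.1f%% faster)" % (speedup, gain_percent))
```

So the difference is a few percent, on peak values of short runs.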

I unroll the w setup by 16 and fully unroll the main loop. I insert the
code directly into the format file, not into SSESHA512body(). Also, my
instruction tree may be different. I added -O3 using a pragma.
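
As an illustration of what unrolling the w setup means, here is a
minimal sketch in Python that prints C text for the SHA-512 message
schedule, either unrolled by a factor or fully unrolled. It is not the
actual john-devkit code: emit_w_setup(), ref() and step() are
hypothetical names, and s0()/s1() are assumed to be C macros for the
small sigma functions.

```python
def ref(base, offset):
    """Format a w[] reference: folded constant for an int base, else 't + k'."""
    if isinstance(base, int):
        return "w[%d]" % (base + offset)
    if offset == 0:
        return "w[%s]" % base
    sign = "+" if offset > 0 else "-"
    return "w[%s %s %d]" % (base, sign, abs(offset))


def step(base, j):
    """One schedule step: w[t] = s1(w[t-2]) + w[t-7] + s0(w[t-15]) + w[t-16]."""
    return "%s = s1(%s) + %s + s0(%s) + %s;" % (
        ref(base, j), ref(base, j - 2), ref(base, j - 7),
        ref(base, j - 15), ref(base, j - 16))


def emit_w_setup(unroll, first=16, last=80):
    """Return C text computing w[first..last-1], unrolled `unroll` times."""
    count = last - first
    if unroll <= 0 or count % unroll != 0:
        raise ValueError("unroll factor must divide the number of steps")
    if unroll == count:
        # Full unroll: no loop, every index is a constant.
        return "\n".join(step(first, j) for j in range(count))
    # Partial unroll: a C loop whose body holds `unroll` steps.
    body = "\n".join("    " + step("t", j) for j in range(unroll))
    return "for (t = %d; t < %d; t += %d) {\n%s\n}" % (
        first, last, unroll, body)


print(emit_w_setup(16))
```

Calling emit_w_setup(64) instead gives the fully unrolled variant with
constant indices only.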

I tried various other options, such as a full unroll of the w setup and
interleave (both mixing and not mixing the w arrays). They gave me bad
results. Interleave deserves a separate note: I looked into the
assembler output and I think that interleave works correctly. Still,
the bad results do not rule out a benefit from interleave. I may try
to use w[idx][interleave_i] instead of w[idx * N + interleave_i] and/or
not unroll the main loop. Maybe I have to "debug" my optimizations on
formats that already have them natively (or try a profiler...).
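
The index layouts mentioned above can be compared with a small sketch.
It assumes N is the interleave factor and that there are 80 w words per
hash; the function names are made up. Since a true 2D array in C is
row-major, w[idx][interleave_i] and w[idx * N + interleave_i] address
the same element, so any speed difference would have to come from how
the compiler treats the two spellings. "Not mixing" is modelled here as
one contiguous w array per interleaved hash.

```python
ROUNDS = 80  # number of w words per SHA-512 block
N = 4        # interleave factor (assumed value for the illustration)

# A nested list stands in for a C array declared as w[ROUNDS][N].
w2d = [[(idx, lane) for lane in range(N)] for idx in range(ROUNDS)]
# Row-major flattening, which is how C lays out a true 2D array.
flat = [cell for row in w2d for cell in row]


def mixed_offset(idx, lane):
    """Mixed w arrays: the lanes of one word sit next to each other."""
    return idx * N + lane


def separate_offset(idx, lane):
    """Not mixed: each interleaved hash keeps its own contiguous w array."""
    return lane * ROUNDS + idx


same_layout = all(
    flat[mixed_offset(idx, lane)] == w2d[idx][lane]
    for idx in range(ROUNDS) for lane in range(N))
print("2D indexing matches idx * N + lane:", same_layout)
print("word 1, lane 0: mixed offset %d, separate offset %d"
      % (mixed_offset(1, 0), separate_offset(1, 0)))
```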

I generated only a small part of the format: the main algo. The
padding was borrowed from john together with all supporting functions
(like valid()).


The current state (commit bc0fcc166435302a29d9a41d337c736717b17749) is
very dirty. It is not possible to turn on scalar output, and not all
optimizations can be combined (for instance, interleave can't mix k
arrays while it can mix w arrays, so a full unroll of the main loop is
needed for interleave). I'll fix it.


Files involved in generation:

format_john_sha512.py - definition of optimizations to be used
algo_sha512.py - the algo
t_raw-sha512.c - C template to insert the code

bytecode_main.py - most optimizations are implemented here

lang_common.py - misc file
util_ui.py - misc file

lang_spec.py - type signatures of 'instructions'
lang_main.py - setup code to provide DSL for algo_* scripts

output_c.py - output from 'bytecode' to C
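
The division of labour in the list above (an algo described as
'bytecode', optimizations applied to it, then output to C) can be
pictured with a toy pipeline. Everything here is a hypothetical
stand-in: the tuple format, inline_constants() and to_c() are invented
for the illustration and do not reflect the real contents of
bytecode_main.py or output_c.py.

```python
# Toy 'bytecode': (op, destination, *arguments). Assumed format.
bytecode = [
    ("input", "a"),
    ("input", "b"),
    ("const", "k", "0x428a2f98d728ae22ULL"),
    ("add", "t0", "a", "k"),
    ("xor", "t1", "t0", "b"),
    ("output", "t1"),
]

C_BINARY = {"add": "+", "xor": "^", "and": "&"}


def inline_constants(code):
    """Optimization pass: drop 'const' and substitute the literal in users."""
    values = {}
    result = []
    for op, dest, *args in code:
        if op == "const":
            values[dest] = args[0]
            continue
        result.append((op, dest) + tuple(values.get(a, a) for a in args))
    return result


def to_c(code, name="f", ctype="uint64_t"):
    """Output pass: turn the instruction list into one C function."""
    params, body = [], []
    for op, dest, *args in code:
        if op == "input":
            params.append("%s %s" % (ctype, dest))
        elif op == "const":
            body.append("    const %s %s = %s;" % (ctype, dest, args[0]))
        elif op == "output":
            body.append("    return %s;" % dest)
        else:
            body.append("    %s %s = %s %s %s;"
                        % (ctype, dest, args[0], C_BINARY[op], args[1]))
    return "static %s %s(%s)\n{\n%s\n}" % (
        ctype, name, ", ".join(params), "\n".join(body))


optimized = inline_constants(bytecode)
c_source = to_c(optimized)
print(c_source)
```

The point of the sketch is only that the algo description stays the
same while passes like inline_constants() can be switched on or off.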


The code generator was born from attempts to implement bitslice 2
years ago. Then it evolved into dreams of a general-purpose code
generator with separation of algorithms and optimizations. I think
that the separation is crucial for efficient development of john.
I got part of my inspiration from Jim's dynamic formats.


The generator is written in Python, but in an unusual style: I don't
use OOP (only a bit, to provide operator overloading for the DSL).
Instead I use small functions and explicit global variables because I
used to write in Emacs Lisp.
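
The style described above might look roughly like the following
sketch: one small class that exists only for operator overloading,
plus plain functions that append to an explicit global list. The names
(code, next_temp, Var, emit(), input_var(), ch()) are hypothetical and
not taken from lang_main.py. The ch() formula is the common rewrite of
the SHA-2 Ch function, (x & y) ^ (~x & z), as z ^ (x & (y ^ z)).

```python
code = []       # explicit global: recorded instructions, in order
next_temp = 0   # explicit global: counter for temporary names


def new_name():
    """Return a fresh temporary name: t1, t2, ..."""
    global next_temp
    next_temp += 1
    return "t%d" % next_temp


def emit(op, *args):
    """Record one instruction and return a Var for its result."""
    dest = new_name()
    code.append((op, dest) + args)
    return Var(dest)


class Var:
    """The only class: it just maps Python operators to emit()."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return emit("add", self.name, other.name)

    def __xor__(self, other):
        return emit("xor", self.name, other.name)

    def __and__(self, other):
        return emit("and", self.name, other.name)


def input_var(name):
    """Declare an input of the algorithm."""
    code.append(("input", name))
    return Var(name)


def ch(x, y, z):
    """SHA-2 Ch, written as ordinary Python; running it records bytecode."""
    return z ^ (x & (y ^ z))


e, f, g = input_var("e"), input_var("f"), input_var("g")
result = ch(e, f, g)
for instruction in code:
    print(instruction)
```

The algorithm is written once as normal expressions, and the recorded
list is what optimizations and output would then work on.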


I'd like to target GPUs and to implement various optimizations (like
custom instruction scheduling). Also I'd like to implement various
formats (especially those we meet in contests). The generated code may
be undesirable for john, though, because it is big and ugly. Even so,
john-devkit may serve as a playground where one can quickly try
various optimization parameters and then implement the chosen
optimizations manually.


I propose development of john-devkit as an idea for GSoC. The task is
huge, so I'd like to hear suggestions on how to choose a part.

I see a chance to implement better support of sha-2 using john-devkit
before the GSoC. Suggestions?

Thanks!

-- 
Regards,
Aleksey Cherepanov
