Date: Mon, 9 Mar 2015 23:49:12 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: john-devkit and sha512

I am happy to introduce my code generator for john.
https://github.com/AlekseyCherepanov/john-devkit

At the moment, it can produce a raw-sha512 format that is a bit faster than the original one. The produced code:
https://gist.github.com/AlekseyCherepanov/75b6621d3e5abb0c19d6

I compared the peak speeds of several 5-second tests:

the generated code
Raw:    2237K c/s real, 2237K c/s virtual

original Raw-SHA512
Raw:    2180K c/s real, 2180K c/s virtual

I unroll the w setup by 16 and fully unroll the main loop. I insert the code directly into the format file, not into SSESHA512body(). Also, my instruction tree may be different. I added -O3 using a pragma.

I tried various other options, like a full unroll of the w setup and interleaving (both mixing and not mixing the w arrays). They gave me bad results. Interleave is worth a separate note: I looked at the assembler output, and I think interleave is implemented correctly. That said, it does not rule out getting a benefit from interleave. I may try to use w[idx][interleave_i] instead of w[idx * N + interleave_i] and/or not unrolling the main loop. Maybe I have to "debug" my optimizations on formats that already have them natively (or try a profiler...).

I generated only a small part of the format: the main algo. The padding was borrowed from john, together with all the supporting functions (like valid()).

The current state (commit bc0fcc166435302a29d9a41d337c736717b17749) is very dirty. It is not possible to turn on scalar output, and not all optimizations can be combined (for instance, interleave can't mix the k arrays while it may mix the w arrays, so a full unroll of the main loop is needed for interleave). I'll fix it.
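The "unroll the w setup by 16" option and the two interleave layouts (w[idx][interleave_i] versus w[idx * N + interleave_i]) can be made concrete with a small sketch. It is written in Python, like john-devkit, and emits C for the SHA-512 message schedule. It is not the generator's actual output (the gist above has that); the helper names and the SIGMA0/SIGMA1 C macros are assumptions for illustration.

```python
# Hypothetical sketch, not john-devkit code: emit C source for the SHA-512
# message schedule ("w setup") with a chosen unroll factor and, optionally,
# N interleaved candidates stored in one of two layouts:
#   "flat": w[idx * N + lane]      "2d": w[idx][lane]
# SIGMA0/SIGMA1 are assumed C macros for the SHA-512 small sigma functions.

LAYOUTS = ("flat", "2d")


def w_ref(off, interleave=1, lane=0, layout="flat"):
    """Return the C expression for w[i + off] (for one lane if interleaved)."""
    if off == 0:
        idx = "i"
    elif off > 0:
        idx = f"i + {off}"
    else:
        idx = f"i - {-off}"
    if interleave == 1:
        return f"w[{idx}]"
    if layout == "flat":
        return f"w[({idx}) * {interleave} + {lane}]"
    return f"w[{idx}][{lane}]"


def emit_w_setup(unroll=16, rounds=80, interleave=1, layout="flat"):
    """Emit C lines computing w[16..rounds-1] with the SHA-512 recurrence
    w[t] = SIGMA1(w[t-2]) + w[t-7] + SIGMA0(w[t-15]) + w[t-16],
    writing out `unroll` steps per loop iteration."""
    if (rounds - 16) % unroll != 0:
        raise ValueError("unroll must divide rounds - 16")
    if layout not in LAYOUTS:
        raise ValueError(f"layout must be one of {LAYOUTS}")
    out = [f"for (i = 16; i < {rounds}; i += {unroll}) {{"]
    for k in range(unroll):
        # Steps stay in order: step k may read w[i + k - 2], which an
        # earlier step of the same iteration has just written.
        for lane in range(interleave):
            d, t2, t7, t15, t16 = (w_ref(k + o, interleave, lane, layout)
                                   for o in (0, -2, -7, -15, -16))
            out.append(f"    {d} = SIGMA1({t2}) + {t7} + SIGMA0({t15}) + {t16};")
    out.append("}")
    return out


if __name__ == "__main__":
    for line in emit_w_setup(unroll=16)[:3]:
        print(line)
    for line in emit_w_setup(unroll=16, interleave=2, layout="2d")[:3]:
        print(line)
```

With unroll=64, all 64 computed words of the 80-round schedule fit in one loop iteration, which approximates the "full unroll of the w setup" variant (a real full unroll would also drop the loop and use constant indices).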
Files involved in the generation:
format_john_sha512.py - definition of the optimizations to be used
algo_sha512.py - the algo
t_raw-sha512.c - C template to insert the code into
bytecode_main.py - most optimizations are implemented here
lang_common.py - misc file
util_ui.py - misc file
lang_spec.py - type signatures of 'instructions'
lang_main.py - setup code to provide the DSL for algo_* scripts
output_c.py - output from 'bytecode' to C

The code generator was born from attempts to implement bitslice 2 years ago. Then it evolved into dreams about a general-purpose code generator with separation of algorithms and optimizations. I think that this separation is crucial for efficient development of john. I got part of my inspiration from Jim's dynamic formats.

The generator is written in Python, but unusually: I don't use OOP (only a bit, to provide operator overloading for the DSL); instead, I use small functions and explicit global variables, because I am used to writing in Emacs Lisp.

I'd like to target GPUs and to implement various optimizations (like custom instruction scheduling) and so on. I'd also like to implement various formats (especially those we meet in contests). The generated code may be undesirable for john because it is big and ugly. Even so, john-devkit may serve as a playground where one can quickly try various optimization parameters and then implement those optimizations manually.

I propose development of john-devkit as an idea for GSoC. The task is huge, so I'd like to hear suggestions on how to choose a part of it. I see a chance to implement better support for sha-2 using john-devkit before GSoC.

Suggestions?

Thanks!

-- 
Regards,
Aleksey Cherepanov
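The style described in the message (operator overloading only for the DSL, explicit global variables, small functions) and the split into algo, optimization, and output files listed above can be illustrated with a small Python sketch. Everything in it is hypothetical: `BYTECODE`, `V`, `ror`, `expand_ror`, `emit_c`, and the `ROR64` C macro are not john-devkit's real names. The algorithm side writes SHA-512's small sigma0 (rotate right by 1, xor rotate right by 8, xor shift right by 7) once; an optimization pass rewrites rotates into shifts, as one would for a target without a native 64-bit rotate such as SSE2; and the output side prints C.

```python
# Hypothetical sketch, not john-devkit's API: operator overloading is the
# only OOP, instructions go into an explicit global list, and small
# functions do the rest. The algorithm knows nothing about the target.

BYTECODE = []   # explicit global: (op, dest, args) tuples
_counter = 0    # explicit global: numbering for temporaries


def reset():
    global _counter
    BYTECODE.clear()
    _counter = 0


def new_name():
    global _counter
    _counter += 1
    return f"t{_counter}"


def emit(op, *args):
    """Append one instruction and return a handle to its result."""
    dest = new_name()
    BYTECODE.append((op, dest, args))
    return V(dest)


class V:
    """Handle to a 64-bit value; operators only record instructions."""

    def __init__(self, name):
        self.name = name

    def __xor__(self, other):
        return emit("xor", self.name, other.name)

    def __rshift__(self, n):
        return emit("shr", self.name, n)


def inp(name):
    return V(name)


def ror(x, n):
    return emit("ror", x.name, n)   # n must be in 1..63


def output(name, x):
    BYTECODE.append(("output", name, (x.name,)))


# --- algorithm side: SHA-512 small sigma0, no optimization details ---
def sigma0(x):
    return ror(x, 1) ^ ror(x, 8) ^ (x >> 7)


# --- optimization side: expand rotates for a target without one ---
def expand_ror(bytecode):
    result = []
    for op, dest, args in bytecode:
        if op == "ror":
            x, n = args
            a, b = new_name(), new_name()
            result += [("shr", a, (x, n)), ("shl", b, (x, 64 - n)),
                       ("or", dest, (a, b))]
        else:
            result.append((op, dest, args))
    return result


# --- output side: bytecode to C (ROR64 is an assumed C macro) ---
C_FORMS = {"xor": "{} ^ {}", "or": "{} | {}", "shr": "{} >> {}",
           "shl": "{} << {}", "ror": "ROR64({}, {})"}


def emit_c(bytecode):
    lines = []
    for op, dest, args in bytecode:
        if op == "output":
            lines.append(f"{dest} = {args[0]};")
        else:
            lines.append(f"uint64_t {dest} = {C_FORMS[op].format(*args)};")
    return lines


if __name__ == "__main__":
    reset()
    output("r", sigma0(inp("x")))
    print("\n".join(emit_c(BYTECODE)))
    print("\n".join(emit_c(expand_ror(BYTECODE))))
```

Running it prints six C statements for sigma0 and then a ten-statement version with the rotates expanded; sigma0 itself does not change when the target does, which is the separation of algorithms and optimizations the message argues for.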