Date: Sun, 06 May 2018 19:53:49 +0300 From: Aleksey Cherepanov <lyosha@...nwall.com> To: john-dev@...ts.openwall.com Subject: john-devkit at PHDays VI PHDays 8 is coming on (May 15-16, Russia, Moscow). So there is some info about my old talk about john-devkit done at PHDays VI, May 17 2016. The main achievement since my talk in 2015: john-devkit can generate more than 100 formats for john. The speeds are bad though. Deep understanding of john-devkit is needed for almost any development. That's a huge flaw. Slides as text https://raw.githubusercontent.com/AlekseyCherepanov/john-devkit/master/slides_2016-05-17_phdays_vi.src.org Code https://github.com/AlekseyCherepanov/john-devkit There are some cool things that were not included into the slides. "Complexity of optimizations for humans and machines" and "bitslicing" were mentioned in the annotation of the talk, but the talk was shorter than planned and did not include them. So this email covers these points. It would be more interesting to publish the notes and the code right after the talk. I am sorry for the delay. Short summary of project and work done - Intrusive Changes to intermediate representation (IR) are planned! - Experimental phase is not over yet - 100+ formats for john were tested - some pbkdf2-* and hmac-*, - "dynamics" built with md4, md5, sha1, sha256, sha512 - without optimizations (bad) - with one C template (good) - 1 variant of TrueCrypt (SHA-512 + AES XTS) - with optimizations - not as fast as in john - no vectorization - There is automatic bitslicing - only for fully unrolled code - it may wrap parts into cycles to mitigate code-cache problems - bitsliced LM is ~2x slower than in john - Hash parsers are generated too - New types in IR and respective optimizations: - functions - inlining - Merkle-Damgard construction as instruction - byte buffers - computation of known lengths - join/concat + split undoing - concat + slice moving - There is pattern matching to replace parts of IR easier - There is interpreter of IR for debugging - Borrowing of code was tried successfully: - python with changes from pyaes - linear parts in C from sphlib - Overall results are not descriptive so far: - it is not obvious that the goals of project can be met - there are no obvious blocks though - It feels like writing a compiler, it is pure fun! Optimizations hard for humans and machines - computers are good at repetitive tasks including bruteforcing - humans are good at innovations and pattern matching - also humans may beat a general purpose compiler using knowledge specific to the task - example win of humans: optimizing SHA-1 and SHA-256 - both have an extender and main loop - optimal SHA-1: unroll both loops and intermix them - optimal SHA-256: unroll only the main loop - SHA-1 is big enough to prevent compiler from intermixing loops even if loops were unrolled forcefully - example win of machines: bitslice - Roman Rusakov generated current formulas for S-boxes of DES used in John the Ripper with custom generator - it took 3 months - Billy Bob Brumley demonstrated application of simulated annealing algorithm to scheduling of DES S-box instructions - so john-devkit tries to be a tool that moves repetitive tasks from humans to machines saving full flexibility needed for innovations and precise application of optimizations Bitslice - splits numbers into bits and computes everything through bitwise operations - optimization focuses on minimization of Boolean formula (or circuit) - number of instructions per operator, bitsliced vs regular: - constant shifts/rotates and permutations of bits are done for free - bitwise ops are 1:1 - arithmetic addition grows significantly - static S-boxes may benefit from bitslicing - also we need to pack/unpack vectors of bits (transpose matrix of bits) - bitslicing may be done automatically in many cases if vectorization may be done automatically - bitslicing implies increase in number of variables - 64 variables for one 64-bit integer variable - developer needs to manage these variables - john-devkit grew from experiments to make work with bitslice more convenient Hash parsers john-devkit generates hash parsers for valid(), binary() and get_salt() from simple description in python. Example of code to generate hmac-md5 format with parser for cram-md5 variant ('$cram_md5$'.base64(salt).'$'.base64('<...> '.hex(hash))) and regular format (salt.'#'.hash): gen_format( ('hmac-md5', grep_tests('hmacMD5_fmt_plug.c', 'tests')), 16, make_hmac('md5'), [ 'or', '$cram_md5$', [ 'split', '$', [ 'b64', [ 'mime', 'raw', 'MIME_TRAIL_EQ' ], 'decode', 'salt' ], [ 'b64', [ 'mime', 'raw', 'MIME_TRAIL_EQ' ], 'decode', [ 'split_right', ' ', [ 'any', 'ignore' ], [ 'hex', 32, 32, 'decode', 'hash' ] ] ] ], True, [ 'split_right', '#', [ 'any', 'salt' ], [ 'hex', 32, 32, 'decode', 'hash' ] ] ] ) While the generator has its own hack value, a parser engine in C would be better due to much smaller binary code without duplications across all the formats. It is possible to write a parser engine in C using similar semantics of description and allowing definition of parsers in runtime. Own compiler - some optimizations in john-devkit resemble regular compilers - it is fun to develop your own compiler - it is rather easy in case of john-devkit because - cryptography is relatively small field - there is no need to implement rich general purpose language - for instance, some useful implementations of some algorithms need only 1 type of variables - code generation delegates low-level stuff to regular compiler - a lot of work on front-end was saved using embedded DSL - such approach allowed to use Python as macro-language, very rich macro-language - "programs" are developed together with the "compiler" - some transformations are very specific - pattern matching should simplify their creation - JFYI other languages allow similar thing too: Haskell by rewrite rules, C by clang rewriter - it is possible to add new primitives and transformations - ideas may be incorporated at different stages, there is more space to choose - john-devkit is designed to provide manual control over optimizations - so john-devkit is simpler - but it is not obvious yet how to optimize 100+ formats saving such flexibility - it is possible to write transformations with lower quality than in regular compiler: less generic, having assumptions - hence there are undocumented assumptions in the code (bad) - john-devkit itself may be slow - filters are isolated from each other outside of IR - almost all things are transferred through the list with instructions - there are "instructions" with meta-information - some things are passed explicitly at the very top-level, where all filters are called - so results of the filters are observable as of IR is readable and everything else is explicit - maybe there will be single global object shared between filters to store small settings to move common branching patterns from the top level into special filters (like vectorize_maybe) - john-devkit generates code that is not beautiful - John the Ripper provides rich foundation for development including - test vectors (including ad-hoc dynamic formats) - mechanism for benchmarking - implementations to get ideas from - abstraction for portable vectorization - and the main program: we compile only a part and embed it into big project written in regular C - hash cracking provides its own useful limitations: - we have as many data sets on input as we want - so vectorization occurs after all optimizations, there is almost no need to keep separate optimizations for vectorized code - length of input is limited Thanks! -- Regards, Aleksey Cherepanov
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.