Date: Fri, 30 Jun 2017 19:22:33 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: apingis@...nwall.net
Subject: Re: DES-based crypt(3) cracking on ZTEX 1.15y FPGA boards (descrypt-ztex)

On Thu, Jun 29, 2017 at 09:49:13PM -0800, Royce Williams wrote:
> Fantastic! :) I'm still being excessively cautious with my
> substandard cooling setup (and I do understand that these boards
> should run much cooler with descrypt-ztex than when they are used for
> cryptocurrency work), so I haven't experimented with overclocking just
> yet. But I have at least now done some basic testing.

Thank you!

> Here are standard-clocks results on my setup - controlled directly
> from a Raspberry Pi 2, through a couple of powered USB 2.0 hubs, to my
> (now down to 14) functional boards:
>
> $ time ./john -form=descrypt-ztex -inc=alpha -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix

Wow. This is very helpful. It would also be helpful for you to include
the corresponding results for 1 board. And ditto (14 and 1) for bcrypt.
Quite possibly you have the world's fastest bcrypt cracker right now.

And, for these tests can we please standardize on -inc=lower for the
incremental mode portion? My use of -inc=alpha was a mistake, it's
inconsistent with the mask for the last 4 characters.

> So after an hour, performance was ~706Mc/s per board (if I'm reading it right).

Yes. And that's about 15% lower than we'd expect for 1 board if it were
the only board. Probably the communication latency causes this.

> I also used a Kill-A-Watt to roughly measure power consumption (just
> of the supply to the boards, not any of the supporting gear):
>
> ~110W idle = ~7.8W/board
> ~470W under load = ~33.6W/board

This is also very helpful, and I also want a figure for bcrypt-ztex.

> I also noted during the testing that CPU usage was around 40-44% during the run.

The CPU usage is fine per se, but it indicates there's latency in John
talking to the FPGAs, probably leaving them idle at times. We need to
implement an asynchronous API at a higher level to fix this long-term.

For now, we could try more workarounds, such as maybe supporting --fork
along with use of ZTEX boards (allocate fewer boards per process, like
we have with 1 GPU/process, which is also not great but works for now).
This isn't supported yet, but maybe Denis could consider it.

An easier (and better?) workaround is to buffer even more candidate
passwords within 1 process. On my Qubes system, adding this line:

+++ b/src/ztex/device_format.c
@@ -111,6 +111,7 @@ void device_format_reset()
 	// Mask data is ready, calculate and set keys_per_crypt
 	unsigned int keys_per_crypt = jtr_bitstream->candidates_per_crypt / mask_num_cand();
+	keys_per_crypt *= 4;
 	if (!keys_per_crypt)
 		keys_per_crypt = 1;

makes the standard clocks c/s rate increase from 806M to 828M, which
suspiciously matches the theoretical maximum Denis gives for the current
design in the just committed src/ztex/fpga-descrypt/README.md:

https://github.com/magnumripper/JohnTheRipper/pull/2598/commits/1214de42284c8b66728f0f8fd362a743f54c2ab0#diff-c70cc4e9666091acde7a844bfc24d88d

This appears to work. The cracked passwords stream isn't expected to be
exactly the same (that is, not in the same order) because the larger
buffers unfortunately result in less optimal ordering of the candidates
with incremental mode and the like. Also, the total running time of
some short sessions (in my testing: mask, but not wordlist) appears to
increase - perhaps the very last crypt_all() call (for each salt) hashes
many more keys than are actually supplied? Denis, perhaps this is
something you could fix?

Royce, unless Denis says this hack is somehow very wrong, feel free to
try it on your cluster, and maybe you'll regain some of those lost ~15%.

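To make the effect of that one-line change easier to reason about, here is a minimal standalone sketch of the same arithmetic with the buffering factor turned into a parameter. The function name calc_keys_per_crypt and the factor argument are hypothetical and exist only for illustration. They are not part of the JtR source, and the guard for a zero mask count is an added assumption rather than something the patch does.

```c
/*
 * Hypothetical helper, NOT part of JtR: mirrors the calculation in
 * device_format_reset() with the hard-coded 4 replaced by a parameter.
 */
unsigned int calc_keys_per_crypt(unsigned int candidates_per_crypt,
                                 unsigned int mask_num_cand,
                                 unsigned int factor)
{
	unsigned int keys_per_crypt;

	/* Assumption: treat "no mask" as one candidate per key to avoid
	 * dividing by zero in this standalone sketch. */
	if (!mask_num_cand)
		mask_num_cand = 1;

	/* Keys needed on the host so that keys * mask candidates fills
	 * one batch for the device (integer division, as in the patch). */
	keys_per_crypt = candidates_per_crypt / mask_num_cand;

	/* The hack: buffer 'factor' times as many keys per crypt_all(). */
	keys_per_crypt *= factor;

	/* Same ordering as the patch: the floor of 1 applies after the
	 * multiplication, so a zero quotient stays at 1, not 'factor'. */
	if (!keys_per_crypt)
		keys_per_crypt = 1;

	return keys_per_crypt;
}
```

With made-up numbers (not taken from any real bitstream), a batch of 1000000 candidates and a ?l?l?l?l mask of 26^4 = 456976 candidates per key gives 2 keys at factor 1 and 8 keys at factor 4. When the quotient rounds down to zero the result stays at 1 regardless of the factor.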
Please note that this code is also used for bcrypt (and the change gives
me a slight speedup for bcrypt, too - about 2%, which was presumably
lost to USB pass-through - just not that much of a speedup, because the
performance hit on it wasn't that bad in the first place). So if you do
go for this, please test bcrypt both ways (without and with this change)
as well. Perhaps keep two john binaries.

keys_per_crypt factors other than 4 (such as 2, 3, 5, more) on this line
may also be tried. I only tried 4 - I didn't tune. Denis, maybe make
this configurable?

> The only troubles that I've encountered so far are from either
> apparent problems of scale, or problems with individual boards. (Most
> users won't encounter the scale problems, but given the age of these
> boards, some users may run into the per-board issues, so they may
> warrant closer examination; I will file issues for these as you
> suggest).

I see you've already created two GitHub issues for these - thank you!

The segfaults are definitely bugs for us to fix. John shouldn't
segfault no matter what happens with the boards.

> With the individual board problems, during earlier (very kind!)
> troubleshooting sessions, Denis already significantly improved how his
> communication methods (ztex_inouttraffic?) handled my various failure
> modes, but a few remain.

Great. I see you use mostly (or exclusively?) US clones rather than
ZTEX original boards, and this might have been causing some issues.
(There are those edge connectors visible in the picture you tweeted.
They're not present on ZTEX originals.)

> It would of course be
> preferable if a long-running job could continue after dropping an iffy
> board, but I don't know if this would be feasible.

It is feasible, and should already be the case - in fact, you mention it
almost works for you, but not all the time. It's a bug to fix.

One thing Denis hasn't implemented yet, but maybe should consider
implementing, is similar handling of intermittent failures when running
with just one board: reset it and resume cracking. Right now, if the
only board fails, JtR terminates right away.

Thanks again,

Alexander