john-dev - Re: faster fgets implementation by atom

Date: Tue, 22 Apr 2014 04:20:12 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: faster fgets implementation by atom

On 2013-01-08 07:11, Solar Designer wrote:
> On Tue, Jan 08, 2013 at 10:48:12AM +0530, Dhiru Kholia wrote:
>> atom has released source code for a faster fgets implementation named
>> "fgets-sse2" (See https://hashcat.net/forum/thread-1912.html)
>>
>> Does it look like something we could re-use in JtR?
>
> I think we should rewrite/cleanup wordlist.c, and use our own large
> buffer there.  Right now, we use a large buffer when the wordlist is
> loaded entirely into memory, but not for reads from a file when the
> wordlist does not fit in the memory buffer.  And our code is dirty.
>
> Once we're properly using a buffer of our own in both cases, it's a
> separate question whether we also add SIMD code for locating the '\n'
> or/and use memchr() (and hope that it has SIMD code in it?)

I'm in "research mode" at the moment, trying not to go too far down a
wrong track, so I revisited this. I implemented atom's code over a year
ago and started testing it, but that branch has been dormant since.

> (...) there are some uses of fgets() other than in wordlist mode -
> for reading of password files, in the unique program.  I doubt we care
> about optimizing these much, although we could.

I opted to simply add it to fgetl() inlined in misc.c so unique is 
automatically affected too (as well as pot reload now that I rebased it 
on current bleeding).
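
For readers who have not looked at this kind of code, the core idea of
an SSE2 newline scan can be sketched as below. This is only an
illustration, not atom's fgets-sse2 and not what is in the fgetl-sse
branch: the name find_newline() is hypothetical, unaligned loads are
used for simplicity, and __builtin_ctz assumes GCC or Clang. Without
SSE2 the function falls back to plain memchr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: return the offset of the first '\n' in
 * buf[0..len), or len if there is none. */
static size_t find_newline(const char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n');

    /* Compare 16 bytes at a time against '\n'. */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));

        /* Each set bit marks a matching byte; the lowest one is first. */
        if (mask != 0)
            return i + (size_t)__builtin_ctz((unsigned)mask);
    }
#endif

    /* Scalar tail, and the whole scan when SSE2 is unavailable. */
    if (i < len) {
        const char *p = memchr(buf + i, '\n', len - i);

        if (p != NULL)
            return (size_t)(p - buf);
    }
    return len;
}
```

Whether this beats the libc memchr() depends on the libc, since many
implementations already use SIMD internally.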

Quick tests without/with fgets-sse2:

- unique of original rockyou file: ~ 17s -> 14s

- Attacking 3K NT hashes with rockyou:
     unbuffered wordlist: 6319Kp/s -> 12156Kp/s [1]
     buffered wordlist: 9021Kp/s -> 9078Kp/s [2]
     unbuffered w/ rules: 1940Kp/s -> 3315Kp/s
     buffered w/ rules: 4829Kp/s -> 4850Kp/s
     stdin: 6549Kp/s -> 10704Kp/s
     pipe: 6182Kp/s -> 8909Kp/s
     pipe w/ rules: 4936Kp/s -> 4980Kp/s

Lines [1] and [2] are interesting. With the SSE fgetl(), the initial
buffer load negates the win from the buffer (even though that stage uses
SSE too). Without SSE, buffering is a significant win.
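
To make "buffered" concrete: the general approach is to get the whole
file into memory once and then split it into lines there, instead of
calling fgets()/fgetl() per line. A minimal sketch of the splitting
step follows, assuming the data is already in memory. The names
split_lines(), line_fn and track_longest() are made up for this
example, and this is not the actual wordlist.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: gets one line without its terminator. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Split buf[0..len) on '\n', pass each line to fn, return line count. */
static size_t split_lines(const char *buf, size_t len, line_fn fn, void *ctx)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    assert(buf != NULL && fn != NULL);

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *stop = nl ? nl : end;
        size_t n = (size_t)(stop - p);

        /* Drop a trailing '\r' so CRLF files behave like LF files. */
        if (n > 0 && p[n - 1] == '\r')
            n--;
        fn(p, n, ctx);
        count++;
        /* Skip the '\n' itself if we found one. */
        p = stop + (nl != NULL);
    }
    return count;
}

/* Example callback: remember the longest line length seen. */
static void track_longest(const char *line, size_t len, void *ctx)
{
    size_t *longest = ctx;

    (void)line;
    if (len > *longest)
        *longest = len;
}
```

For example, calling split_lines() with track_longest on the buffer
"123456\r\npassword\niloveyou" visits three lines and leaves the
longest length at 8.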

This has not been tested or optimized much yet. The branch is called
"fgetl-sse" on GitHub in case someone wants to try it out.

BTW, considering I get a --test speed of 47419Kc/s, even the 12156Kp/s
(roughly a quarter of that) is a disappointing figure.

magnum
