john-dev - Re: bcrypt: actual performance versus benchmark

Message-ID: <20120324233408.GA8416@openwall.com>
Date: Sun, 25 Mar 2012 03:34:08 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bcrypt: actual performance versus benchmark

RB, all -

This is actually a john-users topic, not john-dev.  (Yes, I should have
pointed this out when I said that it should be a separate thread.)  I'll
reply here for now, but in general let's use john-users for questions
like this.  john-dev is for discussing JtR source code (and proposed
changes to it) - stuff that would be too technical and uninteresting for
JtR users.

On Sat, Mar 24, 2012 at 10:03:18AM -0600, RB wrote:
> On Fri, Mar 23, 2012 at 18:54, Solar Designer <solar@...nwall.com> wrote:
> > Are you saying that benchmark gives you 2k c/s, but actual run only
> > 100 c/s?  Is that for bcrypt or for RAR?  Anyway, this is best discussed
> > on its own thread.
> 
> Yes, for hashes beginning with "$2$" harvested from a SuSE Enterprise
> Linux system I was analyzing.
> 
> On my W3565, which shows best JtR performance with 4 threads
> (OMP_NUM_THREADS=4), and using the magnum-jumbo git repo updated
> 2012/03/21, "./john --test=20 --format=bf" benchmarked at between 2k
> and 2.1k c/s.
> 
> When cracking a single hash, actual speeds experienced were between 50
> and 100 (occasionally reaching up to 110) c/s on a system with no
> other load.  This was on a system running gentoo-sources-3.3.0,
> glibc-2.14.1, and gcc-4.6.2 (~x86_64 fully updated).

John's benchmark for --format=bf assumes 32 iterations (cost 5, i.e. the
"$2a$05" hash encoding prefix), whereas actual modern systems use higher
iteration counts.  From your numbers, it appears that your actual password
hash used something like 1024 iterations ("$2a$10" prefix).  That is 32
times more work per hash, so 2k c/s on the benchmark corresponds to
roughly 60 c/s on the real hash.  If so, what you're observing is the
correct behavior.
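
To make the arithmetic explicit, here is a minimal Python sketch.  It
assumes the cracking speed scales inversely with the iteration count,
which is only approximately true, and the function names are made up
for illustration (they are not part of JtR).

```python
def bcrypt_iterations(cost: int) -> int:
    """Iteration count implied by the two-digit cost in the hash prefix."""
    return 1 << cost  # cost 5 -> 32, cost 10 -> 1024


def cost_from_prefix(prefix: str) -> int:
    """Extract the cost from a prefix such as "$2a$10$" (illustrative helper)."""
    # Splitting "$2a$10$" on "$" gives ['', '2a', '10', ''].
    return int(prefix.split("$")[2])


def expected_speed(benchmark_cps: float, benchmark_cost: int, actual_cost: int) -> float:
    """Scale a benchmark c/s figure to another cost setting.

    Assumption: speed is inversely proportional to the iteration count.
    """
    return benchmark_cps * bcrypt_iterations(benchmark_cost) / bcrypt_iterations(actual_cost)


# Benchmark at "$2a$05", real hash assumed to be at "$2a$10".
low = expected_speed(2000, cost_from_prefix("$2a$05$"), cost_from_prefix("$2a$10$"))
high = expected_speed(2100, cost_from_prefix("$2a$05$"), cost_from_prefix("$2a$10$"))
print(f"expected real-world speed: {low} to {high} c/s")
```

For the 2k to 2.1k c/s benchmark range this gives about 62 to 66 c/s,
which is in line with the 50 to 100 c/s reported.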

As to OMP_NUM_THREADS=4, try tuning GOMP_SPINCOUNT instead (try values
like 10000 and 2000000; gcc 4.6.x's default is 300000) while keeping the
thread count at its default (should be 8 for your CPU, right?).  You'll
likely achieve higher speed this way (should be over 3k c/s on the
benchmark).  And indeed you should be using an -x86-64 (or -x86-64i if
jumbo) build of John (not -x86-sse2 or other 32-bit).
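
One possible way to try several GOMP_SPINCOUNT values in a row is
sketched below in Python.  The helper names are hypothetical, the john
binary is assumed to be in the current directory, and the benchmark
command is the one quoted above.  OMP_NUM_THREADS is removed from the
environment so that the thread count stays at its default.

```python
import os
import subprocess

# Benchmark command as quoted earlier in this thread.
JOHN_CMD = ["./john", "--test=20", "--format=bf"]


def spincount_envs(values, base_env=None):
    """Build one environment per GOMP_SPINCOUNT value (illustrative helper)."""
    base = dict(os.environ if base_env is None else base_env)
    # Leave the thread count at its default by not forcing OMP_NUM_THREADS.
    base.pop("OMP_NUM_THREADS", None)
    envs = []
    for value in values:
        env = dict(base)
        env["GOMP_SPINCOUNT"] = str(value)
        envs.append(env)
    return envs


def run_benchmarks(values):
    """Run the benchmark once per GOMP_SPINCOUNT value and let john print its c/s."""
    for env in spincount_envs(values):
        print("GOMP_SPINCOUNT =", env["GOMP_SPINCOUNT"])
        subprocess.run(JOHN_CMD, env=env, check=False)
```

Calling run_benchmarks([10000, 300000, 2000000]) from the directory
containing the john binary would run the three benchmarks one after
another; compare the reported c/s figures and keep the best setting.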

Alexander
