Date: Sun, 12 Apr 2015 12:14:33 +0200
From: Frank Dittrich <frank.dittrich@...lbox.org>
To: john-dev@...ts.openwall.com
Subject: 23% performance regression for bcrypt (Intel i5-4570 CPU)

Solar,

you included the bcrypt-related changes of bleeding-jumbo commit
https://github.com/magnumripper/JohnTheRipper/commit/f64b42fee9e368cd85cf546f08b694510824fea2
into core, but decided not to allow BF_X2 = 3 for AVX systems.
(I guess this is because on well BF_X2 = 3 causes a performance
regression of about 5%.)

But on my system (64-bit Linux, i5-4570 CPU), not using BF_X2 = 3 causes
a 23% performance regression.

With latest bleeding-jumbo, I get

Will run 4 OpenMP threads
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]...
(4xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw:    3888 c/s real, 967 c/s virtual

This is also what I get when I check out master and enable OMP.

With commit f64b42fee9e368cd85cf546f08b694510824fea2 or any other commit
which uses BF_X2 = 3, I get

Will run 4 OpenMP threads
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X3]...
(4xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw:    5040 c/s real, 1253 c/s virtual
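
The 23% figure follows from the two "real" rates above. A minimal
sketch of the arithmetic in plain POSIX shell (pct_drop is a made-up
helper name for illustration, not something in the John the Ripper
tree):

```shell
#!/bin/sh
# pct_drop FAST SLOW
# Hypothetical helper: prints by how many percent SLOW is below FAST,
# rounded to the nearest integer using integer arithmetic only.
pct_drop() {
    fast=$1
    slow=$2
    # (fast - slow) / fast * 100, with +fast/(2*fast) = +0.5 for rounding
    echo $(( ((fast - slow) * 200 + fast) / (2 * fast) ))
}

# c/s real: X3 build vs. X2 build from the two benchmarks above
pct_drop 5040 3888   # prints 23
```

The same helper can be fed any other pair of c/s values from this
message.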


Similarly, a generic build (with OMP) for the latest master commit gives

Will run 4 OpenMP threads
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... DONE
Raw:    3830 c/s real, 958 c/s virtual

When I patch best.sh to also test BF_X2 = 3 if BF_X2 = 1 is better than
BF_X2 = 0, I get

Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64]... 1992 c/s real, 498 c/s virtual
Compiling: Blowfish benchmark (scale)
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64]... 2066 c/s real, 516 c/s virtual
Compiling: Blowfish benchmark (two hashes at a time)
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... 3830 c/s real, 958 c/s virtual
Compiling: Blowfish benchmark (three hashes at a time)
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X3]... 4982 c/s real, 1245 c/s virtual

So, I suggest you test BF_X2 = 3 for generic builds (if BF_X2 = 1 is
better than BF_X2 = 0).
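
The decision I have in mind can be sketched on its own, detached from
the compile-and-benchmark steps. This is only an illustration:
choose_bf_x2 is a hypothetical function, and unlike best.sh it takes
all three c/s results as arguments instead of building the X3 variant
only when X2 has already won.

```shell
#!/bin/sh
# choose_bf_x2 BEST_SINGLE X2_RESULT X3_RESULT
# Hypothetical sketch of the selection rule: prints the BF_X2 value to use.
#   BEST_SINGLE  best c/s seen so far with one hash at a time
#   X2_RESULT    c/s with two hashes at a time   (BF_X2 = 1)
#   X3_RESULT    c/s with three hashes at a time (BF_X2 = 3)
choose_bf_x2() {
    best=$1
    x2=$2
    x3=$3
    if [ "$x2" -gt "$best" ]; then
        # Only consider three hashes at a time if two already helped.
        if [ "$x3" -gt "$x2" ]; then
            echo 3
        else
            echo 1
        fi
    else
        echo 0
    fi
}

# Values from the generic-build run above
choose_bf_x2 2066 3830 4982   # prints 3
```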

Maybe you could also reconsider allowing BF_X2 = 3 for AVX.

Is there anything I can test, any more information you need to decide
when BF_X2 should be 3 even for AVX, and when it shouldn't?


Here's my patch to best.sh to enhance the generic build:

diff --git a/src/best.sh b/src/best.sh
index 5e671b1..0192183 100755
--- a/src/best.sh
+++ b/src/best.sh
@@ -122,11 +122,21 @@ echo "Compiling: Blowfish benchmark (two hashes at a time)"
 $MAKE bench || exit 1
 RES=`./bench 3` || exit 1
 if [ $RES -gt $MAX ]; then
+   MAX=$RES
    BF_X2=1
+   ./detect $DES_BEST $DES_COPY $DES_BS $MD5_X2 $MD5_IMM $BF_SCALE 3 > arch.h
+   rm -f $BF_DEPEND bench
+   echo "Compiling: Blowfish benchmark (three hashes at a time)"
+   $MAKE bench || exit 1
+   RES=`./bench 3` || exit 1
+   if [ $RES -gt $MAX ]; then
+       BF_X2=3
+   fi
 else
    BF_X2=0
 fi


Frank
