Message-ID: <mpro.m6bwqh00fh20g0jpv.taviso@cmpxchg8b.com>
Date: Thu, 28 Jun 2012 15:13:32 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: john-dev@...ts.openwall.com
Subject: Re: Re: Failed self test for raw-sha1-ng (linux-x86-sse2i OMP)
magnum <john.magnum@...hmail.com> wrote:
> On 2012-06-28 12:54, magnum wrote:
> > On 2012-06-28 12:18, Frank Dittrich wrote:
> > > Due to another recent change in raw-sha1-ng, I get a new warning when
> > > compiling with clang version 2.9: rawSHA1_ng_fmt.c:127:14: warning:
> > > unknown pragma ignored [-Wunknown-pragmas] # pragma GCC optimize 3
> > >
> > > I don't know whether more recent clang versions support this pragma,
> > > so I wouldn't disable it for clang. I just wanted to let you know.
> >
> > That pragma is not implemented in GCC versions earlier than 4.4 so we
> > can test __GNUC__ and __GNUC_MINOR__ - I'll have a look even though this
> > is 100% benign.
>
> Fixed. We are really nit-picking now :-)
>
> magnum
>
Indeed ;-)
I was actually going to use this for another tweak: if I prefetch the next
passwords with __builtin_prefetch, I can pull a little extra performance out
of my hottest loop. However, I've found that gcc doesn't do too badly with
just -fprefetch-loop-arrays (a non-default option).
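For reference, a manual prefetch of the kind described above might look
roughly like the sketch below. It is not the raw-sha1-ng code: the flat key
buffer, KEY_BYTES, PREFETCH_READ and checksum_keys are hypothetical names, and
the inner loop is only a stand-in for the real hashing work.

```c
#include <stddef.h>
#include <stdint.h>

/* __builtin_prefetch is a GCC builtin (clang provides it too); fall back
 * to a no-op elsewhere. Arguments: address, 0 = read, 3 = high locality. */
#if defined(__GNUC__)
# define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
# define PREFETCH_READ(p) ((void)0)
#endif

#define KEY_BYTES 16 /* hypothetical fixed-size slot per candidate key */

/* Walk a flat buffer of `count` keys, hinting the next key into cache
 * while the current one is being processed. */
uint32_t checksum_keys(const uint8_t *keys, size_t count)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < count; i++) {
        const uint8_t *key = keys + i * KEY_BYTES;

        /* Only prefetch when a next key actually exists. */
        if (i + 1 < count)
            PREFETCH_READ(key + KEY_BYTES);

        /* Placeholder work standing in for the SHA-1 rounds. */
        for (size_t j = 0; j < KEY_BYTES; j++)
            acc = acc * 31u + key[j];
    }
    return acc;
}
```

Whether a hint like this beats -fprefetch-loop-arrays depends on the loop and
the CPU, so it would need benchmarking either way.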
I think I can do _slightly_ better than gcc, and maybe I could perfect the
code gcc generates with various --params, but having gcc do it automatically
is nice. I was planning to just add #pragma GCC optimize
"-fprefetch-loop-arrays".
Does that sound okay? I can wrap it in whatever __GNUC__ checks you like.
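A guarded pragma could look something like the sketch below. The version test
follows the 4.4 cutoff mentioned earlier in the thread; excluding clang via
__clang__, and the names HAVE_PREFETCH_PRAGMA and sum_words, are assumptions
made for illustration rather than what was actually committed.

```c
#include <stddef.h>
#include <stdint.h>

/* "#pragma GCC optimize" needs GCC 4.4 or later. clang also defines
 * __GNUC__ but may ignore the pragma with a warning, so leave it out. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
# pragma GCC optimize ("prefetch-loop-arrays")
# define HAVE_PREFETCH_PRAGMA 1
#else
# define HAVE_PREFETCH_PRAGMA 0
#endif

/* The pragma applies to functions defined after it in this file, so a
 * simple array loop like this one becomes a candidate for prefetching. */
uint32_t sum_words(const uint32_t *buf, size_t n)
{
    uint32_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

Strings without a leading O are treated as -f options by the pragma, so
"prefetch-loop-arrays" here corresponds to -fprefetch-loop-arrays.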
In fact, it made me curious whether there are any other free performance
wins, so I used the quick script below (you can grep ^Raw: | sort -g the
output).
Example output:
$ bash chooseopts.sh linux-x86-64-native rawSHA1_ng_fmt.c raw-sha1-ng | tee log
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19482K c/s real, 19528K c/s virtual -fipa-type-escape
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19482K c/s real, 19528K c/s virtual -fno-ipa-type-escape
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19482K c/s real, 19528K c/s virtual -fivopts
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19391K c/s real, 19443K c/s virtual -fno-ivopts
....
$ grep ^Raw: log | sort -g | tail
Raw: 19538K c/s real, 19590K c/s virtual -funroll-all-loops
Raw: 19540K c/s real, 19572K c/s virtual -fno-align-jumps
Raw: 19573K c/s real, 19605K c/s virtual -minline-all-stringops
Raw: 19688K c/s real, 19721K c/s virtual -fprefetch-loop-arrays
The quickest win for me does seem to be -fprefetch-loop-arrays
#!/bin/bash
#
# usage: cd src; bash chooseopts.sh target filename.c formatname
# e.g.
# $ bash chooseopts.sh linux-x86-64-native rawSHA1_ng_fmt.c raw-sha1-ng
#
declare -a optimizers=(
    $(gcc --help=optimizers | awk '/^[ ]*-f/ {printf "%s\n%s\n",$1,gensub(/^-f/,"-fno-",1,$1)}')
);
declare -a scores
declare -ir testtime=30
# for every optimization option gcc reports it supports, build an object
# and benchmark it.
for ((i = 0; i < ${#optimizers[@]}; i++)); do
    # build a new john with this flag applied to this object file.
    if ! make ${1} MAKE="make -W ${2}" CC="gcc -frandom-seed=seed ${optimizers[i]}" &> /dev/null; then
        # code doesn't compile, skip it.
        continue
    fi

    # if it built, find checksum of this code.
    checksum="$(objdump -d ${2//.c/.o} | cksum | sed 's/ //g')"

    # if another flag generated the same code, we don't need to benchmark,
    # we can just re-use the results.
    if test -n "${scores[checksum]}"; then
        printf "optimizer %s code cached\n" ${optimizers[i]} 1>&2
    else
        # no luck, we need to run it.
        if ! results=$(../run/john --test=$testtime --format=${3}); then
            # crashes, or doesn't pass test.
            continue;
        fi

        # cache the scores.
        scores[checksum]="${results}"
    fi

    # output score.
    printf "%s %s\n" "${scores[checksum]}" "${optimizers[i]}"
done
With #pragma:
$ ../run/john --format=raw-sha1 -test=30
Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 4x]... DONE
Raw: 14214K c/s real, 14271K c/s virtual
Without:
$ ../run/john --format=raw-sha1 -test=30
Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 4x]... DONE
Raw: 14141K c/s real, 14178K c/s virtual
So... not earth-shattering, but worth a pragma imo.
Tavis.
--
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred
-------------------------------------------------------