Date: Sat, 4 Jun 2011 20:42:31 +0400
From: Solar Designer
Subject: unique


You probably expected that, but JFYI: unique in jumbo-5 became about 3%
slower (with default settings) than it was in 1.7.7 official and -jumbo-1.

I attribute this to the changes in line_hash().  Perhaps the constant
shift counts and masks were faster than the variable ones are.

As to optimizing this, you can easily pre-compute "vUNIQUE_HASH_SIZE - 1"
and "vUNIQUE_HASH_LOG / 2".  With constants, these were computed at
compile-time, but now you actually have those extra operations performed
at runtime, right inside the hash function.  My guesstimate is that this
will help a little bit, but not let us fully regain the lost 3%, because
the variable shift counts will remain.  So maybe we need a specialized
version of line_hash() for the default settings.  Then we'd need to call
it via a function pointer, though, unless we also introduce specialized
versions of the caller functions' loops, which feels like too much code
duplication to be worth it.  Hopefully, the function pointer overhead
will be under 1%.
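
A rough sketch of both ideas follows: pre-computing the mask and the half-shift once, and dispatching to a constant-only version through a function pointer when the default size is in use. The hash body here is a simplified stand-in rather than the actual jumbo code, and the names hash_log, hash_mask, hash_half_log, init_hash(), line_hash_generic() and line_hash_default() are made up for illustration.

```c
#include <assert.h>

#define DEFAULT_HASH_LOG 20
#define DEFAULT_HASH_MASK ((1U << DEFAULT_HASH_LOG) - 1)

/* Hypothetical run-time settings, derived once from the requested size */
static unsigned int hash_log, hash_mask, hash_half_log;

/* Generic version: shift counts are still variable, but the
 * "size - 1" and "log / 2" arithmetic is no longer done per call. */
static unsigned int line_hash_generic(const char *line)
{
	const unsigned char *p = (const unsigned char *)line;
	unsigned int hash = 0, extra = 0;

	while (*p) {
		hash = (hash << 3) + *p++;
		if (!*p)
			break;
		extra = (extra << 2) + *p++;
		if (hash & 0xe0000000U) {
			hash ^= hash >> hash_log;
			extra ^= extra >> hash_log;
			hash &= hash_mask;
		}
	}

	hash -= extra;
	hash ^= extra << hash_half_log;
	hash ^= hash >> hash_log;
	return hash & hash_mask;
}

/* Specialized version: same logic, compile-time constants only */
static unsigned int line_hash_default(const char *line)
{
	const unsigned char *p = (const unsigned char *)line;
	unsigned int hash = 0, extra = 0;

	while (*p) {
		hash = (hash << 3) + *p++;
		if (!*p)
			break;
		extra = (extra << 2) + *p++;
		if (hash & 0xe0000000U) {
			hash ^= hash >> DEFAULT_HASH_LOG;
			extra ^= extra >> DEFAULT_HASH_LOG;
			hash &= DEFAULT_HASH_MASK;
		}
	}

	hash -= extra;
	hash ^= extra << (DEFAULT_HASH_LOG / 2);
	hash ^= hash >> DEFAULT_HASH_LOG;
	return hash & DEFAULT_HASH_MASK;
}

/* Callers go through this pointer instead of calling a fixed function */
static unsigned int (*line_hash)(const char *line);

/* Called once at startup, before any lines are hashed */
static void init_hash(unsigned int log)
{
	assert(log >= 2 && log < 32);
	hash_log = log;
	hash_mask = (1U << log) - 1;
	hash_half_log = log / 2;
	line_hash = (log == DEFAULT_HASH_LOG) ?
	    line_hash_default : line_hash_generic;
}
```

With init_hash(20) the pointer selects the constant-only version, and any other size falls back to the generic one. Whether the indirect call actually costs less than the variable shifts would have to be measured.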

...Oh, you also have cut_len and LM checks inside the per-line loops in
read_buffer() and clean_buffer(), which are probably responsible for part
of the slowdown.  And the checks against "vUNIQUE_BUFFER_SIZE -
sizeof(line) - 8" are probably slower than checks against a compile-time
computed constant were.
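
To illustrate what moving such work out of the per-line loop might look like, here is a minimal sketch. It is not the read_buffer() or clean_buffer() code. The struct settings type, store_lines() and LINE_BUFFER_SIZE (standing in for sizeof(line)) are assumptions, and the LM handling is left out. The buffer limit and the effective maximum length are computed once, so the loop compares against two locals only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_BUFFER_SIZE 0x400	/* assumed stand-in for sizeof(line) */

/* Hypothetical run-time settings */
struct settings {
	size_t buffer_size;	/* total size of the output buffer */
	size_t cut_len;		/* 0 means "do not cut lines" */
};

/* Packs NUL-terminated lines into buffer, truncating each one to the
 * effective maximum length.  Returns the number of lines stored. */
static size_t store_lines(char *buffer, const struct settings *s,
    const char *const *lines, size_t count)
{
	size_t limit, max_len, pos = 0, i;

	assert(s->buffer_size >= LINE_BUFFER_SIZE + 8);

	/* Loop-invariant values, computed once rather than per line */
	limit = s->buffer_size - LINE_BUFFER_SIZE - 8;
	max_len = LINE_BUFFER_SIZE - 1;
	if (s->cut_len && s->cut_len < max_len)
		max_len = s->cut_len;

	for (i = 0; i < count && pos <= limit; i++) {
		size_t len = strlen(lines[i]);

		if (len > max_len)
			len = max_len;
		memcpy(buffer + pos, lines[i], len);
		buffer[pos + len] = 0;
		pos += len + 1;
	}

	return i;
}
```

Folding the cut_len test into max_len removes one branch per line. A similar effect for the LM case would probably need a separate loop or a separate function, which runs into the code duplication concern again.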

Here's a marketing workaround: double the default memory usage by unique
when the jumbo patch is applied (change UNIQUE_HASH_LOG from 20 to 21,
UNIQUE_BUFFER_SIZE from 0x4000000 to 0x8000000).  This will compensate
for slower code when running on large files.  In fact, -mem=21 results
in a 6% speedup over -jumbo-1 when running on all.lst (44 MB),
presumably due to the larger hash table (fewer collisions).
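
In terms of the defaults, the change would amount to something like the fragment below. Only UNIQUE_HASH_LOG and UNIQUE_BUFFER_SIZE are named above. The UNIQUE_HASH_SIZE definition is assumed to be derived from the log as shown, and where exactly these lines live in the tree is not stated here.

```c
/* Defaults doubled for jumbo: 2^21 hash table entries instead of 2^20 */
#define UNIQUE_HASH_LOG			21
#define UNIQUE_HASH_SIZE		(1 << UNIQUE_HASH_LOG)

/* 128 MiB instead of 64 MiB (0x4000000) */
#define UNIQUE_BUFFER_SIZE		0x8000000
```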

I am not complaining.  I am actually grateful for all your work.  I am
just documenting my findings and thoughts on the matter.


