Date: Wed, 16 Sep 2015 02:43:42 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array
On 2015-09-16 01:09, Solar Designer wrote:
> On Wed, Sep 16, 2015 at 12:43:44AM +0200, magnum wrote:
>> Also, I don't observe any gain from disabling mmap and only minimal gain
>> from using --mem=0 when mmap is enabled (I stopped using -mem after mmap
>> was implemented).
>
> What does it mean when with mmap I am getting "Each node loaded 1/8 of
> wordfile to memory (about 15 MB/node)"? Doesn't mmap imply that each
> node has the full wordlist mapped into its address space?
>
> In fact, without mmap I am getting "Each node loaded the whole wordfile
> to memory". Doesn't not using mmap enable easy and efficient loading of
> portions of the wordlist into each node's memory?
>
> This looks backwards to me. Can you explain?
It is backwards for sure. It grew organically. I'll state a few things
below that are obvious to you, to explain for a broader audience.

Before mmap and MPI/fork, we would either just fgetl() each line, or
use a memory buffer. The latter would load the whole file into a
contiguous buffer once, and then modify that buffer (e.g. replace each
\n with NUL). Index pointers were also set up to point to each word, so
we could immediately get word number 12345 through its pointer. This
was mostly meant for -rules, but IIRC that initial load proved to be
faster even without rules. So far, things were pretty sane.
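As a sketch (rough C with made-up names and minimal error handling;
this is not the actual wordlist.c code), the buffer prep looked
conceptually like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *word_buf;    /* the whole wordlist, one contiguous block */
static char **word_index; /* word_index[i] points at word number i */
static size_t nwords;

static void load_wordlist(const char *path)
{
	FILE *f = fopen(path, "rb");
	long size;
	size_t i;
	char *p;

	if (!f) { perror(path); exit(1); }
	fseek(f, 0, SEEK_END);
	size = ftell(f);
	fseek(f, 0, SEEK_SET);

	word_buf = malloc(size + 1);
	if (!word_buf || fread(word_buf, 1, size, f) != (size_t)size)
		exit(1);
	fclose(f);

	/* Make sure the last word is newline-terminated. */
	if (size == 0 || word_buf[size - 1] != '\n')
		word_buf[size++] = '\n';

	/* First pass: count lines so we can size the index. */
	nwords = 0;
	for (p = word_buf; p < word_buf + size; p++)
		if (*p == '\n')
			nwords++;

	word_index = malloc(nwords * sizeof(char *));
	if (!word_index)
		exit(1);

	/* Second pass: replace each '\n' with NUL and record a pointer
	   to the start of every word. */
	p = word_buf;
	for (i = 0; i < nwords; i++) {
		word_index[i] = p;
		p = memchr(p, '\n', word_buf + size - p);
		*p++ = 0;
	}
}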
Then, with MPI, came some messy code that could do the above but only
for "my words" in a multi-node run. That was implemented on a leap-frog
(or should I say round-robin) basis, so we wouldn't end up with 200,000
short words for one node and 40,000 long words for another. But it also
had to take into account edge cases like "just a few words and a
humongous number of rules", or vice versa. From this point it went
downhill.
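Conceptually (again with hypothetical names, and the real wordlist.c
logic is messier than this), the split is just:

/* Round-robin ("leap-frog"): node k of n takes words k, k+n, k+2n... */
static int word_is_mine(unsigned long word_number,
                        unsigned int node, unsigned int total_nodes)
{
	return word_number % total_nodes == node;
}

/* Edge case: with just a few words and a humongous number of rules,
   splitting on words alone would leave most nodes idle, so the split
   has to happen over (word, rule) pairs instead: */
static int pair_is_mine(unsigned long word_number,
                        unsigned long rule_number, unsigned long nrules,
                        unsigned int node, unsigned int total_nodes)
{
	return (word_number * nrules + rule_number) % total_nodes == node;
}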
Then I implemented mmap and dropped that other buffer for a while. The
beauty of mmap is that it's shared between processes (not just forks,
but any processes that map the same file), and I was hoping to do
without the other buffer. But unfortunately our mapped memory is
read-only... so we can't prepare it and just point to ready-to-use
words. Instead, I implemented an "mgetl()" that works just like fgetl()
but reads from the mmap instead of the file. BTW it's SIMD capable
(using our pseudo-intrinsics), so it's pretty damn fast at scanning for
the next newline. It's nearly as fast as the old memory buffer, much
more straightforward, and potentially uses much less memory, BUT we
can't suppress dupes. Loopback mode *really* needs dupe suppression. So
I re-enabled the simpler (whole wordlist) version of the memory buffer
on top of the mmap, but it's really mostly meant for loopback.
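Stripped of the SIMD scanning (plain memchr() stands in for our pseudo
intrinsics here) and with made-up names, mgetl() boils down to
something like:

#include <string.h>

static const char *map_cur, *map_end;

static void mgetl_init(const char *map, size_t file_size)
{
	map_cur = map;             /* start of the mmap'ed wordlist */
	map_end = map + file_size;
}

static char *mgetl_sketch(char *out, size_t out_size)
{
	const char *nl;
	size_t len;

	if (map_cur >= map_end)
		return NULL;       /* end of wordlist */

	nl = memchr(map_cur, '\n', map_end - map_cur);
	if (!nl)
		nl = map_end;      /* last line lacks a trailing newline */

	/* The mapping is read-only, so unlike the writable buffer we
	   can't NUL-terminate in place; copy into the caller's buffer. */
	len = nl - map_cur;
	if (len >= out_size)
		len = out_size - 1;
	memcpy(out, map_cur, len);
	out[len] = 0;

	map_cur = nl + 1;
	return out;
}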
Oh, and there's also encodings... if we do use the memory buffer and
need re-encoding, we obviously only do that once, when preparing. I
can't even remember all the details. This is by far the messiest source
file in the whole Jumbo tree. It's just that everything works pretty
well and pretty fast, so I'm a bit afraid of touching it.
But what we should do is completely separate loopback mode from
wordlist mode. Loopback mode should be its own code. Then we should
simplify wordlist mode, e.g. drop support for full dupe suppression and
some other crazy things.
BTW, another idea is to load, prepare and index a (non-mmap) buffer
before forking. If/when we're rewriting wordlist.c, we really should
set the goals beforehand... and stick to them.
magnum