Message-ID: <03f801cc52af$4c126490$e4372db0$@net>
Date: Thu, 4 Aug 2011 09:03:33 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: issues with 1.7.8-jumbo-5

>-----Original Message-----
>From: magnum [mailto:rawsmooth@...dband.net]
>Sent: Thursday, August 04, 2011 8:14 AM
>To: john-dev@...ts.openwall.com
>Subject: Re: [john-dev] issues with 1.7.8-jumbo-5
>
>On 2011-08-04 14:17, Solar Designer wrote:
>> magnum -
>>
>> On Thu, Aug 04, 2011 at 01:28:55PM +0200, magnum wrote:
>>> On 2011-08-04 13:13, magnum wrote:
>>>> The bug is in --pipe
>>>
>>> And here it is: An assumption that average line length is at most 16:
>>>
>>> max_pipe_words = (db->options->max_wordfile_memory/16);
>>
>> Thank you for figuring this out!
>
>I was wrong though. Jim does the right thing but somewhere in this code
>block there must be some kind of fence-post error.

I will dig in and have a look. However, the size/16 only sets the 'max' word count. If the 16 is raised, far fewer words will be possible. I chose 16 (passwords averaging 15 bytes) because I thought it a good 'average' size. If the words are much shorter than that, we use only a tiny part of the memory buffer. If the words are all very long, we fill the memory buffer with only a few words (thus wasting space in the word pointer array). Since there is no way to know in advance, I make the above assumption.

Now, the data structure is a flat buffer in which each null-terminated word is appended after all the previous ones (until we run out of space). There is also an array of pointers, each pointing to the start of a word.

In the 'normal' wordfile case, I know the size of buffer needed, read the file once, and put nulls where the \n chars are (or \r\n, \n\r, etc.). I am pretty sure I walk that buffer twice: once to count the lines, after which I allocate the array of line-start pointers, and a second time to put in the nulls and assign the pointers.
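To make the two-pass walk concrete, here is a minimal sketch in C. It is not the code from John; the function name split_lines and its signature are made up for illustration. As a simplification, it treats any run of \r and \n characters as one separator, so empty lines are dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (illustration only, not John's actual code).
 * buf holds len bytes of word-list data and must have room for one
 * extra byte at buf[len]. Returns a malloc'd array of pointers to the
 * start of each line and stores the number of lines in *count.
 * Returns NULL if the allocation fails.
 */
static char **split_lines(char *buf, size_t len, size_t *count)
{
    size_t i, n = 0;
    int at_start = 1;
    char **words;

    assert(buf != NULL && count != NULL);

    /* Pass 1: count the lines so the pointer array can be sized. */
    for (i = 0; i < len; i++) {
        if (buf[i] == '\n' || buf[i] == '\r') {
            at_start = 1;
        } else if (at_start) {
            n++;
            at_start = 0;
        }
    }

    /* Allocate at least one slot so an empty input is not mistaken
     * for an allocation failure. */
    words = malloc((n ? n : 1) * sizeof(*words));
    if (words == NULL)
        return NULL;

    /* Pass 2: overwrite line endings with NULs and record each start. */
    n = 0;
    at_start = 1;
    for (i = 0; i < len; i++) {
        if (buf[i] == '\n' || buf[i] == '\r') {
            buf[i] = '\0';
            at_start = 1;
        } else if (at_start) {
            words[n++] = &buf[i];
            at_start = 0;
        }
    }
    buf[len] = '\0'; /* terminate a last line that has no newline */

    *count = n;
    return words;
}
```

The caller owns both the buffer and the returned pointer array, and frees the array with free() when done. Counting first means the pointer array is exactly as large as needed, which is only possible here because the whole file is already in memory.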
For --pipe, I allocate a fixed-size buffer and a fixed-size array of pointers. Then I load one line at a time, setting the pointer to the start of that line. When I exhaust either the array of line pointers or the memory buffer, I stop loading this block and use it. Simple as that.

I will look deeper and see where I missed something. But that was supposed to be how it worked.

Jim.
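A minimal sketch in C of the --pipe block loading described above, to show where the two limits are checked. It is not the code from John: struct word_block, load_block, MAX_LINE and the pending-line field are all assumptions made for illustration. The sketch also assumes no input line is longer than MAX_LINE - 1 bytes; a longer line would be split by fgets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 32 /* hypothetical cap on line length, incl. the NUL */

/* Hypothetical layout: a flat buffer plus an array of word pointers. */
struct word_block {
    char *buf;              /* NUL-terminated words, back to back */
    size_t buf_size;        /* bytes available in buf */
    size_t buf_used;        /* bytes filled so far */
    char **words;           /* words[i] points at word i inside buf */
    size_t max_words;       /* slots available in words */
    size_t num_words;       /* slots filled so far */
    char pending[MAX_LINE]; /* line read but not yet stored */
    int have_pending;       /* nonzero if pending holds such a line */
};

/*
 * Fill one block from in. Stops when the pointer array is full, when
 * the next line does not fit in the buffer, or at end of input.
 * A line that does not fit is kept in pending and becomes the first
 * word of the next block, so no line is lost at a block boundary.
 * Returns the number of words loaded (0 means input is exhausted).
 */
static size_t load_block(struct word_block *b, FILE *in)
{
    assert(b != NULL && in != NULL);
    assert(b->buf_size >= MAX_LINE); /* any single line must fit */

    b->buf_used = 0;
    b->num_words = 0;

    while (b->num_words < b->max_words) {
        size_t n;

        if (!b->have_pending) {
            if (fgets(b->pending, sizeof(b->pending), in) == NULL)
                break; /* end of input */
            b->pending[strcspn(b->pending, "\r\n")] = '\0';
            b->have_pending = 1;
        }

        n = strlen(b->pending) + 1; /* word plus its NUL */
        if (n > b->buf_size - b->buf_used)
            break; /* buffer full; keep the line for the next block */

        memcpy(b->buf + b->buf_used, b->pending, n);
        b->words[b->num_words++] = b->buf + b->buf_used;
        b->buf_used += n;
        b->have_pending = 0;
    }
    return b->num_words;
}
```

The caller would zero the struct, point buf and words at its own storage, and call load_block repeatedly until it returns 0. The two spots where an off-by-one is easiest to make are the comparison against the remaining buffer space (a word that fits exactly must still be accepted) and the handling of the line that was read but did not fit.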