Date: Sat, 16 Jul 2011 01:55:02 +0400
From: Solar Designer
Subject: Re: memory usage question, caused by new Unicode-casing

Jim -

Thanks for bringing this up.

On Fri, Jul 15, 2011 at 10:35:16AM -0500, JFoug wrote:
> UTF16 utc2_upcase[0x10000];
> UTF16 utc2_downcase[0x10000];

What does "utc2" stand for?

> Now for the question.  utc2_up[down]case[] arrays require 256K of heap.

Not heap, but .bss, although the effect is similar.
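
For reference, here is where the 256 KB figure comes from.  This is only
a minimal sketch: it assumes UTF16 is a 16-bit unsigned type, and
casing_tables_bytes() is a hypothetical helper, not something in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t UTF16;  /* assumption: UTF16 is a 16-bit unsigned type */

/*
 * File-scope arrays without an initializer have static storage duration and
 * are zero-initialized by the C runtime.  On typical ELF systems they land
 * in .bss, not on the heap.
 */
UTF16 utc2_upcase[0x10000];
UTF16 utc2_downcase[0x10000];

/* Hypothetical helper: total size of both tables, in bytes. */
size_t casing_tables_bytes(void)
{
    /* 2 arrays * 0x10000 elements * 2 bytes per element */
    return sizeof(utc2_upcase) + sizeof(utc2_downcase);
}
```

With 2-byte elements that is 2 * 65536 * 2 = 262144 bytes, or 128 KB per
table.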

> Is 256K of heap a problem, that should be changed if running in --save-mem 
> mode?

I think 256 KB is not too much of a problem, especially not in -jumbo
and not with OpenMP builds where we have some other arrays in .bss that
are of comparable size.

As long as you don't actually write to those memory pages, no physical
memory (RAM or swap) is allocated for them (but address space is, and it
is counted against certain limits the OS or sysadmin might impose).

You say that the arrays are sparse - if so, and if many elements are
expected to read as zero, you may choose to not initialize those regions,
relying on the read-as-zero property of everything uninitialized in .bss.
This saves a little bit of memory, since pages that are never written to
are never actually allocated.
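
One way this could look, as a rough sketch only: treat a zero entry as
"no case mapping" and store just the characters that actually change.
The names sparse_upcase, sparse_upcase_init() and sparse_toupper() are
hypothetical, UTF16 is assumed to be a 16-bit unsigned type, and the
ranges filled in are merely an illustration, not the real Unicode data.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UTF16;  /* assumption: UTF16 is a 16-bit unsigned type */

/* Hypothetical table: lives in .bss, so every element starts out as zero. */
UTF16 sparse_upcase[0x10000];

/* Store only real mappings; every other element is left untouched. */
void sparse_upcase_init(void)
{
    UTF16 c;

    /* ASCII a-z */
    for (c = 'a'; c <= 'z'; c++)
        sparse_upcase[c] = (UTF16)(c - 'a' + 'A');

    /* Illustration only: Latin-1 0xE0..0xFE, skipping the division sign */
    for (c = 0xE0; c <= 0xFE; c++)
        if (c != 0xF7)
            sparse_upcase[c] = (UTF16)(c - 0x20);
}

/* A zero entry means "no mapping", so the character is returned as is. */
UTF16 sparse_toupper(UTF16 c)
{
    UTF16 u = sparse_upcase[c];

    return u ? u : c;
}
```

The trade-off is an extra test per lookup compared to a table that is
fully pre-filled with identity mappings, so whether this is worthwhile
depends on how hot the lookup is.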

You may also skip initialization of these altogether when a given
invocation of John does not need them.
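
Skipping the initialization could be as simple as filling the table on
first demand.  Again just a sketch under assumptions: lazy_upcase,
casing_require() and casing_is_ready() are hypothetical names, UTF16 is
assumed to be a 16-bit unsigned type, and only ASCII is filled in to keep
the example short.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UTF16;  /* assumption: UTF16 is a 16-bit unsigned type */

/* Hypothetical table: stays all-zero and untouched until first needed. */
UTF16 lazy_upcase[0x10000];

static int lazy_upcase_ready;

static void lazy_upcase_fill(void)
{
    unsigned int c;  /* wider than UTF16 so that the loop can terminate */

    /* Identity mapping for everything ... */
    for (c = 0; c < 0x10000; c++)
        lazy_upcase[c] = (UTF16)c;

    /* ... then the actual case mappings (ASCII only in this sketch). */
    for (c = 'a'; c <= 'z'; c++)
        lazy_upcase[c] = (UTF16)(c - 'a' + 'A');
}

/* Call from code paths that need casing; later calls return at once. */
void casing_require(void)
{
    if (lazy_upcase_ready)
        return;
    lazy_upcase_fill();
    lazy_upcase_ready = 1;
}

/* Lets callers check whether the table has been filled yet. */
int casing_is_ready(void)
{
    return lazy_upcase_ready;
}
```

A format that needs Unicode casing would call casing_require() from its
init code; an invocation that never does so never writes to the table.
The flag is not thread-safe, so the call would have to happen before any
threads are started.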

> IF SO, then there will likely have to be substantial changes made to make 
> it 'work' properly, if in --save-mem mode, while at the same time, trying 
> to preserve the speed within the 'normal' mode.  We 'could' fall back to 
> only handling 'a'-'z' casing even in -utf8 if in --save-mem, but that 
> pretty much neuters certain formats.

I think that having this depend on --save-mem would be confusing.

