Date: Fri, 14 Aug 2015 08:04:38 +0200
From: Frank Dittrich <frank.dittrich@...lbox.org>
To: john-dev@...ts.openwall.com
Subject: Re: episerver UTF-8

On 08/14/2015 03:07 AM, jfoug@....net wrote:
> On Thu, 13 Aug 2015 19:35:57 -0500, Lei Zhang <zhanglei.april@...il.com>
> wrote:
>> BTW, I think 3*PLAINTEXT_LENGTH means that we assume
> 
> Yes, this is an 'assumption'
> 
>> each UTF8 char to be no larger than 3 bytes. Is that assumption true?
>> Or 4-byte UTF8 chars are too rare to be considered?
> 
> In real world, they are somewhat rare.  But your point is valid.  There
> could certainly be a string of X 4 byte utf8 (there are even 5 byte utf8
> characters) which cause something that should handle 25 characters to
> not be able to handle a string of 25 4 (or 5) byte utf8. But we simply
> have drawn a line in the sand where reality vs theoretical limits come
> into play.

For applications that use UTF-16 with surrogates internally, the above
assumption is OK. If you enter characters that require more than three
bytes when converted to UTF-8, each of them takes two UTF-16 code units
(a surrogate pair), so the max. number of characters will be reduced
accordingly.
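
A rough way to check this, as a sketch only: the value 25 for
PLAINTEXT_LENGTH and the helper names below are assumptions made for
illustration, not taken from the episerver format code. The idea is that
a password limited to PLAINTEXT_LENGTH UTF-16 code units never needs more
than 3*PLAINTEXT_LENGTH bytes in UTF-8.

```python
# Sketch only: PLAINTEXT_LENGTH = 25 is an assumed example value,
# and the helper names are hypothetical.
PLAINTEXT_LENGTH = 25


def utf16_units(s: str) -> int:
    """Count UTF-16 code units; each surrogate counts as one unit."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Count the bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))


samples = {
    "ASCII (1 byte in UTF-8)": "a",
    "BMP (3 bytes in UTF-8)": "\u20ac",
    "non-BMP (4 bytes in UTF-8)": "\U0001F600",
}

for label, ch in samples.items():
    # Longest run of this character that fits the UTF-16 limit.
    max_chars = PLAINTEXT_LENGTH // utf16_units(ch)
    s = ch * max_chars
    fits = utf8_bytes(s) <= 3 * PLAINTEXT_LENGTH
    print(label, max_chars, utf8_bytes(s), fits)
```

With these sample characters, the non-BMP case is cut to 12 characters
(24 code units), which is 48 bytes of UTF-8 and well within the 75-byte
buffer.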

Frank
