Date: Sun, 6 Jan 2013 11:32:07 +0100
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Markov UTF-8 magic

Hi magnum,

I wasn't fully awake (not enough coffee) when I sent my previous mail.
I hope you can still parse most of it.

Writing a really good UTF-8 validity checker is somewhat more
complicated than checking lead and continuation bytes, since it also
has to reject illegal overlong sequences as well as invalid Unicode
code points (UTF-16 surrogates and anything above U+10FFFF).

See the discussion here (just one example):
http://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c

BTW: Here's a Perl regular expression which checks for valid UTF-8,
just in case we need one:
http://www.w3.org/International/questions/qa-forms-utf-8
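
For reference, the byte ranges used by that expression can be written
out in C roughly as follows. This is only a minimal sketch, not a
well-tested implementation: the names utf8_is_valid() and is_cont() are
made up for illustration and are not part of John the Ripper. Unlike the
W3C expression, which only allows tab, LF, CR and 0x20-0x7E in the ASCII
range, this sketch accepts every byte below 0x80.

```c
#include <stddef.h>

/* Hypothetical helper: true for a continuation byte 10xxxxxx. */
static int is_cont(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical sketch: returns 1 if s[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects overlong forms, UTF-16 surrogates
 * (U+D800..U+DFFF) and code points above U+10FFFF.
 */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* range of 1st continuation byte */
        size_t n, k;                        /* n = continuation bytes expected */

        if (c < 0x80) {                     /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1;                          /* C0 and C1 would be overlong */
        } else if (c == 0xE0) {
            n = 2; lo = 0xA0;               /* excludes overlong 3-byte forms */
        } else if (c == 0xED) {
            n = 2; hi = 0x9F;               /* excludes surrogates */
        } else if (c >= 0xE1 && c <= 0xEF) {
            n = 2;
        } else if (c == 0xF0) {
            n = 3; lo = 0x90;               /* excludes overlong 4-byte forms */
        } else if (c >= 0xF1 && c <= 0xF3) {
            n = 3;
        } else if (c == 0xF4) {
            n = 3; hi = 0x8F;               /* excludes values above U+10FFFF */
        } else {
            return 0;                       /* 80..C1 and F5..FF never start a sequence */
        }

        if (len - i <= n)
            return 0;                       /* sequence is truncated */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (k = 2; k <= n; k++)
            if (!is_cont(s[i + k]))
                return 0;
        i += n + 1;
    }
    return 1;
}
```

Restricting only the first continuation byte after E0, ED, F0 and F4 is
enough to catch overlong forms, surrogates and out-of-range code points,
so no code point needs to be decoded.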

Maybe we should search for a well-tested free C implementation which
we can use.

Frank
