Date: Mon, 5 Aug 2013 05:13:22 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: iconv Korean and Traditional Chinese research so far

* Harald Becker <ralda@....de> [2013-08-05 03:24:52 +0200]:
> iconv then shall:
> - look for some fixed charsets like ASCII, Latin-1, UTF-8, etc.
> - search table of with libc linked charsets
> - search table of with the program linked charsets
> - search for charset on external search path

sounds like a lot of extra management cost
(for libc, application writer and user as well)

it would be nice if the compiler could figure out
at build time (eg with lto) which tables are used
but i guess charsets are often only known at runtime

> [Addendum after thinking a bit more: The byte code conversion
> files shall exist of a small statical header, followed by the
> byte code program. The header shall contain the charset name,
> version of required virtual machine and length of byte code. So
> you need only add all such conversion files to a big array of
> bytes and add a Null header to mark the end of table. Then you
> only need the start of the array and you are able to search
> through for a specific charset. The iconv function in libc
> contains a definition for an "unsigned char const
> *iconv_user_charsets = NULL;", which is linked in, when the user
> does not provide it's own definition. So iconv can search all
> linked in charset definitions, and need no code changes. Really
> simple configuration to select charsets to build in.]
> 

yes that can work, but it's a musl specific hack
that the application programmer needs to take care of
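
to make the quoted proposal concrete, here is a rough sketch of
the table search it describes. the layout is my assumption, the
proposal only says "name, vm version, bytecode length": a 16 byte
NUL padded name, a 1 byte version, a 3 byte big endian length, then
the byte code, with an all-zero name as the end marker.
find_charset and struct charset_ref are made-up names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry layout (not specified in the proposal):
   16-byte NUL-padded name, 1-byte VM version,
   3-byte big-endian bytecode length, then the bytecode. */
enum { NAME_LEN = 16, HDR_LEN = NAME_LEN + 4 };

struct charset_ref {
    const unsigned char *code; /* start of the bytecode */
    size_t len;                /* bytecode length in bytes */
    unsigned vm_version;       /* VM version the entry requires */
};

/* Default from the proposal: no user-linked charsets. An application
   would override this with a pointer to its own table. */
const unsigned char *iconv_user_charsets = NULL;

/* Walk a concatenated table of entries; return 0 and fill *out on a
   match, -1 if the name is not present or not representable. */
static int find_charset(const unsigned char *tab, const char *name,
                        struct charset_ref *out)
{
    size_t n = strlen(name);
    if (!tab || n == 0 || n >= NAME_LEN)
        return -1;
    while (tab[0] != 0) { /* empty name marks the end of the table */
        size_t len = (size_t)tab[NAME_LEN + 1] << 16
                   | (size_t)tab[NAME_LEN + 2] << 8
                   | (size_t)tab[NAME_LEN + 3];
        /* exact match: same n bytes and the stored name ends there */
        if (memcmp(tab, name, n) == 0 && tab[n] == 0) {
            out->code = tab + HDR_LEN;
            out->len = len;
            out->vm_version = tab[NAME_LEN];
            return 0;
        }
        tab += HDR_LEN + len; /* skip header and bytecode */
    }
    return -1;
}
```

iconv_open would call something like
find_charset(iconv_user_charsets, "EUC-KR", &ref) after the built-in
tables fail. note the sketch trusts the table completely, which is
fine for linked-in data but not for files found on a search path.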

> > if the format changes then dynamic linking is
> > problematic as well: you cannot update libc
> > in a single atomic operation
> 
> The byte code shall be independent of dynamic linking. The
> conversion files are only streams of bytes, which shall also be
> architecture independent. So you do only need to update the
> conversion files if the virtual machine definition of iconv has
> been changed (shall not be done much). External files may be read
> into malloc-ed buffers or mmap-ed, not linked in by the
> dynamical linker.
> 

that does not solve the format change problem:
you cannot update libc and the conversion files
without a race
(unless you first replace the .so with one that supports
the old format as well as the new one, but then
libc has to support all previous formats)

it's probably easy to design a fixed format to
avoid this
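
one way to read "fixed format", as a sketch only: the header layout
never changes, so any libc can parse it and cleanly refuse a file
it does not understand instead of misreading it during an update.
the layout (16 byte NUL padded name, 1 byte vm version, 3 byte big
endian length) and the names check_conversion_file and CS_VM_MAX
are assumptions, not anything musl defines.

```c
#include <stddef.h>

/* Hypothetical fixed header: 16-byte NUL-padded name, 1-byte VM
   version, 3-byte big-endian bytecode length. CS_VM_MAX is the
   newest VM version this (imaginary) libc build can execute. */
enum { CS_NAME_LEN = 16, CS_HDR_LEN = CS_NAME_LEN + 4, CS_VM_MAX = 1 };

/* Validate a conversion file already read or mmap-ed into buf.
   Returns 0 if usable, -1 if truncated or malformed,
   -2 if well-formed but written for an unsupported VM version. */
static int check_conversion_file(const unsigned char *buf, size_t size)
{
    if (!buf || size < CS_HDR_LEN)
        return -1;
    /* name must be non-empty and NUL-terminated within its field */
    if (buf[0] == 0 || buf[CS_NAME_LEN - 1] != 0)
        return -1;
    unsigned vm = buf[CS_NAME_LEN];
    size_t len = (size_t)buf[CS_NAME_LEN + 1] << 16
               | (size_t)buf[CS_NAME_LEN + 2] << 8
               | (size_t)buf[CS_NAME_LEN + 3];
    /* declared length must match what is actually in the buffer */
    if (len != size - CS_HDR_LEN)
        return -1;
    if (vm == 0 || vm > CS_VM_MAX)
        return -2;
    return 0;
}
```

this only makes a mismatch detectable, the conversion still fails
until both libc and the files are updated, so the race is turned
into a clean error rather than removed.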

it seems somewhat similar to the timezone problem
except zoneinfo is maintained outside of libc so
there is not much choice, but there are the same
issues: updating it should be done carefully,
setuid programs must be handled specially etc
