musl - Re: iconv Korean and Traditional Chinese research so far

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130805095332.3b1be6e5@ralda.gmx.de>
Date: Mon, 5 Aug 2013 09:53:32 +0200
From: Harald Becker <ralda@....de>
Cc: musl@...ts.openwall.com, dalias@...ifal.cx
Subject: Re: iconv Korean and Traditional Chinese research so far

Hi Rich !

> iconv is not something that needs to be extensible. There is a
> finite set of legacy encodings that's relevant to the world,
> and their relevance is going to go down and down with time, not
> up.

Oh! So you consider Japanese, Chinese, Korean, etc. languages
relevant for programs sitting on my machines? How can you decide
this? Why being so ignorant and trying to write an standard
conform library and then pick out a list of char sets of your
choice which may be possible on iconv, neglecting wishes and
need of any musl user.

... or in other words, if you really be this ignorant and
insist on including those charsets fixed in musl, musl is never
more for me :( ... I don't need to bring in any part of mine into
musl, but I don't consider a lib usable for my needs, which
include several char set files in statical build and neglects to
load seldom used charset definitions from extern in any way.

> 
> > > Do I want to give users who have large volumes of legacy
> > > text in their languages stored in these encodings the same
> > > respect and dignity as users of other legacy encodings we
> > > already support? Yes.
> > 
> > Of course. I won't dictate others which conversions they want
> > to use. I only hat to have plenty of conversion tables on my
> > system when I really know I never use such kind of
> > conversions.
> 
> And your table for just Chinese is as large as all our tables
> combined...

How can you tell this. I don't think so. Such conversion codes
may be very compact. Size is mainly required for translation
tables, that is when code points of the char sets does not match
Unicode character order, but you always need the space for those
translations. The rest won't be much.

> I agree you can make iconv smaller than musl's in the case
> where _no_ legacy DBCS are installed. But if you have just one,
> you'll be just as large or larger than musl with them all.

... musl with them all? I don't consider them smaller than an
optimized byte code interpreter ... not when you are going to
include DBCS char sets fixed into musl. At least if you do all
the required translations.

> compare the size of musl's tables to glibc's converters. I've
> worked hard to make them as small as reasonably possible
> without doing hideous hacks like decompression into an
> in-memory buffer, which would actually increase bloat.

Are you now going to build a lib for startup purpose and embedded
systems only or are you trying to write a general purpose
library? Including all those definitions in a statical build is
definitely not the way I will ever like. This may be done for
some special situations and selected char sets, but not for a
general purpose library, claiming to get a wide usage.

> If you have root or want to setup nonstandard environment
> variables.

What about a charset searchpath including something like
"~/.local/share/charset". This would allow to install charset
files in the users directory.

> > interpreter allows to statical link in the conversion byte
> > code programs.
> 
> At several times the size of the current code/tables, and after
> the user searches through the documentation to figure out how
> to do it.

You definitely consider to include all those code tables
statically into musl? I won't include much more than some
standard sets. Why don't you want to load the charset definitions
as they are required?

On one hand you say "use dietlibc" if you need small statical
programs and on the other hand you want to include many charset
definitions into a statical build to avoid dynamic loading of
tables, required only on embedded systems.

So what's the purpose of musl? I don't think you stay right here.
 
> It's not just a matter of dropping in. You'd have path searches
> to modify or disable, build options to get the static tables
> turned on, and all of this stuff would have to be integrated
> with the build system for what you're dropping it into.

I don't see the required complexity. In fact I won't have a lib
that includes several charset definitions in a statical build. I
really like to have a directory with definition files for those
char sets and don't see the complexity for this you proclamate.

Inclusion in statical build is not more than selection of the
charsets you want o be included statically. This selection is
always required or you include all files , which I definitly
neglect.

> Complexity is never the solution. Honestly, I would take a 1mb
> increase in binary size over this kind of complexity any day.
> Thankfully, we don't have to make such a tradeoff.

The only complexity which we has here is the complexity of
charset translation. The rest is relatively simple.

> Charsets are not added. The time of charsets is over. It should
> have been over in 1992, when Pike and Thompson made them
> obsolete, but it's really over now.

So why are you adding Japanese, Chinese and Korean charsets to an
iconv conversion in musl? Why not just using UTF-8? Whenever you
use iconv you want the flexibility to do all required charset
conversions. Which means you need to statically link in many
charset definitions or you need to dynamically load what is
required.

> Then dynamic link it. If you want an extensible binary, you use
> dynamic linking.

Dynamic linking of mail client, ok and where go the charset
definition files? Are they all packed into your libc.so? That is
a very big file? Why do I need to have Asian language definition
on my disk, when I do not want?

It is your decision, but please state clear what purpose you are
building musl. Here it looks you are mixing things and steping in
a direction I will never like.

--
Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.