musl - Re: iconv Korean and Traditional Chinese research so far

Message-ID: <CAK4o1WxVD1A9j5b9x8G8y=5LFiR-UNOqjyHBe76vXj0yMHicgg@mail.gmail.com>
Date: Mon, 5 Aug 2013 09:24:37 +0100
From: Justin Cormack <justin@...cialbusservice.com>
To: musl@...ts.openwall.com
Subject: Re: iconv Korean and Traditional Chinese research so far

On 5 Aug 2013 08:53, "Harald Becker" <ralda@....de> wrote:
>
> Hi Rich!
>
> > iconv is not something that needs to be extensible. There is a
> > finite set of legacy encodings that's relevant to the world,
> > and their relevance is going to go down and down with time, not
> > up.
>
> Oh! So you consider Japanese, Chinese, Korean, etc. languages
> relevant for programs sitting on my machines? How can you decide
> this? Why be so ignorant, trying to write a standards-conforming
> library and then picking a list of charsets of your choice that
> iconv may handle, neglecting the wishes and needs of every musl
> user?
>
> ... or in other words, if you really are this ignorant and
> insist on including those charsets fixed in musl, musl is no
> longer for me :( ... I don't need to bring any part of mine into
> musl, but I don't consider a lib usable for my needs if it
> includes several charset files in a static build and makes no
> provision to load seldom-used charset definitions from external
> files.

They are not going to be "fixed"; just don't build them. It is not hard with
musl. Just add this to your build script.

One of the nice features of musl is that it appeals to a broader audience
than just "embedded", so it is always going to have stuff you can cut out if
you want absolute minimalism, but this means it will get wider usage.

Adding external files has many disadvantages for other people. If you don't
want these conversions, external files do not help you.

Making software for more than one person involves compromises, so please
calm down a bit. Use your own embedded build with the parts you don't need
omitted.

Justin

> >
> > > > Do I want to give users who have large volumes of legacy
> > > > text in their languages stored in these encodings the same
> > > > respect and dignity as users of other legacy encodings we
> > > > already support? Yes.
> > >
> > > Of course. I won't dictate to others which conversions they
> > > want to use. I just hate having plenty of conversion tables
> > > on my system when I know I will never use such conversions.
> >
> > And your table for just Chinese is as large as all our tables
> > combined...
>
> How can you tell? I don't think so. Such conversion code can be
> very compact. Size is mainly required for translation tables,
> that is, when the code points of a charset do not match Unicode
> character order, but you always need the space for those
> translations. The rest won't be much.
>
> > I agree you can make iconv smaller than musl's in the case
> > where _no_ legacy DBCS are installed. But if you have just one,
> > you'll be just as large or larger than musl with them all.
>
> ... musl with them all? I don't consider them smaller than an
> optimized byte-code interpreter ... not when you are going to
> include DBCS charsets fixed into musl, at least if you do all
> the required translations.
>
> > Compare the size of musl's tables to glibc's converters. I've
> > worked hard to make them as small as reasonably possible
> > without doing hideous hacks like decompression into an
> > in-memory buffer, which would actually increase bloat.
>
> Are you now going to build a lib for startup purposes and
> embedded systems only, or are you trying to write a
> general-purpose library? Including all those definitions in a
> static build is definitely not something I will ever like. This
> may be done for some special situations and selected charsets,
> but not for a general-purpose library that claims wide usage.
>
> > If you have root or want to set up nonstandard environment
> > variables.
>
> What about a charset search path including something like
> "~/.local/share/charset"? This would allow installing charset
> files in the user's directory.
>
> > > interpreter allows statically linking in the conversion
> > > byte-code programs.
> >
> > At several times the size of the current code/tables, and after
> > the user searches through the documentation to figure out how
> > to do it.
>
> Are you really considering including all those code tables
> statically in musl? I wouldn't include much more than some
> standard sets. Why don't you want to load the charset
> definitions as they are required?
>
> On one hand you say "use dietlibc" if you need small static
> programs, and on the other hand you want to include many charset
> definitions in a static build to avoid dynamically loading
> tables, which is required only on embedded systems.
>
> So what's the purpose of musl? I don't think you are right here.
>
> > It's not just a matter of dropping in. You'd have path searches
> > to modify or disable, build options to get the static tables
> > turned on, and all of this stuff would have to be integrated
> > with the build system for what you're dropping it into.
>
> I don't see the required complexity. In fact, I don't want a lib
> that includes several charset definitions in a static build. I
> would really like to have a directory with definition files for
> those charsets, and I don't see the complexity you proclaim.
>
> Inclusion in a static build is no more than selecting the
> charsets you want to be included statically. This selection is
> always required, or else you include all files, which I
> definitely reject.
>
> > Complexity is never the solution. Honestly, I would take a 1mb
> > increase in binary size over this kind of complexity any day.
> > Thankfully, we don't have to make such a tradeoff.
>
> The only complexity we have here is the complexity of charset
> translation. The rest is relatively simple.
>
> > Charsets are not added. The time of charsets is over. It should
> > have been over in 1992, when Pike and Thompson made them
> > obsolete, but it's really over now.
>
> So why are you adding Japanese, Chinese and Korean charsets to
> iconv in musl? Why not just use UTF-8? Whenever you use iconv,
> you want the flexibility to do all required charset conversions.
> That means you need to statically link in many charset
> definitions, or you need to dynamically load what is required.
>
> > Then dynamic link it. If you want an extensible binary, you use
> > dynamic linking.
>
> Dynamic linking of a mail client, OK, but where do the charset
> definition files go? Are they all packed into your libc.so? That
> would be a very big file. Why do I need to have Asian language
> definitions on my disk when I do not want them?
>
> It is your decision, but please state clearly for what purpose
> you are building musl. Here it looks like you are mixing things
> and stepping in a direction I will never like.
>
> --
> Rich
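
The per-user charset search path proposed in the quoted message (something like "~/.local/share/charset") might look roughly like the C sketch below. It is purely illustrative and is not how musl works; as the thread describes it, musl's iconv tables are built into the library. The function names `find_charset_file` and `build_candidate`, the colon-separated path string, and the one-file-per-charset layout are all hypothetical assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not a musl API): write "<dir>/<name>" into buf,
 * expanding a leading "~" (alone or followed by '/') to $HOME. dir need
 * not be NUL-terminated; dirlen gives its length. Returns 0 on success,
 * -1 if HOME is unset when needed or the result does not fit in buf. */
static int build_candidate(const char *dir, size_t dirlen, const char *name,
                           char *buf, size_t buflen)
{
    const char *home = "";
    if (dirlen > 0 && dir[0] == '~' && (dirlen == 1 || dir[1] == '/')) {
        home = getenv("HOME");
        if (!home)
            return -1;
        dir++;    /* drop the '~'; any "/..." remainder follows HOME */
        dirlen--;
    }
    int n = snprintf(buf, buflen, "%s%.*s/%s", home, (int)dirlen, dir, name);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}

/* Hypothetical lookup (not a musl API): walk a colon-separated search
 * path such as "~/.local/share/charset:/usr/share/charset" and stop at
 * the first readable file named after the charset. On success returns 0
 * with the full path in buf; on failure returns -1 and buf is unspecified. */
int find_charset_file(const char *name, const char *search_path,
                      char *buf, size_t buflen)
{
    assert(name != NULL && search_path != NULL && buf != NULL);

    /* Reject names that could escape the search directories. */
    if (name[0] == '\0' || name[0] == '.' || strchr(name, '/') != NULL)
        return -1;

    const char *p = search_path;
    for (;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        /* Empty entries (e.g. "a::b") are skipped. */
        if (len > 0 && build_candidate(p, len, name, buf, buflen) == 0 &&
            access(buf, R_OK) == 0)
            return 0;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return -1;
}
```

Rejecting names that contain '/' or start with '.' keeps a requested charset such as "../../etc/passwd" from reaching files outside the search directories, one of the extra concerns any external-file scheme would have to handle.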
