Date: Sun, 31 Jul 2011 17:03:26 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: Character casing question for U+0131

I am asking the list as a whole, in the hope that someone has the answer. If this can be settled easily, all formats can then conform to a single method.

In cp850, the character at 0xD5 is U+0131, the dotless lowercase 'i'. In Unicode, this character DOES upcase: it upcases to the normal capital I (U+0049). The upcase therefore does not round-trip. In cp850, lc(uc(char(0xD5))) != char(0xD5); instead it == char(0x69).

If this is the proper behavior, then what I have right now in Unicode.c and rules.c is correct. I handle cp850 differently: after the normal building of the upcase/downcase arrays, I change one element in the upcase array so that character 0xD5 upcases to character 0x49.

This works great, AS LONG AS the actual formats and/or the OS code page logic work that way for cp850.

Hopefully someone has the answer to this.

Jim.
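
For anyone who wants to see the one-element patch spelled out, here is a minimal sketch of the idea. It is not the code from Unicode.c or rules.c: the names cp850_upcase, cp850_downcase, cp850_pairs and cp850_init_case_tables are hypothetical, and the pair list is only a small illustrative subset of the cp850 case pairs.

```c
#include <stddef.h>

/* Hypothetical table names; these are NOT taken from Unicode.c or rules.c. */
unsigned char cp850_upcase[256];
unsigned char cp850_downcase[256];

/* Illustrative subset of symmetric cp850 case pairs: {lower, upper}. */
static const unsigned char cp850_pairs[][2] = {
    {0x81, 0x9A}, /* u-diaeresis */
    {0x82, 0x90}, /* e-acute */
    {0x84, 0x8E}, /* a-diaeresis */
    {0x94, 0x99}, /* o-diaeresis */
};

void cp850_init_case_tables(void)
{
    int c;
    size_t i;

    /* Start from identity: every byte maps to itself. */
    for (c = 0; c < 256; c++)
        cp850_upcase[c] = cp850_downcase[c] = (unsigned char)c;

    /* Plain ASCII letters. */
    for (c = 'a'; c <= 'z'; c++) {
        cp850_upcase[c] = (unsigned char)(c - 'a' + 'A');
        cp850_downcase[c - 'a' + 'A'] = (unsigned char)c;
    }

    /* "Normal" building step: pairs that map symmetrically both ways. */
    for (i = 0; i < sizeof(cp850_pairs) / sizeof(cp850_pairs[0]); i++) {
        cp850_upcase[cp850_pairs[i][0]] = cp850_pairs[i][1];
        cp850_downcase[cp850_pairs[i][1]] = cp850_pairs[i][0];
    }

    /*
     * One-way patch: 0xD5 (U+0131, dotless i) upcases to 0x49 ('I').
     * The downcase table is left alone, so 0x49 still downcases to
     * 0x69 ('i') and the mapping does not round-trip.
     */
    cp850_upcase[0xD5] = 0x49;
}
```

With tables built this way, cp850_downcase[cp850_upcase[0xD5]] is 0x69 rather than 0xD5, which is exactly the behavior in question.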