Date: Sat, 19 Sep 2015 22:28:11 -0400 From: Rich Felker <dalias@...c.org> To: oss-security@...ts.openwall.com Subject: Re: s/party/hack like it's 1999 On Sun, Sep 20, 2015 at 02:34:15AM +0300, Solar Designer wrote: > On Thu, Sep 17, 2015 at 12:33:28PM -0430, Manuel Gomez wrote: > > There is absolutely nothing wrong with `head`, `tail`, `more`, `curl`, > > `wget` or `diff`. > > I agree that Federico's examples show nothing wrong with these tools. > > However, out of these tools, I think we should test curl and wget for > their handling of metadata such as filenames and HTTP responses when > printing them (likely) to the terminal. Federico's examples do not test > this (they explicitly request the remote file's content to be printed, > so having it printed verbatim and interpreted by the terminal, if any, > is expected behavior). > > In processing of metadata, I think such tools that are commonly run on a > terminal should prevent character codes in the typical controls ranges > (ranges C0 and C1, and DEL character) from being sent to the terminal. > > https://en.wikipedia.org/wiki/C0_and_C1_control_codes > > What exactly such programs should do is debatable, though. For example, > the ps command from Linux procps prints question marks. Its detection > of control characters is locale and multibyte character aware, which > doesn't make me confident: it relies on libc and on locale data, neither > of which is directly related to a terminal one is using. It's also more They're supposed to match; if they don't, this is user error. It would be nice if we could just assume everything is UTF-8, but doing that would actually break one case: where the user has properly configured both their locale and terminal for a non-UTF-8 encoding, just assuming UTF-8 would happily let C1 characters through. So trusting the locale really is the right thing to do here, IMO. > complex (especially including libc and locale data), and hence poses a > higher risk of implementation bugs, than a direct check for C0 and C1 > ranges and DEL would have been. Maybe this complexity is a price to pay > for supporting arbitrary printable UTF-8, which includes codes in the C1 > range in continuation bytes. "The C1 range in continuation bytes" is a complex concept that needs to be explained. Traditionally, the way terminals supported character sets with printable characters in the "C1 range" was by having an option (separate from character encoding, which the terminal often did not even know or care about) to disable processing of C1 characters and treat them as printable. This worked, but it was the wrong model, and precluded use of C1 in UTF-8. The right way for a terminal to behave is to put the byte to character conversion step before the escape processing step. In this way, character sets like cp1252 or koi8-r that have printable characters in the "C1 range" naturally work just fine, because the bytes 80-9F _are not C1 character_ but rather bytes which correspond to other characters. Likewise, in UTF-8, the bytes 80-9F are not even characters at all, but the C1 characters do exist: they're represented by sequences C2 80 ... C2 9F, and when you perform the bytes to characters step first, you end up with U+0080 ... U+009F, which then perform their expected (and dangerous, as we will see) functions. It's easy to play with this on a UTF-8 terminal with the printf command, e.g.: printf '\xc2\x9b1mhello\xc2\x9b0m\n' to see what happens. At least on GNU screen, the C1 characters are processed by default, but can be disabled per-window with the "c1 off" command or globally for new windows with "defc1 off". I haven't widely tested other terminals, but at least my uuterm also processes UTF-8 C1 this way. > Perhaps we can pay a lower code complexity price by checking for a UTF-8 > locale and then validating the UTF-8 characters explicitly (assuming > that if a UTF-8 locale is chosen, the terminal is also set to UTF-8). > Maybe we need a generic code snippet or library of this sort? As long as you're following the locale, mbrtowc+iswprint should suffice. > Then, besides terminal escapes there are UTF-8 control characters: BOM, > LRM, RLM (any others?) > > https://en.wikipedia.org/wiki/Byte_order_mark > https://en.wikipedia.org/wiki/Left-to-right_mark > https://en.wikipedia.org/wiki/Right-to-left_mark I don't think bidi controls are a particularly high risk since most terminals I've used fail to support them properly, but this could change (or maybe already has changed on some of the more desktop-environment-type terminals people use these days). This should probably be checked. > With UTF-8, it might be different how to s/party/hack/ now than in 1999. Solar and I just discussed this and I believe there's at least one interesting attack that's possible even when applications have validated that they have printable data. It involves interleaving of data from multiple writers. Consider the following example: Writer 1: "©" Writer 2: "Û1m" As bytes, these are: Writer 1: C2 A9 Writer 2: C3 9B 31 6D One possible interleaving (writes to terminals have _no_ atomicity at all) is: C3 C2 9B 31 6D A9 This of course contails illegal sequences. The standard practice for processing the above sequence of bytes is to drop or replace truncated or illegal sequences. The exact manner in which this is done varies, but since most software tries to minimize data loss in the case of dropped or corrupt bytes, the usual interpretation is: [illegal C3] [valid C2 9B] [valid 31] [valid 6D] [illegal A9] Regardless of how the illegal sequences are dropped/replaced, then, the characters in the middle are: U+009B U+0031 U+006D or: CSI '1' 'm' If C1 characters are processed, that put your terminal in bold mode. Note that all that was needed for this to happen was for a stray C2 byte from one writer to get injected just before the character-final 9B byte of a multibyte character from another writer. I specifically chose my example so that both writers output data which is well-formed and printable UTF-8, but that was not necessary. Since I see no reasonable application-side mitigation for this, I think the right recommendation should be disabling C1 control codes in terminal emulators, at least in UTF-8 mode, but preferably just across the board. AFAIK nothing is using them. They don't even work reliably across all terminal emulators; many users have C1 disabled from the old days where that was the right way to use certain legacy 8-bit encodings, and some UTF-8 terminal emulators probably don't even support them at all. Note that when considering disabling C1 controls in screen or tmux, it's important that the attaching terminal also has them disabled. Otherwise screen/tmux will treat them as printable and pass them through to be interpreted by the attaching terminal, which is potentially even more dangerous. It would be nice to see an option in screen/tmux not to treat C1 as printable but rather filter out these characters, so that users running everything in screen/tmux don't have to worry about potentially dangerous settings on the terminal they attach from. Rich
Powered by blists - more mailing lists
Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.