Date: Sat, 27 Sep 2014 20:27:28 +0100
From: Steve Jones <trevd1234@...il.com>
To: oss-security@...ts.openwall.com
Subject: Re: Fwd: Non-upstream patches for bash

Hi There

I've been meaning to post all day. After looking at the code with the
intention of fixing it, I can say that in my opinion the parser is 70%
of the problem, the core shell language grammar is 15%, and bash's own
bashisms, its need to allow things like redirection at any position
and other fun things, are the final 15%.

The shell language grammar is way too ambiguous. The ambiguity comes
from allowing multiple tokens for one action, tokens which are then
repurposed without an unambiguous termination of the previous
statement. Or, to put it another way: the parser can't count, the
grammar lacks consistency, and to me it seems to be missing an
explicit terminator.

Even from a non-security viewpoint it is f**ked.

This    " function t2(){ sl } }; } "  should not be  a value statement
block and if it is I'm never writing another shell script again.
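
(A quick, hedged way to check that claim yourself, using bash's
parse-only mode, -n, so nothing actually runs; the echo only fires if
the parser is happy with the snippet:)

    # Sketch: feed the snippet to bash -n, which parses without executing.
    printf '%s\n' 'function t2(){ sl } }; }' | bash -n \
        && echo "parser accepted it"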

A little more on the parser: it's not robust at all, and it also lacks
even basic sanity checking. The parsing strategy does not seem
suitable for the complexity and ambiguity of the grammar, and as a
result it doesn't stand a chance.

I'm certainly not a breaker, just a developer with a security-minded
bent, but even I was able to corrupt memory in another process's
address space just from random fuzzing alone...
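
(If it helps, a very crude sketch of the sort of random fuzzing I
mean; the byte count and iteration count are arbitrary, and it only
flags deaths by signal, i.e. an exit status above 128. It is an
illustration, not the harness I actually used:)

    # Throw random bytes at bash's parse-only mode and watch for crashes.
    for i in $(seq 1 1000); do
        head -c 256 /dev/urandom | bash -n >/dev/null 2>&1
        status=$?
        if [ "$status" -gt 128 ]; then
            echo "parser died with signal $((status - 128)) on iteration $i"
        fi
    done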


A couple of things to wrap up:

For anyone rusty with context-free grammar terminology:
http://pages.cs.wisc.edu/~fischer/cs536.s08/course.hold/html/NOTES/3.CFG.html

Don't worry though, as the documentation just defines its own terms:

DEFINITIONS
       The following definitions are used throughout the rest of this
       document.
       blank  A space or tab.
       word   A sequence of characters considered as a single unit by
              the shell.  Also known as a token.
       name   A word consisting only of alphanumeric characters and
              underscores, and beginning with an alphabetic character
              or an underscore.  Also referred to as an identifier.
       metacharacter
              A character that, when unquoted, separates words.  One
              of the following:
              |  & ; ( ) < > space tab
       control operator
              A token that performs a control function.  It is one of
              the following symbols:
              || & && ; ;; ( ) | |& <newline>
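
(To make "metacharacter" concrete, a small example of my own, not from
the manual: an unquoted ; separates words and so splits commands even
with no spaces around it, while a quoted one is just an ordinary
character:)

    $ bash -c 'echo one;echo two'       # unquoted ';' separates two commands
    $ bash -c 'echo "one;echo two"'     # quoted ';' is just part of the word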


I'm sure some of you folks may have noticed that whitespace has
amazing properties, and that there is a difference between
   { list;}    and    { list}    and    { list; }
So here's a "don't do this or something might break" from the man page:

              list is simply executed in the current shell
              environment.  list must be terminated with a newline or
              semicolon.  This is known as a group command.  The
              return status is the exit status of list.  Note that
              unlike the metacharacters ( and ), { and } are reserved
              words and must occur where a reserved word is permitted
              to be recognized.  Since they do not cause a word break,
              they must be separated from list by whitespace or
              another shell metacharacter.
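
(A hedged demonstration of those three variants; the interesting
failure is the middle one, where the missing ';' means the closing '}'
is read as an ordinary argument and the group is never closed:)

    $ bash -c '{ echo hi; }'    # accepted: the canonical spelling
    $ bash -c '{ echo hi;}'     # also accepted: the ';' already terminates the list
    $ bash -c '{ echo hi }'     # syntax error: '}' becomes an argument to echo,
                                # so bash hits end of input with the group still open
    $ bash -c '{echo hi; }'     # fails differently: '{echo' is one word, so bash
                                # looks for a command literally named '{echo'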


This is ambiguous token reuse (there are many other examples about):
 A list is a sequence of one or more pipelines separated by one of the
operators ;, &, &&, or ||, and optionally terminated by one of ;, &,
or <newline>.
The use of the word "optionally" is incorrect and should be replaced
with "must be".
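
(For reference, a small example of my own of what the manual calls a
list, three pipelines joined by list operators, leaving the terminator
question aside:)

    $ ls /etc | wc -l && echo "counted" ; echo "done"
    # pipeline 1: ls /etc | wc -l
    # pipeline 2: echo "counted"  (runs only if pipeline 1 succeeded, via &&)
    # pipeline 3: echo "done"     (separated by ';', runs either way)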

A grep to run in the bash source code directory; this shows all the
places in there where the developer was confused:

grep -B4 -niR " xxx " --exclude-dir=doc

variables.c is troubling:

variables.c:4331:    stupidly_hack_special_variables (var->name); /* XXX */

and another choice one :
braces.c:423:      QUIT; /* XXX - memory leak here */

Finally: Bash is everywhere, not only being used as the shell
interpreter but also in the form of libbash, which needs to be checked
to see whether it reuses the parser.
It gets both statically and dynamically linked.

Executables and libraries are also not averse to calling bash via an
execv(). I freely speculate that some bash binaries are just named sh.

You could start with a scan of every shell script:
 find / -type f -iname "*.sh" -exec grep bash -lh {} \;
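
(And a hypothetical follow-up sweep, the paths and /usr scope are just
an example, for executables that embed a reference to bash and so may
exec it at run time:)

 find /usr -type f -perm -u+x -exec grep -l "bin/bash" {} \; 2>/dev/null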

In summary, Bash is screwed. Using an alternative can be as simple as
installing one, or "impossible" due to the bashisms.
It may be more productive and less disruptive to reuse a parser from
an existing project; the GPLv3 adds legal compatibility with Apache 2.
More on this later, perhaps?

Thanks for reading folks
Trevd


Apologies if the lines are badly wrapped .. I'm using a webclient :(




On 27 September 2014 16:06, Solar Designer <solar@...nwall.com> wrote:
> On Sat, Sep 27, 2014 at 03:26:01PM +0200, Roman Drahtmueller wrote:
>> By way of exposing the parser to potentially harmful content: Is the
>> importing of functions the only occasion, or are there more than this?
>
> That's a great question.  This aspect is arguably more important than
> individual parsing bugs, in part because distros are already adopting
> Florian's prefix/suffix patch turning parser bugs on function imports
> into non-security issues.
>
> Has anyone started reviewing bash for possible other code paths where
> untrusted input may hit the parser?
>
> Of course, what input is trusted vs. not may be unclear.  Apparently, 20
> years ago bash developers considered all env vars to be trusted input,
> regardless of the names, which is how we got here.
>
> Are bash scripts themselves exclusively trusted input, or should we
> assume that portions of them (which?) may be untrusted (e.g., for
> scripts generated by other programs, with some user input substituted
> into them)?  Clearly, it makes no sense to treat scripts as untrusted in
> their entirety - the very purpose of bash is to do a wide variety of
> things based on script contents - but maybe some individual tokens, etc.
> within scripts may reasonably (and thus should?) be treated as untrusted
> (to the extent possible within bash script syntax specs).
>
> For example, what if a DHCP client sanitizes some input field and then
> embeds it in a generated script?  That's risky design, yet bash could
> try to be robust when faced with scripts like that.  Ideally, it should
> behave only as specified, with no extra "features" available e.g. via
> syntactically correct yet overly long tokens, etc.
>
> Perhaps this boils down to the parser's robustness in general: treating
> whatever we can (even within scripts) as untrusted input is the same as
> having the most robust parser.  This is why I wrote "arguably" in the
> first paragraph above.
>
> Now, is it realistic to make bash's parser so robust by finding and
> patching individual bugs?  I doubt it.  We should find and patch the
> bugs, but perhaps we shouldn't declare bash's parser robust, and perhaps
> we shouldn't treat bash issues triggerable via untrusted script contents
> as security issues.  Perhaps we should instead declare bash unsafe to
> use on scripts containing any untrusted input in them, and focus on
> treating inputs to such scripts (env vars and command line) safely.
>
> This also means that we should treat any programs that generate bash
> scripts with (sanitized) untrusted input in them as unsafe, and patch
> those to use safer mechanisms to pass (sanitized) inputs to scripts
> (preferably use env vars with fixed names).
>
> Comments?
>
> Alexander
