oss-security - Re: Thoughts on Shellshock and beyond

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Wed, 15 Oct 2014 02:10:41 +0800
From: Pavel Labushev <pavel.labushev@...box.no>
To: oss-security@...ts.openwall.com
Subject: Re: Thoughts on Shellshock and beyond

On Sun, 12 Oct 2014 13:24:10 +0200
Florian Weimer <fw@...eb.enyo.de> wrote:

> Haskell programmers typically do not have a background in formal
> semantics and proof automation.  Like with any language with a
> significant following, a fairly large portion of the practitioners
> follows a "whatever works" approach.

By "Haskell" I mean any technology, its scientific basis and the
other aspects altogether, that I think have the potential of
significantly shifting the paradigm. Haskell the language (not the
bright image if it ;) isn't even dependently typed out of the box and
doesn't provide any automated proof tools.  Anyway, I think that
concentrating on the particulars won't really help the discussion.

I know that many would disagree and say that the devil is in the
details. However, I would like to emphasize that there are the
other details that are obviously being underestimated and fall out of
sight - and that's the status quo I would disagree with.

Yes, there probably will always be some people, and maybe even the
majority of them, who follow some sort of "whatever works" approach. Yet
the actual degree of carelessness may vary substantially. Isn't it
kinda how the theory of broken windows works for them? Together with
the fact that in the current situation security remains a mostly (or
even the most) unmeasurable thing, no matter who wants to measure:
employee or employer, or anyone else.

But there are also people who actually use "Haskell" to solve nontrivial
problems which they wouldn't be able to even approach using the
mainstream technologies or anything as inconsistent and as impeding.
Why aren't we talking about the results of their work? Does anyone even
care?

As Wittgenstein said: "the limits of my language mean the limits of
my world". I'm convinced that it's a methodological mistake to think
that if "Haskell" would become mainstream, the people (or their
children, and we shouldn't forget about that!) would remain mostly the
same, regardless; that they wouldn't start thinking differently, create
new tools or enhance the existing ones, and that they wouldn't start
approaching the problems that they were effectively unable or unwilling
to solve or even recognize before.

I'm convinced that for the people who care, "Haskell" could make a
qualitative difference in how they spend their resources and what
results they achieve, especially when working together (explicitly and
knowingly or not). It's a road away from the world of manual labor in
the flat swamp of uncertainty, where security-conscious programmers
spend substantial amounts of their resources on tedious repetitive
tasks, that many other people ignore, such as reinventing string APIs
in C and then informally enforcing their use, to the more abstract
world of mostly-scientific approach, where expressing higher level
concepts and automated reasoning about complex properties is not only
practically possible and gets more efficient and convenient over time,
but even is encouraged for many reasons, including the ability to
create composable knowledge frameworks and parts and reuse them later
both for the greater good and commercial success.

Imagine you're writing a shell and decide to introduce the high level
distinguished concepts of code, data, data source, and derive the
concepts of trusted|untrusted data|code [source].

You do so *NOT* because:
- you know in advance that you can make the mistake the bash developers
did and want to prevent it (you don't know until someone finds it in
bash, and then it's too late)
- you think that your formally verified shell Will Be Provably Secure
after it's written, because your Specifications And Formal Proof Is
Comprehensive (just bullshit)
- you hope that it will become "really secure" eventually, in some
Distant Future, probably with the help of The Many Eyes of FLOSS who
will be Auditing your work, especially since it's Written In Haskell
(naive and proven wrong so many times)

You do so because you're a security- and reliability-conscious "Haskell"
programmer and so you realize that:
- any input data is untrusted by default
- separating code from data is generally a good thing, especially if it
costs not much
- you're a part of the system (in a good sense), an active and
conscious participant, and you know that the notion of those concepts
won't be erased from the implementation (thanks to the language and
tools), so the other people who will use your work could formally
reason about _their systems' properties_ and evaluate security of your
shell as a component _in that context_

Besides (ok, maybe not exactly in the context of writing a shell):
- you recognize this as an interesting intellectual challenge, even
with some practical benefits
- Maybe you'd like to use a couple of buzzwords for marketing, backed
up with some real stuff, especially if your competitors already can into
- you do so because everyone does (yep, in a Wonderful Distant Future)
- you don't feel like a self-proclaiming guru of the great empirical
knowledge and prefer a more scientific approach ;)
- there are some other reasons, such as participating in knowledge
accumulation and refinement, but they require much more text to explain

Now, EVEN *IF*:
- you didn't prove (and didn't want to) that your shell doesn't execute
untrusted code
- you allowed mixing untrusted and trusted data and made possible the
conversion from untrusted data to code
- you didn't restrict execution of untrusted code in any way
- you even want that twisted function export feature, even in a totally
unconstrained form

...then you still didn't screw it "just as easily". Why? Because you
have "proven", by formally expressing the state of affairs, that it's
possible to execute code form this particular untrusted data source.
You created knowledge that hasn't been erased by the lower level
implementation details, and even though you forgot or wasn't even going
to mention it in the README, is still can be used by the other people,
in the context of their analysis.

And by "their analysis" I mean that it's still possible:
- to automatically observe (other than just postulate) that the web
server (or OpenSSH, or whatever) does execute your shell
- to infer the knowledge from the source code of your shell, that the
environment variables are treated as a particular untrusted input data
source, and that some untrusted code potentially may be executed from
that source, or, in other words, that there aren't any evidence of the
opposite
- given the above conditions, to investigate if the web server may
actually pass any untrusted data in the environment variables, find out
that it indeed does, and conclude that the _system_ (not your shell) is
vulnerable to RCE
- to go and fix your shell or maybe some other parts of the system

A systematic approach. How about that.

I think, even if some relatively small part of the software would be
written that way, "in Haskell", that would bring us a choice where
there wasn't any. The lack of certainty that a particular shell doesn't
execute arbitrary code from environment variables doesn't mean
_anything_, because it doesn't make any difference, because we're
uncertain in the same way about pretty much every shell out there. And
in the case of bash, until just recently we didn't even know that we
didn't know. Once again. ;)

But if there would be at least one shell "written in Haskell" (I hope
now you know what I mean), then there is a choice between a lesser
uncertainty, with that "lesser" backed up with some science and
persistent reusable knowledge, and nearly a total one. I'm all for that
choice, even if it's not always as obvious. Besides, I'm convinced that
it would start bringing positive systematic changes, and this, I think,
is far more important than whether or not "Haskell" can help with any
particular issue.

> For example, there is no guarantee that a programmer in a type-rich
> language who implements web templating will give types to the
> templates (which can help to decouple coding an design work), or will
> implement auto-escaping functionality (which helps to prevent

There's no such guarantee, i.e. that it can't be misused, for pretty
much every tool out there. Besides, the programmer may decide whatever
he wants, but if he writes the templating engine in "Haskell", the lack
of evidence of presence of certain properties would be a criterion of
choice. A web templating engine "written in Haskell" that doesn't even
try to address the major security issues in a decent way? No, thanks.

> cross-site scripting vulnerabilities).  In fact, if the goal is to
> produce a provably correct templating library, it is likely that both
> features are left out because they are much more difficult to verify.

If the specification doesn't require the program to have some
properties, then again it's about (a degree of) uncertainty as a
criterion of choice. It's funny though, that you bring up this example.
Because many strongly typed languages has achieved much success in
preventing code injection vulnerabilites in web template engines and
database queries. Even though there are no formal proofs I'm aware of.
People just do that because they can, i.e. because the tools don't
constrain them too much.

> (Especially auto-escaping requires extensive domain knowledge which
> most web templating library authors lack.)

Also, the problem is that with the current mainstream technology it's
practically impossible to express such knowledge in a sound, abstract
and composable form, so that it could be transformed and reused
regardless of (but consistently with) the implementations' particulars.
Btw, many people don't even realize that it's possible or that it may be
actually useful. Their _tools_, together with the other factors, of
course, greatly affect not only how people work, but also how they
think.

> I don't have any.  A decade ago, I wanted to rewrite everything in
> Ada, but that feeling has passed.

Ada has its pros, but I think it's still not consistent enough to
make enough difference. SPARK is nice though and doesn't even require a
background in formal proof systems to start solving some real life
problems with it. Kinda shows that all this stuff is not for academia
only.

> Certainly there are Haskell libraries which do not have this problem.
> However, the related language features could have long fallen into
> obscurity, but this is not what happened.  There are quite a number of
> Haskell programmers who like these features, much like there are
> defenders of arcane shell features.  This affects how the Haskell code
> out there is being written.

Is being written, that's it. But it doesn't mean it would be written
that way no matter what and no matter by whom, or that Haskell as
a language is incapable of change in the right direction. It is an
experimental language, after all, and that factor works in both
directions.

> I don't think Haskell is a magic bullet.  I do think type-rich
> languages (and languages with memory safety) have a lot to offer, but
> writing secure software in them is still hard.

And I'm convinced that "Haskell", in a broader sense and together with
the other factors, is a part of a solution, capable of making a
qualitative change.

Content of type "application/pgp-signature" skipped
Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.