tlsify - Re: Introduction & summary of tlsify discussions, part 2

Message-ID: <20150928164139.GA17773@brightrain.aerifal.cx>
Date: Mon, 28 Sep 2015 12:41:39 -0400
From: Rich Felker <dalias@...c.org>
To: tlsify@...ts.openwall.com
Subject: Re: Introduction & summary of tlsify discussions, part 2

On Mon, Sep 28, 2015 at 06:01:56PM +0200, Joakim Sindholt wrote:
> On Sun, 2015-09-27 at 22:20 -0400, Rich Felker wrote:
> > The following are excerpts from notes by Daniel Kahn Gillmor (dkg),
> > who was part of the CII-Madrid tlsify discussions, originally sent to
> > me by email before the tlsify list was setup, and my replies. Reposted
> > with permission. The original notes from the workshop were a lot more
> > sparse than the expanded version I just sent to the tlsify list, so
> > some of the questions below are probably already answered, but I think
> > it's still useful discussion.
> 
> I do have some concerns based on this, mostly performance related. It
> should be no secret that I think this should be in the kernel. Please
> keep that in mind when reading my opinions.

Thanks for the feedback. I didn't go into the whole kernel topic yet
because I'd written enough already, and it's still a ways off I think.
In brief, the phases discussed (in both the musl community and at
CII-Madrid) are roughly:

Phase 1: API development and implementations doing everything in
userspace in the tlsify child process, using whatever existing TLS
backends make sense (they're isolated in the child, so library-safety
issues and other implementation warts don't matter so much).

Phase 2: Produce TLS backend code for use by the tlsify process that
makes proper use of shared text and minimizes libraries so that
exec'ing it is sufficiently fast and light to be practical for many
real-world loads.

Phase 3: Develop a mechanism for handing off the symmetric crypto to
the kernel. Session management would still need userspace help. This
phase is not well-defined at this point, but I would like to keep the
external-command API as the preferred way of setting this up for
simple apps without huge scalability requirements but also have a
library framework for using kernel-side TLS help.

> > On Fri, Jul 17, 2015 at 09:18:21PM +0200, Daniel Kahn Gillmor wrote:
> > > Peer Cert Verification
> > > ----------------------
> > > 
> > > references for specific examples of SRV-ID:
> > > 
> > > RFC 6120 specifies the use of SRV-IDs in X.509 certs for XMPP.
> > > 
> > > RFC 6764 specifies the use of SRV-IDs in X.509 certs for
> > > WebDAV/CalDAV/CardDAV.
> > > 
> > > the upcoming tzdist draft should also use SRV-IDs, iirc.
> > > 
> > > Also, the list of initialization/configuration parameters doesn't
> > > include any mention of a selected list of root CAs or any additional
> > > constraints on peer cert verification.  Is the assumption that every
> > > tlsify parent should be willing to accept the same set of root CAs?
> > > Here's an example where i think that might not be the case: consider an
> > > operating system installer that wants to fetch data from the public web
> > > (e.g. to show the user some news feed to read during installation), but
> > > also wants to fetch software packages and sensitive configuration
> > > information from a repository that it knows is certified by an
> > > organizational CA X.  The second connection would ideally *only* accept
> > > connections from the organizational CA,
> > 
> > You're completely right that this was an omission. The set of root CAs
> > is certainly an input, but most callers will probably want to use a
> > default set or a named set of some sort. The exact nature such "named
> > sets" might have is unclear to me, but could allow an application to
> > say something like "no locally-added root CAs" or "restricted root CA
> > set based on zero-tolerance for breaches of trust". Obviously directly
> > providing a single organizational CA is another usage case, one which
> > should be a lot easier to deal with.
> 
> On the topic of CAs, I think OpenSSL is the implementation that stores
> symlinks named with either a hash or signature, not sure which. It does
> this to avoid loading several hundreds of certs when it really only
> wants to check against one, and it can look that one up with nothing but
> open(2).
> 
> Keep this in mind further down.
> 
> > > session resumption
> > > ------------------
> > > [...]
> > Do you think it's prohibitive (from a usability standpoint or
> > otherwise) to just have the caller be responsible for getting the
> > session token and passing it in when making new sessions if it wants
> > to use session resuming? I suppose it's a matter of whether the caller
> > mishandling the token is a greater or lesser risk than the tlsify
> > implementation mishandling it. Whichever approach we take, simply not
> > using session resuming by default is probably the safest, most correct
> > approach. (And with keepalive connections, I'm skeptical that resuming
> > even has much value for https.)
> 
> I, personally, would rather see this handled transparently in tlsify.
> Doing so would significantly lessen the amount of code necessary to
> write a TLS server that properly supports resumption.
> From a security standpoint it seems more sound to have tlsify handle it
> internally. This of course hinges on having at least all affected
> connections from one parent process connected through one child process.

Is supporting resumption something we really want to be encouraging?
My impression is that it has negative impact on forward secrecy and
little benefit for https (where keepalive/persistent connections
achieve many of the same goals) but I don't by any means consider
myself an expert on this topic.

In general, in cases where supporting a feature has purely negative
impact on security and is not a hard requirement for usage cases, my
leaning would be towards not supporting it. But I'm very open to
discussion on this topic.

> > > multiplexing
> > > ------------
> > > 
> > > fork/exec is an expensive step, esp. for complex code that needs to load
> > > dynamic libraries/etc.
> > > 
> > > What if a tlsify parent process could ask an existing tlsify child
> > > process (via the control channel, i guess) to "tlsify" another file
> > > descriptor?  Many OSes have fd-passing capabilities across sockets these
> > > days.  it seems like the tlsify child process in this case would be able
> > > to handle multiple sockets in its select loop without a problem, and you
> > > could avoid the fork/exec overhead for all connections but the first.
> > > The simple one-fork-per-connection arrangement would still work for
> > > tlsify parents that prefer simplicity (don't want to deal with a
> > > control channel) and are willing to accept the performance hit.
> > 
> > This is a large part of why I want the "phase 2" to take place: fixing
> > the internal TLS implementation not to have so much ridiculous runtime
> > overhead. Using static const tables mapped from the executable file on
> > disk rather than dynamically initializing tables used for crypto can
> > theoretically get the size of the process down to a few pages. The
> > base time for posix_spawn is roughly 250-500 us even on slower
> > systems, which is negligible in comparison to TLS connection setup if
> > I'm not mistaken. So while I think there's possibly some value to
> > using a shared process, I question whether it's worth the complexity;
> > efforts might be better spent just making the process-per-connection
> > more efficient, at least early on.
> > 
> > > If we were able to do this, we'd need to be able to map the
> > > initialization/configuration options to something that could be sent
> > > over the control channel as well, right?
> > 
> > Yes, that would be needed.
> > 
> > > having this multiplexed arrangement also makes it possible for the tlsify
> > > child to have a session-resumption database that stays in RAM (though it
> > > would go away when the tlsify process terminates).  it does raise a
> > > question of when the tlsify process should shut itself down, though,
> > > since it is no longer responsible for only a single connection.  Maybe
> > > it could shut itself down when the control channel is closed?
> > 
> > Yes, that would be reasonable behavior I think.
> > 
> > > More radically, with this arrangement it's conceivable that you could
> > > have a single tlsify process that runs in the background and performs
> > > this work for a number of clients.  this offers a nice, easy way to do
> > > privsep (no process needs to drop privileges, they just need to be able
> > > to talk to the tlsify peer process's control socket).  Maybe that's too
> > > fancy?
> > 
> > This has been considered before, with no definitive conclusion
> > reached.
> 
> So, after mulling it over for a bit, I have a few concerns depending on
> what kind of implementation is chosen.
> 
> All connections create a new tlsify process:
> * How do you find the installed tlsify? Relevant in case of static
>   linking.

Either via $PATH, tlsify-specific env vars, or hard-coded. My leaning
would be towards $PATH since I don't like hard-coded fs layout and
having bad things in $PATH is a serious PEBKAC security error anyway.

> * 500µs startup time gives you 2000 key exchanges per second. While
>   nginx posted benchmarks showing around 350 poorly defined negotiations
>   per core per second[1]. By no means is that negligible overhead.

Absolutely. I don't think the model is really appropriate for
high-load servers with large numbers of transient connections, but it
may be reasonable to support an extension where the child process
handles multiple TLS sessions for its caller (all in one process) and
still get lots of the same benefits. I say an "extension" because
implementing this should not be mandatory for tlsify API
implementations and callers should be able to fall back to
process-per-session if it's not implemented.
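
To put rough numbers on the overhead, here is a small worked
calculation. It assumes the spawn cost is paid serially on the same
core as the handshake and is not already counted in the quoted 350/s
figure; both are assumptions, and the function name is made up.

```c
/* Connections per second on one core once a fixed per-connection spawn
 * cost (in microseconds) is added to the existing handshake cost. */
double rate_with_spawn(double base_rate_per_s, double spawn_us)
{
    double base_us = 1e6 / base_rate_per_s;  /* time per negotiation */
    return 1e6 / (base_us + spawn_us);
}
```

Under those assumptions, 350/s drops to about 298/s with a 500 us
spawn and to about 322/s with a 250 us spawn, i.e. roughly 15% and 8%
lower.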

Going back to the big picture, the problem I see tlsify as solving is
that the current approaches to TLS are all tailored towards highly
engineered applications intended to scale to large numbers of
connections, and don't fit well with simple client applications that
just need TLS for privacy/authentication/etc. The early adopters I
have in mind are things like:

- Git
- Downloaders (wget-like)
- Chat clients and servers (IRC, etc.)
- Light REST API clients for services
- Mail services

> * Having one process per connection but still polling seems like kind of
>   a waste. Might as well have two threads in it to send and recv
>   asynchronously. Save some syscalls, parallelize, all that jazz.

This is on the other side of the API boundary, so there's no reason a
tlsify implementation couldn't just use two threads like that. It
probably wouldn't even add any startup latency if you create the
second thread after sending the initial handshake while waiting for a
reply.
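
A sketch of that two-thread shape, assuming both descriptors are
stream sockets. It only copies bytes; a real implementation would
encrypt in one direction and decrypt in the other. The names are
illustrative, and EINTR and SIGPIPE handling are left out.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

struct pump { int in, out; };

/* Copy bytes from in to out until EOF or error, then pass EOF onward.
 * This is where the record encryption or decryption would go. */
void *pump_run(void *arg)
{
    struct pump *p = arg;
    char buf[4096];
    ssize_t n;

    while ((n = read(p->in, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(p->out, buf + off, (size_t)(n - off));
            if (w < 0) goto done;
            off += w;
        }
    }
done:
    shutdown(p->out, SHUT_WR);
    return NULL;
}

/* Relay both directions between sockets a and b: one direction on a
 * new thread, the other on the calling thread. Returns 0, or an errno
 * value if the thread could not be created. */
int relay(int a, int b)
{
    struct pump a_to_b = { a, b }, b_to_a = { b, a };
    pthread_t t;
    int err = pthread_create(&t, NULL, pump_run, &a_to_b);

    if (err) return err;
    pump_run(&b_to_a);
    pthread_join(t, NULL);
    return 0;
}
```

Each direction blocks in read() independently, so no poll loop is
needed. Build with -pthread where the toolchain requires it.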

> * It will undoubtedly waste an awful amount of time looking for and
>   parsing certs in the CA folder if it's a client.

This sounds like an important problem we need to solve.

> * And my main concern: this will be painful to integrate into existing
>   applications. The goal here should be to replace the current model
>   and requiring all new users to go through all their code, find all
>   instances where they create new fds and ensure they set CLOEXEC on all
>   of them is a big blocker. I don't consider enumerating and closing all
>   1100 open fds to be an acceptable solution to this. Imagine doing that
>   5000 times per second.

Missing close-on-exec is an issue, but it's only a race condition in
multi-threaded applications. In others, the fd leaks either
always-happen or never-happen, and if they always-happen, it's easy to
find and fix them. It would be wonderful if there were some global
solution to this problem like close-on-exec-by-default, but sadly that
ship already sailed a long time ago...

Do you have any ideas for avoiding this?

> Parents have one tlsify child handling all connections:
> * This is only interesting for servers and suffers all the same problems
>   as the approach above.
> * Running it with the stdin/stdout pipes would effectively be a special
>   case of its intended mode of operation, unless you then call it
>   tlsifyd and have tlsify be a small shim around it.
> 
> One major tlsifyd and all users just connect to it:
> * Security nightmare - even moreso than the current method and in more
>   ways than one (file system permissions, probably more)
> * This would require two binaries, the latter of which will rarely be
>   used. That cannot possibly end well.
> 
> The last approach seems to have a far better potential for max
> performance. You keep the whole CA cert pool in one place and you can
> use session resuming across worker processes with zero issue.
> 
> The middle approach is a good midway station for servers but offers
> nothing but some extra potential for screw-ups in clients.
> 
> The first approach is undoubtedly my favorite but it does have some
> serious performance considerations that are vitally important when it
> comes to servers and longrunning clients doing many connections. I would
> like to see solutions to these problems rather than compromise on the
> process model.

Ultimately I think we have to accept that we're sacrificing some of
the high-performance and broken-code-friendly options for the sake of
something that's much more secure and easy to integrate with clean and
simple applications. I don't have any delusions that tlsify is going
to displace direct in-process library usage for huge servers, but it's
able to solve a problem that presently has no solution, and if we get
to phase 3 (the kernel stuff) that could very well open a path to much
greater performance for high-load https servers.

> > > control channel formats
> > > -----------------------
> > > 
> > > this seems like the big question: how can we structure this in a
> > > friendly/simple way without needing to pull in json libraries or
> > > other scary/dangerous/complex parsers.  My big fear is that there will
> > > be control channel data that is large and possibly complex.
> > > 
> > > examples of large data: full certificate chains -- these can be quite
> > > long.  i don't think there is a functional limit on their length,
> > > actually, and the size of any one certificate can be huge.  Similarly
> > > large are the hints provided by servers during CertificateRequest
> > > messages.
> > > 
> > > examples of complex data: session initialization/configuration is
> > > starting to sound possibly complex, though if it's just command-line
> > > arguments, we should be able to get away with line-at-a-time reads
> > > (maybe with embedded NULs between args so that the same command-line
> > > parser can be used?).  what do you think is the most complex part you've
> > > seen?
> > 
> > Embedded NULs aren't possible; arguments to exec are C strings. Of
> > course it's possible to have multiple arguments. But since the command
> > line is usually public (via ps, etc.) any potentially-private input
> > needs to be via the environment or an active control-channel.
> > 
> > I'm really not sure what the most complex part is. I think we need to
> > get some examples going with a complete inventory of inputs that would
> > need to be passed.
> 
> I don't really see a need to make it particularly complicated. Pass
> certs in PEM format over the ctl line. PEM is plain text and used
> absolutely everywhere. This has the added benefit of making it slightly
> easier to introspect for debugging purposes.
> 
> What other data is there that isn't plain text? I can't think of any.
> For the love of all that is good, don't start with the JSON
> serialization.
> 
> I assume this is going to be handled by a very tiny libtlsify that you'd
> link in (always statically?) so it should be some manner of extensible
> while also remaining backwards compatible. I fear the day I get the
> "your tlsify is too new, please downgrade" message.

I started to write somewhere, but I forget whether I actually followed
up on this -- I think it would be nice to have some canonical
reference code for how to invoke tlsify as long as it's not mandatory
to use (i.e. not the API boundary). But for many simple apps (think
Busybox wget) it's just going to be a few lines of C you'd write
inline. Script langs that automate opening a bidirectional IO channel
to a child process will also have it very easy.

Rich