tlsify - Re: Introduction & summary of tlsify discussions, part 2

Message-ID: <1443456116.2263.19.camel@zhasha.com>
Date: Mon, 28 Sep 2015 18:01:56 +0200
From: Joakim Sindholt <opensource@...sha.com>
To: tlsify@...ts.openwall.com
Subject: Re: Introduction & summary of tlsify discussions, part 2

On Sun, 2015-09-27 at 22:20 -0400, Rich Felker wrote:
> The following are excerpts from notes by Daniel Kahn Gillmor (dkg),
> who was part of the CII-Madrid tlsify discussions, originally sent to
> me by email before the tlsify list was set up, and my replies. Reposted
> with permission. The original notes from the workshop were a lot more
> sparse than the expanded version I just sent to the tlsify list, so
> some of the questions below are probably already answered, but I think
> it's still useful discussion.
> 
> Rich

I do have some concerns based on this, mostly performance related. It
should be no secret that I think this should be in the kernel. Please
keep that in mind when reading my opinions.

> On Fri, Jul 17, 2015 at 09:18:21PM +0200, Daniel Kahn Gillmor wrote:
> > Peer Cert Verification
> > ----------------------
> > 
> > references for specific examples of SRV-ID:
> > 
> > RFC 6120 specifies the use of SRV-IDs in X.509 certs for XMPP.
> > 
> > RFC 6764 specifies the use of SRV-IDs in X.509 certs for
> > WebDAV/CalDAV/CardDAV.
> > 
> > the upcoming tzdist draft should also use SRV-IDs, iirc.
> > 
> > Also, the list of initialization/configuration parameters doesn't
> > include any mention of a selected list of root CAs or any additional
> > constraints on peer cert verification.  Is the assumption that every
> > tlsify parent should be willing to accept the same set of root CAs?
> > Here's an example where i think that might not be the case: consider an
> > operating system installer that wants to fetch data from the public web
> > (e.g. to show the user some news feed to read during installation), but
> > also wants to fetch software packages and sensitive configuration
> > information from a repository that it knows is certified by an
> > organizational CA X.  The second connection would ideally *only* accept
> > connections from the organizational CA,
> 
> You're completely right that this was an omission. The set of root CAs
> is certainly an input, but most callers will probably want to use a
> default set or a named set of some sort. The exact nature such "named
> sets" might have is unclear to me, but could allow an application to
> say something like "no locally-added root CAs" or "restricted root CA
> set based on zero-tolerance for breaches of trust". Obviously directly
> providing a single organizational CA is another usage case, one which
> should be a lot easier to deal with.

On the topic of CAs, I think OpenSSL is the implementation that stores
symlinks named after a hash of each certificate's subject name. It does
this to avoid loading several hundred certs when it really only wants to
check against one, and it can look that one up with nothing but
open(2).
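
A minimal sketch of that kind of lookup, in Python for brevity. It
assumes the directory uses the <hash>.0, <hash>.1, ... naming of an
OpenSSL hashed cert directory and that the hash of the issuer name has
already been computed elsewhere; the function name is hypothetical.

```python
import os


def candidates_for_hash(ca_dir, subject_hash):
    """Return the contents of every file named <subject_hash>.N in ca_dir.

    Probing starts at N = 0 and stops at the first missing index, so a
    lookup that finds nothing costs a single failed open().
    """
    found = []
    n = 0
    while True:
        path = os.path.join(ca_dir, f"{subject_hash}.{n}")
        try:
            with open(path, "rb") as f:
                found.append(f.read())
        except FileNotFoundError:
            return found
        n += 1
```

The caller would still have to compare each candidate against the cert
being verified, since distinct subject names can share a hash.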

Keep this in mind further down.

> > session resumption
> > ------------------
> > [...]
> Do you think it's prohibitive (from a usability standpoint or
> otherwise) to just have the caller be responsible for getting the
> session token and passing it in when making new sessions if it wants
> to use session resuming? I suppose it's a matter of whether the caller
> mishandling the token is a greater or lesser risk than the tlsify
> implementation mishandling it. Whichever approach we take, simply not
> using session resuming by default is probably the safest, most correct
> approach. (And with keepalive connections, I'm skeptical that resuming
> even has much value for https.)

I, personally, would rather see this handled transparently in tlsify.
Doing so would significantly lessen the amount of code necessary to
write a TLS server that properly supports resumption.
From a security standpoint it seems more sound to have tlsify handle it
internally. This of course hinges on having at least all affected
connections from one parent process connected through one child process.

> > multiplexing
> > ------------
> > 
> > fork/exec is an expensive step, esp. for complex code that needs to load
> > dynamic libraries/etc.
> > 
> > What if a tlsify parent process could ask an existing tlsify child
> > process (via the control channel, i guess) to "tlsify" another file
> > descriptor?  Many OSes have fd-passing capabilities across sockets these
> > days.  it seems like the tlsify child process in this case would be able
> > to handle multiple sockets in its select loop without a problem, and you
> > could avoid the fork/exec overhead for all connections but the first.
> > The simple one-fork-per-connection arrangement would still work for
> > tlsify parents that prefer simplicity (don't want to deal with a
> > control channel) and are willing to accept the performance hit.
> 
> This is a large part of why I want the "phase 2" to take place: fixing
> the internal TLS implementation not to have so much ridiculous runtime
> overhead. Using static const tables mapped from the executable file on
> disk rather than dynamically initializing tables used for crypto can
> theoretically get the size of the process down to a few pages. The
> base time for posix_spawn is roughly 250-500 us even on slower
> systems, which is negligible in comparison to TLS connection setup if
> I'm not mistaken. So while I think there's possibly some value to
> using a shared process, I question whether it's worth the complexity;
> efforts might be better spent just making the process-per-connection
> more efficient, at least early on.
> 
> > If we were able to do this, we'd need to be able to map the
> > initialization/configuration options to something that could be sent
> > over the control channel as well, right?
> 
> Yes, that would be needed.
> 
> > having this multiplexed arrangement also makes it possible for the tlsify
> > child to have a session-resumption database that stays in RAM (though it
> > would go away when the tlsify process terminates).  it does raise a
> > question of when the tlsify process should shut itself down, though,
> > since it is no longer responsible for only a single connection.  Maybe
> > it could shut itself down when the control channel is closed?
> 
> Yes, that would be reasonable behavior I think.
> 
> > More radically, with this arrangement it's conceivable that you could
> > have a single tlsify process that runs in the background and performs
> > this work for a number of clients.  this offers a nice, easy way to do
> > privsep (no process needs to drop privileges, they just need to be able
> > to talk to the tlsify peer process's control socket).  Maybe that's too
> > fancy?
> 
> This has been considered before, with no definitive conclusion
> reached.

So, after mulling it over for a bit, I have a few concerns depending on
what kind of implementation is chosen.

All connections create a new tlsify process:
* How do you find the installed tlsify? Relevant in case of static
  linking.
* A 500µs startup time caps you at 2000 key exchanges per second even
  if the handshake itself were free, while nginx posted benchmarks
  showing around 350 poorly defined negotiations per core per
  second[1]. By no means is that negligible overhead.
* Having one process per connection but still polling seems like kind of
  a waste. Might as well have two threads in it to send and recv
  asynchronously. Save some syscalls, parallelize, all that jazz.
* It will undoubtedly waste an awful amount of time looking for and
  parsing certs in the CA folder if it's a client.
* And my main concern: this will be painful to integrate into existing
  applications. The goal here should be to replace the current model,
  and requiring all new users to go through all their code, find every
  place where they create new fds, and ensure they set CLOEXEC on all
  of them is a big blocker. I don't consider enumerating and closing all
  1100 open fds to be an acceptable solution to this. Imagine doing that
  5000 times per second.
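
To make the CLOEXEC point concrete, this is roughly what every
fd-creating call site would need, sketched in Python with fcntl. The
helper names are hypothetical. Note that Python itself creates fds
non-inheritable by default (PEP 446), while C programs get inheritable
fds unless they ask otherwise, which is where the audit burden comes
from.

```python
import fcntl
import os


def set_cloexec(fd):
    """Mark fd close-on-exec so a spawned child does not inherit it."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Report whether fd currently has the close-on-exec flag set."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
```

Any fd that misses this treatment leaks into each spawned child.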

Parents have one tlsify child handling all connections:
* This is only interesting for servers and suffers all the same problems
  as the approach above.
* Running it with the stdin/stdout pipes would effectively be a special
  case of its intended mode of operation, unless you then call it
  tlsifyd and have tlsify be a small shim around it.
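
For reference, handing a connection to an already-running child over
the control channel could look something like this. It is only a
sketch: it assumes a Unix-domain control socket and Python 3.9+ for
socket.send_fds/recv_fds, and the b"tlsify" request string and the
function names are made up.

```python
import socket


def send_connection(ctl, fd, request=b"tlsify"):
    """Parent side: pass fd to the child along with a short request."""
    socket.send_fds(ctl, [request], [fd])


def recv_connection(ctl):
    """Child side: receive one request and the fd that came with it."""
    msg, fds, _flags, _addr = socket.recv_fds(ctl, 1024, 1)
    return msg, fds[0]
```

The child ends up with its own descriptor for the same open file, so
the parent can close its copy once the send has gone through.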

One major tlsifyd and all users just connect to it:
* Security nightmare - even more so than the current method and in more
  ways than one (file system permissions, probably more)
* This would require two binaries, the latter of which will rarely be
  used. That cannot possibly end well.

The last approach seems to have a far better potential for max
performance. You keep the whole CA cert pool in one place and you can
use session resuming across worker processes with zero issue.

The middle approach is a good midway station for servers but offers
nothing but some extra potential for screw-ups in clients.

The first approach is undoubtedly my favorite but it does have some
serious performance considerations that are vitally important when it
comes to servers and long-running clients doing many connections. I would
like to see solutions to these problems rather than compromise on the
process model.
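
To put numbers on it, here is the arithmetic for the 500µs spawn time
and the 350 negotiations per core per second quoted above. It assumes
the spawn and the handshake run back to back on the same core, which
is a simplification; the function name is hypothetical.

```python
def spawn_overhead(spawn_seconds, handshakes_per_second):
    """Return (combined rate per core, spawn cost relative to handshake)."""
    handshake_seconds = 1 / handshakes_per_second
    combined_rate = 1 / (handshake_seconds + spawn_seconds)
    relative_cost = spawn_seconds / handshake_seconds
    return combined_rate, relative_cost


rate, cost = spawn_overhead(500e-6, 350)
print(f"{rate:.0f} handshakes/s per core, spawn adds {cost:.1%}")
```

Under those assumptions the spawn adds about 17.5% on top of each
handshake and drops the per-core rate from 350 to just under 300.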

> > control channel formats
> > -----------------------
> > 
> > this seems like the big question: how can we structure this in a
> > friendly/simple way without needing to pull in json libraries or
> > other scary/dangerous/complex parsers.  My big fear is that there will
> > be control channel data that is large and possibly complex.
> > 
> > examples of large data: full certificate chains -- these can be quite
> > long.  i don't think there is a functional limit on their length,
> > actually, and the size of any one certificate can be huge.  Similarly
> > large are the hints provided by servers during CertificateRequest
> > messages.
> > 
> > examples of complex data: session initialization/configuration is
> > starting to sound possibly complex, though if it's just command-line
> > arguments, we should be able to get away with line-at-a-time reads
> > (maybe with embedded NULs between args so that the same command-line
> > parser can be used?).  what do you think is the most complex part you've
> > seen?
> 
> Embedded NULs aren't possible; arguments to exec are C strings. Of
> course it's possible to have multiple arguments. But since the command
> line is usually public (via ps, etc.) any potentially-private input
> needs to be via the environment or an active control-channel.
> 
> I'm really not sure what the most complex part is. I think we need to
> get some examples going with a complete inventory of inputs that would
> need to be passed.

I don't really see a need to make it particularly complicated. Pass
certs in PEM format over the ctl line. PEM is plain text and used
absolutely everywhere. This has the added benefit of making it slightly
easier to introspect for debugging purposes.
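
As a rough illustration of how little parsing that takes, here is a
sketch that pulls cert blocks out of a line-oriented text stream. It
assumes only CERTIFICATE blocks matter and ignores everything outside
the BEGIN/END markers; the function name is hypothetical.

```python
import base64

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def read_pem_certs(stream):
    """Yield the DER bytes of each PEM certificate block in stream."""
    body = None  # None while outside a block, list of lines while inside
    for line in stream:
        line = line.strip()
        if line == BEGIN:
            body = []
        elif line == END and body is not None:
            yield base64.b64decode("".join(body))
            body = None
        elif body is not None:
            body.append(line)
```

Since there is no length field, a chain of any size can be streamed
one line at a time.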

What other data is there that isn't plain text? I can't think of any.
For the love of all that is good, don't start with the JSON
serialization.

I assume this is going to be handled by a very tiny libtlsify that you'd
link in (always statically?) so it should be extensible in some manner
while also remaining backwards compatible. I fear the day I get the
"your tlsify is too new, please downgrade" message.

- Joakim

[1] https://www.nginx.com/blog/nginx-ssl-performance/