oss-security - Re: linux-distros list policy and Linux kernel, again

Message-ID: <ZOzy5H/9go9KPfm3@1wt.eu>
Date: Mon, 28 Aug 2023 21:17:56 +0200
From: Willy Tarreau <w@....eu>
To: Solar Designer <solar@...nwall.com>
Cc: oss-security@...ts.openwall.com, Vegard Nossum <vegard.nossum@...cle.com>,
        Jiri Kosina <jkosina@...e.cz>, Donald Buczek <buczek@...gen.mpg.de>,
        Greg KH <gregkh@...uxfoundation.org>
Subject: Re: linux-distros list policy and Linux kernel, again

Hi Alexander,

On Mon, Aug 28, 2023 at 08:05:18PM +0200, Solar Designer wrote:
> Hi Willy,
> 
> Thank you for your helpful feedback and criticism.
> 
> I just noticed the recent ksummit list thread is also summarized by LWN:
> 
> https://lwn.net/Articles/941745/

I hadn't noticed.

> In there, Johannes Segitz (SUSE, and a former linux-distros subscriber)
> made a comment saying (among other things):
> 
> "I see the 14 day requirement by distros as the major problem in the way
> it is currently run. I understand why solar designer insists on this (it
> is really tricky to keep information private for any extended time), but
> this then leads to people working around distros and distributing the
> information up front, only to notify distros when it's basically already
> solved and widely known."

It's possible, but I don't have data to back this up. What I suspect instead
is that reporting security issues is so stressful for anyone (constantly
making sure not to make a mistake or send something to the wrong people) that
once they see the fix merged, they just relax and consider the job done, so
most likely linux-distros isn't even contacted at that point. And it's
quite possible that some who experienced a friendly process on s@k.o
and felt some unneeded pressure on l-d just don't want to go there again.
I personally see this a bit like projects asking contributors to sign a CLA:
you come there saying "hey, you had a bug there, I fixed it, look" and in
return you feel swamped by some heavy process, so you just give up,
swearing you'll never go there again. That might be exaggerated, but I
can understand how it could feel that way. There are periods when it's
very difficult for me to find even one extra hour a day, and I would
certainly not appreciate being pressured like this to tidy up my stuff
and prepare it for publication when I have other things to do, after
having made the effort to report a bug. So that's something to keep in
mind: not everyone deals with it the same way.

> I think people handling a complex issue more privately at first and only
> notifying distros when no more than 14 days is left until planned public
> disclosure is actually fine.  I doubt all distros need to be involved in
> early analysis and fixing of a complex issue e.g. in the kernel.

I generally agree, though see above. Also, some reporters systematically
copy some distros' security teams, probably because they were in contact
with them before and found that it made the process smoother for them,
so they may consider that this part of the job is already done.

> > Please note that delays are not specific to hardware issues. We had
> > to work for maybe 3 months with a reporter on a randomness problem that
> > made it possible, to some extent, to guess TCP ports and sequence numbers.
> > It required us to imagine various approaches that shouldn't break TCP, and
> > to iterate with the researchers, who studied and tested them before getting
> > back to us with "it still isn't sufficient". It was a long and painful
> > one; nobody remained idle, yet it was really necessary to get to the end of
> > it before publishing anything. Further, the researchers asked us to keep
> > some details on hold for a while because they were preparing a paper, and
> > this is also something to keep in mind (some of them depend on this,
> > though we must not accept that it drags on for too long).
> 
> Yes, I understand that such cases and such incentives exist.  In those
> cases, the issue should only be brought to (linux-)distros when it's
> almost ready for publication.

Yes, but again, please see above: I couldn't blame a bug reporter for
wanting their week-ends and nights back and for thinking everything is
behind them and in someone else's hands now.

> That said, can you share more detail on the specific issue you referred
> to above and its handling/disclosure timeline?  Was it ever brought to
> oss-security, and if not then why not?

I just checked and I don't see any trace of it there. I don't even
know who normally posts about such issues there.

> I am guessing this is related to your work on random32 in 2020:
> 
> https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/

Ah yes, indeed, it's that one! How the painful memories suddenly come back!

> If so, it looks like the original issue became public via your commit in
> July 2020, but further issues with that fix commit were discovered and
> fixes for them prepared in public in August and only merged in October.
> 
> So I guess some lengthy private discussion occurred before July 2020,

Yeah, it started in early March, and Eric, Amit and I basically spent all
our week-ends and numerous evenings experimenting with different methods
to deliver good-enough random numbers without breaking the principle of not
reusing the same IDs too quickly (i.e., still keeping a long minimal period),
and running tests on real traffic, counting failures. At some point in July
I gave up and concluded that we couldn't fix it on our own and needed
some public help, hence the posting.

> but it wasn't enough anyway, which makes me question the value of having
> the initial handling in private.  Maybe the issue wasn't critical enough
> and privately-fixable enough for that.  Maybe this actually illustrates
> that such issues are best handled entirely in public... if it were not
> for the researchers' incentive you mentioned (plan to publish a paper).

It's always the same for attacks on randomness: the reporter sees a very
high success rate in a lab, while those dealing with production know for
sure that the success rate in the field is so close to zero that it cannot
be represented in a float. But there's a wide spectrum between the two,
such as mostly idle routers serving as route reflectors, or monitoring
devices, etc. Thus you start from "it could theoretically be damaging in
certain environments, let's be careful", with the researchers initially
willing to be discreet since they were working on a paper. As we made
progress and saw the risk of attack fade significantly but never get
close enough to zero, we concluded that in the worst case we had something
better than the original, and that it wasn't much of a problem anymore to
make it public. But I think the researchers also progressed on their side,
seeing their hopes of a quick fix fade away and reality catch up with
theory, and thus became more willing to disclose more of their work. It's a
bit of everything.

> > As such, I don't think requiring disclosure before a fix is ready
> > solves anything. Actually, there can be one exception: when no more
> > progress is being made. I don't think I would personally be
> > shocked by saying that a discussion that remained inactive for 7 days
> > leads to publication; it would put sufficient pressure on all parties
> > not to let it go cold. And difficult issues generally don't stay inactive
> > for more than a few days.
> 
> Makes sense.  The current kernel documentation edit should take care of
> this (no linux-distros notification until fix is ready) for cases where
> the reporter learns of linux-distros from there.  Maybe we should even
> duplicate this information on the linux-distros wiki page?

Maybe. I'm not good at adopting processes, so I'm not the best one to
suggest either way :-/

> Alternatively, we may need to relax the policy.

I personally think it does have a flaw that is emphasized by the Linux
kernel's handling but can actually affect other projects. Some solo
developers might just not have enough resources to do everything in
14 days, from diagnosing the problem at night or during a few spare work
hours, and setting up a lab on the week-end to test a fix, to contacting
whoever needs to be contacted and making releases. Some even make the
mistake of developing new features in maintenance branches and feel like
they need to finish them before releasing (I've seen it happen)! I remember
having to search through my boxes of hardware to reassemble a working PC
with a floppy drive just to be able to validate a fix in the floppy driver.
You can be sure I only did that on the week-end after the report, but
that's possibly 5 days lost already!

I understand the rationale behind your policy. I, too, was on vendor-sec,
where we saw some vendors say "just FYI, we're trying to fix this, we'll
keep you updated" and, one year later, no news. But all those doing
serious work (and there are some; the Linux security team does that kind
of serious work) can be heavily penalized by that policy when they're not
quick enough to produce a fix. The Linux people are known for being vocal,
so you hear about them. But other developers might just feel completely
crushed by this, and it could really harm them, especially when
they're new to this and haven't been dealing with security reports for
25 years like many of us.

That's why I tend to think that the better way to address what you want
to prevent is to ensure that the discussion doesn't stall. This
could remove a lot of frustration. And if something has to be published
before the end because the developers or vendor stay silent, it's much
more powerful to say "they didn't even respond for 14 days" than
"they couldn't figure out a working fix for this complex issue in 14 days".

> Via links from the new LWN story, I also found your similar comments
> from 2022:
> 
> https://lwn.net/Articles/897065/

Ah I didn't remember :-)  At least it seems I'm consistent on this topic.

> Here's a thought experiment: what if the list were not private at all,
> e.g. like oss-security is not?  Sure someone can ask to "please keep
> this confidential", but if it's posted to the list that would be
> ineffective.  So what people sometimes do on public lists, Bugzillas,
> GitHub issues, etc. is share private reproducers with individual
> maintainers out-of-band, such as via direct e-mail, while keeping the
> main discussion on the list, etc.

Mistakes are made all the time on these. Just two weeks ago, I had one on
haproxy via GitHub that spared me an embargo :-)  Having a private list to
assess whether the risk is real and to forward reports to the skilled people
is more effective IMHO. We see this a lot on s@.... I think that about 1/3
of the issues end up as "do not worry, just post this publicly". That
encourages those not really in the security business to seek help without
risking being blamed for disclosing something dangerous.

> I see no good reason why the same
> can't be happening on a temporarily-private list.  So the real problem
> may be that (linux-)distros is misunderstood as permanently-private
> rather than temporarily-private.  Unfortunately, I don't know how to
> address that reliably.  Even with automated delayed publication, some
> people would initially have the wrong idea... maybe unless they have to
> pass through a web page with the public archives before finding the
> posting address?

Just a stupid idea: it could possibly be addressed by a confirmation
e-mail on an opening thread. Something like "we need you to confirm that
what you posted will be made public by YY/MM/DD; if that's really what
you want, please visit this link within 24h, otherwise all your materials
will be destroyed". I'm not sure, that's just an idea. But yes, it needs
to be understood as public so that confidential stuff is not shared
there, and it must be possible to ask for some materials to be erased
early if the reporter wasn't aware of this or made a mistake (e.g. sending
a pcap just before the security team says "never ever share a pcap!").

> Alternatively, we may need to relax the policy.
> 
> Just thinking out loud.

You're welcome. I don't want to interfere with the lists you operate
nor with those working on them, but I've observed some friction
multiple times, for reasons that are probably not too hard to
address if the respective participants talk a bit, which is why I'm
sharing some observations ;-)

Regards,
Willy