Date: Sun, 25 Dec 2022 17:44:50 +0100
From: Dominik Czarnota <dominik.b.czarnota@...il.com>
To: oss-security@...ts.openwall.com
Subject: Re: [Linux] /proc/pid/stat parsing bugs

> To me this seems like a parsing problem, not a VFS problem. (...)

Indeed, it is a parsing problem, but if you have to split on ')'
and/or read the data from the end of the file in order to parse it
properly, that's not the best design.
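
A minimal Python sketch of the "split on ')' from the end" approach, assuming
the record layout is `pid (comm) state ...` and that no field after comm can
contain ')'. The function name `parse_proc_stat` and the sample record are
made up for illustration and are not taken from any existing tool:

```python
def parse_proc_stat(raw: bytes) -> tuple[int, bytes, list[bytes]]:
    """Split one /proc/<pid>/stat record into (pid, comm, remaining fields)."""
    # comm is wrapped in parentheses but is not escaped, so it may itself
    # contain spaces, '(' and ')'. Anchor on the FIRST '(' and the LAST ')'.
    start = raw.index(b"(")
    end = raw.rindex(b")")
    if end < start:
        raise ValueError("malformed stat record")
    pid = int(raw[:start].decode("ascii"))
    comm = raw[start + 1:end]
    # Everything after the last ')' is assumed to be whitespace-separated.
    rest = raw[end + 1:].split()
    return pid, comm, rest


# Hypothetical record for a process that named itself "a) R 1 (b".
sample = b"4242 (a) R 1 (b) S 1 4242 4242 0\n"
pid, comm, rest = parse_proc_stat(sample)
naive_state = sample.split()[2]  # what a plain whitespace split would report
```

Here `rest[0]` holds the real state field, while `naive_state` holds a value
that the process itself injected through its name.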

It is probably the lack of proper documentation and examples that
causes devs to make those mistakes, since it is hard to think of or
account for all edge cases. Escaping the rendered output or using a
standard format like JSON/XML would probably cause fewer mistakes like
this.
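
To illustrate why an escaped, standard format is harder to misparse, here is a
small Python sketch. The JSON layout, the field names and the function
`render_stat_json` are purely hypothetical: no such file exists in procfs. The
point is only that the process name round-trips unchanged regardless of the
bytes it contains:

```python
import json


def render_stat_json(pid: int, comm: bytes, state: str) -> str:
    """Render a HYPOTHETICAL JSON variant of a stat record (not a real procfs file)."""
    # latin-1 maps every byte 0..255 to exactly one code point, so arbitrary
    # comm bytes survive the bytes -> str -> bytes round trip unchanged.
    record = {"pid": pid, "comm": comm.decode("latin-1"), "state": state}
    # With the default ensure_ascii=True, control characters and non-ASCII
    # code points are escaped, so the output stays on a single line.
    return json.dumps(record)


def parse_stat_json(line: str) -> tuple[int, bytes, str]:
    """Parse a record produced by render_stat_json."""
    record = json.loads(line)
    return record["pid"], record["comm"].encode("latin-1"), record["state"]


# A process name containing ')', spaces and a newline.
line = render_stat_json(4242, b"a) R 1 (b\n", "S")
pid, comm, state = parse_stat_json(line)
```

A consumer of such a format would need no knowledge of which characters are
allowed in a process name.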

> Others have a simple and well-defined format
> (like /proc/self/environ and /proc/self/cmdline, which are sequences of
> \0-terminated bytestrings), and those also seem fine.

The format may seem to be well-defined, but it isn't. Nothing stops a
process from changing what is rendered in its /proc/$pid/cmdline and
/proc/$pid/environ files.
The data in those files is rendered from the memory that the
mm->arg_start and mm->env_start user-space pointers refer to,
respectively [0][1], and it can be changed by:
1) modifying the underlying data, e.g. overwriting the memory under
argv[n] or envp[n] in main
2) changing those pointers with the prctl syscall, using PR_SET_MM with
the PR_SET_MM_ARG_{START,END} and PR_SET_MM_ENV_{START,END} options [2]
3) calling the setproctitle(3bsd) function [3]

The `man procfs` page mentions that the `environ` file content may
change, but it doesn't do so for the `cmdline` file:

```
   /proc/[pid]/cmdline
      This read-only file holds the complete command line for the process,
      unless the process is a zombie.  In the latter case, there is nothing
      in this file: that is, a read on this file will return 0 characters.
      The command-line arguments appear in this file as a set of strings
      separated by null bytes ('\0'), with a further null byte after the
      last string.
```
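
A minimal Python sketch of a reader for the format quoted above, assuming only
what the man page states (strings separated by null bytes, one further null
byte at the end, empty content for a zombie). The function name
`split_nul_terminated` is illustrative. Since the process controls these bytes,
the sketch does not assume that the terminator is present or that the result
matches the original argv:

```python
def split_nul_terminated(raw: bytes) -> list[bytes]:
    """Split the raw content of /proc/<pid>/cmdline into byte strings."""
    if not raw:
        # E.g. a zombie: a read returns 0 characters.
        return []
    # Drop exactly one trailing terminator, if there is one. A process that
    # rewrote its own argument memory may have left it out.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    # Empty strings are kept: an empty argument is a legitimate argument.
    return raw.split(b"\0")


documented = split_nul_terminated(b"cat\0/etc/hosts\0")
rewritten = split_nul_terminated(b"worker: handling requests")
```

`documented` splits into two arguments, while `rewritten` (a hypothetical
title set by the process itself) comes back as a single string with spaces in
it. On Linux the raw bytes could be obtained with, for example,
`pathlib.Path("/proc/self/cmdline").read_bytes()`, and the result should be
treated as untrusted input.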

I understand we may not want to change what is already there, so as not
to break existing applications. But adding new files with well-defined
formats and extending the existing man pages sounds like a reasonable
solution.

[0] get_mm_cmdline -
https://elixir.bootlin.com/linux/v6.1.1/source/fs/proc/base.c#L255
[1] environ_read -
https://elixir.bootlin.com/linux/v6.1.1/source/fs/proc/base.c#L941
[2] https://man7.org/linux/man-pages/man2/prctl.2.html#:~:text=since%20Linux%203.5.-,PR_SET_MM_ARG_START,-Set%20the%20address
[3] https://www.freebsd.org/cgi/man.cgi?query=setproctitle&sektion=3

Best regards,
Dominik 'Disconnect3d' Czarnota



On Fri, 23 Dec 2022 at 17:50, Simon McVittie <smcv@...ian.org> wrote:
>
> On Thu, 22 Dec 2022 at 10:04:48 -0500, Shawn Webb wrote:
> > We knew way back then the dangers of VFS-based wizardry. Did we lose
> > that knowledge somehow?
>
> To me this seems like a parsing problem, not a VFS problem. Some
> pseudo-files in Linux /proc are one file per item (/proc/self/oom_adj,
> /proc/self/sessionid, most of /proc/sys) and those are fine[1]: the
> structure is implicit in the filesystem layout, and the file contents
> are trivial to "parse". Others have a simple and well-defined format
> (like /proc/self/environ and /proc/self/cmdline, which are sequences of
> \0-terminated bytestrings), and those also seem fine.
>
> It's the pseudo-files that contain more than one item, particularly
> those with a semi-consistent format that aims for human-readability, that
> can easily get into escaping and parsing issues. If those pseudo-files
> made *more* use of the VFS (one new file in /proc/self for each field
> in the current /proc/self/stat?) then they would suffer from different
> issues instead, like inability to read all fields atomically and maybe
> performance issues for heavy users, but parsing would become a non-issue.
>
>     smcv
>
> [1] or when they're not fine, the issues are around things like how to
>     separate an AppArmor enforcement mode from the label, which again is
>     a matter of parsing a human-readable format with structure
