Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Thu, 24 Sep 2020 09:04:54 -0700
From: Robert Mustacchi <rm@...golfin.org>
To: libc-coord@...ts.openwall.com, Carlos O'Donell <carlos@...hat.com>
Cc: DJ Delorie <dj@...hat.com>
Subject: Re: RFC: open_memstream, fseek, and SEEK_END.

Hi Carlos,

I did the illumos implementation of this, so I can at least provide a
little color on the choices we made. I agree that the current standard
is pretty lacking in specification. Anything to make it clearer what it
expects and more importantly getting implementations to behave in a
similar way to help out the ultimate users of these APIs, would be welcome.

On 9/24/20 7:14 AM, Carlos O'Donell wrote:
> DJ Delorie and I reviewed the following reported bug regarding
> open_memstream() behaviour and decided that the POSIX behaviour
> was under specified when it came to computing the position of
> SEEK_END in certain conditions.
> 
> I would like to get consensus on what the behaviour should be.
> 
> DJ is going to open an Austin Group ticket to request a clarification.

Can we make sure whatever updates are made also apply to
open_wmemstream() too. I assume it has a similar problem.

> Bug 1875596 - fwrite on stream opened with open_memstream() truncates the buffer size
> https://bugzilla.redhat.com/show_bug.cgi?id=1875596
> 
> Musl (recent):
> Running test: open_memstream()
> position after fwrite: 24
> position after SEEK_END: 24
> open_memstream() final buffer size: 4
> 
> glibc (recent):
> Running test: open_memstream()
> position after fwrite: 24
> position after SEEK_END: 4
> open_memstream() final buffer size: 4
> 
> FreeBSD 12.1:
> Running test: open_memstream()
> position after fwrite: 24
> position after SEEK_END: 24
> open_memstream() final buffer size: 24

I ran the test program on illumos:

Running test: open_memstream()
position after fwrite: 24
position after SEEK_END: 24
open_memstream() final buffer size: 24

We have the same SEEK_END as FreeBSD/musl, but the FreeBSD buffer size.
OpenBSD has their own implementation and a quick glance at the source
code suggests it'd be similar. NetBSD and Dragonfly use the FreeBSD
implementation.

> - FreeBSD computes the wrong final buffer size. The standard
>   is clear that it should be the "smaller of" the current
>   buffer length, or the number of bytes between the
>   beginning of the buffer and the current file position indicator.

You've added more words here than Issue 7, 2018
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html),
but I can see that interpretation. When I read it initially, I suspect I
interpreted "contain the smaller of the current buffer length and the
number of bytes" as effectively being the file pointer, but the idea
that it's only updated after a write and not just a seek makes sense.

If someone does a SEEK_SET to 48 and writes 4 bytes, what do you expect
final buffer size to be? I'd naively say 52.

If someone does a SEEK_SET to 48, writes 4 bytes, and then does a
SEEK_SET to 0, and then either writes or doesn't write data, what do you
expect the value to be on close? Based on your interpretation in the
case of no write, I'd expect 52, but in the case of a write, what then?
My guess is 4, but I'm not sure and certainly the 'number of bytes' line
is insufficient.

The standard needs to make clear what the impact of seeking and a
subsequent write is "on the number of bytes". I expect that what
FreeBSD, OpenBSD, and myself did is erroneously treat the seek as
updating what that number of bytes should be, while glibc and musl only
used the actual write to do that.

> - glibc and musl disagree about where SEEK_END lies and the
>   standard does nothing to clarify this.
> 
> - Should SEEK_END always seek to the current file position
>   indicator? My opinion is that the behaviour of fflush()
>   and fclose() strongly implies that SEEK_END should always
>   move to the file position indicator which is where the
>   end of the buffer will be should an fflush() or fclose()
>   be called next. As a developer you may want to know where
>   the last write completed, go there, and continue adding
>   data. This behvaiour seems more important than say
>   finding the extent of the buffer (see next point).
>   
> - An alternative argument can be made that the user has no
>   way to find out the current extent of the buffer without
>   the ability to call fseek(SEEK_END) + ftell() and get back
>   the maximum size the current buffer has been extended to.
>   One alternative is to call strlen() on the buffer to find
>   this size since you're always null terminated, but this
>   is costly on large sized buffers if you don't keep track
>   of this position via ftell() yourself.

I agree this is basically unspecified in the standard. In this case, it
comes down to what is the end-of-file for these memory streams. This is
simple with fmemopen(), it's the size of the actual allocated buffer.

With open_memstream() because the standard dictates that the function
has to track a logical buffer length independent from the file position
and it has the times that this buffer is supposed to grow, I interpret
that as one needs to track basically what the logical size of the
underlying buffer is (which is distinct from the memory allocation of
that buffer). That to me is what the end-of-file would be, which has the
closest analogues to a file and open_memstream().

I can see how the fclose() and fflush() stuff might suggest the other
perspective, but it's not what I would jump to first (after all I
implemented the same thing as musl, FreeBSD, and OpenBSD). One thing I'd
caution is that you can't actually call strlen(). Simply SEEK_SET to an
offset and start writing. The intermediate bytes will probably be  set
to '\0' or L'\0' and defeat strlen (a quick of glibc says this'll
happen). You'll get a buffer size of 52 and a strlen of zero.

The way this interface was specified makes it seem like it wasn't
expected that users would ever really seek, except maybe forward to
create holes, giving us our current dilemma.

Anyways, thanks for bringing this up. Hopefully we can get to relative
consensus.

Robert

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.