Date: Fri, 28 Oct 2005 08:40:55 +0400
From: Solar Designer <solar@...nwall.com>
To: owl-users@...ts.openwall.com
Subject: Re: sed pattern matching (new-line character)

On Thu, Oct 27, 2005 at 07:32:45PM +0400, (GalaxyMaster) wrote:
> According to the sed(1) manual page:
> 
>                                                   The \n sequence
>        in a regular expression matches the newline character, and
>        similarly for \a, \t, and other sequences.
[...]
> jill!galaxy:~$ cat sed-test.txt | sed 's,\n,,g'
> line 1
> line 2
> line 3
> jill!galaxy:~$ cat sed-test.txt | tr -d '\n'
> line 1line 2line 3jill!galaxy:~$
> 
> The last two commands should produce the equivalent output, but they
> don't :(.  What I'm missing?

You're missing the fact that sed strips the trailing newline from each
line of input when placing it into the pattern space, and adds the
newline back when outputting the contents of the pattern space.  That's
the way it is supposed to work, according to both the texinfo
documentation for GNU sed and POSIX.1-2001.  So your 's,\n,,g' never
sees a newline to match.  The '\n' can only be used to match newlines
embedded in the pattern space (it is possible and sometimes useful to
embed a newline in the pattern space using sed commands).
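
To illustrate, here is a minimal sketch.  The sample() helper and its
three lines are assumed stand-ins for your sed-test.txt, and the N loop
is just one common idiom, not the only way to do this.  The first
command leaves the input unchanged because no pattern space ever
contains a newline; the second collects all lines into one pattern
space first, so the newlines are embedded and '\n' can match them.

```shell
# Assumed stand-in for the sed-test.txt from the question.
sample() { printf 'line 1\nline 2\nline 3\n'; }

# Each line arrives in the pattern space without its trailing newline,
# so \n has nothing to match and the input passes through unchanged.
unchanged=$(sample | sed 's/\n//g')

# N appends the next input line to the pattern space, separated by an
# embedded newline.  Loop until the last line has been read, then
# delete all embedded newlines in one go.
joined=$(sample | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g')

printf '%s\n' "$unchanged"
printf '%s\n' "$joined"
```

Unlike tr -d '\n', the second command still ends its output with a
newline, because sed adds one back when it writes the pattern space.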

The texinfo documentation for GNU sed says:

|    `sed' operates by performing the following cycle on each lines of
| input: first, `sed' reads one line from the input stream, removes any
| trailing newline, and places it in the pattern space. [...]
| 
|    When the end of the script is reached, unless the `-n' option is in
| use, the contents of pattern space are printed out to the output
| stream, adding back the trailing newline [...]

POSIX.1-2001 says:

| In default operation, sed cyclically shall append a line of input, less
| its terminating <newline>, into the pattern space.  Normally the pattern
| space will be empty, unless a D command terminated the last cycle.  The
| sed utility shall then apply in sequence all commands whose addresses
| select that pattern space, and at the end of the script copy the pattern
| space to standard output (except when -n is specified) and delete the
| pattern space.  Whenever the pattern space is written to standard output
| or a named file, sed shall immediately follow it with a <newline>.
| 
| [...]
| 
| Also note that '\n' cannot be used to match a <newline> at the end of an
| arbitrary input line; <newline>s appear in the pattern space as a result
| of the N editing command.

("N" is not the only way to embed a newline in the pattern space.  The
"G" command and a newline in the replacement text of "s" are two of the
others.)
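
A rough sketch of two such alternatives, assuming a POSIX-conforming
sed (the sample strings and the <NL> marker are made up for
illustration): G appends a newline followed by the hold space, and the
replacement text of s may contain a backslash-escaped newline.  In both
cases a later s command can match the embedded newline with '\n'.

```shell
# G appends a newline plus the (here empty) hold space to the pattern
# space, so the following s command finds an embedded newline.
with_g=$(printf 'line 1\n' | sed -e 'G' -e 's/\n/<NL>/')

# The first s replaces the space with a newline (written as backslash
# followed by a literal newline); the second s then matches it.
with_s=$(printf 'a b\n' | sed -e 's/ /\
/' -e 's/\n/<NL>/')

printf '%s\n' "$with_g"
printf '%s\n' "$with_s"
```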

> It seems like a real bug in sed's code.

No bug there.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful?  Please give your feedback here: http://rate.affero.net/solar
