Date: Tue, 29 Mar 2016 09:32:54 -0400
From: Rich Felker <dalias@...c.org>
To: Jaydeep Patil <Jaydeep.Patil@...tec.com>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>
Subject: Re: [PATCH] Fix atomic_arch.h for MIPS32 R6

On Tue, Mar 29, 2016 at 07:16:46AM +0000, Jaydeep Patil wrote:
> >-----Original Message-----
> >From: Rich Felker [mailto:dalias@...ifal.cx] On Behalf Of Rich Felker
> >Sent: 29 March 2016 AM 09:41
> >To: Jaydeep Patil
> >Cc: musl@...ts.openwall.com
> >Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
> >
> >On Tue, Mar 29, 2016 at 03:54:02AM +0000, Jaydeep Patil wrote:
> >> >-----Original Message-----
> >> >From: Rich Felker [mailto:dalias@...ifal.cx] On Behalf Of Rich Felker
> >> >Sent: 28 March 2016 PM 06:35
> >> >To: Jaydeep Patil
> >> >Cc: musl@...ts.openwall.com
> >> >Subject: Re: [musl] [PATCH] Fix atomic_arch.h for MIPS32 R6
> >> >
> >> >On Mon, Mar 28, 2016 at 05:07:39AM +0000, Jaydeep Patil wrote:
> >> >> >> >I was just saying it makes the code less cluttered to use them
> >> >> >> >spuriously even though we don't need to:
> >> >> >> >
> >> >> >> >		".set push ; "
> >> >> >> >#if __mips_isa_rev < 6
> >> >> >> >		".set mips2 ; "
> >> >> >> >#endif
> >> >> >> >		"ll %0, %1 ; .set pop"
> >> >> >> >
> >> >> >> >or similar.
> >> >> >> >
> >> >> >> >It's also not clear to me whether the "m" constraint is valid
> >> >> >> >anymore for the R6 ll/sc instructions since they take a 9-bit
> >> >> >> >offset now instead of a 16-bit offset.
> >> >> >> >The compiler could generate an address expression whose offset
> >> >> >> >part does not fit in 9 bits. In that case we may need to #if
> >> >> >> >the whole function (or at least the __asm__ statement)
> >> >> >> >separately rather than just skipping the .set mips2....
> >> >> >> >
> >> >> >>
> >> >> >> The "m" constraint is still valid here, as the offset will be 0 in this case.
> >> >> >
> >> >> >How can you assume the offset will be 0? It's the compiler's
> >> >> >choice what to use. For instance, a_cas(&foo->bar, t, s) is likely
> >> >> >to have an offset equal to offsetof(__typeof__(foo),bar). AFAIK
> >> >> >this happens in practice with small offsets in mutex structures,
> >> >> >etc. so the bug may be unlikely to be hit, but I think it's still an
> >> >> >incorrect-constraint bug.
> >> >>
> >> >> Compiler generates appropriate LL/SC based on the offset.
> >> >> Compiler adds the offset to the base register if it does not fit 9bits.
> >> >
> >> >The compiler has no way of knowing that the operand will be used with
> >> >ll with the 9-bit offset restriction; as far as it knows, it will be
> >> >used in a normal context where a 16-bit offset is valid. I don't have
> >> >a toolchain that will target r6, but you can try the following
> >> >program which produces an offset of 4096 for loading p[1024]:
> >> >
> >> >unsigned ll1k(volatile unsigned *p)
> >> >{
> >> >	unsigned val;
> >> >	__asm__ __volatile__ ("ll %0, %1" : "=r"(val) : "m"(p[1024]) :
> >> >		"memory" );
> >> >	return val;
> >> >}
> >> >
> >> >I would expect this to produce errors at assembly time on r6.
> >> >Rich
> >>
> >> This is what compiler has generated for above function:
> >>
> >> $ gcc -c -o main.o main.c -O3 -mips32r6 -mabi=32
> >>
> >> Objdump:
> >>
> >> 00000000 <ll1k>:
> >>    0:   24821000        addiu   v0,a0,4096
> >>    4:   7c420036        ll      v0,0(v0)
> >>    8:   d81f0000        jrc     ra
> >>    c:   00000000        nop
> >
> >Can you try gcc -S instead of -c (still at -O3) to produce asm output without
> >assembling it?
> 
> Generated assembly:
> 
> #APP
>  # 4 "test.c" 1
>         ll $2, 4096($4)
>  # 0 "" 2
> #NO_APP
>         jrc     $31
> 
> Even if we set "noreorder" before LL, assembler generates addiu+ll:
> 
> 00000000 <ll1k>:
>    0:   24821000        addiu   v0,a0,4096
>    4:   7c420036        ll      v0,0(v0)
>    8:   d81f0000        jrc     ra
>    c:   00000000        nop

I see. I suspected the assembler was doing it. "noat", not
"noreorder", is the way to suppress things like this but I doubt even
"noat" does it since a separate temp register ("at") is not needed in
this case.
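
To make the range issue concrete, here is a small host-side sketch (not
part of musl; fits_signed is a hypothetical helper name) that checks
whether a byte offset is representable as a signed immediate of a given
width. It assumes the offset fields are signed, as the 16-bit and 9-bit
ll/sc offsets discussed above are:

```c
/* Hypothetical helper: return 1 if off fits in a signed immediate
 * field that is `bits` wide, 0 otherwise. */
static int fits_signed(long off, int bits)
{
	long lo = -(1L << (bits - 1));
	long hi = (1L << (bits - 1)) - 1;
	return off >= lo && off <= hi;
}
```

With the offset of 4096 from the ll1k test above, fits_signed(4096, 16)
is true and fits_signed(4096, 9) is false, which is why the R6
assembler has to emit the extra addiu.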

If all assemblers that support R6 support this rewriting, then the ZC
constraint in gcc is really just an optimization, not strictly
necessary. We should probably check (1) whether clang's internal
assembler can do the rewriting, and (2) whether clang supports the ZC
constraint. I would prefer using ZC but I want to do whatever is more
compatible; I don't think the codegen efficiency matters a lot either
way.
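
For reference, a rough sketch of what the ZC variant could look like.
This is only an illustration, not a proposed patch: ll_sketch is a
made-up name, the non-MIPS branch exists only so the fragment compiles
on other hosts, and whether a given toolchain accepts "ZC" still needs
to be checked as described above:

```c
/* Sketch only: load-linked with the memory constraint chosen per ISA
 * revision. ll_sketch is a hypothetical name, not a musl function. */
static inline int ll_sketch(volatile int *p)
{
	int v;
#if defined(__mips__) && defined(__mips_isa_rev) && __mips_isa_rev >= 6
	/* R6: "ZC" asks the compiler for an address ll/sc can encode. */
	__asm__ __volatile__ ("ll %0, %1" : "=r"(v) : "ZC"(*p) : "memory");
#elif defined(__mips__)
	/* Pre-R6: 16-bit offsets are encodable, so "m" is sufficient. */
	__asm__ __volatile__ (
		".set push ; .set mips2 ; ll %0, %1 ; .set pop"
		: "=r"(v) : "m"(*p) : "memory");
#else
	/* Non-MIPS host: plain load so the sketch still compiles. */
	v = *p;
#endif
	return v;
}
```

With "ZC" the compiler itself would materialize an out-of-range offset
into the base register, instead of relying on the assembler to expand
the instruction.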

Rich
