
Heads Up: GCC 3.2, on MIPS, bug with JFFS2



Hello, JFFS2 developers.

I'm using GCC 3.2 for a MIPS target (big endian, if anyone cares), and
found an apparent bug in GCC 3.2 that shows up in the generated code
for JFFS2.  I wanted to inform anyone else who might run into it, to
save some potential scratching of heads.  

This probably ONLY applies to MIPS code generation from GCC, not to
other processors!

The symptom is: a bunch of "jffs2_scan_eraseblock(): Magic bitmask
0x1985 not found at 0xXXXXXXXX: 0x2003 instead" messages.  

The easiest way to reproduce it is to erase the partition, mount it,
let the filesystem do its thing erasing the blocks, unmount it, and
then try to remount it.  I actually rebooted before remounting, but I
don't believe that is necessary.

The message is printed because at the beginning of each block, instead
of 16 bits of 0x1985 (JFFS2_MAGIC_BITMASK) followed by 16 bits of 0x2003
(JFFS2_NODETYPE_CLEANMARKER), you find 0x2003 twice in a row.
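
To make the symptom concrete, here is a minimal sketch of how the first
four bytes of a block could be classified.  It is an illustration, not
JFFS2 code: every demo_* name is hypothetical, and it assumes the
16-bit fields are stored big endian, as on the target described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the message text; the DEMO_ names are hypothetical. */
#define DEMO_MAGIC_BITMASK        0x1985
#define DEMO_NODETYPE_CLEANMARKER 0x2003

enum demo_header_state {
	DEMO_HEADER_OK,                  /* 0x1985 followed by 0x2003 */
	DEMO_HEADER_DUPLICATED_NODETYPE, /* 0x2003 in both 16-bit slots */
	DEMO_HEADER_OTHER                /* anything else, or too short */
};

/* Read a 16-bit big-endian value (assumption: big-endian storage). */
static uint16_t demo_read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Classify the first four bytes found at the start of a block. */
static enum demo_header_state demo_check_block_start(const uint8_t *buf,
						      size_t len)
{
	uint16_t first, second;

	assert(buf != NULL);
	if (len < 4)
		return DEMO_HEADER_OTHER;

	first = demo_read_be16(buf);
	second = demo_read_be16(buf + 2);

	if (first == DEMO_MAGIC_BITMASK && second == DEMO_NODETYPE_CLEANMARKER)
		return DEMO_HEADER_OK;
	if (first == DEMO_NODETYPE_CLEANMARKER &&
	    second == DEMO_NODETYPE_CLEANMARKER)
		return DEMO_HEADER_DUPLICATED_NODETYPE;
	return DEMO_HEADER_OTHER;
}
```

A block starting with the bytes 19 85 20 03 is classified as
DEMO_HEADER_OK, while 20 03 20 03 (the pattern behind the message
above) is classified as DEMO_HEADER_DUPLICATED_NODETYPE.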

The following change is what I did in fs/jffs2/erase.c to fix the
problem, around line 379:

------------------------------------------------------------
#if 0
		struct jffs2_unknown_node marker = {
			.magic =	cpu_to_je16(JFFS2_MAGIC_BITMASK),
			.nodetype =	cpu_to_je16(JFFS2_NODETYPE_CLEANMARKER),
			.totlen =	cpu_to_je32(c->cleanmarker_size)
		};
#else
		/* Working around MIPS gcc bug */
		struct jffs2_unknown_node marker;
		memset(&marker, 0, sizeof(marker));
		marker.magic =     cpu_to_je16(JFFS2_MAGIC_BITMASK);
		marker.nodetype =  cpu_to_je16(JFFS2_NODETYPE_CLEANMARKER);
		marker.totlen =    cpu_to_je32(c->cleanmarker_size);
#endif
------------------------------------------------------------

The situation is that all of the __attribute__((packed)) stuff
involved with cpu_to_je*() "confuses" the compiler.  When generating
the code to initialize 'marker', it writes the 0x1985, but then
re-uses the location it wrote 0x1985 to as a temporary when
generating 0x2003, so 0x2003 ends up in both 16-bit locations.

For whatever reason, the compiler doesn't choke on the more
traditional code in the #else region.

I haven't experimented too much with compiler flags, etc., to see if I
can make this problem go away.  I have reduced the problem to a small
file, and have passed that on to our guy who handles GCC stuff; he'll
be looking at it and passing it on to the GCC maintainers, unless he
sees something I don't.  (The small file looks a lot like the above
code, and generates the same result for both versions on x86, but
different results between them with our MIPS compiler.  If someone
wants to see it, just ask and I'll send it.)
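
For anyone who wants to check their own toolchain, a reduced test of
this kind might look roughly like the sketch below.  It is an
illustration only, not the actual file mentioned above.  The demo_*
types and macros are hypothetical stand-ins that assume the JFFS2
types are small packed wrapper structs filled in through compound
literals, and the node is simplified to the three fields used in the
code above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the packed wrapper types (assumption). */
typedef struct { uint16_t v16; } __attribute__((packed)) demo_je16_t;
typedef struct { uint32_t v32; } __attribute__((packed)) demo_je32_t;

/* Simplified node: only the three fields set in the code above. */
struct demo_node {
	demo_je16_t magic;
	demo_je16_t nodetype;
	demo_je32_t totlen;
} __attribute__((packed));

#define DEMO_MAGIC_BITMASK        0x1985
#define DEMO_NODETYPE_CLEANMARKER 0x2003

/* Assumed shape of the conversion macros: plain compound literals. */
#define demo_to_je16(x) ((demo_je16_t){ (x) })
#define demo_to_je32(x) ((demo_je32_t){ (x) })

/* Form 1: designated initializer, like the #if 0 branch above. */
static struct demo_node demo_build_with_initializer(uint32_t size)
{
	struct demo_node marker = {
		.magic =	demo_to_je16(DEMO_MAGIC_BITMASK),
		.nodetype =	demo_to_je16(DEMO_NODETYPE_CLEANMARKER),
		.totlen =	demo_to_je32(size)
	};
	return marker;
}

/* Form 2: memset plus assignments, like the #else branch above. */
static struct demo_node demo_build_with_assignments(uint32_t size)
{
	struct demo_node marker;

	memset(&marker, 0, sizeof(marker));
	marker.magic = demo_to_je16(DEMO_MAGIC_BITMASK);
	marker.nodetype = demo_to_je16(DEMO_NODETYPE_CLEANMARKER);
	marker.totlen = demo_to_je32(size);
	return marker;
}

/* Returns 1 when both forms produce byte-identical structs. */
static int demo_forms_agree(uint32_t size)
{
	struct demo_node a = demo_build_with_initializer(size);
	struct demo_node b = demo_build_with_assignments(size);

	/* Packed layout: 2 + 2 + 4 bytes, so memcmp sees no padding. */
	assert(sizeof(struct demo_node) == 8);
	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

With a correctly behaving compiler, demo_forms_agree() should return 1
for any size.  A toolchain showing the problem described here would be
expected to return 0, with 0x2003 in both 16-bit fields of the first
form.  Whether particular compiler flags are needed to trigger it is
not something this sketch establishes.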

To be clear: I'm *not* asking this list for GCC support!

Should the original JFFS2 source code be changed to the #else code?
I'd usually shy away from even suggesting it, because the code is
fine, the compiler's what's broken.  There is a small chance that
someone else could run into this without reading this message, though.
So, if you think it'll save some support effort in the future, you
might want to do it.

But *I'm* not going to be the one to push for it.

When I first ran into the bug, I thought, "Man, this JFFS2 code must
be a bunch of junk!  But why isn't anyone else complaining?"  Then
later, it was "Oh, it's the COMPILER.  Never mind."  I felt a bit like
Gilda Radner/Emily Litella for a second.  :-)

--> Steve Wahl, Brecis Communications
