
RE: Kernel oops using DMA on MCM

Your problem is far too frequent to be caused by the cache bug. The cache
bug only happens under heavy DMA load.

Have you added or modified any device driver?
Have you tested with several different DIMM modules?
Have you run the --memtest option to etrax100boot?
Can you send your schematics, kernel config and the part
number of your DIMM (not on this list though)?

>How easy is it to disable the cache (or make the kernel mm work only
>with uncached addresses) just to see if the cache bug causes all the

It would require modifications in a few files but this can't be the 
problem with just a serial console and an idle LAN.


-----Original Message-----
From: owner-dev-etrax@xxxxxxx.com [mailto:owner-dev-etrax@xxxxxxx.com] On
Behalf Of Phoenix
Sent: Thursday, December 19, 2002 3:31 AM
To: dev-etrax
Subject: RE: Kernel oops using DMA on MCM

I am working on a custom LX board with a DIMM socket, and for some time now
I have been chasing a bug that seems very similar to this cache bug. Userland
programs segfault very frequently at random times. I had to replace
"init" with ash just to be able to get access to the board. This shell
works fine, but all the other programs I execute afterwards tend to crash
about 3 out of 5 times after the fork(). Large programs always crash. This
makes the board unusable, so it's very important to find a fix for it.

Even simple programs crash (like "int main(void){ return 0; }") using only
the com0 async port for the console and Ethernet connected to an idle LAN.
The same code (kernel/devboard) runs fine on the devboard (by just changing
the SDRAM settings to DRAM).

The system is unstable *only* in userland. The kernel runs fine without
any problems. I have tested the memory with a program from within both the
kernel and the bootloader, and it looks fine (although the same code
crashes with a corrupted stack if run in userland). I can provide
details about the tests I have done and all the things I have tried to
figure out what's wrong. The bottom line is that the whole thing is very
unstable (e.g. a printk() before wake_up_process() in do_fork()
in kernel/fork.c makes all the programs crash, while without it the
same programs will run most of the time).

In the current 2.4.19 release with the patch, are the async serial DMA and
Ethernet DMA vulnerable to this "cache bug"? Do you have any ideas how
this bug could be fixed or detected and prevented "globally" for the

How easy is it to disable the cache (or make the kernel mm work only
with uncached addresses) just to see if the cache bug causes all the