
RE: Kernel oops using DMA on MCM

I am working on a custom LX board with a DIMM socket, and for some time now
I have been chasing a bug that seems very similar to this cache bug. Userland
programs segfault very frequently, at random times. I had to replace "init"
with ash just to get access to the board. This shell works fine, but all the
other programs I execute afterwards tend to crash about 3 times out of 5
after the fork(). Large programs always crash. This makes the board
unusable, so it's very important to find a fix for it.

Even trivial programs (like "int main(void){ return 0; }") crash, with only
the com0 async port used for the console and the Ethernet connected to an
idle LAN. The same code (kernel/devboard) runs fine on the devboard, with
only the SDRAM settings changed to DRAM.
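To put a number on the "3 out of 5 after fork()" failure rate, a small harness could be run from the ash shell on both the LX board and the devboard. This is a minimal sketch, not something from the original report: the name fork_stress, the 256-byte stack buffer, and the choice to have each child call _exit() directly (instead of exec-ing a separate binary like the trivial program above) are all assumptions.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork `n` children in turn. Each child writes a
 * small stack buffer, reads it back, and exits with status 0 (roughly
 * "int main(void){ return 0; }"). Returns the number of children that
 * did not exit normally with status 0 (e.g. killed by SIGSEGV), or -1
 * if fork() or waitpid() fails. */
int fork_stress(int n)
{
    int failures = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            /* Child: touch some freshly used stack after fork(). */
            volatile unsigned char buf[256];
            for (int j = 0; j < (int)sizeof buf; j++)
                buf[j] = (unsigned char)j;
            for (int j = 0; j < (int)sizeof buf; j++)
                if (buf[j] != (unsigned char)j)
                    _exit(2); /* stack readback mismatch */
            _exit(0);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            return -1;
        }
        if (WIFSIGNALED(status)) {
            fprintf(stderr, "child %d killed by signal %d\n",
                    (int)pid, WTERMSIG(status));
            failures++;
        } else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            failures++;
        }
    }
    return failures;
}
```

Calling fork_stress(100) from a small main() and printing the result would give a failure count that can be compared directly between the two boards.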

The system is unstable *only* in userland; the kernel itself runs without
any problems. I have tested the memory with a test program run both from
within the kernel and from the bootloader, and it looks fine (although the
same code crashes with a corrupted stack when run in userland). I can
provide details about the tests I have done and everything I have tried to
figure out what's wrong. The bottom line is that the whole thing is very
unstable (e.g. adding a printk() before wake_up_process() in do_fork() in
kernel/fork.c makes all programs crash, while without it the same programs
run most of the time).
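Since the memory test passes from the kernel and the bootloader but crashes in userland, a userland-only pattern test may help narrow down where corruption appears. The original test program isn't shown, so this is a hypothetical sketch: the name mem_pattern_test, the four seed values, and the index-XOR-seed scheme are assumptions. Each 32-bit word is written with its index XORed with a seed and then read back, so stale or aliased words show up as mismatches.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userland pattern test over a heap buffer of `words`
 * 32-bit words. Returns the number of mismatching words across all
 * seeds, or -1 if the buffer could not be allocated. */
long mem_pattern_test(size_t words)
{
    static const uint32_t seeds[] = {
        0x00000000u, 0xFFFFFFFFu, 0xA5A5A5A5u, 0x5A5A5A5Au
    };
    /* volatile keeps the compiler from optimizing the readback away */
    volatile uint32_t *buf = malloc(words * sizeof *buf);
    long errors = 0;

    if (buf == NULL)
        return -1;

    for (size_t s = 0; s < sizeof seeds / sizeof seeds[0]; s++) {
        /* Write pass: each word gets its index XORed with the seed. */
        for (size_t i = 0; i < words; i++)
            buf[i] = (uint32_t)i ^ seeds[s];

        /* Read pass: compare every word against the expected value,
         * printing only the first few mismatches. */
        for (size_t i = 0; i < words; i++) {
            uint32_t expect = (uint32_t)i ^ seeds[s];
            uint32_t got = buf[i];
            if (got != expect) {
                if (errors < 10)
                    fprintf(stderr, "word %zu: expected %08lx, got %08lx\n",
                            i, (unsigned long)expect, (unsigned long)got);
                errors++;
            }
        }
    }

    free((void *)buf);
    return errors;
}
```

Running it with a buffer comparable in size to the board's free RAM exercises most userland-visible pages. Note that a plain CPU write/read test goes through the cache, so a pass does not by itself rule out a DMA/cache coherency problem.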

In the current 2.4.19 release with the patch, are the async serial DMA and
the Ethernet DMA vulnerable to this "cache bug"? Do you have any ideas how
this bug could be fixed, or detected and prevented "globally" for the

How easy is it to disable the cache (or make the kernel mm work only
with uncached addresses), just to see whether the cache bug causes all the