
Re: etrax fs / linux 2.6.19: ethernet can't receive but still transmits


I investigated this problem to some extent and made a temporary fix to
the eth_v32.c driver:

--- eth_v32.c    2007-10-10 14:11:00.000000000 -0600
+++ eth_v32_patched.c    2007-12-07 17:12:15.000000000 -0700
@@ -418,6 +418,7 @@
        REG_RD_INT(dma, np->dma_in_inst, rw_saved_data)) {
         /* Now really advance the ring one step.  */
+        if ((((crisv32_eth_descr*)(phys_to_virt((int)np->active_rx_desc->
descr.next)))->descr.in_eop) && !(np->active_rx_desc->descr.in_eop))
     } else {
         /* delay the advancing of the ring.  */
         np->new_rx_package = 0;

It seems to work for me on our ETRAX FS-based product (
http://wiki.elphel.com/index.php?title=10353 ), the one you are using, but
the problem is likely more general and applies to other products that
use ETRAX FS. Others could try the command below on their
system(s) and see whether it also hangs the Ethernet port.

We were able to make the Ethernet connection fail with a simple ping
flood (suggested by Spectr):

nice -n -15 ping -f -p ff -Q 0x10 -s 3200 -l 200

Usually after a minute or so our board stopped responding (even after the
ping flood was aborted); running ifconfig eth0 down; ifconfig eth0 up from
the serial console was enough to restore operation. While debugging
eth_v32.c I noticed the following: sometimes rxdesc[..].in_eop held
zeroes where the driver expected ones. This bit means that the
DMA packet fit completely into the designated 1522 bytes (no packet can be
longer), so the driver uses it as a marker in the FIFO: the driver resets
the bit and the DMA sets it.
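
To make the marker handshake concrete, here is a minimal sketch of how a
driver might poll such a ring. It is a simplified model, not the real
eth_v32.c code: the struct layout, RING_SIZE, and rx_poll() are all
hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4  /* hypothetical ring length, for illustration only */

/* Hypothetical, simplified receive descriptor (not the crisv32 layout). */
struct rx_descr {
    bool in_eop;    /* set by DMA when a packet ends here; cleared by driver */
    size_t length;  /* number of bytes the DMA wrote */
};

struct rx_ring {
    struct rx_descr d[RING_SIZE];
    size_t active;  /* index of the descriptor the driver reads next */
};

/* Consume one completed packet, if any. Returns true and stores its
 * length in *len when a packet was taken from the ring. */
static bool rx_poll(struct rx_ring *r, size_t *len)
{
    struct rx_descr *cur = &r->d[r->active];

    if (!cur->in_eop)
        return false;  /* driver assumes the DMA has not written here yet */

    *len = cur->length;
    cur->in_eop = false;                      /* re-arm the marker */
    r->active = (r->active + 1) % RING_SIZE;  /* advance the ring */
    return true;
}
```

Note that in this model in_eop == 0 is the only "empty" signal the driver
has, so any other reason for the bit being zero is indistinguishable from
"no data yet".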

But for some reason, during a ping flood with many buffer overruns, it
was possible to get (rxdesc[..].in_eop==0) &&
((rxdesc[..].after-rxdesc[..].buf)==1522). It is probable that the DMA
was trying to put more than 1522 bytes in the buffer and marked
.in_eop=0 (meaning the packet continues in the next DMA descriptor),
but the software interpreted ".in_eop==0" as "DMA did not write data
here yet" and waits... forever, because the DMA has already passed that
descriptor, wrote "0", and is stuck at the ".in_eol==1" descriptor, just
one below the one being read out by the software.
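
The stuck condition described above can be expressed as a small check.
This is only a sketch under my own assumptions: struct rx_buf_descr and
rx_looks_stalled() are hypothetical, and the real descriptor has more
fields. It combines the two symptoms mentioned in this message: the active
descriptor has in_eop == 0 although its buffer is full, or the next
descriptor already has in_eop set (the condition tested in the patch).

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_BUF_SIZE 1522  /* buffer size per descriptor, as in this post */

/* Hypothetical, simplified descriptor holding only the fields discussed. */
struct rx_buf_descr {
    const char *buf;    /* start of the receive buffer */
    const char *after;  /* first byte after the data written by DMA */
    bool in_eop;        /* set by DMA at end of packet */
};

/* Returns true when the active descriptor looks "empty" to the driver
 * (in_eop == 0) although the DMA has evidently already been there. */
static bool rx_looks_stalled(const struct rx_buf_descr *active,
                             const struct rx_buf_descr *next)
{
    if (active->in_eop)
        return false;  /* normal case: a complete packet is waiting */

    /* Buffer filled to the limit without end-of-packet. */
    bool filled = (size_t)(active->after - active->buf) == RX_BUF_SIZE;

    /* DMA has moved on: the following descriptor is already complete. */
    return filled || next->in_eop;
}
```

A driver that detects this state could skip or drop the oversized data
and advance the ring instead of waiting; which recovery is right for the
hardware is not something this sketch tries to answer.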

I believe Axis engineers will be able to look into this problem and
make a permanent bug fix for our problem.