
RE: network throughput



Hi,
 
Thanks for the program... I am not sure exactly where the problem is, but
I consistently get UDP as well as TCP throughput of only around 1.2 to 1.5
MBytes/s using tcp_perf. 

Using netperf, I get only slightly higher (and sometimes worse)
throughput, as I mentioned in an earlier mail...

The thing I am more concerned about is the overruns, since they are
causing the box to lock up... 

The UDP sender in tcp_perf has built-in flow control - the receiver has
to ack the sender after every 10 packets. If I change this to slightly
above 25, there are overruns and packets are lost.
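
For reference, here is roughly how I picture the sender side of that
flow control (a minimal sketch only - the function name, the GAP macro
and the connected-socket setup are my own, not tcp_perf's actual code):

  #include <sys/socket.h>

  #define GAP 10  /* the receiver acks after every GAP packets */

  /* sock is assumed to be a connected UDP socket */
  void udp_send_loop(int sock, const char *buf, int pktlen, int npkts)
  {
      char ack;
      int i;

      for (i = 1; i <= npkts; i++) {
          send(sock, buf, pktlen, 0);
          if (i % GAP == 0)
              recv(sock, &ack, 1, 0);  /* block until the receiver acks */
      }
  }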


So basically I need to understand how overruns are caused. Here's my
understanding of it:

- The card has a DMA ring into which packets are received. It transfers
packets from there to pre-programmed addresses in main memory using DMA,
disables the DMA slots that now hold packets, and then raises an
interrupt. When servicing the interrupt, the OS should re-enable DMA for
each slot once it is done handling the packet in it. If a packet arrives
for a slot whose DMA has not yet been re-enabled, there is an overrun.

Is this correct?
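
In code terms, this is the receive path I have in mind (C pseudocode;
RX_RING_SIZE, rx_desc, handle_packet etc. are made-up names, not the
identifiers in ethernet.c):

  #define RX_RING_SIZE 64          /* assumed - this is my question below */

  struct rx_desc {
      volatile int dma_enabled;    /* slot may receive a packet when set */
      char *buf;                   /* pre-programmed address in main memory */
      int len;
  };

  static struct rx_desc rx_ring[RX_RING_SIZE];
  static int rx_next;              /* next slot the driver will service */

  static void handle_packet(const char *buf, int len)
  {
      (void)buf; (void)len;        /* hand the packet to the stack (stub) */
  }

  void rx_interrupt(void)
  {
      /* the hardware cleared dma_enabled on each slot it filled */
      while (!rx_ring[rx_next].dma_enabled) {
          handle_packet(rx_ring[rx_next].buf, rx_ring[rx_next].len);

          /* give the slot back; until this happens, a packet arriving
           * for this slot has nowhere to go -> overrun */
          rx_ring[rx_next].dma_enabled = 1;
          rx_next = (rx_next + 1) % RX_RING_SIZE;
      }
  }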
 
- In that case, what is the number of slots in the DMA ring on the card?
(Mikael had mentioned that there are 64 OS buffers, but presumably the
card has fewer - if I configure the gap in tcp_perf's UDP send routine to
anything more than 25 packets, there are overruns...)

- In the latest version of ethernet.c, there is the RX_COPYBREAK
optimization Mikael mentioned - the "double buffering", right? Is there
any data on how this threshold (currently 256 bytes) was arrived at,
and whether it might be better to do double buffering for all packet
sizes?
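
For what it's worth, my reading of the copybreak idea is roughly the
following (a sketch under my own assumptions - the function name and the
1536-byte ring buffer size are mine, not the actual ethernet.c code):

  #include <stdlib.h>
  #include <string.h>

  #define RX_COPYBREAK 256          /* the threshold in question */

  /* Returns the buffer to hand up the stack.  A small packet is copied
   * into a right-sized buffer so the full-size DMA buffer can be
   * re-armed immediately; for a large packet the copy would cost more
   * than allocating a fresh ring buffer, so the DMA buffer itself is
   * handed upstream and replaced. */
  char *rx_pick_buffer(char **slot_buf, int len)
  {
      if (len < RX_COPYBREAK) {
          char *copy = malloc(len);
          if (copy)
              memcpy(copy, *slot_buf, len);
          return copy;              /* *slot_buf stays in the ring */
      } else {
          char *full = *slot_buf;
          *slot_buf = malloc(1536); /* re-arm the slot with a new buffer */
          return full;
      }
  }

If that is right, copying for all packet sizes would trade one memcpy per
packet against one allocation per packet, which is presumably why a
threshold exists at all.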

Thanks, 
akshay




On Sat, 24 Aug 2002, Zoran Tomicic wrote:

> Hi,
> 
> I also had some concerns regarding the network performance, so I made
> a simple network/memory benchmark utility, tcp_perf.c (you can find it in
> devboard_lx/tools/src/tcptest). It works both on Linux and Windows.
> 
> Here are results for devboard:
> 
> Test 1 (devboard TCP server, Linux PC client)
>  ==============================================
>  Devboard:
>  [root@AxisZoka /]49# tcp_perf -s 5000 1000000
>  Waiting for client connection
>  Accepted client connection
>  Time to receive 100000000 bytes was 30160 msec (3315600 bytes/sec)
>  3300 recv() calls made, average CPU load during transfer 98% (S:98% U:0%)
> 
>  Linux PC:
>  [zoka@xxxxxxx.99  5000 10000000 1000000
>  Connecting to 192.168.0.99:5000...
>  Connected to 192.168.0.99!
>  10000000 bytes sent to 192.168.0.99  10 send() calls made
> 
>  Test 2 (devboard UDP server, Linux PC client)
>  ==============================================
>  Devboard:
>  [root@AxisZoka /]49# tcp_perf -us 5000
>  Time to receive 10000000 bytes was 3220 msec (3105500 bytes/sec)
>  7143 packets received, average CPU load during transfer 96% (S:94% U:2%)
> 
>  Linux PC:
>  [zoka@xxxxxxx.99  5000 10000000 1400
>  10000000 bytes sent to 192.168.0.99 in 7143 packets
> 
>  Note that the UDP server has to be started before the client; otherwise the
>  handshake mechanism will fail.
> 
> 
> Johan Adolfsson did some tests for me on 16-bit wide SDRAM:
> ===========================================================
> AxisBoard(16bit-SDRAM):
> Time to receive 10000000 bytes was 2200 msec (4545400 bytes/sec)
> 336 recv() calls made, average CPU load during transfer 77% (S:77% U:0%)
> 
> 
> Regards, Zoran
> 
> 
> 
> 
> 
> > -----Original Message-----
> > From: owner-dev-etrax@xxxxxxx.com [mailto:owner-dev-etrax@xxxxxxx.com] On
> > Behalf Of Akshay Adhikari
> > Sent: Saturday, 24 August 2002 02:15
> > To: Mikael Starvik
> > Cc: dev-etrax@xxxxxxx.com
> > Subject: RE: network throughput
> >
> >
> > Hi,
> >
> > Thanks for the help... however, my numbers using netperf are inconsistent
> > with yours, and there are some other problems/questions I have:
> >
> > I am using kernel 2.4.14.
> >
> > - with netperf, TCP achieved at most 12-15 Mbits/s (when the
> > receiver was a devboard, and the sender was either a devboard or a PIII).
> >
> > - with netperf, sending UDP packets of sizes in the range 100-500
> > bytes, spaced about 1 ms apart, causes ethernet overruns very quickly on the
> > receiver box. For example, sending 250 byte UDP packets, one every 1 ms
> > from one devboard to another for a period of 20 seconds gave these
> > results:
> >
> > packets sent: 19187
> > packets rcvd: 3412
> >
> > all dropped packets were overruns on the receiver box.
> >
> > This behavior is also inconsistent - it happens across a wide
> > range of packet sizes and inter-packet spacings.
> >
> > When this happens, the box sometimes freezes: here is part of a message
> > I get from the kernel:
> > Unable to handle kernel access at virtual address 20000000
> > Oops: 0000
> > IRP: 60068bd8 SRP: 600689be DCCR: 000004a0 USP: 4ffffd84 MOF: 00000000
> > r0: 00000200 r1: b00
> >
> > Another strange thing: after the box freezes, if I reboot it by pressing
> > the reboot button, it negotiates half-duplex mode with the switch it is
> > connected to. On the other hand, if I reboot it by _powering off_, it
> > negotiates full duplex.
> >
> > any idea why this might happen?
> >
> > - Since the achieved throughput is surprisingly low, I'm trying to locate
> > possible bottlenecks - card-to-OS DMA transfer because of slow memory
> > access time, or simply the overhead of calling recv() repeatedly from
> > user space, etc.
> >
> > Any suggestions for improving the performance through OS tweaks alone,
> > rather than hardware changes, would be great!
> >
> > Thanks!
> > akshay
> >
> >
> >
> > On Fri, 23 Aug 2002, Mikael Starvik wrote:
> >
> > > Hi,
> > >
> > > If I remember it correctly the performance with a developer
> > > board LX is approximately 4 MBytes/s for receive and 3 MByte/s
> > > for transmit (with the latest kernel). With other memory
> > > configurations you can get 5 MByte/s (receive) and 4 MByte/s
> > > (transmit).
> > >
> > > >- data bus width (32 bit??)
> > >
> > > On the developer board LX the data bus width is 16 bits.
> > > With 32 bits and SDRAMs you can get higher performance.
> > >
> > > >- data bus speed
> > > 50 MHz
> > >
> > > >- dma setup latency
> > > I don't know the typical latency. It depends a lot on what
> > > else is going on in the system (although the ethernet DMAs
> > > have very high priority)
> > >
> > > >- size of network rx buffers (32KB??)
> > > 64 full size packets
> > >
> > > >- maybe typical interrupt latency....
> > > I don't know.
> > >
> > > /Mikael
> > >
> > >
> > > -----Original Message-----
> > > From: owner-dev-etrax@xxxxxxx.com [mailto:owner-dev-etrax@xxxxxxx.com] On
> > > Behalf Of Akshay Adhikari
> > > Sent: Thursday, August 22, 2002 3:55 PM
> > > To: dev-etrax
> > > Subject: network throughput
> > >
> > >
> > > hello,
> > >
> > > I would like to find out the max achievable network throughput on a
> > > devboard_lx (for example, in experiments with netperf, I found that TCP
> > > throughput was at most 15 Mbps, and sending UDP packets back to back
> > > caused receiver overruns pretty fast...)
> > >
> > > I was trying to find the specifications for the devboard_lx to use for
> > > calculations of max. achievable data throughput; it would be useful to
> > > know:
> > >
> > > - data bus width (32 bit??)
> > > - data bus speed
> > > - dma setup latency
> > > - size of network rx buffers (32KB??)
> > > - maybe typical interrupt latency....
> > >
> > > does anyone have these numbers or know how/where to obtain them?
> > >
> > > TIA,
> > > akshay
> > >
> >
> >