          Our ‘xmit1000.c’ driver

Implementing a ‘packet-transmit’
 capability with the Intel 82573L
   network interface controller
   Remember ‘echo’ and ‘cat’?
• Your device-driver module (named ‘uart.c’)
  was supposed to allow two programs that
  are running on a pair of adjacent PCs to
  communicate via a “null-modem” cable
         Transmitting…          Receiving…


    $ echo Hello > /dev/uart
                               $ cat /dev/uart
             $_
                                   Hello _
             ‘keep it simple’
• Let’s try to implement a ‘write()’ routine for
  our Intel Pro/1000 ethernet controllers that
  will provide the same basic functionality as
  we achieved with our serial UART driver
• It should allow us to transmit a message
  by using the familiar UNIX ‘cat’ command
  to redirect output to a character device-file
• Our device-file will be named ‘/dev/nic’
               Driver’s components
• my_fops: this ‘struct’ holds one function-pointer (write)
• my_write(): This function will program the actual data-transfer
• my_get_info(): This function will allow us to inspect the
  transmit-descriptors
• module_init(): This function will detect and configure the hardware,
  define page-mappings, allocate and initialize the descriptors, start
  the ‘transmit’ engine, create the pseudo-file and register ‘my_fops’
• module_exit(): This function will do needed ‘cleanup’ when it’s time
  to unload our driver: turn off the ‘transmit’ engine, free the memory,
  delete page-table entries, the pseudo-file, and the ‘my_fops’
                      kzalloc()
• Linux kernels since 2.6.13 offer this convenient
  function for allocating pre-zeroed kernel memory
• It has the same syntax as the ‘kmalloc()’ function
  (described in our texts), but adds the after-effect
  of zeroing out the newly-allocated memory-area
         void *kmem = kmalloc( region_size, GFP_KERNEL );
         memset( kmem, 0x00, region_size );
                 /* can be replaced with */
         void *kmem = kzalloc( region_size, GFP_KERNEL );

• Thus it does two logically distinct actions (often
  coupled anyway) within a single function-call
  Single page-frame option

One 4KB page-frame holds both regions:

                Packet-Buffer (3-KB)
        (reused for successive transmissions)

               Descriptor-Buffer (1-KB)
   (room for up to 64 descriptors of 16 bytes each)
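
The split of one page-frame can be sketched in user-space C as plain offset arithmetic. This is only an illustration: the constant and function names are hypothetical, and it assumes the packet-buffer sits at offset 0 with the descriptor-buffer right after it. With 16-byte descriptors, a 1-KB region works out to 1024/16 = 64 entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PAGE_BYTES     = 4096,  /* one page-frame                          */
    PKT_BUF_BYTES  = 3072,  /* 3-KB packet-buffer (assumed at offset 0) */
    DESC_BUF_BYTES = 1024,  /* 1-KB descriptor-buffer following it      */
    DESC_BYTES     = 16     /* size of one transmit-descriptor          */
};

/* Offset of the descriptor-buffer within the page-frame. */
static size_t desc_buf_offset(void)
{
    assert(PKT_BUF_BYTES + DESC_BUF_BYTES == PAGE_BYTES);
    return PKT_BUF_BYTES;
}

/* Number of descriptors that fit in the descriptor-buffer. */
static size_t desc_capacity(void)
{
    return DESC_BUF_BYTES / DESC_BYTES;
}

/* Address of descriptor 'i', given the page-frame's base address. */
static void *desc_address(void *page_base, size_t i)
{
    assert(i < desc_capacity());
    return (uint8_t *)page_base + desc_buf_offset() + i * DESC_BYTES;
}
```

Because 3072 is a multiple of 16, the descriptor array in this layout starts on a 16-byte boundary.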
            Our Tx-Descriptor ring
After writing the data into our packet-buffer, and writing its length to
the current TAIL descriptor, our driver will advance the TAIL index; the
NIC responds by reading the current HEAD descriptor, fetching its data,
then advancing the HEAD index as it sends our data out over the wire.

   One packet-buffer: our ‘reusable’ transmit-buffer (1536 bytes)

   Array of 8 transmit-descriptors:
       descriptor 0
       descriptor 1
       descriptor 2
       descriptor 3
       descriptor 4
       descriptor 5
       descriptor 6
       descriptor 7

   TAIL and HEAD each hold the index of one descriptor in this array.
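
The HEAD/TAIL hand-off can be simulated in user-space C. This is a sketch under stated assumptions, not the driver's code: the struct and function names are hypothetical, the NIC is played by an ordinary function, and the ring is treated as empty when HEAD equals TAIL (so one slot always stays unused).

```c
#include <assert.h>
#include <stdint.h>

#define N_DESC 8              /* ring size used in the diagram */

struct tx_ring {
    uint16_t length[N_DESC];  /* stand-in for each descriptor's length field */
    unsigned head;            /* advanced by the (simulated) NIC             */
    unsigned tail;            /* advanced by the driver                      */
};

/* Driver side: fill the TAIL descriptor, then advance TAIL with wraparound.
 * Returns 0 on success, -1 if advancing would make TAIL collide with HEAD. */
static int ring_post(struct tx_ring *r, uint16_t len)
{
    assert(r->head < N_DESC && r->tail < N_DESC);
    unsigned next = (r->tail + 1) % N_DESC;
    if (next == r->head)
        return -1;            /* ring is full */
    r->length[r->tail] = len;
    r->tail = next;
    return 0;
}

/* NIC side: consume the HEAD descriptor if one is pending.
 * Returns the length "sent", or -1 if HEAD has caught up with TAIL. */
static int ring_consume(struct tx_ring *r)
{
    if (r->head == r->tail)
        return -1;            /* nothing is waiting for the hardware */
    int len = r->length[r->head];
    r->head = (r->head + 1) % N_DESC;
    return len;
}
```

Calling ring_post() once and then ring_consume() once leaves both indices at 1, which mirrors a single packet passing through the ring.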
            ‘/proc/xmit1000’
• This pseudo-file can be examined anytime
  to find out what values (if any) the NIC has
  ‘written back’ into the transmit-descriptors
  (i.e., the descriptor-status information) and
  current values in registers TDH and TDT:
             $ cat /proc/xmit1000
       Direct Memory Access
• The NIC is able to ‘fetch’ descriptors from
  the host-system’s memory (and also can read
  the data from our packet-buffer) as well as
  ‘store’ a status-report back into the host’s
  memory by temporarily becoming the Bus
  Master (taking control of the system-bus
  away from the CPU so that it can perform
  the ‘fetch’ and ‘store’ operations directly,
  without CPU involvement or interference)
 Configuration registers
 CTRL      Device Control

CTRL_EXT   Extended Device Control

  TIPG     Transmit Inter-Packet Gap

  TCTL     Transmit Control

 TDBAL     Transmit Descriptor-queue Base-Address (LOW)

 TDBAH     Transmit Descriptor-queue Base-Address (HIGH)

 TDLEN     Transmit Descriptor-queue Length

  TDH      Transmit Descriptor-queue HEAD

  TDT      Transmit Descriptor-queue TAIL

TXDCTL     Transmit Descriptor-queue Control
     The ‘initialization’ sequence
•   Detect the network interface controller
•   Obtain its i/o-memory address and size
•   Remap the i/o-memory into kernel-space
•   Allocate memory for buffer and descriptors
•   Initialize the array of transmit-descriptors
•   Reset the NIC and configure its operations
•   Create the ‘/proc/xmit1000’ pseudo-file
•   Register our ‘write()’ driver-method
         The ‘cleanup’ sequence
• Usually the steps here follow those in the
  initialization sequence -- but in backwards
  order:
     •   Unregister the device-driver’s file-operations
     •   Delete the ‘/proc/xmit1000’ pseudo-file
     •   Disable the NIC’s ‘transmit’ engine
     •   Release the allocated kernel-memory
     •   Unmap the NIC’s i/o-memory region
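
The 'backwards order' rule is often written in C with a chain of goto labels, so that a failure part-way through setup undoes only the steps that already succeeded. The sketch below is a user-space illustration with hypothetical stand-in steps (M = map i/o-memory, A = allocate memory, P = create pseudo-file, R = register file-operations; a lowercase letter marks the matching undo). It records a trace instead of touching hardware.

```c
#include <assert.h>
#include <string.h>

/* Trace of actions, one letter per step, so the ordering can be inspected. */
static char trace[32];

static void note(char c)
{
    size_t n = strlen(trace);
    assert(n + 1 < sizeof trace);
    trace[n] = c;
    trace[n + 1] = '\0';
}

/* Hypothetical setup step: fails (without running) if it matches 'fail_at'. */
static int step(char c, char fail_at)
{
    if (c == fail_at)
        return -1;
    note(c);
    return 0;
}

/* Setup in forward order; on failure, undo completed steps in reverse. */
static int demo_init(char fail_at)
{
    if (step('M', fail_at)) goto fail_map;
    if (step('A', fail_at)) goto fail_alloc;
    if (step('P', fail_at)) goto fail_proc;
    if (step('R', fail_at)) goto fail_reg;
    return 0;

fail_reg:   note('p');      /* delete the pseudo-file  */
fail_proc:  note('a');      /* release the memory      */
fail_alloc: note('m');      /* unmap the i/o-memory    */
fail_map:   return -1;
}

/* Full cleanup: every setup step undone, last one first. */
static void demo_exit(void)
{
    note('r');
    note('p');
    note('a');
    note('m');
}
```

A successful demo_init(0) followed by demo_exit() leaves the trace "MAPRrpam": the undo letters are the setup letters reversed.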
       Our ‘write()’ algorithm
• Get index of the current TAIL descriptor
• Confine the amount of user-data
• Copy user-data into the packet-buffer
• Set up the packet’s Ethernet Header
• Set up packet-length in the TAIL descriptor
• Now hand over this descriptor to the NIC
  (by advancing the value in register TDT)
• Tell the kernel how many bytes were sent
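
The steps listed above can be walked through in a user-space simulation. Everything here is an assumption made for illustration: the 'sim_nic' struct stands in for the real device, plain memcpy() replaces the kernel's copy from user-space, a struct member replaces the TDT register, and the caller supplies the header fields. Only the 14-byte Ethernet header layout (destination, source, type) is standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_DESC      8
#define PKT_BUF_LEN 1536   /* size of the single reusable packet-buffer */
#define ETH_HDR_LEN 14     /* dst MAC (6) + src MAC (6) + type (2)      */

struct sim_nic {
    uint8_t  pkt_buf[PKT_BUF_LEN];
    uint16_t desc_len[N_DESC];  /* stand-in for each descriptor's length */
    unsigned tdt;               /* stand-in for the TDT register         */
};

static long sim_write(struct sim_nic *nic, const void *data, size_t len,
                      const uint8_t dst[6], const uint8_t src[6],
                      uint16_t type)
{
    unsigned tail = nic->tdt;                      /* current TAIL index  */
    assert(tail < N_DESC);

    if (len > PKT_BUF_LEN - ETH_HDR_LEN)           /* confine the amount  */
        len = PKT_BUF_LEN - ETH_HDR_LEN;

    memcpy(nic->pkt_buf + ETH_HDR_LEN, data, len); /* copy the user-data  */

    memcpy(nic->pkt_buf, dst, 6);                  /* Ethernet header     */
    memcpy(nic->pkt_buf + 6, src, 6);
    nic->pkt_buf[12] = (uint8_t)(type >> 8);       /* type, high byte first */
    nic->pkt_buf[13] = (uint8_t)(type & 0xFF);

    nic->desc_len[tail] = (uint16_t)(ETH_HDR_LEN + len); /* packet-length */

    nic->tdt = (tail + 1) % N_DESC;                /* hand over to the NIC */

    return (long)len;                              /* bytes accepted      */
}
```

The return value counts only the caller's bytes, not the header, since that is what a 'write()' caller expects to see.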
     Recall Tx-Descriptor Layout
 Offset   Contents (bits 31..0)

 0x0      Buffer-Address low (bits 31..0)
 0x4      Buffer-Address high (bits 63..32)
 0x8      CMD (31..24) | CSO (23..16) | Packet Length in bytes (15..0)
 0xC      special (31..16) | CSS (15..8) | reserved =0 and status (7..0)

Buffer-Address = the packet-buffer’s 64-bit address in physical memory
Packet-Length = number of bytes in the data-packet to be transmitted
CMD = Command-field       CSO/CSS = Checksum Offset/Start (in bytes)
STA = Status-field
        Suggested C syntax
typedef struct {
           unsigned long long   base_addr;
           unsigned short       pkt_length;
           unsigned char        cksum_off;
           unsigned char        desc_cmd;
           unsigned char        desc_stat;
           unsigned char        cksum_org;
           unsigned short       special;
           } TX_DESCRIPTOR;
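
One way to check that a struct like this matches the 16-byte hardware layout is to test its size and member offsets. The sketch below is a user-space variant, not the driver's declaration: it swaps in fixed-width types from <stdint.h>, and the names 'TX_DESCRIPTOR_FIXED', 'tx_descriptor_layout_ok' and 'tx_descriptor_set' are hypothetical. Matching the byte offsets to the bit positions in the layout diagram assumes a little-endian host such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t base_addr;   /* offset 0x0: 64-bit buffer address  */
    uint16_t pkt_length;  /* offset 0x8: packet length in bytes */
    uint8_t  cksum_off;   /* offset 0xA: CSO                    */
    uint8_t  desc_cmd;    /* offset 0xB: CMD                    */
    uint8_t  desc_stat;   /* offset 0xC: status                 */
    uint8_t  cksum_org;   /* offset 0xD: CSS                    */
    uint16_t special;     /* offset 0xE: special                */
} TX_DESCRIPTOR_FIXED;

/* Returns 1 if the compiler laid the struct out at the expected offsets. */
static int tx_descriptor_layout_ok(void)
{
    return sizeof(TX_DESCRIPTOR_FIXED) == 16
        && offsetof(TX_DESCRIPTOR_FIXED, base_addr)  == 0x0
        && offsetof(TX_DESCRIPTOR_FIXED, pkt_length) == 0x8
        && offsetof(TX_DESCRIPTOR_FIXED, cksum_off)  == 0xA
        && offsetof(TX_DESCRIPTOR_FIXED, desc_cmd)   == 0xB
        && offsetof(TX_DESCRIPTOR_FIXED, desc_stat)  == 0xC
        && offsetof(TX_DESCRIPTOR_FIXED, cksum_org)  == 0xD
        && offsetof(TX_DESCRIPTOR_FIXED, special)    == 0xE;
}

/* Zero a descriptor, then record the buffer address and packet length. */
static void tx_descriptor_set(TX_DESCRIPTOR_FIXED *d, uint64_t phys,
                              uint16_t len)
{
    assert(d != NULL);
    memset(d, 0, sizeof *d);
    d->base_addr  = phys;
    d->pkt_length = len;
}
```

The members are ordered so that each falls on its natural alignment, which is why no packing attribute is needed to reach 16 bytes.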
          Transmit IPG (0x0410)
                            IPG = Inter-Packet Gap


  Bits      Field
  31..30    R (reserved, =0)
  29..20    IPG After Deferral   (Recommended value = 7)
  19..10    IPG Part 1           (Recommended value = 8)
   9..0     IPG Back-To-Back     (Recommended value = 8)




This register controls the Inter-Packet Gap timer for the Ethernet controller.

Note that the recommended TIPG register-value to achieve IEEE 802.3
compliant minimum transfer IPG values in full- and half-duplex operations
would be 00702008 (hexadecimal), equal to (7<<20) | (8<<10) | (8<<0).
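
The arithmetic behind that value can be checked with a small helper. The function name 'make_tipg' is hypothetical; the shifts are the ones quoted above, and each field is taken to be 10 bits wide because the fields start at bits 0, 10 and 20.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the three Inter-Packet Gap fields into one TIPG register value. */
static uint32_t make_tipg(uint32_t back_to_back, uint32_t part1,
                          uint32_t after_deferral)
{
    /* each field must fit in its 10-bit slot */
    assert(back_to_back < 1024 && part1 < 1024 && after_deferral < 1024);
    return (after_deferral << 20) | (part1 << 10) | (back_to_back << 0);
}
```

With the recommended values, make_tipg(8, 8, 7) gives 0x00700000 | 0x00002000 | 0x00000008 = 0x00702008.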




                                                                                 82573L
         Transmit Control (0x0400)
  Bits      Field      Meaning
  31..29    R          reserved (=0)
  28        MULR       Multiple Request Support
  27..26    TXCSCMT    TxDescriptor Minimum Threshold
  25        UNORTX     Underrun No Re-Transmit
  24        RTLC       Retransmit on Late Collision
  23        R          reserved (=0)
  22        SWXOFF     Software XOFF Transmission
  21..12    COLD       Collision Distance (=0x3F)
  11..4     CT         Collision Threshold (=0xF)
  3         PSP        Pad Short Packets
  2         R          reserved (=0)
  1         EN         Transmit Enable
  0         R          reserved (=0)

              Our driver’s elections
Here’s a C programming style that ‘documents’ the programmer’s choices.

        int      tx_control = 0;

        tx_control |= (0<<1);      // EN-bit (Enable Transmit Engine)
        tx_control |= (1<<3);      // PSP-bit (Pad Short Packets)
        tx_control |= (15<<4);     // CT=15 (Collision Threshold)
        tx_control |= (63<<12);    // COLD=63 (Collision Distance)
        tx_control |= (0<<22);     // SWXOFF-bit (Software XOFF Tx)
        tx_control |= (1<<24);     // RTLC-bit (Re-Transmit on Late Collision)
        tx_control |= (0<<25);     // UNORTX-bit (Underrun No Re-Transmit)
        tx_control |= (0<<26);     // TXCSCMT=0 (Tx-descriptor Min Threshold)
        tx_control |= (0<<28);     // MULR-bit (Multiple Request Support)

        iowrite32( tx_control, io + E1000_TCTL );   // Transmit Control register
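
The same choices can be expressed with named field macros and checked in user-space. This is a sketch: the macro and function names are hypothetical (they are not the kernel's own definitions), the bit positions are the ones used in the listing above, and the 'enable' flag is an assumption added so that the EN-bit can be shown both clear and set.

```c
#include <assert.h>
#include <stdint.h>

#define TCTL_EN        (1u << 1)               /* Transmit Enable           */
#define TCTL_PSP       (1u << 3)               /* Pad Short Packets         */
#define TCTL_CT(n)     ((uint32_t)(n) << 4)    /* Collision Threshold       */
#define TCTL_COLD(n)   ((uint32_t)(n) << 12)   /* Collision Distance        */
#define TCTL_RTLC      (1u << 24)              /* Re-Transmit on Late Coll. */
#define TCTL_RESERVED  0xE0800005u             /* bits 31..29, 23, 2, 0     */

/* Decode the two multi-bit fields from a register value. */
static unsigned tctl_ct(uint32_t v)   { return (v >> 4) & 0xFF; }
static unsigned tctl_cold(uint32_t v) { return (v >> 12) & 0x3FF; }

/* Build the Transmit Control value with CT=15 and COLD=63. */
static uint32_t make_tctl(int enable)
{
    uint32_t v = TCTL_PSP | TCTL_CT(15) | TCTL_COLD(63) | TCTL_RTLC;
    if (enable)
        v |= TCTL_EN;
    assert((v & TCTL_RESERVED) == 0);   /* reserved bits must stay zero */
    return v;
}
```

With the EN-bit left at zero the value is 0x0103F0F8; setting it gives 0x0103F0FA.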

       An ‘e1000.c’ anomaly?
• The official Linux kernel is delivered with a
  device-driver supporting several of Intel’s
  ‘Pro/1000’ gigabit ethernet controllers
• Often this driver will get loaded by default
  during the system’s startup procedures
• But it will interfere with your own driver if
  you try to write a substitute for ‘e1000.ko’
• So you will want to remove it with ‘rmmod’
       Side-effect of ‘rmmod’
• We’ve observed an unexpected side-effect
  of ‘unloading’ the ‘e1000.ko’ device-driver
• The PCI Configuration Space’s command
  register gets modified in a way that keeps
  the NIC from working with your own driver
• Specifically, the Bus Mastering capability
  gets disabled (by clearing bit #2 in the PCI
  Configuration Space’s word at address 4)
            What to do about it?
• This effect doesn’t arise on our ‘anchor’
  cluster machines, but you may encounter
  it when you try using our demo elsewhere
• Here’s the simple “fix” to turn Bus Master
  capability back on (in your ‘module_init()’)
      u16     pci_cmd;          // declares a 16-bit variable

      pci_read_config_word( devp, 4, &pci_cmd ); // read current word
      pci_cmd |= (1<<2);        // turn on the Bus Master enabled-bit
      pci_write_config_word( devp, 4, pci_cmd ); // write modification
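
The bit manipulation in that fix can be tried out in user-space on an ordinary 16-bit value. The helper names below are hypothetical and no PCI access takes place. Inside a real driver, the kernel's pci_set_master() is another way to turn on the same bit.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CMD_BUS_MASTER (1u << 2)   /* bit #2 of the PCI command word */

/* Returns nonzero if the Bus Master bit is set in a command-word value. */
static int bus_master_enabled(uint16_t pci_cmd)
{
    return (pci_cmd & PCI_CMD_BUS_MASTER) != 0;
}

/* Returns the command-word value with the Bus Master bit turned on;
 * all other bits are left as they were. */
static uint16_t with_bus_master(uint16_t pci_cmd)
{
    uint16_t fixed = (uint16_t)(pci_cmd | PCI_CMD_BUS_MASTER);
    assert(bus_master_enabled(fixed));
    return fixed;
}
```

Because the fix uses OR rather than assignment, applying it to a word that already has the bit set changes nothing.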
                  In-class demo
• We demonstrate our ‘xmit1000.c’ driver on
  an ‘anchor’ machine, with some help from
  a companion-module (named ‘recv1000.c’)
  which is soon-to-be discussed in class
         Transmitting…                    Receiving…


   $ echo Hello > /dev/nic
                                   $ cat /dev/nic
   $_
                                   Hello _



           anchor01                        anchor05
                             LAN
          In-class exercise
• Open three or more terminal-windows on
  your PC’s graphical desktop, and log in to
  a different ‘anchor’ machine in each one
• Install the ‘xmit1000.ko’ module on one of
  the anchor machines, and then install our
  ‘recv1000.ko’ module on the other stations
• Execute the ‘cat /dev/nic’ command on the
  receiver-stations, and then run an ‘echo’
  command on the transmitter-station

				