Network Device Drivers
• Every network transaction goes through an interface (device)
• Interfaces are not accessed as files; they are referred to by name, e.g. eth0
• An interface can be hardware, e.g. eth0, or pure software, e.g. lo (loopback)
• The network interface is in charge of sending and receiving packets
Chenna Reddy & Shaik Yunus -
Wipro Technologies
Configuring interfaces
The “ifconfig” program configures interface devices for use
>ifconfig $(DEVICE) $(IPADDR) netmask $(NMASK) broadcast $(BCAST)
>ifconfig $(DEVICE) up
>ifconfig $(DEVICE) down
Configuring Routes
The “route” program adds routes through an interface
to the forwarding information base (FIB)
>route add -net $(NETWORK) netmask $(NMASK) dev $(DEVICE)
>route add -host $(IPADDR) $(DEVICE)
Module Loading/Unloading
A network interface registers itself with the kernel through
a data structure, struct net_device
int register_netdev(struct net_device *netdev)
Invokes the init function of the device and adds the structure to
a global list of network devices (pointed to by dev_base)
void unregister_netdev(struct net_device *netdev)
Does the necessary cleanup and removes the structure from the global list.
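The registration steps can be sketched in user space. This is only a model, not kernel code: struct model_netdev, model_dev_base and the model_* functions are hypothetical stand-ins for struct net_device, dev_base and register_netdev/unregister_netdev, and the duplicate-name check is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: a cut-down stand-in for struct net_device. */
struct model_netdev {
    char name[16];                          /* IFNAMSIZ is 16 on Linux */
    int (*init)(struct model_netdev *dev);  /* driver initialization routine */
    struct model_netdev *next;              /* next device in the global list */
};

/* Head of the global device list, playing the role of dev_base. */
static struct model_netdev *model_dev_base = NULL;

/* Check the name, run the init routine, then link the device at the tail.
 * Returns 0 on success and -1 on a duplicate name or a failed init. */
static int model_register_netdev(struct model_netdev *dev)
{
    struct model_netdev **pp;

    if (dev == NULL || dev->init == NULL)
        return -1;
    for (pp = &model_dev_base; *pp != NULL; pp = &(*pp)->next)
        if (strcmp((*pp)->name, dev->name) == 0)
            return -1;              /* name already registered */
    if (dev->init(dev) != 0)
        return -1;                  /* driver refused to initialize */
    dev->next = NULL;
    *pp = dev;                      /* pp points at the tail link here */
    return 0;
}

/* Unlink the device from the global list; a no-op if it is not listed. */
static void model_unregister_netdev(struct model_netdev *dev)
{
    struct model_netdev **pp;

    assert(dev != NULL);
    for (pp = &model_dev_base; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == dev) {
            *pp = dev->next;
            dev->next = NULL;
            return;
        }
    }
}

/* Trivial init routine used by the example devices. */
static int model_init_ok(struct model_netdev *dev)
{
    (void)dev;
    return 0;
}
```

Registering "lo" and then "eth0" leaves model_dev_base pointing at lo, with lo.next pointing at eth0.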
net_device structure
Visible and non-visible parts.
Visible parts -
name of the device : name[IFNAMSIZ]
I/O specific fields : mem_end, mem_start, base_addr, irq, if_port, dma
*next : Pointer to the next device in the global list
int (*init)() : Initialization routine for the driver
Non-visible parts -
mtu : Maximum Transmission Unit
tx_queue_len : Maximum number of frames that can be queued on the device
hard_header_len : Hardware header length
dev_addr : Hardware (MAC) address of the device
addr_len : Length of the hardware address
void *priv : Private data
net_device structure
[Figure: dev_base, in kernel space, points to the global device list, drawn with two struct net_device entries named lo and eth0. Each entry holds name, qdisc, open, close, init, hard_start_xmit and next; the entries are chained through next.]
Device methods
Fundamental methods:
int (*open)() - opens the interface (ifconfig dev up)
int (*stop)() - stops the interface (ifconfig dev down)
int (*hard_start_xmit)(struct sk_buff *, struct net_device *) - initiates transmission
int (*hard_header)() - builds the hardware header
struct net_device_stats *(*get_stats)() - gets statistics of the interface
int (*set_config)() - changes the interface configuration
Device methods ...
Optional methods:
int (*do_ioctl) () - perform interface specific ioctl commands
int (*change_mtu)() - change mtu for the interface
int (*set_mac_address)() - to change interface h/w address
Socket Buffers
Network drivers handle packets in data buffers of type struct sk_buff
struct net_device *dev, *rx_dev - devices sending and receiving the buffer
union {} h - Transport layer header
union {} nh - Network layer header
union {} mac - MAC layer header
unsigned char *head, *data, *tail, *end - pointers delimiting the buffer and the data in it
unsigned long len - Length of the data in the buffer
struct sock *sk - socket that owns the buffer
struct timeval stamp - time at which the packet arrived
struct sk_buff *next, *prev - pointers to next and previous packets
char cb[48] - control buffer
Sk_buff methods
struct sk_buff *alloc_skb(int len, int prio), *dev_alloc_skb(int len)
- allocate a buffer, prio = GFP_ATOMIC or GFP_KERNEL
void kfree_skb(), dev_kfree_skb() - free the buffer
unsigned char *skb_put(struct sk_buff *, int len) - add data to the end of the buffer
unsigned char *skb_push(struct sk_buff *, int len) - add data to the start of the packet
int skb_tailroom(struct sk_buff *skb) - returns the space available after the data
int skb_headroom(struct sk_buff *skb) - returns the space available in front of the data
unsigned char *skb_pull(struct sk_buff *, int len) - removes data from the start of the packet
Length of a “single” skb, i.e. skb->len = skb->tail - skb->data
Size of an skb = skb->end - skb->head
headroom = skb->data - skb->head
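The pointer arithmetic behind these calls can be sketched in user space. This is a model under stated assumptions, not the kernel implementation: struct model_skb and the model_* functions are hypothetical, the reserve argument stands in for the headroom a driver would set aside, and the model returns NULL where it runs out of room. Headroom is taken as data - head and tailroom as end - tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the sk_buff data pointers; not the kernel structure. */
struct model_skb {
    unsigned char *head;   /* start of the allocated buffer */
    unsigned char *data;   /* start of the valid data */
    unsigned char *tail;   /* end of the valid data */
    unsigned char *end;    /* end of the allocated buffer */
    size_t len;            /* always tail - data */
};

/* Allocate size bytes and leave reserve bytes of headroom. */
static struct model_skb *model_alloc(size_t size, size_t reserve)
{
    struct model_skb *skb;

    if (size == 0 || reserve > size)
        return NULL;
    skb = malloc(sizeof(*skb));
    if (skb == NULL)
        return NULL;
    skb->head = malloc(size);
    if (skb->head == NULL) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head + reserve;
    skb->end = skb->head + size;
    skb->len = 0;
    return skb;
}

static void model_free(struct model_skb *skb)
{
    if (skb != NULL) {
        free(skb->head);
        free(skb);
    }
}

static size_t model_headroom(const struct model_skb *skb)
{
    assert(skb->head <= skb->data);
    return (size_t)(skb->data - skb->head);
}

static size_t model_tailroom(const struct model_skb *skb)
{
    assert(skb->tail <= skb->end);
    return (size_t)(skb->end - skb->tail);
}

/* Like skb_put: grow the data area at the tail; returns the old tail. */
static unsigned char *model_put(struct model_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;

    if (n > model_tailroom(skb))
        return NULL;
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Like skb_push: grow the data area at the front; returns the new data. */
static unsigned char *model_push(struct model_skb *skb, size_t n)
{
    if (n > model_headroom(skb))
        return NULL;
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Like skb_pull: drop n bytes from the front; returns the new data. */
static unsigned char *model_pull(struct model_skb *skb, size_t n)
{
    if (n > skb->len)
        return NULL;
    skb->data += n;
    skb->len -= n;
    return skb->data;
}
```

A typical sequence is model_alloc(64, 16), model_put for the payload, model_push for a 14-byte link header on transmit, and model_pull of the same 14 bytes on receive.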
ioctls
Pass commands and data between user space and kernel space.
Appropriate routines can be invoked from the ioctl function.
int (*do_ioctl)(struct net_device *, struct ifreq *ifr, int cmd)
Each interface can define its own ioctl commands
The socket ioctl implementation reserves 16 ioctls as private to the interface,
SIOCDEVPRIVATE to SIOCDEVPRIVATE+15
More ioctls can effectively be implemented through the ifreq structure’s ifru_data field
(accessed as ifr_data), as roughly shown
struct ifreq {
char name[IFNAMSIZ];
char *data;
};
Packet transmission
• int (*hard_start_xmit)(struct sk_buff *, struct net_device *)
Initiates the transmission of a packet.
• It is invoked by the dev_queue_xmit function, which is
invoked by the higher layers
• The driver should detect transmission timeouts
Packet Reception
• Occurs through an interrupt handler
(unless the interface is pure software)
• The handler allocates an sk_buff buffer
• It assigns the dev and protocol fields and passes the buffer to the link layer (netif_rx function)
• The link layer puts it on the backlog queue and marks it for the next
bottom-half run
• The network bottom half passes the packet to the protocol receive function (IP layer)
• The IP layer either forwards it or sends it up to the transport layer.
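The split between interrupt-time queuing and deferred processing can be sketched in user space. This is a model only: model_netif_rx and model_net_bh are hypothetical stand-ins for netif_rx and the network bottom half, the queue limit of 4 is arbitrary, and the only "protocol handler" is a counter for IPv4.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BACKLOG_MAX 4       /* hypothetical queue limit */
#define MODEL_PROTO_IP    0x0800  /* EtherType for IPv4 */

struct model_pkt {
    unsigned short protocol;  /* set by the driver before hand-off */
    int ifindex;              /* stands in for the dev field */
};

static struct model_pkt model_backlog[MODEL_BACKLOG_MAX];
static size_t model_backlog_head, model_backlog_len;
static int model_bh_pending;            /* "bottom half marked to run" flag */
static unsigned model_ip_delivered, model_dropped;

/* Interrupt-time half: queue the packet and mark the bottom half. */
static int model_netif_rx(struct model_pkt pkt)
{
    size_t slot;

    if (model_backlog_len == MODEL_BACKLOG_MAX) {
        model_dropped++;                /* backlog full: drop */
        return -1;
    }
    slot = (model_backlog_head + model_backlog_len) % MODEL_BACKLOG_MAX;
    model_backlog[slot] = pkt;
    model_backlog_len++;
    model_bh_pending = 1;
    return 0;
}

/* Deferred half: drain the backlog and hand each packet to its protocol. */
static void model_net_bh(void)
{
    while (model_backlog_len > 0) {
        struct model_pkt pkt = model_backlog[model_backlog_head];

        model_backlog_head = (model_backlog_head + 1) % MODEL_BACKLOG_MAX;
        model_backlog_len--;
        if (pkt.protocol == MODEL_PROTO_IP)
            model_ip_delivered++;       /* stands in for the IP receive function */
        else
            model_dropped++;            /* no handler registered in this model */
    }
    model_bh_pending = 0;
}
```

The interrupt-time function does constant work per packet; everything protocol-specific happens later in model_net_bh.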
How to trap a packet ?
• Write a network module and register it
• Store the eth0 net_device structure in a temporary pointer
• Replace the eth0 net_device with your net_device structure
• In your device's hard_start_xmit function, invoke the eth0
hard_start_xmit function.
• Replace the eth0 routing table entries with your device
Packets now go out through your interface, so you can inspect or modify
each packet before calling the actual hard_start_xmit function.
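The interposition idea can be sketched in user space with function pointers. Everything here is a hypothetical model: struct model_netdev stands in for net_device, the route pointer stands in for the routing table entry, and the trap only counts bytes before forwarding to the saved device.

```c
#include <assert.h>
#include <stddef.h>

struct model_skb {
    unsigned char data[64];
    size_t len;
};

struct model_netdev;
typedef int (*model_xmit_fn)(struct model_skb *skb, struct model_netdev *dev);

struct model_netdev {
    const char *name;
    model_xmit_fn hard_start_xmit;
    unsigned long tx_packets;
};

static struct model_netdev *model_real_dev;  /* saved pointer to the real device */
static unsigned long model_trapped_bytes;    /* what the trap has observed */

/* Stand-in for the real driver's transmit routine. */
static int model_eth_xmit(struct model_skb *skb, struct model_netdev *dev)
{
    (void)skb;
    dev->tx_packets++;
    return 0;
}

/* Transmit routine of the trapping device: look at the packet, then
 * forward it to the real device's hard_start_xmit. */
static int model_trap_xmit(struct model_skb *skb, struct model_netdev *dev)
{
    (void)dev;
    model_trapped_bytes += skb->len;
    return model_real_dev->hard_start_xmit(skb, model_real_dev);
}

/* Save the real device, arm the trap device and repoint the output path. */
static void model_install_trap(struct model_netdev *real,
                               struct model_netdev *trap,
                               struct model_netdev **route)
{
    assert(real != NULL && trap != NULL && route != NULL);
    model_real_dev = real;
    trap->hard_start_xmit = model_trap_xmit;
    *route = trap;
}
```

After model_install_trap, a caller that transmits through *route reaches the real device only by way of model_trap_xmit.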
Linux QOS support
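[Figure: packet path through the kernel. Labels: Packet In, Input Demultiplexing, Forwarding, Output Queue, Packet Out, with TCP/UDP above the path and Traffic Control at the output queue.]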
QOS Support ...
• Traffic control consists of queuing disciplines, classes and
filters, which control the packets that are sent out.
• Queuing disciplines are at the heart of Linux traffic control
• Each device has its own queuing discipline.
• Before any packet is transmitted out of the interface, the queuing
discipline comes into action.
Queuing Disciplines
Specify the way packets are
queued at the output interface.
int (*enqueue)(struct sk_buff *, struct Qdisc *)
struct sk_buff *(*dequeue)(struct Qdisc *)
Before the device hard_start_xmit is called, the packet is first
queued using enqueue.
dequeue determines the next packet to be transmitted
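How enqueue and dequeue together decide the transmit order can be sketched with a small priority discipline in user space. This is a model, not the kernel's qdisc code: the three bands, the per-band limit of 8 and all model_* names are assumptions of the sketch. Enqueue files a packet under its band; dequeue always serves the lowest-numbered non-empty band.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BANDS      3   /* hypothetical number of priority bands */
#define MODEL_BAND_LIMIT 8   /* hypothetical per-band queue limit */

struct model_pkt {
    int id;
    int band;   /* 0 is served first */
};

struct model_band {
    struct model_pkt q[MODEL_BAND_LIMIT];
    size_t head, len;
};

struct model_qdisc {
    struct model_band bands[MODEL_BANDS];
};

static void model_qdisc_init(struct model_qdisc *q)
{
    memset(q, 0, sizeof(*q));
}

/* Queue the packet in its band; -1 on a bad band or a full band. */
static int model_enqueue(struct model_qdisc *q, struct model_pkt pkt)
{
    struct model_band *b;

    assert(q != NULL);
    if (pkt.band < 0 || pkt.band >= MODEL_BANDS)
        return -1;
    b = &q->bands[pkt.band];
    if (b->len == MODEL_BAND_LIMIT)
        return -1;
    b->q[(b->head + b->len) % MODEL_BAND_LIMIT] = pkt;
    b->len++;
    return 0;
}

/* Pick the next packet to transmit; -1 when every band is empty. */
static int model_dequeue(struct model_qdisc *q, struct model_pkt *out)
{
    int i;

    assert(q != NULL && out != NULL);
    for (i = 0; i < MODEL_BANDS; i++) {
        struct model_band *b = &q->bands[i];

        if (b->len > 0) {
            *out = b->q[b->head];
            b->head = (b->head + 1) % MODEL_BAND_LIMIT;
            b->len--;
            return 0;
        }
    }
    return -1;
}
```

Within a band the order is FIFO, so the arrival order only matters between packets of equal priority.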
Queuing Disciplines …
Types of queuing disciplines:
CBQ - Class Based Queuing
TBF - Token Bucket Filter
SFQ - Stochastic Fair Queuing
Priority - Priority-based dequeuing
FIFO - First In First Out
RED - Random Early Detection
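As one concrete case, the token bucket idea behind TBF can be sketched in user space. The sketch assumes integer ticks as the time unit and bytes as the token unit, and the model_tbf_* names are hypothetical. It only reports whether a packet conforms to the rate; a real shaper would hold back a non-conforming packet until enough tokens have accumulated.

```c
#include <assert.h>
#include <stddef.h>

struct model_tbf {
    unsigned long rate;    /* bytes of credit added per tick */
    unsigned long burst;   /* bucket depth in bytes */
    unsigned long tokens;  /* current credit, never above burst */
    unsigned long last;    /* tick of the last refill */
};

/* Start with a full bucket. */
static void model_tbf_init(struct model_tbf *t, unsigned long rate,
                           unsigned long burst, unsigned long now)
{
    t->rate = rate;
    t->burst = burst;
    t->tokens = burst;
    t->last = now;
}

/* Refill for the elapsed ticks, then spend len tokens if available.
 * Returns 1 if the packet conforms (may be sent now), 0 otherwise. */
static int model_tbf_may_send(struct model_tbf *t, unsigned long len,
                              unsigned long now)
{
    unsigned long elapsed;

    assert(t != NULL && now >= t->last);
    elapsed = now - t->last;
    t->last = now;
    if (t->rate > 0) {
        /* Compare before multiplying so the credit cannot overflow. */
        if (elapsed > (t->burst - t->tokens) / t->rate)
            t->tokens = t->burst;
        else
            t->tokens += elapsed * t->rate;
    }
    if (len > t->tokens)
        return 0;
    t->tokens -= len;
    return 1;
}
```

With rate 100 and burst 300, a 300-byte packet passes at once, a following 100-byte packet must wait one tick, and a long idle period refills the bucket only up to 300 bytes.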
Classes, filters
• Filters classify packets based on packet properties
• When the enqueue function is invoked, the filters are applied
to identify the (best) class to which the packet belongs.
• The enqueue function of the qdisc owned by that class is then invoked
• Classes and queues are tied together.
• Each class has a queue, which is FIFO by default
• Classes are not supported by every qdisc, e.g. TBF
Types of Filters
Different filter types are supported -
route - bases the decision on the route by which the packet will be routed
fw - bases the decision on how the firewall has marked the packet
u32 - bases the decision on fields within the packet
rsvp - bases the decision on the target
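A u32-style match compares a masked 32-bit word at a given packet offset against a value. The sketch below models that test and a first-match classifier in user space; the model_* names and the convention that class id 0 means "no match" are assumptions of the sketch, not the kernel's data structures. The word is read in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_u32_key {
    size_t offset;    /* byte offset of the 32-bit word in the packet */
    uint32_t mask;    /* bits that take part in the comparison */
    uint32_t value;   /* expected value of those bits */
};

struct model_filter {
    struct model_u32_key key;
    int classid;      /* hypothetical class identifier, non-zero */
};

/* Returns 1 if the masked word at key->offset equals the masked value. */
static int model_u32_match(const unsigned char *pkt, size_t len,
                           const struct model_u32_key *key)
{
    uint32_t word;

    if (len < 4 || key->offset > len - 4)
        return 0;                          /* word would run past the packet */
    word = ((uint32_t)pkt[key->offset] << 24) |
           ((uint32_t)pkt[key->offset + 1] << 16) |
           ((uint32_t)pkt[key->offset + 2] << 8) |
           (uint32_t)pkt[key->offset + 3];
    return (word & key->mask) == (key->value & key->mask);
}

/* Apply the filters in order; return the class of the first match, else 0. */
static int model_classify(const unsigned char *pkt, size_t len,
                          const struct model_filter *filters, size_t n)
{
    size_t i;

    assert(pkt != NULL && (filters != NULL || n == 0));
    for (i = 0; i < n; i++)
        if (model_u32_match(pkt, len, &filters[i].key))
            return filters[i].classid;
    return 0;
}
```

For an IPv4 header, the protocol number is the second byte of the word at offset 8, so a key with offset 8, mask 0x00ff0000 and value 0x00110000 selects UDP (protocol 17).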
TC
Linux Traffic Controller
- Alexey Kuznetsov
User-level program to create qdiscs, classes and filters.
It uses netlink sockets to transfer data to and from
kernel space
tc qdisc [add/del/get/change] dev $(DEVICE) handle $(ID) QKIND
tc class [add/del/get/change] dev $(DEVICE) classid $(ID) QKIND
tc filter [add/del/get/change] dev $(DEVICE) prio PRIO proto PROTO
classid $(ID) FTYPE
CBQ
[Figure: CBQ link-sharing tree. Root has two children, A and B; each has an Audio (Prio 7) and a Video (Prio 4) leaf class. Bandwidth shares shown on the leaves: 10%, 40%, 20%, 30%.]
CBQ
CBQ is a scheduling mechanism to
• provide link sharing between agencies that share the same physical link
• provide a framework to differentiate traffic that has different priorities
Main components are
Classifier: extracts flow information and puts each packet in the corresponding class
General Scheduler: aims to share the bandwidth among classes
Link-sharing Scheduler: aims to share bandwidth during congestion and
distributes the excess bandwidth appropriately.
Estimator: estimates whether each class is underlimit or overlimit
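One common way to describe the estimator is as an exponentially weighted average of per-class idle time: for each packet, the gap since the class last sent is compared with the gap its allocated rate would allow, and a negative average marks the class as overlimit. The sketch below follows that description under stated assumptions; the model_cbq_* names, the double-precision time base and the weight w are hypothetical, and refinements such as capping the average are left out.

```c
#include <assert.h>
#include <stddef.h>

struct model_cbq_class {
    double rate;      /* allocated bandwidth in bytes per second, > 0 */
    double avgidle;   /* weighted average idle time; negative means overlimit */
    double last;      /* time the previous packet of this class was sent */
};

static void model_cbq_init(struct model_cbq_class *c, double rate, double now)
{
    assert(c != NULL && rate > 0.0);
    c->rate = rate;
    c->avgidle = 0.0;
    c->last = now;
}

/* Account for one packet of pkt_bytes sent at time now.
 * w is the averaging weight, 0 < w <= 1. */
static void model_cbq_update(struct model_cbq_class *c, double now,
                             double pkt_bytes, double w)
{
    /* Measured gap minus the gap the allocated rate would give. */
    double idle = (now - c->last) - pkt_bytes / c->rate;

    c->avgidle = (1.0 - w) * c->avgidle + w * idle;
    c->last = now;
}

static int model_cbq_overlimit(const struct model_cbq_class *c)
{
    return c->avgidle < 0.0;
}
```

A class allocated 1000 bytes per second and sending 100-byte packets has a target gap of 0.1 s; sending faster than that drives avgidle negative, and backing off brings it positive again.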
CBQ
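[Figure: CBQ block diagram. Packets from the input link pass through the Classifier and the General Scheduler to the output link; the Estimator and the Link Sharing Scheduler are drawn above them.]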
[Figure: per-interface qdisc layout. dev_base leads to the net_device for eth0, whose qdisc field points to a qdisc. The qdisc is drawn with its Qdisc_ops, classes with “Qdisc_class_ops”, a filter chain and packet storage.]
Filter Storage
[Figure: PG_FilterStore is a list of filter priority queue nodes (Priority = 1, 2, 3), chained through Next. Each node records its priority and number of filters and has Front and Back pointers to a list of filter element nodes. Filter elements store DestIp, SrcIP, SrcMask, DestMask, SrcPortMin, SrcPortMax, DstPortMin, DstPortMax, Protocol, Tos, Dscp etc.]
References
Linux Device Drivers – Rubini & Corbet
Linux IP Networking – Glenn Herrin, May 31 2000
Linux Advanced Networking Overview – S. Radhakrishnan
Link Sharing for Packet Networks – Sally Floyd, IEEE, Aug 1995
THANK YOU