					Socket Layer
   COMS W6998
    Spring 2010

    Erich Nahum
Outline
   Sockets API Refresher
   Linux Sockets Architecture
   Interface between BSD sockets and
    AF_INET
   Interface between AF_INET and TCP/UDP
   Receive Path
   Send Path
BSD Socket API
   Originally developed by UC Berkeley at the
    dawn of time
   Used by 90% of network oriented programs
   Standard interface across operating systems
   Simple, well understood by programmers
User Space Socket API
   socket() / bind() / accept() / listen()
       Initialization, addressing and hand shaking
   select() / poll() / epoll()
       Waiting for events
   send() / recv()
       Stream oriented (e.g. TCP) Rx / Tx
   sendto() / recvfrom()
       Datagram oriented (e.g. UDP) Rx / Tx
   close(), shutdown()
       Closing down an association
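
The "waiting for events" group can be illustrated with a small sketch. It assumes an AF_UNIX socketpair as a stand-in for a real network connection, and the helper name poll_ready() is hypothetical. A zero timeout makes poll() report readiness without blocking.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if poll() reports the socket readable, 0 if not, -1 on error.
 * If write_first is nonzero, one byte is sent to the socket beforehand. */
int poll_ready(int write_first)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write_first && write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    int n = poll(&pfd, 1, 0);          /* timeout 0: check only, never block */
    int ready = (n < 0) ? -1 : (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;

    close(sv[0]);
    close(sv[1]);
    return ready;
}
```

Calling poll_ready(0) should report "not readable" and poll_ready(1) "readable". select() and epoll() answer the same question through different interfaces.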
Standard Socket Sequence
  The ‘server’ application                        The ‘client’ application

        socket()                                        socket()
        bind()                                          bind()
        listen()
        accept()    <---  3-way handshake  --->         connect()
        read()      <---  data flow to server  ---      write()
        write()     ---  data flow to client  --->      read()
        close()     <---  4-way handshake  --->         close()
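
As a rough illustration of the sequence above, the sketch below runs both sides in a single process over the loopback interface. It leans on several assumptions: the helper name loopback_roundtrip() is hypothetical, the client skips its optional bind(), and connect() is expected to complete against the listen backlog before accept() is called.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Runs the server and client sequences in one process over loopback.
 * Copies what the server received into buf; returns byte count or -1. */
ssize_t loopback_roundtrip(char *buf, size_t buflen)
{
    ssize_t n = -1;
    int cfd = -1, afd = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    int lfd = socket(AF_INET, SOCK_STREAM, 0);            /* server: socket() */
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                    /* kernel picks a port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||  /* bind()   */
        listen(lfd, 1) < 0 ||                                     /* listen() */
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    cfd = socket(AF_INET, SOCK_STREAM, 0);                /* client: socket() */
    if (cfd < 0)
        goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) /* connect() */
        goto done;

    afd = accept(lfd, NULL, NULL);                        /* server: accept() */
    if (afd < 0)
        goto done;

    if (write(cfd, "hello", 5) != 5)                      /* client: write()  */
        goto done;
    n = read(afd, buf, buflen);                           /* server: read()   */

done:
    if (afd >= 0)
        close(afd);                                       /* close() on both  */
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return n;
}
```

A real server and client would normally be separate processes. Folding them together only works here because the kernel finishes the handshake on behalf of the listening socket.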
Socket() System Call
   Creating a socket from user space is done by the
    socket() system call:
     int socket (int family, int type, int
       protocol);
     On success, a file descriptor for the new socket is
       returned.
     The open() system call (for files) likewise returns a file
       descriptor.
     “Everything is a file” Unix paradigm.
   The first parameter, family, is also sometimes referred
    to as “domain”.
Socket(): Family
   A family is a suite of protocols
   Each family is a subdirectory of linux/net
       E.g., linux/net/ipv4, linux/net/decnet, linux/net/packet
   IPv4: PF_INET
   IPv6: PF_INET6.
   Packet sockets: PF_PACKET
       Operate at the device driver layer.
       pcap library for Linux uses PF_PACKET sockets
       pcap library is in use by sniffers such as tcpdump.
   Protocol Family == Address Family
       PF_INET == AF_INET (in include/linux/socket.h)
Address/Protocol Families
/* Supported address families. */
#define AF_UNSPEC       0
#define AF_UNIX         1       /*   Unix domain sockets          */
#define AF_LOCAL        1       /*   POSIX name for AF_UNIX       */
#define AF_INET         2       /*   Internet IP Protocol         */
#define AF_AX25         3       /*   Amateur Radio AX.25          */
#define AF_IPX          4       /*   Novell IPX                   */
#define AF_APPLETALK    5       /*   AppleTalk DDP                */
#define AF_NETROM       6       /*   Amateur Radio NET/ROM        */
#define AF_BRIDGE       7       /*   Multiprotocol bridge         */
#define AF_ATMPVC       8       /*   ATM PVCs                     */
#define AF_X25          9       /*   Reserved for X.25 project    */
#define AF_INET6        10      /*   IP version 6                 */
#define AF_ROSE         11      /*   Amateur Radio X.25 PLP       */
#define AF_DECnet       12      /*   Reserved for DECnet project */
#define AF_NETBEUI      13      /*   Reserved for 802.2LLC project*/
#define AF_SECURITY     14      /*   Security callback pseudo AF */
#define AF_KEY          15      /*   PF_KEY key management API */
..
#define AF_ISDN         34      /*   mISDN sockets               */
#define AF_PHONET       35      /*   Phonet sockets              */
#define AF_IEEE802154   36      /*   IEEE802154 sockets          */
#define AF_MAX          37      /*   For now.. */
                                                                 include/linux/socket.h
Socket(): Type
   SOCK_STREAM and SOCK_DGRAM are
    the most commonly used types.
       SOCK_STREAM for TCP, SCTP
       SOCK_DGRAM for UDP.
       SOCK_RAW for RAW sockets.
       Some families support both
        SOCK_STREAM and SOCK_DGRAM; for
        example, Unix domain sockets (AF_UNIX).
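
Since AF_UNIX accepts both types, it makes a convenient test bed for the difference between them. The sketch below (helper name hypothetical) sends two 3-byte messages and reports how many bytes the first read() returns. A datagram socket preserves the message boundary, whereas a stream socket is free to deliver both messages together.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends two 3-byte messages over an AF_UNIX socketpair of the given type,
 * then returns how many bytes the first read() delivers (-1 on error). */
ssize_t first_read_size(int type)
{
    int sv[2];
    char buf[64];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;
    if (write(sv[0], "abc", 3) == 3 && write(sv[0], "def", 3) == 3)
        n = read(sv[1], buf, sizeof(buf));
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first read returns exactly 3 bytes. With SOCK_STREAM it returns at least 3 and will typically return all 6, since a byte stream has no record boundaries.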
Socket(): Protocol
   Protocol is the protocol number within a family.
   Internet protocol numbers are assigned by IANA
       http://www.iana.org/assignments/protocol-numbers/
   For AF_INET, it’s usually 0.
       IPPROTO_IP is 0, see: include/linux/in.h.
   For SCTP:
       protocol is IPPROTO_SCTP (132)
    sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
   For UDP-Lite:
       protocol is IPPROTO_UDPLITE (136)
Socket Layer Architecture
   Layer              Components
   User               Application
   Socket Interface   BSD Socket Layer
   Protocol Layers    PF_INET:   SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW, over IPV4
                      PF_PACKET: SOCK_RAW, SOCK_DGRAM
                      PF_UNIX:   ....
                      PF_IPX:    ....
   Device Layer       Network Device Layer: Ethernet, Token Ring, PPP, SLIP, FDDI
   Hardware           Intel E1000

   The Socket Interface, Protocol Layers and Device Layer make up the Kernel part.
Key Concepts
   Function pointer tables (“ops”)
       In-kernel interfaces for socket functions
         Binding between BSD sockets and AF_XXX families
         Binding between AF_INET and transports (TCP, UDP)

   Socket data structures
       struct socket (BSD socket)
       struct sock (protocol family socket, network state)
         struct packet_sock (PF_PACKET)

         struct inet_sock (PF_INET)
             struct udp_sock
             struct tcp_sock
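
The nesting of the socket structures can be modelled in user space. The sketch below uses hypothetical emb_* structs as stand-ins for struct sock, struct inet_sock and struct udp_sock. It assumes, as in the kernel, that each specialised struct embeds its parent as the first member, so one pointer can be viewed at any level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock, inet_sock and udp_sock. */
struct emb_sock      { int family; int state; };
struct emb_inet_sock { struct emb_sock sk; unsigned short sport; };
struct emb_udp_sock  { struct emb_inet_sock inet; int pending; };

/* Because each struct begins with its parent, one pointer value can be
 * viewed at any level of the hierarchy. */
struct emb_inet_sock *emb_inet_sk(struct emb_sock *sk)
{
    return (struct emb_inet_sock *)sk;
}

struct emb_udp_sock *emb_udp_sk(struct emb_sock *sk)
{
    return (struct emb_udp_sock *)sk;
}

/* Builds a UDP socket, then reads fields of the outer structs through the
 * generic pointer, the way protocol code receives a generic sock and casts. */
int emb_demo(void)
{
    struct emb_udp_sock up = {
        .inet = { .sk = { .family = 2 }, .sport = 53 },
        .pending = 7,
    };
    struct emb_sock *sk = &up.inet.sk;

    return emb_inet_sk(sk)->sport == 53 && emb_udp_sk(sk)->pending == 7;
}
```

The casts are only valid because the parent sits at offset 0. Moving it elsewhere in the struct would silently break them.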
Socket Data Structures
   For every socket which is created by a user space application,
    there is a corresponding struct socket and struct sock in the
    kernel.
   These are confusing.
   struct socket: include/linux/net.h
     Data common to the BSD socket layer
     Has only 8 members
     Any variable “sock” always refers to a struct socket
   struct sock: include/net/sock.h
     Data common to the Network Protocol layer (i.e., AF_INET)
     Has more than 30 members, and is one of the biggest structures
       in the networking stack.
     Any variable “sk” always refers to a struct sock.
struct socket
struct socket {
  socket_state             state; // SS_CONNECTING etc.
  short                    type;   // SOCK_STREAM etc.
  unsigned long            flags;
  struct fasync_struct     *fasync_list;
  wait_queue_head_t        wait;   // tasks waiting
  struct file              *file; // back ptr to file
  struct sock              *sk;    // AF specific state
  const struct proto_ops   *ops;   // AF specific operations
};




                                                    include/linux/net.h
Socket State
typedef enum {
        SS_FREE = 0,       /*   not allocated                 */
        SS_UNCONNECTED,    /*   unconnected to any socket     */
        SS_CONNECTING,     /*   in process of connecting      */
        SS_CONNECTED,      /*   connected to socket           */
        SS_DISCONNECTING   /*   in process of disconnecting   */
} socket_state;


   These states are not layer 4 states (like TCP_ESTABLISHED or
    TCP_CLOSE).




                                                              include/linux/net.h
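
The socket_state value is internal to the kernel, but the layer 4 state it is contrasted with can be observed from user space. The sketch below uses the Linux-specific TCP_INFO option to read it, and the helper name is hypothetical. A fresh TCP socket should report TCP_CLOSE, and one that has called listen() should report TCP_LISTEN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the layer 4 state (tcpi_state) of a fresh TCP socket, optionally
 * after listen(). TCP_INFO is Linux-specific. Returns -1 on error. */
int tcp_state_of(int do_listen)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);
    int state = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if ((!do_listen || listen(fd, 1) == 0) &&
        getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
        state = info.tcpi_state;
    close(fd);
    return state;
}
```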
Socket Types
enum sock_type {
        SOCK_STREAM      =   1,
        SOCK_DGRAM       =   2,
        SOCK_RAW         =   3,
        SOCK_RDM         =   4,
        SOCK_SEQPACKET   =   5,
        SOCK_DCCP        =   6,
        SOCK_PACKET      =   10,
};

                                   include/linux/net.h
Comment in include/net/sock.h
/*
 * This structure really needs to be cleaned up.
 * Most of it is for TCP, and not used by any of
 * the other protocols.
 */
struct sock_common
/* minimal network layer representation of sockets */

struct sock_common {
        /*
         * first fields are not copied in sock_copy()
         */
        union {
                struct hlist_node       skc_node;       // main hash linkage for lookup
                struct hlist_nulls_node skc_nulls_node; // main hash for TCP/UDP
        };
        atomic_t                skc_refcnt;
        int                     skc_tx_queue_mapping; // tx queue for this connection
        union {
                unsigned int    skc_hash;           // hash value for lookup
                __u16           skc_u16hashes[2];
        };
        unsigned short          skc_family;         // network address family
        volatile unsigned char skc_state;           // Connection state
        unsigned char           skc_reuse;          // SO_REUSEADDR setting
        int                     skc_bound_dev_if;   // bound if !=0
        union {
                struct hlist_node       skc_bind_node;     // bind hash linkage
                struct hlist_nulls_node skc_portaddr_node; // bind hash for UDP/Lite
        };
        struct proto            *skc_prot; // protocol handlers in a net family
};

                                                                            include/net/sock.h
BSD Socket <-> AF Interface
   Main data structures
       struct net_proto_family
       struct proto_ops
   Key function
    sock_register(struct net_proto_family *ops)
   Each address family:
       Implements the struct net_proto_family.
       Calls the function sock_register() when the protocol
        family is initialized.
       Implements the struct proto_ops for binding the BSD
        socket layer and protocol family layer.
net_proto_family
   (Figure: the interface sits between the BSD Socket Layer and the AF Socket Layer.)
   Describes each of the supported protocol families
    struct net_proto_family {
      int family;
      int (*create)(struct net *net, struct socket *sock,
                    int protocol, int kern);
      struct module *owner;
    };
   Specifies the handler for socket creation
       create() function is called whenever a new socket of this family is
        created
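
The registration and creation steps can be sketched as a small user-space model. All reg_* names, the table size and the error codes below are illustrative assumptions, not the kernel's implementation. The point is only that socket creation indexes a per-family table and calls whatever create() handler was registered there.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define REG_NPROTO 8                         /* hypothetical table size */

struct reg_socket { int family; int type; int created_by; };

struct reg_proto_family {
    int family;
    int (*create)(struct reg_socket *sock, int protocol);
};

static const struct reg_proto_family *reg_families[REG_NPROTO];

/* Analogue of sock_register(): one slot per family, no double registration. */
int reg_sock_register(const struct reg_proto_family *ops)
{
    if (ops->family < 0 || ops->family >= REG_NPROTO)
        return -ENOBUFS;
    if (reg_families[ops->family])
        return -EEXIST;
    reg_families[ops->family] = ops;
    return 0;
}

/* Analogue of the socket() path: look up the family, call its create(). */
int reg_sock_create(int family, int type, int protocol, struct reg_socket *sock)
{
    if (family < 0 || family >= REG_NPROTO || !reg_families[family])
        return -EAFNOSUPPORT;
    sock->family = family;
    sock->type = type;
    return reg_families[family]->create(sock, protocol);
}

/* Stand-in for inet_create(): just records that it ran. */
static int reg_inet_create(struct reg_socket *sock, int protocol)
{
    (void)protocol;
    sock->created_by = 2;
    return 0;
}

static const struct reg_proto_family reg_inet_family_ops = {
    .family = 2,                             /* AF_INET's value */
    .create = reg_inet_create,
};
```

Before registration, reg_sock_create(2, ...) fails with -EAFNOSUPPORT. After reg_sock_register(&reg_inet_family_ops) the same call reaches reg_inet_create().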
INET and PACKET proto_family
  static const struct net_proto_family inet_family_ops = {
     .family = PF_INET,
     .create = inet_create,
     .owner = THIS_MODULE,           /* af_inet.c */
  };
  static const struct net_proto_family packet_family_ops = {
     .family = PF_PACKET,
     .create = packet_create,
     .owner = THIS_MODULE,           /* af_packet.c */
  };
proto_ops
   Defines the binding between the BSD
    socket layer and address family (AF_*)
    layer.
   The proto_ops tables contain the functions
    exported by the AF socket layer to the BSD
    socket layer.
   It consists of the address family type and a
    set of pointers to socket operation routines
    specific to a particular address family.
struct proto_ops
/* Member list only; each function pointer's argument list is omitted. */
struct proto_ops {
        int             family;
        struct module   *owner;
        int             (*release);
        int             (*bind);
        int             (*connect);
        int             (*socketpair);
        int             (*accept);
        int             (*getname);
        unsigned int    (*poll);
        int             (*ioctl);
        int             (*compat_ioctl);
        int             (*listen);
        int             (*shutdown);
        int             (*setsockopt);
        int             (*getsockopt);
        int             (*compat_setsockopt);
        int             (*compat_getsockopt);
        int             (*sendmsg);
        int             (*recvmsg);
        int             (*mmap);
        ssize_t         (*sendpage);
        ssize_t         (*splice_read);
};
                                                              include/linux/net.h
PF_PACKET proto_ops
static const struct   proto_ops packet_ops = {
        .family =         PF_PACKET,
        .owner =          THIS_MODULE,
        .release =        packet_release,
        .bind =           packet_bind,
        .connect =        sock_no_connect,
        .socketpair   =   sock_no_socketpair,
        .accept =         sock_no_accept,
        .getname =        packet_getname,
        .poll =           packet_poll,
        .ioctl =          packet_ioctl,
        .listen =         sock_no_listen,
        .shutdown =       sock_no_shutdown,
        .setsockopt   =   packet_setsockopt,
        .getsockopt   =   packet_getsockopt,
        .sendmsg =        packet_sendmsg,
        .recvmsg =        packet_recvmsg,
        .mmap =           packet_mmap,
        .sendpage =       sock_no_sendpage,
};




                                                       net/packet/af_packet.c
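
The sock_no_* entries in packet_ops follow a common pattern: an operation the family does not support still gets a valid function pointer, one that only returns an error. A minimal user-space model of that pattern, with hypothetical stub_* names, might look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct stub_socket;                               /* opaque in this sketch */

struct stub_proto_ops {
    int (*bind)(struct stub_socket *sock);
    int (*listen)(struct stub_socket *sock, int backlog);
};

/* Shared stub, in the spirit of sock_no_listen(). */
static int stub_no_listen(struct stub_socket *sock, int backlog)
{
    (void)sock;
    (void)backlog;
    return -EOPNOTSUPP;
}

/* Stand-in for a supported operation such as packet_bind(). */
static int stub_packet_bind(struct stub_socket *sock)
{
    (void)sock;
    return 0;
}

static const struct stub_proto_ops stub_packet_ops = {
    .bind   = stub_packet_bind,
    .listen = stub_no_listen,
};

/* The generic layer calls through the table without a NULL check. */
int stub_listen(const struct stub_proto_ops *ops, struct stub_socket *sock,
                int backlog)
{
    return ops->listen(sock, backlog);
}
```

Filling every slot keeps the calling layer free of per-family special cases, and the stub's return value becomes the error the caller sees.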
PF_INET proto_ops
               inet_stream_ops (TCP)    inet_dgram_ops (UDP)         inet_sockraw_ops (RAW)
.family        PF_INET                  PF_INET                      PF_INET
.owner         THIS_MODULE              THIS_MODULE                  THIS_MODULE
.release       inet_release             inet_release                 inet_release
.bind          inet_bind                inet_bind                    inet_bind
.connect       inet_stream_connect      inet_dgram_connect           inet_dgram_connect
.socketpair    sock_no_socketpair       sock_no_socketpair           sock_no_socketpair
.accept        inet_accept              sock_no_accept               sock_no_accept
.getname       inet_getname             inet_getname                 inet_getname
.poll          tcp_poll                 udp_poll                     datagram_poll
.ioctl         inet_ioctl               inet_ioctl                   inet_ioctl
.listen        inet_listen              sock_no_listen               sock_no_listen
.shutdown      inet_shutdown            inet_shutdown                inet_shutdown
.setsockopt    sock_common_setsockopt   sock_common_setsockopt       sock_common_setsockopt
.getsockopt    sock_common_getsockopt   sock_common_getsockopt       sock_common_getsockopt
.sendmsg       tcp_sendmsg              inet_sendmsg                 inet_sendmsg
.recvmsg       sock_common_recvmsg      sock_common_recvmsg          sock_common_recvmsg
.mmap          sock_no_mmap             sock_no_mmap                 sock_no_mmap
.sendpage      tcp_sendpage             inet_sendpage                inet_sendpage
.splice_read   tcp_splice_read          --                           --
                                                                                          net/ipv4/af_inet.c
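
The effect of the table is visible from user space. Since the datagram column has sock_no_listen in the .listen slot, listen() on a UDP socket is expected to fail with EOPNOTSUPP on Linux, while the same call on a TCP socket succeeds. The helper name below is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Calls listen() on a fresh AF_INET socket of the given type.
 * Returns 0 on success, otherwise the errno that the failing call set. */
int listen_errno(int type)
{
    int err = 0;
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return errno;
    if (listen(fd, 1) < 0)
        err = errno;
    close(fd);
    return err;
}
```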
Outline
   Sockets API Refresher
   Linux Sockets Architecture
   Interface between BSD sockets and
    AF_INET
   Interface between AF_INET and TCP/UDP
       Binding between IP and TCP/UDP (upcall)
       Binding between AF_INET and TCP (downcall)
   Receive Path
   Send Path
AF_INET <-> Transport API
   (Figure: the API sits between the AF_INET Layer and the Transport Layer.)
   struct net_protocol (entries of the inet_protos[] table)
     Interface between IP and the transport layer
     Is the upcall binding from IP to transport
     Method for demultiplexing IP packets to proper transport
   struct proto
     Defines interface for individual protocols (TCP, UDP, etc)
     Is the downcall binding for AF_INET to transport
     Transport-specific functions for socket API
   struct inet_protosw
     Describes the PF_INET protocols
     Defines the different SOCK types for PF_INET
     SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW
Recall IP’s inet_protos
   Receive binding from the IP layer to the transport layer.
   inet_init() calls inet_add_protocol(p) to add each protocol to the
    hash queues.

   inet_protos[MAX_INET_PROTOS]      net_protocol member     points to
     [0]                             handler                 udp_rcv()
                                     err_handler             udp_err()
                                     gso_send_check
                                     gso_segment
                                     gro_receive
                                     gro_complete
     [1]                             handler                 igmp_rcv()
                                     err_handler             Null
                                     gso_send_check
                                     gso_segment
                                     gro_receive
                                     gro_complete
     ...                             (MAX_INET_PROTOS entries in total)
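
The upcall binding can be modelled as a table of handlers indexed by protocol number. The sketch below is a deliberate simplification with hypothetical demux_* names. It indexes directly by the 8-bit protocol number rather than through hash queues, and its "packet" is just an opaque pointer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMUX_MAX_PROTOS 256                 /* one slot per IP protocol number */

struct demux_protocol {
    int (*handler)(const void *pkt);         /* receive upcall */
};

static const struct demux_protocol *demux_protos[DEMUX_MAX_PROTOS];

/* Analogue of inet_add_protocol(): claim the slot for one protocol number. */
int demux_add_protocol(const struct demux_protocol *prot, unsigned char num)
{
    if (demux_protos[num])
        return -1;                           /* slot already taken */
    demux_protos[num] = prot;
    return 0;
}

/* Analogue of IP local delivery: index by the header's protocol field. */
int demux_deliver(unsigned char num, const void *pkt)
{
    const struct demux_protocol *prot = demux_protos[num];
    return prot ? prot->handler(pkt) : -1;   /* -1: no transport registered */
}

/* Stand-in for udp_rcv(); returns 17 so the caller can tell it ran. */
static int demux_udp_rcv(const void *pkt)
{
    (void)pkt;
    return 17;
}

static const struct demux_protocol demux_udp = { .handler = demux_udp_rcv };
```

Delivering protocol 17 before registration returns -1. After demux_add_protocol(&demux_udp, 17) the same call reaches the UDP stand-in.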
struct proto
/* Networking protocol blocks we attach to sockets.
 * socket layer -> transport layer interface
 */
struct proto {
    void                   (*close);
    int                    (*connect);
    int                    (*disconnect);
    struct sock *          (*accept);
    int                    (*ioctl);
    int                    (*init);
    void                   (*destroy);
    void                   (*shutdown);
    int                    (*setsockopt);
    int                    (*getsockopt);
    int                    (*sendmsg);
    int                    (*recvmsg);
    int                    (*sendpage);
    int                    (*bind);
    int                    (*backlog_rcv);
    void                   (*hash);
    void                   (*unhash);
    int                    (*get_port);
};
                                                                include/net/sock.h
udp_prot
struct proto udp_prot = {
        .name                =   "UDP",
        .owner               =   THIS_MODULE,
        .close               =   udp_lib_close,
        .connect             =   ip4_datagram_connect,
        .disconnect          =   udp_disconnect,
        .ioctl               =   udp_ioctl,
        .destroy             =   udp_destroy_sock,
        .setsockopt          =   udp_setsockopt,
        .getsockopt          =   udp_getsockopt,
        .sendmsg             =   udp_sendmsg,
        .recvmsg             =   udp_recvmsg,
        .sendpage            =   udp_sendpage,
        .backlog_rcv         =   __udp_queue_rcv_skb,
        .hash                =   udp_lib_hash,
        .unhash              =   udp_lib_unhash,
        .get_port            =   udp_v4_get_port,
        .memory_allocated    =   &udp_memory_allocated,
        .sysctl_mem          =   sysctl_udp_mem,
        .sysctl_wmem         =   &sysctl_udp_wmem_min,
        .sysctl_rmem         =   &sysctl_udp_rmem_min,
        .obj_size            =   sizeof(struct udp_sock),
        .slab_flags          =   SLAB_DESTROY_BY_RCU,
        .h.udp_table         =   &udp_table,
#ifdef CONFIG_COMPAT
        .compat_setsockopt   = compat_udp_setsockopt,
        .compat_getsockopt   = compat_udp_getsockopt,
#endif
};
                                                                               net/ipv4/udp.c
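
Taken together, proto_ops and proto give a two-level dispatch: the BSD layer calls the family's entry (for example inet_sendmsg), which in turn forwards to the transport's entry through the sock's proto pointer (for example udp_sendmsg). A user-space model of that chain is sketched below. Every disp_* name is hypothetical and the message is reduced to a length.

```c
#include <assert.h>
#include <stddef.h>

struct disp_sock;
struct disp_socket;

/* Stand-in for struct proto: transport-level handlers. */
struct disp_proto {
    const char *name;
    int (*sendmsg)(struct disp_sock *sk, size_t len);
};

/* Stand-in for struct proto_ops: family-level handlers. */
struct disp_proto_ops {
    int (*sendmsg)(struct disp_socket *sock, size_t len);
};

struct disp_sock   { const struct disp_proto *prot; size_t queued; };
struct disp_socket { struct disp_sock *sk; const struct disp_proto_ops *ops; };

/* Transport handler, playing the role of udp_sendmsg(). */
static int disp_udp_sendmsg(struct disp_sock *sk, size_t len)
{
    sk->queued += len;
    return (int)len;
}

/* Family handler, playing the role of inet_sendmsg(): forward to the proto. */
static int disp_inet_sendmsg(struct disp_socket *sock, size_t len)
{
    struct disp_sock *sk = sock->sk;
    return sk->prot->sendmsg(sk, len);
}

static const struct disp_proto     disp_udp_prot  = { "UDP", disp_udp_sendmsg };
static const struct disp_proto_ops disp_dgram_ops = { disp_inet_sendmsg };

/* Entry point, playing the role of the BSD socket layer. */
int disp_sock_sendmsg(struct disp_socket *sock, size_t len)
{
    return sock->ops->sendmsg(sock, len);
}
```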
inet_protosw
   On startup (inet_init()), TCP, UDP, and Raw socket protocols are
    inserted into the inetsw_array[].
   Other protocols call inet_register_protosw().
   inet_unregister_protosw() will not remove protocols with
    PERMANENT set.

static struct inet_protosw inetsw_array[] =
{
        {
                .type =       SOCK_STREAM,
                .protocol =   IPPROTO_TCP,
                .prot =       &tcp_prot,
                .ops =        &inet_stream_ops,
                .no_check =   0,
                .flags =      INET_PROTOSW_PERMANENT |
                              INET_PROTOSW_ICSK,
        },
        {
                .type =       SOCK_DGRAM,
                .protocol =   IPPROTO_UDP,
                .prot =       &udp_prot,
                .ops =        &inet_dgram_ops,
                .no_check =   UDP_CSUM_DEFAULT,
                .flags =      INET_PROTOSW_PERMANENT,
        },
        {
                .type =       SOCK_RAW,
                .protocol =   IPPROTO_IP, /* wild card */
                .prot =       &raw_prot,
                .ops =        &inet_sockraw_ops,
                .no_check =   UDP_CSUM_DEFAULT,
                .flags =      INET_PROTOSW_REUSE,
        }
};
                                                                              net/ipv4/af_inet.c
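
How a (type, protocol) pair selects one of these entries can be sketched as a table lookup. The model below is loosely based on the matching done at socket creation, so the exact rules should be treated as an assumption. All sw_* names are hypothetical, and the protocol values are the IANA numbers.

```c
#include <assert.h>
#include <stddef.h>

enum { SW_STREAM = 1, SW_DGRAM = 2, SW_RAW = 3 };           /* sock_type values */
enum { SW_IP = 0, SW_ICMP = 1, SW_TCP = 6, SW_UDP = 17 };   /* IANA numbers     */

struct sw_entry { int type; int protocol; const char *prot_name; };

static const struct sw_entry sw_table[] = {
    { SW_STREAM, SW_TCP, "tcp_prot" },
    { SW_DGRAM,  SW_UDP, "udp_prot" },
    { SW_RAW,    SW_IP,  "raw_prot" },                       /* wild card entry */
};

/* Finds the entry for (type, protocol), or NULL if nothing matches.
 * Rules assumed by this sketch:
 *   - an exact, non-zero protocol match wins;
 *   - a requested protocol of 0 takes the first entry of that type;
 *   - a table protocol of 0 (wild card) accepts any non-zero request. */
const struct sw_entry *sw_lookup(int type, int protocol)
{
    for (size_t i = 0; i < sizeof(sw_table) / sizeof(sw_table[0]); i++) {
        const struct sw_entry *e = &sw_table[i];
        if (e->type != type)
            continue;
        if (protocol == e->protocol) {
            if (protocol != SW_IP)
                return e;                  /* exact match */
        } else if (protocol == SW_IP || e->protocol == SW_IP) {
            return e;                      /* one side is the wild card */
        }
    }
    return NULL;
}
```

Under these rules (SW_STREAM, 0) resolves to the TCP entry and (SW_RAW, SW_ICMP) to the raw entry, while a mismatched pair such as (SW_STREAM, SW_UDP) finds nothing.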
Relationships
   struct socket
       state, type, flags, fasync_list, wait, file
       sk  -> struct sock
       ops -> struct proto_ops
   struct sock
       sk_common (an embedded struct sock_common)
       sk_lock, sk_backlog, ...
       (*sk_prot_creator)
       sk_socket -> struct socket
       sk_send_head, ...
   struct sock_common
       skc_node, skc_refcnt, skc_hash, ...
       skc_prot -> struct proto
       skc_net
   struct proto (UDP shown)
       udp_lib_close, ip4_datagram_connect, udp_sendmsg, udp_recvmsg, ...
   struct proto_ops (PF_INET, af_inet.c)
       inet_release, inet_bind, inet_accept, ...
Example: inet_accept()
int inet_accept(struct socket *sock, struct socket *newsock, int flags)
{
        struct sock *sk1 = sock->sk;
        int err = -EINVAL;
        /* Downcall: ask the transport (e.g. TCP) for the child sock. */
        struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, &err);

        if (!sk2)
                goto do_err;

        lock_sock(sk2);

        WARN_ON(!((1 << sk2->sk_state) &
                  (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT | TCPF_CLOSE)));

        /* Attach the child sock to the new BSD socket. */
        sock_graft(sk2, newsock);

        newsock->state = SS_CONNECTED;
        err = 0;
        release_sock(sk2);
do_err:
        return err;
}
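
The grafting step at the end of inet_accept() can be modelled in a few lines. The sketch below uses hypothetical graft_* names and leaves out locking and wait queues. It shows only the two-way linking between the BSD socket and the protocol sock, plus the BSD-level state change.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror SS_UNCONNECTED and SS_CONNECTED in socket_state. */
enum graft_state { GRAFT_UNCONNECTED = 1, GRAFT_CONNECTED = 3 };

struct graft_socket;
struct graft_sock   { struct graft_socket *socket; };
struct graft_socket { enum graft_state state; struct graft_sock *sk; };

/* Analogue of sock_graft(): link the two objects in both directions. */
void graft_link(struct graft_sock *sk, struct graft_socket *parent)
{
    sk->socket = parent;
    parent->sk = sk;
}

/* Analogue of the tail of inet_accept(): attach the child sock produced by
 * the transport to the new BSD socket and mark it connected. */
int graft_accept(struct graft_sock *child, struct graft_socket *newsock)
{
    if (!child)
        return -1;                   /* the transport had nothing to hand over */
    graft_link(child, newsock);
    newsock->state = GRAFT_CONNECTED;
    return 0;
}
```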
Backup

				