NOL - Sockets

					     Department of Computer Applications

                Lecture notes

         Network Programming Lab
                (IV Semester)
                Prepared By
             D Sophia Solomon

Rajalakshmi Nagar, Thandalam, Chennai -
                      UNIT – I

Process termination
command line arguments
Environment of a UNIX process
Process Times
Process relationships terminal logins
Message Queues
Shared Memory
Creation of a process
     • A unique pid is assigned to the new process.
     • Space is allocated for all the elements of the process image.
     • The process control block is initialized. Inherit info from parent.
     • The appropriate linkages are set: for scheduling, state queues, etc.
     • Create and initialize other data structures (file tables, IO table etc.).
     • Process Interruption
     • Two kinds of process interruptions: interrupt and trap.
     • Interrupt: Caused by some event external to and asynchronous to the currently
          running process, such as completion of IO.
     • Trap : Error or exception condition generated within the currently running
          process. Ex: illegal access to a file, arithmetic exception.
     • (supervisor call) : explicit interruption.
Unix system V
     • All user processes in the system have as root ancestor a process called init.
          When a new interactive user logs onto the system, init creates a user process,
          subsequently this user process can create child processes and so on. init is
          created at the boot-time.
     • Process states : User running , kernel running, Ready in memory, sleeping in
          memory (blocked), Ready swapped (ready-suspended), sleeping swapped
          (blocked-suspended), created (new), zombie , preempted (used in real-time

Process control
     • Process creation in unix is by means of the system call fork().
     • OS in response to a fork() call:
      • Allocate slot in the process table for new process.
      • Assigns unique pid.
      • Makes a copy of the process image, except for the shared memory.
      • Move child process to Ready queue.
      • it returns pid of the child to the parent, and a zero value to the child.
     • All the above are done in the kernel mode in the process context. When the
          kernel completes these it does one of the following as a part of the dispatcher:
            – Stay in the parent process. Control returns to the user mode at the point of
              the fork call of the parent.
            – Transfer control to the child process. The child process begins executing
              at the same point in the code as the parent, at the return from the fork call.
            – Transfer control to another process, leaving both parent and child in the
              Ready state.
UNIX Process Creation

     •   Every process, except process 0, is created by the fork() system call
         – fork() allocates entry in process table and assigns a unique PID to the child
         – child gets a copy of process image of parent: both child and parent are
           executing the same code following fork()
         – but fork() returns the PID of the child to the parent process and returns 0 to
           the child process
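
The two return values of fork() can be seen in a small sketch. The helper name run_child_and_wait is hypothetical and not part of any library; the child exits at once with a chosen status so that the parent has something to collect with waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`.
 * The parent waits and returns the child's exit status, or -1 on error. */
int run_child_and_wait(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;          /* fork failed: no child was created */
    if (pid == 0)
        _exit(code);        /* child: fork() returned 0 */

    /* parent: fork() returned the PID of the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_wait(7) from main() should give 7 back in the parent, because both processes continue from the same point and only the return value of fork() tells them apart.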

fork and exec

     •   Child process may choose to execute some other program than the parent by
         using exec call.
     •   Exec overlays a new program on the existing process.
     •   Child will not return to the old program unless exec fails. This is an important
         point to remember.
     •   Why does fork need to clone?
     •   Why do we need to separate fork and exec?
     •   Why can’t we have a single call that forks a new program?
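
A minimal sketch of the usual fork-then-exec pattern follows. The helper name spawn_and_wait and the exit status 127 for a failed exec are assumptions made for illustration (127 mirrors the shell convention). Note that the line after execvp() runs only if exec fails.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program argv[0] in a child process and
 * return its exit status (127 if exec failed, -1 on other errors). */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* overlays the child with the new program */
        _exit(127);             /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Keeping fork and exec separate gives the child a window between the two calls in which it can change its own environment (for example, redirect descriptors) before the new program starts.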

Race conditions
     • race = shared data + outcome depends on order that processes run
     • e.g., parent or child runs first?
     • waiting for parent to terminate
     • generally, need some signaling mechanism
     • signals
     • stream pipes

     •   process: address space + single thread of control
     •   sometimes want multiple threads of control (flow) in same address space
     •   quasi-parallel
     •   threads separate resource grouping & execution
     •   thread: program counter, registers, stack
     •   also called lightweight processes
     •   multithreading: avoid blocking when waiting for resources
     •   multiple services running in parallel
     •   state: running, blocked, ready, terminated
     •       Pthreads
         •    POSIX standard for threads.
         •    POSIX standard (IEEE 1003.1c) : The standard defines an API for creating
              and manipulating threads.
         •    Pthreads defines a set of C programming language types and procedure calls.
         •    It is implemented with a pthread.h header and a thread library

     •     Data types
         • pthread_t: handle to a thread
         • pthread_attr_t: thread attributes
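
As a sketch of the Pthreads API named above, the following creates a few threads with pthread_create() and waits for them with pthread_join(). The names NTHREADS, square_worker and sum_of_squares_threaded are made up for this example, and default attributes are used (a NULL pthread_attr_t pointer). On older systems the program may need to be linked with -lpthread.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4

/* Each thread gets a pointer to its own slot and squares the value in it. */
static void *square_worker(void *arg)
{
    int *slot = arg;
    *slot = (*slot) * (*slot);
    return NULL;
}

/* Start NTHREADS threads, wait for all of them, and return the sum of
 * the squares of 1..NTHREADS, or -1 if a thread could not be created. */
int sum_of_squares_threaded(void)
{
    pthread_t tid[NTHREADS];
    int value[NTHREADS];
    int started = 0, sum = 0, failed = 0;
    int i;

    for (i = 0; i < NTHREADS; i++) {
        value[i] = i + 1;
        if (pthread_create(&tid[i], NULL, square_worker, &value[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);   /* wait before reading the result */
        sum += value[i];
    }
    return failed ? -1 : sum;
}
```

Each thread works on its own array element, so no locking is needed here; shared data would need a mutex.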
System V IPC

             • The System V IPC consists of three types of IPC :
                – Message Queues,
                – Semaphores and
                – shared memory
             • Similarities between the three IPC
             • Identifiers and keys
             • Permission structure
             • Configuration Limits

Identifiers and keys

             • Each of the three structures is referred to in the kernel by a nonnegative
                integer identifier.
             • Whenever an IPC structure is being created, key must be specified.
             • The data type of this key is key_t(defined in sys/types.h)
             • This key is converted to the identifier by the kernel.

Permission Structure.

             • Every IPC structure has an associated ipc_perm structure. This structure
                defines the permissions and the owner.
             struct ipc_perm {
                 uid_t  uid;    /* owner's effective user id */
                 gid_t  gid;    /* owner's effective group id */
                 uid_t  cuid;   /* creator's effective user id */
                 gid_t  cgid;   /* creator's effective group id */
                 mode_t mode;   /* access mode */
                 ulong  seq;    /* slot usage sequence number */
                 key_t  key;    /* key */
             };


           • IPC structures are system wide and do not have a reference count.
           • IPC structures are not known by names in the file system, so we can’t access
             them or modify their properties with the ordinary file functions.
           • It is hard to use more than one IPC structure at a time, because we can’t use
             the multiplexed I/O functions (select and poll) with them.
Message Queues

           • A message queue is a linked list of messages stored within the kernel and
               identified by a message queue identifier, called the queue ID.
           •   msgget : is used to create a new queue or open an existing queue.
           •   msgsnd : used to add messages to the end of the queue
           •   msgrcv : messages are fetched from a queue
           •   every queue has a msqid_ds structure associated with it.
           •   This structure is used to define the current status of the queue.
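           •   struct msqid_ds {
                   struct ipc_perm msg_perm;   /* permissions and owner */
                   struct msg *msg_first;      /* first message on the queue */
                   struct msg *msg_last;       /* last message on the queue */
                   ulong msg_cbytes;           /* current number of bytes on the queue */
                   ulong msg_qbytes;           /* maximum number of bytes allowed */
                   pid_t msg_lspid;            /* pid of the last msgsnd() */
                   pid_t msg_lrpid;            /* pid of the last msgrcv() */
                   time_t msg_stime;           /* time of the last msgsnd() */
                   /* ... */
               };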
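
The three calls above can be tried out in one process. This is only a sketch: the struct text_msg layout, the helper name msgq_roundtrip and the use of IPC_PRIVATE (a private queue instead of a shared key) are assumptions chosen to keep the example self-contained. The queue is removed with msgctl(IPC_RMID) at the end, since IPC structures stay in the system until removed.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Message layout: a long type field followed by the data. */
struct text_msg {
    long mtype;         /* message type, must be > 0 */
    char mtext[64];     /* message data */
};

/* Hypothetical helper: send `text` through a private queue and read it
 * back into `out`. Returns 0 on success, -1 on error. */
int msgq_roundtrip(const char *text, char *out, size_t outlen)
{
    struct text_msg m, r;
    int rc = -1;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);   /* create a queue */

    if (qid < 0)
        return -1;

    m.mtype = 1;
    strncpy(m.mtext, text, sizeof(m.mtext) - 1);
    m.mtext[sizeof(m.mtext) - 1] = '\0';

    /* the size passed to msgsnd() excludes the mtype field */
    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == 0 &&
        msgrcv(qid, &r, sizeof(r.mtext), 1, 0) > 0 && outlen > 0) {
        r.mtext[sizeof(r.mtext) - 1] = '\0';
        strncpy(out, r.mtext, outlen - 1);
        out[outlen - 1] = '\0';
        rc = 0;
    }
    msgctl(qid, IPC_RMID, NULL);    /* remove the queue from the system */
    return rc;
}
```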


           • Semaphores are a programming construct designed by E. W. Dijkstra in the
             late 1960s
           • Semaphores let processes query or alter status information. They are often
             used to monitor and control the availability of system resources such as
             shared memory segments.

Initializing a Semaphore Set
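         •   The function semget() initializes or gains access to a semaphore set.
         •   It is prototyped by:
         •   int semget(key_t key, int nsems, int semflg);
         •   semctl() changes permissions and other characteristics of a semaphore set.
             It is prototyped by:
         •   int semctl(int semid, int semnum, int cmd, union semun arg);
         •   The cmd argument is one of the following control flags: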
             – GETVAL -- Return the value of a single semaphore.
             – SETVAL -- Set the value of a single semaphore. In this case, arg is taken
                as arg.val, an int.
             – GETPID -- Return the PID of the process that performed the last
                operation on the semaphore or array.
             – GETNCNT -- Return the number of processes waiting for the value of a
                semaphore to increase.
             – GETZCNT -- Return the number of processes waiting for the value of a
                particular semaphore to reach zero.
             – GETALL -- Return the values for all semaphores in a set. In this
                case, arg is taken as arg.array, a pointer to an array of unsigned
                shorts (see below).
             – SETALL -- Set values for all semaphores in a set. In this case, arg is
                taken as arg.array, a pointer to an array of unsigned shorts.
             – IPC_STAT -- Return the status information from the control
                structure for the semaphore set and place it in the data structure
                pointed to by arg.buf, a pointer to a buffer of type semid_ds.
             – IPC_SET -- Set the effective user and group identification and
                permissions. In this case, arg is taken as arg.buf.
             – IPC_RMID -- Remove the specified semaphore set.
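
A short sketch of SETVAL, GETVAL and IPC_RMID follows. It assumes Linux, where the calling program must define union semun itself (some other systems already define it in <sys/sem.h>). The helper name sem_set_and_get and the use of IPC_PRIVATE are choices made for this example only.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: Linux, where the caller has to define this union. */
union semun {
    int val;                 /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT, IPC_SET */
    unsigned short *array;   /* array for GETALL, SETALL */
};

/* Hypothetical helper: create a private set with one semaphore, set it
 * to `initial`, read it back, and remove the set.
 * Returns the value read, or -1 on error. */
int sem_set_and_get(int initial)
{
    union semun arg;
    int value = -1;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid < 0)
        return -1;

    arg.val = initial;
    if (semctl(semid, 0, SETVAL, arg) == 0)     /* arg is taken as arg.val */
        value = semctl(semid, 0, GETVAL);
    semctl(semid, 0, IPC_RMID);                 /* remove the semaphore set */
    return value;
}
```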

Semaphore Operations

         • semop() performs operations on a semaphore set. It is prototyped by:
         • int semop(int semid, struct sembuf *sops, size_t nsops);
         • The semid argument is the semaphore ID returned by a previous semget() call.
             The sops argument is a pointer to an array of structures, each containing
             the following information about a semaphore operation:
                 • The semaphore number
                 • The operation to be performed
                 • Control flags, if any.

         • The sembuf structure specifies a semaphore operation, as defined in
             <sys/sem.h>:

             struct sembuf {
                 ushort_t sem_num;       /* semaphore number */
                 short    sem_op;        /* semaphore operation */
                 short    sem_flg;       /* operation flags */
             };


         • There are two control flags that can be used with semop():
           – IPC_NOWAIT Can be set for any operations in the array. Makes the
             function return without changing any semaphore value if any operation
             for which IPC_NOWAIT is set cannot be performed. The function fails if
             it tries to decrement a semaphore more than its current value, or tests a
             nonzero semaphore to be equal to zero.

           – SEM_UNDO Allows individual operations in the array to be undone
             when the process exits.
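
The classic P (wait) and V (signal) operations can be built on semop() roughly as follows. This is a sketch assuming Linux (union semun is defined by the caller); the helper names sem_change and semop_demo are hypothetical. It also shows IPC_NOWAIT making semop() fail with EAGAIN instead of blocking.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: Linux, where the caller has to define this union. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Add `delta` to semaphore 0 of the set: -1 is P (wait), +1 is V (signal). */
static int sem_change(int semid, short delta, short flags)
{
    struct sembuf op;

    op.sem_num = 0;         /* semaphore number within the set */
    op.sem_op  = delta;     /* operation to perform */
    op.sem_flg = flags;     /* IPC_NOWAIT and/or SEM_UNDO */
    return semop(semid, &op, 1);
}

/* Hypothetical demo: returns 0 if every step behaves as described. */
int semop_demo(void)
{
    union semun arg;
    int ok = 0;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid < 0)
        return -1;

    arg.val = 1;                                        /* one resource free */
    if (semctl(semid, 0, SETVAL, arg) == 0 &&
        sem_change(semid, -1, SEM_UNDO) == 0 &&         /* P: 1 -> 0 */
        semctl(semid, 0, GETVAL) == 0 &&
        sem_change(semid, -1, IPC_NOWAIT) == -1 &&      /* would block */
        errno == EAGAIN &&
        sem_change(semid, +1, SEM_UNDO) == 0 &&         /* V: 0 -> 1 */
        semctl(semid, 0, GETVAL) == 1)
        ok = 1;

    semctl(semid, 0, IPC_RMID);
    return ok ? 0 : -1;
}
```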

Mutual Exclusion and Synchronization

         • Mutual exclusion
           • Guarantee that no two processes (or other types of agents) should access
             a critical region at the same time
         • Synchronization
           • Guarantee that one step should occur before/after another
         • Use lock, semaphore, or monitor to achieve mutual exclusion and/or
           synchronization

Critical Region and Locking
How to Implement the Lock?

           • How to use lock for mutual exclusion?
                lock (lck);
                < critical section >
                unlock (lck);
           • How to use lock for synchronization?
              – E.g., Guarantee that S12 executes before S22
                             P1: S11, S12           P2: S21, S22
                  lock (lck);
                  P1: S11, S12, unlock (lck);
                  P2: S21, lock (lck); S22

Bounded Buffer Problem

         •   Analysis
              – full: indicates how far from full
                  •   full.count = 0 → n items in buffer → producer waits
                  •   producer decreases full.count
                  •   consumer increases full.count
              – empty: indicates how far from empty
                  •   empty.count = 0 → 0 items in buffer → consumer waits
                  •   producer increases empty.count
                  •   consumer decreases empty.count
             –   Both have to wait on mutex before accessing buffer
             –   If only one producer and one consumer, do we still have to use mutex?
                  •   No, each uses a different buffer slot
                  •   But we still need to use full and empty semaphores
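
The analysis above can be turned into a small program. This sketch uses POSIX unnamed semaphores (sem_init) and Pthreads, which is an assumption: the notes do not say which semaphore API is meant. It keeps the naming used above, so full counts the free slots and empty counts the stored items. SLOTS, ITEMS and run_bounded_buffer are made-up names.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define SLOTS 4       /* buffer capacity (n) */
#define ITEMS 100     /* items to transfer */

static int buffer[SLOTS];
static int in_pos, out_pos;
static sem_t full;    /* free slots left: how far from full */
static sem_t empty;   /* items stored: how far from empty */
static sem_t mutex;   /* protects the buffer and the positions */

static void *producer(void *arg)
{
    int i;
    (void)arg;
    for (i = 1; i <= ITEMS; i++) {
        sem_wait(&full);            /* waits when the buffer is full */
        sem_wait(&mutex);
        buffer[in_pos] = i;
        in_pos = (in_pos + 1) % SLOTS;
        sem_post(&mutex);
        sem_post(&empty);           /* one more item stored */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    long *sum = arg;
    int i;
    for (i = 0; i < ITEMS; i++) {
        sem_wait(&empty);           /* waits when the buffer is empty */
        sem_wait(&mutex);
        *sum += buffer[out_pos];
        out_pos = (out_pos + 1) % SLOTS;
        sem_post(&mutex);
        sem_post(&full);            /* one more free slot */
    }
    return NULL;
}

/* Run one producer and one consumer; return the sum of consumed items. */
long run_bounded_buffer(void)
{
    pthread_t prod, cons;
    long sum = 0;

    in_pos = out_pos = 0;
    if (sem_init(&full, 0, SLOTS) != 0 || sem_init(&empty, 0, 0) != 0 ||
        sem_init(&mutex, 0, 1) != 0)
        return -1;
    if (pthread_create(&prod, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&cons, NULL, consumer, &sum) != 0) {
        pthread_cancel(prod);       /* producer would block forever */
        pthread_join(prod, NULL);
        return -1;
    }
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    sem_destroy(&full);
    sem_destroy(&empty);
    sem_destroy(&mutex);
    return sum;
}
```

With the items 1 to 100 the consumer should see a total of 5050 if nothing is lost or duplicated.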
Reader-Writer Problem

        • Many readers can read at the same time without causing problems
        • A writer has to write exclusively
        • First reader/writer problem (readers have priority)
           – If at least one reader is in the CS → allow more readers to get in
           – If a writer is in the CS, no one else can enter
           – This way readers have a higher priority
           – Writers can starve
        • There are other forms of reader/writer problem
        • Protect shared objects within an abstraction
           – Provide encapsulation
           – Accesses to a shared object are confined within a monitor
           – Easier to debug the code
        • Provide mutually exclusive accesses
           – No two processes can be active at the same time within a monitor

Monitor and Semaphore
• Monitor provides encapsulation
  – What should be encapsulated?
  – Too much → reduced concurrency
     • Some parts of the code that could be executed
         concurrently, if encapsulated in the monitor, cause
         reduced concurrency
       • If not encapsulate them, then lose the meaning of
       • Reader/writer example
•   If monitor is used to do what a semaphore would do
    – Monitor is more expensive
                  UNIT – II

Overview of TCP/IP
Socket address Structures
Byte ordering functions
Byte Manipulation Functions
Address conversion functions
Elementary TCP Sockets program
Iterative Server
Concurrent Server
                                      Transport Layer
•   OSI Layer Protocols
•   Some protocols used
     – User datagram protocol
     – Transmission Control Protocol
•   TCP Connection establishment and Termination
•   TCP State Transition
•   Port Number
•   Protocols are the standards that specify how data is represented when being transferred
    from one machine to another.
•   Protocols specify how the transfer occurs, how errors are detected, and how
    acknowledgements are passed.
•   To simplify protocol design and implementation, communication stacks are segregated
    into layers that can be solved independently.
•   Each layer is assigned a separate protocol.

                                     Network Layer
•   The Network Layer is responsible for establishing paths for data transfer through the
    network.
     – IP
     – IGMP
     – ICMP
     – ARP
     – RARP

• IP :
   – IP is a network layer protocol in the Internet protocol suite
   – Addressing.
   – Packet timeouts
   – Options : trace the route a packet takes (record route), label packets with security
• IPV4 :
   – Uses 32 bit addresses

• IPV6
   – Uses 128 bit addresses 3ffe:ffff:101::230:6eff:fe04:d9ff.

• ICMP :
   – Internet Control Message Protocol
   – Handles error and control information between routers and hosts
• IGMP :
   – Internet Group Management Protocol
   – Used with multicasting

• ARP :
   – The Address Resolution Protocol (ARP)
   – Standard method for finding a host's hardware address when only its network layer
      address is known.

• RARP :
   – Reverse Address Resolution Protocol (RARP)
   – Used to obtain an IP address for a given hardware address (such as an Ethernet
      address)

                                      Transport Layer
•   Process-Level Addressing:
•   Segmentation, Packaging and Reassembly:
•   Multiplexing and Demultiplexing:
•   Connection Establishment, Management and Termination
•   Acknowledgments and Retransmissions
•   Flow Control:

                                  User Datagram Protocol
• UDP is not reliable
   – In case of checksum error or datagram dropped in the network
   – It is not retransmitted
• UDP Datagram has length
   – The length of the data is passed along with the data so it has record boundaries
     unlike the TCP which does not have boundaries
• Provides a connectionless service
   – No long term relationship between the client and the server
                               Transmission Control Protocol
                          Feature of Transmission Control Protocol
• Connections :
   – Provides connection between clients and servers
• Reliability
   – An acknowledgement is required when data is sent over the network
   – If the acknowledgement is not received, TCP automatically retransmits the data
   – UDP is not reliable
   – TCP contains an algorithm to estimate the Round Trip Time between the client and
     the server

• Sequences Data :
   – TCP sequences the data by associating a sequence number with every byte that it
     sends
   – If bytes arrive out of order, TCP can reorder them.
   – If duplicate data arrives, it discards the duplicate data
• Flow Control
   – Advertised window : TCP tells the peer how many bytes of data it is willing to
     accept from the peer at any one time
   – As data is received from the sender, the window size decreases, but as the
     receiving application reads data from the buffer, the window size increases
   – UDP does not provide flow control

• Full duplex connection
   – The application can send and receive data in both the directions on a given
     connection at any given time
   – UDP can be full duplex


           • A socket is an end to end communication link between a server and a client
             application. This allows applications to be network aware, and send and
             receive data via a network. Interface details vary from computer to
             computer.
           • Applications consist of a server portion and a client portion. An application
             program requests the operating system to create a socket connection. Each
             time a socket connection is used, the application program must specify the
             destination address, or alternatively, bind the IP address to the socket.
           • Sockets use a destination address and port number to communicate with
             another application. Each connection uses a specific port number, some of
             which are reserved (see /etc/services).

Types of sockets

           • Internet Sockets,
           • unix sockets,
           • X.25 sockets
          • Two Types of Internet Sockets
            • Stream Sockets (SOCK_STREAM)
                   • Connection oriented, rely on TCP to provide reliable two-way
                      connected communication
            • Datagram Sockets (SOCK_DGRAM)
                   • Rely on UDP; connectionless and unreliable

Datagram sockets

          • The datagram protocol, also known as UDP, is connectionless.
          • This means that each time a datagram (a packet of data to a destination) is
            sent, the socket and the destination computer's address must be included.
            There is a limit of 64KB for datagrams sent to a specific location.
          • UDP is also unreliable, as there is no guarantee that the datagrams sent will
            arrive at all, or arrive in the same order, at the destination.

          • The files to include in application programs that define sockets and the
            various calls associated with them are
          • sys/types.h

Data types

          • a socket descriptor : int
          • struct sockaddr
          • struct sockaddr_in

Creating a socket

          The socket() call creates a socket on demand. The format is
          • int s;
            s = socket( AF_INET, SOCK_DGRAM, 0 );
            /* specify TCP/IP and use datagrams */

          • If the socket was not created, -1 is returned to indicate an error. When a
            socket is created, it is in an unconnected state. An application program
            normally uses the system call connect() to bind a destination address to the
            socket and place it into a connected state.
          • Sockets can be used either as connectionless datagrams or as a more reliable
            stream. With connectionless datagrams (UDP), there is no guarantee of delivery.
            With TCP sockets, data delivery is guaranteed.
          • sendto() and sendmsg() suit UDP because they take the destination address
            as part of the call; recvfrom() reports the address of the sender.

Setting up a destination address and port number

          • An application program creates a variable of type struct sockaddr_in, then
            assigns the destination address and port number to this variable. In sending
            or receiving data on the socket connection, this variable is passed as a
            parameter.

          • struct sockaddr_in server;

             /* set up server name and port number */
             server.sin_family = AF_INET; /* use TCP/IP */
             server.sin_port = htons(800); /* port 800, in network byte order */
             server.sin_addr.s_addr = inet_addr("");

Binding the destination address

          • Rather than specify the destination address in each call, the destination
            address can be bound to the socket.
          • /* set up the server connection side */
            server.sin_family = AF_INET;          /* use TCP/IP */
            server.sin_port = 0;                  /* use first available port */
            server.sin_addr.s_addr = INADDR_ANY;
            if( bind( s, (struct sockaddr *) &server, sizeof(server) ) < 0 ) {
                perror("Error, socket not bound.");
                exit(3);
            }

Sending data to the socket connection

          • There are five possible system calls that an application program can use to
            send data to a socket. They are send(), sendto(), sendmsg(), write() and
            writev().
          • The following code fragment sends data to the port.
          • char buf[32];
            strcpy( buf, "Hello" );
            /* send only the string and its terminating '\0', not sizeof(buf)+1 bytes */
            sendto( s, buf, strlen(buf)+1, 0, (struct sockaddr *) &server, sizeof(server));

Receiving data from the socket connection

            The following code fragment receives data from the port.
          • char buf[32];
            int s;
            socklen_t client_address_size;
            struct sockaddr_in client, server;

            client_address_size = sizeof(client);  /* must be set before the call */
            if( recvfrom( s, buf, sizeof(buf), 0, (struct sockaddr *) &client,
                          &client_address_size) < 0 ) {
                perror("Error getting data from socket connection.");
                exit( 4 );
            }

Closing the socket connection

           • When the application program is finished, the socket connection should be
             closed.
           • close( s );
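
Putting the datagram steps together (create, bind, send, receive, close), a possible sketch on one machine looks like this. The helper name udp_loopback is hypothetical, and the loopback address with a kernel-chosen port is used only so that the example does not depend on any real server address.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: send `msg` from one UDP socket to another over the
 * loopback interface. Returns the number of bytes received, or -1. */
int udp_loopback(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in server, client;
    socklen_t len;
    int n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);    /* receiving socket */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);    /* sending socket */

    if (rx < 0 || tx < 0)
        goto done;

    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(0);                 /* let the kernel pick a port */
    server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&server, sizeof(server)) < 0)
        goto done;

    len = sizeof(server);                       /* find out the chosen port */
    if (getsockname(rx, (struct sockaddr *)&server, &len) < 0)
        goto done;

    if (sendto(tx, msg, strlen(msg) + 1, 0,
               (struct sockaddr *)&server, sizeof(server)) < 0)
        goto done;

    len = sizeof(client);                       /* must be set before the call */
    n = (int)recvfrom(rx, out, outlen, 0, (struct sockaddr *)&client, &len);

done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```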

Byte Ordering functions
         • htons() - short integer from host byte order to network byte order.
         • ntohs() - short integer from network byte order to host byte order.
         • htonl() - long integer from host byte order to network byte order.
         • ntohl() - long integer from network byte order to host byte order.
         • S = 16 bit ( port number )
         • L = 32 bit (IPv4 addresses)
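
The effect of these functions is easiest to see by looking at the bytes. In this sketch the helper names port_to_wire and port_from_wire are made up; they store a port number exactly as it would appear in a packet (most significant byte first) and read it back.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Store `port` in network byte order, as the two bytes sent on the wire. */
void port_to_wire(uint16_t port, unsigned char wire[2])
{
    uint16_t net = htons(port);         /* host order -> network order */
    memcpy(wire, &net, sizeof(net));
}

/* Read a port number back from its two wire bytes. */
uint16_t port_from_wire(const unsigned char wire[2])
{
    uint16_t net;
    memcpy(&net, wire, sizeof(net));
    return ntohs(net);                  /* network order -> host order */
}
```

Port 800 is 0x0320, so the wire bytes are 0x03 followed by 0x20 on every machine, whatever the host byte order is.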

Stream protocol

           • The stream protocol, also known as TCP, is connection-oriented.
           • This requires a connection to be established between the sender and
             the receiver.
           • One of the sockets listens for a connection request (the server), the other
             socket asks for a connection (the client).
           • When the server accepts the connection request from the client, data can
             then be sent between the server and client.
           • In TCP there is no limit on the amount of data that can be transmitted. TCP
             is also a reliable protocol, in that data is received in the same order in
             which it was sent.

Now Let us write the TCP Program

              The main aim of the client is

              •   to create a socket
              •   Get and store the server's address
              •   Use the connect function to establish connection with the server
              •   Send and receive messages
•     include the following header files
    –    #include <stdio.h>
    –    #include <stdlib.h>
    –    #include <unistd.h>
    –    #include <string.h>
    –    #include <netdb.h> /* because we use the hostent structure */
    –    #include <sys/types.h>
    –    #include <netinet/in.h> /* for Internet family address structures */
    –    #include <sys/socket.h>
•     declare the two integer values
    –    numbytes,
    –    sockfd
•     declare a char buff of size 100
•     declare a pointer variable hname of type struct hostent
•     declare a variable serveraddr of type sockaddr_in

• use the gethostbyname() function to lookup the server name and check
  if it was successful or not . Note that the return value must be received
  in hname variable

• call the socket function and check if it is successful or not and store the
  return value in sockfd

• initialize the values of serveraddr to 0, using the bzero function

• Initialize the values of sin_family and sin_port of serveraddr (convert the
  port number to network byte order with htons())

• get the h_addr of the hname and type cast it to struct in_addr pointer
  and assign this value to the sin_addr as follows

      serveraddr.sin_addr = *((struct in_addr *)hname->h_addr);

• use the connect function to establish the connection with the server and
  check if it is successful or not

• Now you can send and receive the data

• use recv() function to receive the data sent by the server and check if it
  was successfully received or not

• Print the data that was received on the screen
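
The client steps above could be sketched as follows, up to the point where the connection is established. The function name connect_to_server is hypothetical; the variable names hname, sockfd and serveraddr follow the steps. memset() is used here in place of bzero(), which does the same job.

```c
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: look up `host`, create a TCP socket and connect
 * to `port`. Returns the connected socket descriptor, or -1 on error. */
int connect_to_server(const char *host, unsigned short port)
{
    struct hostent *hname;
    struct sockaddr_in serveraddr;
    int sockfd;

    hname = gethostbyname(host);                /* look up the server name */
    if (hname == NULL || hname->h_addrtype != AF_INET)
        return -1;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);   /* stream (TCP) socket */
    if (sockfd < 0)
        return -1;

    memset(&serveraddr, 0, sizeof(serveraddr)); /* same effect as bzero() */
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_port = htons(port);          /* network byte order */
    memcpy(&serveraddr.sin_addr, hname->h_addr_list[0],
           sizeof(serveraddr.sin_addr));

    if (connect(sockfd, (struct sockaddr *)&serveraddr,
                sizeof(serveraddr)) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;      /* ready for send() and recv() */
}
```

After a successful call, the program would use recv(sockfd, buff, sizeof(buff) - 1, 0), store the result in numbytes, terminate the buffer and print it.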

The main aim of the server is
         •   to create a socket
         •   Bind it to the local protocol address
         •   Listen for any incoming requests
         •   Accept the requests
         •   Send and receive messages

         •     declare two integer values – sockfd and newsockfd

         •     declare a variable myaddress of type structure sockaddr_in;

         •     declare a variable clientaddress of type structure sockaddr_in;

         •     using the socket function create a socket and check if it was
               successful or not

         •     initialize the values of the myaddress to zero using bzero function

         •     Initialize the values of sin_family, sin_port and
               sin_addr of myaddress

         • Use the bind() function to assign the local protocol address to the
           socket and check if it is successful or not

         • Use the listen() function to wait for a connection and check if it is
           successful or not

         • use the accept () function to accept a client connection and store the
           return value in the newsockfd variable . Check if it is successful or not

         • use the send function to send some data to the client and check if it was
           successfully sent or not ( note that you must use the newsockfd to send
           the data and not the sockfd)
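
A possible sketch of the server steps is given below, split into two hypothetical helpers: make_listener covers socket(), bind() and listen(), and serve_one_client covers accept() and send(). The backlog of 5 and the use of INADDR_ANY are assumptions. An iterative server would call serve_one_client in a loop.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: create a TCP socket bound to `port` on any local
 * address and start listening. Returns the descriptor, or -1 on error. */
int make_listener(unsigned short port)
{
    struct sockaddr_in myaddress;
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd < 0)
        return -1;

    memset(&myaddress, 0, sizeof(myaddress));       /* same as bzero() */
    myaddress.sin_family = AF_INET;
    myaddress.sin_port = htons(port);
    myaddress.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(sockfd, (struct sockaddr *)&myaddress, sizeof(myaddress)) < 0 ||
        listen(sockfd, 5) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}

/* Hypothetical helper: accept one client and send it `msg` (including the
 * terminating '\0'). Returns the number of bytes sent, or -1 on error. */
int serve_one_client(int sockfd, const char *msg)
{
    struct sockaddr_in clientaddress;
    socklen_t len = sizeof(clientaddress);
    int sent;
    int newsockfd = accept(sockfd, (struct sockaddr *)&clientaddress, &len);

    if (newsockfd < 0)
        return -1;
    /* data goes to the connected socket, not to the listening one */
    sent = (int)send(newsockfd, msg, strlen(msg) + 1, 0);
    close(newsockfd);
    return sent;
}
```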


         •   Client does not establish a connection with the server
         •   Instead, the client just sends a datagram to the server using sendto()
         •   Similarly the server does not accept a connection from the client.
             Instead it just calls a recvfrom()

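         • ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags, struct
             sockaddr *from, socklen_t *addrlen);
         • ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags, const
             struct sockaddr *to, socklen_t addrlen);
         • Note that recvfrom() takes a pointer to the address length (it is filled in
             on return), while sendto() takes the length by value.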


inet_aton function
              o converts the specified string, in the Internet standard dot notation, to a
                  network address, and stores the address in the structure provided.
                 int inet_aton(const char *cp, struct in_addr *addr);
              o Ex
                  inet_aton("", &serveraddress);
              o returns a nonzero value if the address is successfully converted, or 0 if
                  the conversion fails.

inet_addr function
               o converts the specified string, in the Internet standard dot notation, to
                 an integer value suitable for use as an Internet address.
               o The converted address is in network byte order (bytes ordered from
                 left to right).
                 in_addr_t inet_addr(const char *cp);
               o Ex:
                  struct in_addr a;
                  a.s_addr = inet_addr("");
               o On success, inet_addr() returns the Internet address. Otherwise, it
                 returns INADDR_NONE (usually -1).
inet_ntoa function
               o converts the specified Internet host address to a string in the Internet
                  standard dot notation.
                 char *inet_ntoa(struct in_addr in);
               o returns a pointer to the network address in Internet standard dot
                 notation.


               o int inet_net_pton(int af, const char *src, void *dst, size_t size);
               o converts an Internet network number from presentation format (either
                 Internet standard dot notation, or Classless Inter-Domain Routing
                 (CIDR) format) to network format
               o Used for IPV6

               o converts an address from network format to presentation format.
               o Used in IPV6
                 const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);
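
A small sketch of the conversion functions follows. The helper names ipv4_roundtrip and ipv6_roundtrip are hypothetical. The IPv4 helper uses inet_aton() and inet_ntoa(); the IPv6 helper uses inet_pton() together with inet_ntop(), and inet_pton() is an addition here, chosen as the usual counterpart of inet_ntop().

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Parse a dotted-decimal IPv4 string and format it again.
 * Returns 0 on success, -1 if `text` is not a valid address. */
int ipv4_roundtrip(const char *text, char *out, size_t outlen)
{
    struct in_addr addr;

    if (outlen == 0 || inet_aton(text, &addr) == 0)
        return -1;
    /* inet_ntoa() returns a static buffer, so copy the result at once */
    strncpy(out, inet_ntoa(addr), outlen - 1);
    out[outlen - 1] = '\0';
    return 0;
}

/* Same idea for IPv6, using the address-family independent functions.
 * Returns 0 on success, -1 on error. */
int ipv6_roundtrip(const char *text, char *out, socklen_t outlen)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;
    return inet_ntop(AF_INET6, &addr, out, outlen) != NULL ? 0 : -1;
}
```

The addresses used to try this out, for example 192.0.2.10, are arbitrary sample values.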
             UNIT – III

Posix Signal handling
I/O multiplexing –I/O Models
select function
shutdown function -poll function

A signal is a limited form of inter-process communication used in Unix, Unix-like, and
other POSIX-compliant operating systems. Essentially it is an asynchronous notification
sent to a process in order to notify it of an event that occurred. When a signal is sent to a
process, the operating system interrupts the process's normal flow of execution.
Execution can be interrupted during any non-atomic instruction. If the process has
previously registered a signal handler, that routine is executed. Otherwise the default
signal handler is executed.

Signal          Description
SIGABRT         Process aborted
SIGALRM         Signal raised by alarm
SIGBUS          Bus error: "access to undefined portion of
                memory object"
SIGCHLD         Child process terminated, stopped (or
                continued)
SIGCONT         Continue if stopped
SIGFPE          Floating point exception: "erroneous arithmetic
                operation"
SIGHUP          Hangup
SIGILL          Illegal instruction
SIGINT          Interrupt
SIGKILL         Kill (terminate immediately)
SIGPIPE         Write to pipe with no one reading
SIGQUIT         Quit and dump core
SIGSEGV         Segmentation violation
SIGSTOP         Stop executing temporarily
SIGTERM         Termination (request to terminate)
SIGTSTP         Terminal stop signal
SIGTTIN         Background process attempting to read from
                tty ("in")
SIGTTOU         Background process attempting to write to tty
                ("out")
SIGUSR1         User-defined 1
SIGUSR2         User-defined 2
SIGPOLL         Pollable event
SIGPROF         Profiling timer expired
SIGSYS          Bad syscall
SIGTRAP         Trace/breakpoint trap
SIGURG          Urgent data available on socket
SIGVTALRM       Signal raised by timer counting virtual time:
                "virtual timer expired"
SIGXCPU         CPU time limit exceeded
SIGXFSZ         File size limit exceeded


         Type of I/O Models
             o Blocking I/O
             o Non Blocking I/O
             o I/O Multiplexing
             o Signal driven I/O
             o Asynchronous I/O

         Synchronous I/O Vs Asynchronous I/O
             o Synchronous I/O
                    Causes the requesting process to be blocked until that I/O
                        operation completes
                    Eg : Blocking, non blocking, I/O multiplexing, signal driven
             o Asynchronous I/O
                    Does not cause the requesting process to be blocked
   Blocking I/O ( eg. Socket)

   Non Blocking I/O
       o Polling : continually checking the kernel to see if the datagram is ready
       o It is a waste of CPU Time

   I/O Multiplexing

       o The process blocks in a call to select, waiting for one of possibly many
         sockets to become readable
       o Slight disadvantage: requires two system calls (select, then the read)
       o Advantage: can wait for more than one descriptor
   Signal driven I/O
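       o The kernel notifies the process with the SIGIO signal when the descriptor
         is ready
       o The process is not blocked while waiting for the datagram to arrive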

   Asynchronous I/O
       o Signal-driven I/O tells us when the I/O operation can be initiated
       o Asynchronous I/O tells us when the I/O operation is complete
       o Few systems support POSIX asynchronous I/O
   Comparison

   Select Function
        o This function allows the process to instruct the kernel to wait for any
            one of multiple events to occur, and to wake up the process only when
            one or more of these events occurs or when a specified amount of
            time has passed
        o For example, we can call select and tell the kernel to return only when
              – any of the descriptors in the set {1,4,5} are ready for reading
              – any of the descriptors in the set {2,7} are ready for writing
              – any of the descriptors in the set {1,4} have an exception condition
              – 10 seconds have elapsed

           int select(int nfds,             /* highest descriptor number + 1 */
                      fd_set *readfds,
                      fd_set *writefds,
                      fd_set *exceptfds,
                      struct timeval *timeout);

           struct timeval {
               long tv_sec;                 /* seconds */
               long tv_usec;                /* microseconds */
           };
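
A call to select with a single descriptor and a timeout can be sketched as follows. This is a minimal sketch, assuming a POSIX system; wait_readable and read_with_timeout are hypothetical helper names, and the return code -2 for a timeout is a convention chosen only for this example.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes POSIX declarations */
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Wait until fd is readable or timeout_sec seconds have passed.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* First argument: highest descriptor number in any set, plus one. */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/*
 * The two system calls of I/O multiplexing: select, then read.
 * Returns bytes read, -1 on error, or -2 on timeout.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_sec)
{
    int ready = wait_readable(fd, timeout_sec);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    return read(fd, buf, len);
}
```

To wait on several descriptors, add each one to the set with FD_SET and pass the largest descriptor number plus one as nfds.
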
                       UNIT IV

Socket options – getsockopt and setsockopt functions –
Generic socket options –
IP socket options –
ICMP socket options – TCP socket options –
Elementary UDP sockets –
Domain name system –
gethostbyname function – IPv6 support in DNS –
gethostbyaddr function –
getservbyname and getservbyport functions

       SOL_SOCKET
          o Use this constant as the level argument to getsockopt or setsockopt to
             manipulate the socket-level options described in this section.
       SO_DEBUG
          o This option toggles recording of debugging information in the
             underlying protocol modules. The value has type int; a nonzero value
             means "yes".
       SO_REUSEADDR
          o This option controls whether bind (see Setting Address) should permit
             reuse of local addresses for this socket. If you enable this option, you
             can actually have two sockets with the same Internet port number; but
             the system won't allow you to use the two identically-named sockets in
             a way that would confuse the Internet. The reason for this option is
             that some higher-level Internet protocols, including FTP, require you
             to keep reusing the same port number.
          o The value has type int; a nonzero value means "yes".
       SO_KEEPALIVE
          o This option controls whether the underlying protocol should
             periodically transmit messages on a connected socket. If the peer fails
             to respond to these messages, the connection is considered broken. The
             value has type int; a nonzero value means "yes".
       SO_DONTROUTE
          o This option controls whether outgoing messages bypass the normal
             message routing facilities. If set, messages are sent directly to the
             network interface instead. The value has type int; a nonzero value
             means "yes".
       SO_LINGER
          o This option specifies what should happen when a socket of a type
             that promises reliable delivery still has untransmitted messages when it
             is closed; see Closing a Socket. The value has type struct linger.
          o Data Type: struct linger. This structure type has the following members:
               – int l_onoff: interpreted as a boolean. If nonzero, close blocks
                  until the data are transmitted or the timeout period has expired.
               – int l_linger: the timeout period, in seconds.
       SO_BROADCAST
          o This option controls whether datagrams may be broadcast from the
             socket. The value has type int; a nonzero value means "yes".
       SO_OOBINLINE
          o If this option is set, out-of-band data received on the socket is placed
             in the normal input queue. This permits it to be read using read or recv
             without specifying the MSG_OOB flag. See Out-of-Band Data. The
             value has type int; a nonzero value means "yes".
       SO_SNDBUF
          o This option gets or sets the size of the output buffer. The value has
             type int and gives the size in bytes.
       SO_RCVBUF
          o This option gets or sets the size of the input buffer. The value has
             type int and gives the size in bytes.
       SO_STYLE
       SO_TYPE
          o This option can be used with getsockopt only. It is used to get the
             socket's communication style. SO_TYPE is the historical name, and
             SO_STYLE is the preferred name in GNU. The value has type int and
             its value designates a communication style; see Communication
             Styles.
       SO_ERROR
          o This option can be used with getsockopt only. It is used to reset the
             error status of the socket. The value is an int, which represents the
             previous error status.

Here are the functions for examining and modifying socket options. They are declared in
sys/socket.h.

Function: int getsockopt (int socket, int level, int optname, void *optval, socklen_t
*optlen-ptr)

The getsockopt function gets information about the value of option optname at level
level for socket socket.

The option value is stored in a buffer that optval points to. Before the call, you should
supply in *optlen-ptr the size of this buffer; on return, it contains the number of bytes of
information actually stored in the buffer.

Most options interpret the optval buffer as a single int value.

The actual return value of getsockopt is 0 on success and -1 on failure. The following
errno error conditions are defined:

        EBADF        The socket argument is not a valid file descriptor.
        ENOTSOCK     The descriptor socket is not a socket.
        ENOPROTOOPT  The optname doesn't make sense for the given level.

Function: int setsockopt (int socket, int level, int optname, const void *optval,
socklen_t optlen)

This function is used to set the socket option optname at level level for socket socket. The
value of the option is passed in the buffer optval of size optlen.

Generic Socket Options

SO_BROADCAST: permit sending of broadcast datagram, only on broadcast links

SO_DEBUG: enable debug tracing of packets sent or received by a TCP socket; the trpt
program can be used to examine the kernel's circular buffer

SO_DONTROUTE: bypass routing table lookup, used by routing daemons (routed and
gated) in case the routing table is incorrect

SO_ERROR: get pending error and clear

SO_KEEPALIVE: periodically test whether a TCP connection is still alive (every 2 hours by default)

SO_LINGER: linger on TCP close if data remains in the socket send buffer; a linger structure is passed

SO_OOBINLINE: leave received out-of-band data inline in the normal input queue

SO_RCVBUF/SO_SNDBUF: socket receive / send buffer size, TCP default: 8192-
61440, UDP default: 40000/9000

SO_RCVLOWAT/SO_SNDLOWAT: receive / send buffer low-water mark, used by select

SO_RCVTIMEO/SO_SNDTIMEO: receive / send timeout for socket read/write

SO_REUSEADDR/SO_REUSEPORT: allow local address reuse for TCP server restart,
IP alias, UDP duplicate binding for multicasting

SO_TYPE: get socket type, SOCK_STREAM or SOCK_DGRAM

SO_USELOOPBACK: routing socket gets copy of what it sends

IPv4 Socket Options

IP_HDRINCL: IP header included with data, e.g. traceroute builds own IP header on a
raw socket
IP_OPTIONS: specify IP options such as source route, timestamp, record route, etc.

IP_RECVDSTADDR: return destination IP address of a received UDP datagram by recvmsg

IP_RECVIF: return received interface index for a received UDP datagram by recvmsg

IP_TOS: set IP TOS field of outgoing packets for TCP/UDP socket

IP_TTL: set and fetch the default TTL for outgoing packets, 64 for TCP/UDP sockets,
255 for raw sockets, used in traceroute
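
IP-level options use IPPROTO_IP as the level argument instead of SOL_SOCKET. Setting and fetching IP_TTL can be sketched as follows; this is a minimal sketch, assuming a POSIX system with IPv4, and set_ttl and get_ttl are hypothetical helper names.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes POSIX declarations */
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Set the TTL used for outgoing packets. Returns 0 on success, -1 on failure. */
int set_ttl(int sockfd, int ttl)
{
    assert(ttl >= 1 && ttl <= 255);   /* the TTL field is 8 bits wide */
    return setsockopt(sockfd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}

/* Fetch the TTL currently used for outgoing packets, or -1 on failure. */
int get_ttl(int sockfd)
{
    int ttl = 0;
    socklen_t len = sizeof(ttl);

    if (getsockopt(sockfd, IPPROTO_IP, IP_TTL, &ttl, &len) == -1)
        return -1;
    return ttl;
}
```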


IPv6 Socket Options

ICMP6_FILTER: fetch and set icmp6_filter structure specifying message types to pass

IPV6_ADDRFORM: change address format of socket between IPv4 and IPv6

IPV6_CHECKSUM: offset of checksum field for raw socket

IPV6_DSTOPTS: return destination options of received datagram by recvmsg

IPV6_HOPLIMIT: return hop limit of received datagrams by recvmsg

IPV6_HOPOPTS: return hop-by-hop options of received datagrams by recvmsg

IPV6_NEXTHOP: specify next hop address as a socket address structure for a datagram

IPV6_PKTINFO: return packet info, destination IPv6 address and arriving interface, of received
datagrams by recvmsg

IPV6_PKTOPTIONS: specify socket options of TCP socket

IPV6_RTHDR: receive source route


TCP Socket Options

TCP_KEEPALIVE: seconds between probes

TCP_MAXRT: TCP maximum retransmission time

TCP_MAXSEG: TCP max segment size

TCP_NODELAY: disable the Nagle algorithm (the algorithm that reduces the number of small packets)

TCP_STDURG: interpretation of TCP’s urgent pointer, used with out-of-band data
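
TCP-level options use IPPROTO_TCP as the level argument. Toggling TCP_NODELAY and reading TCP_MAXSEG can be sketched as follows; this is a minimal sketch, assuming a POSIX system, and set_nodelay, get_nodelay and get_max_segment are hypothetical helper names.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes POSIX declarations */
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* on = 1 disables the Nagle algorithm, on = 0 enables it again. */
int set_nodelay(int sockfd, int on)
{
    assert(on == 0 || on == 1);
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on failure. */
int get_nodelay(int sockfd)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &on, &len) == -1)
        return -1;
    return on != 0;
}

/* Returns the maximum segment size reported for the socket, or -1 on failure. */
int get_max_segment(int sockfd)
{
    int mss = 0;
    socklen_t len = sizeof(mss);

    if (getsockopt(sockfd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == -1)
        return -1;
    return mss;
}
```

Interactive applications that send many small writes are the usual reason to set TCP_NODELAY; bulk transfers normally leave the Nagle algorithm enabled.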

Condition variables –
Raw sockets – raw socket creation –
raw socket input – raw socket output –
ping program –
trace route program

• Usually, sockets are used to build applications on top of a transport protocol
   – Stream sockets (TCP)
   – Datagram sockets (UDP)
• Some applications need to access a lower-layer protocol:
   – Control protocols built on IP rather than UDP or TCP, such as ICMP and IGMP
   – Experimental transport protocols
• A "raw" socket allows direct access to IP
• Used to build applications on top of the network layer

• Standard socket() call used to create a raw socket
   – Family is AF_INET, as for TCP or UDP
   – Socket type is SOCK_RAW instead of SOCK_STREAM or SOCK_DGRAM
   – Socket protocol needs to be specified, e.g. IPPROTO_ICMP
Features of Raw Socket
• Read and write ICMP and IGMP packets (instead of putting more code into the
  kernel, these protocols can be handled entirely in a user process)
• A process can read and write IPv4 datagrams with an IPv4 protocol field that is not
  processed by the kernel (most kernels process only ICMP, IGMP, TCP and UDP datagrams)
• A process can build its own IPv4 header using the IP_HDRINCL socket option

Raw socket creation

     • Only a superuser can create a raw socket
     • Created as follows
           sockfd = socket(AF_INET, SOCK_RAW, protocol);
     • Protocol: one of the IPPROTO_xxx constants, e.g. IPPROTO_ICMP
     • IP_HDRINCL option can be set as follows
         int on = 1;
         setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));
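
The two steps above can be combined with error checking. This is a minimal sketch, assuming a POSIX system with IPv4; open_raw_socket is a hypothetical helper name. Without sufficient privilege the socket call is expected to fail, so the caller must check the result.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes POSIX declarations */
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a raw IPv4 socket for `protocol` (an IPPROTO_xxx constant).
 * If hdrincl is nonzero, IP_HDRINCL is also set so that the caller
 * supplies its own IP header on output.
 * Returns the descriptor, or -1 with errno set.
 */
int open_raw_socket(int protocol, int hdrincl)
{
    int sockfd;

    assert(protocol >= 0 && protocol <= 255);   /* 8-bit IP protocol field */
    sockfd = socket(AF_INET, SOCK_RAW, protocol);
    if (sockfd == -1)
        return -1;          /* typically EPERM when not privileged */

    if (hdrincl) {
        const int on = 1;

        if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) == -1) {
            int saved = errno;      /* close() may overwrite errno */

            close(sockfd);
            errno = saved;
            return -1;
        }
    }
    return sockfd;
}
```

A ping-style program would call open_raw_socket(IPPROTO_ICMP, 0) and let the kernel build the IP header.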

Raw Socket Output
    • Normal output can be performed with sendto, or with send or write if the
      socket has been connected
    • The starting address of the data to be sent depends on IP_HDRINCL
       – If IP_HDRINCL is not set: the first byte following the IP header
       – If IP_HDRINCL is set: the first byte of the IP header
Raw Socket Input
    • Received TCP and UDP packets are never passed to a raw socket. To read
      them, a process must go through the data link layer
    • Most ICMP packets are passed to the raw socket after the kernel has finished
      processing the ICMP message

Internet Control Message Protocol

     •   ICMP messages
            – Query network node(s) for information
            – Report error conditions
     •   ICMP messages are carried as IP datagrams
     •   ICMP "uses" or is "above" IP
     •   ICMP messages are usually processed by IP, UDP, or TCP

Two important applications of Raw Sockets
    • Ping program
    • Trace route Program

     • Ping is a computer network tool used to test whether a particular host is
       reachable across an IP network
     • It is also used to self-test the network interface card of the computer
     • It works by sending ICMP "echo request" packets to the target host and
       listening for ICMP "echo reply" responses
     • Ping estimates the round-trip time, generally in milliseconds, records any
       packet loss, and prints a statistical summary when finished

                                ICMP ping packet

                          Bits 0-7        Bits 8-15         Bits 16-23    Bits 24-31
   IP header              Version/IHL     Type of service   Length
   (160 bits = 20 bytes)  Identification                    Flags and fragment offset
                          Time to live    Protocol          Header checksum
                          Source IP address
                          Destination IP address
   ICMP payload           Type of message Code              Checksum
   (64+ bits = 8+ bytes)  Quench (identifier and sequence number in echo messages)
                          Data (optional)

        Composition of an ICMP Echo Reply packet
           o IP header, with Protocol set to 1 (ICMP) and Type of Service set to 0
           o Type of ICMP message (8 bits)
           o Code (8 bits)
           o Checksum (16 bits), calculated over the ICMP part of the packet only (the
              IP header is not included)
           o Data payload, which depends on the kind of message
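
The ICMP fields listed above, and the checksum over the ICMP part only, can be sketched for an echo request as follows. This is a minimal sketch, not a complete ping program: inet_checksum and build_echo_request are hypothetical helper names, and the packet is built byte by byte so that the result does not depend on host byte order. The identifier and sequence number occupy the four bytes after the checksum, as is standard for echo messages.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_ECHO_REQUEST 8   /* ICMP type of an echo request */
#define ICMP_HEADER_LEN   8   /* type, code, checksum, identifier, sequence */

/* Internet checksum: one's complement of the one's complement sum
 * of the data taken as 16-bit big-endian words. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)(data[i] << 8);
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Build an ICMP echo request in buf.
 * Returns the packet length, or 0 if buf is too small.
 */
size_t build_echo_request(uint8_t *buf, size_t buflen,
                          uint16_t id, uint16_t seq,
                          const void *payload, size_t payload_len)
{
    size_t total = ICMP_HEADER_LEN + payload_len;
    uint16_t csum;

    assert(buf != NULL);
    if (total > buflen)
        return 0;

    buf[0] = ICMP_ECHO_REQUEST;         /* type */
    buf[1] = 0;                         /* code */
    buf[2] = 0;                         /* checksum is zero while summing */
    buf[3] = 0;
    buf[4] = (uint8_t)(id >> 8);        /* identifier */
    buf[5] = (uint8_t)(id & 0xFF);
    buf[6] = (uint8_t)(seq >> 8);       /* sequence number */
    buf[7] = (uint8_t)(seq & 0xFF);
    if (payload_len > 0)
        memcpy(buf + ICMP_HEADER_LEN, payload, payload_len);

    csum = inet_checksum(buf, total);
    buf[2] = (uint8_t)(csum >> 8);
    buf[3] = (uint8_t)(csum & 0xFF);
    return total;
}
```

A receiver can verify a packet by running inet_checksum over the whole ICMP message including the stored checksum; a result of 0 means the checksum is valid.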


     • Traceroute is a computer network tool used to determine the route taken by
       packets across an IP network.
     • Traceroute works by increasing the "time-to-live" value of each successive
       batch of packets sent.
     • The first three packets sent have a time-to-live (TTL) value of one (implying
       that they are not forwarded by the next router and make only a single hop).
     • The next three packets have a TTL value of 2, and so on.
     • When a packet passes through a host, normally the host decrements the TTL
       value by one and forwards the packet to the next host.
     • When a packet with a TTL of one reaches a host, the host discards the packet
       and sends an ICMP time exceeded (type 11) packet to the sender.
     • The traceroute utility uses these returning packets to produce a list of hosts that
       the packets have traversed en route to the destination.
     • The three timestamp values returned for each host along the path are the delay
       (latency) values, typically in milliseconds (ms), for each packet in the batch.
     • If a packet does not return within the expected timeout window, a star (asterisk)
       is traditionally printed.
     • Traceroute may not list the real hosts. It indicates that the first host is at one
       hop, the second host at two hops, etc., but IP does not guarantee that all the
       packets take the same route.
