TRAVERSAL OF RTSP
THROUGH
FIREWALLS & NAT DEVICES
Author: Jochanan Sommerfeld
Creation Date: 13-01-2002
Page 1 of 27
ABOUT THIS GUIDE
This document was written to help you design and configure firewall [FW] and network
address translation [NAT] solutions for video streaming environments that are based on
the Real Time Streaming Protocol [RTSP].
Anytime you have connections between networks with different security policies, you
need to provide protection, usually in the form of a firewall [FW] combined with network
address translation [NAT]. The Real Time Streaming Protocol [RTSP] introduces
dynamic conditions that exceed the capabilities of firewalls using static policies. While
this document lays a solid foundation for implementing RTSP in a firewall
environment by describing the interaction of the involved protocols with firewalls, it does
not provide configuration instructions for specific firewall products.
CONTENTS
ABOUT THIS GUIDE
CONTENTS
TARGET AUDIENCE
1. INTRODUCTION
2. REAL TIME STREAMING PROTOCOL OVERVIEW
2.1. REAL TIME STREAMING PROTOCOL DESCRIPTION
2.2. RTSP-STATES
3. OVERVIEW OF FIREWALL TECHNOLOGIES
3.1. PACKET FILTERING GATEWAYS
3.2. CIRCUIT-LEVEL GATEWAYS
3.3. APPLICATION LEVEL GATEWAYS
4. NETWORK ADDRESS TRANSLATION (NAT) TECHNOLOGIES
4.1. CLASSIC NETWORK ADDRESS TRANSLATION TECHNIQUES
4.2. STATIC NETWORK ADDRESS TRANSLATION
4.3. DYNAMIC NETWORK ADDRESS TRANSLATION
4.4. NETWORK ADDRESS AND PORT TRANSLATION (NAPT):
4.5. OTHER NETWORK ADDRESS TRANSLATION TECHNIQUES
5. ADVICES FOR SUCCESSFUL DEPLOYMENT OF RTSP
5.1. STATIC NAT IN FRONT OF THE RTSP SERVER
5.2. DYNAMIC NAT OR NAPT IN FRONT OF THE RTSP SERVER
5.3. VIRTUAL IP FOR SERVER LOAD BALANCING
5.4. DYNAMIC NAT OR NAPT IN FRONT OF THE RTSP PLAYER
5.5. FW IN FRONT OF THE RTSP SERVER
6. REFERENCES
TARGET AUDIENCE
This guide is intended for the following qualified personnel who are responsible for the
definition and configuration of the Enterprise Security Policy:
- System Administrators
- Network Administrators
- Security Administrators
- Project Managers
Knowledge of TCP/IP protocols is assumed and is necessary for understanding the
subject matter of this document. This document considers streaming via the RTSP
standard only. For details about the RTSP protocol, consult chapter 2 of this document
or RFC 2326.
1. INTRODUCTION
Chapter 2 introduces the reader to the Real Time Streaming Protocol and the
related protocols for video streaming. Chapters 3 and 4 provide a brief introduction to
the basic terms and concepts of firewalls and NAT devices. Readers familiar
with firewall and NAT technologies can move straight on to chapter 5, which explains
the barriers to successful deployment of RTSP in FW and NAT environments and
gives advice on how to implement RTSP through FWs and NAT devices in
such environments.
2. REAL TIME STREAMING PROTOCOL OVERVIEW
RTSP is an application-level protocol for control over the delivery of data with real-time
properties and can be seen as a network remote control. It provides an extensible
framework to enable controlled, on-demand delivery of real-time data, such as audio
and video streams using the Transmission Control Protocol (TCP) or the User
Datagram Protocol (UDP). All the known RTSP servers today are TCP-based. In
general, an RTSP server can use any type of packet format for sending media data to
an RTSP client. Today’s RTSP servers use the Real Time Transport Protocol (RTP)
over unicast UDP for sending media data to an RTSP client.
2.1. Real Time Streaming Protocol description
The RTSP client sets up three network channels with the RTSP server when media
data is delivered using RTP over UDP, as shown in Figure 5. A full-duplex TCP
connection is used for control and negotiation. A simplex UDP channel is used for
media data delivery using the RTP packet format. A full-duplex UDP channel called
RTCP is used to provide synchronization information to the client and packet loss
information to the server.
Figure 5: RTSP overview
[Figure: RTSP-Player and RTSP-Server are connected by the RTSP/TCP control connection
(default port 554), an RTP data channel on UDP port 2n, and an RTCP reports channel on
the odd UDP port 2n+1.]
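The port relationship in Figure 5 can be sketched in a few lines of Python. This is only an illustration: the function names `rtcp_port_for` and `session_channels` are hypothetical, and the even/odd pairing is the convention shown in the figure, not something every server is guaranteed to follow.

```python
def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an even RTP port (2n -> 2n+1)."""
    if rtp_port % 2 != 0:
        raise ValueError("RTP port is expected to be even")
    return rtp_port + 1


def session_channels(rtp_port: int, control_port: int = 554) -> dict:
    """List the three channels of one RTSP session as (protocol, port) pairs."""
    return {
        "control": ("tcp", control_port),        # RTSP control connection
        "rtp": ("udp", rtp_port),                # media data
        "rtcp": ("udp", rtcp_port_for(rtp_port)),  # sender/receiver reports
    }
```

For a client that offers RTP port 4588, `session_channels(4588)` lists TCP 554, UDP 4588 and UDP 4589, which are the three flows a firewall or NAT device has to associate with one another.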
2.2. RTSP-States
The following RTSP methods play a central role in the allocation and usage of
streaming resources on the streaming server. For simplicity I concentrate on the
five methods shown in Figure 6. Please consult RFC 2326 for additional information
about the remaining methods:
Figure 6: RTSP states
[Figure: message sequence between RTSP-Player and RTSP-Server]
DESCRIBE (TCP-554)
SETUP (TCP-554)
PLAY (TCP-554)
RTP of media (UDP dynamic)
RTCP (UDP dynamic)
PAUSE (TCP-554)
TEARDOWN (TCP-554)
DESCRIBE: Describes the requested media.
SETUP: Server allocates the required resources and starts an RTSP session.
PLAY and RECORD: Start the media stream that was allocated by SETUP.
PAUSE: Temporarily stops the stream without freeing the allocated resources.
TEARDOWN: Frees the resources and terminates the session.
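The effect of these methods on a session can be sketched as a small state machine. The state names Init, Ready and Playing follow the state machine in RFC 2326; the table below is a simplification that leaves out RECORD and all error handling, and the helper names are hypothetical.

```python
# Simplified session states; RECORD is deliberately omitted.
TRANSITIONS = {
    ("Init", "SETUP"): "Ready",
    ("Ready", "SETUP"): "Ready",
    ("Ready", "PLAY"): "Playing",
    ("Ready", "TEARDOWN"): "Init",
    ("Playing", "PLAY"): "Playing",
    ("Playing", "PAUSE"): "Ready",
    ("Playing", "TEARDOWN"): "Init",
}


def next_state(state: str, method: str) -> str:
    """Return the session state after `method` is accepted in `state`."""
    if method == "DESCRIBE":
        return state  # DESCRIBE allocates nothing and changes no state
    try:
        return TRANSITIONS[(state, method)]
    except KeyError:
        raise ValueError(f"{method} is not valid in state {state}") from None


def run(methods, state: str = "Init") -> str:
    """Apply a sequence of methods and return the final state."""
    for method in methods:
        state = next_state(state, method)
    return state
```

A firewall that tracks RTSP sessions needs roughly this much state per session: it opens the UDP paths when a session reaches Playing and removes them when the session returns to Init.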
DESCRIBE-Method
A presentation description file on the RTSP server describes the properties of the
requested media. This file may be obtained by the client via HTTP or via the DESCRIBE
method of RTSP. The general syntax of an RTSP request and its response is as follows:
C->S: DESCRIBE rtsp://emblaze.com/welcome.mp4 RTSP/1.0
CSeq: 312
Accept: application/sdp
S->C: RTSP/1.0 200 OK
CSeq: 312
Date: 20 Oct 2001 13:00:00 GMT
Session: 1111111
Content-Type: application/sdp
Content-Length: 332

v=0
o=s02/0/0/F5003EB0-B268-11d5-A813-006097D530D3.mp4 IN IP4 192.168.20.106
c=IN IP4 192.168.20.106
t=0.000000 0
m=audio 0 RTP/AVP G.723.1 (4)
b=AS:5
a=control:trackID=2
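As a minimal sketch of the request syntax, the following Python helper assembles an RTSP request as text. The function name `build_request` and the example URL are assumptions made for illustration; only the layout (request line, CSeq header, further headers, empty line) is taken from the example above.

```python
def build_request(method: str, url: str, cseq: int, headers=None) -> str:
    """Assemble an RTSP/1.0 request with CRLF line endings."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


describe = build_request(
    "DESCRIBE",
    "rtsp://example.com/welcome.mp4",  # hypothetical URL
    312,
    {"Accept": "application/sdp"},
)
```

Because the messages are plain text, an intermediate device can read the same header lines that the server reads.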
SETUP-Method
The SETUP request for a URL specifies the transport mechanism to be used for the
streamed media. A client can issue a SETUP request for a stream that is already
playing to change transport parameters. For the benefit of any intervening firewall, a
client must indicate the transport parameters even if it has no influence over these
parameters, for example, where the server advertises a fixed multicast address.
Note: Since SETUP includes all transport initialization information, firewalls and other
intermediate network devices (which need this information) are spared the more arduous task of
parsing the DESCRIBE response, which has been reserved for media initialization.
The Transport header specifies the transport parameters acceptable to the client for
data transmission; the response will contain the transport parameters elected by the
server.
Example:
C->S: SETUP rtsp://example.com/foo/bar/baz.rm RTSP/1.0
CSeq: 302
Transport: RTP/AVP;unicast;client_port=4588-4589
S->C: RTSP/1.0 200 OK
CSeq: 302
Date: 20 Oct 2001 15:35:06 GMT
Session: 47112344
Transport: RTP/AVP;unicast;client_port=4588-4589;server_port=6256-6257
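The Transport header is the part of the exchange that an RTSP-aware firewall or NAT device has to read. A minimal sketch of parsing it in Python follows; the function name `parse_transport` is hypothetical, and the parser only handles the parameters that occur in the example above.

```python
def parse_transport(value: str) -> dict:
    """Split a Transport header value into its specifier and parameters."""
    parts = [p.strip() for p in value.split(";") if p.strip()]
    params = {"spec": parts[0]}  # e.g. "RTP/AVP"
    for part in parts[1:]:
        if "=" in part:
            key, raw = part.split("=", 1)
            if key in ("client_port", "server_port"):
                # A port range "4588-4589" becomes the tuple (4588, 4589).
                params[key] = tuple(int(p) for p in raw.split("-"))
            else:
                params[key] = raw
        else:
            params[part] = True  # flag-style parameter such as "unicast"
    return params


reply = parse_transport(
    "RTP/AVP;unicast;client_port=4588-4589;server_port=6256-6257"
)
```

With the response from the example, `reply["client_port"]` and `reply["server_port"]` give the two UDP port pairs that the media session will use.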
PLAY-Method
The PLAY method tells the server to start sending data via the mechanism specified in
the SETUP phase. A client must not issue a PLAY request until any outstanding SETUP
requests have been acknowledged as successful. The PLAY request positions the
normal play-time to the beginning of the range specified and delivers stream data until
the end of the range is reached. PLAY requests may be pipelined (queued); a server
queues PLAY requests and executes them in order. That is, a PLAY request arriving
while a previous PLAY request is still active is delayed until the first has been
completed. This allows precise editing. For example, regardless of how closely spaced
the three PLAY requests in the example below arrive, the server will first play seconds 10
through 15, then, immediately following, seconds 20 to 25, and finally seconds 30
through the end.
Example:
C->S: PLAY rtsp://dbnet.co.il/audio RTSP/1.0
CSeq: 835
Session: 12345678
Range: npt=10-15
C->S: PLAY rtsp://dbnet.co.il/audio RTSP/1.0
CSeq: 836
Session: 12345678
Range: npt=20-25
C->S: PLAY rtsp://dbnet.co.il/audio RTSP/1.0
CSeq: 837
Session: 12345678
Range: npt=30-
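The queuing behaviour can be sketched as a simple first-in, first-out queue of ranges. This is an illustration only: the class `PlayQueue` is hypothetical, and the parser assumes npt values given in plain seconds, as in the example above.

```python
from collections import deque


def parse_npt(range_header: str):
    """Parse 'npt=10-15' into (10.0, 15.0); an open end becomes None."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "npt":
        raise ValueError("only npt ranges are handled in this sketch")
    start, _, end = spec.partition("-")
    return float(start), (float(end) if end else None)


class PlayQueue:
    """Queue PLAY ranges and hand them out in arrival order."""

    def __init__(self):
        self._pending = deque()

    def submit(self, range_header: str) -> None:
        self._pending.append(parse_npt(range_header))

    def drain(self):
        while self._pending:
            yield self._pending.popleft()


queue = PlayQueue()
for header in ("npt=10-15", "npt=20-25", "npt=30-"):
    queue.submit(header)
played = list(queue.drain())
```

The ranges come out in the order they were submitted, no matter how quickly the three requests arrived.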
PAUSE-Method
The PAUSE request causes the stream delivery to be interrupted (halted) temporarily. If
the request URL names a stream, only playback and recording of that stream is halted.
For example, for audio, this is equivalent to muting. If the request URL names a
presentation or group of streams, delivery of all currently active streams within the
presentation or group is halted. After resuming playback or recording, synchronization
of the tracks must be maintained. Any server resources are kept, though servers may
close the session and free resources after being paused for the duration specified with
the timeout parameter of the Session header in the SETUP message.
Example:
C->S: PAUSE rtsp://dbnet.co.il/example RTSP/1.0
CSeq: 834
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 834
Date: 23 Jan 1997 15:35:06 GMT
TEARDOWN-Method
The TEARDOWN request stops the stream delivery for the given URI, freeing the
resources associated with it. Any RTSP session identifier associated with this session is
no longer valid. Unless all transport parameters are defined by the session description,
a SETUP request has to be issued before the session can be played again.
Example:
C->S: TEARDOWN rtsp://dbnet.co.il/example RTSP/1.0
CSeq: 892
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 892
3. OVERVIEW OF FIREWALL TECHNOLOGIES
Depending upon the degree of protection required and the security policies of the
organization, a firewall can be implemented in any of several ways. Traditionally, a
firewall is a standalone box. The box can be anything from a dedicated appliance to a
hardened general purpose NT or UNIX system running firewall software. Many router
vendors also provide firewall capabilities on their routers, which may be suitable in
cost-conscious environments with moderate security requirements. There are many styles of
firewall operation, from simple packet filtering gateways to intelligent application level
gateways. Figure 1 describes the general operation of a firewall:
Figure 1: General FW operation
[Figure: the firewall gateway sits between the outside network (Internet) and the inside
network, with one filter for inbound traffic and one filter for outbound traffic.]
3.1. Packet filtering Gateways
Packet filtering gateways exist in two variations: those that do not remember what has
happened previously in the session, called stateless, and those that do remember, called
stateful. Both stateless and stateful filters inspect the headers of the IP
datagrams, i.e. the IP header, the TCP header and the UDP header. The most
important information for filtering is the IP addresses (source and
destination) and the port numbers (source and destination). Stateful packet filters
additionally monitor information such as the SYN and ACK flags in the TCP
header in order to track TCP connections. Stateful filters can recognize application
protocols without having to base their decisions on whether a certain packet is destined
for a service using a well-known port. Note that packet filtering
gateways cannot change the application data carried in
the IP datagrams. The filtering functions and rules on packet filtering gateways are
applied per interface and per direction (in or out).
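A stateless filter can be sketched as a list of rules that is checked against the header fields of each packet in isolation. The `Rule` and `Packet` types below are hypothetical and far simpler than any real product; the addresses are illustrative.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    proto: str                      # "tcp" or "udp"
    src: str                        # source network in CIDR notation
    dst: str                        # destination network in CIDR notation
    dst_port: Optional[int] = None  # None matches any destination port


@dataclass(frozen=True)
class Packet:
    proto: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def evaluate(rules, pkt: Packet, default: str = "deny") -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in rules:
        if (
            rule.proto == pkt.proto
            and ip_address(pkt.src_ip) in ip_network(rule.src)
            and ip_address(pkt.dst_ip) in ip_network(rule.dst)
            and rule.dst_port in (None, pkt.dst_port)
        ):
            return rule.action
    return default


rules = [Rule("allow", "tcp", "0.0.0.0/0", "207.232.9.10/32", 554)]
control = Packet("tcp", "207.232.10.1", "207.232.9.10", 1046, 554)
report = Packet("udp", "207.232.10.1", "207.232.9.10", 6001, 1328)
```

With this static rule set the RTSP control connection is allowed, while the RTCP packet on a dynamically negotiated UDP port falls through to the default and is denied.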
3.2. Circuit-Level gateways
The circuit level gateway (often referred to as circuit-level proxy) is a generic tool for
relaying TCP connections from one side of the firewall to the other. See Figure 2. The
internal client makes a connection to a TCP port on the gateway, which then opens a
connection to the external server. The gateway does not try to interpret the content of
the application part of the TCP segment, it only relays any information received from
one side to the other.
Figure 2: Circuit level gateway
[Figure: the circuit level gateway on the firewall relays between two TCP connections
(connection 1 and connection 2), one to the internal client on the inside network and
one to the external server on the outside network (Internet).]
The IP addresses that the clients use on the internal network will not be seen in the IP
header of datagrams leaving the network, since they are replaced with the IP address of
the gateway's second interface (on the public side). The port number in the TCP header
is replaced as well.
3.3. Application Level Gateways
The application-level gateway (ALG) is the most sophisticated kind of firewall gateway.
It works in a similar fashion to the circuit-level gateway, but there are a number of
important differences. Like circuit-level gateways, application level gateways are
often referred to as proxies. The term ALG is also used for a program
running in cooperation with a firewall that performs Network Address Translation (NAT), as
described in the next section. The ALG is not generic like the circuit-level gateway;
instead it uses special purpose code for each particular application service it
supports. Since it understands the application protocol it is relaying, it achieves a
higher security level. Unlike the circuit-level gateway, the ALG is not limited to
TCP; it also supports UDP based protocols. If the ALG is
used together with a NAT device, the ALG examines the application data for
occurrences of internal addresses and replaces them with the address of the firewall's
external interface. ALGs with support for protocols like FTP, DNS, and SNMP are
common components in NAT firewalls.
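The address replacement performed by an ALG can be sketched as a search and replace on the application data. The function name `rewrite_addresses` and the addresses are assumptions for illustration. A real ALG would also have to correct any length field that depends on the payload size, which this sketch leaves out.

```python
import re


def rewrite_addresses(payload: bytes, internal_ip: str, external_ip: str) -> bytes:
    """Replace whole occurrences of internal_ip in text-based application data."""
    # The lookarounds keep "172.16.0.10" from matching inside "172.16.0.100".
    pattern = re.compile(
        rb"(?<![\d.])" + re.escape(internal_ip.encode("ascii")) + rb"(?![\d.])"
    )
    return pattern.sub(external_ip.encode("ascii"), payload)


rewritten = rewrite_addresses(
    b"c=IN IP4 172.16.0.10\r\n", "172.16.0.10", "207.232.9.10"
)
```

In this example the rewritten line is one byte longer than the original, which is why a length header carried in the same message would have to be adjusted as well.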
4. NETWORK ADDRESS TRANSLATION (NAT) TECHNOLOGIES
When network address translation was invented, it was a mere hack to circumvent the IP
address shortage. Meanwhile it has proven useful in completely different fields
that nobody had thought of at the beginning, and there are probably many more useful
applications that have not been found yet. Even though I devote a separate chapter to
NAT, you should be aware that most organizations do not use NAT-only devices, but
hybrids of FW and NAT.
4.1. Classic Network Address Translation Techniques
Speaking about NAT, you must know that address translation can be done statically or
dynamically. In the first case the assignment of NAT-IP addresses to original IP
addresses is unambiguous, in the second case it is not. In static NAT a certain fixed
original IP address is always translated to the same NAT IP address at all times, and no
other IP address gets translated to the same NAT-IP address, while in dynamic NAT the
NAT IP address depends on various runtime conditions and may be a completely
different one for each single connection.
4.2. Static Network Address Translation
Static NAT requires the same number of globally unique IP addresses, as there are
hosts in the private environment, which want to be able to connect to outside networks
(all of them most likely). As the name suggests, the mapping between local addresses
and global addresses is intended to stay the same for a long period of time. The static
NAT strategy is easy to implement because no information about the state of
connections that are being translated needs to be kept, looking at each IP packet
individually is sufficient. Connections from outside the network to inside hosts are no
problem; they just appear to have a different IP address than on the inside, so static
NAT is (almost) completely transparent.
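Because static NAT keeps no connection state, it can be sketched as two fixed lookup tables applied to each packet on its own. The mapping and the function names below are illustrative assumptions.

```python
# Fixed one-to-one mapping: private address -> public address (illustrative values).
STATIC_MAP = {"172.16.0.10": "207.232.9.10"}
PUBLIC_TO_PRIVATE = {public: private for private, public in STATIC_MAP.items()}


def translate_outbound(src_ip: str, dst_ip: str):
    """Packet leaving the private network: rewrite the source address."""
    return STATIC_MAP.get(src_ip, src_ip), dst_ip


def translate_inbound(src_ip: str, dst_ip: str):
    """Packet entering the private network: rewrite the destination address."""
    return src_ip, PUBLIC_TO_PRIVATE.get(dst_ip, dst_ip)
```

Neither function looks at ports or at earlier packets, which is why connections can be opened from either side.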
4.3. Dynamic Network Address Translation
Dynamic address translation is necessary when the number of IP addresses to translate
does not equal the number of IP addresses to translate to, or they are equal but for
some reason it is not desirable to have a static mapping. The number of NAT IP
addresses available generally limits the number of hosts communicating. When all NAT
IP addresses are being used then no other connections can be translated and must
therefore be rejected by the NAT device, for example by sending back 'host
unreachable'. Dynamic NAT is more complex than static NAT, since NAT devices must
keep track of communicating hosts and possibly even of connections, which requires
looking at TCP information in packets.
As mentioned above, dynamic NAT may also be useful when there are enough network
addresses. Some organizations use this as a security improvement: it is impossible for
someone outside the network to get useful IP addresses to connect to hosts behind a
NAT device providing dynamic address translation by looking at connections that take
place, since next time the same host may connect using a completely different IP
address. In this special case even having more NAT IP addresses than real IP
addresses to be translated may make some sense. Connections from outside are only
possible when the host that shall be reached still has a NAT-IP address assigned, i.e. if
it still has an entry in the dynamic NAT table, where the NAT device keeps track of
which internal IP address is mapped to which NAT IP address.
Example:
The NAT device dynamically translates all IP addresses in network 138.201.0.0/16 to IP
addresses in network 178.201.112.0/24. Each new connection from the inside gets
assigned an IP address from the pool of addresses, as long as there are unused
addresses left. If a mapping already exists for the internal host, it is used instead. As
long as the mapping exists the internal host can be reached via the IP address that has
been (temporarily) assigned to it. Figure 3 shows an example for dynamic address
translation:
Figure 3: Dynamic address translation
[Figure: packets from internal hosts pass through the NAT device, shown with the
addresses 138.201.148.51 and 195.112.18.161. Source 138.201.148.32 is translated to
178.201.112.34 and source 138.201.148.151 to 178.201.112.11; packets addressed to
178.201.112.11 are translated back to destination 138.201.148.151.]
NAT-Table
Internal IP address    NAT IP address
138.201.148.32         178.201.112.34
138.201.148.151        178.201.112.11
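The dynamic NAT table can be sketched as a pool of free addresses plus a mapping that is filled on demand. The class `DynamicNat` is hypothetical. It hands out pool addresses in ascending order, whereas a real device may choose them differently, and it never expires a mapping.

```python
from ipaddress import ip_network


class DynamicNat:
    """Assign NAT addresses from a pool the first time an internal host connects."""

    def __init__(self, pool_cidr: str):
        self._free = [str(host) for host in ip_network(pool_cidr).hosts()]
        self._table = {}  # internal IP -> NAT IP

    def map_outbound(self, internal_ip: str) -> str:
        if internal_ip in self._table:
            return self._table[internal_ip]  # reuse the existing mapping
        if not self._free:
            raise RuntimeError("NAT pool exhausted")
        self._table[internal_ip] = self._free.pop(0)
        return self._table[internal_ip]

    def map_inbound(self, nat_ip: str):
        """Return the internal host for a NAT address, or None if unmapped."""
        for internal, nat in self._table.items():
            if nat == nat_ip:
                return internal
        return None


nat = DynamicNat("178.201.112.0/30")  # deliberately tiny pool: two usable addresses
first = nat.map_outbound("138.201.148.32")
second = nat.map_outbound("138.201.148.151")
```

With the two-address pool used here, a third internal host is rejected, which corresponds to the situation where the device answers 'host unreachable'.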
4.4. Network Address and Port Translation (NAPT):
A very special case of dynamic NAT is NAPT, also known as masquerading, which
became famous under that name in the Linux operating system. It is probably the kind
of NAT-technique that is used most often these days. Here many IP addresses are
hidden behind a single one. In contrast to the original dynamic NAT this does not mean
that there can be only one connection at a time. In masquerading an almost arbitrary
number of connections is multiplexed using TCP port information. Only the number of
available TCP-ports limits the number of simultaneous connections. Incoming
connections are impossible with masquerading, since even when a host has an entry in
the masquerading table of the NAT device this entry is only valid for active connections.
Example:
The NAT device masquerades the internal network 138.201.0.0/16 using the NAT
device's own IP address for each outgoing packet. The source IP address is replaced
by the NAT device's (external) IP address, and the source port is replaced with an
unused port from the range reserved exclusively for masquerading on the device. If the
destination IP address of an incoming packet is the local IP address of the NAT device
and the destination port is inside the range of ports used for masquerading on the
device, the NAT device checks its masquerading table for an active masqueraded
session; if one exists, the IP address and port of the internal host are
inserted as the destination and the packet is forwarded to the internal host. See Figure 4:
Figure 4: NAT device providing NAPT
[Figure: packets from internal hosts pass through the NAT device, shown with the
addresses 138.201.148.51 and 195.112.18.161. The packet from 138.201.148.32:1257 to
193.46.94.115:80 leaves with source 195.112.12.161:63451, and the packet from
138.201.148.151:4192 to 53.12.198.15:23 leaves with source 195.112.12.161:63452.]
Masquerading-Table
Internal IP address : Port    Local NAT-Port
138.201.148.32 : 1257         63451
138.201.148.151 : 4192        63452
The greatest advantage of masquerading for many organizations is that they only need
one official IP-address, but the entire internal network can still directly access the
Internet.
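The masquerading table can be sketched as a pair of dictionaries keyed by the internal socket and by the allocated NAT port. The class `Masquerader` and its default port range are assumptions chosen to reproduce the values in Figure 4; entry expiry is left out.

```python
class Masquerader:
    """Multiplex many internal sockets onto one external IP address (NAPT)."""

    def __init__(self, external_ip: str, first_port: int = 63451, last_port: int = 65535):
        self.external_ip = external_ip
        self._next = first_port
        self._last = last_port
        self._out = {}  # (internal IP, internal port) -> NAT port
        self._in = {}   # NAT port -> (internal IP, internal port)

    def outbound(self, src_ip: str, src_port: int):
        """Return the translated (source IP, source port) of an outgoing packet."""
        key = (src_ip, src_port)
        if key not in self._out:
            if self._next > self._last:
                raise RuntimeError("no masquerading ports left")
            self._out[key] = self._next
            self._in[self._next] = key
            self._next += 1
        return self.external_ip, self._out[key]

    def inbound(self, dst_port: int):
        """Return the internal (IP, port) for a reply, or None if no entry exists."""
        return self._in.get(dst_port)


napt = Masquerader("195.112.12.161")
web = napt.outbound("138.201.148.32", 1257)
telnet = napt.outbound("138.201.148.151", 4192)
```

A packet arriving on a NAT port without an entry returns `None` from `inbound`, which mirrors the statement that unsolicited incoming connections cannot be delivered.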
4.5. Other Network Address Translation Techniques
While classic NAT technologies serving the purpose of saving valuable address space
have been known for a long time, other uses for this technology have been found that
are independent of the problems NAT was proposed to solve. One example is
the Virtual IP used for load balancing. I decided not to describe load balancing in
detail here, because I wrote a separate document dealing with streaming over IP load
balancers. If you are interested in the details, please consult that document.
5. ADVICES FOR SUCCESSFUL DEPLOYMENT OF RTSP
In this section I describe the difficulties that firewalls [FW] and network address
translation [NAT] devices have in supporting the Real Time Streaming Protocol [RTSP].
RTSP is unusual in that it is a hybrid protocol, a combination of both TCP and
UDP, which poses a problem for FWs and NAT devices. This is because RTSP and
the related protocols (RTP and RTCP) exchange address and port parameters within
the control session to establish data sessions and session orientations. Without
application level knowledge, neither NAT devices nor FWs can understand the
interdependency of the bundled sessions, and they treat each session as unrelated to
the others. Video streaming applications can then fail for a variety of reasons.
The two most likely reasons for failure are:
- Addressing information in the control payload is realm-specific and is not valid
  once the packet crosses the originating realm.
- The control session permits data session(s) to originate in a direction that NAT
  devices and FWs might not permit.
Let us now look at some configurations and topologies we could expect in reality.
We should keep in mind that a great number of possible solutions exist and that
discussing all of them would exceed the scope of this paper. That is why I decided to focus on
the most common implementations. Although I discuss NAT devices, FWs and load
balancers separately, you should be aware that existing implementations mostly
represent a mixture of these technologies.
5.1. Static NAT in front of the RTSP Server
As already described in chapter 4, static address translation is the simplest form of NAT
and is mostly used in server environments. In the configuration in figure 7 the NAT
device is installed in front of the RTSP server:
Figure 7: Static NAT on server site
[Figure: the RTSP-Player (207.232.10.1) reaches the RTSP server across the Internet.
The NAT device maps the server's public address 207.232.9.10 to its private address
172.16.0.10.]
Figure 8 shows the packet flow and the related address translation:
Figure 8: Packet flow
Public side of the NAT router (towards the RTSP-Player):
Proto  Source IP     Destination IP  Source Port  Destination Port  Data
TCP    207.232.10.1  207.232.9.10    1046         554               RTSP
TCP    207.232.9.10  207.232.10.1    554          1046              RTSP
UDP    207.232.9.10  207.232.10.1    1327         6000              RTP
UDP    207.232.9.10  207.232.10.1    1328         6001              RTCP
UDP    207.232.10.1  207.232.9.10    6001         1328              RTCP
Private side of the NAT router (towards the RTSP server):
Proto  Source IP     Destination IP  Source Port  Destination Port  Data
TCP    207.232.10.1  172.16.0.10     1046         554               RTSP
TCP    172.16.0.10   207.232.10.1    554          1046              RTSP
UDP    172.16.0.10   207.232.10.1    1327         6000              RTP
UDP    172.16.0.10   207.232.10.1    1328         6001              RTCP
UDP    207.232.10.1  172.16.0.10     6001         1328              RTCP
Let us look at the different steps of the streaming session and the way the NAT device
processes the traversing packets:
Step 1: RTSP-Player establishes a control connection to the RTSP-Server over TCP
port 554 and issues the RTSP requests. The static NAT device does not have any
impact on the RTSP connection.
Note: Be aware that the implementation of static NAT does not translate IP addresses inside the
packets. It is recommended to use a FQDN (Fully Qualified Domain Name) in the RTSP-URL,
otherwise it could happen that some RTSP-Servers compare the IP address in the RTSP-URL with
the IP address in the IP packet header and reject the RTSP session.
Step 2: Following the RTSP PLAY method, the RTSP-Server starts streaming
to the player on the negotiated UDP port (RTP). Here too, the NAT device has no
impact on the UDP packets, because it only translates the IP addresses and does not
change any port information in the packets.
Step 3: RTSP-Server sends Sender Reports to the client on the negotiated UDP port
(RTCP).
Step 4: RTSP-Player sends Receiver Reports to the RTSP-Server on the negotiated
UDP port (RTCP).
Step 5: The RTP and RTCP packet flow continues until the RTSP-Player issues the
TEARDOWN-Message.
In the example described above, RTSP works fine. The reason is that static NAT
does not conflict with any information in the RTSP methods. Even when several static
NAT devices are used on the communication path between the player and the server,
RTSP works without any problem. In summary, we can make the following statement:
Static NAT devices do not require any application level knowledge in order to process
RTSP streaming.
5.2. Dynamic NAT or NAPT in front of the RTSP Server
In contrast to static NAT, any kind of dynamic NAT requires storing and managing
dynamic information about the clients currently using the system. Figure 9 shows an
example of NAPT in front of the RTSP server:
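Figure 9: NAPT on server site
[Figure: the RTSP-Player (207.232.10.1) reaches the servers across the Internet. The
device performing dynamic NAT or NAPT has the public address 207.232.9.10; behind it
are RTSP Server 1 (172.16.0.10) and RTSP Server 2 (172.16.0.11) with private addresses.]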
As described in chapter 4, incoming connections are impossible with masquerading or
dynamic NAT, since even when a host has an entry in the masquerading table of the NAT
device, this entry is only valid for the active connection. Additional measures can
enable incoming connections, but they are not part of
the masquerading code. We could, for example, set up the NAT device so that it
relays all connections coming in from the outside on a specific TCP port to a host on the
inside. However, since only one IP address is visible outside, enabling incoming
connections for the same service on different inside hosts requires the NAT device to
listen on a different port for each service and internal IP address. Since RTSP
servers listen on the well-known port TCP/554, this is quite inconvenient and in our
case not an option. We can therefore say that dynamic NAT and NAPT cannot be used on
the server site.
5.3. Virtual IP for server load balancing
The objective of RTSP server load balancing is to intelligently switch an RTSP request,
and the following media streams associated with a presentation. Only load balancers
with application intelligence are capable of ensuring the packet forwarding to the right
server. Figure 10 describes an example of RTSP load balancing.
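Figure 10: IPLB on server site
[Figure: the RTSP-Player (207.232.10.1) reaches the servers across the Internet. The
IPLB presents the virtual IP 207.232.9.10; behind it are RTSP Server 1 (172.16.0.10)
and RTSP Server 2 (172.16.0.11) with their real IP addresses.]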
Let us look at the different stages of the RTSP session and define the measures
required of the IPLB in order to support the traversal of the packets.
Step 1: RTSP-Player establishes a control connection to the virtual IP address [VIP] of
the RTSP servers over TCP port 554 and issues its RTSP requests. From the player's
first SETUP request onwards, server persistence is required.
Step 2: The IPLB should inspect the IP datagrams coming from the RTSP-Player on
application level. On receiving the SETUP request the IPLB has to choose one RTSP-
Server for the rest of the session. The SETUP response from the RTSP-Server includes
the server and client UDP ports in addition to the unique session ID and SSRC. The
IPLB needs to remember these parameters during the whole RTSP session.
Step 3: The Player issues a PLAY request. The IPLB checks the session ID in order to
forward the request to the right server. All the following RTSP requests are handled in
the same manner.
Step 4: RTSP-Server sends Sender Reports [SR] to the Player on the negotiated UDP
port (RTCP). These packets need no intervention, since the RTSP
server is the origin of the packet flow.
Step 5: The Player sends Receiver Reports [RR] to the RTSP-Server on the negotiated
UDP port (RTCP). It is essential that the IPLB forwards those reports to the correct
RTSP server, since they are related to a specific stream. The IPLB forwards the packets
based on the source IP addresses, the UDP ports and the SSRC of the server.
Step 6: The player sends the TEARDOWN method to the server to stop media data
delivery. When the TEARDOWN method is received, the ports associated with this
session can be removed from the IPLB tables.
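The persistence that these steps require can be sketched as two lookup tables inside the load balancer. This is a hedged illustration only: the class `RtspBalancer` is hypothetical, it selects servers round-robin, it collapses the SETUP request and response into one call, and it matches UDP packets on client address and port only, without the SSRC.

```python
from itertools import cycle


class RtspBalancer:
    """Keep every packet of one RTSP session on the same real server."""

    def __init__(self, servers):
        self._pick = cycle(servers)   # round-robin choice of real server
        self._session_to_server = {}
        self._flow_to_session = {}    # (client IP, client UDP port) -> session ID

    def on_setup(self, session_id: str, client_ip: str, client_ports) -> str:
        server = self._session_to_server.get(session_id)
        if server is None:
            server = next(self._pick)
            self._session_to_server[session_id] = server
        for port in client_ports:
            self._flow_to_session[(client_ip, port)] = session_id
        return server

    def route_rtsp(self, session_id: str) -> str:
        """PLAY, PAUSE and TEARDOWN follow the session ID."""
        return self._session_to_server[session_id]

    def route_udp(self, client_ip: str, client_port: int):
        """Receiver Reports follow the ports learned during SETUP."""
        session_id = self._flow_to_session.get((client_ip, client_port))
        return self._session_to_server.get(session_id)

    def on_teardown(self, session_id: str) -> None:
        self._session_to_server.pop(session_id, None)
        self._flow_to_session = {
            flow: sid for flow, sid in self._flow_to_session.items() if sid != session_id
        }


lb = RtspBalancer(["172.16.0.10", "172.16.0.11"])
chosen = lb.on_setup("47112344", "207.232.10.1", (4588, 4589))
```

After `on_teardown("47112344")` the entries for that session are gone, so a late Receiver Report from the same client port is no longer forwarded.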
As you can see, application intelligence is required in order to support the streaming
data flow. I am aware of at least one IPLB operating system that supports RTSP today:
Alteon's OS, from version 9.0 upwards. Arrowpoint [Cisco] claims the same, but I
have not yet checked their equipment.
5.4. Dynamic NAT or NAPT in front of the RTSP Player
Although NAT-only devices exist that can perform several kinds of dynamic address
translation, in most organizations you will find firewalls that perform those tasks in
addition to their filtering processes. Hence we will look at the example in Figure 11,
where the FW also implements NAPT. Obviously the FW in this example needs to be
RTSP aware in order to support RTSP sessions.
Figure 11: NAPT on client site
[Figure: the RTSP-Player has the private address 172.16.0.10 and sits behind a FW that
performs dynamic NAT or NAPT and has the public address 207.232.10.1. Across the
Internet it reaches RTSP Server 1 (207.232.9.10).]
The following steps show the relevant stages of the RTSP session and the measures
required of the FW in order to support the traversal of the packets.
Step 1: The player opens a TCP connection to the RTSP-Server and starts the RTSP session.
The FW device monitors for the SETUP phase and should change the source UDP port
information in the message before forwarding. Afterwards it adds two entries to its
NAT table, consisting of the RTP and RTCP source addresses found in the client packet.
The player issues a PLAY request to let the streaming begin, and as a result the FW
opens pinholes for the RTP and RTCP packets.
Note: It is recommended to use a FQDN (Fully Qualified Domain Name) in the RTSP-URL,
otherwise it could happen that some RTSP-Servers compare the IP address in the RTSP-URL with
the IP address in the IP packet header and reject the RTSP session.
Step 2: Server starts streaming to the client via RTP over UDP. The FW device
translates the received packets according to its entries in the NAT table before
forwarding.
Step 3: Server sends Sender Reports [SR] via RTCP over UDP. The FW device
translates the received packets according to its entries in the NAT table before
forwarding.
Step 4: Player sends Receiver Reports [RR] via RTCP over UDP. The FW device
translates the received packets according to its entries in the NAT table before
forwarding.
Step 5: Player sends a TEARDOWN message to the server in order to terminate the
RTSP session. The FW device monitors for the TEARDOWN message and clears the
entries relevant to this session from its NAT table.
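The rewriting in Step 1 can be sketched as a substitution on the Transport header of the SETUP request. The function `rewrite_setup` and the callback `allocate_pair` are hypothetical names; the sketch assumes that the client offers a port range of the form `client_port=a-b` and ignores everything else an RTSP-aware firewall has to do.

```python
import re

CLIENT_PORT = re.compile(r"client_port=(\d+)-(\d+)")


def rewrite_setup(message: str, allocate_pair):
    """Swap the client ports in a SETUP request for externally visible ones.

    Returns the rewritten message and the NAT entries {external port: internal port},
    or (message, None) when no client_port parameter is present.
    """
    match = CLIENT_PORT.search(message)
    if match is None:
        return message, None
    internal = (int(match.group(1)), int(match.group(2)))
    external = allocate_pair()  # assumed to return a free (RTP, RTCP) port pair
    rewritten = (
        message[: match.start()]
        + f"client_port={external[0]}-{external[1]}"
        + message[match.end():]
    )
    return rewritten, {external[0]: internal[0], external[1]: internal[1]}


setup = (
    "SETUP rtsp://example.com/foo/bar/baz.rm RTSP/1.0\r\n"
    "CSeq: 302\r\n"
    "Transport: RTP/AVP;unicast;client_port=4588-4589\r\n\r\n"
)
forwarded, entries = rewrite_setup(setup, lambda: (40000, 40001))
```

The two returned entries correspond to the two NAT table entries mentioned in Step 1, one for RTP and one for RTCP, and they are what Step 5 removes again on TEARDOWN.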
To my knowledge, Checkpoint and some other FW vendors include RTSP awareness
in their products and support video streaming.
Note: Be aware that not all streaming servers on the market are compliant with RFC 2326, so a
FW vendor that declares RTSP support does not necessarily support all streaming servers, and vice versa.
5.5. FW in front of the RTSP Server
As already explained in the example above, the FW requires application intelligence in
order to alter the UDP ports of the RTP and RTCP packets. Figure 12 gives a schematic
overview of placing a FW in front of the servers. In some circumstances this FW will
also perform static NAT, but as demonstrated before, this has no impact.
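Figure 12: FW on server site
[Figure: the RTSP-Player reaches the RTSP server across the Internet through a FW
placed in front of the server. Addresses shown in the figure: 207.232.10.1 and
207.232.9.10.]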
Step 1: RTSP-Player establishes a control connection to the RTSP server over TCP
port 554 and issues its RTSP requests. The FW looks for a SETUP request in the client
packets, which contains the ports of the UDP (RTP and RTCP) connections to follow. It
saves the expected UDP ports in the table of allowed communication.
Step 2: The FW looks for a server packet with an RTSP header and extracts the server ports
and the confirmation of the client ports. It saves the expected UDP ports in the table of
allowed communication.
Step 3: After receiving the PLAY request, the RTSP server starts streaming.
For the media stream to traverse the firewall, UDP pinholes must be opened for
each session.
Step 4: The connections are removed from the connection table after a defined timeout
or after a TEARDOWN request from the client has been registered.
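The table of allowed communication can be sketched as a set of pinholes with an idle timeout. The class `PinholeTable`, its parameters and the timeout value are assumptions for illustration; the `clock` argument exists only so that the expiry can be demonstrated without waiting.

```python
import time


class PinholeTable:
    """UDP pinholes learned from SETUP, removed on TEARDOWN or after a timeout."""

    def __init__(self, timeout: float = 60.0, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._holes = {}  # (client IP, client port, server IP, server port) -> (session, last seen)

    def open(self, session_id, client_ip, client_ports, server_ip, server_ports):
        now = self._clock()
        # Pair RTP with RTP and RTCP with RTCP.
        for cport, sport in zip(client_ports, server_ports):
            self._holes[(client_ip, cport, server_ip, sport)] = (session_id, now)

    def permits(self, src_ip, src_port, dst_ip, dst_port) -> bool:
        now = self._clock()
        # A pinhole is valid in both directions between the same two sockets.
        for key in ((dst_ip, dst_port, src_ip, src_port), (src_ip, src_port, dst_ip, dst_port)):
            entry = self._holes.get(key)
            if entry is None:
                continue
            if now - entry[1] > self._timeout:
                del self._holes[key]  # idle for too long: close the pinhole
                return False
            self._holes[key] = (entry[0], now)  # traffic refreshes the timer
            return True
        return False

    def close(self, session_id) -> None:
        self._holes = {k: v for k, v in self._holes.items() if v[0] != session_id}
```

A call such as `table.open("47112344", "207.232.10.1", (4588, 4589), "207.232.9.10", (6256, 6257))` admits exactly the RTP and RTCP flows negotiated in the SETUP exchange and nothing else.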
Note: Be aware that you can also configure a FW that is not RTSP aware to allow RTSP traffic. In
such cases you would open port TCP 554 and all possible UDP ports that are used by RTP
and RTCP. Generally, RTSP servers and players use defined UDP port ranges. These statically
defined holes represent a security risk, and I suggest avoiding this method whenever
possible.
As already mentioned, today there exist firewalls that support RTSP. But we will have to
check the market leaders in order to give a list of RTSP supporting firewalls.
6. REFERENCES
RTP: A transport protocol for real-time applications – RFC 1889
RTSP: Real time streaming protocol – RFC 2326
NAT-TERM: IP network address translator terminology and considerations – RFC 2663
NAT-TRAD: Traditional network address translator – RFC 3022
NAT-COMP: Protocol complications with the IP NAT – RFC 3027
NAT-PT: Network address translation, protocol translation – RFC 2766
FW: Behavior of and Requirements for Internet Firewalls – RFC 2979