Network Infrastructure Design for IP Phone

1 Introduction

Let’s start by describing the new type of phone system that businesses are now using to replace traditional phone systems in their offices. An “IP Phone System” (sometimes called an IP PBX) uses the technology of “IP (Internet Protocol)” to carry the voice conversations in your office. This does not necessarily mean it uses the public Internet. An IP Phone System uses IP technology within the private data network of a business in a single location or across a private network.

The same cabling that a business uses for its data network is used to carry the voice traffic of the phone system. In some ways they are totally independent and just sharing the same cabling. In one way they affect each other.

They are independent in that if the data server goes down, the voice will still go through: your phone system will still work. Likewise, if the phone system goes down, the data will still go through.

The way the IP Phone System and data network can affect each other is in the capacity, or “bandwidth”, of the network, both in the office and going to the outside world. Data is “forgiving”, meaning it is not time sensitive. If your data takes several tenths of a second, or even seconds, longer to move back and forth, its quality doesn’t suffer. Voice, however, is time sensitive. It must occur in “real time”, which typically means there can’t be more than 150 milliseconds (0.15 seconds) of delay in moving the voice traffic between its endpoints. If the combined voice and data traffic exceeds the capacity of the network infrastructure, voice quality can suffer.
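To make the capacity point concrete, here is a rough sketch of how many calls fit on a link that also carries data. The figures are illustrative assumptions rather than values from this article: a 64 kbit/s codec sending one packet every 20 ms, 40 bytes of IPv4, UDP and RTP headers per packet, no Layer 2 overhead, and a hypothetical 1000 kbit/s link carrying 600 kbit/s of data.

```python
# Rough capacity check for voice sharing a link with data.
# All figures are illustrative assumptions, not values from the article.

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers per packet


def call_bandwidth_kbps(codec_kbps: float, packet_ms: float) -> float:
    """Per-direction IP bandwidth of one call, excluding Layer 2 overhead."""
    packets_per_second = 1000.0 / packet_ms
    payload_bytes = codec_kbps * 1000.0 / 8.0 / packets_per_second
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8.0 * packets_per_second / 1000.0


def max_calls(link_kbps: float, data_kbps: float, per_call_kbps: float) -> int:
    """Simultaneous calls that fit in the capacity the data traffic leaves free."""
    free_kbps = max(link_kbps - data_kbps, 0.0)
    return int(free_kbps // per_call_kbps)


per_call = call_bandwidth_kbps(codec_kbps=64.0, packet_ms=20.0)
print(per_call)                             # 80.0
print(max_calls(1000.0, 600.0, per_call))   # 5
```

Once the data load grows beyond what this leaves free, something has to give, which is why a Quality of Service plan is needed to protect the voice traffic.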

The network infrastructure consists of the cabling and the equipment throughout the network. IP Telephony on a properly designed, private network has the same voice quality as traditional phone systems. To be “properly designed” the network must include a proper “Quality of Service” plan and execution with the proper equipment. (That discussion is too much to include in this article. Give us a call.)

You can use IP Telephony over your private data network to connect remote sites with multiple workers or remote workers in home offices. If you don’t have a private network between sites you can use the public Internet to access remote sites.

2 Helpful

· Seamless extension dialing between all our locations.

· IP Telephony creates lower cost and greater functionality advantages from carrier services.

· Easily and economically connecting home based workers.

· Enhanced contact center (call center) responsiveness to customer needs.

· “Contact center” is the replacement term for what used to be “call center.”

· Disaster recovery and power outage backup for business continuity.

· Simplified system administration: Through a GUI (graphical user interface), you can make changes to your system that previously had to be made by your telephone equipment vendor. This can significantly reduce your maintenance costs.

· Easier moves of telephone sets: When moving from one location in your building to another, it previously required re-programming the telephone switch and physically changing some wires in the “telephone closet.” With IP Telephony as you pack up your desk supplies and plants, you also grab your telephone. In your new location, you simply plug the telephone into the Ethernet connection in the wall and then connect your computer to a jack in the phone that acts as a bypass for your data. All your personal settings move with you. Costs for moves are dramatically reduced.

· Software upgrades are much easier: You can perform them yourself instead of paying the telephone equipment vendor to do them.

There are many more benefits to IP Telephony, but this brief overview should be enough to pique your interest and encourage you to continue your investigation. You don’t need to swap out your current phone system all at once. It is possible to introduce an IP Telephone System into your organization gradually and interface it to legacy systems.

3 Voice over IP Overview

There are four new sections about SIP: Introduction to SIP, SIP Messages, SIP Call Flow, and SIP — Session Description Protocol.

The other pieces cover digitization of voice, audio codecs, codec latency vs bandwidth optimization, audio jitter, the Real Time Protocol, introduction to H.323, description of H.323 call flow, and H.323 call signalling optimizations.

4 Echo Canceling

5 Gatekeeper Basic Operations

6 Goal of the Project

IP PHONE (TELEPHONY) SYSTEMS ARE OF TWO TYPES:

*LAN, MAN, WAN Networks

&

*WIRELESS Networks.

(Discussed in wired technology)

7 OSI Model

The Open Systems Interconnection model (OSI model) is a product of the Open Systems Interconnection effort at the International Organization for Standardization. It is a way of characterizing and standardizing the functions of a communications system in terms of abstraction layers. Similar communication functions are grouped into logical layers. An instance of a layer provides services to the layer instances above it while receiving services from the layer below.

For example, a layer that provides error-free communications across a network provides the path needed by applications above it, while it calls the next lower layer to send and receive packets that make up the contents of that path. Two instances at one layer are connected by a horizontal connection on that layer.

OSI model
7. Application Layer
NNTP · SIP · SSI · DNS · FTP · Gopher · HTTP · NFS · NTP · SMPP · SMTP · SNMP · Telnet · DHCP · Netconf · RTP · SPDY · (more)
6. Presentation Layer
MIME · XDR · TLS · SSL
5. Session Layer
Named Pipes · NetBIOS · SAP · L2TP · PPTP · SOCKS
4. Transport Layer
TCP · UDP · SCTP · DCCP · SPX
3. Network Layer
IP (IPv4, IPv6) · ICMP · IPsec · IGMP · IPX · AppleTalk
2. Data Link Layer
ATM · SDLC · HDLC · ARP · CSLIP · SLIP · GFP · PLIP · IEEE 802.3 · Frame Relay · ITU-T G.hn DLL · PPP · X.25 · Network Switch
1. Physical Layer
EIA/TIA-232 · EIA/TIA-449 · ITU-T V-Series · I.430 · I.431 · POTS · PDH · SONET/SDH · PON · OTN · DSL · IEEE 802.3 · IEEE 802.11 · IEEE 802.15 · IEEE 802.16 · IEEE 1394 · ITU-T G.hn PHY · USB · Bluetooth · Hubs
OSI Model
Layer group | Data unit | Layer | Function
Host layers | Data | 7. Application | Network process to application
Host layers | Data | 6. Presentation | Data representation, encryption and decryption, convert machine dependent data to machine independent data
Host layers | Data | 5. Session | Interhost communication
Host layers | Segments | 4. Transport | End-to-end connections, reliability and flow control
Media layers | Packet/Datagram | 3. Network | Path determination and logical addressing
Media layers | Frame | 2. Data Link | Physical addressing
Media layers | Bit | 1. Physical | Media, signal and binary transmission

Description of OSI layers

According to recommendation X.200, there are seven layers, each generically known as an N layer. An N+1 entity requests services from the layer N entity.

At each level, two entities (N-entity peers) interact by means of the N protocol by transmitting protocol data units (PDU).

A Service Data Unit (SDU) is a specific unit of data that has been passed down from an OSI layer to a lower layer, and which the lower layer has not yet encapsulated into a protocol data unit (PDU). An SDU is a set of data that is sent by a user of the services of a given layer, and is transmitted semantically unchanged to a peer service user.

The PDU at any given layer, layer N, is the SDU of the layer below, layer N-1. In effect the SDU is the ‘payload’ of a given PDU. That is, the process of changing a SDU to a PDU, consists of an encapsulation process, performed by the lower layer. All the data contained in the SDU becomes encapsulated within the PDU. The layer N-1 adds headers or footers, or both, to the SDU, transforming it into a PDU of layer N-1. The added headers or footers are part of the process used to make it possible to get data from a source to a destination.
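The encapsulation process described above can be sketched in a few lines. This is a toy model under stated assumptions: the `Layer` class, the three-layer stack and the header and trailer byte strings are all hypothetical and chosen only to keep the result readable. Real protocols use binary headers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """A hypothetical layer that wraps its SDU in a fixed header and trailer."""
    name: str
    header: bytes
    trailer: bytes = b""


def encapsulate(sdu: bytes, layer: Layer) -> bytes:
    """Turn the SDU handed down from above into this layer's PDU."""
    return layer.header + sdu + layer.trailer


def decapsulate(pdu: bytes, layer: Layer) -> bytes:
    """Strip this layer's header and trailer, recovering the SDU unchanged."""
    overhead = len(layer.header) + len(layer.trailer)
    if (len(pdu) < overhead
            or not pdu.startswith(layer.header)
            or not pdu.endswith(layer.trailer)):
        raise ValueError(f"not a valid {layer.name} PDU")
    return pdu[len(layer.header):len(pdu) - len(layer.trailer)]


# Made-up stack, listed from the highest layer to the lowest.
STACK = [
    Layer("transport", b"T|"),
    Layer("network", b"N|"),
    Layer("data link", b"D|", b"|FCS"),
]

frame = b"hello"
for layer in STACK:               # sending: each PDU becomes the next SDU
    frame = encapsulate(frame, layer)
print(frame)                      # b'D|N|T|hello|FCS'

received = frame
for layer in reversed(STACK):     # receiving: unwrap from the lowest layer up
    received = decapsulate(received, layer)
print(received)                   # b'hello'
```

Each pass through the sending loop shows the rule from the text: the PDU of layer N is handed down and becomes the SDU, or payload, of layer N-1.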

Some orthogonal aspects, such as management and security, involve every layer.

Security services are not tied to a specific layer: they can be provided by a number of layers, as defined by ITU-T Recommendation X.800. [3]

These services aim to improve the CIA triad (i.e. confidentiality, integrity, availability) of transmitted data. In practice, the availability of a communication service is determined by network design and/or network management protocols. Appropriate choices for these are needed to protect against denial of service.

Layer 1: Physical Layer

The Physical Layer defines electrical and physical specifications for devices. In particular, it defines the relationship between a device and a transmission medium, such as a copper or optical cable. This includes the layout of pins, voltages, cable specifications, hubs, repeaters, network adapters, host bus adapters (HBA, used in storage area networks) and more.

The major functions and services performed by the Physical Layer are:

§ Establishment and termination of a connection to a communications medium.

§ Participation in the process whereby the communication resources are effectively shared among multiple users. For example, contention resolution and flow control.

§ Modulation, or conversion between the representation of digital data in user equipment and the corresponding signals transmitted over a communications channel. These are signals operating over the physical cabling (such as copper and optical fiber) or over a radio link.

Parallel SCSI buses operate in this layer, although it must be remembered that the logical SCSI protocol is a Transport Layer protocol that runs over this bus. Various Physical Layer Ethernet standards are also in this layer; Ethernet incorporates both this layer and the Data Link Layer. The same applies to other local-area networks, such as token ring, FDDI, ITU-T G.hn and IEEE 802.11, as well as personal area networks such as Bluetooth and IEEE 802.15.4.

Layer 2: Data Link Layer

The Data Link Layer provides the functional and procedural means to transfer data between network entities and to detect and possibly correct errors that may occur in the Physical Layer. Originally, this layer was intended for point-to-point and point-to-multipoint media, characteristic of wide area media in the telephone system. Local area network architecture, which included broadcast-capable multiaccess media, was developed independently of the ISO work in IEEE Project 802. IEEE work assumed sublayering and management functions not required for WAN use. In modern practice, only error detection, not flow control using sliding window, is present in data link protocols such as Point-to-Point Protocol (PPP). On local area networks, the IEEE 802.2 LLC layer is not used for most protocols on Ethernet, and on other local area networks its flow control and acknowledgment mechanisms are rarely used. Sliding window flow control and acknowledgment is used at the Transport Layer by protocols such as TCP, but is still used at the data link level in niches where X.25 offers performance advantages.

The ITU-T G.hn standard, which provides high-speed local area networking over existing wires (power lines, phone lines and coaxial cables), includes a complete Data Link Layer which provides both error correction and flow control by means of a selective repeat Sliding Window Protocol.

Both WAN and LAN services arrange bits from the Physical Layer into logical sequences called frames. Not all Physical Layer bits necessarily go into frames, as some of these bits are purely intended for Physical Layer functions. For example, every fifth bit of the FDDI bit stream is not used by the Data Link Layer.

WAN protocol architecture

Connection-oriented WAN data link protocols, in addition to framing, detect and may correct errors. They are also capable of controlling the rate of transmission. A WAN Data Link Layer might implement a sliding window flow control and acknowledgment mechanism to provide reliable delivery of frames; that is the case for Synchronous Data Link Control (SDLC) and HDLC, and derivatives of HDLC such as LAPB and LAPD.

IEEE 802 LAN architecture

Practical, connectionless LANs began with the pre-IEEE Ethernet specification, which is the ancestor of IEEE 802.3. This layer manages the interaction of devices with a shared medium, which is the function of a Media Access Control (MAC) sublayer. Above this MAC sublayer is the media-independent IEEE 802.2 Logical Link Control (LLC) sublayer, which deals with addressing and multiplexing on multiaccess media.

While IEEE 802.3 is the dominant wired LAN protocol and IEEE 802.11 the wireless LAN protocol, obsolescent MAC layers include Token Ring and FDDI. The MAC sublayer detects but does not correct errors.
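The "detects but does not correct" behaviour can be illustrated with a checksum. The sketch below is a simplification under stated assumptions: it uses Python's `zlib.crc32`, which is based on the same CRC-32 polynomial as the Ethernet frame check sequence, but the frame layout, byte order and helper names (`add_fcs`, `check_fcs`) are hypothetical and do not match a real 802.3 frame.

```python
import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 check value to the payload (simplified framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_fcs(frame: bytes) -> bool:
    """Return True if the trailing check value matches the payload."""
    if len(frame) < 4:
        return False
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs


frame = add_fcs(b"voice sample data")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit in transit

print(check_fcs(frame))       # True
print(check_fcs(corrupted))   # False
```

The receiver learns only that the frame is damaged, so the usual response is to discard it and leave any retransmission to a higher layer.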

Layer 3: Network Layer

The Network Layer provides the functional and procedural means of transferring variable length data sequences from a source host on one network to a destination host on a different network, while maintaining the quality of service requested by the Transport Layer (in contrast to the data link layer, which connects hosts within the same network). The Network Layer performs network routing functions, and might also perform fragmentation and reassembly, and report delivery errors. Routers operate at this layer, sending data throughout the extended network and making the Internet possible. This is a logical addressing scheme: values are chosen by the network engineer. The addressing scheme is hierarchical.

The Network Layer may be divided into three sublayers:

1. Subnetwork Access – that considers protocols that deal with the interface to networks, such as X.25;

2. Subnetwork Dependent Convergence – when it is necessary to bring the level of a transit network up to the level of networks on either side

3. Subnetwork Independent Convergence – which handles transfer across multiple networks.

An example of this latter case is CLNP, or IPv7 ISO 8473. It manages the connectionless transfer of data one hop at a time, from end system to ingress router, router to router, and from egress router to destination end system. It is not responsible for reliable delivery to a next hop, but only for the detection of erroneous packets so they may be discarded. In this scheme, IPv4 and IPv6 would have to be classed with X.25 as subnet access protocols because they carry interface addresses rather than node addresses.

A number of layer management protocols, a function defined in the Management Annex, ISO 7498/4, belong to the Network Layer. These include routing protocols, multicast group management, Network Layer information and error, and Network Layer address assignment. It is the function of the payload that makes these belong to the Network Layer, not the protocol that carries them.

Layer 4: Transport Layer

The Transport Layer provides transparent transfer of data between end users, providing reliable data transfer services to the upper layers. The Transport Layer controls the reliability of a given link through flow control, segmentation/desegmentation, and error control. Some protocols are state- and connection-oriented. This means that the Transport Layer can keep track of the segments and retransmit those that fail. The Transport Layer also provides the acknowledgement of the successful data transmission and sends the next data if no errors occurred.

OSI defines five classes of connection-mode transport protocols, ranging from class 0 (also known as TP0, which provides the fewest features) to class 4 (TP4, designed for less reliable networks, similar to the Internet). Class 0 contains no error recovery, and was designed for use on network layers that provide error-free connections. Class 4 is closest to TCP, although TCP contains functions, such as the graceful close, which OSI assigns to the Session Layer. Also, all OSI TP connection-mode protocol classes provide expedited data and preservation of record boundaries. Detailed characteristics of the TP0-4 classes are shown in the following table: [4]

Feature Name | TP0 | TP1 | TP2 | TP3 | TP4
Connection oriented network | Yes | Yes | Yes | Yes | Yes
Connectionless network | No | No | No | No | Yes
Concatenation and separation | No | Yes | Yes | Yes | Yes
Segmentation and reassembly | Yes | Yes | Yes | Yes | Yes
Error Recovery | No | Yes | Yes | Yes | Yes
Reinitiate connection (if an excessive number of PDUs are unacknowledged) | No | Yes | No | Yes | No
Multiplexing and demultiplexing over a single virtual circuit | No | No | Yes | Yes | Yes
Explicit flow control | No | No | Yes | Yes | Yes
Retransmission on timeout | No | No | No | No | Yes
Reliable Transport Service | No | Yes | No | Yes | Yes

Perhaps an easy way to visualize the Transport Layer is to compare it with a Post Office, which deals with the dispatch and classification of mail and parcels sent. Do remember, however, that a post office manages the outer envelope of mail. Higher layers may have the equivalent of double envelopes, such as cryptographic presentation services that can be read by the addressee only. Roughly speaking, tunneling protocols operate at the Transport Layer, such as carrying non-IP protocols such as IBM’s SNA or Novell’s IPX over an IP network, or end-to-end encryption with IPsec. While Generic Routing Encapsulation (GRE) might seem to be a Network Layer protocol, if the encapsulation of the payload takes place only at the endpoint, GRE becomes closer to a transport protocol that uses IP headers but contains complete frames or packets to deliver to an endpoint. L2TP carries PPP frames inside a transport packet.

Although not developed under the OSI Reference Model and not strictly conforming to the OSI definition of the Transport Layer, the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) of the Internet Protocol Suite are commonly categorized as Layer 4 protocols within OSI.

Layer 5: Session Layer

The Session Layer controls the dialogues (connections) between computers. It establishes, manages and terminates the connections between the local and remote application. It provides for full-duplex, half-duplex, or simplex operation, and establishes check pointing, adjournment, termination, and restart procedures. The OSI model made this layer responsible for graceful close of sessions, which is a property of the Transmission Control Protocol, and also for session check pointing and recovery, which is not usually used in the Internet Protocol Suite. The Session Layer is commonly implemented explicitly in application environments that use remote procedure calls.

Layer 6: Presentation Layer

The Presentation Layer establishes context between Application Layer entities, in which the higher-layer entities may use different syntax and semantics if the presentation service provides a mapping between them. If a mapping is available, presentation service data units are encapsulated into session protocol data units, and passed down the stack.

This layer provides independence from data representation (e.g., encryption) by translating between application and network formats. The presentation layer transforms data into the form that the application accepts. This layer formats and encrypts data to be sent across a network. It is sometimes called the syntax layer. [5]

The original presentation structure used the basic encoding rules of Abstract Syntax Notation One (ASN.1), with capabilities such as converting an EBCDIC-coded text file to an ASCII-coded file, or serialization of objects and other data structures from and to XML.

Layer 7: Application Layer

The Application Layer is the OSI layer closest to the end user, which means that both the OSI application layer and the user interact directly with the software application. This layer interacts with software applications that implement a communicating component. Such application programs fall outside the scope of the OSI model. Application layer functions typically include identifying communication partners, determining resource availability, and synchronizing communication. When identifying communication partners, the application layer determines the identity and availability of communication partners for an application with data to transmit. When determining resource availability, the application layer must decide whether sufficient network resources for the requested communication exist. In synchronizing communication, all communication between applications requires cooperation that is managed by the application layer. Some examples of application layer implementations include:

§ On OSI stack:

§ FTAM File Transfer and Access Management Protocol

§ X.400 Mail

§ Common management information protocol (CMIP)

§ On TCP/IP stack:

§ Hypertext Transfer Protocol (HTTP),

§ File Transfer Protocol (FTP),

§ Simple Mail Transfer Protocol (SMTP)

§ Simple Network Management Protocol (SNMP).

8 Call Features

ADSI On-Screen Menu System

Alarm Receiver

Append Message

Authentication

Automated Attendant

Blacklists

Blind Transfer

Call Detail Records

Call Forward on Busy

Call Forward on No Answer

Call Forward Variable

Call Monitoring

Call Parking

Call Queuing

Call Recording

Call Retrieval

Call Routing (DID & ANI)

Call Snooping

Call Transfer

Call Waiting

Caller ID

Caller ID Blocking

Caller ID on Call Waiting

Calling Cards

Conference Bridging

Database Store / Retrieve

Database Integration

Dial by Name

Direct Inward System Access

Distinctive Ring

Distributed Universal Number Discovery (DUNDi™)

Do Not Disturb

E911

ENUM

Fax Transmit and Receive

Flexible Extension Logic

Interactive Directory Listing

Interactive Voice Response (IVR)

Local and Remote Call Agents

Macros

Music On Hold

Music On Transfer:

– Flexible Mp3-based System

– Random or Linear Play

– Volume Control

Predictive Dialer

Privacy

Open Settlement Protocol (OSP)

Overhead Paging

Protocol Conversion

Remote Call Pickup

Remote Office Support

Roaming Extensions

Route by Caller ID

SMS Messaging

Spell / Say

Streaming Media Access

Supervised Transfer

Talk Detection

Text-to-Speech (via Festival)

Three-way Calling

Time and Date

Transcoding

Trunking

VoIP Gateways

Voicemail:

– Visual Indicator for Message Waiting

– Stutter Dialtone for Message Waiting

– Voicemail to email

– Voicemail Groups

– Web Voicemail Interface

Zapateller

Computer-Telephony Integration

AGI (Asterisk Gateway Interface)

Graphical Call Manager

Outbound Call Spooling

Predictive Dialer

TCP/IP Management Interface

Scalability

TDMoE (Time Division Multiplex over Ethernet)

Allows direct connection of Asterisk PBX

Zero latency

Uses commodity Ethernet hardware

Voice-over IP

Allows for integration of physically separate installations

Uses commonly deployed data connections

Allows a unified dialplan across multiple offices

Speech

Cepstral TTS

Lumenvox ASR

Vestec ASR

Codecs

ADPCM

G.711 (A-Law & μ-Law)

G.719 (pass through)

G.722

G.722.1 licensed from Polycom®

G.722.1 Annex C licensed from Polycom®

G.723.1 (pass through)

G.726

G.729a

GSM

iLBC

Linear

LPC-10

Speex

VoIP Protocols

Google Talk

H.323

IAX™ (Inter-Asterisk exchange)

Jingle/XMPP

MGCP (Media Gateway Control Protocol)

SCCP (Cisco® Skinny®)

SIP (Session Initiation Protocol)

Skype

UNIStim

Traditional Telephony Protocols

E&M

E&M Wink

Feature Group D

FXS

FXO

GR-303

Loopstart

Groundstart

Kewlstart

MF and DTMF support

Robbed-bit Signaling (RBS) Types

MFC-R2 (Not supported. However, a patch is available)

ISDN Protocols

AT&T 4ESS

EuroISDN PRI and BRI

Lucent 5ESS

National ISDN 1

National ISDN 2

NFAS

Nortel DMS100

Q.SIG

9 VoIP Basics: Converting Voice to Digital Form

Are you interested in Voice over IP? Would you like to know more about its background? This text begins a series that should shed some light on it.

Let’s start with the beginning. VoIP sends digitized voice across computer networks. So how do we convert voice to the digital form?

When converting an analog signal (be it speech or another noise), you need to consider two important factors: sampling and quantization. Together, they determine the quality of the digitized sound.

· Sampling is about the sampling rate — i.e. how many samples per second you use to encode the sound.

· Quantization is about how many bits you use to represent each sample. The number of bits determines the number of different values you can represent with each sample.

Figures 1 and 2 show the idea of sampling: Figure 1 is the original analog signal, while Figure 2 shows the digitized form as a sequence of discrete samples.

Figure 1: Analog signal
Figure 2: Digitized signal

10 Quantization

As mentioned above, quantization is about how many bits you use to represent individual sound samples. In practice, we want to work with whole bytes, so let’s consider 8 or 16 bits.

With 8-bit samples, each sample can represent 256 different values, so we can work with whole numbers between -128 and +127. Because of the whole numbers, it is inevitable that we introduce some noise into the signal as we convert it to digital samples. For example, if the exact analog value is “7.44125”, we will represent it as “7”. As we do this with each sample in the sequence, we slightly distort the signal — inject noise, in other words.

It turns out that 8-bit samples do not give good quality. With only 256 sample values, the analog-to-digital conversion adds too much noise. The situation improves a lot if we switch to 16-bit samples, as 16 bits give us 65536 different representations (from -32768 to +32767). 16-bit samples are what you will find on a CD and what VoIP codecs use as their input.
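The effect of sample size on noise can be checked with a short experiment. This is a sketch under stated assumptions: the test signal (a sine wave at 90% of full scale), the scaling of samples to the range -1.0 to 1.0 and the helper names `quantize` and `snr_db` are choices made here for illustration.

```python
import math


def quantize(value: float, bits: int) -> int:
    """Map a sample in the range -1.0..1.0 to a signed integer of `bits` bits."""
    levels = 2 ** (bits - 1)                   # 128 for 8 bits, 32768 for 16 bits
    q = round(value * levels)
    return max(-levels, min(levels - 1, q))    # clip to the representable range


def snr_db(bits: int, n: int = 1000) -> float:
    """Signal-to-noise ratio of a quantized test sine wave, in decibels."""
    signal = [0.9 * math.sin(2 * math.pi * 7 * i / n) for i in range(n)]
    levels = 2 ** (bits - 1)
    noise = [s - quantize(s, bits) / levels for s in signal]
    signal_power = sum(s * s for s in signal)
    noise_power = sum(e * e for e in noise)
    return 10 * math.log10(signal_power / noise_power)


print(round(7.44125))          # 7, the rounding example from the text
print(quantize(0.5, 8))        # 64
print(snr_db(8), snr_db(16))   # the 16-bit figure should be far higher
```

Each extra bit halves the rounding step, so the noise added by rounding shrinks quickly as the sample size grows.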

11 Sampling

Now that we have decided what sample size to use (16 bits), let’s look at sampling rates. The table below shows three frequently used sampling rates:

Type Transmitted Bandwidth Sampling Frequency
Telephone Speech 300-3400 Hz 8 kHz
Wide Band Speech 50-7000 Hz 16 kHz
CD quality audio 20-20000 Hz 44.1 kHz

With VoIP, you will most frequently encounter the sampling rate of 8 kilohertz. The frequency of 16 kHz can be used now and then in situations when a higher quality audio is required (with proportionally higher Internet bandwidth consumption).
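The bandwidth cost of a higher sampling rate follows directly from multiplying the rate by the sample size. The sketch below works this out for the three rates in the table, assuming 16-bit samples, a single (mono) channel and no compression. The function name `raw_bit_rate` is made up for this example.

```python
# Sampling rates from the table above, in hertz.
SAMPLING_RATES_HZ = {
    "Telephone speech": 8_000,
    "Wide band speech": 16_000,
    "CD quality audio": 44_100,
}


def raw_bit_rate(sampling_rate_hz: int, bits_per_sample: int = 16,
                 channels: int = 1) -> int:
    """Uncompressed bit rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels


for name, rate in SAMPLING_RATES_HZ.items():
    print(f"{name}: {raw_bit_rate(rate) / 1000} kbit/s")
# Telephone speech: 128.0 kbit/s
# Wide band speech: 256.0 kbit/s
# CD quality audio: 705.6 kbit/s
```

These are the raw figures before any codec is applied. Reducing them is the job of the audio codecs.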

The choice of sampling frequencies for the individual types of audio is not random. There is a rule (based on the work of Nyquist and Shannon) that the sampling frequency needs to be equal to or greater than two times the highest frequency in the transmitted bandwidth. Figures 3 and 4 show why this is required.

Figure 3

In Figure 3, the sinusoid represents the original analog sound. The large black dots are where we read our samples. Note that we take two samples in each period, i.e. the sampling rate is two times the frequency of the sound. This is the absolute minimum that will allow us to reconstruct a signal that is still comprehensible. It certainly won’t be a hi-fi sound but it will have the correct frequency – see the thin black lines in the picture.

Figure 4 shows a situation where we take fewer than two samples per period. The thin black lines show what would happen after we feed the samples into a digital-to-analog converter: we would hear something different from the original, a sound with a lower frequency. This problem is known as “aliasing”, since the lower frequency appears to be an “alias” of the original correct one.
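Aliasing can be reproduced numerically. The sketch below assumes a pure tone and an ideal sampler; the function name `alias_frequency` and the example tones are chosen here for illustration. It computes the frequency a listener would perceive, then confirms that a 5 kHz tone sampled at 8 kHz yields exactly the same samples as a 3 kHz tone.

```python
import math


def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of a pure tone of f_signal after sampling at f_sample."""
    f = f_signal % f_sample
    return f if f <= f_sample / 2 else f_sample - f


print(alias_frequency(3000.0, 8000.0))   # 3000.0 (below half the rate: unchanged)
print(alias_frequency(5000.0, 8000.0))   # 3000.0 (above half the rate: aliased)

# The two tones are indistinguishable once sampled at 8 kHz.
fs = 8000.0
tone_5k = [math.cos(2 * math.pi * 5000.0 * n / fs) for n in range(16)]
tone_3k = [math.cos(2 * math.pi * 3000.0 * n / fs) for n in range(16)]
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(tone_5k, tone_3k))
print(same)                              # True
```

This is why the input is normally low-pass filtered before sampling: anything above half the sampling rate would otherwise come back as a false lower tone.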

12 VoIP Protocols: Introducing SIP

The Session Initiation Protocol (SIP for short) is a Voice over IP protocol designed by the Internet Engineering Task Force. SIP was created by the MMUSIC group of the IETF (MMUSIC stands for Multi-party Multimedia Session Control). Formally, the protocol is intended for creating, modifying and terminating sessions with one or more participants. The sessions are mainly VoIP telephone calls or conferences.

The first version of SIP was published in 1999 in RFC2543 with the two main authors being Mark Handley and Henning Schulzrinne. The standard was updated to version 2.0 in 2002 with RFC3261 and naturally there were many subsequent updates and extensions (RFC3265, RFC3853, RFC4320, RFC4916, RFC5393, RFC5621, RFC5626, RFC5630).

13 SIP Characteristics

Unlike H.323, SIP is a text-based protocol. The formatting of SIP requests and responses is based on HTTP version 1.1. Endpoints that communicate using SIP use the following three protocols:

· SIP itself, used to establish and terminate the session;

· Session Description Protocol (SDP for short, RFC2327, obsoleted by RFC4566), used to exchange information about audio/video channels. Like SIP, SDP is also a product of the IETF’s MMUSIC group;

· RTP, used to send the real-time streams of audio or video across the network.

SIP messages are exchanged between endpoints in transactions. A transaction consists of a request and the related response or responses. The messages that belong to the same transaction share the same transaction ID. This ID is called CSeq in SIP. Each transaction should have a unique CSeq number, with only a single exception: the ACK message (ACK for “acknowledge”) uses the same CSeq number as the transaction which it applies to.

SIP can use either UDP or TCP as the underlying transport protocol. Originally (in RFC2543), UDP was the only mandatory option. According to RFC3261 from 2002, all endpoints must be able to send SIP messages over both UDP and TCP. Still, UDP is the more frequently used option. When communicating over TCP, two modes are possible: either the same TCP channel is used for all transactions of a session or a new TCP connection is established for each individual transaction.

13 The SIP Protocol

The Session Initiation Protocol (SIP) is a protocol for establishing real-time communication sessions with one or more participants. It is most frequently used for voice communications, but it can handle video and other applications as well. SIP was designed to be independent of the transport layer, i.e. it can work over UDP, TCP or SCTP. All voice/video communications take place via another protocol, usually RTP.

There are many RFCs surrounding SIP, but the most important one is RFC 3261.

SIP is a text-based protocol that looks and acts very much like HTTP. The original designers (Henning Schulzrinne & Mark Handley) wanted to make a protocol that had its roots in the IP world, rather than in the telecoms world. SIP has been an amazing success, being the major driver in the adoption of VoIP and computer telephony in recent years. All major manufacturers have adopted the standard, and SIP software, SIP hardware and SIP service providers are widely available.

SIP servers are responsible for setting up the calls between SIP devices. SIP servers usually combine several of the SIP server functions, such as SIP proxy and SIP registrar, into one piece of software. 3CX Phone System is a SIP proxy, a SIP registrar and a media server, so it can also handle real-time voice communications.

14 Registration

Before we describe the flow of a typical SIP call, let’s have a look at how SIP user agents register with a SIP registrar. The example below shows a situation where a SIP softphone (namely, the Ekiga client) registers with an Asterisk PBX. The Asterisk server’s IP address is 10.10.1.99, while the client is at 10.10.1.13 and wants to register the telephone number 13.

In order to register, the SIP telephone needs to send the REGISTER request:

SIP registration, phase 1

The registrar server will immediately reply with the provisional response “100 Trying”. This indicates that the request has been received (and thus the client does not need to retransmit it) and that it is being processed. While processing the request, the registrar discovers that the user agent needs to authenticate. It therefore responds with “401 Unauthorized”. For the user agent, this means that it has to send the REGISTER request once more, this time providing authentication.

Let’s have a look at the messages in detail. This is the text of the REGISTER message:

REGISTER sip:10.10.1.99 SIP/2.0

CSeq: 1 REGISTER

Via: SIP/2.0/UDP 10.10.1.13:5060;

branch=z9hG4bK78946131-99e1-de11-8845-080027608325;rport

User-Agent: Ekiga/3.2.5

From: <sip:13@10.10.1.99>

;tag=d60e6131-99e1-de11-8845-080027608325

Call-ID: e4ec6031-99e1-de11-8845-080027608325@vvt-laptop

To: <sip:13@10.10.1.99>

Contact: <sip:13@10.10.1.13>;q=1

Allow: INVITE,ACK,OPTIONS,BYE,CANCEL,SUBSCRIBE,NOTIFY,REFER,MESSAGE,

INFO,PING

Expires: 3600

Content-Length: 0

Max-Forwards: 70
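Because SIP is text based, a request like the one above can be assembled with ordinary string handling. The following is a minimal sketch, not how Ekiga actually builds its messages: the function `build_register` and its parameters are hypothetical, and only a subset of the headers shown above is produced.

```python
def build_register(registrar: str, user: str, client_ip: str, cseq: int,
                   branch: str, tag: str, call_id: str,
                   expires: int = 3600) -> str:
    """Assemble a bare-bones REGISTER request as text (illustrative only)."""
    address = f"<sip:{user}@{registrar}>"
    lines = [
        f"REGISTER sip:{registrar} SIP/2.0",
        f"Via: SIP/2.0/UDP {client_ip}:5060;branch={branch};rport",
        "Max-Forwards: 70",
        f"From: {address};tag={tag}",
        f"To: {address}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{client_ip}>;q=1",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # Each SIP line ends with CRLF; an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Values taken from the example message above.
msg = build_register(
    registrar="10.10.1.99",
    user="13",
    client_ip="10.10.1.13",
    cseq=1,
    branch="z9hG4bK78946131-99e1-de11-8845-080027608325",
    tag="d60e6131-99e1-de11-8845-080027608325",
    call_id="e4ec6031-99e1-de11-8845-080027608325@vvt-laptop",
)
print(msg)
```

The order of headers is not significant to the registrar, so the sketch does not try to reproduce the exact ordering of the captured message.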

We probably do not need to show the “100 Trying” response. The text of the “401 Unauthorized” message is as follows:

SIP/2.0 401 Unauthorized

Via: SIP/2.0/UDP 10.10.1.13:5060;

branch=z9hG4bK78946131-99e1-de11-8845-080027608325;

received=10.10.1.13;rport=5060

From: <sip:13@10.10.1.99>;

tag=d60e6131-99e1-de11-8845-080027608325

To: <sip:13@10.10.1.99>;tag=as5489aead

Call-ID: e4ec6031-99e1-de11-8845-080027608325@vvt-laptop

CSeq: 1 REGISTER

User-Agent: Asterisk PBX

Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER,

SUBSCRIBE, NOTIFY

Supported: replaces

WWW-Authenticate: Digest algorithm=MD5, realm="asterisk",

nonce="343eb793"

Content-Length: 0

In the “401 Unauthorized” response, the important header is WWW-Authenticate:. It instructs the client to authenticate using digest authentication (RFC 2617). The nonce (short for “number used once”) parameter is a “challenge string”. The client will combine the challenge string with the user’s password (together with the user name, the realm, the request method and the request URI) and compute an MD5 hash of the result. The server will compute its own hash using the same method and compare it with the MD5 hash provided by the client. Digest authentication is the most frequently used method because the password is never sent over the network in plain text. The “basic” authentication has been deprecated in SIP 2.0 as it is insecure (sending a password in plain text is generally a bad idea).
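The digest computation can be sketched in a few lines. This example follows the RFC 2617 scheme for a challenge that carries no qop parameter, as in the 401 response above. The helper names and the password "secret" are assumptions made for illustration; the real password behind the captured messages is not known, so the result will not match the response value in the capture.

```python
import hashlib
import re


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as a lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def parse_challenge(header_value: str) -> dict:
    """Split 'Digest key=value, key="value"' into a dict of parameters."""
    scheme, _, params = header_value.partition(" ")
    if scheme.lower() != "digest":
        raise ValueError("only Digest challenges are handled in this sketch")
    pairs = re.findall(r'(\w+)=(?:"([^"]*)"|([^,\s]+))', params)
    return {key: quoted or bare for key, quoted, bare in pairs}


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Digest response for a challenge without qop (RFC 2617)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the challenge


# Challenge taken from the 401 response above.
challenge = parse_challenge(
    'Digest algorithm=MD5, realm="asterisk", nonce="343eb793"'
)
response = digest_response(
    username="test13",
    realm=challenge["realm"],
    password="secret",  # hypothetical password, for illustration only
    method="REGISTER",
    uri="sip:10.10.1.99",
    nonce=challenge["nonce"],
)
print(response)
```

Because the nonce is part of the hashed string, a response captured from one exchange cannot simply be replayed against a different challenge.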

Once the client computes the MD5 digest, it will re-send the REGISTER request. The message will look like this:

REGISTER sip:10.10.1.99 SIP/2.0

CSeq: 2 REGISTER

Via: SIP/2.0/UDP 10.10.1.13:5060;

branch=z9hG4bK32366531-99e1-de11-8845-080027608325;rport

User-Agent: Ekiga/3.2.5

Authorization: Digest username="test13", realm="asterisk",

nonce="343eb793", uri="sip:10.10.1.99", algorithm=MD5,

response="6c13de87f9cde9c44e95edbb68cbdea9"

From: <sip:13@10.10.1.99>;

tag=d60e6131-99e1-de11-8845-080027608325

Call-ID: e4ec6031-99e1-de11-8845-080027608325@vvt-laptop

To: <sip:13@10.10.1.99>

Contact: <sip:13@10.10.1.13>;q=1

Allow: INVITE,ACK,OPTIONS,BYE,CANCEL,SUBSCRIBE,NOTIFY,REFER,

MESSAGE,INFO,PING

Expires: 3600

Content-Length: 0

Max-Forwards: 70

The registrar server will again first respond with “100 Trying” and then compare the two MD5 hashes (the one provided by the client with the one computed by the registrar itself). If they match, the registrar will respond with “200 OK” and insert the endpoint into the location database. The database is usually shared between the registrar and the proxy server so that the proxy can use it when it connects calls.
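The registrar's side of this step can be sketched as follows. This is a simplified illustration and not Asterisk's implementation: the password table, the in-memory dictionary used as location database, and the function names are all assumptions, and the password "secret" is hypothetical.

```python
import hashlib
import hmac
import time


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical account data and an in-memory location database.
PASSWORDS = {"test13": "secret"}
location_db = {}  # address-of-record -> (contact URI, expiry timestamp)


def handle_register(username, realm, nonce, uri, client_response,
                    address, contact, expires, now=None):
    """Verify the digest and store the binding; returns a status code."""
    password = PASSWORDS.get(username)
    if password is None:
        return 401
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"REGISTER:{uri}")
    expected = md5_hex(f"{ha1}:{nonce}:{ha2}")
    # Compare the registrar's own hash with the one sent by the client.
    if not hmac.compare_digest(expected, client_response):
        return 401
    now = time.time() if now is None else now
    location_db[address] = (contact, now + expires)
    return 200


def lookup(address, now=None):
    """Return the registered contact, or None if missing or expired."""
    now = time.time() if now is None else now
    entry = location_db.get(address)
    if entry is None or entry[1] <= now:
        return None
    return entry[0]


# A client that knows the (hypothetical) password computes the same hash.
good = md5_hex(md5_hex("test13:asterisk:secret") + ":343eb793:"
               + md5_hex("REGISTER:sip:10.10.1.99"))
status = handle_register("test13", "asterisk", "343eb793", "sip:10.10.1.99",
                         good, "sip:13@10.10.1.99", "sip:13@10.10.1.13",
                         expires=3600, now=1000.0)
print(status, lookup("sip:13@10.10.1.99", now=1001.0))
```

Storing the expiry time next to the contact lets the lookup ignore stale registrations, which is why a client has to register again before the interval runs out.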

The figure below shows the message exchange:

SIP registration, phase 2

The response “200 OK” contains one important parameter, Expires. It tells the client that the registration will expire after the given number of seconds and the client will be required to register again.

Call Flow

Let us now have a look at a typical SIP call. We will consider a scenario with a SIP proxy server involved. Suppose a user at the SIP telephone with number 121 dials the number 122. The following will happen:

1. The user agent in telephone 121 does not know the IP address of 122, but it knows the IP address of the SIP proxy (suppose this address is 10.10.1.99). The user agent will compose an INVITE request and send it to the proxy. The To: header of the request contains the SIP URI <sip:122@10.10.1.99>. The body of the INVITE request carries an SDP (Session Description Protocol) message providing the parameters (codec, IP address, port) the called party will need to send its RTP stream to the caller. See the previous section for an example of the INVITE request.

2. The SIP proxy immediately responds with “100 Trying” and then forwards the INVITE request to the target telephone. The proxy server adds one Via: header to the message. As mentioned before, the SIP proxy has access to the location database and thus knows the IP addresses of all registered telephones (the simplest implementation of this is such that the registrar server and the proxy are the same application).

Steps 1 and 2 are shown in Figure A below.

Figure A
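Step 2 can be sketched as a lookup followed by a small rewrite of the request text. This is a rough illustration under several assumptions: the telephone addresses 10.10.1.21 and 10.10.1.22, the branch values and the function name `forward_invite` are invented for the example, the INVITE is heavily shortened, and the SDP body is left out.

```python
# Hypothetical location database: address-of-record -> registered contact.
location_db = {"sip:122@10.10.1.99": "sip:122@10.10.1.22"}


def forward_invite(request: str, proxy_host: str, branch: str):
    """Return (next hop, rewritten request) for an incoming INVITE."""
    head, sep, body = request.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ")
    # Look up where the dialed number is currently registered.
    contact = location_db[target]
    # The proxy puts its own Via header on top of the existing ones, so
    # that responses travel back through it.
    via = f"Via: SIP/2.0/UDP {proxy_host}:5060;branch={branch}"
    new_lines = [f"{method} {contact} {version}", via] + lines[1:]
    return contact, "\r\n".join(new_lines) + sep + body


incoming = (
    "INVITE sip:122@10.10.1.99 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 10.10.1.21:5060;branch=z9hG4bKcaller1\r\n"
    "From: <sip:121@10.10.1.99>;tag=abc\r\n"
    "To: <sip:122@10.10.1.99>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
next_hop, outgoing = forward_invite(incoming, "10.10.1.99", "z9hG4bKproxy1")
print(next_hop)
print(outgoing)
```

Note that the To: header keeps the original URI; only the request line is pointed at the registered contact, and the caller's own Via header stays in place below the one added by the proxy.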