CCNC 2010 Tutorial: Towards Glitch Free VoIP and Video Conferencing
1/12/2010
TOWARDS GLITCH-FREE VOIP AND VIDEO CONFERENCING JIN LI MICROSOFT RESEARCH
Outline
Jin Li, Microsoft Research
• Introduction
• Anatomy of VoIP and Video Conferencing Systems
• Audio/Video Components
• Network Components
• Summary
Introduction
Booming of IP-Based Communication
• Advanced voice over IP (VoIP)
• Web, audio, and video conferencing
• Tele-presence
• Instant messaging
• Calendar and other PIM functions
• Email, fax, and voice mail
Worldwide VoIP subscribers
• Worldwide VoIP service revenue was $24.1B in 2007, up 52% over 2006.
• Worldwide VoIP service revenue is expected to more than double over the next 4 years, to $61.3B in 2011, an annual growth rate of 26%.
Source: Infonetics Research Inc., 2008
US Broadband Telephony Forecast, 2007-2013
• The VoIP subscriber base is predicted to double from 2007 to 2013.
Source: Jupiter Research, US Broadband Telephony Forecast, 2008 to 2013
VoIP Trend
• IP networks are the next-generation networks for all forms of communication.
• Broadband penetration is a key driver of VoIP expansion
  • Worldwide DSL subscriptions were at 205.9M at the end of 2007, up 23% from 2006, and are predicted to increase to 363.6M in 2011.
  • Cable subscriptions were up 15% annually to 68M at the end of 2007, and are predicted to climb to 97.3M in 2011.
  • Passive Optical Network (PON) subscribers were at 10.9M in 2007.
  • Ethernet FTTH subscribers were at 1.7M in 2007.
• 2004/2005 were breakthrough years for VoIP adoption.
High End Systems – Tele-Presence
• Cisco Telepresence: $299K
• HP Halo: $425K + $18K/mo
• Tandberg Experia: $225K
• Polycom RPX210M: $269K + $18.5K/mo
Worldwide Tele-presence Forecast (2006-2012)
[Figure: forecast of the number of end points and of revenue. Source: IDC Research, 2008]
Desktop Video Conferencing
• Multiple solutions, often acting as an add-on to VoIP
• Benefits
  • See the faces of people you may not have met before
  • See facial expressions & gestures
  • Easier to follow a conversation
  • More interactive than a phone call
  • Get the general mood and ambience
  • See and show documents/objects
• Drawbacks
  • Difficult to set up and plan
  • Network reliability
  • Interpersonal factors
• Without (or with poor) video, people talk; without (or with poor) audio, people walk.
Anatomy of VoIP and Video Conferencing Systems
Infrastructure vs. P2P
• Infrastructure based: Microsoft Unified Communication, Cisco
• P2P based: Skype, Gtalk
Infrastructure Based VoIP: Microsoft Unified Communication
Unified Communication: Architecture
Unified Communication: P2P Call
Key Steps
• Alice calls Bob
• Find Bob’s registered SIP endpoints
Unified Communication: To VoiceMail
Key Steps
• Alice calls Bob
• Find Bob’s registered SIP endpoints
• Bob doesn’t answer after a certain period, so the call is re-routed
• The voicemail system plays a greeting, records Alice’s message, sends the message to Bob’s email, and uses the speech server to transcribe the message
Unified Communication: PSTN to UC
Key Steps
• PSTN user Alice calls Bob
• The IP-PSTN gateway terminates the call
• The MS/Gateway routes the call to the mediation server, which performs transcoding, ICE, etc.
• Through the director, the proper UC client is found
P2P VoIP: Skype
Information
• Debut: 08/2003, by N. Zennstrom and J. Friis, who founded KaZaA
• A P2P overlay network for VoIP and other applications
• Free intra-network VoIP and fee-based SkypeOut/SkypeIn
Skype Usage (Apr. 2008)
• 11 million concurrent Skype users online at peak time (180,000+ simultaneous calls)
• 309 million registered users worldwide, the largest registered user base within the eBay portfolio (33 million users added in Q1FY08)
• $126M revenue in Q1FY08 (61% YoY growth; 5.6 billion SkypeOut minutes in FY2007)
• 100 billion cumulative Skype-to-Skype minutes
Skype Share of International VoIP Traffic
Skype Gadget
• IPDRUM mobile Skype Cable
• Motorola CN620 WiFi Cellphone
• IPEVO Free-1 USB Skype Phone
• Netgear Skype Wi-Fi Phone
• USB Mouse with Phone
• 50 hardware partners, 150+ Skype-certified devices
Skype vs. VoIP
• Public VoIP standards: H.323, SIP
• Skype is a proprietary VoIP solution
  • Relies on a P2P network for the user directory: scalable without costly infrastructure
  • Routes calls through supernodes in Skype: universal firewall/NAT traversal
  • Encrypted traffic (but you have to trust eBay/Skype)
Skype Ingredient (1)
• User retrieves an ID from a Skype server
Skype Network
• Skype server: authentication
• Supernode overlay
  • Any computer with sufficient CPU, memory & network bandwidth that is not behind a firewall
  • Provides the distributed directory service
  • Relays traffic for computers behind a NAT/firewall
NAT Traversal (Skype)
• NAT/firewall detection
  • Try UDP connection
  • Try TCP connection (arbitrary port, 80 (HTTP), 443 (HTTPS))
• Traversal
  • Direct connection if a) both clients have no NAT, or b) one client has no NAT and the other is behind a cone NAT
  • Relay by supernode otherwise
• Since Skype doesn’t need to pay for the relay cost
  • High-bitrate wideband voice codec (>24 kbps)
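The traversal rule above can be written as a small decision function. This is only a sketch of the rule as stated on the slide; the function name and the NAT labels (`"none"`, `"cone"`, `"symmetric"`) are hypothetical and do not come from Skype.

```python
def choose_media_path(nat_a, nat_b):
    """Return "direct" or "relay" for two clients.

    nat_a / nat_b are hypothetical labels: "none", "cone", or "symmetric".
    Rule from the slide: connect directly if both clients have no NAT, or if
    one has no NAT and the other sits behind a cone NAT; otherwise relay
    through a supernode.
    """
    kinds = {nat_a, nat_b}
    if kinds == {"none"} or kinds == {"none", "cone"}:
        return "direct"
    return "relay"
```

For example, `choose_media_path("cone", "none")` selects a direct connection, while two clients that are both behind cone NATs are relayed under this rule.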
Skype: Call Routing Through Supernode
• Skype server: authentication
• Supernode overlay: route calls through supernodes
• High-bitrate wideband voice codec (>24 kbps)
Skype Encryption
[Figure: Peer 1 and Peer 2]
• 256-bit AES over 128-bit data blocks
• 1536/2048-bit RSA for key negotiation (2048/2048 for paid service)
Skype: Complete Black Box (Security by Obfuscation)
• Almost everything is obfuscated
• Many protections, anti-debugging tricks, ciphered code
  • Avoid static disassembly: XOR the binary with a hard-coded key, erase the beginning of the code, own packer
  • Code integrity check: use checksums to detect breakpoints
  • Anti-debugging techniques: anti-SoftICE, integrity check
  • Code obfuscation
• Network obfuscation
Audio/Video Component
• Audio Codec
• Video Codec
• Acoustic Echo Cancellation
Audio Codec
G.711 (PCM)
• Still widely used today: PSTN interface
• With uniform quantization: 12 bits * 8 k samples/sec = 96 kbps
• With non-uniform quantization: 8 bits * 8 k samples/sec = 64 kbps (the DS0 rate)
  • North America: µ-law (µ = 255)
  • Other countries: A-law (A = 87.6)
• MOS of about 4.3
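To illustrate why non-uniform quantization gets by with 8 bits per sample, here is a minimal sketch of µ-law companding with µ = 255. It uses the continuous logarithmic curve followed by a uniform 8-bit quantizer, which is an assumption made for brevity; the actual G.711 codec uses a piecewise-linear segment approximation of this curve, and all function names here are illustrative.

```python
import math

MU = 255.0  # the µ value quoted on the slide

def mu_compress(x):
    """Map a sample in [-1, 1] onto [-1, 1] with the continuous µ-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_8bit(x):
    """Compress, then quantize uniformly to one of 256 codes."""
    y = mu_compress(max(-1.0, min(1.0, x)))
    return min(255, int((y + 1.0) / 2.0 * 256))

def decode_8bit(code):
    """Reconstruct at the middle of the quantization cell, then expand."""
    y = (code + 0.5) / 256 * 2.0 - 1.0
    return mu_expand(y)

# 8 bits per sample at 8 k samples/sec gives the DS0 rate in kbps.
bitrate_kbps = 8 * 8000 / 1000
```

Small amplitudes land in finer cells than they would under a uniform 8-bit quantizer: decoding `encode_8bit(0.01)` lands much closer to 0.01 than the uniform step of 1/128 would allow.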
G.722.1: Siren
• Audio bandwidth: 14 kHz
• Sample rate: 32 kHz
• Bit rate: 24, 32, and 48 kbit/s
• Algorithm: Transform coding (Siren14™)
• Frame size: 20 ms
• Algorithmic delay: 40 ms
• Complexity: [truncated in source]

$|Z_D(i,j)| = \left(|Y_D(i,j)| \cdot MF(0,0) + 2f\right) \gg (qbits + 1)$
$\mathrm{sign}(Z_D(i,j)) = \mathrm{sign}(Y_D(i,j))$
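The quantization formula above can be exercised with a few lines of integer arithmetic. This is a sketch under stated assumptions: the `MF(0,0)` table, `qbits = 15 + floor(QP/6)`, and the rounding offset `f` (2^qbits/3 for intra blocks, 2^qbits/6 for inter blocks) are the values commonly given for the H.264 forward quantizer and are not on the slide; the function name is illustrative.

```python
# MF(0,0) for QP % 6 = 0..5 (assumed from the usual H.264 quantizer tables).
MF_00 = [13107, 11916, 10082, 9362, 8192, 7282]

def quantize_dc(y, qp, intra=True):
    """Quantize one DC coefficient Y_D(i,j) at quantization parameter qp."""
    qbits = 15 + qp // 6
    # Rounding offset f; the intra/inter split is an assumption.
    f = (1 << qbits) // (3 if intra else 6)
    magnitude = (abs(y) * MF_00[qp % 6] + 2 * f) >> (qbits + 1)
    # sign(Z_D) = sign(Y_D)
    return magnitude if y >= 0 else -magnitude
```

With these assumed tables, `quantize_dc(1000, 28)` evaluates to 8, and negating the input negates the result.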
CAVLC: Context-Based Adaptive Variable Length Coding
Characteristics:
• Run-level coding to compact strings of zeros
• Trailing ones (+1, -1 after 0)
• The number of nonzero coefficients in neighboring blocks is correlated
• The choice of VLC lookup table for the level parameter adapts to the level magnitude
CAVLC Encoding
1. Encode the number of coefficients and trailing ones (coeff_token)
   • TotalCoeffs: 0 ~ 16
   • TrailingOnes: 0 ~ 3; if there are more than 3 TrailingOnes, only the last three are treated as ‘special cases’
   • Four lookup tables: three variable-length, one fixed-length; the choice depends on neighboring blocks
2. Encode the sign of each TrailingOne, in reverse order
3. Encode the levels of the remaining nonzero coefficients
   • level_prefix, level_suffix
   • If there are fewer than 3 TrailingOnes, the first nonzero coefficient is adjusted
4. Encode the total number of zeros before the last coefficient
5. Encode each run of zeros
   • Zero-runs at the start of the array need not be encoded
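The five steps can be made concrete by deriving the values that each step would encode from one zigzag-ordered block. This sketch stops before the table lookups: it does not produce the actual bitstream, and the function and key names are illustrative rather than taken from the H.264 syntax.

```python
def cavlc_symbols(coeffs):
    """Derive the values CAVLC encodes for one zigzag-ordered coefficient block."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    values_rev = [coeffs[i] for i in reversed(positions)]  # high frequency first

    # Steps 1-2: up to three consecutive +/-1 values at the high-frequency end.
    trailing_signs = []
    for c in values_rev:
        if abs(c) != 1 or len(trailing_signs) == 3:
            break
        trailing_signs.append("+" if c > 0 else "-")

    # Step 3: remaining nonzero levels, still in reverse order.
    levels = values_rev[len(trailing_signs):]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = positions[-1] + 1 - len(positions) if positions else 0

    # Step 5: run of zeros before each coefficient, in reverse order. The run
    # before the lowest-frequency coefficient (start of the array) is implied.
    rev = positions[::-1]
    runs = [hi - lo - 1 for hi, lo in zip(rev, rev[1:])]

    return {
        "total_coeffs": len(positions),
        "trailing_ones": len(trailing_signs),
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs_before": runs,
    }

example = cavlc_symbols([0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
```

In the example block there are four ±1 values at the end of the nonzero sequence, but only the last three count as trailing ones; the fourth is coded as an ordinary level.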
Acoustic Echo Cancellation
[Figure: acoustic echo cancellation placed between "From Audio Decoder" and "To Audio Encoder"]
Acoustic Echo Cancellation Module
Adaptive Transversal Filter
• FIR filter: inherently stable
  • The length of the filter affects performance: convergence, goodness, and complexity.
  • The filter introduces errors since it is trying to model an IIR response.
• Short filters
  • 128-256 coefficients (taps)
  • Faster convergence, but the final solution has more residual error
  • Less complex: O(N)
• Long filters
  • 512-1024 coefficients (taps)
  • Slower convergence, but the final solution has less error
  • More complex, as the algorithm can be O(N^2)
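As an illustration of an adaptive FIR (transversal) filter with O(N) cost per sample, here is a minimal sketch using the normalized LMS (NLMS) update. The slides do not name the adaptation algorithm, so NLMS is an assumption chosen for brevity; the echo path, step size, and tap count in the demo are made-up values.

```python
import random

def simulate_echo(far_end, path):
    """Microphone signal containing only the echo: far_end convolved with path."""
    return [sum(h * far_end[n - k] for k, h in enumerate(path) if n - k >= 0)
            for n in range(len(far_end))]

def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-8):
    """Return (residual signal, final filter weights)."""
    w = [0.0] * taps      # adaptive FIR coefficients
    hist = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        hist = [x] + hist[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, hist))
        e = d - echo_estimate                    # what is sent to the encoder
        norm = sum(h * h for h in hist) + eps    # input power normalization
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, hist)]
        residual.append(e)
    return residual, w

# Demo with a hypothetical 3-tap echo path and white far-end signal.
_rng = random.Random(0)
demo_far = [_rng.uniform(-1.0, 1.0) for _ in range(2000)]
demo_path = [0.5, -0.3, 0.2]
demo_residual, demo_w = nlms_echo_cancel(demo_far, simulate_echo(demo_far, demo_path))
```

In this noise-free demo the first three weights approach the echo path and the rest approach zero. Real rooms need the much longer filters listed above, which is where the convergence and complexity tradeoff appears.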
Challenges
• Dynamic range of the human ear = 120 dB
  • Even quiet echoes can be heard.
• Longer delays from satellite (300-500 ms), VoIP
  • The ear is more sensitive to longer delays.
  • More difficult to find the beginning of the echo.
  • Long filters (~1000 taps) are needed (complexity & convergence).
• Near-end noise: corrupts the echo, decreasing the canceller’s ability to converge.
• Acoustic echo paths can change rapidly
  • More difficult for the AEC to remain converged.
• Nonlinear echo components
  • Speakers driven beyond their linear region.
Network Component
IP-based VoIP / Video Conference
Internet Primer
Internet: Grand View
Impact on ISPs
Economics of ISP relationships
• Sibling relationship: several ISPs belong to the same organization
• Peering relationship: mutually beneficial free agreement (to a certain extent)
• Transit relationship: one ISP pays another
Inside ISP
ISP POP (Point of Presence)
Home Networking
Network Characteristics
Under-provisioned Links
[Figure: links between branches]
Growth Trends
Packet Loss vs. Jitter (vs. Delay?)
The Usual Suspects
Packet Bursts
What kind of Enterprise User?
How QoS can help
QoS helps inside and between branches!
Observation
• IP-based communication in the enterprise is growing
• Empirical results show poor calls for wireless and VPN users
• QoS (DiffServ) is both used and useful!
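For readers who want to see what DiffServ marking looks like from an application, here is a minimal sketch that marks a UDP socket with a DSCP value. The choice of Expedited Forwarding (DSCP 46) for voice is a common convention and an assumption here; the slides do not say which code points the measured deployment used, and some operating systems ignore or restrict this socket option.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding, a code point commonly used for voice

def dscp_to_tos(dscp):
    """The 6-bit DSCP sits in the upper six bits of the former IPv4 TOS byte."""
    return (dscp & 0x3F) << 2

def open_marked_udp_socket(dscp=EF_DSCP):
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Marking only helps if the routers along the path are configured to honor it, which is why the benefit shows up inside and between managed branches.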
Available Bandwidth Estimation
What is Available Bandwidth (ABW)?
• ABW is the left-over capacity along an Internet path
Why Is It Useful?
• Maximizing QoE (Quality of Experience) in A/V conferencing
  • Audio prefers minimum delay (high priority)
  • Video prefers maximum rate (low priority)
• One Way Delay (OWD) = propagation delay (constant) + queuing delay (variable)
• One solution: measure the ABW, then encode and send video at the ABW rate
Typical Targeting Scenario
• The first hop is the bottleneck: cable modem, DSL, high-speed link…
• Timescale for the ABW estimation: 2-4 seconds
Why Is Measuring ABW Hard?
• Available bandwidth changes over time
  • ABW measurements must be quick
• Audio packets (along the same path) should experience minimum delay
  • Measurement must be non-intrusive
Two Models
• Probe Rate Model (PRM) based solutions: Pathload, TOPP, Pathchirp, Bfind, PTR …
• Probe Gap Model (PGM) based solutions: Spruce, Delphi, IGI, Moseab …
Pathload (PRM) [Jain & Dovrolis]
• Send probe trains at various rates
• ABW is the probe rate at the transition where OWD starts increasing (queuing delay is observed)
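The probe rate idea can be sketched as a bisection over the probe rate, with each train classified by whether its one-way delays trend upward. This is a simplified illustration and not the Pathload tool itself: the trend test, its threshold, and the fluid single-bottleneck path model used to generate delays are all assumptions.

```python
def owd_trend_increasing(owds, threshold=0.6):
    """Classify a train as 'increasing' if most consecutive OWDs go up."""
    ups = sum(1 for a, b in zip(owds, owds[1:]) if b > a)
    return ups / (len(owds) - 1) > threshold

def estimate_abw(send_train, lo, hi, resolution):
    """Bisect the probe rate until the transition point is bracketed."""
    while hi - lo > resolution:
        rate = (lo + hi) / 2
        if owd_trend_increasing(send_train(rate)):
            hi = rate   # queue is building: rate exceeds the ABW
        else:
            lo = rate   # no queuing observed: rate is below the ABW
    return (lo + hi) / 2

def make_fluid_path(capacity, cross_traffic, base_delay=0.02,
                    pkt_bits=12000, train_len=50):
    """Hypothetical single-bottleneck path with constant-rate cross traffic."""
    def send_train(rate):
        gap = pkt_bits / rate
        backlog, owds = 0.0, []
        for _ in range(train_len):
            owds.append(base_delay + backlog / capacity)
            # Bits arriving during one gap minus bits the link drains.
            backlog = max(0.0, backlog + pkt_bits + (cross_traffic - capacity) * gap)
        return owds
    return send_train

demo_path = make_fluid_path(capacity=10e6, cross_traffic=6e6)
demo_abw = estimate_abw(demo_path, lo=0.5e6, hi=10e6, resolution=0.1e6)
```

Each bisection step costs one probe train, which is the iterative cost that makes probe rate methods slower than gap-based ones.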
Spruce (PGM) [Strauss et al.]
• Send probe pairs/trains at rate Ri (Ri > A), and measure the sending gaps and receiving gaps
• Compute A directly
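A sketch of the direct computation, assuming the usual probe gap relation $A = C \left(1 - \frac{\Delta_{out} - \Delta_{in}}{\Delta_{in}}\right)$, where $C$ is the bottleneck capacity, $\Delta_{in}$ the sending gap, and $\Delta_{out}$ the receiving gap. The formula and the requirement that $C$ is known in advance are assumptions not stated on the slide; the function names are illustrative.

```python
def gap_sample(capacity_bps, gap_in, gap_out):
    """One ABW sample from a single probe pair."""
    return capacity_bps * (1.0 - (gap_out - gap_in) / gap_in)

def gap_estimate(capacity_bps, gaps):
    """Average the per-pair samples; gaps is a list of (gap_in, gap_out)."""
    samples = [gap_sample(capacity_bps, gi, go) for gi, go in gaps]
    return max(0.0, sum(samples) / len(samples))

# Example: 10 Mbps bottleneck, 1500-byte probes sent back to back at capacity
# (gap_in = 1.2 ms). 6 Mbps of cross traffic inserts 7200 bits between the
# two probes, which widens the receiving gap by 0.72 ms.
demo_abw = gap_estimate(10e6, [(0.0012, 0.00192)] * 5)
```

The example recovers roughly 4 Mbps of available bandwidth from the gap widening alone, with no iteration over probe rates.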
Advantages/Disadvantages of the Approaches
• PGM based approaches
  • Advantage: fast estimation; the estimate can be made with a single probe
  • Disadvantage: the assumptions are not easy to verify in practice
• PRM based approaches
  • Advantage: no assumptions
  • Disadvantage: slow estimation; iterative probes
Forward Error Correction
Block Based Erasure Resilient Coding
[Figure: the original data is split into k messages (1, 2, 3, …, k); ERC produces n coded blocks (1, 2, 3, …, k, k+1, …, n); at a certain instance, some of the coded blocks (marked X) are lost]
Some of the blocks may be lost in delivery. However, as long as at least k blocks are delivered, the original data can be reconstructed.
ERC in VoIP and Video Conferencing
• VoIP
  • Mainly packet replication, due to the small VoIP packet size & low delay requirement
• Video Conferencing
  • Packet loss protection (for I frames, or P frames in HD)
  • Each frame is separated into k messages and protected by n-k additional messages
  • As long as no more than n-k messages are lost, the transmission succeeds
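A short calculation shows how much protection the n-k extra messages buy. The sketch assumes independent packet losses with probability p, which real networks with bursty loss do not satisfy; the function name is illustrative.

```python
from math import comb

def frame_success_prob(n, k, p):
    """Probability that at most n-k of the n messages are lost,
    assuming each message is lost independently with probability p."""
    return sum(comb(n, lost) * p**lost * (1 - p)**(n - lost)
               for lost in range(n - k + 1))

unprotected = frame_success_prob(10, 10, 0.1)  # k = 10, no redundancy
protected = frame_success_prob(12, 10, 0.1)    # same frame, 2 extra messages
```

At 10% loss, a frame of 10 messages arrives intact only about 35% of the time without protection, and about 89% of the time with 2 extra messages under this independence assumption.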
ERC Terms
• Number of original blocks: k
• Number of coded blocks: n
• Rate of ERC: k/n
• MDS: Maximum Distance Separable
  • Any k of the n coded blocks can recover the original
  • The theoretically optimal performance
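Erasure Encoding: Mathematics
• Original data: $x_1, x_2, \ldots, x_k$
• Coded data: $y_1, y_2, \ldots, y_n$
• $x_i$, $y_j$: vectors over a Galois field
• Encoding multiplies the original data by an $n \times k$ generator matrix $G$:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = G \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$$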
Example: ERC of 10MB
• Original data (10MB): $x_1, x_2, \ldots, x_k$ with k=10; each vector is 1MB
• Coded data (n=30): $y_1, y_2, \ldots, y_n$; each vector is 1MB
• Symbols are in $GF(2^8)$; the generator matrix is 30 × 10
Erasure Decoding: Mathematics
• Original data: $x_1, x_2, \ldots, x_k$
• Coded data: $y_1, y_2, \ldots, y_n$
• Select k of the available coded vectors; the matching rows of the generator matrix form the sub-generator matrix
• The original data can be recovered if the sub-generator matrix has full rank k.
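The encode and decode steps can be tried end to end with a small MDS code. This sketch makes two simplifying assumptions that differ from the slides: it works over the prime field GF(257) instead of $GF(2^8)$ so that plain modular arithmetic suffices, and it uses a Vandermonde generator matrix (non-systematic), for which any k rows have full rank. All names are illustrative.

```python
P = 257  # small prime field standing in for GF(2^8); an assumption for brevity

def generator_matrix(n, k):
    """Vandermonde rows [1, a, a^2, ...] for distinct a = 1..n (n < P)."""
    return [[pow(a, j, P) for j in range(k)] for a in range(1, n + 1)]

def encode(data, n):
    """y = G x over GF(P); data holds k symbols in 0..256."""
    G = generator_matrix(n, len(data))
    return [sum(g * x for g, x in zip(row, data)) % P for row in G]

def solve_mod(A, b):
    """Solve A x = b over GF(P) by Gauss-Jordan elimination."""
    k = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if M[r][col] % P != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], P - 2, P)  # inverse via Fermat's little theorem
        M[col] = [(v * inv) % P for v in M[col]]
        for r in range(k):
            if r != col and M[r][col]:
                factor = M[r][col]
                M[r] = [(vr - factor * vc) % P for vr, vc in zip(M[r], M[col])]
    return [M[r][k] for r in range(k)]

def decode(received, n, k):
    """received maps coded-block index -> symbol and needs at least k entries."""
    chosen = sorted(received)[:k]
    G = generator_matrix(n, k)
    sub_generator = [G[i] for i in chosen]  # full rank k for any k rows
    return solve_mod(sub_generator, [received[i] for i in chosen])

demo_data = [10, 20, 30, 40]                         # k = 4
demo_coded = encode(demo_data, 7)                    # n = 7
demo_received = {i: demo_coded[i] for i in (1, 3, 4, 6)}  # 3 blocks lost
demo_recovered = decode(demo_received, 7, 4)
```

A real implementation would process long vectors symbol by symbol and would use a field whose symbols fit in a byte.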
Systematic vs Non-Systematic ERC
[Figure: k original messages (1, 2, 3, …, k). Non-systematic ERC: n coded blocks (1, 2, 3, …, k, k+1, …, n). Systematic ERC: n coded blocks, the first k of which are the original messages]
Systematic ERC
• Slightly lower encoding & decoding complexity
• Even if the data cannot be fully recovered, some original messages can still be used
Reed-Solomon
• Has been around for decades
• Has a systematic form
• Cauchy Reed-Solomon code
Reed-Solomon Decoding
[Figure: Reed-Solomon decoding, with the labels "Receive" and "Inverse"]
Dejitter Buffer
Variable Delay & Dejitter Buffer
[Figure: a network path with a queuing delay at each hop, followed by a dejitter buffer]
• Queuing delay
• Dejitter buffers
• Variable packet sizes
Fixed Dejitter Buffer – Budget For Worst Case
[Figure: Site A to Site B over a 128 kbps link: coder delay 40 ms, queuing delay 4-50 ms, propagation delay 8 ms, dejitter buffer 50 ms]
Total End-to-End Delay
• Codec delay: 40 ms
• Propagation delay: 8 ms
• Dejitter buffer: 50 ms (to accommodate queuing delay: 0-50 ms)
• Total delay: 98 ms
Dejitter Buffer Size & Late Loss
[Figure: packet delay over time, showing the buffering delay, the playout deadline, jitter, late loss, and packet loss]
Fixed playout deadline and jitter absorption:
• The playout rate is constant
• The tradeoff is between dejitter buffer size and late loss
Adaptive Playout and Dejitter Buffer Adaptation
[Figure: packet delay over time, showing the buffering delay, playout, jitter, and packet loss]
Adaptive playout and jitter adaptation
• Scaling of voice/video packets in a highly dynamic way
• Playout schedule set according to the past delays recorded
• Usually the dejitter buffer size expands quickly on late packet arrival and shrinks slowly when jitter reduces
• Improved tradeoff between buffering delay and late loss
• The playout rate is not constant
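The expand-quickly, shrink-slowly behavior can be sketched as a small controller for the target buffering delay. This is an illustrative model only: the class name, the safety margin, the shrink factor, and the bounds are made-up parameters, and a real system would also time-scale the audio to follow the changing target.

```python
class AdaptiveDejitterBuffer:
    """Tracks a target buffering delay (ms) from observed queuing delays."""

    def __init__(self, initial_ms=20.0, min_ms=10.0, max_ms=200.0,
                 margin_ms=5.0, shrink=0.02):
        self.target_ms = initial_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.margin_ms = margin_ms  # headroom above the observed delay
        self.shrink = shrink        # fraction of the excess removed per packet

    def on_packet(self, delay_ms):
        """delay_ms: queuing delay of this packet above the path minimum.
        Returns True if the packet arrived in time for playout."""
        on_time = delay_ms <= self.target_ms
        if not on_time:
            # Late loss: expand at once to cover the delay just observed.
            self.target_ms = min(self.max_ms, delay_ms + self.margin_ms)
        else:
            # Jitter has reduced: shrink slowly toward the smaller goal.
            goal = max(self.min_ms, delay_ms + self.margin_ms)
            if goal < self.target_ms:
                self.target_ms -= self.shrink * (self.target_ms - goal)
        return on_time
```

With the defaults, a single packet delayed by 60 ms raises the target from 20 ms to 65 ms immediately, while a following run of 10 ms packets lowers it by only 2% of the excess per packet.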
Adaptive Play Out
[Figure: audio adaptive playout]
• Packets are pushed into the adaptive playout module
• The renderer requests a new waveform segment for playout
• The playout module passes the packet to the audio decoder
Packet Loss Concealment
Audio Packet Loss Concealment
[Figure: packets i-2, i-1, i (lost), i+1, i+2, each of length L, on a time axis; the neighboring packets are stretched to cover the gap, with the alignment offset ∆L found by correlation; stretched lengths of 2L and 1.3L are shown]
• Depends on voiced & unvoiced segments
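The phrase "alignment found by correlation" can be illustrated with a one-sided simplification: search the recent history for the lag that best matches the most recent samples, then fill the lost packet by repeating the waveform at that lag. This is pitch-style waveform replication and not the bi-directional stretching shown in the figure; the window and lag ranges are made-up values.

```python
import math

def best_lag(history, win, lag_min, lag_max):
    """Lag whose earlier window best matches the last `win` samples
    (normalized cross-correlation). Needs len(history) >= win + lag_max."""
    tail = history[-win:]
    best, best_score = lag_min, -2.0
    for lag in range(lag_min, lag_max + 1):
        cand = history[-win - lag:-lag]
        num = sum(a * b for a, b in zip(tail, cand))
        den = math.sqrt(sum(a * a for a in tail) * sum(b * b for b in cand)) or 1.0
        if num / den > best_score:
            best, best_score = lag, num / den
    return best

def conceal(history, gap_len, win=80, lag_min=20, lag_max=120):
    """Synthesize gap_len samples to replace a lost packet."""
    lag = best_lag(history, win, lag_min, lag_max)
    out = list(history)
    for _ in range(gap_len):
        out.append(out[-lag])  # repeat the waveform one lag back
    return out[len(history):]

# Demo on a periodic (voiced-like) signal with a 40-sample period.
demo_history = [math.sin(2 * math.pi * n / 40) for n in range(400)]
demo_lag = best_lag(demo_history, win=80, lag_min=20, lag_max=60)
demo_fill = conceal(demo_history, gap_len=80)
```

For a strongly periodic segment the repeated waveform lines up with the true continuation. Unvoiced segments have no clear correlation peak, which is one reason the treatment depends on the segment type.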
Voiced segments
Unvoiced segments
Concealment as (bi-directional) stretching
Video Packet Loss Concealment
• Spatial concealment: use spatial correlation
  • E.g., bilinear interpolation
  • Projection onto convex sets
• Temporal concealment: use the correlation that exists between consecutive frames
  • Temporal replacement
  • Boundary matching
Spatial-Temporal Concealment
Summary
• VoIP/Video Conference Systems
  • Infrastructure based
  • P2P based
• Audio/Video Components
  • Audio codec
  • Video codec
  • Acoustic echo cancellation
• Network Components
  • Primer of the Internet
  • Network characteristics
  • Available bandwidth estimation
  • Forward error correction (FEC)
  • Dejitter buffer
  • Packet loss concealment