TOWARDS GLITCH-FREE VOIP AND VIDEO CONFERENCING

Author: Vernon Burke
CCNC 2010 Tutorial: Towards Glitch Free VoIP and Video Conferencing

1/12/2010

TOWARDS GLITCH-FREE VOIP AND VIDEO CONFERENCING JIN LI MICROSOFT RESEARCH

Outline

• Introduction
• Anatomy of VoIP and Video Conferencing Systems
• Audio/Video Components
• Network Components
• Summary


Introduction

Boom of IP-Based Communication

• Advanced voice over IP (VoIP)
• Web-, audio-, video-conferencing
• Tele-presence
• Instant messaging
• Calendar and other PIM functions
• Email, fax and voice mail


Worldwide VoIP subscribers

• Worldwide VoIP service revenue was $24.1B in 2007, up 52% over 2006.
• Worldwide VoIP service revenue is expected to more than double over the next 4 years, to $61.3B in 2011, an annual growth rate of 26%.

Source: Infonetics Research Inc., 2008

US Broadband Telephony Forecast, 2007-2013

The VoIP subscriber base is predicted to double from 2007 to 2013.

Source: Jupiter Research, US Broadband Telephony Forecast, 2008 to 2013


VoIP Trend

• IP networks are the next-generation networks for all forms of communication.
• Broadband penetration is a key driver of VoIP expansion
    • Worldwide DSL subscriptions stood at 205.9M at the end of 2007, up 23% from 2006, and are predicted to increase to 363.6M in 2011.
    • Cable subscriptions were up 15% annually to 68M at the end of 2007, climbing to 97.3M in 2011.
    • Passive Optical Network (PON) subscribers were at 10.9M in 2007.
    • Ethernet FTTH subscribers were at 1.7M in 2007.
• 2004/2005 were breakthrough years for VoIP adoption.

High End Systems – Tele-Presence

• Cisco Telepresence: $299K
• HP Halo: $425K + $18K/mo
• Tandberg Experia: $225K
• Polycom RPX210M: $269K + $18.5K/mo


Worldwide Tele-presence Forecast (2006-2012)

[Figure: number of end points and revenue forecast. Source: 2008 IDC Research]

Desktop Video Conferencing

• Multiple solutions, often acting as an add-on to VoIP
• Benefits
    • See faces of people you may not have met before
    • See facial expressions & gestures
    • Easier to follow a conversation
    • More interactive than phone
    • Get the general mood and ambience
    • See and show documents/objects
• Drawbacks
    • Difficult to set up and plan
    • Network reliability
    • Interpersonal factors
• Without (or with poor) video, people talk; without (or with poor) audio, people walk.


Anatomy of VoIP and Video Conferencing Systems

Infrastructure vs. P2P

• Infrastructure based
    • Microsoft Unified Communication
    • Cisco
• P2P based
    • Skype
    • Gtalk


Infrastructure Based VoIP: Microsoft Unified Communication

Unified Communication: Architecture

Unified Communication: P2P Call

Key Steps

• Alice calls Bob
• Find Bob’s registered SIP endpoints

Unified Communication: To Voicemail

Key Steps

• Alice calls Bob
• Find Bob’s registered SIP endpoints
• Bob doesn’t answer after a certain period, so the call re-routes
• The voicemail system plays a greeting, records Alice’s message, sends the message to Bob’s email, and uses a speech server to transcribe it

Unified Communication: PSTN to UC

Key Steps

• PSTN user Alice calls Bob
• The IP-PSTN gateway terminates the call
• MS/Gateway routes the call to the mediation server, which performs transcoding, ICE, etc.
• Through the director, the proper UC client is found

P2P VoIP: Skype

• Information
    • Debut: 08/2003, by N. Zennstrom and J. Friis, who founded KaZaA
    • A P2P overlay network for VoIP and other applications
    • Free Skype-to-Skype VoIP and fee-based SkypeOut/SkypeIn

Skype Usage (Apr. 2008)

• 11 million concurrent Skype users online at peak time (180,000+ simultaneous calls)
• 309 million registered users worldwide, the largest registered user base within the eBay portfolio (33 million users added in Q1 FY08)
• $126M revenue in Q1 FY08 (61% YoY growth; 5.6 billion SkypeOut minutes in FY2007)
• 100 billion cumulative Skype-to-Skype minutes

Skype Share of International VoIP Traffic

Skype Gadgets

• IPDRUM mobile Skype cable
• Motorola CN620 WiFi cellphone
• IPEVO Free-1 USB Skype phone
• Netgear Skype Wi-Fi phone
• USB mouse with phone

50 hardware partners, 150+ Skype-certified devices.

Skype vs. VoIP

• Public VoIP standards
    • H.323, SIP
• Skype is a proprietary VoIP solution
    • Relies on a P2P network for the user directory
    • Scalable without costly infrastructure
    • Routes calls through supernodes in Skype
    • Universal firewall/NAT traversal
    • Encrypted traffic (but you have to trust eBay/Skype)

Skype Ingredient (1)

• User retrieves ID from a Skype server

Skype Network

• Skype server: authentication
• Supernode overlay:
    • Any computer with sufficient CPU, memory and network bandwidth that is not behind a firewall
    • For distributed directory service
    • Relays traffic for computers behind NAT/firewall

NAT Traversal (Skype)

• NAT/Firewall detection
    • Try UDP connection
    • Try TCP connection (arbitrary port, 80 (HTTP), 443 (HTTPS))
• Traversal
    • Direct connection if a) both clients have no NAT, or b) one client has no NAT and the other is behind a cone NAT
    • Relay by supernode otherwise
    • Since Skype doesn’t need to pay for relay cost: high-bitrate wideband voice codec (>24 kbps)
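
The traversal rule above can be written as a small decision function. This is only a sketch of the rule as stated on the slide, not Skype's actual (proprietary) logic; the `Nat` enum, its member names, and `choose_path` are hypothetical names introduced for illustration, and "symmetric" stands in for any non-cone NAT.

```python
from enum import Enum


class Nat(Enum):
    """Hypothetical NAT classification used only for this sketch."""
    NONE = "none"            # publicly reachable, no NAT
    CONE = "cone"            # cone NAT
    SYMMETRIC = "symmetric"  # stand-in for any non-cone NAT/firewall


def choose_path(a: Nat, b: Nat) -> str:
    """Return 'direct' or 'relay' following the two cases on the slide."""
    # Case a) both clients have no NAT.
    if a is Nat.NONE and b is Nat.NONE:
        return "direct"
    # Case b) one client has no NAT and the other is behind a cone NAT.
    if {a, b} == {Nat.NONE, Nat.CONE}:
        return "direct"
    # Everything else is relayed by a supernode.
    return "relay"
```

For example, `choose_path(Nat.NONE, Nat.CONE)` yields a direct connection, while two clients that are both behind NATs are relayed.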

Skype: Call Routing Through Supernode

• Skype server: authentication
• Supernode overlay:
    • Route call through supernodes
    • High-bitrate wideband voice codec (>24 kbps)

Skype Encryption

[Figure: encrypted channel between Peer 1 and Peer 2]

• 256-bit AES over 128-bit data blocks
• 1536/2048-bit RSA for key negotiation (2048/2048 for paid service)

Skype: Complete Black Box (Security by Obfuscation)

• Almost everything is obfuscated
• Many protections, anti-debugging tricks, ciphered code
    • Avoid static disassembly: XOR the binary with a hard-coded key, erase the beginning of the code, use a custom packer
    • Code integrity check: use checksums to detect breakpoints
    • Anti-debugging technique: anti-SoftICE, integrity check
    • Code obfuscation
    • Network obfuscation

Audio/Video Component

• Audio Codec
• Video Codec
• Acoustic Echo Cancellation

Audio Codec

G.711 (PCM)

• Still widely used today: PSTN interface
• With uniform quantization: 12 bits * 8 k/sec = 96 kbps
• With non-uniform quantization: 8 bits * 8 k/sec = 64 kbps (DS0 rate)
    • North America: µ-law
    • Other countries: A-law
• MOS of about 4.3

µ = 255, A = 87.6
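
As an illustration of non-uniform quantization, the continuous µ-law companding curve with µ = 255 can be sketched as below. This is the textbook curve F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ), not the bit-exact segmented encoder of the G.711 recommendation; the function names are hypothetical.

```python
import math

MU = 255.0                      # µ-law parameter from the slide
SAMPLE_RATE_HZ = 8000           # narrowband telephony sampling rate
BITS_PER_SAMPLE = 8             # companded sample size
BITRATE_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # DS0 rate


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous µ-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress: recover the linear sample in [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Small amplitudes are expanded before uniform 8-bit quantization (for instance, an input of 0.01 maps to roughly 0.23), which is why 8 companded bits can stand in for about 12 uniform bits on speech.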


G.722.1: Siren

• Audio bandwidth: 14 kHz
• Sample rate: 32 kHz
• Bit rate: 24, 32, and 48 kbit/s
• Algorithm: Transform coding (Siren14™)
• Frame size: 20 ms
• Algorithmic delay: 40 ms
• Complexity:

|ZD(i, j)| = (|YD(i, j)| · MF(0,0) + 2f) >> (qbits + 1)
sign(ZD(i, j)) = sign(YD(i, j))
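
The quantization formula above can be sketched in integer arithmetic as follows. The definitions of `qbits`, `f`, and the MF(0,0) table are not given in the surviving text; the values below follow the common textbook description of H.264 (qbits = 15 + floor(QP/6), f = 2^qbits/3 for intra and 2^qbits/6 for inter blocks) and should be read as assumptions. The function name `quantize_dc` is hypothetical.

```python
# Assumed MF(0,0) multipliers, indexed by QP % 6 (textbook H.264 values).
MF00 = [13107, 11916, 10082, 9362, 8192, 7282]


def quantize_dc(y: int, qp: int, intra: bool = True) -> int:
    """Quantize one DC coefficient YD(i, j) into ZD(i, j)."""
    qbits = 15 + qp // 6                        # assumed definition of qbits
    f = (1 << qbits) // (3 if intra else 6)     # assumed rounding offset
    magnitude = (abs(y) * MF00[qp % 6] + 2 * f) >> (qbits + 1)
    # sign(ZD) = sign(YD)
    return -magnitude if y < 0 else magnitude
```

With these assumed parameters, `quantize_dc(1000, 28)` evaluates to 8, and negating the input negates the result.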

CAVLC: Context-Based Adaptive Variable Length Coding

• Characteristics:
    • Run-level coding to compact strings of zeros
    • Trailing ones (+1, -1 after 0)
    • The number of nonzero coefficients in neighboring blocks is correlated
    • The choice of VLC lookup table for the level parameter depends on level magnitude

CAVLC Encoding

1. Encode the number of coefficients and trailing ones (coeff_token)
    • TotalCoeffs: 0 ~ 16
    • TrailingOnes: 0 ~ 3; if there are more than 3 trailing ones, only the last three are treated as ‘special cases’
    • Four lookup tables: three variable-length, one fixed-length; the choice depends on neighboring blocks
2. Encode the sign of each TrailingOne, in reverse order
3. Encode the levels of the remaining nonzero coefficients
    • level_prefix, level_suffix
    • If there are fewer than 3 TrailingOnes, the first nonzero coefficient is adjusted
4. Encode the total number of zeros before the last coefficient
5. Encode each run of zeros
    • Zero-runs at the start of the array need not be encoded
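
The five steps above operate on a handful of symbols derived from the zigzag-ordered block. The sketch below only extracts those symbols (it does not perform the table lookups or emit bits); `cavlc_symbols` and the dictionary keys are hypothetical names, and the run handling is simplified to dropping the run in front of the lowest-frequency coefficient.

```python
def cavlc_symbols(coeffs):
    """Derive CAVLC symbols from a zigzag-ordered list of coefficients."""
    nonzero_idx = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero_idx)

    # Step 1/2: trailing +/-1 values, scanned from the highest frequency,
    # at most three of them; their signs are coded in reverse order.
    trailing_signs = []
    for i in reversed(nonzero_idx):
        if abs(coeffs[i]) == 1 and len(trailing_signs) < 3:
            trailing_signs.append("-" if coeffs[i] < 0 else "+")
        else:
            break
    trailing_ones = len(trailing_signs)

    # Step 3: remaining nonzero levels, also in reverse order.
    levels = [coeffs[i] for i in reversed(nonzero_idx)][trailing_ones:]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nonzero_idx[-1] + 1 - total_coeffs if nonzero_idx else 0

    # Step 5: run of zeros preceding each coefficient, in reverse order.
    runs, prev = [], -1
    for i in nonzero_idx:
        runs.append(i - prev - 1)
        prev = i
    run_before = list(reversed(runs))[:-1]  # run at the array start is implied

    return {
        "total_coeffs": total_coeffs,
        "trailing_ones": trailing_ones,
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "run_before": run_before,
    }
```

For the zigzag sequence 0, 3, 0, 1, -1, -1, 0, 1, 0, ... this gives TotalCoeffs = 5, TrailingOnes = 3 (signs +, -, -), remaining levels 1 and 3, and 3 zeros before the last coefficient.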

Acoustic Echo Cancellation


[Figure: the acoustic echo cancellation block sits between the signal coming from the audio decoder and the signal going to the audio encoder]

Acoustic Echo Cancellation Module

Adaptive Transversal Filter

• FIR filter: inherently stable
    • The length of the filter affects convergence, goodness of fit, and complexity.
    • The filter introduces errors since it is trying to model an IIR response.
• Short filters
    • 128-256 coefficients (taps)
    • Faster convergence, but the final solution has more residual error
    • Less complex: O(N)
• Long filters
    • 512-1024 coefficients (taps)
    • Slower convergence, but the final solution has less error
    • More complex, as the algorithm can be O(N^2)
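
A minimal sketch of an adaptive transversal (FIR) filter is shown below. The slides do not name the adaptation algorithm, so normalized LMS (NLMS), which costs O(N) per sample, is assumed here purely for illustration. The function name, the 4-tap `echo_path`, the step size, and the noise-free white far-end signal are all assumptions chosen to keep the demo short; real rooms need the hundreds of taps listed above.

```python
import random


def nlms_echo_canceller(far_end, mic, taps, mu=0.5, eps=1e-8):
    """Adapt an FIR echo-path estimate with NLMS; return (weights, residual)."""
    w = [0.0] * taps        # current estimate of the echo path
    x_hist = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_hist = [x] + x_hist[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_hist))
        e = d - echo_estimate                    # signal sent to the encoder
        norm = sum(xi * xi for xi in x_hist) + eps
        step = mu * e / norm                     # normalized step size
        w = [wi + step * xi for wi, xi in zip(w, x_hist)]
        residual.append(e)
    return w, residual


# Demo: a synthetic 4-tap echo path and a white-noise far-end signal.
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.15, 0.05]
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far))
]
w, residual = nlms_echo_canceller(far, mic, taps=8)
```

In this idealized setting the estimated weights converge to `echo_path` (padded with zeros) and the residual echo decays toward zero; near-end noise or a changing echo path would slow or disturb that convergence.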

Challenges

• Dynamic range of the human ear = 120 dB.
    • Even quiet echoes can be heard.
• Longer delays from satellite (300-500 ms), VoIP
    • The ear is more sensitive to longer delays.
    • More difficult to find the beginning of the echo.
    • Long filters (~1000 taps) are needed (complexity & convergence)
• Near-end noise: corrupts the echo, decreasing the canceller's ability to converge.
• Acoustic echo paths can change rapidly
    • More difficult for the AEC to remain converged.
• Nonlinear echo components
    • Speakers driven beyond the linear region.

Network Component

IP-based VoIP / Video Conference

Internet Primer

Internet: Grand View

Impact on ISPs

• Economics of ISP relationships
    • Sibling relationship: several ISPs belong to the same organization
    • Peering relationship: mutually beneficial free agreement (to a certain extent)
    • Transit relationship: one ISP pays another

[Figure: transit, peering and sibling links, with peering and sibling entity boundaries]

Inside ISP

ISP POP (Point of Presence)

Home Networking

Network Characteristics

Under-provisioned Links

Growth Trends

Packet Loss vs. Jitter (vs. Delay?)

The Usual Suspects

Packet Bursts

What Kind of Enterprise User?

How QoS Can Help

QoS helps inside and between branches!

Observation

• IP-based communication in the enterprise is growing
• Empirical results show poor calls for wireless and VPN users
• QoS (DiffServ) is both used and useful!

Available Bandwidth Estimation

What is Available Bandwidth (ABW)?

• ABW is the left-over capacity along an Internet path

Why Is It Useful?

• Maximizing QoE (Quality of Experience) in A/V conferencing
    • Audio prefers minimum delay (high priority)
    • Video prefers maximum rate (low priority)
• One Way Delay (OWD) = propagation delay (constant) + queuing delay (variable)
• One solution: measure ABW, then encode and send video at the ABW rate

Typical Targeting Scenario

• First hop is the bottleneck
    • Cable modem, DSL, high-speed link…
• Timescale for the ABW estimation: 2-4 seconds

Why Is Measuring ABW Hard?

• Available bandwidth changes over time
    • ABW measurements must be quick
• Audio packets (along the same path) should experience minimum delay
    • Measurement must be non-intrusive

Two Models

• Probe Rate Model (PRM) based solutions
    • Pathload, TOPP, Pathchirp, Bfind, PTR …
• Probe Gap Model (PGM) based solutions
    • Spruce, Delphi, IGI, Moseab …

Pathload (PRM) [Jain & Dovrolis]

• Send probe trains at various rates
• ABW is the probe rate at the transition where OWD starts increasing (queuing delay is observed)
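
The probe-rate idea can be sketched as a binary search over the probing rate. This is a simplified illustration rather than the Pathload tool itself: the path is replaced by a hypothetical single-queue fluid model (`simulated_owds`), the trend test is a crude fraction-of-increases check, and all names, rates and thresholds are assumptions.

```python
def simulated_owds(rate_bps, abw_bps, n=50, pkt_bits=12000, base_delay=0.020):
    """Hypothetical path: one-way delays of a probe train sent at rate_bps."""
    owds, backlog = [], 0.0
    for _ in range(n):
        owds.append(base_delay + backlog)
        # Backlog grows only when packets arrive faster than they drain.
        backlog = max(0.0, backlog + pkt_bits / abw_bps - pkt_bits / rate_bps)
    return owds


def owd_increasing(owds, threshold=0.6):
    """Crude trend test: most consecutive delays go up."""
    ups = sum(1 for a, b in zip(owds, owds[1:]) if b > a)
    return ups / (len(owds) - 1) > threshold


def estimate_abw(probe, lo_bps, hi_bps, tol_bps):
    """Binary-search the rate at which OWD switches to an increasing trend."""
    while hi_bps - lo_bps > tol_bps:
        mid = (lo_bps + hi_bps) / 2
        if owd_increasing(probe(mid)):
            hi_bps = mid   # probing above ABW: queue builds up
        else:
            lo_bps = mid   # probing at or below ABW: no queuing trend
    return (lo_bps + hi_bps) / 2


# Demo against the simulated path with 3 Mbps of available bandwidth.
est = estimate_abw(lambda r: simulated_owds(r, 3_000_000),
                   lo_bps=100_000, hi_bps=10_000_000, tol_bps=50_000)
```

Each search step costs one probe train, which is the iterative behaviour that makes PRM approaches comparatively slow.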

Spruce (PGM) [Strauss et al.]

• Send probe pairs/trains at rate Ri (Ri > A), and measure the sending gaps and receiving gaps
• Compute A directly
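
A direct computation from gaps might look like the sketch below, which uses the gap formula commonly associated with Spruce, A = C · (1 − (Δout − Δin) / Δin). It assumes the bottleneck capacity C is already known and that the pair is sent with a gap equal to one packet's transmission time on the bottleneck; these are exactly the kind of assumptions that are hard to verify in practice. Function names are hypothetical.

```python
def spruce_sample(capacity_bps, gap_in_s, gap_out_s):
    """One ABW sample from the sending gap and the receiving gap of a pair."""
    return capacity_bps * (1.0 - (gap_out_s - gap_in_s) / gap_in_s)


def spruce_estimate(capacity_bps, gaps):
    """Average the samples over several (gap_in, gap_out) probe pairs."""
    samples = [spruce_sample(capacity_bps, gi, go) for gi, go in gaps]
    return sum(samples) / len(samples)
```

For a 10 Mbps bottleneck, a pair sent 1.2 ms apart and received 1.5 ms apart suggests that cross traffic filled a quarter of the gap, so the sample is about 7.5 Mbps.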


Advantages/Disadvantages of the Approaches

Approach               Advantages                                        Disadvantages
PGM based approaches   Fast estimation: can be done in a single probe    Assumptions are not easy to verify in practice
PRM based approaches   No assumption                                     Slow estimation: iterative probes

Forward Error Correction


Block Based Erasure Resilient Coding

[Figure: the original data consists of k messages (1, 2, 3, ..., k); ERC produces n coded blocks (1, 2, 3, ..., k, k+1, ..., n); at a certain instant, some coded blocks are lost (marked X)]

Some of the blocks may be lost in delivery. However, as long as at least k blocks are delivered, the original data can be reconstructed.

ERC in VoIP and Video Conferencing

• VoIP
    • Mainly packet replication, due to the small VoIP packet size and the low-delay requirement
• Video Conferencing
    • Packet loss protection (for I frames or P frames in HD)
    • Each frame is separated into k messages and protected by n-k additional messages. As long as no more than n-k messages are lost, the transmission succeeds

ERC Terms

• Number of original blocks: k
• Number of coded blocks: n
• Rate of ERC: k/n
• MDS: Maximum Distance Separable
    • Any k of the n coded blocks can recover the original
    • The theoretical optimal performance

Erasure Encoding: Mathematics

Original data: $x_1, x_2, \ldots, x_k$. Coded data: $y_1, y_2, \ldots, y_n$.

$[y_1, y_2, \ldots, y_n]^T = G \, [x_1, x_2, \ldots, x_k]^T$, where $G$ is the $n \times k$ generator matrix and each $x_i$, $y_j$ is a vector on a Galois field.

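Example: ERC of 10MB

• Original data (10MB): $x_1, x_2, \ldots, x_k$ with k=10; each vector is 1MB, over GF(2^8)
• Coded data (n=30): $y_1, y_2, \ldots, y_n$
• Dimensions: the 30 × 10 generator matrix multiplies the 10 × 1M original data to give the 30 × 1M coded data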

Erasure Decoding: Mathematics

Original data: $x_1, x_2, \ldots, x_k$. Coded data: $y_1, y_2, \ldots, y_n$.

• The decoder selects k of the available coded blocks, together with the matching rows of the generator matrix (the sub-generator matrix).
• The original data can be recovered if the sub-generator matrix has full rank k.
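
To make the generator and sub-generator matrices concrete, here is a small non-systematic sketch. For brevity it works over the prime field GF(257) instead of the GF(2^8) used in the slides (so a coded symbol may take the value 256 and not fit in one byte), and it uses a Vandermonde generator matrix; both are substitutions made for illustration, and all function names are hypothetical.

```python
P = 257  # prime modulus; stands in for GF(2^8) in this sketch


def generator(n, k):
    """n x k Vandermonde generator matrix: row j is (1, j, j^2, ...) mod P."""
    return [[pow(j, c, P) for c in range(k)] for j in range(1, n + 1)]


def encode(x, n):
    """Coded symbols y = G x (mod P) for k original symbols x, each < P."""
    return [sum(g * xi for g, xi in zip(row, x)) % P
            for row in generator(n, len(x))]


def decode(received, k, n):
    """Recover x from a dict {block index: symbol} holding at least k blocks."""
    idx = sorted(received)[:k]
    g = generator(n, k)
    # Augmented matrix [sub-generator | received symbols].
    a = [g[i][:] + [received[i]] for i in idx]
    for col in range(k):                       # Gauss-Jordan elimination mod P
        pivot = next(r for r in range(col, k) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)       # modular inverse (Fermat)
        a[col] = [(v * inv) % P for v in a[col]]
        for r in range(k):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(vr - f * vc) % P for vr, vc in zip(a[r], a[col])]
    return [a[r][k] for r in range(k)]
```

Because any k rows of this Vandermonde matrix are linearly independent, every choice of k surviving blocks yields a full-rank sub-generator matrix, which is the MDS property. For instance, with k=4 and n=8, blocks 1, 4, 6 and 7 alone are enough to recover the four original symbols.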

Systematic vs Non-Systematic ERC

• Original data: k messages (1, 2, 3, ..., k)
• Non-systematic ERC: n coded blocks (1, 2, 3, ..., k, k+1, ..., n), none of which has to equal an original message
• Systematic ERC: the first k coded blocks are the original messages; blocks k+1, ..., n are redundant
• Systematic ERC
    • Slightly lower encoding and decoding complexity
    • Even if the data cannot be recovered, the original messages that did arrive can still be used

Reed-Solomon

• Has been around for decades
• Has a systematic form
• Cauchy Reed-Solomon code

Reed-Solomon Decoding

[Figure: received blocks and the matrix inverse used to recover the original data]

Dejitter Buffer

Variable Delay & Dejitter Buffer

[Figure: packets pass through several queuing delays before reaching the dejitter buffer]

• Queuing delay
• Dejitter buffers
• Variable packet sizes

Fixed Dejitter Buffer: Budget for Worst Case

[Figure: Site A to Site B over a 128 kbps link; coder delay 40 ms, queuing delay 4-50 ms, propagation delay 8 ms, dejitter buffer 50 ms]

• Total end-to-end delay
    • Codec delay: 40 ms
    • Propagation delay: 8 ms
    • Dejitter buffer: 50 ms (to accommodate queuing delay: 0-50 ms)
    • Total delay: 98 ms

Dejitter Buffer Size & Late Loss

[Figure: per-packet delay showing playout jitter, buffering delay, late loss and packet loss]

• Fixed playout deadline and jitter absorption
    • The playout rate is constant
    • The tradeoff is between dejitter buffer size and late loss
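
The tradeoff between buffer size and late loss can be illustrated with a few lines of arithmetic. The delay trace below is made up for the example, and the function names are hypothetical; the delay budget uses the 40 ms codec, 8 ms propagation and 50 ms buffer figures from the fixed-buffer example.

```python
def end_to_end_delay_ms(codec_ms, propagation_ms, buffer_ms):
    """Fixed-buffer delay budget: every packet is played at the worst case."""
    return codec_ms + propagation_ms + buffer_ms


def late_loss_rate(queuing_delays_ms, buffer_ms):
    """Fraction of packets whose queuing delay exceeds the buffer size."""
    late = sum(1 for d in queuing_delays_ms if d > buffer_ms)
    return late / len(queuing_delays_ms)


# Made-up queuing delays (ms) for ten packets.
trace = [5, 12, 48, 20, 33, 60, 8, 15, 75, 25]
```

On this trace a 50 ms buffer loses 2 of 10 packets to late arrival, while shrinking the buffer to 20 ms cuts 30 ms of delay but raises late loss to 5 of 10.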


Adaptive Playout and Dejitter Buffer Adaptation

[Figure: per-packet delay showing playout jitter, buffering delay and packet loss]

• Adaptive playout and jitter adaptation
    • Scaling of voice/video packets in a highly dynamic way
    • Playout schedule set according to past recorded delays
    • Usually the dejitter buffer expands quickly on late packet arrival, and shrinks slowly when jitter reduces
• Improved tradeoff between buffering delay and late loss
• Playout rate is not constant
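
The "expand quickly, shrink slowly" behaviour can be sketched with a small controller. This is one plausible rule, not the method used by any particular product; the class name, the default sizes, the 5 ms margin and the 0.99 shrink factor are all hypothetical.

```python
class AdaptiveDejitter:
    """Tracks a target buffering delay from observed queuing delays."""

    def __init__(self, initial_ms=20.0, min_ms=10.0, max_ms=200.0,
                 shrink=0.99, margin_ms=5.0):
        self.target_ms = initial_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.shrink = shrink
        self.margin_ms = margin_ms

    def on_packet(self, queuing_delay_ms):
        """Update the target; return True if the packet arrived in time."""
        in_time = queuing_delay_ms <= self.target_ms
        if not in_time:
            # Expand quickly: jump far enough to cover the late packet.
            self.target_ms = min(self.max_ms,
                                 queuing_delay_ms + self.margin_ms)
        else:
            # Shrink slowly: decay by a small factor per on-time packet.
            self.target_ms = max(self.min_ms, self.target_ms * self.shrink)
        return in_time
```

A single packet delayed by 80 ms pushes the target from 20 ms to 85 ms at once, while each subsequent on-time packet lowers it by only 1%.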

Adaptive Playout

Audio Adaptive Playout

• Packets are pushed into the adaptive playout module
• The renderer requests a new waveform segment for playout
• The playout module passes the packet to the audio decoder

Packet Loss Concealment

Audio Packet Loss Concealment

[Figure: packets i-2, i-1, i (lost), i+1, i+2 on a time axis, each of length L; the neighbors of the lost packet are stretched to cover the gap, with the alignment found by correlation (annotations: ∆L, 1.3 L, 2L)]

• Depends on voiced & unvoiced segments

[Figure panels: voiced segments; unvoiced segments]

Concealment as (bi-directional) stretching
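
The role of correlation in aligning waveforms can be illustrated with a much simpler concealment than the bi-directional stretching described here: find the lag at which the most recent samples best match earlier ones, then repeat the signal at that lag to fill the lost frame. This is a swapped-in, one-directional technique chosen for brevity; the function names and default window and lag ranges are hypothetical.

```python
import math


def best_lag(history, window, min_lag, max_lag):
    """Lag at which the last `window` samples best match earlier samples."""
    tail = history[-window:]
    best, best_score = min_lag, -2.0
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - window - lag
        seg = history[start:start + window]
        num = sum(a * b for a, b in zip(tail, seg))
        den = math.sqrt(sum(a * a for a in tail) * sum(b * b for b in seg))
        score = num / den if den > 0 else 0.0   # normalized correlation
        if score > best_score:
            best, best_score = lag, score
    return best


def conceal(history, frame_len, window=80, min_lag=20, max_lag=120):
    """Fill one lost frame by repeating the waveform at the best lag."""
    lag = best_lag(history, window, min_lag, max_lag)
    buf = list(history)          # needs at least window + max_lag samples
    for _ in range(frame_len):
        buf.append(buf[-lag])    # copy the sample one period back
    return buf[len(history):]
```

On a strongly periodic (voiced-like) signal the correlation peak sits at the pitch period, so the repeated segment continues the waveform smoothly; unvoiced segments have no such peak, which is why the two cases are treated differently.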

Video Packet Loss Concealment

• Spatial Concealment
    • Uses spatial correlation
    • E.g., bilinear interpolation
    • Projection onto convex sets
• Temporal Concealment
    • Uses the correlation that exists between consecutive frames
    • Temporal replacement
    • Boundary matching

Spatial-Temporal Concealment

Summary


• VoIP/Video Conference Systems
    • Infrastructure based
    • P2P based
• Audio/Video Components
    • Audio codec
    • Video codec
    • Acoustic echo cancellation
• Network components
    • Primer of the Internet
    • Network characteristics
    • Available bandwidth estimation
    • Forward error correction (FEC)
    • Dejitter buffer
    • Packet loss concealment