AN INTRUSION DETECTION SYSTEM (IDS) FOR INTERNET NETWORK

MOHAMAD FAUZAN BIN SAFIEE

A project report submitted in partial fulfilment of the Requirements for the award of the degree of Master of Engineering (Electrical - Electronic and Telecommunication)

Faculty of Electrical Engineering Universiti Teknologi Malaysia

MAY 2007

PSZ 19:16 (Pind. 1/97)

UNIVERSITI TEKNOLOGI MALAYSIA

THESIS STATUS DECLARATION FORM (υ)

TITLE: AN INTRUSION DETECTION SYSTEM (IDS) FOR INTERNET NETWORK

ACADEMIC SESSION: 2006/2007

I, MOHAMAD FAUZAN BIN SAFIEE (CAPITAL LETTERS), agree to allow this thesis (PSM/Master's/Doctor of Philosophy)* to be kept at the Universiti Teknologi Malaysia Library under the following conditions of use:

1. The thesis is the property of Universiti Teknologi Malaysia.

2. The Universiti Teknologi Malaysia Library is permitted to make copies for study purposes only.

3. The Library is permitted to make copies of this thesis as exchange material between institutions of higher learning.

4. **Please tick (✓):

[ ] SULIT (CONFIDENTIAL): contains information classified for security or of importance to Malaysia, as stipulated in the OFFICIAL SECRETS ACT 1972 (AKTA RAHSIA RASMI 1972).

[ ] TERHAD (RESTRICTED): contains RESTRICTED information as determined by the organization/body where the research was carried out.

[✓] TIDAK TERHAD (UNRESTRICTED)

Certified by:

(AUTHOR'S SIGNATURE)
Permanent address: NO. 23, JALAN PANDAN INDAH 5/17, PANDAN INDAH 55100 KUALA LUMPUR
Date: 11 May 2007

(SUPERVISOR'S SIGNATURE)
Name of supervisor: DR. SYARIFAH HAFIZAH BT SYED ARIFFIN
Date: 11 May 2007

NOTES:

* Delete whichever is not applicable.

** If this thesis is SULIT or TERHAD, please attach a letter from the relevant authority/organization stating the reasons and the period for which the thesis needs to be classified as SULIT or TERHAD.

υ "Thesis" means a thesis for the degree of Doctor of Philosophy or Master's by research, a dissertation for studies by coursework and research, or an Undergraduate Project Report (PSM).

“I hereby declare that I have read this thesis and in my opinion this thesis is sufficient in terms of scope and quality for the award of the degree of Master of Engineering (Electrical – Electronic and Telecommunication).”

Signature : 

Name of Supervisor : Dr. Sharifah Hafizah Bt. Syed Ariffin

Date : 11 May 2007

I declare that this thesis entitled “An Intrusion Detection System (IDS) For Internet Network” is the result of my own research except as cited in the references. The thesis has not been accepted for any degree and not concurrently submitted in candidature of any other degree.

Signature : 

Name : Mohamad Fauzan bin Safiee

Date : 11 May 2007

To my beloved wife, mum and all my siblings: thank you for your patience and your prayers for my success. Al-Fatihah to my late father.


ACKNOWLEDGEMENT

In preparing this thesis, I was in contact with many people, researchers, academicians, and practitioners. They have contributed towards my understanding and thoughts. In particular, I wish to express my sincere appreciation to my thesis supervisor, Dr. Sharifah Hafizah bt. Syed Ariffin, for her encouragement, guidance, criticism and friendship. I am also very thankful to Assoc. Prof. Liza bt. Abdul Latif and my friend Br. Muhammad Sadry bin Abu Seman for their guidance, advice and motivation. Without their continued support and interest, this thesis would not have been the same as presented here.

I am also indebted to the Ministry of Defence (MINDEF) for funding my Master's study. Librarians at UTM and MINDEF also deserve special thanks for their assistance in supplying the relevant literature.

My sincere appreciation also extends to all my colleagues and others who have provided assistance on various occasions. Their views and tips were useful indeed. Unfortunately, it is not possible to list all of them in this limited space. I am grateful to all my family members, especially my beloved wife.


ABSTRACT

An Intrusion Detection System (IDS) detects and blocks unwanted attacks on civilian or military systems. These attacks can be internal or external. Traffic on normal networks is heterogeneous, whereas a military network has more homogeneous traffic. Even though Internet security includes firewalls and other security systems, these usually fail to filter out unwanted attacks on the system, which leads to system breakdowns and failures. The major problem in developing an IDS method is the continuing growth of the Internet topology and of the number of Internet users, which makes it difficult to model the network with attack-free data. Real-world tests have shown overwhelming numbers of false alarms and little success in filtering them out. This project analyses the network with attack-free data in a simulator, using self-similar traffic, which ideally represents Internet traffic, as well as Poisson traffic modelling for non-peak hours. With templates of attack-free data, a system can reduce the complexity of detecting attacks during peak and non-peak hours. The network system was simulated in the NS-2 simulator.


ABSTRAK

This Intrusion Detection System (Sistem Pengesanan Pencerobohan, SPP) detects and blocks unwanted attacks on civilian or military systems. These attacks can be carried out internally or externally. Traffic on ordinary networks is heterogeneous, whereas traffic on military networks is more homogeneous. Although a network may be equipped with Internet security systems, including firewalls and other security systems, these usually fail to filter unwanted attacks on the system, causing the system to be disrupted and to fail. The main problem in developing a method for this system is the growth of the Internet topology and of the number of Internet users, which makes modelling the network with attack-free data more difficult. Real-world tests show that this problem arises because many false attack alarms are raised against the system and it is unable to filter all of them out. This project aims to analyse the network in a simulator with attack-free data, involving self-similar traffic, which represents Internet traffic modelling, with Poisson traffic modelling preferably used outside peak hours. With templates of attack-free data, the system reduces the difficulty of detecting attacks during peak and off-peak hours. The network system was simulated in the NS-2 simulator.


TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF SYMBOLS

CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 Objective
1.4 Scope of Work
1.5 Thesis Organization

CHAPTER 2 LITERATURE REVIEW
2.1 Background
2.2 Intrusion Detection System (IDS)
    2.2.1 Types of intrusion detection systems
        2.2.1.1 Host-based intrusion detection systems (HIDS)
        2.2.1.2 Network-based intrusion detection systems (NIDS)
    2.2.2 Methods and modes of intrusion detection
        2.2.2.1 Anomaly detection
        2.2.2.2 Misuse detection or pattern matching
    2.2.3 Detection issues
    2.2.4 Responses to Intrusion Detection
    2.2.5 Common Attacks
2.3 Internet Network
    2.3.1 Differentiated Services (DiffServ)
    2.3.2 DiffServ Vulnerabilities
2.4 Internet Traffic
2.5 Poisson Traffic
    2.5.1 Poisson Law
    2.5.2 Poisson Process
    2.5.3 Traffic Analysis
2.6 Self-similar Traffic
    2.6.1 Self-similarity
    2.6.2 Stochastic self-similarity and network traffic
    2.6.3 Traffic Research
        2.6.3.1 Measurement-based traffic modeling
        2.6.3.2 Physical modeling
        2.6.3.3 Queueing analysis
        2.6.3.4 Traffic control and resource provisioning
    2.6.4 Issues and Remarks
        2.6.4.1 Traffic measurement and estimation
        2.6.4.2 Traffic modeling
        2.6.4.3 Performance analysis and traffic control
2.7 Literature Finding

CHAPTER 3 METHODOLOGY
3.1 Simulation and Analysis
    3.1.1 Simulation
    3.1.2 Analysis
        3.1.2.1 Analysis Process
        3.1.2.2 1st Level Analysis
        3.1.2.3 2nd Level Analysis
        3.1.2.4 Response
3.2 Sequential Discrete Event Simulation
    3.2.1 Simulation
    3.2.2 Event
    3.2.3 Simulation Time
    3.2.4 Random Variables
    3.2.5 Event Relations
3.3 Statistical Anomaly Detection
    3.3.1 The Concept
    3.3.2 Justification For Using the NIDES
3.4 The NS-2 Simulator

CHAPTER 4 SIMULATION AND ANALYSIS
4.1 Simulation
    4.1.1 DiffServ Network Topology Setup
    4.1.2 QoS Application Traffic Setup
    4.1.3 Background Traffic Setup
    4.1.4 Sensors Setup
    4.1.5 Attacks Setup
4.2 Validation of DiffServ and Traffic Simulation
4.3 Result of Simulation
4.4 Analysis of Simulation

CHAPTER 5 DISCUSSION AND CONCLUSION
5.1 Discussion
5.2 Recommendation and Future work
5.3 Conclusion

REFERENCES

LIST OF TABLES

TABLE NO.   TITLE

2.1   Likelihood, Impact and Difficulty-of-detection for Attacks
4.1   NS-2 Traffic Service Differentiation
4.2   Average packets rate and peak packet received at Node 5 (Sink).

LIST OF FIGURES

FIGURE NO.   TITLE

2.1   A centralized IDS.
2.2   Distributed IDS.
2.3   Detection issues in IDSs.
2.4   Relation between hosts on LANs and the subnet.
2.5   A Simplified DiffServ Architecture.
2.6   DiffServ Classification and Conditioning/Policing at Ingress.
2.7   Differentially Serving Scheduler
2.8a  The histogram of the Poisson distribution (λ = 5).
2.8b  Smoothed shape of the Poisson distribution for different parameter values.
2.9   2-dimensional Cantor set.
2.10  Left: 1-dimensional Cantor set interpreted as on/off traffic. Middle: 1-dimensional non uniform Cantor set with weights αL = 2/3, αR = 1/3. Right: Cumulative process corresponding to 1-dimensional on/off Cantor traffic.
2.11  Stochastic self-similarity - in the ‘burstiness preservation sense’ - across time scales 100s, 10s, 1s, 100ms (top-left, top-right, bottom-left, bottom-right).
2.12  Mean queue length as a function of buffer capacity for input traffic with varying long-range dependence (α = 1.05, 1.35, 1.65, 1.95).
2.13  Performance gain of TCP Reno, Vegas, Rate, when endowed with multiple time scale capabilities as a function of RTT.
3.1   IDS Data Analyses
3.2   Algorithm of discrete-event simulator
4.1   Network Topology for the Simulations
4.2   Network topology for data free attack (a) Light network (10 nodes) (b) Medium network (20 nodes) (c) Heavy network (40 nodes)
4.3   Network topology for with attack data (a) Light network (10 nodes) (b) Medium network (20 nodes) (c) Heavy network (40 nodes)
4.4   CBR Traffic with 1000, 100 and 10 Second Time Scales
4.5   Result data free attack for self-similar traffic (a) Light network (10 nodes) (b) Medium network (20 nodes) (c) Heavy network (40 nodes)
4.6   Result data free attack for Poisson traffic (a) Light network (10 nodes) (b) Medium network (20 nodes) (c) Heavy network (40 nodes)
4.7   Result with attack data for self-similar traffic (a) Light network (10 nodes) (b) Medium network (20 nodes) (c) Heavy network (40 nodes)
4.8   Result with attack data for Poisson traffic (a) Light network (10 nodes) (b) Medium network (20 nodes) (c) Heavy network (40 nodes)

LIST OF ABBREVIATIONS

ACK - acknowledgement
AF - Assured Forwarding
ATM - Asynchronous Transfer Mode
BE - Best-effort
CBR - Constant Bit Rate
CBS - Committed Burst Size
CIR - Committed Information Rate
CPU - central processing unit
DiffServ - Differentiated Services
DES - Discrete Event Simulation
DNS - Domain Name Server
DoS - Denial of Service
DSCP - Differentiated Service Code Point
EF - Expedited Forwarding
HIDS - host-based intrusion detection system
IDS - Intrusion Detection System
IntServ - Integrated Service
IP - Internet protocol
ISP - Internet service provider
FARIMA - (fractional) autoregressive integrated moving average
FTP - File transfer protocol
GUI - graphical user interface
HTTP - Hypertext transfer protocol
LAN - Local Area Network
MPEG4 -
NIDES/STAT - statistical anomaly detection engine/algorithm
NIDS - network-based intrusion detection system
NNTP - network news transfer protocol
NS-2 - Network Simulator, version 2
PDF - probability distribution function
PHB - Per Hop Behavior
QoS - Quality of Service
RED - Random Early Dropping
RFC - Request For Comment
RNG - random number generators
RTT - round-trip times
RULE - Rule-based detection engine
SLA - Service Level Agreement
SMTP - simple mail transfer protocol
TBF - Token Bucket Filter
TCA - Traffic Conditioning Agreement
TCP - transmission control protocol
UDP - user datagram protocol
URL - uniform resource locator
VLL - Virtual Leased Line
VoIP - Voice over Internet protocol
WAN - Wide Area Network
WRR - Weighted Round Robin
WWW - World Wide Web

LIST OF SYMBOLS

Gbps - Giga bits per second
KB - Kilobyte
Kbps - Kilobit per second
Mbps - Megabit per second

CHAPTER 1

INTRODUCTION

1.1 Overview

In the context of physical security, intrusion detection systems are tools used to detect activity on the boundaries of a protected facility. When we commit to physically protecting the premises on which our staff work and which house our information processing equipment, we should carry out an exhaustive risk analysis and, where the threat requires, consider installing a perimeter intrusion detection system (IDS).

The simplest IDS is a guard patrol. Guards who walk the corridors and perimeter of a facility are very effective at identifying break-in attempts on the premises. If anything goes wrong, they will either raise the alarm or attempt to challenge the intruder. Of course, the most obvious shortcoming of a guard patrol is that the patrol cannot be at all points of the facility at the same time.

This leads to the next simplest IDS, which is video monitoring. Video cameras can be placed at locations in the facility from which all points on the perimeter can be monitored simultaneously. If there is an intrusion attempt, it will be detected and the alarm will be raised by the person in charge of monitoring the video.

IDS are designed to function like a burglar alarm on your house: these systems should record suspicious activity against the target system or network, and should alert the information security manager or support staff when an electronic break-in is underway. The biggest downfall of IDS products is the level of customization they need out of the box. Without significant amounts of customization, the IDS will produce a large number of false-positive alerts. A false positive is created when the IDS alerts the support staff to an event that will not have an impact on the target system. For example, a Code Red attack against an Apache Web server will not work, but the IDS may still sound the alarm.
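The false-positive case described above can be illustrated with a minimal Python sketch. It assumes the IDS has access to an inventory of the server software on each host. The names `Alert`, `INVENTORY`, `AFFECTED_SOFTWARE` and `is_false_positive`, and the host names, are hypothetical and are not part of this thesis or of any IDS product. The only outside fact used is that Code Red targets Microsoft IIS servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    signature: str    # name of the attack signature that matched
    target_host: str  # host the suspicious traffic was sent to


# Hypothetical asset inventory: server software running on each host.
INVENTORY = {
    "web1": "Apache",
    "web2": "IIS",
}

# Hypothetical mapping from a signature to the software it can affect.
AFFECTED_SOFTWARE = {
    "Code Red": {"IIS"},
}


def is_false_positive(alert: Alert) -> bool:
    """Return True when the targeted host does not run software the attack affects."""
    affected = AFFECTED_SOFTWARE.get(alert.signature)
    software = INVENTORY.get(alert.target_host)
    if affected is None or software is None:
        # Unknown signature or unknown host: keep the alert for a human to review.
        return False
    return software not in affected
```

Under these assumptions, a Code Red alert for the Apache host `web1` is marked as a false positive, while the same alert for the IIS host `web2` is kept. This kind of site-specific knowledge is one form of the customization mentioned above.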

1.2 Problem Statement

The major problem in developing an IDS method is the continuing growth of the Internet topology and of the number of Internet users, which makes it difficult to model the network with attack-free data. Real-world tests have shown overwhelming numbers of false alarms and little success in filtering them out.

1.3 Objective

The objectives of this thesis are:

(a) To analyse the network with attack-free data, involving complex Internet traffic as well as data traffic during peak and non-peak hours, in terms of the peak and average packets received.

(b) To analyse the network under attack, involving complex Internet traffic as well as data traffic during peak and non-peak hours, in terms of the peak and average packets received.

1.4 Scope of Work

The scope of this project consists of:

(a) A discussion of the concept and application of intrusion detection systems (IDS).

(b) A study of self-similar traffic, which represents the Internet traffic source, and of Poisson traffic modelling, which represents voice traffic in the analysis.

(c) Simulation of attacks during peak and non-peak hours in the Network Simulator, version 2 (NS-2).

(d) Analysis of the results to determine the performance of the Internet network and to propose proactive actions that address the weaknesses found.

1.5 Thesis Organization

This thesis has the following structure. Chapter 2 gives a literature review and background information. Chapter 3 discusses the simulation and analysis methodology used. Chapter 4 then explains the simulation and analysis. Finally, Chapter 5 concludes the thesis and gives possible directions for future research.

CHAPTER 2

LITERATURE REVIEW

2.1 Background

The primary focus of computer security is intrusion prevention, where the goal is to keep bad guys out of your system or network. Authentication can be viewed as a way to prevent intrusions, and firewalls are certainly a form of intrusion prevention, as are most types of virus protection. Intrusion prevention can be viewed as the information security analogy of locking the doors on your car. But even if you lock the doors on your car, it might still get stolen. In information security, no matter how much effort you put into intrusion prevention, occasionally the bad guys will be successful and an intrusion will occur.

What can we do when intrusion prevention fails? The intrusion detection system (IDS) is a relatively recent development in information security. The purpose of an IDS is to detect attacks before, during, and after they have occurred. The basic approach employed by IDSs is to look for ‘unusual’ activity. In the past, an administrator would scan through log files looking for signs of unusual activity. Automated intrusion detection is a natural outgrowth of such manual log file analysis.

Intrusion detection is currently a very active research topic. As a result, there are many claims in the field that have yet to be substantiated, and it is far from clear how successful or useful some of these techniques will prove, particularly in the face of increasingly sophisticated attacks.

Who are the intruders that an IDS is trying to detect? An intruder could be a hacker who got through your network defences and is now launching an attack on the internal network. Or, even more insidious, the ‘intrusion’ could be due to an evil insider, such as a disgruntled employee.

What sorts of attacks might an intruder launch? An intruder with limited skills (a ‘script kiddie’) would likely attempt a well-known attack or a slight variation on such an attack. A more skilled attacker might be capable of launching a variation on a well-known attack, a little-known attack, or an entirely new attack. Or the attacker might simply use the breached system as a base from which to launch attacks on other systems.

2.2 Intrusion Detection System (IDS)

An intrusion is technically defined as “an attempt by an unauthorized entity to compromise the authenticity, integrity, and confidentiality of a resource.” The role of an intrusion detection system (IDS) is to attempt to trap a hacker’s presence on a compromised network, to weed out any malfeasance as a result of the hacker’s presence, and to catalogue the activities so that similar attacks can be avoided in the future. Intrusions include the following types of attacks:

(a) Malign sensitive information on internal networks.

(b) Appropriate confidential and proprietary information.

(c) Dampen functionalities and resources available to possible legitimate users.

IDSs are required to prevent problems arising out of an attack. Rectification of damage wrought by an attacker and the subsequent legal issues can be far more costly and time consuming than detecting the attacker’s presence and removing him at an earlier stage. IDSs give a very good logged account of the means and modalities used by various attackers, which can be used to prevent and circumvent possible future attacks. Thus, present-day intrusion detection capabilities provide an organization with a good source for overall security analysis. The question as to what kind of intrusion detection system to deploy depends on the size and scale of the organization’s internal networks, the amount of confidential information maintained on the network, and so on.

From time to time, attackers will manage to compromise other security measures, such as cryptography, firewalls, and so on. It is crucial that knowledge of these compromises flows immediately to the administrators. Such tasks can be easily accomplished using intrusion detection systems. Negligence of administrators is a problem in network security. Deployment of intrusion detection systems can aid administrators in finding any missed vulnerabilities or exploits that a potential attacker could use.

2.2.1 Types of intrusion detection systems

IDSs fall under many different categories depending on their functionality and architecture. Each type has its own specialized functionalities. An organization wishing to install an IDS normally goes through a comprehensive review of its needs and security requirements before choosing a suitable IDS. Basically, IDSs are classified under the following categories:

(a) Host-based intrusion detection systems

(b) Network-based intrusion detection systems

Under the hood, IDS products function either as a host-based intrusion detection system (HIDS) or as a network-based intrusion detection system (NIDS). There are positives and negatives with each type. An HIDS product protects a single system by monitoring it. There are a number of different ways in which an HIDS can monitor the system. One of the more common ways is for the HIDS product to monitor all network traffic entering or leaving the host. The HIDS product can also function by monitoring the log files on the system itself. The disadvantage of using an HIDS product is that the product, by its very nature, cannot detect common network preamble attacks such as a ping sweep.

A network-based intrusion detection system (NIDS) works by monitoring a network segment to determine whether the network traffic matches the pattern of a well-known network attack. This type of system can detect preamble attacks such as a ping sweep, but can be fooled by high network congestion and encryption. Also, the NIDS can have a lag time before new network attacks are written to the intrusion detection system profile. A new network attack may bypass the NIDS device until the attack pattern can be written and the NIDS updated.
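To make the ping-sweep example concrete, the following Python sketch shows one simple way a network-level monitor could flag a sweep: it counts how many distinct hosts each source sends ICMP echo requests to. The packet representation (a tuple of source, destination and a kind label), the function name `detect_ping_sweeps` and the threshold `min_targets` are assumptions made for illustration. This is not the detection method used in this thesis or in any particular NIDS product.

```python
from collections import defaultdict


def detect_ping_sweeps(packets, min_targets=5):
    """Return the set of sources that pinged at least `min_targets` distinct hosts.

    `packets` is an iterable of (source, destination, kind) tuples, where
    kind is a hypothetical label such as "icmp-echo" or "tcp".
    """
    targets_by_source = defaultdict(set)
    for source, destination, kind in packets:
        if kind == "icmp-echo":
            targets_by_source[source].add(destination)
    return {
        source
        for source, targets in targets_by_source.items()
        if len(targets) >= min_targets
    }
```

A single host sees only the one echo request addressed to it, which is why a host-based monitor cannot recognise the sweep, whereas a monitor on the network segment sees the whole pattern.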

2.2.1.1 Host-based intrusion detection systems (HIDS)

Host-based IDSs (HIDS) are designed to monitor, detect, and respond to activity and attacks on a given host. In most cases, attackers target specific systems on corporate networks that have confidential information. They will often try to install scanning programs and other vulnerabilities that can record user activity on a particular host. A HIDS allows an organization or individual owners of a host on a network to protect against and detect adversaries who may incorporate security loopholes or exploit other vulnerabilities. Some HIDS tools provide policy management, statistical analysis, and data forensics at the host level.

HIDSs are best used when an intruder tries to access particular files or other services that reside on the host computer. In most cases, the HIDS is integrated into the operating system that the host is running. Because attackers mainly focus on operating system vulnerabilities to break into hosts, such placement of the IDS proves very beneficial.

Historically, many HIDSs were installed on the respective hosts themselves, because no separate intrusion detection entity could be provided for large mainframes (which needed much security) in a cost-effective manner. This method caused some security bottlenecks. An intruder able to successfully overcome the IDS and the inherent security features of the host could disable the IDS for further actions. Such disadvantages are overcome when the IDS is physically separated from the hosts themselves. With the advent of personal computers and cheaper hardware accessories, separate entities for placing IDSs are favoured (see Figure 2.1).

Figure 2.1 A centralized IDS
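As noted in Section 2.2.1, one way a HIDS can watch a host is by monitoring the host's log files. The sketch below illustrates that idea in Python by counting failed logins per user. The log line format (`FAILED LOGIN user=<name>`), the function name `suspicious_users` and the threshold are hypothetical and chosen only for this example. Real operating systems use their own log formats.

```python
import re
from collections import Counter

# Hypothetical log format, e.g. "May 11 02:13:07 host FAILED LOGIN user=root".
FAILED_LOGIN = re.compile(r"FAILED LOGIN user=(\w+)")


def suspicious_users(log_lines, max_failures=3):
    """Return the sorted list of users with more than `max_failures` failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(user for user, count in failures.items() if count > max_failures)
```

Because this check runs against data that lives on the host, an intruder who gains control of the host could tamper with it, which is the bottleneck described above for IDSs installed on the hosts themselves.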

2.2.1.2 Network-based intrusion detection systems (NIDS)

Network-based IDSs (NIDS) capture network traffic (usually on the network as a whole or from large segments of it) for their intrusion detection operations. Most often, these systems work as packet sniffers that read through incoming traffic and use specific metrics to conclude that a network has been compromised. Various Internet and other proprietary protocols, such as Transmission Control Protocol/Internet Protocol (TCP/IP), NetBEUI, XNS and so on, which handle messages between external and internal networks, are vulnerable to attack and have to rely on additional means to detect malicious events. Frequently, intrusion detection systems have difficulty in working with encrypted information and traffic from virtual private networks. Speed (over 1 Gbps) is a constraining factor, though recent releases of NIDSs have the ability to work much faster. Figure 2.2 shows a representation of HIDS and NIDS deployed on networks.

NIDSs can be centralized or distributed in control. In centralized control mechanisms, a central entity is responsible for analyzing and processing the logged information provided by the various constituent IDSs. The constituent systems can also be HIDSs. On the other hand, NIDSs can be built on distributed architectures. Corporate networks can be spread over great distances, and some attacks target an organization’s entire network spread over such large areas. Distributed systems could be integrated for performance and operations in such environments. Many features from distributed theory (such as cooperative agents) could be applied to realize operations under such IDSs.

Cooperative agents are one of the most important components of a distributed intrusion detection architecture. An agent, in general, is an entity that acts for or represents another entity. In the software area, an agent is an autonomous or semi-autonomous piece of software that runs in the background and performs useful tasks for another. Relative to IDSs, an agent is generally a piece of software that senses intrusions locally and reports attack information to central analysis servers. The cooperative agents could also form a network among themselves for data transmission and processing. The use of multiple agents across a network allows a broader view of the network than may be possible with a single IDS or centralized IDSs. Figure 2.2 shows the architectural description of such distributed IDSs.

Figure 2.2 Distributed IDS
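The agent idea described above, in which each agent senses intrusions locally and reports them to a central analysis server, can be sketched in Python as follows. The class names `Report`, `Agent` and `CentralAnalyzer`, the correlation rule (the same attack from the same source seen by at least two agents) and the example names are all assumptions made for illustration. They do not describe the architecture in Figure 2.2 in detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    agent: str   # name of the reporting agent
    attack: str  # kind of attack the agent sensed locally
    source: str  # address the attack appeared to come from


class CentralAnalyzer:
    """Hypothetical central analysis server that correlates agent reports."""

    def __init__(self, min_agents=2):
        self.min_agents = min_agents
        self._agents_by_event = defaultdict(set)  # (attack, source) -> agent names

    def receive(self, report):
        self._agents_by_event[(report.attack, report.source)].add(report.agent)

    def network_wide_events(self):
        """Return (attack, source) pairs seen by at least `min_agents` agents."""
        return sorted(
            event
            for event, agents in self._agents_by_event.items()
            if len(agents) >= self.min_agents
        )


class Agent:
    """Hypothetical local agent that forwards what it senses to the analyzer."""

    def __init__(self, name, analyzer):
        self.name = name
        self.analyzer = analyzer

    def observe(self, attack, source):
        self.analyzer.receive(Report(self.name, attack, source))
```

In this sketch, an event that only one agent reports stays local, while an event reported from several parts of the network is surfaced as network-wide. This is the broader view that multiple agents can provide compared with a single IDS.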

2.2.2 Methods and modes of intrusion detection

IDSs can operate in different modes. Essentially, the purpose of designing such modes of operation is to have a basis for analysis of network packets. These metrics can be used to deduce whether a particular network or system has been compromised. In most cases, the information collected indicates to the administrator whether further action needs to be taken. The most important of these modes are listed here:

(a) Anomaly detection

(b) Misuse detection

2.2.2.1 Anomaly detection

Anomaly detection is the process of scanning for abnormal activity on the network. Most systems maintain a good log of the kinds of activities that take place on their networks and sensitive hosts. This information serves as a baseline against which all network activity can be compared. Unless administrators define static rules for new kinds of activity on the network, any deviation from normal activity is treated as an anomaly. An IDS alerts network administrators when it encounters anomalous activity. A variety of metrics can be used for detecting anomalous activities. Some of the more prominent ones are as follows:

(a) Most often an IDS uses parametric curves to model the historical data that it has logged. In some cases, learning curves can be designed to fit the log data. Any new activity that does not fit such curves, or that deviates heavily from normal curve projections, can be classified as anomalous.

(b) Static rules can be set for file access, processor utilization, resource utilization, and so on, from which anomalous activities can be inferred. For example, sudden and high utilization of central processing unit (CPU) power on particular systems can be regarded as an anomaly. Extraneous processes could be a reason for such a change in processor activity. Permissible thresholds can be set for resource utilization, and sensitive resources can be continuously monitored for anomalies. Many kinds of denial-of-service attacks can be weeded out under such a scheme.

(c) When a system intended for use by remote users shows activity locally, this could be cause for alarm. User systems that show activity at abnormal hours, when the designated user should not be logged in, might also indicate abnormal activity.

(d) Port scanners are tools that an attacker can use to scan through a host's transmission control protocol (TCP) or other transport layer ports to evaluate host activity and find open ports. One approach is to monitor normally unused ports present on a system. For example, if there is a sudden surge of activity on a particular port that has never been used, an alarm could be raised.

(e) In some instances, anomalies can be defined or modelled either statically or heuristically using soft computing techniques such as fuzzy logic, neural networks, evolutionary computing, genetic algorithms, and so on. The detection performance of such systems is usually high.
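The first two metrics in the list above can be illustrated with a minimal sketch. It is not the method of any particular IDS: the deviation test uses a simple mean and standard deviation in place of the parametric curves mentioned in (a), and the names `CPU_LIMIT_PERCENT`, `exceeds_static_threshold` and `deviates_from_history`, as well as the sample readings, are hypothetical.

```python
from statistics import mean, stdev

CPU_LIMIT_PERCENT = 90.0  # hypothetical permissible threshold for item (b)


def exceeds_static_threshold(cpu_percent, limit=CPU_LIMIT_PERCENT):
    """Static rule: flag utilization above a fixed permissible threshold."""
    return cpu_percent > limit


def deviates_from_history(history, value, z_limit=3.0):
    """Flag a value lying more than z_limit sample standard deviations
    away from the mean of the logged history (a stand-in for item (a))."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Hypothetical logged CPU utilization readings (percent).
cpu_history = [20, 22, 19, 21, 20, 23, 18, 21]
print(deviates_from_history(cpu_history, 95))  # far outside the logged range
print(deviates_from_history(cpu_history, 22))  # within normal variation
```

A small `z_limit` raises more alarms, including false ones, which is the trade-off discussed in the paragraph that follows.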

Even though anomaly-based IDSs are widespread and highly successful in most environments, they also have disadvantages. The main drawback of anomaly-based systems is that they can raise a high proportion of false alarms. False alarms are raised when legitimate activity that differs from the observed historical patterns occurs. Anomaly-based IDSs can be very useful in creating and modifying signatures of user activity and accounts. Signatures are very useful metrics in misuse-based IDSs.

2.2.2.2 Misuse detection or pattern matching

Misuse detection is another method employed by IDSs. The main job of these systems is to compare activities with pre-generated signatures. A signature is normally a set of characteristic features that represents a specific attack or pattern of attacks. In most cases, signatures are generated following an actual attack. Many commercial products store the characteristic features of most known attacks so that they can be compared with future network activity. Sophisticated techniques, such as state-based analysis, are very useful in analyzing the signatures of attacks and in the subsequent intrusion detection process.

When misuse detection techniques are employed, highly skilled administrators are normally not required to infer an attacker's activity. It becomes easy for a moderately skilled administrator to take evasive or remedial measures when attacks are detected by signature-based IDSs. In addition to these advantages, misuse-based IDSs operate quickly and efficiently. Nonetheless, because signatures are predetermined from the history of known attacks, newer and covert attacks that do not fit the designed signatures may succeed in passing through such IDSs. A thorough survey of potential attacks and their signatures is required to design such systems effectively. This is reflected in the cost of their architecture and implementation.
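The comparison of activity against stored signatures can be sketched as follows. This is a deliberately simplified illustration that treats a signature as a single byte pattern; the signature names and patterns in `SIGNATURES` and the function `match_signatures` are hypothetical and are not taken from any commercial product.

```python
# Hypothetical signature database: name -> byte pattern characteristic of an attack.
SIGNATURES = {
    "directory-traversal": b"../..",
    "suspicious-shell": b"/bin/sh",
}


def match_signatures(payload: bytes, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose pattern occurs in the payload."""
    return sorted(name for name, pattern in signatures.items() if pattern in payload)


print(match_signatures(b"GET /../../etc/passwd HTTP/1.0"))
print(match_signatures(b"GET /index.html HTTP/1.0"))
```

An attack whose payload contains none of the stored patterns returns an empty list, which is exactly the weakness against new and covert attacks described above.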

2.2.3 Detection issues

IDSs produce both accurate detections and missed attacks. Depending on the type of alarm raised by the IDS and the actual intrusion scenario, the following types of detection results are possible. Figure 2.3 shows a representation of these issues:

(a) True positive - Occurs when an actual attack takes place and the IDS responds to it by raising the appropriate alarm. Further action by the administrators to counter the attack is required when true positives occur.

(b) True negative - Normal activity, as expected by the administrators from the IDS. When no attacks happen, the intrusion detection system has no reason to raise alarms.

(c) False positive - Typically known as a false alarm, this occurs when an IDS reads legitimate activity, or no activity at all, as being an attack. This is a very serious drawback in intrusion detection systems.

(d) False negative - Occurs when a potential or genuine attack is missed by the IDS. The more often this happens, the more doubtful the reliability of the IDS and its technology.

Figure 2.3 Detection issues in IDSs
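The four detection results can be tallied from a record of alarms and actual attacks, as in the following minimal sketch. The function names and the derived rates (detection rate and false alarm rate) are illustrative additions; the text itself only defines the four outcomes.

```python
from collections import Counter


def classify(alarm_raised: bool, attack_occurred: bool) -> str:
    """Map one observation to TP, FP, FN or TN."""
    if alarm_raised and attack_occurred:
        return "TP"
    if alarm_raised:
        return "FP"
    if attack_occurred:
        return "FN"
    return "TN"


def summarize(events):
    """events: iterable of (alarm_raised, attack_occurred) pairs."""
    counts = Counter(classify(alarm, attack) for alarm, attack in events)
    tp, fp, fn, tn = (counts[key] for key in ("TP", "FP", "FN", "TN"))
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return counts, detection_rate, false_alarm_rate


# Hypothetical observations: (alarm raised?, attack actually occurred?)
events = [(True, True), (True, False), (False, True), (False, False), (False, False)]
counts, detection_rate, false_alarm_rate = summarize(events)
print(dict(counts), detection_rate, false_alarm_rate)
```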

2.2.4 Responses to Intrusion Detection

Intrusion detection systems call for various modes of response when alarms are triggered. The degree of response depends on the type of attack carried out and the type of alarm generated. Many false positive alarms do not require the administrator to respond, yet it is beneficial to log when false positive alarms occur so that the information can be used in the future. Both active and passive modes of response can be incorporated into the systems, some of which are shown in the following list:

(a) Block IP address - IDSs can block the internet protocol (IP) address from which the attack originated. This scheme may not be effective against a careful attacker who spoofs the source IP address during his attacks. Nonetheless, blocking IP addresses proves very effective against spam and denial-of-service attacks.

(b) Terminate connections - The connections or sessions that the intruder maintains with the compromised system can be disrupted. TCP reset (RST) packets can be sent to the attacker so that he loses his established connections with the hosts. Routers and firewalls can be reconfigured to take appropriate actions depending on the severity of the intrusion.

(c) Acquire additional information - Active responses can include collecting information on, or observing, the intruder over a period of time. Audit logs and sensory mechanisms can be set to record in more detail during such information-gathering periods. The gathered information can be used subsequently to analyze the pattern of the attacker and make the whole IDS more robust. In addition, mechanisms can be devised to take legitimate action against the intruder when enough is known about his origin.


2.2.5 Common Attacks

Many varieties of attacks can be detected using IDSs. Many of the attacks focus on altering user records and creating back doors for the attacker. Back doors serve as an entry point for attackers (the creator of the back door or others) to launch attacks at unexpected times. Vulnerability analysis deals with the detection and removal of such back doors so that they cannot be exploited. In most cases, the attacker wants some personal gain out of an attack. Attackers may target bank accounts and financial organizations with the intention of embezzling money. In such cases, personal profiling of the attacker is highly recommended. Some of the well-known attack types are as follows:

(a) Denial-of-service - Attacks intended to prevent legitimate users from accessing network resources and functions. Constant attempts by the intruder to log on to a server can slow down the server's processing and decrease or eliminate its ability to service legitimate users. Financial organizations run the risk of losing disgruntled customers when such attacks are prevalent. A typical example of a denial-of-service attack is the ping flood (not to be confused with the ping of death, which crashes a host with a single oversized ping packet). Ping-flood attacks occur when an attacker causes a sudden surge in ping messages to a particular host or network. If the target system's processing power is not well protected, a large share of it could be wasted in responding to the flood. When the target system exceeds its processing threshold, the entire system collapses. Attacks that exhaust the capacity of other resources, such as memory and bandwidth, also fall under this category.

(b) Spam - Another well-known mode of interrupting legitimate activity on a network. A user receiving a flood of spam messages has to sort these messages out from legitimate e-mail, resulting in decreased efficiency at many organizations. IDSs should be capable of detecting and filtering spam.

(c) Scanning - Scanning of network traffic or data may be another activity of interest to attackers. Scanning activities may be used to gain knowledge about the following:

(i) System parameters

(ii) Host activities

(iii) Types of network on the secured system

(iv) Types of resources involved

(v) Types of services provided

(vi) Operating systems used

(vii) Vulnerabilities present on the network

Port scanners and network scanners are common tools that an attacker uses for such activities.

2.3 Internet Network

Many networks exist in the world, often with different hardware and software. People connected to one network often want to communicate with people attached to a different one. Fulfilling this desire requires that different, and frequently incompatible, networks be connected, sometimes by means of machines called gateways that make the connection and provide the necessary translation, both in terms of hardware and software. A collection of interconnected networks is called an internet network or internet. These terms will be used in a generic sense, in contrast to the worldwide Internet (which is one specific internet), which we will always capitalize.

A common form of internet is a collection of local area networks (LANs) connected by a wide area network (WAN). In fact, if we were to replace the label 'subnet' in Figure 2.4 by WAN, nothing else in the figure would have to change. The only real technical distinction between a subnet and a WAN in this case is whether hosts are present. If the system within the gray area contains only routers, it is a subnet; if it contains both routers and hosts, it is a WAN. The real differences relate to ownership and use.

Figure 2.4 Relation between hosts on LANs and the subnet.

Subnets, networks, and internet networks are often confused. Subnet makes the most sense in the context of a wide area network, where it refers to the collection of routers and communication lines owned by the network operator. As an analogy, the telephone system consists of telephone switching offices connected to one another by high-speed lines, and to houses and businesses by low-speed lines. These lines and equipment, owned and managed by the telephone company, form the subnet of the telephone system. The telephones themselves (the hosts in this analogy) are not part of the subnet. The combination of a subnet and its hosts forms a network. In the case of a LAN, the cable and the hosts form the network. There really is no subnet. An internet network is formed when distinct networks are interconnected. In our view, connecting a LAN and a WAN or connecting two LANs forms an internet network, but there is little agreement in the industry over terminology in this area. One rule of thumb is that if different organizations paid to construct different parts of the network and each maintains its part, we have an internet network rather than a single network. Also, if the underlying technology is different in different parts (e.g., broadcast versus point-to-point), we probably have two networks.


2.3.1 Differentiated Services (DiffServ)

Differentiated Services (DiffServ) is an architecture for implementing scalable service differentiation in the Internet (Blake et al., 1998). A DiffServ domain is a contiguous set of DiffServ nodes (routers) which operate with a common service provisioning policy and a common set of Per-Hop Behavior (PHB) groups implemented on each node. The domain consists of DiffServ boundary nodes and DiffServ interior nodes. Figure 2.5, in combination with Figures 2.6 and 2.7, illustrates the various DiffServ mechanisms in a simplified configuration (a single DiffServ domain with exactly one pair of ingress and egress routers, looking at only one-way traffic from the ingress to the egress). The reader is referred to Request for Comments (RFC) 2475 (Blake et al., 1998) for more complex configurations, exceptions, multi-domain environments, and definitions of terms.

Figure 2.5 A Simplified DiffServ Architecture.

The traffic entering the domain is classified and possibly conditioned at the ingress boundary nodes, and assigned to different behavior aggregates. Each behavior aggregate is mapped into a single DiffServ Code Point (DSCP), a value for a field in the IP protocol header, through a one-to-one or many-to-one mapping. Each behavior aggregate then receives a different PHB, which is defined as the externally observable forwarding behavior applied at a DiffServ-compliant node to a DiffServ behavior aggregate.

Figure 2.6 DiffServ Classifications and Conditioning/Policing at Ingress.

Figure 2.7 Differentially Serving Scheduler

The key functions of the ingress boundary nodes include traffic classification and/or traffic conditioning (also commonly referred to as policing).

(a) Packet classification identifies the QoS flow, that is, a subset of network traffic which may receive a differentiated service by being conditioned and/or mapped to one or more behavior aggregates within the DiffServ domain. Typical examples are classification using a DSCP mark provided by an upstream DiffServ domain, using a source-destination IP address pair, or using the type of traffic as indicated by the service/port numbers.

(b) Conditioning involves some combination of metering, shaping, and DSCP re-marking to ensure that the traffic entering the domain conforms to the rules specified in the Traffic Conditioning Agreement (TCA). The TCA is established in accordance with the domain's forwarding service provisioning policy for the specific customer (group). This policy is specified through a Service Level Agreement (SLA) between the Internet Service Provider (ISP) and the QoS customers.

This conditioning is of particular importance to us from an anomaly detection point of view. As mentioned above, it is done to make the traffic conform to the SLA/TCA specifications, so that QoS is easier to provide. Conformance with a typical SLA/TCA makes the flow statistics reasonably predictable. This, incidentally, helps with anomaly detection applied to those statistics.

At the interior nodes, packets are simply forwarded/scheduled according to the PHB associated with their DSCPs. Thus, this architecture achieves scalability by aggregating the traffic classification state, established by the ingress routers, into the DSCPs. The DiffServ interior nodes do not have to maintain per-flow states as is the case with, say, IntServ. In other words, in a DiffServ domain, the maintenance of per-flow states is limited to the DiffServ cloud boundary, that is, the boundary (ingress) nodes.

The DiffServ Working Group defines a non-compliant DiffServ node as any node which does not interpret the DSCP and/or does not implement the common PHBs. Differentiated services are extended across a DiffServ domain boundary by establishing an SLA between an upstream DiffServ domain and a downstream DiffServ domain, which may specify packet classification and re-marking rules and may also specify traffic profiles and the actions applied to traffic streams which are in- or out-of-profile.


PHBs that the architecture recommends are Assured Forwarding (AF) (Heinanen et al., 1999), Expedited Forwarding (EF) (Jacobson et al., 1999) and Best Effort (BE, zero priority) service classes. Briefly, an AF class has multiple subclasses with different dropping priorities marked into their DSCPs. Thus these classes receive varying levels of forwarding assurances/ services from DS-compliant nodes. The intent of the EF PHB/ class is to provide a service category in which suitably marked packets usually encounter short or empty queues to achieve expedited forwarding, that is, relatively minimal delay and jitter. Furthermore, if queues remain short relative to the buffer space available, packet loss is also kept to a minimum. A BE class provides no relatively prioritized service to packets that are marked accordingly.
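To make the mapping from DSCP to PHB concrete, the following sketch extracts the DSCP from the DS byte of an IP header and names the corresponding service class. The numeric code points are an assumption brought in from the AF and EF RFCs rather than from the text above: EF is taken as 46, BE as 0, and AF class x with drop precedence y as 8x + 2y. The function names are hypothetical.

```python
def dscp_from_ds_byte(ds_byte: int) -> int:
    """The DSCP occupies the upper six bits of the DS (former TOS) byte."""
    return (ds_byte >> 2) & 0x3F


def phb_for_dscp(dscp: int) -> str:
    """Name the PHB for a DSCP, assuming the standard recommended code points."""
    if dscp == 46:
        return "EF"
    if dscp == 0:
        return "BE"
    af_class = dscp >> 3               # upper three bits: AF class 1..4
    drop_precedence = (dscp >> 1) & 0x3  # next two bits: drop precedence 1..3
    if dscp & 1 == 0 and 1 <= af_class <= 4 and 1 <= drop_precedence <= 3:
        return f"AF{af_class}{drop_precedence}"
    return "other"


for ds_byte in (0xB8, 0x28, 0x98, 0x00):
    dscp = dscp_from_ds_byte(ds_byte)
    print(hex(ds_byte), dscp, phb_for_dscp(dscp))
```

Within an AF class, a higher drop precedence digit marks packets that are discarded first under congestion, which is how the subclasses receive different forwarding assurances.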

2.3.2 DiffServ Vulnerabilities

Since the DiffServ architecture is based on the Internet protocols, the DSCPs are in general not encrypted. Moreover, the DSCP marking process does not require the node to authenticate itself, and therefore all nodes have full authority to remark the DSCPs downstream of the ingress routers. The vulnerability, then, is that the architecture leaves scope for attackers who can modify or use these service class code points to effect either a denial or a theft of QoS, which is an expensive and critical network resource. With these attacks and other non-QoS-specific ones (that do not make use of DSCPs), there is the possibility of disrupting the entire QoS provisioning infrastructure of a company or a nation. The DiffServ Working Group designed and expects the architecture to withstand random network fluctuations. However, the architecture does not address QoS disruptions due to malicious and intelligent adversaries.

The following are the attacks we identified, all of which, we believe, can be detected or defended against by a DiffServ network in combination with the detection system. A malicious external host could flood a boundary router, congesting it. A DiffServ core router itself can be compromised. It can then be made to remark, drop, or delay QoS flows. It could also flood the network with extraneous traffic. The focus is only on attacks and anomalies that affect the QoS parameters typically specified in SLAs, such as packet dropping rates, bit rates, end-to-end one-way or two-way delays, and jitter. For example, an EF service class SLA typically specifies low latency/delay and low jitter. An EF flow sensor then monitors attacks on delay and jitter only. Attacks that lead to information disclosure, for example, are irrelevant to QoS provisioning, and thus to this work. The potential attacks studied are as follows:

(a) Packet Dropping Attacks. A compromised router can be forced to drop packets, emulating network congestion. Some QoS flows may require exceptionally high packet delivery assurances. For example, for Voice-over-IP flows, typical SLAs guarantee packet drop rates below 1%. Network-aware flows, such as those based on TCP, may switch to lower bit rates, perceiving the packet dropping as a sign of unavoidable network congestion.

(b) DSCP Remarking Attacks. Remarking out-of-profile packets to lower QoS DSCP marks is normal and suggested in the standard. A malicious exploit might deceptively remark even in-profile packets to a lower QoS, or could remark a lower QoS flow's packets to higher QoS marks, thus enabling them to compete with the higher QoS flow.

(d) Flooding. A compromised router can generate extraneous traffic from within the DiffServ network or from outside it. Furthermore, the generated flood of traffic could be just best-effort or no-priority traffic, or high-QoS traffic for greater impact.

(e) Increase End-to-end Delay. Network propagation times can be increased by malicious router code that delays the QoS flow's packets using kernel buffers and timers. This will be perceived by end processes as a fairly loaded or large network.

Consider Table 2.1, which gives the simplicity, the impact and the expected difficulty of detection for each attack investigated.

Attack Type                          | Simplicity | Impact | Detection Difficulty
Persistent Dropping                  | High       | High   | Low-Avg
Intermittent Dropping                | High       | Avg    | Avg-High
Persistently Remarking BE to QoS     | High       | High   | Low-Avg
Intermittently Remarking BE to QoS   | High       | Avg    | Avg-High
Persistently Remarking QoS to BE     | High       | High   | Low-Avg
Intermittently Remarking QoS to BE   | High       | Avg    | Avg-High
Persistently Jitter                  | Low        | Avg    | Avg
Intermittently Jitter                | Low        | Low    | High
Internally Flooding QoS              | High       | High   | Low-Avg
Externally Flooding QoS              | High       | Low    | Low-Avg
Internally Flooding BE               | High       | Low    | Low-Avg
Externally Flooding BE               | High       | Low    | Low-Avg
End-to-end Delay                     | Low        | High   | Low-Avg

Table 2.1 Likelihood, Impact and Difficulty-of-detection for Attacks

(a) Simplicity, and thus the expected likelihood of an attack, is low when significant effort or skill is required from the attacker or his/her malicious module on a compromised DiffServ router. For example, to increase the jitter and/or delay of a flow without also causing (easily detectable) packet dropping on it, the malicious code has to manage the packets within the kernel's memory, which involves significant bookkeeping, whereas dropping packets only requires interception and a given dropping rate.

(b) Attacks are easier to detect (statistically) if the original profile/distribution of the QoS parameter they target is well suited for anomaly detection (strictly or weakly stationary mean, etc.). Jitter is thus typically not a good subject for this, while the bit rate of a close-to-constant bit rate flow, with a low and bounded variability about the constant mean, is.

2.4 Internet Traffic

Before we plunge into the details of mechanisms for moving Internet traffic around, it is worth knowing what exactly we are dealing with. The traffic that flows across the links of an Internet service provider (ISP) has several interesting properties. First of all, it depends on the context: the time of day (in the middle of the night, there is typically less traffic than at morning peak hours, when everybody checks email), the number and type of customers (e.g. business customers can be expected to use peer-to-peer file sharing tools less often than private customers do) and the geographical position in the Internet. Second, there can be sudden strange occurrences and unexpected traffic peaks from viruses and worms. Not all of the traffic that traverses a link stems from customers who want to communicate; Moore et al. (2001) even deduce worldwide denial-of-service activity from a single monitored link.

Despite the unexpected success of some applications and the increasing usage of streaming video and voice over Internet protocol (VoIP), TCP makes up the majority of traffic in the Internet (Fomenkov et al. 2004). This means that we are dealing with an aggregate of flows that are mostly congestion controlled; unresponsive flows can indeed cause great harm in such a scenario. Moreover, this traffic is prone to all the peculiarities of TCP that we have seen in the previous chapters, that is, link noise is very problematic (it can lead to misinterpretation of corruption as a sign of congestion) and so is reordering (it can cause the receiver to send duplicate ACKs, which makes the sender reduce its congestion window). The latter issue is particularly important for traffic management because it basically means that packets from a single TCP flow should not be individually routed; rather, they should stay together.

Most of the TCP traffic stems from users who surf the web. Web flows are often very short-lived: it is common for them to end before TCP even reaches 'ssthresh', that is, they often remain in their slow-start phase. Thus, Internet traffic is a mixture of long-lived data transfers (often referred to as elephants) and such short flows (often called mice) (Guo and Matta 2001). Web traffic has the interesting property of showing self-similarity (Crovella and Bestavros 1997), a property that was first shown for Ethernet traffic in the seminal paper of Leland et al. (1993). This has a number of implications. Most importantly, it means that some mathematical tools (traditional queuing theory models for analysing telephone networks) may not work so well for the Internet because the common underlying notion that all traffic is Poisson distributed is invalid. What remains Poisson distributed are user arrivals, not the traffic they generate (Paxson and Floyd 1995).

Internet traffic shows long-range dependence, that is, it has a heavy-tailed autocorrelation function. This marks a clear difference between this distribution and a random process: the autocorrelation function of a Poisson distribution converges to zero. Paxson and Floyd (1995) clearly explain what exactly this means: if Internet traffic followed a Poisson process and you looked at a traffic trace of, say, five minutes and compared it with a trace of an hour or a day, you would notice that the distribution flattens as the time scale grows. In other words, it would converge to a mean value because a Poisson process has an equal amount of upward and downward motion. However, if you do the same with real Internet traffic, you may notice the same pattern at different time scales. While it may seem that a 10-min trace shows a peak and that there must be an equally large dip if we look at a longer interval, this may not be so in the case of real Internet traffic. What we saw may in fact be a small peak on top of a larger one; this can be described as 'peaks that sit on ripples that ride on waves'. This recurrence of patterns is what is commonly referred to as self-similarity. In the case of Internet traffic, what we have is a self-similar time series.
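The 'flattening' of Poisson traffic at larger time scales can be reproduced numerically. The sketch below simulates only the Poisson side of the comparison (no real traffic trace is used), with an arbitrarily chosen rate, trace length and seed; it averages the per-slot arrival counts over blocks of growing size and reports the coefficient of variation at each scale. The function names are illustrative.

```python
import random
from statistics import mean, pstdev


def poisson_counts(rate, n_slots, rng):
    """Arrivals per unit-length slot, generated from exponential inter-arrival times."""
    counts = [0] * n_slots
    t = rng.expovariate(rate)
    while t < n_slots:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts


def aggregate(counts, m):
    """Average the series over non-overlapping blocks of m slots."""
    return [sum(counts[i:i + m]) / m for i in range(0, len(counts) - m + 1, m)]


def coefficient_of_variation(series):
    """Relative variability: standard deviation divided by the mean."""
    return pstdev(series) / mean(series)


rng = random.Random(1)  # fixed seed so the run is repeatable
counts = poisson_counts(rate=10.0, n_slots=10000, rng=rng)
cv_by_scale = {m: coefficient_of_variation(aggregate(counts, m)) for m in (1, 10, 100)}
for m, cv in cv_by_scale.items():
    print(f"block size {m:>3}: coefficient of variation {cv:.3f}")
```

For a Poisson process the relative variability shrinks roughly with the square root of the block size; a self-similar trace would keep much more of its variability when aggregated in the same way.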


It is well known that self-similarity occurs in a diverse range of natural, sociological and technical systems; in particular, it is interesting to note that rainfall bears some similarities to network traffic: the same mathematical model, a (fractional) autoregressive integrated moving average (FARIMA) process, can be used to describe both time series (Gruber 1994; Xue et al. 1999). The fact that there is no theoretical limit to the time scale at which dependencies can occur (i.e. you cannot count on the aforementioned 'flattening towards a mean', no matter how long you wait) has the unhappy implication that it may in fact be impossible to build a dam that is always large enough. Translated into the world of networks, this means that the self-similar nature of traffic has implications for the buffer overflow probability: it does not decrease exponentially with a growing buffer size as predicted by queuing theory, but does so very slowly instead (Tsybakov and Georganas 1998). In other words, large buffers do not help as much as one may believe, and this is another reason to make them small.

What causes this strange property of network traffic? In Crovella and Bestavros (1997), it was attributed to user think times and file size distributions, but it has also been said that TCP is the reason; indeed, its traffic pattern is highly correlated. This behaviour was called pseudo-self-similarity in Guo et al. (2001), which makes it clear that TCP correlations in fact only appear over limited time scales. On a side note, TCP has been shown to propagate the self-similarity at the bottleneck router to end systems (Veres et al. 2000); in He et al. (2002), this fact was exploited to enhance the performance of the protocol by means of mathematical traffic modelling and prediction.

Self-similarity in network traffic is a well-studied topic, and there is a wealth of literature available; Park and Willinger (2000) may be a good starting point if you are interested in further details. No matter where it comes from, the phenomenon is there, and it may make it hard for network administrators to predict network traffic. Taking this behaviour into consideration, in addition to the aforementioned unexpected peaks from worms and viruses, it seems wise for an ISP to generally overprovision the network and to act quickly when congestion is more than just a rare and sporadic event. In what follows, we will briefly discuss what exactly could be done.


2.5 Poisson Traffic

2.5.1 Poisson Law

A discrete random variable $X$ taking unbounded non-negative integer values obeys a Poisson law with parameter λ if its distribution is given by:

$$P(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda}, \qquad k = 0, 1, 2, \ldots$$

The dimensionless parameter λ characterises the law. This distribution is encountered in various circumstances. In particular, if a flow arrives according to a Poisson process (see 2.5.2), then the number of arrivals observed in a window of width T is distributed according to the Poisson law with parameter A = λT, where λ here denotes the intensity of the process. This gives the law a fundamental role in teletraffic studies, especially for describing the arrival of calls, sessions, etc., in communications systems. The conditions explaining the occurrence of such a law are detailed in 2.5.2. The characteristic function, from which the moments are derived, is written:

$$\phi(u) = E\left[e^{iuX}\right] = e^{\lambda\,(e^{iu} - 1)}$$

Mean value: $m = \lambda$

Variance: $\sigma^2 = \lambda$

The central moments of higher order are:

$$\mu_3 = \lambda, \qquad \mu_4 = \lambda + 3\lambda^2$$

Figures 2.8a and 2.8b illustrate the general shape of the law. As this is a discrete distribution, the histogram resembles Figure 2.8a. In Figure 2.8b a smoothed variant shows the effect of changing the parameter. Variables obeying a Poisson distribution enjoy the following property, which allows them to be combined easily.

Figure 2.8a The histogram of the Poisson distribution (λ = 5)

Figure 2.8b Smoothed shape of the Poisson distribution for different parameter values

THEOREM - Let X and Y be two independent Poisson variables with parameters λ and µ respectively. Then X + Y obeys a Poisson distribution with parameter λ + µ. The proof is immediate using the transform approach, the transform of the sum being the product of the transforms (Laplace, characteristic function, etc.). This property is of the highest importance, as one often has to treat the case of a mix of Poisson streams.
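The transform argument can be written out in a few lines. The sketch below assumes that X and Y are independent and uses the characteristic function $\phi(u) = E[e^{iuX}]$; any of the other transforms mentioned would serve equally well.

```latex
\begin{aligned}
\phi_X(u) &= \sum_{k=0}^{\infty} e^{iuk}\,\frac{\lambda^k}{k!}\,e^{-\lambda}
           = e^{-\lambda}\sum_{k=0}^{\infty}\frac{(\lambda e^{iu})^k}{k!}
           = e^{\lambda\,(e^{iu}-1)}, \\
\phi_{X+Y}(u) &= \phi_X(u)\,\phi_Y(u)
           = e^{\lambda\,(e^{iu}-1)}\,e^{\mu\,(e^{iu}-1)}
           = e^{(\lambda+\mu)\,(e^{iu}-1)}.
\end{aligned}
```

The last expression is the characteristic function of a Poisson law with parameter λ + µ, which establishes the theorem.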


2.5.2 Poisson Process

Let us assume that the arrival process complies with the following rules:

(a) The probability of an arrival in an interval [t, t + Δt[ does not depend on what happened before the instant t. This is the so-called memoryless property.

(b) The probability of the arrival of a client is proportional to Δt, and the probability of more than one event is 'negligible' (of a higher order of smallness, written o(Δt) in mathematical language). The proportionality factor is denoted λ (process intensity).

These are the classical assumptions that lead to the Poisson process. How can the probability distribution of the number of arrivals in a given time be derived from the above axioms? The reasoning applied is typical of the theory. Let us denote as $P_k(t)$ the probability of k arrivals in the interval [0, t[. One can describe how this probability varies over time: k clients will be observed in the interval [0, t + Δt[ if:

(a) k clients have been observed in [0, t[, and no arrivals have been observed in [t, t + Δt[;

(b) k - 1 clients have been observed in [0, t[, and one arrival occurred in [t, t + Δt[;

(c) k - n, n > 1 arrivals have been observed in [0, t[, and n arrivals in [t, t + Δt[, etc.

If these observations are put into an equation:

$$P_k(t + \Delta t) = P_k(t)\,[1 - \lambda\Delta t + o(\Delta t)] + P_{k-1}(t)\,[\lambda\Delta t + o(\Delta t)] + o(\Delta t)$$

For k = 0, only the first case applies:

$$P_0(t + \Delta t) = P_0(t)\,[1 - \lambda\Delta t + o(\Delta t)]$$

The development of the previous equation (making Δt → 0, etc.) leads to:

$$\frac{dP_0(t)}{dt} = -\lambda P_0(t)$$

that is $P_0(t) = a e^{-\lambda t}$, where a is a constant, still unknown at this point.

The general equation will be similarly developed. It should be noted that in the passage to the limit that leads to the derivative (that is, Δt → 0), the terms in o(Δt)/Δt disappear:

$$\frac{P_k(t + \Delta t) - P_k(t)}{\Delta t} = -\lambda P_k(t) + \lambda P_{k-1}(t) + \frac{o(\Delta t)}{\Delta t}$$

Thus,

$$\frac{dP_k(t)}{dt} = -\lambda P_k(t) + \lambda P_{k-1}(t)$$

The basic resolution method proceeds using a recurrence:

$$\frac{dP_1(t)}{dt} = -\lambda P_1(t) + \lambda a e^{-\lambda t}, \qquad \text{and thus} \qquad P_1(t) = (a\lambda t + b)\, e^{-\lambda t}$$

The obvious condition $P_1(0) = 0$ leads to b = 0; the reader will be able to write the second iteration, which gives the intuition of the general solution:

$$P_k(t) = a\,\frac{(\lambda t)^k}{k!}\, e^{-\lambda t}$$

This can be verified with the general equation. It can now be noted that, as some number of arrivals (possibly zero) must have taken place in the interval, it is essential that:

$$\sum_{k=0}^{\infty} P_k(t) = 1$$

which gives a = 1. Finally, the probability of observing k arrivals in an interval of length t amounts to:

$$P_k(t) = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t}$$

The above discrete distribution is the Poisson law, which is sometimes written, by noting A = λt:

$$P_k = \frac{A^k}{k!}\, e^{-A}$$

A is then the mean traffic offered during the period considered. The distribution function of the interval between two successive arrivals is derived from this distribution: the probability of an interval between arrivals greater than t is the probability that no arrival occurs between 0 and t, that is $P_0(t) = e^{-\lambda t}$, so that:

$$A(t) = 1 - e^{-\lambda t}$$

The average number of arrivals observed in any interval of length t is $m = \lambda t$, and its variance is also $\sigma^2 = \lambda t$. It should be noted that the Poisson process is a renewal process.
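The closed-form result can be checked numerically against the differential equations of the derivation. The following sketch, with arbitrarily chosen values of λ and t and illustrative function names, evaluates the probability of k arrivals in [0, t[, confirms by a finite-difference approximation that it satisfies the recurrence equation, and recovers the normalization, mean and variance.

```python
import math


def p_k(k, lam, t):
    """Probability of k arrivals in [0, t[ for a Poisson process of intensity lam."""
    return (lam * t) ** k / math.factorial(k) * math.exp(-lam * t)


def ode_residual(k, lam, t, h=1e-6):
    """Difference between a central-difference estimate of dP_k/dt and
    the right-hand side -lam*P_k(t) + lam*P_{k-1}(t)."""
    derivative = (p_k(k, lam, t + h) - p_k(k, lam, t - h)) / (2 * h)
    previous = p_k(k - 1, lam, t) if k > 0 else 0.0
    return derivative - (-lam * p_k(k, lam, t) + lam * previous)


lam, t = 2.0, 1.5  # arbitrary intensity and interval length, so A = lam * t = 3
probabilities = [p_k(k, lam, t) for k in range(100)]
total = sum(probabilities)
mean_arrivals = sum(k * p for k, p in enumerate(probabilities))
variance = sum((k - mean_arrivals) ** 2 * p for k, p in enumerate(probabilities))
print(total, mean_arrivals, variance)
print(max(abs(ode_residual(k, lam, t)) for k in range(6)))
```

Truncating the sums at k = 99 is harmless here because the terms beyond that point are vanishingly small for A = 3.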


2.5.3

Traffic Analysis For interactive TELNET traffic, connection arrivals are well-modeled as

Poisson with fixed hourly rates. However, the exponentially-distributed inter arrivals commonly used to model packet arrivals generated by the user side of a TELNET connection grievously underestimate the burstiness of those connections, and high degrees of multiplexing do not help. Using the empirical Tcplib (Danzig and Jamin, 1991; Danzig et al., 1992) distribution for TELNET packet inter arrivals instead results in packet arrival processes significantly burstier than Poisson arrivals, and in close agreement with traces of actual traffic. From these findings then construct a model of TELNET traffic parameterized by only the hourly connection arrival rate and show that it accurately reflects the burstiness found in actual TELNET traffic. The success with this model of using Tcplib packet inter arrivals confirms the finding in Danzig et al. (1992) that the arrival pattern of user-generated TELNET packets has an invariant distribution, independent of network details. For small machine-generated bulk transfers such as simple mail transfer protocol (SMTP - email) and network news transfer protocol (NNTP), connection arrivals are not well-modeled as Poisson, which is not surprising since both types of connections are machine-initiated and can be timer-driven. Previous research has discussed how the periodicity of machine-generated IP traffic such as routing updates can result in network-wide traffic synchronization (Floyd and Jacobson, 1994), a phenomenon impossible with Poisson models. For large bulk transfer, exemplified by file transfer protocol (FTP), the traffic structure is quite different than suggested by Poisson models. As with TELNET connections, user-generated FTP session arrivals are well-modeled as Poisson with fixed hourly rates. However, FTP data connections within a single FTP session (which are initiated whenever the user lists a directory or transfers a file) come clustered in bursts. 
Hereafter we will refer to these data connections as FTPDATA connections and to the corresponding bursts as FTPDATA bursts. Neither FTPDATA connection arrivals nor FTPDATA burst arrivals are well-modeled as Poisson processes. Furthermore, the distribution of the number of bytes in each burst has a very heavy upper tail; a small fraction of the largest bursts carries almost all of the FTPDATA bytes. This implies that faithful modeling of FTP traffic should concentrate heavily on the characteristics of the largest bursts.

Poisson arrival processes are quite limited in their burstiness, especially when multiplexed to a high degree. However, wide-area traffic has been shown to be much burstier than Poisson models predict, over many time scales. This greater burstiness has implications for many aspects of congestion control and traffic performance.
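The heavy upper tail described above can be illustrated with a small numerical sketch. It is not taken from the cited studies: the Pareto tail index of 1.1, the sample size, and the function names are assumptions chosen only for illustration. The sketch compares how much of the total volume is carried by the largest 5% of bursts when burst sizes are heavy-tailed (Pareto) and when they are light-tailed (exponential).

```python
import random


def top_share(sizes, fraction):
    """Share of the total volume carried by the largest `fraction` of samples."""
    ordered = sorted(sizes, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


def sample_pareto_sizes(n, shape, seed):
    """Draw n heavy-tailed burst sizes (Pareto, minimum 1, tail index `shape`)."""
    rng = random.Random(seed)
    return [rng.paretovariate(shape) for _ in range(n)]


def sample_exponential_sizes(n, seed):
    """Draw n light-tailed burst sizes (exponential with mean 1)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) for _ in range(n)]
```

Calling `top_share(sample_pareto_sizes(100000, 1.1, 1), 0.05)` and `top_share(sample_exponential_sizes(100000, 1), 0.05)` shows the contrast: under the assumed Pareto distribution the largest bursts dominate the total, whereas under the exponential distribution the top 5% carry only about a fifth of it.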

2.6 Self-similar Traffic

2.6.1 Self-similarity

Self-similarity and fractals are notions pioneered by Mandelbrot (1982). They describe the phenomenon where a certain property of an object - e.g., a natural image, the convergent subdomain of certain dynamical systems, a time series (the mathematical object of our interest) - is preserved with respect to scaling in space and/or time. If an object is self-similar or fractal, its parts, when magnified, resemble - in a suitable sense - the shape of the whole. For example, the 2-dimensional Cantor set living on A = [0,1] x [0,1] is obtained by starting with a solid or black unit square, scaling its size by 1/3, then placing four copies of the scaled solid square at the four corners of A. If the same process of scaling followed by translation is applied recursively to the resulting objects ad infinitum, the limit set thus reached defines the 2D Cantor set. This constructive process is illustrated in Figure 2.9. The limiting object - defined as the infinite intersection of the iterates - has the property that if any of its corners are ‘blown up’ suitably, then the shape of the zoomed-in part is similar to the shape of the whole, i.e., it is self-similar. Of course, this is not too surprising since the constructive process - by its recursive action - endows the limiting object with the scale-invariance property.


Figure 2.9 2-dimensional Cantor set.
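The recursive construction of the 2-dimensional Cantor set can be sketched in a few lines. The following is a minimal illustration, assuming each square is represented by its lower-left corner and side length; the function names `cantor_2d` and `covered_area` are hypothetical. Exact fractions are used so that the scaling by 1/3 introduces no rounding error.

```python
from fractions import Fraction


def cantor_2d(n):
    """Return the squares of iterate n as (x, y, side) tuples with exact fractions."""
    squares = [(Fraction(0), Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for x, y, side in squares:
            s = side / 3  # scale the parent square by 1/3
            # Place one scaled copy at each of the four corners of the parent
            for dx in (0, 2 * s):
                for dy in (0, 2 * s):
                    refined.append((x + dx, y + dy, s))
        squares = refined
    return squares


def covered_area(squares):
    """Total area covered by a list of (x, y, side) squares."""
    return sum(side * side for _, _, side in squares)
```

Iterate n consists of 4^n squares of side 1/3^n, so the covered area is (4/9)^n and shrinks to zero as n grows, while every corner of the set is a scaled copy of the whole.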

The 1-dimensional Cantor set, e.g., as obtained by projecting the 2-D Cantor set onto the line, can be given an interpretation as a traffic series X(t) ∈ {0,1} - call it ‘Cantor traffic’ - where X(t) = 1 means that there is a packet transmission at time t. This is depicted in Figure 2.10 (left). If the constructive process is terminated at iteration n ≥ 0, then the contiguous line segments of length 1/3^n may be interpreted as ‘on-periods’ or packet trains of duration 1/3^n, and the segments between successive on-periods as ‘off-periods’ or absence of traffic activity. Non-uniform traffic intensities may be imparted by generalizing the constructive framework via the use of probability measures. For example, for the 1-dimensional Cantor set, instead of letting the left and right components after scaling have identical ‘mass’, they may be assigned different mass, subject to the constraint that the total mass be preserved at each stage of the iterative construction. This modification corresponds to defining a probability measure µ on the Borel subsets of [0,1] and distributing the measure at each iteration non-uniformly left and right. Note that the classical Cantor set construction - viewed as a map - is not measure-preserving. Figure 2.10 (middle) shows such a construction with weights αL = 2/3, αR = 1/3 for the left and right components, respectively. The probability measure is represented by ‘height’; we observe that scale-invariance is exactly preserved. In general, the traffic patterns producible with fixed weights αL, αR are limited, but one can extend the framework by allowing possibly different weights associated with every edge in the weighted binary tree induced by the 1-dimensional Cantor set construction. Such constructions arise in a more refined characterization of network traffic - called multiplicative processes or cascades.
Further generalizations can be obtained by defining different affine transformations with variable scale factors and translations at every level in the ‘traffic tree’. The corresponding traffic pattern is self-similar if, and only if, the infinite tree can be compactly represented as a finite directed cyclic graph (Barnsley, 1988).
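Both constructions can be sketched numerically. The following is a minimal illustration, assuming time is divided into 3^n equal slots at iteration n; the function names `cantor_traffic` and `cantor_measure` are hypothetical, and the default weights 2/3 and 1/3 are those quoted in the text. The first function produces the on/off Cantor traffic, the second distributes a unit of probability mass non-uniformly over the on-periods.

```python
from fractions import Fraction


def cantor_traffic(n):
    """On/off series of length 3**n; 1 marks a slot that belongs to an on-period."""
    series = [1]
    for _ in range(n):
        refined = []
        for x in series:
            # Keep the left and right thirds of each slot, silence the middle third
            refined.extend([x, 0, x])
        series = refined
    return series


def cantor_measure(n, w_left=Fraction(2, 3), w_right=Fraction(1, 3)):
    """Mass per slot at iteration n when each on-period splits its mass unevenly."""
    masses = [Fraction(1)]
    for _ in range(n):
        refined = []
        for m in masses:
            # The left third receives w_left of the parent mass, the right third w_right
            refined.extend([m * w_left, Fraction(0), m * w_right])
        masses = refined
    return masses


print(cantor_traffic(2))  # [1, 0, 1, 0, 0, 0, 1, 0, 1]
```

Because the weights sum to one, the total mass returned by `cantor_measure` stays equal to 1 at every iteration, which is the mass-preservation constraint stated above. Allowing a different pair of weights at every split would give the more general cascade.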


Figure 2.10 Left: 1-dimensional Cantor set interpreted as on/off traffic. Middle: 1-dimensional non-uniform Cantor set with weights αL = 2/3, αR = 1/3. Right: Cumulative process corresponding to 1-dimensional on/off Cantor traffic.

Whereas the previous constructions are given interpretations as traffic activity ‘per unit time’, we will find it useful to consider their corresponding ‘cumulative’ processes, which are nondecreasing processes whose differences - also called the increment process - constitute the original process. For example, for the on/off Cantor traffic construction (cf. Figure 2.10 (left)), let us assign the interpretation that time is discrete such that at step n ≥ 0 it ranges over the values t = 0, 1/3^n, 2/3^n, ..., (3^n - 1)/3^n, 1. Thus we can equivalently index the discrete time steps by i = 0, 1, 2, ..., 3^n. With a slight abuse of notation, let us redefine X(.) as X(i) = 1 if, and only if, in the original process X(i/3^n) = 1 and X(i/3^n - ε) = 1 for all 0 < ε [...]

# [...] dest is reserved cir1 and cbs1.. all
# overlimit ones are classified with CP 11 which enjoys same DiffServ
# as CP 0 (Best Effort) from S1
$qE1C addPolicyEntry [$s2 id] [$dest id] TokenBucket 10 $cir1 $cbs1
$qE1C addPolicerEntry TokenBucket 10 11
$qE1C addPolicyEntry -1 [$dest id] TokenBucket 0 $cir0 $cbs0
$qE1C addPolicerEntry TokenBucket 0 0
$qE1C addPHBEntry 10 0 0
$qE1C addPHBEntry 11 0 1
$qE1C addPHBEntry 0 0 1

# Differentiated Forwarding Assurances
# 0 0 gets Better DS
$qE1C configQ 0 0 20 40 0.02

File: E:\NS-2\ns-tutorial\examples\thesis3.tcl

4/15/2007, 2:15:08AM

# 0 1 gets Worse DS
$qE1C configQ 0 1 10 20 0.10

# Set DiffServ RED parameters from Edge2 to Core:
$qE2C meanPktSize $packetSize
$qE2C set numQueues_ 1
$qE2C setNumPrec 2
$qE2C addPolicyEntry [$dest id] [$s2 id] TokenBucket 10 $cir1 $cbs1
$qE2C addPolicerEntry TokenBucket 10 11
$qE2C addPolicyEntry [$dest id] -1 TokenBucket 0 $cir0 $cbs0
$qE2C addPolicerEntry TokenBucket 0 0
$qE2C addPHBEntry 10 0 0
$qE2C addPHBEntry 11 0 1
$qE2C addPHBEntry 0 0 1
$qE2C configQ 0 0 20 40 0.02
$qE2C configQ 0 1 10 20 0.10

# Set DiffServ RED parameters from Core to Edge1:
$qCE1 meanPktSize $packetSize
$qCE1 set numQueues_ 1
$qCE1 setNumPrec 2
$qCE1 addPHBEntry 10 0 0
$qCE1 addPHBEntry 11 0 1
$qCE1 addPHBEntry 0 0 1
$qCE1 configQ 0 0 20 40 0.02
$qCE1 configQ 0 1 10 20 0.10

# Set DiffServ RED parameters from Core to Edge2:
$qCE2 setSchedularMode WRR
# Get the WRR parameters from ratio of TotPkts on qCE2 stats.
$qCE2 addQueueWeights 0 1
$qCE2 addQueueWeights 1 16
$qCE2 meanPktSize $packetSize
$qCE2 set numQueues_ 2
$qCE2 setNumPrec 2
$qCE2 addPHBEntry 10 0 0
$qCE2 addPHBEntry 11 0 1

# BE is put in a separate queue 1 0, to provide
# better B/W sharing b/w S1 and S2 side traffic
$qCE2 addPHBEntry 0 1 0
$qCE2 configQ 0 0 20 40 0.02
$qCE2 configQ 0 1 10 20 0.06
$qCE2 configQ 1 0 10 20 0.10
#----------------------------------------------------------------

#----------------------------------------------------------------
# Set up one CBR connection between each source and the destination:
set null0 [new Agent/LossMonitor]
$ns attach-agent $dest $null0
for {set i 0} {$i < $numP} {incr i} {
    set udp($i) [new Agent/UDP]


    $ns attach-agent $p($i) $udp($i)
    $udp($i) set class_ [expr $i+1]
    $ns color $i Blue
    # self-similar traffic generator
    #set pareto($i) [new Application/Traffic/Pareto]
    #$pareto($i) set packetSize_ $packetSize
    #$pareto($i) set burst_time_ 500ms
    #$pareto($i) set idle_time_ 500ms
    # Set rate apply
    #$pareto($i) set rate_ 200k
    #$pareto($i) set shape_ 1.5
    #$pareto($i) attach-agent $udp($i)
    # Poisson traffic generator
    set poisson($i) [new Application/Traffic/Exponential]
    $poisson($i) set packetSize_ $packetSize
    $poisson($i) set burst_time_ 0ms
    $poisson($i) set idle_time_ 500ms
    # Set rate apply
    $poisson($i) set rate_ 200k
    $poisson($i) attach-agent $udp($i)
    $ns connect $udp($i) $null0
}

set null1 [new Agent/LossMonitor]
$ns attach-agent $dest $null1
for {set i 0} {$i < $numA} {incr i} {
    set udp2($i) [new Agent/UDP]
    $ns attach-agent $p1($i) $udp2($i)
    $udp2($i) set class_ [expr $i+1]
    $ns color $i Red
    # self-similar traffic generator
    set pareto($i) [new Application/Traffic/Pareto]
    $pareto($i) set packetSize_ $packetSize
    $pareto($i) set burst_time_ 500ms
    $pareto($i) set idle_time_ 500ms
    # Set rate apply
    $pareto($i) set rate_ 200k
    $pareto($i) set shape_ 1.5
    $pareto($i) attach-agent $udp2($i)
    # Poisson traffic generator
    #set poisson($i) [new Application/Traffic/Exponential]
    #$poisson($i) set packetSize_ $packetSize
    #$poisson($i) set burst_time_ 0ms
    #$poisson($i) set idle_time_ 500ms
    # Set rate apply
    #$poisson($i) set rate_ 200k
    #$poisson($i) attach-agent $udp2($i)
    $ns connect $udp2($i) $null1
}


set udp1 [new Agent/UDP]
$ns attach-agent $s2 $udp1
$udp1 set class_ $numP
$ns color $numP Blue
# CBR or MPEG4 source
set cbr1 [new Application/Traffic/CBR]
$cbr1 attach-agent $udp1
$cbr1 set packet_size_ $packetSize
$udp1 set packetSize_ $packetSize
$cbr1 set rate_ $cir1

# generate the video trace file ("Verbose_Jurassic_64.dat" is only an example):
#set original_file_name Verbose_Jurassic_64.dat
#set trace_file_name video.dat
#set original_file_id [open $original_file_name r]
#set trace_file_id [open $trace_file_name w]
#set last_time 0
#while {[eof $original_file_id] == 0} {
#    gets $original_file_id current_line
#    if {[string length $current_line] == 0 ||
#        [string compare [string index $current_line 0] "#"] == 0} {
#        continue
#    }
#    scan $current_line "%d%s%d" next_time type length
#    set time [expr 1000*($next_time-$last_time)]
#    set last_time $next_time
#    puts -nonewline $trace_file_id [binary format "II" $time $length]
#}
#close $original_file_id
#close $trace_file_id
#set tfile [new Tracefile]
#$tfile filename $trace_file_name
#set stream [new Application/Traffic/Trace]
#$stream attach-tracefile $tfile
#$stream attach-agent $udp1

# S2 is sinked in a LossMonitor
# Create three traffic sinks and attach them to the node dest
set sink0 [new Agent/LossMonitor]
$ns attach-agent $dest $sink0
$ns connect $udp1 $sink0
#----------------------------------------------------------------


#----------------------------------------------------------------
# Define a procedure which periodically records the bytes/packets
# received by the traffic sink sink0 & null0 and writes it to the files.

proc record {} {
    global null0 null1 sink0 f2 f3
    # Get an instance of the simulator
    set ns [Simulator instance]
    # Set the time after which the procedure should be called again (make this 10s)
    set time 1.0
    # Set the total number of bytes received at sink
    #set ss0 [$null0 set bytes_]
    #set strm0 [$sink0 set bytes_]
    # Set the number of packets received at sink
    set ss1 [$null0 set npkts_]
    set ss11 [$null1 set npkts_]
    set strm1 [$sink0 set npkts_]
    # Set the number of packets lost at sink
    #set ss2 [$null0 set nlost_]
    #set strm2 [$sink0 set nlost_]
    # Get the current time
    set now [$ns now]
    # Calculate the number of bytes received and write it to the files
    #puts $f0 "$now [expr $ss0]"
    #puts $f1 "$now [expr $strm0]"
    puts $f2 "$now [expr ($ss1+$ss11)]"
    puts $f3 "$now [expr $strm1]"
    #puts $f4 "$now [expr $ss2]"
    #puts $f5 "$now [expr $strm2]"

    #$null0 set bytes_ 0
    #$sink0 set bytes_ 0
    $null0 set npkts_ 0
    $null1 set npkts_ 0
    $sink0 set npkts_ 0
    #$null0 set nlost_ 0
    #$sink0 set nlost_ 0
    # Re-schedule the procedure
    $ns at [expr $now+$time] "record"
}
#----------------------------------------------------------------

#----------------------------------------------------------------
proc finish {} {
    global ns f2 f3 nf
    $ns flush-trace
    # Close the output files
    #close $f0
    #close $f1
    close $f2
    close $f3
    #close $f4
    #close $f5
    # Close the trace file
    close $nf
    # Execute nam on the trace file
    #exec nam thesis1.nam &
    # Call xgraph to display the results
    #exec xgraph bytenode.tr -geometry 800x600 -t "# of Bytes Received fm Nodes" -x "secs" -y "# bytes" &
    #exec xgraph byteqos.tr -geometry 800x600 -t "# of Bytes Received fm QoS Customer" -x "secs" -y "# bytes" &
    exec xgraph rxnode.tr -geometry 800x600 -t "# of Packets RX fm Nodes" -x "secs" -y "# packets" &
    exec xgraph rxqos.tr -geometry 800x600 -t "# of Packets RX fm QoS Customer" -x "secs" -y "# packets" &
    exit 0
}
#----------------------------------------------------------------

#----------------------------------------------------------------
$qE1C printPolicyTable
$qE1C printPolicerTable
$qE2C printPolicyTable
$qE2C printPolicerTable

$ns at 0.0 "record"
for {set i 0} {$i < $numP} {incr i} {
    #$ns at 0.0 "$pareto($i) start"
    $ns at 0.0 "$poisson($i) start"
}
for {set i 0} {$i < $numA} {incr i} {
    $ns at 400.0 "$pareto($i) start"
    #$ns at 400.0 "$poisson($i) start"
}
$ns at 0.0 "$cbr1 start"
#$ns at 0.0 "$stream start"

# Attacks go here (only simplified samples shown for brevity)
# Attacks on Delay and Bandwidth (ldrops) through WRR parameters
#$ns at 200.0 "$qCE2 addQueueWeights 0 1"
#$ns at 600.0 "$qCE2 addQueueWeights 0 5"
# Attacks through RED parameters causing early drops (edrops)
# virtual queue/min/max/max drop probing
#$qCE2 configQ 0 0 20 40 0.10
#$qCE2 configQ 0 1 20 30 0.30
#$qCE2 configQ 1 0 10 20 0.10

for {set i 0} {$i < $numP} {incr i} {
    #$ns at $testTime "$pareto($i) stop"
    $ns at $testTime "$poisson($i) stop"
}
for {set i 0} {$i < $numA} {incr i} {
    $ns at 600.0 "$pareto($i) stop"
    #$ns at 600.0 "$poisson($i) stop"
}
$ns at $testTime "$cbr1 stop"
#$ns at $testTime "$stream stop"
#$ns at $testTime "$qCE1 printStats"
$ns at $testTime "$qCE2 printStats"
$ns at [expr $testTime + 1.0] "finish"


$ns run
#----------------------------------------------------------------
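The configQ calls in the listing set, for each queue and precedence level, a minimum threshold, a maximum threshold and a maximum drop probability. As a rough guide to what these three numbers mean, the sketch below evaluates the textbook RED drop curve for a given average queue length. It is a simplification and not the ns-2 DiffServ implementation: it ignores queue averaging and the spacing of drops between packets, and the function name `red_drop_probability` is hypothetical.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Textbook RED early-drop probability for an average queue length in packets."""
    if avg_queue < min_th:
        return 0.0  # below the minimum threshold nothing is dropped early
    if avg_queue >= max_th:
        return 1.0  # at or above the maximum threshold every arrival is dropped
    # Between the thresholds the probability rises linearly from 0 to max_p
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

With the values used in the listing, `red_drop_probability(30, 20, 40, 0.02)` corresponds to the better-treated precedence (thresholds 20 and 40, maximum probability 0.02) and `red_drop_probability(15, 10, 20, 0.10)` to the worse-treated one. The second starts dropping at shorter queues and more aggressively, which is how the two precedence levels receive different forwarding assurances.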