Artificial Immune System and Its Applications

Author: Roland Owen
Prof. Ying TAN
National Laboratory on Machine Perception
Department of Intelligence Science
Peking University, Beijing 100871, P.R. China

2005-12-13

Y. Tan---Artificial Immune Sys.


Contents
• Biological Immune System
• Artificial Immune System
• Basic Algorithms of AIS
• AIS design procedure
• Case Studies
  – Malicious Executable Detection
  – Film Recommender
• New
  – Immunocomputing (IC)
  – Danger Theory
• Future

The Immune System is…
• Immune system: a system that protects the body from foreign substances and pathogenic organisms by producing the immune response.
• Immunity: the state or quality of being resistant (immune), either by virtue of previous exposure (adaptive immunity) or as an inherited trait (innate immunity).

Why the Immune System?
The immune system has the following appealing features:
• Recognition
  – Anomaly detection
  – Noise tolerance
• Robustness
• Feature extraction
• Diversity
• Reinforcement learning
• Memory
• Dynamically changing coverage
• Distributed
• Multi-layered
• Adaptive

Role of Biological Immune System
• Protect our bodies from pathogens and viruses
• Primary immune response
  – Launch a response to invading pathogens
• Secondary immune response
  – Remember past encounters
  – Faster response the second time around

Immune cells
• There are two primary types of lymphocytes:
  – B-lymphocytes (B cells)
  – T-lymphocytes (T cells)
• Other components include macrophages and other phagocytic cells, as well as signaling molecules such as cytokines.

Where is it?
(Figure: the lymphoid organs. Primary lymphoid organs: thymus and bone marrow. Secondary lymphoid organs: tonsils and adenoids, spleen, Peyer's patches, appendix, and lymph nodes, linked by lymphatic vessels.)

Multiple layers of the immune system
(Figure: pathogens must pass successive layers: skin, biochemical barriers, the innate immune response (phagocytes), and the adaptive immune response (lymphocytes).)

Antigen
• Substances capable of starting a specific immune response are commonly referred to as antigens.
• These include some pathogens, such as viruses, bacteria, and fungi.

Biological Immune System
(Figure: innate vs. acquired immunity; acquired immunity is divided into cell-mediated responses (killer T cells) and humoral responses (B cells, which secrete antibodies), with helper T cells involved in both.)

How does the IS work: a simplistic view
(Figure: stages (I) to (VII) of an immune response. An antigen-presenting cell (APC) takes up an antigen and displays its peptides on MHC proteins; a T cell recognizes the MHC-peptide complex and is activated, releasing lymphokines; these help activate a B cell, which differentiates into an antibody-secreting plasma cell.)

Self/Non-Self Recognition
• The immune system needs to be able to differentiate between self and non-self cells.
• Antigenic encounters may result in cell death; therefore the immune system needs
  – some kind of positive selection, and
  – some element of negative selection.

Immune Pattern Recognition
(Figure: a B cell whose receptors (BCR, i.e., antibodies, Ab) bind to epitopes on an antigen.)
• Immune recognition is based on the complementarity between the binding region of the receptor and a portion of the antigen called an epitope.
• Antibodies present a single type of receptor, whereas antigens might present several epitopes.
  – This means that each antibody is specific to a single epitope, while one antigen can be recognized by several different antibodies.

Clonal Selection
(Figure: foreign antigens select the lymphocytes whose antibodies match them; the selected cells proliferate (cloning) and differentiate into plasma cells, which secrete antibodies, and memory cells (M). Cells that react to self-antigens are removed by clonal deletion (negative selection).)

Main Properties of Clonal Selection (Burnet, 1978)
• Elimination of lymphocytes that react to self antigens
• Proliferation and differentiation on contact of mature lymphocytes with antigen
• Restriction of one pattern to one differentiated cell, and retention of that pattern by clonal descendants
• Generation of new random genetic changes, subsequently expressed as diverse antibody patterns, by a form of accelerated somatic mutation

Immune Network Theory
• Idiotypic network (Jerne, 1974)
• B cells co-stimulate each other
  – They treat each other a bit like antigens
• Creates an immunological memory
(Figure: an antigen (Ag) and antibodies 1 to 3 linked through paratope and idiotope recognition; recognizing another molecule leads to activation (positive response), while being recognized leads to suppression (negative response).)

Reinforcement Learning and Immune Memory
• Repeated exposure to an antigen throughout a lifetime
• Primary and secondary immune responses
• Remembers encounters
  – No need to start from scratch
  – Memory cells
• Continuous learning

Learning (2)
(Figure: antibody concentration over time. Antigen Ag1 triggers a primary response after a lag; a later exposure to antigens Ag1 and Ag2 triggers a faster, stronger secondary response to Ag1 together with a primary response to Ag2; exposure to Ag1' = Ag1 + Ag3 triggers a cross-reactive response.)

Immune System: Summary
• Distinguishes the host (body cells) from external entities.
• When an entity is recognized as foreign (or dangerous), activates several defense mechanisms leading to its destruction (or neutralization).
• Subsequent exposure to a similar entity results in a rapid immune response.
• The overall behavior of the immune system is an emergent property of many local interactions.

Immune metaphors
(Figure: ideas taken from the immune system, together with ideas from other areas, lead to artificial immune systems.)

What is an Artificial Immune System?
Definitions
• Dasgupta’99: “Artificial immune systems (AIS) are intelligent and adaptive systems inspired by the immune system toward real-world problem solving.”
• de Castro and Timmis: “Artificial Immune Systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving.” (http://www.cs.kent.ac.uk/people/staff/jt6/aisbook/)
• AIS use the natural immune system as a metaphor for solving complex computational problems.
• The aim is not to model the immune system itself.

AI models and their corresponding natural prototypes

| Natural prototype | Biological level | AI model |
|---|---|---|
| Natural language | Left hemisphere of brain | Formal logic, formal linguistics |
| Brain nervous net | Cells | Neural computing (NC), neural networks (NN) |
| Biological cells | Cells | Cellular automata (CA) |
| Molecules of proteins | Molecular | Artificial immune systems (AIS) |
| Genetic code | Molecular | Genetic algorithms (GA) |

Some History
• AIS developed from the field of theoretical immunology in the mid-1980s.
  – This work suggested that we ‘might look’ at the immune system.
• 1990: Bersini makes the first use of immune algorithms to solve problems.
• Forrest et al.: computer security, mid-1990s.
• Hunt et al.: machine learning, mid-1990s.
• More……

Scope of AIS
• Pattern recognition
• Fault and anomaly detection
• Data analysis
• Data mining (classification/clustering)
• Agent-based systems
• Scheduling
• Machine learning
• Autonomous navigation and control
• Search and optimization methods
• Artificial life
• Security of information systems
• Optimization
• Just to name a few.

Typical Applications of AIS
• Computer Security (Forrest’94’96’98, Kephart’94, Lamont’98’01’02, Dasgupta’99’01, Bentley’00’01’02)
• Anomaly Detection (Dasgupta’96’01’02)
• Fault Diagnosis (Ishida’92’93, Ishiguro’94)
• Data Mining & Retrieval (Hunt’95’96, Timmis’99’01’02)
• Pattern Recognition (Forrest’93, Gibert’94, de Castro’02)
• Adaptive Control (Bersini’91)
• Job Shop Scheduling (Hart’98’01’02)
• Chemical Pattern Recognition (Dasgupta’99)
• Robotics (Ishiguro’96’97, Singh’01)
• Optimization (de Castro’99, Endo’98, de Castro’02)
• Web Mining (Nasraoui’02, Secker’05)
• Fault Tolerance (Tyrrell’01’02, Timmis’02)
• Autonomous Systems (Varela’92, Ishiguro’96)
• Engineering Design Optimization (Hajela’96’98, Nunes’00)

Basic Immune Models and Algorithms
• Bone Marrow Models
• Negative Selection Algorithms
• Clonal Selection Algorithm
• Immune Network Models
• Somatic Hypermutation

Bone Marrow Models
• Gene libraries are used to create antibodies from the bone marrow.
• Antibodies are produced through a random concatenation of segments from the gene libraries.
• The libraries can be simple or complex.
(Figure: an individual genome corresponds to four libraries, with entries A1 to A8, B1 to B8, C1 to C8, and D1 to D8. Picking one segment from each library, e.g. A3, B2, C8, and D5, gives four 16-bit segments that are concatenated into a 64-bit chain, the expressed Ab molecule.)
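The random concatenation step can be sketched in Python as below. This is a minimal illustration, not the original model: it assumes four libraries of eight random 16-bit segments (mirroring the figure), and the names `make_libraries` and `express_antibody` are hypothetical.

```python
import random

SEGMENT_BITS = 16      # each library entry is a 16-bit segment (as in the figure)
N_LIBRARIES = 4        # libraries A, B, C, D
SEGMENTS_PER_LIB = 8   # entries A1..A8, B1..B8, ...

def make_libraries(rng: random.Random) -> list[list[str]]:
    """Build hypothetical gene libraries filled with random 16-bit segments."""
    return [
        [format(rng.getrandbits(SEGMENT_BITS), f"0{SEGMENT_BITS}b")
         for _ in range(SEGMENTS_PER_LIB)]
        for _ in range(N_LIBRARIES)
    ]

def express_antibody(libraries: list[list[str]], rng: random.Random) -> tuple[list[int], str]:
    """Pick one segment from each library and concatenate them into one antibody."""
    picks = [rng.randrange(len(lib)) for lib in libraries]
    chain = "".join(lib[i] for lib, i in zip(libraries, picks))
    return picks, chain

rng = random.Random(42)
libs = make_libraries(rng)
picks, ab = express_antibody(libs, rng)
print(picks, len(ab))  # four library indices and the length of the 64-bit chain
```

Because each antibody is one choice per library, four libraries of eight segments can express at most 8^4 = 4096 distinct antibodies; larger or more numerous libraries give a more diverse repertoire.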

Negative Selection (NS) Algorithms
• Forrest 1994: idea taken from the negative selection of T cells in the thymus
• Applied initially to computer security
• Split into two parts:
  – Censoring
  – Monitoring
(Flow chart. Censoring: generate random strings (R0) and match each against the self strings (S); a string that matches is rejected, and one that does not match joins the detector set (R). Monitoring: match the protected strings (S) against the detector set (R); a match means non-self has been detected.)
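A minimal sketch of censoring and monitoring over binary strings, assuming the r-contiguous-bits matching rule (two strings match if they agree in at least r consecutive positions) used by Forrest et al. in 1994. The string length, r, the example self set, and the function names are hypothetical.

```python
import random

def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if equal-length strings a and b agree in at least r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def censor(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Censoring: keep random strings (R0) that match no self string (S)."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if any(r_contiguous_match(candidate, s, r) for s in self_set):
            continue                  # matches self: reject
        detectors.append(candidate)   # no match: add to the detector set (R)
    return detectors

def monitor(protected, detectors, r):
    """Monitoring: return protected strings matched by some detector (non-self detected)."""
    return [s for s in protected if any(r_contiguous_match(s, d, r) for d in detectors)]

rng = random.Random(0)
self_set = ["00000000", "00001111", "11110000"]   # hypothetical self strings
detectors = censor(self_set, n_detectors=20, length=8, r=4, rng=rng)
print(monitor(["00000000", "10101010", "01011010"], detectors, r=4))
```

By construction no detector matches an unchanged self string, so monitoring only fires when a protected string has changed into something the detectors cover.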

Clonal Selection Algorithm (de Castro & von Zuben, 2001)
1. Initialisation: randomly initialise a population (P).
2. Antigenic presentation: for each pattern in Ag, do:
   2.1 Antigenic binding: determine its affinity to each member of P.
   2.2 Affinity maturation: select the n highest-affinity members of P, clone them in proportion to their affinity with Ag, and mutate the clones at a rate that decreases as that affinity increases; then add the new mutants to P.
3. Metadynamics:
   3.1 Select the highest-affinity members of P to form part of the memory set M.
   3.2 Replace n of the lowest-affinity members of P with random new ones.
4. Cycle: repeat steps 2 and 3 until a stopping criterion is met (e.g., a maximum number of generations).
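The steps above can be sketched as the optimization variant of CLONALG, where the affinity of an antibody is simply the objective value of the point it encodes. This is an illustrative sketch under stated assumptions: a 1-D real-valued search space, clone counts that shrink with rank, Gaussian mutation scaled by α = exp(−ρ·f*) for normalized affinity f*, and hypothetical parameter names.

```python
import math
import random

def clonalg_maximize(f, lo, hi, pop_size=20, n_select=5, rho=5.0,
                     d_replace=4, generations=100, seed=0):
    """CLONALG-style maximization of a 1-D function f on [lo, hi] (illustrative sketch)."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]          # 1. random initial population
    for _ in range(generations):                                  # 4. cycle
        pop.sort(key=f, reverse=True)                             # rank by affinity (objective value)
        f_best, f_worst = f(pop[0]), f(pop[-1])
        span = (f_best - f_worst) or 1.0
        clones = []
        for rank, x in enumerate(pop[:n_select]):                 # 2.2 select the n best
            norm = (f(x) - f_worst) / span                        # normalized affinity in [0, 1]
            alpha = math.exp(-rho * norm)                         # mutation rate falls as affinity rises
            n_clones = max(1, round(pop_size / (rank + 1)))       # better rank -> more clones
            for _ in range(n_clones):
                y = x + alpha * (hi - lo) * rng.gauss(0.0, 0.1)   # Gaussian mutation scaled by alpha
                clones.append(min(hi, max(lo, y)))                # stay inside the search interval
        pop = sorted(pop + clones, key=f, reverse=True)[:pop_size]  # keep the best parents and mutants
        pop[-d_replace:] = [rng.uniform(lo, hi) for _ in range(d_replace)]  # 3.2 random newcomers
    return max(pop, key=f)

best = clonalg_maximize(lambda x: -(x - 1.5) ** 2, -5.0, 5.0)
print(round(best, 3))
```

The random newcomers in step 3.2 keep exploring the interval, while the best antibodies mutate only slightly, which is the exploration/exploitation balance the metadynamics step is meant to provide.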

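CLONALG for Pattern Recognition, Learning, and Optimization
(Figure: CLONALG block diagram. For antigen Ag_j, the n best antibodies Ab{n} are selected by affinity f_j and cloned (C_j); the clones are matured (C_j*) and re-evaluated (f_j*); the best, Ab_j*, is selected into memory Ab{m}; Ab{r} is the remaining repertoire, and d new random antibodies Ab{d} replace low-affinity ones.)
L.N. de Castro et al., “Learning and optimization using the clonal selection principle,” IEEE Trans. Evolutionary Computation, vol. 6, no. 3, June 2002, pp. 239-251.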

Discrete Immune Network Models (Timmis & Neal, 2001)
1. Initialisation: create an initial network from a sub-section of the antigens.
2. Antigenic presentation: for each antigenic pattern, do:
   2.1 Clonal selection and network interactions: for each network cell, determine its stimulation level (based on antigenic and network interactions).
   2.2 Metadynamics: eliminate network cells with low stimulation.
   2.3 Clonal expansion: select the most stimulated network cells and reproduce them in proportion to their stimulation.
   2.4 Somatic hypermutation: mutate each clone.
   2.5 Network construction: select mutated clones and integrate them into the network.
3. Cycle: repeat step 2 until the termination condition is met.
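Steps 2.1 to 2.3 can be sketched for real-valued cells as follows. The affinity form 1/(1 + distance), the weight on the network term, and all function names are assumptions for illustration; network interaction is modelled here only as stimulation, whereas the original model defines its own stimulation equation.

```python
import math

def affinity(a, b):
    """Hypothetical affinity: 1 at zero distance, decaying as cells move apart."""
    return 1.0 / (1.0 + math.dist(a, b))

def stimulation_levels(network, antigen, net_weight=0.5):
    """Step 2.1: antigenic affinity plus weighted mean affinity to the other network cells."""
    levels = []
    for i, cell in enumerate(network):
        others = [c for j, c in enumerate(network) if j != i]
        network_term = sum(affinity(cell, c) for c in others) / max(1, len(others))
        levels.append(affinity(cell, antigen) + net_weight * network_term)
    return levels

def metadynamics_and_expansion(network, levels, kill_threshold, max_clones):
    """Steps 2.2-2.3: drop weakly stimulated cells, then assign each survivor a
    number of clones proportional to its stimulation (at least one)."""
    survivors = [(c, s) for c, s in zip(network, levels) if s >= kill_threshold]
    if not survivors:
        return []
    top = max(s for _, s in survivors)
    return [(c, max(1, round(max_clones * s / top))) for c, s in survivors]

network = [(0, 0), (1, 1), (5, 5)]          # hypothetical 2-D network cells
antigen = (0, 0)
levels = stimulation_levels(network, antigen)
print(metadynamics_and_expansion(network, levels, kill_threshold=0.3, max_clones=4))
# [((0, 0), 4), ((1, 1), 2)]
```

The distant cell (5, 5) falls below the threshold and is eliminated; the cell sitting on the antigen receives the most clones.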

Immune Network Models
• Timmis & Neal, 2000
• Used immune network theory as a basis and proposed the AINE algorithm:

Initialize AIN
For each antigen
    Present antigen to each ARB in the AIN
Calculate ARB stimulation level
Allocate B cells to ARBs, based on stimulation level
Remove weakest ARBs (ones that do not hold any B cells)
If termination condition met
    exit
else
    Clone and mutate remaining ARBs
    Integrate new ARBs into AIN
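The allocation and removal steps can be sketched as below, assuming a fixed pool of B cells shared among ARBs in proportion to their stimulation levels, with fractional shares rounded down. The pooling rule, the rounding, and the names are hypothetical simplifications; AINE's published resource-allocation mechanism may differ in detail.

```python
def allocate_b_cells(stimulation: dict[str, float], total_b_cells: int) -> dict[str, int]:
    """Share a fixed pool of B cells among ARBs in proportion to stimulation.

    Fractional shares are rounded down, so a few B cells may stay unallocated.
    """
    total = sum(stimulation.values())
    if total <= 0:
        return {arb: 0 for arb in stimulation}
    return {arb: int(total_b_cells * s / total) for arb, s in stimulation.items()}

def remove_weakest(allocation: dict[str, int]) -> dict[str, int]:
    """Remove ARBs that do not hold any B cells."""
    return {arb: n for arb, n in allocation.items() if n > 0}

stim = {"arb1": 5.0, "arb2": 3.0, "arb3": 0.1}   # hypothetical stimulation levels
print(remove_weakest(allocate_b_cells(stim, total_b_cells=20)))  # {'arb1': 12, 'arb2': 7}
```

Because the pool is fixed, ARBs compete for B cells, and weakly stimulated ARBs are starved out rather than removed by an explicit threshold.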

Immune Network Models
• de Castro & Von Zuben (2000c)
• aiNet, based on similar principles:

At each iteration step do
    For each antigen do
        Determine affinity to all network cells
        Select n highest-affinity network cells
        Clone these n selected cells
        Increase the affinity of the cells to the antigen by reducing the distance between them (greedy search)
        Calculate the improved affinity of these n cells
        Re-select a number of improved cells and place them into matrix M
        Remove cells from M whose affinity is below a set threshold
        Calculate cell-cell affinity within the network
        Remove cells from the network whose affinity is below a certain threshold
        Concatenate the original network and M to form a new network
    Determine whole-network inter-cell affinities and remove all those below the set threshold
    Replace r% of the worst individuals by novel randomly generated ones
    Test stopping criterion
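A simplified sketch of one antigen presentation in aiNet, assuming cell-cell affinity is measured as Euclidean distance, so "below the threshold" is read as two cells being closer than a suppression threshold σs. To keep it short, each clone gets a different fixed step size α instead of affinity-dependent maturation, and the separate thresholds on M are folded into one suppression pass; the names and parameters are hypothetical.

```python
import math

def move_toward(cell, antigen, alpha):
    """Greedy maturation: shrink the cell-antigen distance by a factor (1 - alpha)."""
    return tuple(c - alpha * (c - x) for c, x in zip(cell, antigen))

def suppress(cells, sigma_s):
    """Suppression: drop any cell closer than sigma_s to a cell already kept.
    Cells must be ordered best-first so the better of two near-duplicates survives."""
    kept = []
    for c in cells:
        if all(math.dist(c, k) >= sigma_s for k in kept):
            kept.append(c)
    return kept

def ainet_step(network, antigen, n_best=2, alphas=(0.2, 0.5, 0.8), sigma_s=0.1):
    """One antigen presentation: clone the n nearest cells, mature the clones toward
    the antigen, merge them with the network, and suppress near-duplicates."""
    nearest = sorted(network, key=lambda c: math.dist(c, antigen))[:n_best]
    matured = [move_toward(c, antigen, a) for c in nearest for a in alphas]  # one clone per alpha
    merged = sorted(matured + list(network), key=lambda c: math.dist(c, antigen))
    return suppress(merged, sigma_s)

net = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]   # hypothetical network cells
print(ainet_step(net, antigen=(1.0, 1.0)))
```

Raising `sigma_s` prunes more near-duplicate cells, which is how suppression controls the size and redundancy of the network.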

Somatic Hypermutation
• Mutation rate inversely related to affinity: the higher the affinity, the lower the mutation rate
• Very controlled mutation in the natural immune system
• Trade-off between the normalized antibody affinity D* and its mutation rate α
(Figure: mutation rate α (0 to 1) plotted against normalized affinity D* (0 to 1) for ρ = 5, 10, and 20; α decreases as D* increases, and falls faster for larger ρ.)
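The figure's curves appear to follow the exponential rule α = exp(−ρ·D*) that de Castro and Von Zuben use in CLONALG. Assuming that form, the sketch below computes the rate and applies it to a binary antibody by flipping each bit with probability α; the function names are hypothetical.

```python
import math
import random

def mutation_rate(d_star: float, rho: float) -> float:
    """Assumed CLONALG-style rate: alpha = exp(-rho * D*), for normalized affinity D* in [0, 1]."""
    return math.exp(-rho * d_star)

def hypermutate(bits: str, d_star: float, rho: float, rng: random.Random) -> str:
    """Flip each bit independently with probability alpha (high affinity -> few flips)."""
    alpha = mutation_rate(d_star, rho)
    return "".join(("1" if b == "0" else "0") if rng.random() < alpha else b for b in bits)

for rho in (5, 10, 20):   # the three curves in the figure
    print(rho, [round(mutation_rate(d, rho), 3) for d in (0.0, 0.25, 0.5, 1.0)])
```

A larger ρ makes the mutation rate collapse sooner as affinity improves, so high-affinity antibodies are left almost untouched.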

General Framework of AIS
(Figure: layered framework. A problem from the application domain is encoded with a representation, affinity measures are defined on that representation, and immune algorithms operate on it to produce the solution.)

Representation – Shape Space
• Describe the general shape of a molecule
(Figure: complementary shapes of an antigen and an antibody.)
• Describe interactions between molecules
• Degree of binding between molecules

Representation
• Vectors
  – Ab = 〈Ab1, Ab2, ..., AbL〉
  – Ag = 〈Ag1, Ag2, ..., AgL〉
• Real-valued shape-space
• Integer shape-space
• Binary shape-space
• Symbolic shape-space

Define their Interaction
• Define the term affinity
• Affinity is related to distance, e.g. Euclidean:

$$D = \sqrt{\sum_{i=1}^{L} (Ab_i - Ag_i)^2}$$

• Other distance measures, such as Hamming, Manhattan, etc.
• Affinity threshold
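The distance measures and the affinity threshold can be sketched as follows. This sketch treats a smaller distance as higher affinity (a similarity view); under a complementarity view one would instead reward large distances. The function names are hypothetical.

```python
import math

def euclidean(ab, ag):
    """D = sqrt(sum_i (Ab_i - Ag_i)^2), for real-valued shape-spaces."""
    return math.sqrt(sum((a - g) ** 2 for a, g in zip(ab, ag)))

def manhattan(ab, ag):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - g) for a, g in zip(ab, ag))

def hamming(ab, ag):
    """Number of differing positions, for binary or symbolic shape-spaces."""
    return sum(a != g for a, g in zip(ab, ag))

def recognizes(ab, ag, threshold, distance=euclidean):
    """Similarity view: recognition when the distance is within the affinity threshold."""
    return distance(ab, ag) <= threshold

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)), hamming("1010", "1001"))  # 5.0 7 2
```

Passing a different `distance` function to `recognizes` switches the shape-space without changing the recognition logic.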

Shape Space Formalism
• The repertoire of the immune system is complete (Perelson, 1989)
• Extensive regions of complementarity
• Some threshold of recognition
(Figure: shape space V; each antibody recognizes the antigens that fall within a small region Vε of radius ε around its complementary shape.)

AIS Design
• Problem description
• Deciding the immune principles used for problem solving
• Engineering the AIS
  – Defining the types of immune components used
  – Defining the representation for the elements of the AIS
  – Applying immune principles to problem solving
  – The meta-dynamics of an AIS
• Reverse mapping from the AIS to the real problem

Case Studies of AIS
• Malicious Executables Detection: from Z.H. Guo, Z.K. Liu, and Y. Tan, “An NN-based Malicious Executables Detection Algorithm based on Immune Principles,” F. Yin, J. Wang, C. Guo (Eds.): ISNN 2004, Springer, Lecture Notes in Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)
• Film Recommender: from Dr. Uwe Aickelin (http://www.aickelin.com), University of Nottingham, U.K., 2004

New!
Immunocomputing (IC)
• Proposed by A. Tarakanov in 2001, with the aims of:
  – a proper mathematical framework;
  – a new kind of computing;
  – a new kind of hardware.
• New concepts:
  – formal protein (FP), compared with the neuron;
  – formal immune networks (FIN), compared with NN.
• Refer to A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.

Problems of Traditional Self/Non-self View
• No reaction to foreign bacteria in the gut (friendly bacteria…).
• No reaction to food, air, etc.
• The human body changes over its life.
• Auto-immune diseases.
• How do we produce antibodies that react against antigens and yet avoid self?
• Is it necessary to attack all non-self, or a specific self?

New!
The Danger Theory
• In the danger model, the idea is to recognise ‘danger’ rather than non-self.
• The screening is accomplished after production, through an external ‘danger’ signal. Thus the production of autoreactive antibodies (which react to self) is allowed.
• If an (e.g. autoreactive) antibody matches a stimulus in the absence of danger, it is removed. Thus harmless antigens are tolerated, and a changing self is accommodated.
Matzinger (2002). The Danger Model: A renewed sense of self, Science 296: 301-304.

Danger Theory (cont.)
• Danger Theory
  – Not self/non-self but danger/non-danger
  – The immune response is initiated in the tissues: the danger zone
  – This makes it context dependent
• Matzinger (2002) The Danger Model: A renewed sense of self, Science 296: 301-304
• Aickelin & Cayzer (2002) The Danger Theory and Its Application to Artificial Immune Systems, Proc. International Conference on AIS (ICARIS 2002)

Danger Zone
(Figure: a damaged cell emits a danger signal that defines a danger zone around it. An antibody that matches an antigen inside the danger zone is stimulated; an antibody that matches an antigen too far away, or does not match at all, is not.)
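The three cases in the danger-zone figure can be sketched as a single rule, assuming a spatial danger zone of fixed radius around each danger signal. The positions, the radius, and the function name are hypothetical.

```python
import math

def stimulated(matches: bool, antigen_pos, danger_signals, radius: float) -> bool:
    """Danger-model sketch: a receptor match only stimulates the antibody if the
    antigen lies inside the danger zone around at least one danger signal."""
    in_zone = any(math.dist(antigen_pos, s) <= radius for s in danger_signals)
    return matches and in_zone

signals = [(1.0, 0.0)]                              # hypothetical damaged-cell location
print(stimulated(True, (0.0, 0.0), signals, 2.0))   # True: match inside the zone
print(stimulated(True, (5.0, 0.0), signals, 2.0))   # False: match, but too far away
print(stimulated(False, (0.0, 0.0), signals, 2.0))  # False: no match
```

Replacing the spatial distance with a time difference would give the temporal version of the danger zone.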

Towards a ‘dangerous’ IDS
“The danger theory suggests that the immune system reacts to threats based on the correlation of various (danger) signals, providing a method of ‘grounding’ the immune response, i.e. linking it directly to the attacker.”
Aickelin U, Bentley P, Cayzer S, Kim J and McLeod J (2003): ‘Danger Theory: The Link between AIS and IDS?’, Proceedings ICARIS-2003, 2nd International Conference on Artificial Immune Systems, LNCS 2787, pp. 147-155

Other ways of using danger
• Danger = crime, antigen = suspect, or...
• Danger = context?
• It could also be useful for data mining, where the ‘danger’ signal is a proxy measure of interest.
• The ‘danger zone’ can be spatial or temporal.
Andrew Secker, Alex Freitas, and Jon Timmis (2005), “Towards a danger theory inspired artificial immune system for web mining,” in A. Scime, editor, Web Mining: Applications and Techniques, pages 145-168 (Idea Group)

Some Recent Applications of Danger Theory
• Anjum Iqbal, Mohd Aizaini Maarof, “Danger Theory and Intelligent Data Processing,” International Journal of Information Technology, Vol. 1, No. 1, 2004.
• Andrew Secker, Alex A. Freitas, and Jon Timmis, “A Danger Theory Inspired Approach to Web Mining,” Computing Lab., University of Kent, Canterbury, Kent, UK, 2005.
• And so on.

The Future
• More formal approach required?
• Wide possible application domains.
• What makes the immune system unique?
• More work with immunologists:
  – Danger theory.
  – Idiotypic networks.
  – Self-assertion.

References for further reading
Books
• Artificial Immune Systems and Their Applications by Dipankar Dasgupta (Editor), Springer Verlag, January 1999.
• L.N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer, 2002.
• A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.
Related academic papers
• J. Timmis, P. Bentley, and E. Hart (Eds.): Artificial Immune Systems, Proceedings of the Second International Conference, ICARIS 2003, Edinburgh, UK, September 2003. LNCS 2787, Springer.

New Events:
• Special Session on Artificial Immune Systems at the Congress on Evolutionary Computation (CEC), December 8-12, 2003, Canberra, Australia.
• Special Session on Immunity-Based Systems at the Seventh International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES), September 3-5, 2003, University of Oxford, UK.
• Second International Conference on Artificial Immune Systems (ICARIS), September 1-3, 2003, Napier University, Edinburgh, UK.
• Tutorial on Artificial Immune Systems at the 1st Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA), 12 August 2003, The University of Nottingham, UK.
• Tutorial on Immunological Computation at the International Joint Conference on Artificial Intelligence (IJCAI), August 10, 2003, Acapulco, Mexico.
• Special Track on Artificial Immune Systems at the Genetic and Evolutionary Computation Conference (GECCO), Chicago, USA, July 12-16, 2003.

AIS Resources
• Artificial Immune Systems and Their Applications by D. Dasgupta (Editor), Springer Verlag, 1999.
• Artificial Immune Systems: A New Computational Intelligence Approach by L. de Castro and J. Timmis, Springer Verlag, 2002.
• Immunocomputing: Principles and Applications by A. Tarakanov et al., Springer Verlag, 2003.
• Third International Conference on Artificial Immune Systems (ICARIS), September 13-16, 2004, University of Catania, Italy.
• 4th International Conference on Artificial Immune Systems (ICARIS), 14-17 August 2005, Banff, Alberta, Canada.

That’s all


Case Study 1:
Malicious Executables Detection based on Artificial Immune Principles*
From Z.H. Guo, Z.K. Liu, and Y. Tan, “An NN-based Malicious Executables Detection Algorithm based on Immune Principles,” F. Yin, J. Wang, C. Guo (Eds.): ISNN 2004, Springer Lecture Notes in Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)
* This work was supported by the Natural Science Foundation of China under Grant No. 60273100.

Outline
• Definition of Terms
• Goal and Motivation
• Previous Research Works
• Immune Principles for Malicious Executable Detection
• Malicious Executable Detection Algorithm
• Experiments and Discussion
• Concluding Remarks

Definition of Terms
• Malicious executable: generally defined as a program that has some malicious function, such as compromising a system's security, damaging a system, or obtaining sensitive information without the user's permission. Examples include viruses, Trojan horses, and worms.
• Benign executable: a normal program without any malicious function.

• Tens of thousands of new viruses appear every year, targeting computers and information systems: DOS/Win32 viruses, Trojan horses, worms, and e-mail-attached viruses (malicious executables).
• But current antivirus systems attempt to detect these new malicious programs with heuristics crafted by hand, which is costly and ineffective.
• Current task: devise new methods for detecting new malicious executables (ME).

Definition of Symbols and Structures
• B: binary code alphabet, B = {0, 1}.
• Seq(s, k, l): short-sequence cutting operation. If s is a binary sequence s = b(0)b(1)…b(n-1), b(i) ∈ B, then Seq(s, k, l) = b(k)b(k+1)…b(k+l-1).
• E(k): executable set, k ∈ {m, b}, where m denotes malicious executables and b benign executables.
• E: whole set of executables, i.e., E = E(m) ∪ E(b).
• e(f_j, n): executable as a binary sequence of length n, where f_j is the executable identifier.
• l_d: detector code length.
• l_step: step size of detector generation.
• d_l: detector, d_l = Seq(s, k, l).
• D_l: set of detectors with code length l, i.e., D_l = {d_l(0), d_l(1), …, d_l(n_d-1)}, |D_l| = n_d.
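The cutting operation and a detector set D_l can be sketched as below. The sketch assumes detectors are cut from benign executables with a sliding window of length l_d and step l_step, as the detector-generation slide suggests; the helper names `seq` and `detector_set` are hypothetical, and the paper's exact procedure may differ.

```python
def seq(s: str, k: int, l: int) -> str:
    """Seq(s, k, l) = b(k) b(k+1) ... b(k+l-1): cut l bits of s starting at position k."""
    if k < 0 or l < 0 or k + l > len(s):
        raise ValueError("Seq(s, k, l) runs outside the sequence")
    return s[k:k + l]

def detector_set(benign: list[str], l_d: int, l_step: int) -> set[str]:
    """Build D_l from benign executables: slide a window of length l_d with step l_step."""
    detectors: set[str] = set()
    for e in benign:
        for k in range(0, len(e) - l_d + 1, l_step):
            detectors.add(seq(e, k, l_d))
    return detectors

print(seq("0110100", 1, 3))                                  # 110
print(sorted(detector_set(["00011011"], l_d=4, l_step=2)))   # ['0001', '0110', '1011']
```

Using a set means repeated code fragments yield one detector, so |D_l| = n_d counts distinct detectors.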

Goal and Motivation
• To develop an automatic approach for detecting new malicious executables.
• To use artificial immune systems (AIS) and artificial neural networks (ANN) to detect malicious executables with a higher Detection Rate (DR) and a lower False Positive Rate (FPR) than other methods.

Previous Related Works
• Signature-based Methods
• Expert Knowledge-based Methods
• Machine Learning Methods

Signature-based Methods
These methods create a unique tag for each malicious program so that future examples of it can be correctly classified with a small error rate; they rely on the signatures of known malicious executables to generate detection models.
Drawbacks:
• They cannot detect unknown and mutated viruses.
• As the number and types of viruses increase, detection slows down dramatically. At the same time, analysing virus signatures becomes very difficult, in particular for encrypted signatures.
(Refer to IBM Anti-virus Group's report: R.W. Lo, K.N. Levitt, and R.A. Olsson. MCF: a Malicious Code Filter. Computers & Security, 14(6):541-566, 1995.)

Expert Knowledge-based Methods
These methods use the knowledge of a group of virus experts to construct heuristic classifiers for detecting unknown viruses.
Drawbacks:
• The analysis is time-consuming.
• They discover only some unknown viruses, and their false detection rate is very high.
For detecting unknown viruses with an ANN, the IBM Anti-virus Group also proposed a method, but it detects boot sector viruses only. (Refer to W. Arnold and G. Tesauro. Automatically Generated Win32 Heuristic Virus Detection. Proceedings of the 2000 International Virus Bulletin Conference, 2000.)

Machine Learning Methods
• M.G. Schultz developed a framework that used data mining algorithms, namely a Multi-Naïve Bayes method, to train multiple classifiers on a set of malicious and benign executables in order to detect new examples (unknown ME). (Refer to M.G. Schultz, E. Eskin, and E. Zadok. Data Mining Methods for Detection of New Malicious Executables. IEEE Symposium on Security and Privacy, May 2001.)

Biologically-motivated Information Processing Systems
• Brain-nervous systems – Neural Networks (NN)
• Genetic systems – Genetic Algorithms (GA)
• Immune systems – Artificial Immune Systems (AIS), or immunological computation
NN and GA have been extensively studied and widely applied, but AIS has relatively few applications.

Natural prototypes vs. their models

| Natural prototype | Biological level | Computing model |
|---|---|---|
| Natural language | Left hemisphere of brain | Formal logic, formal linguistics |
| Brain nervous net | Cells | Artificial neural networks (ANN) |
| Biological cells | Cells | Cellular automata (CA) |
| Molecules of proteins | Molecular | Artificial immune systems (AIS) |
| Genetic code | Molecular | Genetic algorithms (GA) |

Comparison of Three Algorithms

| | GA (Optimisation) | NN (Classification) | AIS |
|---|---|---|---|
| Components | Chromosome strings | Artificial neurons | Attribute strings |
| Location of components | Dynamic | Pre-defined | Dynamic |
| Structure | Discrete components | Networked components | Discrete components / networked components |
| Knowledge storage | Chromosome strings | Connection strengths | Component concentration / network connections |
| Dynamics | Evolution | Learning | Evolution / learning |
| Meta-dynamics | Recruitment / elimination of components | Construction / pruning of connections | Recruitment / elimination of components |
| Interaction between components | Crossover | Network connections | Recognition / network connections |
| Interaction with environment | Fitness function | External stimuli | Recognition / objective function |

Immune Principles for Malicious Executable Detection
• Non-self Detection Principle
• Anomaly Detection Based on Thickness
• The Diversity of Detector Representation vs. Anomaly Detection Hole

Non-self Detection Principle
• In the natural immune system, all cells of the body are categorized as either self or non-self, and the immune process detects the non-self cells.
• To realize non-self detection, T lymphocytes undergo two selection stages during maturation, Positive Selection and Negative Selection, since antigenic encounters may result in cell death. Inspired by these two stages, computer scientists have proposed several algorithms for detecting anomalous information. Here, we use the Positive Selection Algorithm (PSA) to perform non-self detection for recognizing malicious executables.

Non-self Detection by PSA
(Figure: process of anomaly detection with PSA. A short sequence to be detected, of length l, is matched against the detector set D_l: if it matches (Y), it is self; if not (N), it is non-self.)
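The decision in the PSA figure can be sketched in a few lines. Exact string matching against D_l is an assumption made for illustration (the paper may use a partial-matching rule), and the detector set and function name are hypothetical.

```python
def psa_classify(x: str, detectors: set[str]) -> str:
    """PSA sketch: detectors describe self (benign code), so a match means 'self'."""
    return "self" if x in detectors else "non-self"

d4 = {"0001", "0110", "1011"}    # hypothetical detector set D_4
print(psa_classify("0110", d4))  # self
print(psa_classify("1111", d4))  # non-self
```

This inverts negative selection: there, detectors describe non-self and a match raises the alarm; here, a failure to match is what signals an anomaly.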

Anomaly Detection Based on Thickness
• Anomaly recognition is the process in which immune cells detect antigens and become activated.
• The activation threshold of immune cells is determined by the thickness (concentration) of the immune cells matching the antigens.
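One way to read the thickness principle as code is to count the fraction of length-l windows of an executable that match no self detector, and to flag ("activate on") the executable when that fraction exceeds a threshold. This is a sketch under assumptions: exact matching, a sliding step of one bit, a hypothetical threshold, and hypothetical function names; the paper's exact thickness measure is not given on this slide.

```python
def nonself_thickness(executable: str, detectors: set[str], l: int) -> float:
    """Fraction of length-l windows (step 1) of the executable that match no detector."""
    windows = [executable[k:k + l] for k in range(len(executable) - l + 1)]
    if not windows:
        return 0.0
    return sum(w not in detectors for w in windows) / len(windows)

def is_activated(executable: str, detectors: set[str], l: int, threshold: float) -> bool:
    """Flag the executable as anomalous when its non-self thickness exceeds the threshold."""
    return nonself_thickness(executable, detectors, l) > threshold

d2 = {"00", "01"}                                   # hypothetical detector set D_2
print(nonself_thickness("0011", d2, 2))             # windows 00, 01, 11: 1 of 3 is non-self
print(is_activated("0011", d2, 2, threshold=0.2))   # True
```

A threshold above zero tolerates a few unmatched fragments, much as a single antigen contact does not by itself activate an immune cell.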

The Diversity of Detector Representation vs. Anomaly Detection Hole
• The main difficulty in anomaly detection is minimizing the anomaly detection holes. The natural immune system solves this problem well through the diversity of MHC (Major Histocompatibility Complex) representations, which determines the diversity of the antigen fragments presented to the receptors on the surface of T cells. This property greatly increases the power to detect mutated antigens and decreases the anomaly detection holes.
• Following this principle, we can use a diversity of detector representations to decrease the anomaly detection holes, as illustrated in the following schematic diagrams.

Schematic diagram of abnormal detection holes
(Figure: self space and non-self space; detectors cover parts of the non-self space, and the uncovered regions of non-self space are abnormal detection holes.)

Reduction of abnormal detection holes by use of the diversity of detector representations
(Figure: detector representations 1, 2, and 3 each cover the non-self space differently; their combination leaves fewer holes.)
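A sketch of how several detector representations might be combined: build one detector set per code length l and compute one non-self thickness per representation, giving a feature vector that a classifier (a neural network in the paper) could consume. The lengths, helper names, and exact-match rule are assumptions, not the paper's settings.

```python
def window_set(sequences: list[str], l: int) -> set[str]:
    """All length-l windows (step 1) of the given sequences: one detector set D_l."""
    return {s[k:k + l] for s in sequences for k in range(len(s) - l + 1)}

def feature_vector(executable: str, benign: list[str], lengths: list[int]) -> list[float]:
    """One non-self thickness per detector representation (one per code length l)."""
    features = []
    for l in lengths:
        detectors = window_set(benign, l)
        windows = [executable[k:k + l] for k in range(len(executable) - l + 1)]
        features.append(sum(w not in detectors for w in windows) / max(1, len(windows)))
    return features

benign = ["00000000"]                                      # hypothetical benign code
print(feature_vector("00001111", benign, lengths=[1, 2]))  # [0.5, 4/7 ≈ 0.571]
```

Each length sees the same executable differently, so an anomaly that slips through one representation's holes may still raise the thickness seen by another.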

Malicious Executable Detection Algorithm (MEDA)
MEDA based on AIS includes three parts:
• Detector generation
• Anomaly information extraction
• Classification

Flow Chart of Malicious Executable Detection Algorithm (MEDA)
(Flow chart: genes (…01101001…) are used to generate the detector set, which can be updated with new genes (…10101101…); anomaly properties are extracted from the executable to be detected (…00111101…) using the detector set and fed to the classifier, which produces the output.)

Generation of Detector Set
Detector generation algorithm:
• Begin: initialize l_step, l_d, k = 0
• Do: cut e(f_k, n) from E_g(b)
• i = 0;
• While i … ltrain

(Table fragment: searching P(F_i/C); computing the joint probabilities $P(C)\prod_{i=1}^{n} P(F_i/C)$ depends on l_f float multiplications; store space: 0.4 GB vs. 1 GB.)

Remarks
• For short binary sequences and a single detector set, D24 gives the best performance for detecting malicious executables, with a DR of 80.6% at an FPR of 3%.
• With longer detector code lengths and multiple detector sets, our method obtains the best performance, a DR of 97.46% at an FPR of 2%, outperforming current methods.
• These results verify that
  – the diversity of detector representations can decrease anomaly detection holes;
  – “non-self” thickness detection is effective.

Case Study 2:
Film Recommender
From Dr. Uwe Aickelin (http://www.aickelin.com), University of Nottingham, U.K.
• Prediction:
  – What rating would I give a specific film?
• Recommendation:
  – Give me a ‘top 10’ list of films I might like.

Film Recommender (cont. 1)
• EachMovie database (70k users).
• User profile: set of tuples {movie, rating}.
• Me: my user profile.
• Neighbour: user profile of others.
• Similarity metric: correlation score.
• Neighbourhood: group of similar users.
• Recommendations: from the neighbourhood.
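The correlation score can be sketched as Pearson's correlation computed over the movies both users have rated. Computing the means over co-rated movies only, and returning 0 when there is too little overlap or no variance, are assumptions; the ratings and the function name are hypothetical.

```python
import math

def correlation(p: dict[str, float], q: dict[str, float]) -> float:
    """Pearson correlation of two user profiles {movie: rating} over co-rated movies."""
    common = [m for m in p if m in q]
    if len(common) < 2:
        return 0.0                      # too little overlap to correlate
    xs = [p[m] for m in common]
    ys = [q[m] for m in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0                      # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)

me = {"m1": 5, "m2": 3, "m4": 1}                  # hypothetical ratings
neighbour = {"m1": 4, "m2": 3, "m3": 2, "m4": 2}
print(round(correlation(me, neighbour), 3))       # 1.0
```

The neighbour's rating of m3 is ignored here because "me" has not rated it; that missing vote is exactly what the recommender later tries to predict.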

Film Recommender (cont. 2)
(Figure: antibody–antigen binding gives stimulation; antibody–antibody binding gives suppression.)
• User profile: set of tuples {movie, rating}.
• Me: my user profile (the antigen).
• Neighbour: user profile of others (an antibody).
• Affinity metric: correlation score.
• Neighbourhood: group of similar users; in AIS terms, a group of antibodies similar to the antigen and dissimilar to other antibodies.
• Recommendations: from the neighbourhood, as a weighted score based on similarities.

Film Recommender (cont. 3)
• Start with an empty AIS.
• Encode the target user as an antigen Ag.
• WHILE (AIS not full) && (more users):
  – Add next user as antibody Ab.
  – IF (AIS at full size) iterate AIS.
• Generate recommendations from the AIS.

Film Recommender (cont. 4)
Suppose we have 5 users and 4 movies:
  – u1 = {(m1, v11), (m2, v12), (m3, v13)}
  – u2 = {(m1, v21), (m2, v22), (m3, v23), (m4, v24)}
  – u3 = {(m1, v31), (m2, v32), (m4, v34)}
  – u4 = {(m1, v41), (m4, v44)}
  – u5 = {(m1, v51), (m2, v52), (m3, v53), (m4, v54)}
• We do not have users' votes for every film.
• We want to predict the vote of user u4 on movie m3.

Algorithm walkthrough (1)
• Start with an empty AIS; the database holds u1, u2, u3, u4, u5.
• The user for whom to predict (u4) becomes the antigen Ag in the AIS; the database now holds u1, u2, u3, u5.

Algorithm walkthrough (2)
• Add antibodies until the AIS is full: u1 is added as Ab1 (the database then holds u2, u3, u5), and so on, until the AIS holds Ag, Ab1, Ab2, and Ab3.

Algorithm walkthrough (3)
• Table of correlations between Ab and Ag:
  – MS14, MS24, MS34.
• Table of correlations between antibodies:
  – MS12 = CorrelCoef(Ab1, Ab2)
  – MS13 = CorrelCoef(Ab1, Ab3)
  – MS23 = CorrelCoef(Ab2, Ab3)

Algorithm walkthrough (4)
• Calculate the concentration of each Ab:
  – Interaction with Ag (stimulation).
  – Interaction with other Ab (suppression).
(Figure: after iteration, the AIS holds many copies of Ab2, i.e., Ab2 has the highest concentration, alongside fewer copies of Ab1.)
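The concentration calculation can be sketched as a simplified update in the spirit of the idiotypic network equation of Farmer et al., which Aickelin and colleagues adapted for this recommender: stimulation grows with the Ab–Ag correlation, suppression grows with Ab–Ab correlations, and every antibody decays at a fixed rate. The exact form, the rate constants k1, k2, k3, and the example correlations are assumptions.

```python
def update_concentrations(x, m_ag, m_ab, y=1.0, k1=0.3, k2=0.2, k3=0.1, dt=1.0):
    """One Euler step of
        dx_i/dt = k1*m_i*x_i*y - (k2/n) * sum_{j != i} m_ij*x_i*x_j - k3*x_i
    x:    antibody concentrations
    m_ag: antibody-antigen correlations (stimulation), e.g. MS14, MS24, MS34
    m_ab: symmetric antibody-antibody correlations (suppression), e.g. MS12, MS13, MS23
    y:    antigen concentration (held constant here)
    """
    n = len(x)
    new_x = []
    for i in range(n):
        stimulation = k1 * m_ag[i] * x[i] * y
        suppression = (k2 / n) * sum(m_ab[i][j] * x[i] * x[j] for j in range(n) if j != i)
        death = k3 * x[i]
        new_x.append(max(0.0, x[i] + dt * (stimulation - suppression - death)))
    return new_x

m_ag = [0.2, 0.9, 0.4]                  # hypothetical Ab-Ag correlations
m_ab = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.1],
        [0.3, 0.1, 1.0]]                # hypothetical Ab-Ab correlations
x = [1.0, 1.0, 1.0]
for _ in range(10):
    x = update_concentrations(x, m_ag, m_ab)
print([round(v, 3) for v in x])         # Ab2 (index 1) ends up with the highest concentration
```

With these hypothetical correlations, Ab2 matches the antigen best and is only weakly suppressed, so its concentration grows while the others decay, mirroring the walkthrough figure.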

Algorithm walkthrough (5)
• Generate the recommendation based on antibody concentration.
(Figure: the AIS after iteration, dominated by copies of Ab2.)
• The recommendation for user u4 on movie m3 will be based largely on user u2's vote on m3.
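A sketch of the final prediction step, assuming a mean-centred weighted average (a standard collaborative-filtering formula) in which each antibody's weight is its correlation with the antigen multiplied by its concentration. This weighting is an assumption, not necessarily the exact scheme used in the original recommender, and all ratings, correlations, and concentrations below are hypothetical.

```python
def predict_rating(target, antibodies, movie):
    """Predict the target's rating of `movie` from the AIS antibodies.

    target:     {movie: rating} profile of the antigen user
    antibodies: list of (profile, correlation, concentration) triples
    Uses a mean-centred weighted average with weight = correlation * concentration.
    """
    mean_t = sum(target.values()) / len(target)
    num = den = 0.0
    for profile, corr, conc in antibodies:
        if movie not in profile:
            continue                            # this antibody has no vote on the movie
        mean_p = sum(profile.values()) / len(profile)
        w = corr * conc
        num += w * (profile[movie] - mean_p)
        den += abs(w)
    return mean_t if den == 0 else mean_t + num / den

# hypothetical ratings for the walkthrough's users: predict u4 on m3
u4 = {"m1": 4, "m4": 2}
u1 = {"m1": 2, "m2": 3, "m3": 1}
u2 = {"m1": 4, "m2": 3, "m3": 5, "m4": 2}
u3 = {"m1": 3, "m2": 2, "m4": 4}
ais = [(u1, 0.1, 0.5), (u2, 0.9, 2.0), (u3, 0.4, 0.8)]   # (profile, correlation, concentration)
print(round(predict_rating(u4, ais, "m3"), 2))           # dominated by u2's vote on m3
```

Mean-centring lets users who rate systematically high or low contribute on the same scale; u3 is skipped because it has no vote on m3.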

Film Recommender Results
• Tested against a standard method (Pearson k-nearest neighbours).
• Prediction:
  – Results of the same quality.
• Recommendation:
  – 4 out of 5 films correct (AIS).
  – 3 out of 5 films correct (Pearson).