A distributed security management system based on mobile agents

Author: Regina Sutton
Diploma Thesis

A distributed security management system based on mobile agents

carried out at the Institute of Information Systems, Distributed Systems Group, Technical University of Vienna

under the supervision of o.Univ.-Prof. Dipl.-Ing. Dr. techn. Mehdi Jazayeri and Univ.-Ass. Dipl.-Ing. Clemens Kerer as supervising assistant

by Admir Kulin, Neilreichgasse 72/9, 1100 Wien, Matriculation No. 9427427

Vienna, April 2001

_______________________

Kurzfassung (Summary)

Most of the security requirements for individual computers can be met by existing security software tools. Nevertheless, there are practically no software tools that make it possible to enforce all security rules (the security policy) in a heterogeneous, distributed environment. Such tools should ensure that the security policy is enforced consistently throughout the system, and should continuously monitor the whole system for possible security holes and inconsistencies, especially when resources are added or removed. For this reason, a mobile agent system was designed that monitors the enforcement of security policies and performs intrusion detection in a heterogeneous computer network. This thesis gives a detailed description of the most important parts, the technology used, and the architecture and design of the complete system. The work is part of, and sponsored by, the European Union's SPARTA project (http://www.infosys.tuwien.ac.at/sparta). To fulfil its security tasks, the system uses mobile agents, a promising technology for implementing software tools that operate on distributed information. Different types of mobile agents carry out different kinds of security management, intrusion detection and response tasks; in addition to mobility, the intelligence of the system is embedded in the agents, which analyze information from different machines. Mobile agents are a powerful tool, but their practical use has so far been hindered by security concerns. Therefore, the protection of machines, agents and users in this system is based on cryptographic mechanisms and different trust models that use public key certificates (X.509) and attribute certificates. Both the transfer of agents and the communication between the user interface and the system are secured in this way. Each agent roams the network as a mobile sensor and performs specific tasks or tests. When the tests indicate a possible intrusion and the suspicion level has risen high enough, the actual alarm is raised. As a proof of concept, the Web Agent, a Java-based, task-specific agent, has been implemented and is described in detail; it primarily looks for intruders from "not-trusted" IP addresses. This Web example is programmed to check all web servers in a network and their log files on the various machines.


Abstract

Most of the security requirements for a single computer can be satisfied by using existing security software tools. However, there are practically no software tools enabling security policy management in a heterogeneous distributed environment. Such tools should also ensure that the security policy is consistent in the whole system, and should constantly check for possible holes and inconsistencies, especially when resources are added to or removed from the system. The mobile agent system presented here was therefore designed to monitor the implementation of security policies and to perform intrusion detection in a heterogeneous computing network. This thesis provides a detailed description of the major parts, the technology used, and the architecture and design of the complete system. The work is part of, and sponsored by, the European Union's SPARTA project (http://www.infosys.tuwien.ac.at/sparta). To perform its security tasks, the system uses mobile agents, a promising technology for implementing tools that operate on distributed data sources. Different types of mobile agents fulfill different types of security management, intrusion/misuse detection and response tasks; in addition to mobility, the intelligence of the system is embedded in the agents, which analyze data from different hosts. While mobile agents are a powerful tool, their practical use has been hindered by security considerations. Thus, the protection of hosts, agents and users in this system is based on cryptographic mechanisms and different trust models using public key certificates (X.509) and attribute certificates. The transfer of agents as well as the communication between the graphical user interfaces and the system is secured in this way. Each agent roams the network like a mobile sensor and performs specific tasks or tests. When the tests indicate the possibility of an intrusion and the suspicion level has been raised high enough, the actual alarm is given. As a proof of concept, the Web Agent, a new Java-based, task-specific mobile agent, is presented and described in detail; this agent predominantly looks for intruders from "not-trusted" IP addresses. This use case is implemented to check all web servers in a network and their log files on the various computers.


Acknowledgements

The experience gained from research in the SPARTA European Union project (IST - 12637) at the Distributed Systems Group of the Information Systems Institute at the Technical University of Vienna was the most valuable source for writing this thesis. I owe special thanks to my advisors Prof. Mehdi Jazayeri and DI Clemens Kerer, who supported me and gave me valuable advice. I also want to thank all my colleagues at the Distributed Systems Group and the whole SPARTA team for their support and the fruitful discussions during the preparation of this work. Most of all I would like to thank my family. My parents provided a perfect environment for my studies and supported me throughout all the years. Finally, I want to thank all my friends for their patience and moral support during the stressful but beautiful time of my studies.


Contents

1 INTRODUCTION
  1.1 Problem Description
  1.2 Goals
  1.3 Structure of the Thesis

2 SECURITY PRIMER
  2.1 Intrusion Detection (ID)
    2.1.1 Intrusion Detection Systems (IDS)
    2.1.2 Attacks
    2.1.3 Intrusion Response Systems (IRSs)
  2.2 Mobile Agents (MAs)
    2.2.1 MA Advantages
    2.2.2 MA Disadvantages
  2.3 Relationship between Mobile Agents and Intrusion Detection
    2.3.1 Selection of mobile agent platform
    2.3.2 Mobile Agent and Intrusion Detection Conclusions

3 FUNCTIONAL SYSTEM COMPONENTS
  3.1 System Introduction
  3.2 User Side Components
    3.2.1 Security Policy Elements
  3.3 Mobile Agent Platform: Gypsy
    3.3.1 Gypsy Mobility
    3.3.2 Gypsy Security
  3.4 Secure Infrastructure
    3.4.1 LRA Description
    3.4.2 CA Description

4 SYSTEM ARCHITECTURE
  4.1 Agents (A)
    4.1.1 Design of SG Agents
  4.2 Agent Server (AS)
    4.2.1 Agent Security Manager (ASM)
    4.2.2 Communicator
    4.2.3 Place
  4.3 Home Server (HS)
  4.4 Secure Information Space (SIS)
  4.5 Security Model
    4.5.1 The Structure of Permissions
    4.5.2 System security protocols
  4.6 Data Manager (DM)
  4.7 Data Analyzer Module (DAM)
  4.8 Security Policy Editor (SPE)
    4.8.1 Rules Database (RDB)
  4.9 Overall Design of SG Agent Platform
    4.9.1 Security of SG Agent Platform
    4.9.2 Secure Logging on to the System
    4.9.3 Security Policy Language (SPL)
    4.9.4 Generation and Execution of Agents
  4.10 Graphical User Interface (GUI)

5 IMPLEMENTATION OF THE WEB AGENT
  5.1 How to use a Web Agent?
  5.2 Description of Web Agent implementation

6 CONCLUSIONS

BIBLIOGRAPHY


List of Figures

2.1: Mobile code design paradigms
2.2: A simple view of the structure models of mobile agent systems
2.3: A mobile agent's life cycle
3.1: Simplified system functionality
3.2: System Use Case - Surveillance
3.3: System Use Case - Intrusion Detection
3.4: The Security Policy Elements
3.5: Global secure infrastructure
4.1: System Main Components Interaction
4.2: Agent's Hop
4.3: System with two agents and an intruder
4.4: Overall System Architecture
4.5: Main types of agents
4.6: SG Agents Design as UML Class Diagram
4.7: Agent Server (AS)
4.8: Communicators Design as UML Class Diagram
4.9: Places Design as UML Class Diagram
4.10: Home Server (HS)
4.11: Overview of the system security as UML Class Diagram
4.12: Calling mechanisms related to permissions
4.13: The general syntax for agent permissions
4.14: An example of the configuration file with agent permissions
4.15: Design of Data Manager as UML Class Diagram
4.16: SPE Subsystems as UML Use Case Diagram
4.17: Data Analysis-SPE Subsystem as UML Class Diagram
4.18: Data Manager-SPE Subsystem as UML Class Diagram
4.19: Rules Editor-SPE Subsystem as UML Class Diagram
4.20: Example Organization of Rules Database (RDB)
4.21: Design of SG Agent Platform as UML Class Diagram
4.22: Agent reception as UML Sequence Diagram
4.23: Agent's function call as UML Sequence Diagram
4.24: Agent sending as UML Sequence Diagram
4.25: All SIS Classes as UML Class Diagram
4.26: MainGUI Page
4.27: AdvancedGUI Page - Rules Editor
4.28: AdvancedGUI Page - New Resource
5.1: Launching Web Agent
5.2: Input of Web placename
5.3: Input of Web Security file location
5.4: Wrong Web Security file or location message
5.5: Web Agent results


1 Introduction

Networking and the Internet are essential parts of today's everyday life. The local computer networks of almost all companies, universities and other organizations are connected to a powerful and enormously popular distributed system called the Internet. Of course, such a network is exposed to many kinds of misuse and attack, e.g., the attacks against Yahoo and Microsoft in 2000, as well as many other attacks that remain unknown to this day. From year to year, even from hour to hour, the Internet is becoming bigger and more complex and, as a matter of course, attacks are changing accordingly. In distributed systems with many different certified entities, manual detection of misuse is too slow and, since the systems are complex, there may not even be anyone with the knowledge necessary to detect misuse unless it is carried out in a very obvious fashion. Consequently, the detection of misuse must be automated or outsourced to somebody capable of detecting it. However, two problems arise:

- New attacks appear more rapidly than any system can be updated to deal with them.
- Having an expert in intrusion detection permanently stationed at a location is very expensive and unaffordable for most of the locations in question.

In this thesis I develop a prototype that addresses these problems using Mobile Agents (MAs). In our case, MAs are computer programs that can travel through the network autonomously and play the role of appropriate security software. It is self-evident that such a system of MAs must be at least as well protected as the system it protects. Thus, MAs that monitor the security of a system must be certified. Consequently, a Certification Authority (CA) is used as a high-security facility that guarantees the identity of certified entities and manages all certificates and keys. The overall goal of our mobile agent system is to monitor the implementation of security policies, identify security problems and perform intrusion detection in a flexible way.

1.1 Problem Description

Scenario

Imagine a company or university with 10, 100 or more computers. The local network connecting these computers should be secure and protected from all attackers. Each organization therefore needs security rules, for example that only three failed remote login attempts are allowed in some part of the network during a given period of time. Of course, a network administrator (the user) in a company wants to check all network activity or, in other words, all security policy rules in his network. All of this information can be found in log files that are kept at each server, either by the system's security infrastructure (e.g., telnet log files) or by our MA system. Each security event is recorded in these files. Rather than performing the search himself, the user launches his mobile agent to do the work for him and to report the results when they are available. The agent is launched from the user's PC, which runs the graphical user interface. The agent may be specialized by the user for this particular task, or it may take default values from the user's predefined profile. Once the agent is launched, the user may even disconnect his PC, because each agent has a place to which it can return and wait for the user to reconnect. Depending on the notification strategy, the user is informed via email or SMS message that the agent has finished its task. Finally, the graphical user interface or the user's favorite HTML browser interprets and displays the result on his PC.

Problem

With the increasing complexity of distributed systems it has become extremely difficult to define a flexible security policy that can accommodate a constantly growing set of possible threats as well as a frequently changing network environment. The most important difficulties in enforcing a security policy in an enterprise are the following:

- How to detect intrusion in an internal network?
- How to detect misuse of access and other permissions?
- How to know whether employees, business partners or customers are always using correctly defined standards, protocols, etc. for their interactions with the enterprise?
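The example rule from the scenario (only three failed remote logins are allowed within a given period) can be made concrete with a small sliding-window check. The following Java sketch is purely illustrative: the class name FailedLoginRule, its method and its parameters are hypothetical and are not part of the system described in this thesis. It also assumes that events for a host arrive in non-decreasing time order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative rule check; all names here are hypothetical, not taken from the thesis. */
class FailedLoginRule {
    private final int maxFailures;    // number of failed attempts that is still tolerated
    private final long windowMillis;  // length of the observation period in milliseconds
    private final Map<String, Deque<Long>> failuresByHost = new HashMap<>();

    FailedLoginRule(int maxFailures, long windowMillis) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
    }

    /**
     * Records one failed login for the given host and reports whether the rule
     * is now violated, i.e. whether more than maxFailures failures lie inside
     * the observation window. Timestamps are assumed to be non-decreasing per host.
     */
    boolean recordFailure(String host, long timestampMillis) {
        Deque<Long> failures = failuresByHost.computeIfAbsent(host, h -> new ArrayDeque<>());
        failures.addLast(timestampMillis);
        // Forget failures that have dropped out of the observation window.
        while (!failures.isEmpty() && timestampMillis - failures.peekFirst() >= windowMillis) {
            failures.removeFirst();
        }
        return failures.size() > maxFailures;
    }
}
```

With new FailedLoginRule(3, 60_000L), the first three failures reported for a host within one minute are tolerated and the fourth makes recordFailure return true. Keeping only the timestamps inside the window bounds the memory needed per host, which matters when an agent scans long log files.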





There are practically no software tools that enable security policy management in a heterogeneous distributed environment and ensure that the security policy is consistent throughout the whole system.

Attacks from the outside are an increasing problem. The Internet makes it possible for anybody to attack anybody else at any time, independent of geographical distance and of prior relations between attacker and victim. The publicly available tools for intrusion on the Internet are superior to the existing tools for intrusion prevention and detection, at least to those that do not seriously diminish a computerized organization's ability to work. For example, not connecting a computer to any network fully protects it from network attacks, but in many situations also reduces its labor-saving ability to close to nothing.

Attacks from inside organizations and misuse of privileges have a longer history and have traditionally been the main source of losses due to electronic crime. There is no particular reason to believe that this sort of attack will either increase or decrease significantly in the future. The tools for preventing and deterring such attacks are, however, not sophisticated enough to deal with the immense increase in the flow of data that comes with increased computing power and communication capabilities.

1.2 Goals

In order to support a high level of security monitoring, a large amount of distributed data has to be collected and analyzed. Commonly, a security administrator collects this data at random through a remote connection and analyzes it visually. This is both inefficient and unreliable because, for example, an administrator can overlook important data and is not available 24 hours a day.


Due to their well-known advantages for adaptable network monitoring [Rfc1757] and administration tasks, mobile agents are a promising technology for implementing tools that operate on distributed data sources. In addition to mobility, intelligence will be embedded in a tool that periodically analyzes data from different hosts. Different types of mobile agents will fulfill different types of security management, intrusion/misuse detection and response tasks. Protection of hosts and agents will be implemented based on cryptographic mechanisms and different trust models using public key certificates (X.509) and attribute certificates. A certification infrastructure will also be an important part of the overall security management system design. The main goals are:

- to define a method for specifying a distributed system's security policy,
- to provide a detailed design-level description of a distributed security management system based on mobile agents following the above description, and
- to develop a prototype of the system.





1.3 Structure of the Thesis

This thesis is structured as follows. Chapter 2 gives a basic introduction to intrusion detection, mobile agents and the security-related relationship between these two concepts. Chapter 3 provides a short high-level overview of the three main parts of the system: the user side components, the mobile agent platform, and the security infrastructure. Chapter 4 presents the design rationale and structure of the system components. Chapter 5 describes in detail the Web Agent, the system's proof of concept, and discusses implementation issues. Chapter 6 summarizes the contribution of this thesis and outlines directions for future research in this field.


2 Security Primer

A well-secured and trusted computer system requires intrusion and misuse prevention as well as high-speed anomaly detection mechanisms that allow the system administrator to reach and maintain a high security level. Because security threats are dynamic and evolve constantly, the security policy has to adapt with the same dynamics. Security tools are available for the simplest and most needed purposes in application security, such as encryption tools, firewalls and anti-virus scanners, but some areas, such as distributed intrusion detection, are hardly covered at all.

System security encompasses all security aspects related to users and their data; its key factors are the access control mechanisms and the protection of the operating system. A basic goal of system security is to provide secure connections between two or more information-sharing entities and to protect the data they exchange. This is called communication security. Communication security may essentially be applied at three different layers of a system or network: the application software layer, the network protocol stack layer and the data link interface layer. Where cryptographic tools are applied depends on needs and costs.

Public-key infrastructure (PKI) [PKI00] is the combination of software, encryption technologies, and services that enables enterprises to secure their communications and business transactions over a network. PKIs integrate digital certificates, public-key cryptography, and certificate authorities into a total, enterprise-wide secure infrastructure.

Digital certificates are electronic files that act like a kind of online passport. They are issued by a trusted third party, a certificate authority (CA), which verifies the identity of the certificate's holder. They are designed to be tamper-proof and resistant to forgery. Digital certificates do two things:

1. They authenticate that their holders (people, web sites, and even network resources such as routers) are truly who or what they claim to be.
2. They protect data exchanged online from tampering.

Public-key cryptography is a technique that enables information sharers to exchange information using a key pair consisting of a private and a public key. The private key is kept secret at all times, while the public key is distributed for use by others. This technique provides a very safe way of securing information sharing, as the holder of the private key need not trust anybody but himself and the Certification Authority (CA), which guarantees that those certified are who they claim to be.

Certificate Authorities (CAs) are the digital world's equivalent of passport offices. They issue digital certificates and validate the holder's identity and authority. CAs embed an individual's or an organization's public key, along with other identifying information, into a digital certificate and then cryptographically "sign" it as a tamper-proof seal, verifying the integrity of the data within it and validating its use.
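The signing step described above can be illustrated with the standard java.security API. The sketch below only shows the underlying primitive, signing some bytes with a private key and checking the signature with the matching public key; it does not build an X.509 certificate. The class name SigningSketch, the choice of RSA with 2048-bit keys and the SHA256withRSA signature algorithm are assumptions made for this example and are not prescribed by the text.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Illustrative sign/verify helper; the class and the algorithm choices are assumptions. */
class SigningSketch {
    private static final String ALGORITHM = "SHA256withRSA"; // assumed for this example

    /** Generates a fresh RSA key pair (private key stays with the signer). */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA key generation failed", e);
        }
    }

    /** Signs the data with the private key, as a CA would sign certificate contents. */
    static byte[] sign(PrivateKey privateKey, byte[] data) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Signing failed", e);
        }
    }

    /** Returns true only if the signature matches the data under the given public key. */
    static boolean verify(PublicKey publicKey, byte[] data, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Verification failed", e);
        }
    }
}
```

Anyone who holds the public key can call verify, but only the holder of the private key can produce a signature that passes. Changing a single byte of the signed data makes verification fail, which is the sense in which the signature acts as a seal.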


The following security techniques are the most widespread in Internet cryptography: point-to-point security including PGP [PGP00], Secure Socket Layer [SSL00], and Virtual Private Networks [VPN00]. Common to all effective solutions based on asymmetric public key techniques is that a PKI must be in place.

Because of their features, Mobile Agents seem to be a promising way to provide system and communication security. They can be developed in and launched from a central, competent location, for example a company specialized in automated intrusion detection. They can then be sent out to various sites, monitor the systems and alert the users and the Certification Authority (CA). In some cases they will also be able to correct faulty actions by other means, for example by installing relevant intrusion software.

The two key concepts in this thesis are Intrusion Detection (ID) and Mobile Agents (MAs). In the following sections, I describe the basic concepts in these two areas and then introduce the security-relevant relationships between them.

2.1 Intrusion Detection (ID)

2.1.1 Intrusion Detection Systems (IDS)

An intrusion can be defined as any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource. Intrusion prevention techniques such as user authentication (e.g., using passwords or biometrics), avoiding programming errors and information protection (e.g., encryption) have been used to protect computer systems as a first line of defense. Intrusion prevention alone is not sufficient because, as systems become more complex, there are often exploitable weaknesses due to design and programming errors. Intrusion detection is therefore needed as another wall to protect computer systems. The elements central to intrusion detection are:

- resources to be protected in a target system (user accounts, file systems, system kernels, etc.),
- models that characterize the "normal" or "legitimate" behavior of these resources, and
- techniques that compare the actual system activities with the established models and identify those that are "abnormal" or "intrusive".

Many researchers have proposed and implemented different models, which define different measures of system behavior, with an ad hoc presumption that normalcy and anomaly (or illegitimacy) will be accurately manifested in the chosen set of system features that are modeled and measured [Balasubramaniyan98].

Intrusion detection techniques can be categorized into misuse detection and anomaly detection. Misuse detection uses patterns of well-known attacks or weak spots of the system to identify intrusions. Anomaly detection tries to determine whether deviations from the established normal usage patterns can be flagged as intrusions.

Misuse detection systems, for example [Kumer95] and [Ilgun95], encode and match the sequence of "signature actions" (e.g., changing the ownership of a file) of known intrusion scenarios. The main shortcomings of such systems are:

- known intrusion patterns have to be hand-coded into the system;
- they are unable to detect any future (unknown) intrusions that have no matching patterns stored in the system.

Anomaly detection (sub)systems, such as IDES [Lunt92], establish normal usage patterns (profiles) using statistical measures on system features, for example the CPU and I/O activities of a particular user or program. The main difficulties with these systems are:

- the selection of system features relies on intuition and experience, and suitable features can vary greatly among different computing environments;
- some intrusions can only be detected by studying the sequential interrelation between events, because each event alone may fit the profiles.

The approach in recent research projects such as EMERALD and NIDES [Emerald97] is to collect or receive data from other sources, such as log files or messages from a component. The drawback of this approach is that it makes intrusion detection dependent on another component or product.
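The difference between the two detection styles described in this section can be sketched in a few lines of Java. This is a deliberately simplified illustration and not the method of any of the cited systems: the class DetectionSketch, the representation of events as strings, and the use of a mean and standard deviation threshold as the statistical profile are all assumptions made for the example.

```java
import java.util.List;

/** Illustrative only; names and the statistical model are assumptions for this example. */
class DetectionSketch {

    /**
     * Misuse detection: returns true if the hand-coded signature actions occur
     * in the given order (not necessarily adjacent) in the observed events.
     * An empty signature trivially matches.
     */
    static boolean matchesSignature(List<String> events, List<String> signature) {
        int next = 0; // index of the next signature action still to be seen
        for (String event : events) {
            if (next < signature.size() && event.equals(signature.get(next))) {
                next++;
            }
        }
        return next == signature.size();
    }

    /**
     * Anomaly detection: returns true if the observed value deviates from the
     * mean of the recorded profile by more than the given number of standard deviations.
     */
    static boolean isAnomalous(double[] profile, double observed, double threshold) {
        if (profile.length == 0) {
            throw new IllegalArgumentException("profile must not be empty");
        }
        double mean = 0;
        for (double value : profile) {
            mean += value;
        }
        mean /= profile.length;
        double variance = 0;
        for (double value : profile) {
            variance += (value - mean) * (value - mean);
        }
        double stdDev = Math.sqrt(variance / profile.length);
        if (stdDev == 0) {
            return observed != mean; // constant profile: any change is a deviation
        }
        return Math.abs(observed - mean) / stdDev > threshold;
    }
}
```

The sketch also makes the shortcomings listed above visible. matchesSignature can only find what has been written into the signature list beforehand, while isAnomalous judges each measurement in isolation and therefore misses intrusions that only show up in the sequence of events.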

2.1.2 Attacks

Where do attacks occur? Attacks occur anywhere people have physical or network access to computer resources. Research institutions, companies, military institutions, and universities are all susceptible to a variety of threats ranging from teenage hackers to disgruntled employees out to cause real damage. It is important to realize that attacks on networked computer systems come in a number of forms, such as installing backdoors, Trojan horses, denial of service attacks, etc. For the purpose of this thesis, I form two simple attack classes:

Outsider attacks. This form of attack is launched by an unauthorized computer user. The attacker will use system vulnerabilities or misconfigurations, human engineering techniques, or stolen or broken passwords to gain access to computers. The intruder may then engage in a wide variety of malicious activities.

The majority of security problem analysis in this work will be devoted to outside attacks via the Internet. These attacks fall into four categories:
1. Attacks exploiting bugs in existing software
2. Attacks exploiting weaknesses in protocols
3. Attacks based on malicious code such as Trojan horses
4. Attacks due to inadequate access control

Insider attacks. In this case, an intruder already has legitimate access to a computer system, but utilizes any of the previously mentioned techniques to gain additional privileges and/or to misuse or damage data the intruder may have legitimate access to. While such attacks receive less attention, they can be more pernicious and insidious than outsider attacks due to the information and system privileges available to legitimate users.

Unfortunately, very little is known about the nature of insider attacks and misuse of privileges from inside. However, general statistics of varying reliability have been published, and common to all of them is that inside attacks used to be more frequent than outside attacks [Lee99]. There is no doubt that this has changed. Protection against insider attacks is normally provided by giving users only restricted access to data and by logging events (in order to be able to catch attackers from the inside after they have committed their crimes). Some users, mainly system administrators, usually have the means for bypassing this protection. Further, traditional protection mechanisms such as log files have their origin in the early days of computing and today have their limitations:

As the flow of data grows, log files tend to become extremely long if they have to record events over a longer period in reasonable detail.
The structure of the job market is becoming ever more complex and the amount of information available is growing exponentially. Thus, finding out who needs to be able to read and alter which information in order to carry out her duties is an increasingly complex task, which normally cannot be managed.



Both classes of attacks may occur over a network connection or on-site. One special and particularly relevant form of attack that may be launched by either an insider or an outsider is the denial of service attack, where the intention is to dramatically decrease the availability of computing resources. Intruders may be assisted in their work by computer programs that automate an attack. This includes the launching of worms or the implementation of Trojan horse programs. Additionally, some attacks may take the form of coordinated multistep exploitations using parallel sessions, in which the distribution of steps between sessions is designed to obscure the unified nature of the attack or to allow the attack to proceed more quickly. To detect such coordinated activity, a system must correlate evidence from multiple sources.

The main difference between intrusion detection systems (IDS) in general and intrusion response systems (IRS) is that an IDS only detects an attack and notifies a dedicated user with an alarm; it does nothing to stop the attack. Thus, IDSs are designed just to inform system administrators of important attacks as soon as possible. IRSs go one step further, as explained in the next section.

2.1.3 Intrusion Response Systems (IRSs)

An Intrusion Response System (IRS) is an ID system that detects an attack and immediately responds in order to remove the attacker from the network. This sounds simple but in practice is very difficult to accomplish. For example, the response itself could be turned to an attacker's benefit. Current automated IRSs do attack filtering. Attack filtering systems actively stop an attacker. One popular technique is to interrupt a TCP connection between an attacker and a target. Another popular attack filtering technique is to dynamically change the routing permission table in routers and firewalls.

However, modern computer attacks are usually launched using automated attack programs. These programs break into computers very quickly using only a few packets and can penetrate a host before an ID system detects and responds to the attack. The attacking program can, for instance, quickly install a back door. Then the attacker approaches the compromised machine from a new IP address, uses the back door, and the ID system does not detect any abnormal entry into the host. Another problem occurs when the attacker is launching denial of service attacks. In this case each packet may be spoofed with a different IP address, thus making filtering of the attack packets impossible. Finally, if the attacker knows the behavior of the IRS, the automated response itself can be used by the attacker. IRSs therefore have their own problems and are not ideal.

The ideal IRS response gathers evidence of the attacker's activity, removes the attacker's access to the network, undoes the damage, and reconfigures the network to resist the attacker's penetration technique. It is impossible in today's environment to automate this ideal response since humans themselves have great difficulty enacting it. I cannot automate what I myself cannot do. However, I can automate an approximation of this ideal response. The approximation should have the following capabilities:

The ability to dynamically modify or shut down the target. This capability enables the system to automatically remove the intruder from the target, protect it from further damage by shutting it down, or perform an enhanced audit of the attacker's actions.

The ability to dynamically modify or shut down the attacking host. With insider attacks, this enables the system to automatically stop the generation of the attacks as well as record evidence of the attacker's actions.

The ability to determine the host which is launching the attack. When attack packets are being spoofed, they can only be traced to an Ethernet wire by querying each router as to the source of the packets. Once the correct Ethernet is found, each host on the Ethernet must be analyzed in order to determine which is responsible for launching the attack. Thus an ID system must provide the capability to trace the path of an attacker.

The ability to monitor all network traffic to and from the target. It is necessary to record for evidence the packets that the attacker sends to the target. In addition, it is necessary to record packets leaving the target, since it may be used as a jumping-off point to penetrate other hosts.

The ability to modify the routing and firewall permission tables on every firewall and router. We often want to isolate the attacker or target in order to prevent further damage. Such isolation can limit legitimate traffic, and so we want to place the filters such that the attacker is constrained the most while the most legitimate traffic is still allowed.
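The tracing capability in the list above (querying each router for the source of spoofed packets until an Ethernet segment is reached) can be sketched as a simple hop-by-hop walk. This is a minimal sketch under assumptions: the router names, the segment name, and the `forwarded_from` table are hypothetical stand-ins for whatever each router would actually report.

```python
def trace_to_segment(forwarded_from, start_router):
    """Walk upstream from the router nearest the target until a segment is reached.

    `forwarded_from` maps each router to the neighbour it received the attack
    traffic from (hypothetical data, as if obtained by querying that router).
    A neighbour that is not itself a router in the table is treated as the
    Ethernet segment on which the attacking host sits.
    """
    path = [start_router]
    current = start_router
    while True:
        upstream = forwarded_from[current]
        if upstream in path:
            raise ValueError("routing loop in trace data")
        path.append(upstream)
        if upstream not in forwarded_from:
            return path  # last element is the segment to be analyzed host by host
        current = upstream


# Hypothetical answers from three routers between the target and the attacker.
hops = {"r-target": "r-core", "r-core": "r-edge", "r-edge": "segment-7"}
path = trace_to_segment(hops, "r-target")
```

The walk only narrows the search to one segment. As stated above, every host on that segment still has to be examined to find the one launching the attack.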

This list of capabilities needed to approximate an ideal response implies that security services are installed on every host and network device. Mobile agent (MA) technology can solve the problem of installing and maintaining the security infrastructure, because a mobile agent platform needs to be installed only once, and perhaps in the near future each new computer will come with an MA platform preinstalled. With MAs as common software, it is not necessary to install a security server on every device, as the MAs can automatically travel through the network and play the role of the appropriate security software or install it on the appropriate types of network devices.


2.2 Mobile Agents (MAs)

The appearance of software agents has given rise to much discussion of what such an agent is and how it differs from programs in general. Selker [Selker94] and Morreale [Morreale98] define an agent from a programmer's viewpoint: An agent is a software component (along with its state) that performs one or more communication tasks by acting in a preset manner. Aside from the general agent definitions, most people focus on autonomous agents, as Franklin and Graesser do [Fanklin96]: An autonomous agent is a system situated within and a part of an environment that acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future.

The various definitions listed above involve many properties of an agent. Franklin and Graesser define the following agent properties: reactive (responds in a timely fashion to changes in the environment), autonomous (exercises control over its own actions), goal-oriented (does not simply act in response to the environment), temporally continuous (persistence of identity and state over long periods of time), collaborative (can work in concert with other agents to achieve a common goal), communicative (able to communicate with persons and other agents), adaptive (able to learn and improve with experience), mobile (able to migrate in a self-directed way from one machine to another), flexible (actions are not scripted), and character (believable personality and emotional state). Franklin and Graesser state that every agent satisfies the first four properties. Adding other properties produces potentially useful classes of agents, for example, mobile adaptive agents.

The concept of an agent can be summed up by the following definition by Green [Green97]: An agent is a computational entity which acts on behalf of other entities in an autonomous fashion, performs its activities with some level of pro-activity and/or reactiveness, exhibits some degree of the key attributes of learning, co-operation and mobility.





Magedanz [Magedanz96] classifies the existing systems into single-agent systems and multi-agent systems. In single-agent systems, an agent performs a task on behalf of a user or some process. While performing its task, the agent may communicate with the user as well as with local or remote resources, but it will never communicate with other agents. In contrast, the agents in a multi-agent system may extensively cooperate with each other to achieve their individual goals. Of course, in those systems, agents may also interact with users and system resources.

There is a significant difference between mobile agents and simple "traditional" mobile code. This difference can be described by two kinds of mobility: a) Remote Execution (which means that a program is sent to a remote location before its activation and remains at this location during its entire lifetime) and b) Migration (which means that a program/mobile agent is able to change its location during its execution).

Mobile agents can be regarded as an alternative to the traditional client-server paradigm. While the client-server paradigm relies on remote procedure calls across a network,


mobile agents can migrate to the desired communication peer and take advantage of local interactions. In this way, several advantages can be achieved, such as a reduction of network traffic or a reduced dependency on network availability. The mobile agent paradigm is often regarded as a replacement for the client-server paradigm, but a mobile agent based system can also be viewed as an extension of a distributed client-server system.

The client-server design paradigm is well known and widely used. Design paradigms define architectural abstractions and reference structures that may be instantiated into an actual software architecture. The most relevant design paradigms for current systems are Client-Server, Remote Evaluation, Code on Demand, and Mobile Agent. In the Client-Server paradigm a client sends a request and gets an answer from the server without any movement of code or components. The difference in the Remote Evaluation paradigm is that the know-how is transferred to the remote location, where a remote component executes this code. In the Code on Demand paradigm the know-how is transferred from the remote location to the local one, where the resource is located and the code is executed. Finally, in the Mobile Agent paradigm the know-how and the whole component are moved to the remote location, and the transferred component executes the code there. Although all these paradigms are similar, the above differences are very significant for the performance of distributed systems.

Imagine the following scenario: Component A (located at site A) needs the results of a service, and this service component is located at another site B. Figure 2.1 shows the location of components before and after the service execution. For each paradigm, the component in bold face is the one that executes the code. Components in italics are those that have been moved.

Figure 2.1: Mobile code design paradigms

Design paradigm     Before: Site A    Before: Site B           After: Site A            After: Site B
------------------  ----------------  -----------------------  -----------------------  -----------------------
Client-Server       A                 know-how, resource, B    A                        know-how, resource, B
Remote Evaluation   know-how, A       resource, B              A                        know-how, resource, B
Code on Demand      resource, A       know-how, B              resource, know-how, A    B
Mobile Agent        know-how, A       resource                 -                        know-how, resource, A

A mobile agent system must contain all of the following models (see Figure 2.2):













An agent model.
A life cycle model.
A computation model.
A security model.
A communication model.
A navigation model.


A mobile agent system is a software system which is distributed over a network of heterogeneous computers. Its primary task is to provide the execution environment and the implementation of the majority of mobile agent functions. It may also support access to other mobile agent systems and openness when accessing non-agent based software environments. Figure 2.2: A simple view of the structure models of mobile agent systems

[Diagram labels: Communication, Security, Computational, Life cycle control, Intelligence, Navigation]

In the middle of Figure 2.2 are the models that are responsible for the agent's actual task realization. The external models are also essential for the correct functioning of a mobile agent system, but the agent's actual task realization is not done there. Both the security and life cycle control models are structurally very close to the core, i.e., the computational part (see Figure 2.2). Security issues permeate every aspect of a mobile agent system and therefore must be provided for at the most basic level. The life cycle model defines the valid states for an agent. The outer layer contains the communication, navigation, and intelligence models. The communication and navigation models are responsible for communication with and transport to other nodes of the mobile agent system. The intelligence model defines aspects of a mobile agent system such as learning and collaboration functions, which can be: centralized, where the system intelligence is implemented in module code fixed on a host, or distributed, where the system intelligence is implemented directly in the mobile agent code.



Mobile Agents (MAs) have a well-defined life cycle. Figure 2.3 illustrates the six states of this life cycle:

Figure 2.3: A mobile agent's life cycle
[Diagram states: Initializing, Starting, Computing, Moving, Stopping, Completing]

Here is what happens in each of the six states:
Initializing: performs one-time setup activities such as building initial data structures,
Starting: starts the agent's calculations,
Computing: carries out the mobile agent's task,
Stopping: stops the calculations, saves intermediate results, and stops all threads,
Moving: moves the mobile agent to the next node in the network,
Completing: performs one-time termination activities.
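The six states can be read as a small state machine. The following sketch is illustrative only: the state names come from the list above, but the set of allowed transitions is an assumption, since the arrows of Figure 2.3 are not reproduced in the text. The class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    INITIALIZING = "Initializing"
    STARTING = "Starting"
    COMPUTING = "Computing"
    STOPPING = "Stopping"
    MOVING = "Moving"
    COMPLETING = "Completing"


# Assumed transitions: an agent is set up once, then repeats
# start -> compute -> stop -> move on each node until it completes.
TRANSITIONS = {
    State.INITIALIZING: {State.STARTING},
    State.STARTING: {State.COMPUTING},
    State.COMPUTING: {State.STOPPING},
    State.STOPPING: {State.MOVING, State.COMPLETING},
    State.MOVING: {State.STARTING},
    State.COMPLETING: set(),
}


class AgentLifeCycle:
    """Tracks the current state and rejects transitions outside the assumed table."""

    def __init__(self):
        self.state = State.INITIALIZING
        self.history = [self.state]

    def advance(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)
```

Making the table explicit has one practical benefit in this sketch: an agent can never move while it is still computing, so intermediate results are always saved in the Stopping state first.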











2.2.1 MA Advantages

MAs hide the complexity of the network infrastructure and make the heterogeneity of the network's wide range of information sources and access protocols invisible to the user. Most distributed object technologies have objects that are distributed but stationary. Mobile computers do not have a permanent network connection and are often disconnected for long periods. As a consequence, mobile computers change the network's structure. Mobile agents are one promising way to handle these network conditions. Some major benefits of using mobile agents are:

MAs reduce network overload
MAs searching for special information can migrate directly to the host where the data is actually stored, rather than having to move all of the data across the network for searching. Of course, this makes sense only if the data are larger than the code itself, which is usually the case.


MAs overcome network latency
MAs can be migrated to the point of failure to act directly at this remote point of interest. MAs, since they are distributed throughout the network, may take advantage of alternate routes around any problem communication links. They cannot process each system event in real time, but because they are mobile and flexible, they are often faster than other systems.

MAs execute asynchronously and autonomously
If we have a central controller in our security system, the critical role played by this controller makes it a likely target for an attack. MA frameworks allow ID systems to continue operation in the event of a failure of a central controller or communication link. MAs can be migrated from mobile devices to the network and act on the user's behalf, without forcing the user to stay online all the time and incur expensive connection fees.

MAs allow for a natural way to structure and design an ID system
Rather than a monolithic static system, an ID system can be divided, for example, into data producer and data analyzer components which are represented as agents. The data producer provides an interface to the networks it sniffs or to the audit trails it filters. Multiple analyzers, each responsible for detecting a single attack or a small set of attacks, interact with a producer to detect attacks. In such a framework with many software components, MAs from multiple developers can be used to create an ID system.

MAs are robust and fault-tolerant
MAs' ability to react dynamically to unfavorable situations and events makes it easier to build robust and fault-tolerant distributed systems. Their support for disconnected operation and distributed design paradigms eliminates single point of failure problems and allows MAs to offer fault-tolerant characteristics.

MAs provide a versatile and adaptive computing paradigm
MAs can be retracted, dispatched, cloned, or put to sleep as network and host conditions change. For example, as better MA-based detectors for an attack are developed, they can be sent out on the network to replace the older version, or if the computational load of the host platform is too high, the agent and its data can move to another machine that can better satisfy its computational needs.

MA systems are scalable
As the number of computing elements in the network increases, new agents can be sent or cloned and dispatched to new machines in the network.

MAs are able to operate in heterogeneous computing environments
One of the greatest benefits of MAs is the implementation of interoperability at the application layer. Since MAs are generally computer and transport-layer independent, and depend only on their execution environment, they offer an attractive approach for heterogeneous system integration. MAs' ability to operate in heterogeneous computing environments, however, depends on a virtual machine or interpreter on every host platform.
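The first benefit in the list above (reduced network load, provided the data are larger than the agent code) can be checked with simple arithmetic. The sketch below is a back-of-the-envelope model under stated assumptions, not a measurement: all byte counts are hypothetical, request overhead is ignored, and the agent is assumed to visit the hosts one after another, carrying its accumulated results, before returning home.

```python
def client_server_bytes(data_bytes_per_host, hosts):
    """Central analysis: every host ships its raw data to one analyzer."""
    return data_bytes_per_host * hosts


def mobile_agent_bytes(agent_bytes, result_bytes_per_host, hosts):
    """Itinerant agent: home -> host 1 -> ... -> host n -> home.

    The agent code crosses the network hosts + 1 times. On the hop that
    follows the k-th visited host it also carries k result records.
    """
    hops = hosts + 1
    carried_results = hosts * (hosts + 1) // 2  # 0 + 1 + ... + hosts
    return hops * agent_bytes + carried_results * result_bytes_per_host


# Hypothetical numbers: 50 MB of logs per host, a 200 kB agent, 2 kB of findings per host.
central = client_server_bytes(50_000_000, hosts=10)
roaming = mobile_agent_bytes(200_000, 2_000, hosts=10)
```

With these assumed sizes the agent moves far less data than central collection. The comparison reverses when the data per host are small, for example a 1 kB file on a single host against the same 200 kB agent, which is the condition stated in the text.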

2.2.2 MA Disadvantages

Despite the benefits described in 2.2.1, there are a number of issues that need to be solved before MA technology can be used on a large scale:

MAs will introduce vulnerabilities into the network
The MA computing paradigm presents a number of security threats that are not addressed by conventional security techniques. Standard security techniques must be modified or new techniques invented to address these threats. The security threats can be classified into four broad categories: agent-to-agent, agent-to-platform, platform-to-agent, and other-to-agent platform. The agent-to-agent category represents the set of threats in which agents exploit security weaknesses of other agents or launch attacks against other agents. The agent-to-platform category represents the set of threats in which agents exploit security weaknesses of an agent platform or launch attacks against an agent platform. The platform-to-agent category represents the set of threats in which platforms compromise the integrity of agents. The other-to-agent platform category represents the set of threats in which external entities, including other agents on other agent platforms, threaten the security of an agent platform. MAs may need to run with administrative privileges to produce results and to perform other tasks. This can cause a serious security risk if malicious MAs can be introduced into the system by an attacker. Furthermore, an attacker may be able to alter the data of an MA and thereby cause it to perform malicious actions.

Dynamically loaded classes need extra security
Java-based MAs typically load their class files dynamically as needed from their home platform. The ability to dynamically load classes also has security implications. If the home platform is not available, these class files may be provided by the local host or must be found and transferred from a remote trusted host, which raises a number of security issues. The class files may have been modified in such a way as to alter the functionality of the agent or even to allow for eavesdropping on the agent's transactions. Java addresses this problem by digitally signing each class. Class versioning may also cause problems from which the MAs may not be able to recover.

All the systems are still proprietary
Most systems are built for a special purpose or to solve a group of special problems. Currently, no system has a generic architecture or standard to support flexible and reusable components for modular system construction, except MASIF [Masif97] and FIPA [Fipa00], which have not really been accepted in practice so far.

All systems fail to protect the agent from malicious hosts


Currently a host can manipulate an agent's data, for example to change a web access entry in a log file.

Lack of a priori knowledge
Large enterprise networks contain several different hardware platforms, running several different operating systems, each having different configurations and running different applications. Each mobile agent platform must have a priori knowledge about how all systems are configured or how data is arranged, and sometimes it is very difficult to get such system information.

Most systems lack efficient mechanisms to control the termination of agents
Most of today's MA systems allow creation and cloning of agents, but effective mechanisms for the termination of an agent and all its clones are still missing, meaning that the agents or clones can roam around the network forever.

Complex agents lack transactional support
MA code tends to be large and complex, which may limit the functionality of ID systems implemented using MAs (MA-ID systems), because it takes a long time to transfer an agent between hosts. Agents performing complex tasks currently have no overall transaction mechanism to guarantee the consistency of the achieved results.
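The concern about modified class files, raised above under dynamically loaded classes, comes down to checking code against a tag produced by a trusted party before running it. The sketch below illustrates only that idea. It is not Java's signing mechanism: it uses a keyed hash (HMAC) with a shared secret as a stand-in for a public-key signature, and the function names and the key are hypothetical.

```python
import hashlib
import hmac


def sign_code(code: bytes, key: bytes) -> str:
    """Home platform: compute an authentication tag over the agent's code.

    Assumption: HMAC-SHA256 with a shared key stands in for a real
    digital signature, which would use a private/public key pair.
    """
    return hmac.new(key, code, hashlib.sha256).hexdigest()


def verify_code(code: bytes, tag: str, key: bytes) -> bool:
    """Receiving platform: accept the code only if the tag matches."""
    expected = sign_code(code, key)
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)


key = b"hypothetical-shared-secret"
code = b"class Sensor: ..."  # placeholder for the bytes of a class file
tag = sign_code(code, key)
accepted = verify_code(code, tag, key)
rejected = verify_code(code + b" tampered", tag, key)
```

A check of this kind protects the platform from altered agent code. It does nothing for the opposite problem described in this section, a malicious host manipulating the data an agent carries.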

2.3 Relationship between Mobile Agents and Intrusion Detection

Implementing intrusion detection systems using mobile agents (MA-ID) is one of the new paradigms for intrusion detection. Relatively little work has been done on using a mobile agent architecture for the purpose of providing a security capability, such as intrusion detection. While MAs are a powerful tool, their implementation has been hindered by security considerations. These security considerations are especially critical for intrusion detection systems, with the result that most security research in this field has concentrated upon the architecture necessary to provide security for mobile agents.

Work on applying agents to ID is currently being conducted at a number of research labs, but this work is still not complete. The following projects represent such work: the Autonomous Agents for Intrusion Detection (AAFID) effort at Purdue University [Balasubramaniyan98], Hummingbird, developed at the University of Idaho [Frincke98], the Java Agents for Meta-Learning (JAM) effort at Columbia University, NY [Lee99], the Intrusion Detection Agent system (IDA), developed by the Information-technology Promotion Agency (IPA) in Japan [Asaka99], etc.

An MA-ID system:
must continuously monitor and report intrusions,
should be modular and configurable, as each host and network segment will require its own tests,
must be able to operate in a hostile computing environment, exhibit a high degree of fault-tolerance, and allow for graceful degradation,
should be adaptive to network topology and configuration changes as computing elements are dynamically added to and removed from the network,
should have a very low false alarm rate and minimal overhead,
must supply enough information to repair the system, determine the extent of damage, and establish responsibility for the intrusion,
should be able to learn from past experience and improve its detection capabilities over time,
should be easy to update frequently with attack signatures as new security advisories, attacks and vulnerabilities are discovered,
will be required not only to detect anomalous events, but also to take automated corrective action,
needs to be able to communicate with hardware-based devices, to perform data fusion, and to process information from multiple and distributed data sources such as firewalls, routers and switches,
must have the ability to detect and react to distributed and coordinated attacks,
needs to support post-event analysis to identify compromised machines before the network can be restored to a safe condition,
should detect anomalous events in real time and report them immediately to minimize the damage to the network and the loss or corruption of data,
must use agents that are cognizant of the consumption of network resources for which they are competing,
must be scalable, since the MA-ID system must be able to handle the additional computational and communication load as new computing devices are added to the network,
and, of course, must itself be designed and implemented with security in mind, and must not create additional vulnerabilities.

MAs have many characteristics that enable them to enhance ID technology. Mobility is obviously one of the most important capabilities. However, other agent capabilities also lend themselves to ID technology. MA technology and MA applications mimic collections of autonomous and intelligent individuals. Classes of individuals have special purposes and each can operate independently from others. MAs are by nature autonomous, collaborative, self-organizing, and mobile. These features are not found in traditional distributed programs, and they enable ID systems to implement completely new approaches to intrusion detection, some of which are based on analogies found in nature and in society.

ID systems perform multi-point detection by analyzing events at multiple locations in order to detect distributed or staged attacks. The events may come from multiple hosts, applications, or network interfaces. Multi-point detection is especially useful in detecting attacks on a network, as opposed to attacks on a host, because a host-based system will only realize that individual components are under attack and cannot reason about or learn the larger strategy. A collaborative multi-agent system can be self-organizing and thus adaptive to attack. Some areas for research and experimentation include:








Completely distributed and decentralized ID system architectures in which no single points of failure exist and numerous redundant information pathways are available.
A standard hierarchical ID system where an MA backs up each node and restores any lost functionality out of sight of an attacker.
MA-ID systems that relocate the resources under attack when any suspicious activity is detected.

Each agent may perform specific tests (much like a mobile sensor) and randomly roam the network. When the tests indicate the possibility of an intrusion, the agent may ask for additional tests at the site. Only after the suspicion level has been raised high enough is the actual alarm given. Notice that the attack is confirmed by executing only relevant tests. Since MAs roam throughout the network, they may not be constantly resident at every node. Consequently, those nodes without a resident agent are vulnerable until an appropriate agent arrives. This situation, however, is not as deleterious as it appears at first glance. An attacker may successfully break into a host with a conventional host-based ID system and not be detected immediately. This could happen either because the attack was too clever for the ID system or because the ID system only scans the host for attacks periodically due to performance considerations. In this event, the attacker has free rein to inspect and alter the ID system, install backdoors, and remove evidence of the attack from the audit log. MAs offer some benefits for detecting such tampering. As each new MA arrives at the host, it embodies a fresh copy of the ID procedures. Some of these checks may make sure that the agent platform is unaltered. For example, an agent could calculate a checksum of a static system file or perform a similar integrity check on some aspects of the platform, and report the result upon return to a point where validity can be determined. An unexpected result would warrant remedial action. One way to teach agents different ways to detect attacks is to give each agent base knowledge about an attack and have it automatically learn its own technique for detecting it. Thus, MAs can automatically learn attack signatures, and different agents can learn different signatures. This prevents the attacker from predicting the exact signatures used and thus strengthens the distributed MA-ID system.
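The roaming-sensor behaviour described above (cheap tests first, additional tests only on suspicion, an alarm only above a threshold, and a checksum as one of the tests) can be sketched as follows. Everything concrete in it is an assumption made for illustration: the host is a plain dictionary, the test names, weights and threshold are hypothetical, and the checksum is compared on the spot against a digest the agent carries, whereas the text has the agent report the result back to a point where validity is determined.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical known-good digest carried by the agent.
KNOWN_GOOD = {"/bin/login": checksum(b"original login binary")}


def many_failed_logins(host):
    return host["failed_logins"] > 5


def login_binary_changed(host):
    return checksum(host["files"]["/bin/login"]) != KNOWN_GOOD["/bin/login"]


class RoamingSensor:
    """Raises an alarm only when the accumulated suspicion reaches a threshold."""

    def __init__(self, alarm_threshold):
        self.alarm_threshold = alarm_threshold

    def inspect(self, host, basic_tests, extra_tests):
        # Each test is a (predicate, weight) pair; weights are assumed values.
        suspicion = sum(weight for test, weight in basic_tests if test(host))
        if suspicion > 0:
            # Additional tests are requested only when something looks wrong.
            suspicion += sum(weight for test, weight in extra_tests if test(host))
        return suspicion >= self.alarm_threshold, suspicion


sensor = RoamingSensor(alarm_threshold=3)
basic = [(many_failed_logins, 1)]
extra = [(login_binary_changed, 2)]
host = {"files": {"/bin/login": b"patched login binary"}, "failed_logins": 12}
alarm, level = sensor.inspect(host, basic, extra)
```

In this sketch a host with many failed logins but an unchanged binary raises suspicion without triggering an alarm, which mirrors the point that an attack is confirmed only by the relevant follow-up tests.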
MAs enhance a system's ability to respond automatically because MAs make it possible to make all network components part of the same security scheme. Responses can be initiated at any place in the network, which gives systems the capability to optimize the locations at which they initiate responses. Furthermore, MAs enhance an ID system's ability to trace an attacker through the attacked network, to respond on the target, to respond on the attacker, and to collect network and host evidence about the attack.

At this point the obvious questions are "How?" and, above all, "Why do I need MAs in this system?" The answers and the advantages are as follows:
new attack: I do not need to install a special security tool on each computer in a network; I just need to create a new agent and send it to all computers,
complex distributed security attacks: by sending more than one mobile agent I can address this security problem too,
system changes: normal users do not know which system features are checked or about new agents in their network,
reduced data transfer: e.g., the intrusion detection use case of my system, where the agent intelligence is mobile and an agent is able to analyze data at the remote host,
complex distributed security rules: these can be checked more easily with more than one mobile agent,
attacker following or remote check: with a mobile agent I can follow an attacker's remote address or check his remote log files.

2.3.1 Selection of mobile agent platform

For mobile agents to be useful for intrusion detection, many, if not all, hosts and network devices must have an MA platform installed. This is not a far-fetched assumption, because an MA platform is general-purpose software that enables organizations to implement many different applications. If MAs become popular, every new host may come with an MA platform preinstalled, just as most web browsers today come bundled with a Java interpreter. Contrast this with the many ID system designs that assume that a host-based ID system is installed on every host. It is generally too expensive to install a proprietary solution (like a host-based ID system) on every host in a network, but it is not unusual to install a general-purpose interpreter (like an MA platform and a Java virtual machine) on every host. The mobile agent platform selected should be Java-based in order to allow for platform independence and to take advantage of Java's "write once, run anywhere" architecture. Therefore, attention is limited to platforms implemented in Java.

What should an MA platform provide for ID? First, it must be secure and operating system independent. It must also be extensible, component-based, scalable, and flexible, since such a system must be able to handle the changing computational and communication load as computing devices are added to or removed from the network. In this thesis, I restrict my work to an analysis of security-related evaluation criteria for MA-ID platforms; the analysis of other evaluation criteria is beyond the scope of this work.

Security Evaluation Criteria

The ability to write code that can transfer itself to various systems and gain access to local resources (file system, CPU, memory, OS routines) can represent a major security threat. Java's sandbox model guarantees that, if an appropriate security manager is set, certain sensitive system resources cannot be manipulated by untrusted classes. Denial-of-service attacks, however, cannot be prevented using the security features of the Java language alone. The additional security evaluation criteria for an MA-ID system are therefore:

Criterion 1: Authentication: Higher-level authentication mechanisms, e.g., digital signatures, must be employed that allow an agent system to identify the person or organization responsible for an agent and to screen out incoming agents that do not meet the security requirements.

Criterion 2: Cryptography extensions: Another aspect of security is the ability of agents that have the required security credentials to keep the nature of their interaction with other agents and, more importantly, their data confidential. The cryptography extensions of the Java core API allow the programmer to implement such features manually, but integration of these features in the platform itself is more desirable.

Criterion 3: Transmission protection: The secure network transfer of agents must be provided.

Criterion 4: Resource protection: The host must be protected from attack or misuse by malicious agents.

Criterion 5: Agent protection: An agent must be protected from attack by another, malicious agent.

Criterion 6: Host protection: An agent cannot be fully protected from attack by a malicious host, because the host must have full control over an agent in order to execute it; such attacks must, however, be detected as reliably as possible.

After analyzing existing mobile agent platforms such as Grasshopper [Grasshopper00], Voyager [Voyager00], and Gypsy [Gypsy00], I decided on Gypsy, because it was developed at the Technical University Vienna and I therefore have access to its code. This is important because many existing platform features have to be changed or newly implemented. Gypsy also satisfies the needs of an MA-ID system, because it is a flexible and dynamically extensible environment for experimenting with mobile agent programming. Like many other MA platforms, Gypsy uses the Java security mechanisms for secure class loading. Furthermore, it uses the Java sandbox security model [Rubin98], which includes code signing [Gong98] and class loading. The existing mobile agent platform Gypsy is described in more detail in section 3.3.
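The first criterion above, screening incoming agents by digital signature, can be sketched with the standard java.security API. This is only an illustration under stated assumptions: the class AgentSignatureCheck, its method names, the RSA key size, and the SHA256withRSA algorithm are hypothetical choices, not the mechanism of Gypsy or of any other platform named here.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical signature screening for serialized agents (illustration only). */
final class AgentSignatureCheck {

    /** Creates a key pair for the person or organization responsible for an agent. */
    static KeyPair newOwnerKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signs the serialized agent with the owner's private key. */
    static byte[] sign(byte[] agentBytes, PrivateKey ownerKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(ownerKey);
            signer.update(agentBytes);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Accepts an incoming agent only if the signature matches the owner's public key. */
    static boolean accept(byte[] agentBytes, byte[] signature, PublicKey ownerKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(ownerKey);
            verifier.update(agentBytes);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // malformed signature or unusable key: reject the agent
        }
    }

    public static void main(String[] args) {
        KeyPair owner = newOwnerKeyPair();
        byte[] agent = {10, 20, 30}; // stands in for the serialized agent
        byte[] signature = sign(agent, owner.getPrivate());
        System.out.println("accepted: " + accept(agent, signature, owner.getPublic()));
    }
}
```

In a deployed system the public key would be taken from a certificate issued by the PKI rather than generated on the spot; that part is omitted here.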
The system described in the following chapters contains many new parts and changes to the design and architecture of the existing Gypsy platform, because the existing platform did not have sufficient features for my work. I therefore call this new part of the system "Secure Gypsy (SG)" in this thesis.

2.3.2 Mobile Agent and Intrusion Detection Conclusions

At first glance, MA technology offers much to the field of ID. The idea of mobile and autonomous components intuitively seems useful in ID and many other areas. However, it is difficult to realize the benefits of MA technology in practice. Despite these difficulties, the technology appears to provide valuable extensions to current capabilities. MAs will enter mainstream use because they may enhance the performance of ID systems and even offer them new capabilities. However, obtaining these benefits is not easy, and thus there are three main research areas for using MAs to do ID:

- performance enhancements (design MA-ID systems that take advantage of mobility and autonomy to obtain better performance than equivalent non-mobile ID systems),
- design improvements (use MA technology to enable novel paradigms for detecting attacks), and
- response improvements (use MA technology to enable novel and efficient automated responses to attacks).


In this thesis, I will describe a complete system, i.e., all functional components, its architecture and a prototype design. The practical part of my thesis is going to concentrate on agents for intrusion detection in the Web area. I will implement an agent which checks the log files from different web servers.


3 Functional System Components

The mobile agent system presented in this work monitors the implementation of security policies, identifies security problems, and performs intrusion detection in a heterogeneous computing network. This chapter describes the main components of the system toolkit from the point of view of the functional requirements. First I describe the major parts, the technology used, and the idea of the complete system. My work is part of the European Union project SPARTA (http://www.infosys.tuwien.ac.at/sparta).

In this system, different types of mobile agents fulfill different types of security management, intrusion/misuse detection, and security policy monitoring tasks. OS data from each host are analyzed and corresponding response tasks are issued. Formalizing the security policy as rules that can be stored in a rule repository is a step that facilitates the use of agents for enforcing the policy. The interpreted policy rules and the real system data are the input for an agent with an intelligent data analyzer that is able to recognize "abnormal" patterns. Furthermore, this chapter describes how the real system data are collected from the various hosts. The mobile agent and data messaging structures are described, and special attention is given to the security of the system structure itself. For this purpose, the security of agents and the roles in the system are described. A simplified version of the system functionality is given in Figure 3.1.

Figure 3.1: Simplified system functionality
[Figure not reproduced. It shows the user side components and three hosts (Host 1, Host 2, Host 3), each with an OS, files, a mobile agent platform, and the secure infrastructure; network analysis is also indicated.]

Several elements can be distinguished in Figure 3.1:

- the user side components with a Graphical User Interface (GUI), where users and administrators can monitor and react to security policy related issues such as rule editing and mapping, agent dispatching, status retrieval, and data analysis,
- the mobile agent platform, which is not shown in detail in Figure 3.1 but consists of agent servers (each based on a host) and agent-related parts such as the runtime environment, the information space, etc., and finally,
- the secure infrastructure, which covers securing the mobile agent platform and securing all system communication and data.
Consequently, this system consists of the following three major components:

User Side Components

A GUI (preferably a browser) provides support for launching agents and viewing their results, and gives access to the security policy and the intelligence of the system. A security policy defines the rules that regulate how an organization manages and protects its information and computing resources to achieve security objectives. One of the system's primary purposes is to detect signs of abnormality automatically (with the help of mobile agents and data analysis) and to define the range of threats, the impact of a threat, the trust level, etc. Rules for achieving such security objectives are stored in a repository. Having the policy rules in a repository helps the intelligence of the system to detect abnormal patterns and makes it possible to exercise combined procedures in a timely, managed, and controlled manner. The user side components are described in detail in section 3.2.

Mobile Agent Platform

The built-in security of current agent platforms is not sufficient for systems that need both a high degree of security and a high degree of flexibility. Further, since agents that monitor security have to be at least as well protected as the system they protect, securing the agent platform properly is a necessary task. As mentioned earlier, the system uses the existing Gypsy platform, a component-based and dynamically extensible environment for experimenting with mobile agent programming, as the basis for the mobile agent platform. Gypsy was developed at the Technical University Vienna and consists of two basic components, servers and component-based mobile agents, which are described in section 3.3.

Secure Infrastructure

Sending an agent over the network has a number of security implications. The users and agent servers have secret keys, which are used for encrypting and signing messages as required. Since real-life applications of this system will have many users and agent servers, a Public Key Infrastructure (PKI) is required to administer all of those keys and, in particular, to provide a means of ensuring that keys are not compromised. The secure infrastructure is described in detail in section 3.4.


3.1 System Introduction

As mentioned above, the three basic components cooperate to monitor the implementation of security policies, to identify security problems, and to perform intrusion detection in a heterogeneous computing network. The use of agents is motivated by the desire to be flexible enough to change the security configuration and the checks that are actually carried out at run-time, without having to have skilled persons present at each monitored site and without interrupting the system's activity. As a consequence, two main use cases can be identified in this system:

- Surveillance (of a given security policy) with one or more centralized monitors, where agents jump only once to the monitored site (subordinate host) and then immediately back to the monitor, as shown in Figure 3.2,
- Intrusion Detection, which is more important for my work and is planned to be distributed, although a few parts of it will be centralized. In this use case agents can jump more than once between hosts in a network before the results are shown, as depicted in Figure 3.3.
Surveillance Use Case

In the surveillance use case the agents are launched from a central monitoring site, to which they also report back. The irregularities they search for are of the following form:

- general vulnerabilities of the individual computers and of the system as a whole,
- breaches of an agreed or imposed security policy.

The typical action taken by an agent when it discovers an irregularity is to report it to the monitoring station and to the local system administration managing the affected host. Of course, more direct intervention by the agents is possible. A setup for surveillance consists of one or more monitored sites (called subordinates) and a number of monitors. The monitors run programs called data analyzers, which in this use case are located only on monitors; they process the gathered data and present the results to users via a graphical interface. As the data is fetched from each host by mobile agents and analyzed on the monitors, each subordinate and each monitor needs an installed agent platform (see Figure 3.2). Even in this centralized setup, the advantages of the agent approach remain the system's scalability and the ease of adding checks for new attacks.

Figure 3.2: System Use Case - Surveillance
[Figure not reproduced. It shows a user, monitor hosts, and subordinate hosts.]

Intrusion Detection Use Case

In the intrusion detection use case the agents are predominantly looking for intruders. The emphasis is on checks for intrusions that cannot be discovered on a single host alone. An example is checking that distributed actions involving several computers are logged consistently on all of them. If this is not the case (e.g., the log files on the local and remote hosts are not consistent), an indication of a man-in-the-middle attack has been found. In contrast to the surveillance use case, there need not be a central site from which the agents are launched. The agents are longer-lived and more autonomous than in the surveillance use case. In particular, they jump directly between monitored hosts without visiting the monitoring station. A setup equivalent to existing systems for distributed intrusion detection is also supported by the design. In such a setup, agents carry intelligence to the monitored station, where a dedicated module, the DAM (see section 4.7), performs the analysis. In some cases, using agents for a distributed search for intruders minimizes the amount of data that has to be transported over the network. It also reduces the load on the central site.

The intrusion detection mechanism in this system is planned to be almost fully distributed. There are no centralized monitors and no distinction between subordinates and monitors; only the user side is not distributed. Different mobile agents travel from host to host (with agent servers) and scan them for different vulnerabilities and security violations. When enough suspicious activities are detected and a possible intrusion is assumed, special agents are started to investigate further and the site administrator is alerted (via a graphical interface running on machines called home servers). A setup for intrusion detection consists of a set of agent servers and, additionally, one or more home servers (see Figure 3.3).
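The log-consistency check used as an example in this use case can be sketched as a comparison of two sets of normalized log entries. The class LogConsistencyCheck and the entry format "source>target:service:session" are assumptions made for this illustration; the thesis does not define a log format at this point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical cross-host log comparison performed by a multi-hop agent. */
final class LogConsistencyCheck {

    /**
     * Returns a description of every entry that appears in only one of the two logs.
     * Entries are assumed to be normalized to the same textual form on both hosts.
     */
    static List<String> inconsistencies(Collection<String> localLog,
                                        Collection<String> remoteLog) {
        Set<String> local = new HashSet<>(localLog);
        Set<String> remote = new HashSet<>(remoteLog);
        List<String> findings = new ArrayList<>();
        for (String entry : localLog) {
            if (!remote.contains(entry)) {
                findings.add("only on local host: " + entry);
            }
        }
        for (String entry : remoteLog) {
            if (!local.contains(entry)) {
                findings.add("only on remote host: " + entry);
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> hostA = Arrays.asList("hostA>hostB:ssh:1001", "hostA>hostB:ssh:1002");
        List<String> hostB = Arrays.asList("hostA>hostB:ssh:1001");
        for (String finding : inconsistencies(hostA, hostB)) {
            System.out.println(finding);
        }
    }
}
```

An agent would collect the entries on the first host, carry them to the second host, and run the comparison there, so that only the findings, not the complete logs, have to travel further.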

Figure 3.3: System Use Case - Intrusion Detection
[Figure not reproduced. It shows a user, a home server, and several agent servers running on the hosts Host1 to Host5.]

- Agent Server: a standard host with an installed agent platform. Here the mobile agents are executed to fulfill their tasks. The agent server is also responsible for transferring the agents to the next hop on their route.
- Home Server: a special agent server to which agents return their results and where they may alert the user. The home server has a component that can be used by graphical interfaces to obtain and display results as well as to launch agents.
- User: a network administrator or a regular user.

The detailed description and integration of all system components, the presentation of the overall design of the high-level components, and the integration of low-level components into this framework are deferred to later sections and chapters.

3.2 User Side Components

First of all, this system provides remote user logon, archiving, filtering, and result interpretation. All of this is done in the user side components; in the surveillance use case the statistical data analysis is also done here. Further duties of the user side components are risk analysis, risk management, and rule editing. Each duty is described separately in one of the next sections. As mentioned earlier, a security policy is the set of laws, rules, and practices that regulate how an organization implements, manages, protects, and distributes its information and computing resources to achieve security objectives. A Security Policy Editor (SPE) component defines the rules for achieving such objectives.


The SPE specifies system policy parameters such as:

- regular inspection and auditing intervals,
- sources of recorded data (e.g., logs) that are used to identify evidence of abnormalities,
- which files are to be checked,
- policy review intervals,
- attack scenarios, security vulnerabilities, and methods for their detection,
- the user can add and modify existing data,
- analysis of risks for the different components belonging to or connected with the information system (called resources or assets); the purpose is to estimate the impact that insufficient security has on the host,
- decision rules for pattern matching or other analysis algorithms: policy formalization, inserting new rules into the database, changing rules, assigning rules to an agent, ...
For the purpose of the pilot system, several security policy rules will be interpreted and translated into a format understandable to an intelligent data analysis agent. Candidate areas to consider include:

- privacy (monitoring of electronic mail, access to files),
- access (acceptable use guidelines for users),
- accountability (responsibilities of users, auditing, incident handling),
- authentication (passwords, remote location),
- availability of resources (redundancy and recovery),
- system and network maintenance (ability to perform maintenance),
- violations/incidents (what is to be reported and to whom).
3.2.1 Security Policy Elements

To carry out analysis and security control, we use six basic security policy elements:

- Resources (Assets)
- Threats
- Vulnerability
- Impacts
- Risks
- Safeguards (Functions and Mechanisms)

By these security policy items I denote all objects that form the basis for the creation of security policy rules. Identifying these elements on each host is very important in order to know and assess the security aspects of the domain, to understand the vulnerability, that is, the likelihood that each threat leads to an attack, and to assess the impact of a possible attack on every single resource.


Resources (Assets)

Each host in a network has several assets (resources). The following list orders them by the likelihood of attacks on each asset:

1. Stored data (e.g., accessing files with all students' grades),
2. Communication data (e.g., controlling web access),
3. Supplies and data storage media (e.g., using memory space),
4. System computer programs and documentation (e.g., using OS programs),
5. Application computer programs and documentation (e.g., using business software),
6. Information (e.g., getting information about users of the system).

I distinguish the following types of information exchange that can occur in a network:

- e-mail (incoming and outgoing),
- Internet pages (HTTP, ASP, JSP, ...) from outside,
- Internet pages (HTTP, ASP, ...) to the outside world (own web server),
- database access (ODBC, ...),
- remote file access,
- e-commerce and e-business (payments, orders, ...),
- news (incoming and outgoing),
- different IP packets, chat, telnet, ssh, ...
In each network we can consider different levels of data confidentiality for each resource:

- Public,
- Copyrighted,
- Confidential (content is secret),
- Secret (content and existence are secret).

Threats

A threat is defined as an event that can cause an incident on the host, resulting in damage to or loss of its resources. Several groups of threats can be distinguished according to the security principle they violate:

- Authentication: a way to verify that message senders are who they say they are.
- Integrity: ensuring that information will not be accidentally or maliciously altered or destroyed.
- Reliability: ensuring that systems will perform consistently and at an acceptable level of quality.
- Confidentiality: keeping information secret.
- Privacy: the ability to control who sees (or cannot see) information and under what terms.
- Availability: the ability to access the information and communication services when they should be available.

Examples of common threats are:
- attempts to gain unauthorized access to a system or its data,
- unintended and unauthorized disclosure of information,
- service interruption: unwanted disruption or denial of service,
- unauthorized use of a system for data processing or storage,
- exploitation of errors during the collection and transmission of data, or of system errors,
- unauthorized logical access with alteration or theft of information or configuration; this reduces confidentiality,
- unauthorized logical access with corruption or destruction of information; this reduces integrity and/or availability without direct exploitation,
- unavailability of resources,
- impersonation of the sender or receiver ('man in the middle') or of an identity,
- repudiation of origin or of the reception of information in transit, ...

Vulnerability

The vulnerability of a resource is the possibility that a threat materializes on that resource. Vulnerability is a property of the relation between a resource and a threat.

Impacts

The impact on a resource is the consequence of the materialization of a threat, i.e., the result of the attack on the resource. From a more dynamic point of view, it is the difference between the security of the resource before and after the event.

Risks

Risk is the possibility that such an impact is caused on a resource, on a host, or in the whole network. It is an indicator that results from combining the vulnerability with the impact produced by the threats acting on the resource. The calculated risks allow rational decisions to be taken with regard to the security goals of the host. Residual risk is the risk that remains after safeguards have been applied to the various threats.

Safeguards

Safeguards are practices, procedures, or mechanisms that protect against a threat, reduce vulnerability, limit the impact of an unwanted incident, detect unwanted incidents, and facilitate recovery. Security mechanisms include all types of actions that help to achieve the desired security target (for example, encryption may be used to achieve confidentiality).
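The definitions above state that risk combines vulnerability and impact but give no formula. The following sketch therefore rests entirely on assumptions: vulnerability is modeled as a likelihood between 0 and 1, the impact level uses a scale from 1 to 10, risk is their product, and a safeguard removes a fixed fraction of the risk. The class RiskEstimate and its methods are hypothetical.

```java
/** Hypothetical risk indicator; the multiplicative model is an assumption, not the thesis's formula. */
final class RiskEstimate {

    /**
     * Combines vulnerability and impact into a risk indicator.
     *
     * @param vulnerability likelihood that the threat materializes on the resource, 0.0 to 1.0
     * @param impactLevel   level of damage on an assumed scale from 1 to 10
     */
    static double risk(double vulnerability, int impactLevel) {
        if (vulnerability < 0.0 || vulnerability > 1.0) {
            throw new IllegalArgumentException("vulnerability must be in [0, 1]");
        }
        if (impactLevel < 1 || impactLevel > 10) {
            throw new IllegalArgumentException("impact level must be in [1, 10]");
        }
        return vulnerability * impactLevel;
    }

    /**
     * Risk remaining after a safeguard has been applied.
     *
     * @param effectiveness fraction of the risk removed by the safeguard, 0.0 to 1.0
     */
    static double residualRisk(double risk, double effectiveness) {
        if (effectiveness < 0.0 || effectiveness > 1.0) {
            throw new IllegalArgumentException("effectiveness must be in [0, 1]");
        }
        return risk * (1.0 - effectiveness);
    }

    public static void main(String[] args) {
        double r = risk(0.5, 8);
        System.out.println("risk = " + r + ", residual = " + residualRisk(r, 0.75));
    }
}
```

Any monotone combination would fit the definitions equally well; the product is used here only because it is the simplest choice that grows with both vulnerability and impact.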

Figure 3.4: The Security Policy Elements
[Figure not reproduced. It is an entity-relationship diagram of the SIM model that relates User, Entity, Resource, Threat group, Threats, Vulnerability, Impact, Impact level, Safeguard function, Safeguard mechanism, Risk scenario, and Data source through one-to-one, one-to-many, and many-to-many relationships.]

Figure 3.4 shows the relationships between all security policy items:

- Entity: a node in the network; it contains information about the host owners and the agent server (the execution place of the agent).
- Resource: contains information about the resources to be monitored (for example incoming and outgoing mail, files, directories, etc.).
- Threat group: contains detailed information about the categories of threats (intrusion, misuse, etc.).
- Threats: contain all the necessary information related to a specific threat. Examples are a system crash, a stolen or disclosed password, unencrypted confidential mail, etc.
- Safeguard functions: contain detailed information about the functions of security prevention measures (firewalls, operating system access, communication encryption, ...).
- Safeguard mechanisms: contain information about the specific measures, recommendations, standards, etc. (use digital signatures, delete files from temp directories, check additional connection parameters, ...).
- Impact: contains information about the type of risk: data confidentiality, availability, etc.
- Impact level: contains information about the level of damage (for example on a scale from 1 to 10).
- Risk scenario: the user builds a risk scenario for the security policy monitoring. This scenario is stored in the database as a rule and is interpreted later during agent creation.
- Data source: contains the sources of the collected data to be analyzed (log files, event records, etc.).
- Vulnerability: contains information about the type and/or level of vulnerabilities.

On this basis, I can define the following prevention measures:

- removal of possibly dangerous applications (e.g., telnet, FTP, ...),
- blocking of some commands (e.g., regedit, ...),
- blocking access from certain IP addresses,
- restricting access to certain ports,
- filtering network traffic based on protocol (e.g., TCP, UDP, ...),
- extensive traffic logging and monitoring,
- disabling IP forwarding,
- for a web server, checking the source IP address of the browser,
- requiring authentication from the user where appropriate,
- encryption: a process of making information indecipherable except to those with a decoding key,
- firewall protection: a filter between a corporate network and the Internet that keeps the corporate network secure from intruders but allows authenticated corporate users uninhibited access to the Internet,
- blocking: the ability to block unwanted information or intrusions,
- non-repudiation: non-repudiation of origin defines requirements to provide evidence about the identity of the originator of some information. The originator cannot successfully deny having sent the information because evidence of origin (e.g., a digital signature) binds the originator to the information sent. The recipient or a third party can verify the evidence of origin. This evidence should not be forgeable.
3.3 Mobile Agent Platform: Gypsy

This section gives a short description of the existing mobile agent platform Gypsy and introduces Gypsy's terminology and components at a high level. The Gypsy system consists of two basic components: simple servers and component-based mobile agents. Gypsy has two very simple kinds of servers: the Agent Server (AS) and the Home Server (HS).
All servers are processes with their own Java virtual machines and can basically start and stop two different kinds of agents: places and communicators. Places provide agents with an interface to the underlying agent system, databases, and all other operating system services. Communicators have the ability to transfer agents from one Agent Server to another. Each server maintains its configuration settings in the GypsyConfig class. This class implements default settings, which can be overridden by a configuration file. From this configuration the server knows the address of the local place registry, which it asks for the route of an agent, and the local SMTP host for sending e-mail to the user. Each server instantiates at least one Communicator for sending and receiving agents, an AdminCommunicator, through which the system administrator can control and maintain the server with the remote administration tool, and a Place, e.g., LogPlace. When the server is up and running, the system administrator can add special functionality to the server by creating new places and transferring them to the AdminCommunicator. There they are registered at the server and started. All configuration and administration is done with a remote administration tool, which supports the set-up and shutdown of servers and agents. A special classloader fetches code that is not currently installed on the server from given codebases.

The Gypsy environment supports three different types of agents:

- one-hop agents
- multi-hop agents
- embedded agents

One-hop agents can hop just once from one server to another. Gypsy supports two types of one-hop agents: communicators and places. Communicators are one-hop agents that can communicate with other communicators of the same type over a network. Mobile agents can only be executed at places. The places provide mobile agents with the interface to the underlying operating system services. Multi-hop agents are the default agent model in Gypsy. They can hop between different locations on the basis of a fixed, preset travel list. Embedded agents cannot travel of their own volition; they are used as stationary agents on a server or are plugged into a mobile agent.

Agents in the Gypsy environment travel between places. These places are located at servers, where they provide special services to the visiting agents. A server is a process on a host and can run several different places. The server also provides underlying services, such as database systems, for special places. Each server also has or knows a special place registry, which is accessible to the agents through the place interface. Mobile agents can query these place registries to find new interesting places to go to. The agent asks the current place to transfer it to the new place. At the moment, the agent's route is implemented as a fixed list. The place hands the agent over to the server, which has at least one communication interface for sending and receiving mobile agents over the network. In the current implementation the agent itself does not know how it is transferred. One of the main design goals of the Gypsy environment is the implementation of a flexible and dynamically extensible environment for experimenting with agent programming.

3.3.1 Gypsy Mobility Mobile Agents are composed of the internal state and the executable code. Therefore agent mobility means the network transportation of both code and state. As stated earlier, agent mobility is accomplished by special one-hop agents called Communicators. An agent in Gypsy travels between places according to its itinerary. These places are identified by Location objects, which are stored in the place registry. When a user creates an agent, he must give it the first location to move to. He also defines the agent's reporting period, for example 5 hops. Then the agent queries the local place registry for interesting places and starts its travel. When an agent arrives at the place, it obtains a handle to that place and can at any time ask the place to transfer it to the next location on its route. Before the agent asks to be transferred, it registers its next destination on the place's blackboard. Then the place hands the agent over to the server, the server checks whether the next location is on this server and if so, transfers the agent to that place. If the next location is located on a remote server, the server looks for compatible communicators of the next location and hands the agent over to proper communicator. The agent terminates itself when no more locations are left in its routing table and it has no home URI. Otherwise, it returns to its home server. Gypsy currently implements two different communicators for transferring the agent: An RMIAgentCommunicator and an EmailAgentCommunicator. The agent communicator checks the incoming state of an agent for its serialization uniform identifier (UID). If the classfile with the same serialization UID is already installed on this host, the communicator creates an instance of the agent and hands it over to the desired place. If the serialization UIDs of the data of the data and the classfile differ, the communicator invokes a special classloader to load the proper classfile from the given URL. 
These classes are loaded on demand on a just-in-time basis from a given codebase. Gypsy implements a GypsySecureJarClassloader which loads classes from signed JAR archives. These archives can be retrieved from the local harddisk or by making an HTTP request to a web server. The system administrator can define a list of trusted codebases from which archives are accepted. If the agent consists of classes not found in the list of these trusted codebases, the agent's access to the server is denied. Gypsy uses the Java security mechanisms for secure class loading as described in [Gong98]. Because it uses Java, code of mobile agents in Gypsy is expressed and transferred as Intermediate (Byte) Code.
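The two checks described above (comparing serialization UIDs and accepting only trusted codebases) can be sketched in plain Java. This is a minimal illustration, not the Gypsy implementation: the class `ClassCheckSketch`, the nested `DemoAgent`, both helper methods and the example URIs are assumptions made for this sketch. Only `java.io.ObjectStreamClass` and `java.net.URI` are standard library classes.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class ClassCheckSketch {

    /** Hypothetical agent class, present only so the sketch has something to inspect. */
    static class DemoAgent implements Serializable {
        private static final long serialVersionUID = 42L;
    }

    /** True if the locally installed class carries the UID found in the incoming state. */
    static boolean localClassMatches(Class<?> localClass, long incomingUid) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(localClass);
        // lookup() returns null for classes that are not serializable
        return desc != null && desc.getSerialVersionUID() == incomingUid;
    }

    /** True if the codebase lies under one of the trusted prefixes (each should end with "/"). */
    static boolean isTrustedCodebase(URI codebase, List<URI> trusted) {
        String candidate = codebase.normalize().toString();
        for (URI prefix : trusted) {
            if (candidate.startsWith(prefix.normalize().toString())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical trusted codebase list, as an administrator might configure it
        List<URI> trusted = Arrays.asList(URI.create("http://code.example.org/agents/"));
        System.out.println(localClassMatches(DemoAgent.class, 42L));
        System.out.println(localClassMatches(DemoAgent.class, 7L));
        System.out.println(isTrustedCodebase(
                URI.create("http://code.example.org/agents/scan.jar"), trusted));
        System.out.println(isTrustedCodebase(
                URI.create("http://evil.example.net/scan.jar"), trusted));
    }
}
```

A prefix comparison is only one possible way to match a codebase against the list. Verifying the signature of the JAR archive, as the text describes for the GypsySecureJarClassloader, is a separate step that this sketch does not cover.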

3.3.2 Gypsy Security

The existing Gypsy platform protects the host from malicious agents only through standard use of the Java sandbox security model [Rubin98]. For this reason, many new security features are added in "Secure Gypsy (SG)". Agents are protected from attacks by other multi-hop agents by the underlying agent security manager (ASM) of the host, which denies any access from a visiting multi-hop agent. This security feature is upgraded and integrated in SG.


The servers are protected against hostile agents through a special Java ASM and a class loader. The server administrator can define runtime security policies such as file read/write access and the hosts and ports that may be contacted. To protect the server from malicious code, the administrator can define a list of URIs of trusted codebases, which is used by the secure classloader in the AgentCommunicator. An incoming agent is rejected if its code cannot be retrieved from the given list.

The EmailAgentCommunicator sends serialized agents as MIME attachments to mails, either directly or after encrypting the serialized data using the Pretty Good Privacy (PGP) program. Each email communicator has a special email address where a local monitor thread accepts incoming mails. Such a mail can contain the serialized state of an agent or a request for the public PGP key of this communicator, which is needed to transfer an agent securely over the network.

To provide easy system administration, a special Admin-Communicator is used to add, upgrade or remove places and communicators from the remote administration tool. Each Gypsy server (except the user front-end) starts by default an RMISysadminCommunicator, which waits for incoming one-hop agents. Only agents of type Place or Communicator are accepted. The incoming agent is checked, and if it does not exist on the server, it is started. If an instance of the incoming agent is already running on the server but has an older version number, the running instance is notified to terminate and the new agent is started.

The underlying network must be protected from mobile agents that roam the network forever and clone themselves indefinitely. Therefore Gypsy does not support the cloning of agents, and each multi-hop agent in Gypsy has a default deadline of 24 hours, which can be adjusted by the user.

Thus, a server can protect itself from malicious agents by using a special configurable security manager and can ensure untampered code through the use of a special secure classloader.

3.4 Secure Infrastructure

Figure 3.5 shows a global secure infrastructure which secures the existing mobile agent platform Gypsy and its functionality in this system. All new features described in this section are integrated in the existing Gypsy platform. This extended and integrated mobile agent platform is called "Secure Gypsy (SG)". The agent servers as well as the users of this system have secret keys. A PKI provides a safe way of securing information such as the private keys of agent servers and hosts. Consequently, agent servers and users need not trust anybody but themselves and the Certification Authority (CA), which guarantees that those certified are who they claim to be. The CAs thus issue digital certificates and validate the holders' identity and authority. In this system I assume one CA for the agent servers' certificates and a separate CA for the users' certificates. This separation is not strictly necessary, because both kinds of certificates are X.509 certificates, but it helps to achieve a clear system design. This part of the system is called the Certification Authority Space (CAS).

A Local Registration Authority (LRA) is a PC application which allows a potential user of another system to be registered in the CAS and have a secret key generated. Holders of secret keys generate their keys themselves after they have been registered in the CAS through the LRA offices. They can then send a certification request to the CAS and receive a certificate. The overall picture of the security infrastructure is shown below.

Each Agent Server (AS) has an AS-API, which accesses the Internet, mainly in order to contact CA servers, and which resembles the current TCP/IP API used by Gypsy as closely as possible. In this way Gypsy becomes as replaceable as possible in this system. The encryption/decryption of agents and the validation of certificates is done locally by the Secure Information Space (SIS), where all local keys are stored. The SIS communicates with the Agent Servers via the AS-API and is also responsible for all agent permissions. This allows the system to remain independent of the current PKI implementation. The CAS-API manages the communication between the SIS and the CAS, and because the CAS-API is stable, this solution is also independent of the chosen CAS. For more information on the secure infrastructure and especially the SIS module see sections 4.4 and 4.5.

Figure 3.5: Global secure infrastructure

[Figure 3.5 (diagram) labels: CAS containing LRAs and CAs; CAS API; SIS; AS API; Agent Server (AS)]

The users use the CAS and the LRA for generating and administrating a large number of secret keys and certificates (e.g., for each user, each server, and each agent in the system). Consequently, these components must themselves satisfy strict security requirements, at least as strict as the security requirements for the system which is to be secured. The LRAs communicate with the CA servers via TCP/IP. All communication between the LRA and the CA is encrypted and authenticated. At an LRA application, a user or owner of an agent server can have a secret key generated and registered at the CA using a preregistration which is sent from the LRA to the CAS. The user or agent server owner can then bring the secret key to the agent server application, where he must register his new key. In this step, a request for a certificate is sent to the CAS, and the final certificate is generated and sent to the user. This request can be an encrypted request as used by the Microsoft and Netscape web browsers.


3.4.1 LRA Description The version of the LRA used in this system is a PC application, which is run by a Local Registration Authority Administrator (LRAA). It allows a potential user of another system to be registered and have a secret key generated. Normally this registering will only take place after the potential user has been cleared by the LRAA. In the setup used in this system, when the user is cleared, he will receive a one-time identity and an initial authentication key (IAK). The user generates his own private/public key pair on his own application. The information received at the LRA allows him to make a certification request to the CA, and he will receive a certificate registered in the CA by contacting the CA.

3.4.2 CA Description The CA servers needed for this system will be run by one CAS installation. The internal components of a CAS are the CA server itself, a database and the RDS (Retrieve Data System) which is used for some of the communication with the LRA. As the heart of a high security facility, the CAS is protected against attacks from insiders with physically separated and hard-coded encrypted keys as well as against intruders. In particular the persons operating CAS have specific roles, designed to guarantee easy access to simple operations, combined with only highly trusted persons being able to do critical operations. In addition to the strict access control, an audit log of critical operations is stored in a database. Each entry in this audit log is strictly protected and entries in the audit log have consecutive numbers. Thus changes in the audit log done by directly accessing the database are immediately revealed. Two particularly trusted persons are necessary in order to carry out the most critical tasks. The System Administrator is responsible for ensuring that CAS and its environment are working properly. The Certification Authority Administrator handles registrations and certificates.
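The role of the consecutive numbers in the audit log can be illustrated with a short sketch. It assumes a hypothetical `Entry` type and a `firstBreak` helper, neither of which is taken from the CAS software. It covers only the numbering check: a deleted or inserted entry breaks the sequence, whereas a change to the content of an entry has to be caught by the separate protection of each entry mentioned above.

```java
import java.util.Arrays;
import java.util.List;

public class AuditLogSketch {

    /** Hypothetical audit entry: a sequence number plus a description of the operation. */
    static final class Entry {
        final long number;
        final String operation;

        Entry(long number, String operation) {
            this.number = number;
            this.operation = operation;
        }
    }

    /**
     * Returns the index of the first entry whose number does not follow its
     * predecessor by exactly one, or -1 if the numbering is consecutive.
     */
    static int firstBreak(List<Entry> log) {
        for (int i = 1; i < log.size(); i++) {
            if (log.get(i).number != log.get(i - 1).number + 1) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Entry> intact = Arrays.asList(
                new Entry(1, "issue certificate"),
                new Entry(2, "revoke certificate"),
                new Entry(3, "register LRA"));
        // Simulates an entry removed by direct access to the database
        List<Entry> tampered = Arrays.asList(intact.get(0), intact.get(2));
        System.out.println(firstBreak(intact));
        System.out.println(firstBreak(tampered));
    }
}
```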


4 System Architecture

Figure 4.1 shows the main components and their interactions in the system, namely:

- Agents (A) – section 4.1
- Agent Server (AS) – section 4.2
- Home Server (HS) – section 4.3
- Secure Infrastructure with Secure Information Space (SIS) – sections 4.4 and 4.5
- Data Analyzer Module (DAM) – section 4.7
- Security Policy Editor (SPE) – section 4.8
- Graphical User Interface (GUI) – section 4.10

Because the agent hops from server to server and the secure infrastructure is installed on each server, these two components are not shown in Figure 4.1. The Rules Data Base (RDB), Security Policy Editor (SPE), and Graphical User Interface (GUI) modules are also described in the following sections. The Data Analyzer Module (DAM) is part of the Monitor as shown in Figure 4.1, but only in the surveillance use case. For the intrusion detection use case the DAM is part of each agent, which is very important for this work.

Figure 4.1: System Main Components Interaction

[Figure 4.1 (diagram) labels: Monitor with Home Server (HS) 1, RDB, DAM and SPE; User/Admin with GUI browser and login; Agent Server (AS) 1; Agent Server (AS) 2; agent hop]

Both use cases (surveillance and intrusion detection) can easily be supported with this architecture. The basic piece of the system architecture is a host (computer) with an installed agent platform. The agent platform accepts agents, executes them and sends them to another platform installed on a different host. Such a host can be a monitoring station as well as a regular node of a network that is defended by an intrusion detection (ID) system, and hence is called an Agent Server (AS). The whole installation consists of monitored stations with installed Agent Servers which are connected by a network. Mobile agents travel between these servers and fulfill their tasks.

A simple AS can be augmented by additional components which extend its functionality. When a component that allows a user to interact with the agent system (e.g., via a graphical user interface) is installed, the Agent Server is called a Home Server (HS). Under certain circumstances, the simple interface of the Home Server is not sufficient to perform the desired tasks (e.g., when agents need to be configured according to user wishes or when agent results need to be post-processed). Under such conditions, the Security Policy Editor (SPE) has to be added. The SPE allows a user to configure a security policy by selecting certain constraints that have to hold within the installation. These constraints (e.g., no more than three invalid login attempts are allowed per user before his account is disabled; NFS drives must not be exported world-writable and public) are stored in a Rule Database (RDB) and are handed over to agents in a suitable form. The agents are eventually started and roam the network. When they return, their results are checked against the RDB, and any violations detected are reported to the user. In the special case of surveillance, the gathered data is transferred to the Data Analyzer Module (DAM), where it is analyzed locally.
In this case, a host that runs an HS as well as an SPE and a DAM module is called a monitor. Such a setup is shown in Figure 4.1 above. It is preferable that the communication between the user and the SPE (which can also be remote) is done in HTML or XML via a secure channel (e.g., HTTPS), so that a simple browser can be used as a GUI. As mentioned above, for intrusion detection the DAM is not located stationary in the monitors but in the mobile agents themselves. Thus, the data analysis and the intrusion detection are fully distributed.

In this system an agent consists of two parts:

1. Agent State – the agent's data together with management information (e.g., user id)
2. Agent Code – its code as a Java class file, which is downloaded separately from a code base server.

Only the state is transferred during the agent's hops from one host to another. After that, the code is downloaded from the Code Base Server (CBS) if it is not locally available, as shown in Figure 4.2.

An agent runs on an Agent Server in a certain place. A place provides a run-time environment for an agent by allowing it to call certain functions. It can be described as an interface to the underlying operating system for accessing needed resources. Each place provides a function to transfer an agent to a different place located on the same Agent Server or on a different one. When an agent calls the transfer function of a place, its


execution is suspended and it is handed over to a communicator. A communicator is an Agent Server module which is responsible for sending and receiving agents (or rather their states). A number of different communicators may exist that support different transfer technologies (e.g., plain socket connections, RMI or email). Each Home and Agent Server has at least one communicator which the agent can use for jumping. The agent itself does not necessarily know the mode of transport and most of the time will not be informed of it. When an agent's state has been successfully transferred, the agent's code is loaded from the agent's code base. The code base itself consists of all locally available classes and references to available Code Base Servers. When code for an agent is locally available it is taken from there; otherwise it is loaded from a CBS.

Sending an agent over the network has a number of security implications, which are touched on in the following paragraphs and detailed in sections 4.4 and 4.5. The security management is done by the SIS module, which uses a number of key pairs (public/private). An important aspect of the system is the transfer of an agent between different agent servers. Such a transfer is called a hop and is shown in Figure 4.2.

Figure 4.2: Agent's Hop

[Figure 4.2 (diagram) labels: Agent (A) State; Agent (A) Code; Place and Communicator on Agent Server (AS) 1 and on Agent Server (AS) 2; Agent's Hop; Code Base Server (CBS) 1; SIS with key pair on each server]

The following security considerations apply when an agent is transferred from one Agent Server to another:

1. The state of the agent could be modified or monitored during transit.
2. The agent's code could be modified when sent from the CBS to the agent's destination host.
3. The information the agent carries can be confidential (e.g., security leaks in other systems) and can be tapped during transport.
4. Agents can, deliberately or by accident, fail to terminate as desired or consume too many resources.

To combat these threats, a special communicator is designed. This communicator uses a secure PKI for transferring agents, ensuring that threats 1, 2, 3 and 4 above do not occur. The PKI cryptography used in this system involves three Certification Authorities (CAs): one certifying users, one certifying servers and one certifying code bases. The design also allows the introduction of more CAs, for example for more types of users and hosts with different permissions. At present there exist no efficient methods of securing agents against attacks from hosts, and it seems unlikely that such methods will become available. Thus the security of the agents relies on the PKI, which makes sure that only trusted hosts and users are on the system. This is explained further in section 4.5.

A host in this system distinguishes between three different types of users, namely:

- Administrators
- Regular users
- Owners of agents (who may or may not be able to log in directly)





Usually, users interact with a monitor host only, but some users can also be logged in on a monitored host, and such users can start an agent from there if a GUI is available. Users of the first two types may or may not hold a certificate issued by a trusted CA (i.e., be certified). The user accounts are administrated locally on each host, and their permissions are saved in the local permission table on each host as well. The differences between an administrator and a regular user are as follows:

1. An administrator has the right to install new users on the local host.
2. An administrator may add/remove places or communicators to/from an AS.
3. An administrator may modify the local permission table (explained below).

The third type of user is administrated globally by a CA. A host will not execute an agent belonging to an uncertified user, and it will also not execute an agent coming from an uncertified host.

Overall System Design

In the following paragraphs, I first show the simplified overall system design and then explain the functionality and the whole architecture in detail. The components of the whole system with two agents and an intruder are illustrated in Figure 4.3.

Figure 4.3: System with two agents and an intruder

[Figure 4.3 (diagram) labels: CAS with CA 1, CA 2 and LRA; GUI; Host 1; Host 2; Host 3; Intruder. Legend: agent's hop, communication, users' registration, local user, system user]

The missing arrow between hosts 1 and 3 illustrates the configurability of the system. Some jumps are not allowed in this configuration, which is suitable, for example, for monitoring hosts located in different organizations. Each component is described below. Figure 4.4 shows the overall system architecture in detail.

The agent platform used for further discussion is the Secure Gypsy (SG) platform. Although the fundamentals of this system (e.g., the type of data access, the processing of the data, the overall system security) are independent of this platform, the actual implementation has to be integrated into the existing Gypsy system. Thus, the conceptual framework and several features have been adopted. In the first version of this system, each Agent Server is equipped with a SystemTCP/IP Communicator to transfer mobile agents and System Places to provide functions for surveillance and intrusion detection. The communicators, places and Agent Servers are described in detail in section 4.2 and the Home Server in section 4.3.

This system is independent of the mobile agent platform because the user and data managing modules (i.e., the SPE) are separated from the SG platform by the Home Server (HS)-API. This HS-API can be used to launch agents as well as to receive their results in a basic way. Agents can roam the network and return to their home server (to special user places) even when the user is offline, and results can be claimed later on (via the HS-API). An Agent Server needs a way to access data needed by running agents. This should be


done in an OS independent fashion and is realized by the Data Manager (DM). The DM is an embedded agent which is responsible for directly accessing system resources (e.g., log files, processes) and for providing access for agents via the generic Data Manager (DM)-API. This module can also pre-process log data, monitor its integrity or collect data itself. A detailed description is given in section 4.6.

The user can access the system with a GUI (preferably a browser). The GUI, which is described in detail in section 4.10, should at least support launching agents and viewing their results. Additionally, the SPE and the DAM can be managed via the same GUI. The communication between the SPE and the GUI is secured and provides a login facility to authenticate users. The SPE uses an RDB to store the current security policies. The SPE is described in section 4.8 and the DAM in section 4.7. The SPE module translates all security rules into a suitable form. After that, the rules are checked against the results obtained by agents. Any violations are reported and forwarded as results to the user GUI. Both the SPE and, in the surveillance use case, the DAM communicate with Home Servers via the HS-API; this communication is explained in detail in Chapter 5. The SPE uses the current security policy to configure and start agents with the correct constraints based on the given security rules. The SPE uses the HS-API to receive agent results.

As mentioned earlier, this system includes a public key infrastructure (PKI), realized by Certification Authorities (CAs) that are grouped together as a Certification Authority System (CAS). The encryption/decryption of agents and the validation of certificates is done locally by the Secure Information Space (SIS), which communicates with the Agent Servers via the Agent Server (AS)-API. This allows the system to remain independent of the current PKI implementation. The CAS-API manages the communication between the SIS and the CAS, and the Local Registration Authority (LRA) is used by administrators to obtain certificates for users. This part of the system is provided by external software and can be replaced by any PKI system which has the same or similar functionality as described in this work.
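The step in which rules are checked against agent results can be sketched as follows, using the login constraint given earlier as an example (no more than three invalid login attempts per user). The class `MaxInvalidLoginsRule`, its `check` method, the host name and the shape of the result data are assumptions made for illustration. The actual rule format of the SPE is described in section 4.8.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RuleCheckSketch {

    /** Hypothetical rule: a user may have at most `limit` invalid login attempts. */
    static final class MaxInvalidLoginsRule {
        final int limit;

        MaxInvalidLoginsRule(int limit) {
            this.limit = limit;
        }

        /** Returns one message per user whose count exceeds the limit, ordered by user name. */
        List<String> check(String host, Map<String, Integer> invalidLoginsPerUser) {
            List<String> violations = new ArrayList<String>();
            Map<String, Integer> sorted = new TreeMap<String, Integer>(invalidLoginsPerUser);
            for (Map.Entry<String, Integer> e : sorted.entrySet()) {
                if (e.getValue() > limit) {
                    violations.add(host + ": user " + e.getKey() + " has "
                            + e.getValue() + " invalid logins (limit " + limit + ")");
                }
            }
            return violations;
        }
    }

    public static void main(String[] args) {
        // Hypothetical result data as an agent might bring it back from one host
        Map<String, Integer> result = new HashMap<String, Integer>();
        result.put("alice", 2);
        result.put("bob", 5);

        MaxInvalidLoginsRule rule = new MaxInvalidLoginsRule(3);
        for (String violation : rule.check("as1.example.org", result)) {
            System.out.println(violation);
        }
    }
}
```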


Figure 4.4: Overall System Architecture

[Figure 4.4 (diagram) labels: Host 1 to Host 5; Home Server (HS); System User Places; HS API; Agent Server (AS) 1; Agent Server (AS) 2; System Place; Agent SM; A API; DM API; Data Manager; System TCP/IP Communicator; SIS; AS API; EE API; local perm. table; CAS with CAs, LRA and CAS API; Code Base Server 1 with Code Base; SPE; DAM; RDB; GUI Browser (https, Login); User/Admin; Admin; Keys certification; OS File; OS Port Sensor; Host Spec.; Data; R/W; Agent (A) Part 1 (State, ID, ...); Download agent code (Part 2, .jar). Legend: agent's hop; only for surveillance use case; two-way interaction]

4.1 Agents (A)

Agents are computational entities which act on behalf of other entities in an autonomous fashion. They perform their activities with some level of pro-activity and/or reactivity and exhibit some degree of the key attributes of learning, cooperation and mobility. Mobile agents can move from one location to another, perform some work, react to environmental conditions and eventually deliver useful results to the user. This system supports three different types of SecureGypsy (SG) agents:

- one-hopSG agents
- multi-hopSG agents
- embeddedSG agents



One-hopSG agents can hop just once from one server to another. When these special agents arrive at the destination server, they are started and fulfill their task until they are upgraded, removed or reach their deadline. Because their travel is limited, they can send only a simple status report back to their owner. Places and communicators are examples of one-hop agents in this system, because during creation they can jump to the remote host and can then be administrated there remotely.

Multi-hopSG agents are the default agent model in this system. They can hop between different locations on the basis of a fixed, preset travel list, or search for new, interesting locations to visit on their route. This system uses multi-hop agents to implement the surveillance and intrusion detection functionality. These agents are equipped with algorithms to scan hosts for vulnerabilities and security policy violations and return results to Home Servers. There are two immediate subclasses of multi-hop agents that are of interest in this system, namely the Surveillance Agent and the Intrusion Detection (ID) Agent. Each class of agents can in turn be subclassed to implement agents which offer extended functionality.

EmbeddedSG agents also fulfill specific tasks, but they cannot travel of their own volition and are used as stationary agents on a server or are plugged into a mobile agent. An example of an embedded agent is the Data Manager, explained in detail in section 4.6, which manages the access to local resources (e.g., files or processes). Figure 4.5 gives an overview of the main types of agents in this system.


Figure 4.5: Main types of agents

[Figure 4.5 (diagram) labels: SG Agent; Multi-hopSGAgent; One-hopSGAgent; EmbeddedSGAgent; Place; Communicator; Data Manager; System Communicator; System Place; User Place; Log Place; SystemTCP/IP Communicator; Surveillance Agents; Intrusion Detection Agents. Legend: arrow means "is an agent of type"]

In the current version, this system uses RMI communicators, but a TCP/IP communicator using sockets could be implemented as well. In addition, a system place for executing agents has to be installed on each Agent Server (see section 4.2) and a user place on each Home Server (see section 4.3). It is very important that all task-specific agents (surveillance agents and intrusion detection agents) are properly configured via the HS-API. The SPE module is responsible for starting agents with the correct parameters on user places by means of the user communicator; that is, it extracts constraints from the RDB and passes them to the agent system (over the HS-API). Results are returned (over the HS-API) directly to a user GUI or, for surveillance, to the DAM module.

4.1.1 Design of SG Agents A mobile agent is an object that visits agent-enabled servers in a computer network. A mobile agent in this system is a Java Runnable object, therefore it can be executed as a


dedicated thread at special places. As mentioned above, the Secure Gypsy (SG) environment supports three different types of agents: one-hopSG, multi-hopSG, and embeddedSG agents. All three types share one SGAgent superclass, which extends ThreadAgent, which in turn extends java.lang.Thread. Figure 4.6 shows the basic agent interfaces and classes in a UML class diagram.

All one-hopSG and multi-hopSG agents have an SGAgentInfo object containing attributes that describe the agent, its requirements and its purpose to the server. These include a unique agent identifier, the version number, the manufacturer, the creation time, the owner name, the email address of the owner, the URI of the agent's codebase and, in the case of multi-hopSG agents, the URI of the home server of the owner. Because the travel of one-hopSG agents is limited, they can send only a simple status report back to the owner. Thus they do not need a Result object. One-hopSG agents are mainly used by the remote administration tool to transfer new functionality or upgrade existing functionality on the server. The SystemPlace provides the execution environment for the SGAgents.

As mentioned above, SGAgent is a superclass of all multi-hopSG agents and defines the common functionality of each agent in this system. Multi-hopSG agents have a route vector containing a list of locations to visit and a history log to keep track of the status and failures during the agent's travel. Each multi-hopSG agent also has a Result object for storing the results obtained during its travel. The result is organized as a hierarchical tree containing the agent's ID, the server URI, the place on the server and the obtained result itself. The result can be interpreted by a special embeddedSG agent called ResultWriter. EmbeddedSG agents are used as stationary agents on a server (e.g., a result writer) or are plugged into a mobile agent.
An example of an embedded agent is a DataManager which is described in section 4.6.
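The class design described in this section can be sketched as follows. The class names SGAgentInfo, Result, SGAgent and MultiHopSGAgent come from the text; all fields, constructors and method signatures are assumptions, only a subset of the SGAgentInfo attributes is shown, and the intermediate ThreadAgent class is omitted for brevity. The `run` method merely simulates a journey inside one JVM. A real agent would be transferred by a communicator between the visits.

```java
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AgentDesignSketch {

    /** Descriptive attributes of an agent (subset of those named in the text). */
    static class SGAgentInfo implements Serializable {
        final String agentId;
        final int version;
        final String ownerName;
        final String codebaseUri;
        final String homeServerUri; // only meaningful for multi-hop agents

        SGAgentInfo(String agentId, int version, String ownerName,
                    String codebaseUri, String homeServerUri) {
            this.agentId = agentId;
            this.version = version;
            this.ownerName = ownerName;
            this.codebaseUri = codebaseUri;
            this.homeServerUri = homeServerUri;
        }
    }

    /** Result tree: agent ID, then server URI, then place, then the obtained results. */
    static class Result implements Serializable {
        final String agentId;
        private final Map<String, Map<String, List<String>>> byServer =
                new TreeMap<String, Map<String, List<String>>>();

        Result(String agentId) {
            this.agentId = agentId;
        }

        void add(String serverUri, String place, String value) {
            Map<String, List<String>> places = byServer.get(serverUri);
            if (places == null) {
                places = new TreeMap<String, List<String>>();
                byServer.put(serverUri, places);
            }
            List<String> values = places.get(place);
            if (values == null) {
                values = new ArrayList<String>();
                places.put(place, values);
            }
            values.add(value);
        }

        List<String> get(String serverUri, String place) {
            Map<String, List<String>> places = byServer.get(serverUri);
            if (places == null || places.get(place) == null) {
                return new ArrayList<String>();
            }
            return places.get(place);
        }
    }

    /** Common superclass; every agent runs as a thread and carries its info object. */
    abstract static class SGAgent extends Thread {
        final SGAgentInfo info;

        SGAgent(SGAgentInfo info) {
            this.info = info;
        }
    }

    /** Multi-hop agent with route, history log and result, as described in the text. */
    static class MultiHopSGAgent extends SGAgent {
        final Deque<String> route = new ArrayDeque<String>();
        final List<String> history = new ArrayList<String>();
        final Result result;

        MultiHopSGAgent(SGAgentInfo info, List<String> locations) {
            super(info);
            route.addAll(locations);
            result = new Result(info.agentId);
        }

        /** Simulates visiting every location on the route; no real transfer takes place. */
        @Override
        public void run() {
            while (!route.isEmpty()) {
                String server = route.poll();
                history.add("visited " + server);
                result.add(server, "SystemPlace", "no violation found");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SGAgentInfo info = new SGAgentInfo("agent-1", 1, "admin",
                "http://code.example.org/agents/", "rmi://hs1.example.org");
        List<String> locations = new ArrayList<String>();
        locations.add("rmi://as1.example.org");
        locations.add("rmi://as2.example.org");
        MultiHopSGAgent agent = new MultiHopSGAgent(info, locations);
        agent.start();
        agent.join();
        System.out.println(agent.history);
    }
}
```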


Figure 4.6: SG Agents Design as UML Class Diagram

[Figure 4.6 (UML class diagram) labels: java.lang.Thread; java.lang.Runnable; java.io.Serializable; ThreadAgent; SGAgent; EmbeddedSGAgent; OneHopSGAgent; MultiHopSGAgent; SGAgentInfo; Result; ResultWriter; Place; SystemPlace; SystemIDAgent; SystemSurveillanceAgent; DataManager; associations with multiplicity 1]

4.2 Agent Server (AS)

An Agent Server is the basic piece of the agent platform and is implemented as a software component written in Java that provides the necessary infrastructure for agents. Each host that wants to participate in the intrusion detection or surveillance system has to run at least one Agent Server. If a host runs more than one Agent Server, it is an overlapping point of several such secure systems, in the same network or in different networks.


The Agent Server provides the necessary functionality for executing an agent, transferring it to another agent server and making sure to secure the local host against evil agents as well as protecting agents from network attacks while they are transferred. As shown in Figure 4.7, an Agent Server (AS) provides agents with places, communicators and an Agent Security Manager (ASM), which are each explained below in more detail. Places form the environment in which agents are executed, while Communicators are needed to transfer an agent from one agent server to another one. Different places can offer different services to agents, so some agents require special places to run. Places and communicators are realized as special types of agents, which - as described later - cannot move once they are set up. Figure 4.7: Agent Server (AS) Host

[Figure 4.7 (diagram) labels: Agent Server (AS); Agent Communicator with receive() and transfer(); ASM; Admin Communicator with receive(), removePlace(), removeComm(), getStatus() and shutdown(); Places (Special Place, Common Place, Log Place); Host Services (DB, SMTP, Files)]

4.2.1 Agent Security Manager (ASM)

The Agent Security Manager (ASM) is responsible for the security of the system. In particular, attacks by agents against the Agent Server or the underlying host have to be prevented. This is achieved by defining the Agent Security Manager to be the security manager for the Java Runtime Environment (JRE) that runs the Agent Server. The ASM checks whether sensitive calls of a user agent are legitimate or not. User agents are not allowed to read or write files, to access the network device or to start processes directly. All these OS functions are offered and performed by the Data Manager, which is explained later in section 4.6. In the ASM, a check is made every time an agent tries to perform a security-critical operation, and it is determined whether the calling thread is allowed to continue the operation (e.g., a file read operation) or not. The permissions for each agent are assigned when the agent arrives at a host, so the ASM must be configurable at runtime. The permissions are determined according to the corresponding entry in the local permission table, which is retrieved by the SIS module immediately after an agent has arrived at a host. When an agent arrives at an Agent Server (i.e., at its communicator), it


is started in a new thread at a suitable place. At the same time, the thread's (i.e., agent's) permissions are transmitted to the ASM. For more information about this part of the system see section 4.5.
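The per-thread permission check described in this section can be sketched as follows. This is a simplified stand-in rather than the ASM itself: it does not extend `java.lang.SecurityManager`, and the `Permission` enum, the class `AgentSecurityManagerSketch` and its methods are hypothetical names. The sketch shows only that permissions are registered per agent thread at arrival and consulted on every security-critical call.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AsmSketch {

    /** Hypothetical permission categories; the real permission table is described in section 4.5. */
    enum Permission { FILE_READ, FILE_WRITE, NETWORK, EXEC }

    static class AgentSecurityManagerSketch {
        // Permissions of each running agent thread, set when the agent arrives
        private final Map<Thread, Set<Permission>> granted =
                new ConcurrentHashMap<Thread, Set<Permission>>();

        /** Called by the receiving side once the agent's permissions are known. */
        void register(Thread agentThread, EnumSet<Permission> permissions) {
            granted.put(agentThread, EnumSet.copyOf(permissions));
        }

        void unregister(Thread agentThread) {
            granted.remove(agentThread);
        }

        /** Throws SecurityException unless the calling thread holds the permission. */
        void check(Permission needed) {
            Set<Permission> perms = granted.get(Thread.currentThread());
            if (perms == null || !perms.contains(needed)) {
                throw new SecurityException("agent thread may not perform " + needed);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final AgentSecurityManagerSketch asm = new AgentSecurityManagerSketch();
        Thread agent = new Thread(new Runnable() {
            @Override
            public void run() {
                asm.check(Permission.FILE_READ); // granted, returns normally
                try {
                    asm.check(Permission.NETWORK); // not granted
                } catch (SecurityException e) {
                    System.out.println("denied: " + e.getMessage());
                }
            }
        });
        asm.register(agent, EnumSet.of(Permission.FILE_READ));
        agent.start();
        agent.join();
        asm.unregister(agent);
    }
}
```

Unregistered threads are denied by default in this sketch, so an agent that reaches a place before its permissions have been registered cannot perform any checked operation.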

4.2.2 Communicator

Communicators have the ability to transfer agents from one Agent Server to another. This is done by first serializing the agent's state (marshalling) and optionally encrypting the resulting byte stream. For security reasons it is important to note that only the state of the agent is marshalled and encrypted, not the whole class (with its bytecode). In this system, the encryption of the state is done by the SIS, which is explained in section 4.4 and can be seen as a black box from the communicator's point of view. The communicator sends the resulting byte stream to another host, where it is received by another communicator of the same type. The type of the actual transmission may vary (e.g., TCP/IP, RMI, email), but for every type of communication a dedicated communicator has to exist. Clearly, the sending and receiving communicators have to be compatible (of the same type).

After the receiving communicator has fully received the agent state, the state has to be decrypted using the SIS. It is then determined whether the agent class is available locally or not. If it is not, the signed class has to be downloaded from the Code Base Server (CBS). Next, the permissions of the agent are determined. They are requested from the SIS using the methods supplied by the AS-API and passed to the Agent Security Manager. Finally, the agent is restored and handed to a suitable place where it can continue its operation.

Each server has one or more different communicators. Because of the component-based design it is possible to implement new communicators for upcoming transfer technologies. These new communicators can be sent to running servers to upgrade them dynamically, which allows servers to support new transfer technologies without affecting their running places and other agents. The following communicators must be implemented in this system:







- SystemTCP/IPCommunicator: used for the secure transfer of agents, integrated with the PKI. Agent Communicators provide the basic functionality for sending a regular (multi-hop SG) agent from one Agent Server to a remote location.
- SystemUserCommunicator: used for launching, and in particular for constructing, agents from SPE commands and scripts. It is also used for returning data to the SPE. User Communicators are used to interact with the agent system and are installed at Home Servers (see next section for details). They can be used to transmit result data from an agent to its user and to launch new agents.
- SystemAdminCommunicator: used for installing places and communicators in this system. Admin Communicators enable the system administrator to maintain a server remotely, that is, to create and remove places and communicators. This is the reason why communicators and places are one-hop agents (as shown in Figure 4.8). In this system, a SystemAdminCommunicator is also used for adding and removing users, generating keys, obtaining certificates, etc.
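The marshalling step described above can be illustrated with a minimal Java sketch. It is not the Gypsy implementation: the names AgentMarshalling, StateCipher and AgentState are hypothetical, and the XOR "cipher" is only a placeholder for the SIS, which the communicator treats as a black box. The sketch shows that only serializable state, together with the name of the agent class, goes onto the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Sketch: marshal only an agent's state (never its bytecode) for transfer. */
final class AgentMarshalling {

    /** Hypothetical stand-in for the SIS, a black box from the communicator's view. */
    interface StateCipher {
        byte[] encrypt(byte[] plain);
        byte[] decrypt(byte[] encrypted);
    }

    /** Hypothetical agent state: plain data plus the name of the agent class. */
    static final class AgentState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String agentClassName; // lets the receiver look up or download the class
        final String ownerId;
        final int hopCount;

        AgentState(String agentClassName, String ownerId, int hopCount) {
            this.agentClassName = agentClassName;
            this.ownerId = ownerId;
            this.hopCount = hopCount;
        }
    }

    /** Placeholder cipher (XOR with a constant) so the sketch runs; NOT real encryption. */
    static final StateCipher DEMO_CIPHER = new StateCipher() {
        @Override public byte[] encrypt(byte[] plain) { return xor(plain); }
        @Override public byte[] decrypt(byte[] encrypted) { return xor(encrypted); }
        private byte[] xor(byte[] in) {
            byte[] out = new byte[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = (byte) (in[i] ^ 0x5A);
            }
            return out;
        }
    };

    /** Sending side: serialize the state, then encrypt the byte stream. */
    static byte[] marshal(AgentState state, StateCipher cipher) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cipher.encrypt(bytes.toByteArray());
    }

    /** Receiving side: decrypt the byte stream, then restore the state. */
    static AgentState unmarshal(byte[] wire, StateCipher cipher) {
        byte[] plain = cipher.decrypt(wire);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(plain))) {
            return (AgentState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("state class not available locally", e);
        }
    }
}
```

A receiving communicator would call unmarshal, use agentClassName to decide whether the signed class must be fetched from the CBS, and only then restore the agent.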


Figure 4.8: Communicators Design as UML Class Diagram

[UML class diagram; classes and interfaces shown: java.rmi.Remote, java.io.Serializable, SGAgent, OneHopSGAgent, Communicator, AdminCommunicator, UserCommunicator, TCP/IPCommunicator, SystemAdminCommunicator, SystemUserCommunicator, SystemTCP/IPCommunicator]

4.2.3 Place

Places provide agents with an interface to the underlying agent system, databases, and all other operating system services. The Agent Server (AS) hands an incoming agent (an agent transferred from another Agent Server) to the place. The place offers its services to the mobile agent by giving it a handle (a reference to the place itself), which the agent can then use to call methods of the place. The place runs the agent in a separate thread. This allows for concurrency and makes the services of the place available to an arbitrary number of agents simultaneously. When an agent has finished its work, it can ask the place to start a transfer to a different place by calling a transfer method, which every place has to support. The place hands the agent to a communicator, which performs the transmission of the agent as described in the previous section.

All places are shown as a UML class diagram in Figure 4.9. Each place is a one-hop SG agent and derives from the abstract class Place. The following places exist:












- SystemReturnPlace: a place for executing returned agents in the user GUI.
- LogPlace: a special default place dedicated to logging the activity of agents.
- CommonPlace: a default place for executing agents.
- SystemUserPlace: a place for storing and managing all agents of a particular user until the user front end wants to receive them.
- SystemPlace: a place for executing agents of this system.
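The interplay of place, agent and communicator described in this section can be sketched in Java as follows. The interfaces Agent and Communicator and the method names accept, logMessage and transfer are assumptions made for illustration and are not taken from the SG sources. The sketch shows the handle being passed to the agent, the agent running in its own thread, and the transfer request being delegated to a communicator.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch of a place that runs each incoming agent in its own thread. */
final class PlaceSketch {

    /** Hypothetical agent contract: the agent receives a handle to its place. */
    interface Agent {
        void run(Place place);
    }

    /** Hypothetical communicator contract: performs the actual transmission. */
    interface Communicator {
        void send(Agent agent, String destination);
    }

    static class Place {
        private final Communicator communicator;
        private final List<String> messages = new CopyOnWriteArrayList<>();

        Place(Communicator communicator) {
            this.communicator = communicator;
        }

        /** Called by the Agent Server: start the agent in a separate thread. */
        Thread accept(Agent agent) {
            Thread thread = new Thread(() -> agent.run(this));
            thread.start();
            return thread;
        }

        /** Example of a service offered to agents through the handle. */
        void logMessage(String message) {
            messages.add(message);
        }

        /** Every place supports transfer: the agent is handed to a communicator. */
        void transfer(Agent agent, String destination) {
            communicator.send(agent, destination);
        }

        List<String> messages() {
            return messages;
        }
    }

    /** Waits for an agent thread to finish, restoring the interrupt flag if interrupted. */
    static void awaitQuietly(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because each agent gets its own thread, the shared message list is a thread-safe collection; a real place would have to protect all of its services in a similar way.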


Figure 4.9: Places Design as UML Class Diagram

[UML class diagram; classes and interfaces shown: java.lang.Thread, java.lang.Runnable, java.io.Serializable, ThreadAgent, SGAgent, OneHopSGAgent, Place, SystemPlace, SystemReturnPlace, SystemUserPlace, CommonPlace, LogPlace]

4.3 Home Server (HS)

A Home Server (HS) is a special Agent Server that is permanently connected to the network and has two main duties.

1. It allows agents that have finished their work to return to a special place called the User Place. This place is responsible for storing the agents, but does not execute them. A User Place should be set up for each user of the agent system; returning agents wait there to be retrieved by their owner. The incoming agents are stored and, if the place is configured to do so, the user is notified by email or SMS that the agent has returned (see Figure 4.10).


2. The Home Server provides an interface through which the user GUI (or other components such as the SPE or DAM) can access returned agents or launch new ones, namely the HSAPI implemented by the User Communicator.

Home Servers support detached computing. This means that the user GUI may be disconnected from the network while the agents are performing their work. After an agent has finished its work, it waits at the User Place until it is queried by the user.

Figure 4.10: Home Server (HS) Host

[Diagram of a Home Server host; labels shown: Home Server (HS), Agent Server (AS), Agent Communicator (receive(), transfer()), User Communicator, Places, forward(), status(), launch(), User Place (store(), notify()), Admin Communicator (receive(), removePlace(), removeComm(), getStatus(), shutdown()), Host Services (DB, SMTP, SMS, Files)]

The system uses a user GUI to start agents and display their results. The user chooses the task to be fulfilled and fills out the corresponding forms, including special constraints and the name of the desired starting place. Then the agent is launched. When it returns to the HS, it is transferred to a special Return Place. This place gives the agent a handle to a Result, which can interpret, format, and store the results and the history log of an agent.

SG provides a log server for storing log messages centrally. A log server is a distinguished server to which all Log Messenger Agents can be forwarded. There, all log messages can be stored in a database for further evaluation and system analysis. Each entry contains the log message itself together with additional data such as the date and time and the URL of the server where the log message originated.

4.4 Secure Information Space (SIS)

The Secure Information Space (SIS) is a module used by the system communicator to keep track of the security of a host. Its main job is to determine which agents are allowed to run and with which permissions. The CAS is supposed to administer which users and hosts are allowed to launch agents. Thus the main functionality of the SIS is to administer which CAs the host trusts and to acquire information from the CAs about users and servers who want to launch agents on that host.


A special feature of SG is that, when an agent jumps, not the code of the agent but only its identity and the location where the code is stored are transmitted to the host. Thus the SIS must administer a list of Code Base Servers. Finally, the SIS must administer which users can log on to the host directly, i.e., without sending multi-hop agents.

Users are categorized into System Administrators and Users. This is similar to what is used on most networks today, but less than what is used on systems that are protected against attacks from insiders. A natural extension to Gypsy would be to introduce security administrators, two of whom would be needed in order to change security settings.

The SIS provides three APIs:

- AS-API: for a local agent communicator implementing the external protocols necessary for the agent to jump.
- An API for the application GUI: for keeping track of users and for administrative functions such as setting up new users and generating a new key.
- CAS-API: for handling the communication with the CAs in the system.





At the lowest level of the SIS is the static encrypted information about the security of the host. This includes:

- A list of certificates of users of the host.
- A list of certificates of the system administrators of the host.
- A secret key of the host.
- A certificate of the host corresponding to the secret key.
- A certificate of a CA certifying the hosts in the system.
- A certificate of a CA certifying the users of the system (users or system administrators on all hosts in the system).
- A certificate of a CA certifying the providers of agent code of the system (Code Base Servers).
- A password, which allows the host to use its secret key.
- A security configuration file. The current Gypsy security configuration file will be modified such that it contains information about specific permissions. In particular it must contain information about CAs. (Formatted text file.)
- A static session number, counting the number of startups of the host. (Free text.)
- A random seed.





















The secret key is protected by a password. The password is allowed to reside in plain text in memory, but not on disk. When stored on disk, the password is protected under a key hard-coded into the host. This is not unbreakable, but it is up to the standard of what is possible without using special hardware or requiring a user to be present when the system starts up.

In order to prevent attacks in which certificates are added or replaced, all of the above structures are protected by a Message Authentication Code (MAC) [Mac00]. A MAC is a bit string that is a function of both the data (either plaintext or ciphertext) and a secret key. It is attached to the data in order to allow data authentication.
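As an illustration of such a keyed integrity check, the following Java sketch computes and verifies a MAC with the standard javax.crypto API. The choice of HMAC-SHA256, the class name StaticDataMac and the way the session number is prepended to the data are assumptions of this sketch; the text does not specify which MAC function or data layout the SIS uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Sketch: protect static security data with a keyed MAC (HMAC-SHA256 assumed). */
final class StaticDataMac {

    /** Computes the MAC over the session number followed by the static data. */
    static byte[] mac(byte[] secretKey, long sessionNumber, byte[] staticData) {
        try {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            hmac.update(Long.toString(sessionNumber).getBytes(StandardCharsets.UTF_8));
            hmac.update((byte) '|'); // separator between session number and data
            return hmac.doFinal(staticData);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable or key invalid", e);
        }
    }

    /** Startup check: recompute the MAC and compare it in constant time. */
    static boolean verify(byte[] secretKey, long sessionNumber,
                          byte[] staticData, byte[] storedMac) {
        return MessageDigest.isEqual(mac(secretKey, sessionNumber, staticData), storedMac);
    }
}
```

Any change to the protected data or to the session number yields a different MAC, so an attacker who adds or replaces a certificate without knowing the key cannot produce a matching value.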

Note: The function used to generate the MAC must be a one-way function. The MAC is data associated with an authenticated message that allows a receiver to verify the integrity of the message.

When the host starts up, it loads the static security-related information and checks this MAC. The MAC is updated each time a permanent change takes place, such as the introduction of a new user or a new CA. (In fact it is updated each time the host starts up, since the static session number is increased.) The SIS generates a log file. The entries are numbered and protected by MAC values based on the secret key of the host, so missing or modified entries can be detected.

During its creation, the SIS tries to read the information it needs in order to initialize. If it can read the information, it verifies it. If the verification is successful, it updates the information on disk. If it cannot read the information or the verification fails, an exception is thrown depending on the type of problem. If an exception is thrown, the application is supposed to provide a GUI (on the local machine) for administrator access.

The data of the SIS module will only be damaged if the computer crashes in the middle of an operation that changes static security settings. During normal operation the computer can be switched off abruptly without damaging the static data of the SIS module. The storage could be improved with an automatic backup facility, so that the SIS data is never damaged just because the computer crashes or the host is terminated abnormally.

As explained in the next section, the SIS administers two sorts of security information, dynamic and static. The mechanisms for handling these sorts of information are quite different. An Agent Security Manager (ASM) maintains a small in-memory database of running agent threads and their permissions.
The methods of the ASM mainly provide simple checks of whether the current thread, a specific agent, or a user has a particular permission. The role of the ASM is to handle the thread-specific and agent-specific dynamic data in the SIS, which are closely integrated with Gypsy itself.

The methods of the SIS have a significant overhead, mainly because they involve a large number of cryptographic calculations. (For example, the seed and MAC must always be computed, and various signatures must be generated and verified.) This is not a big problem, since these methods are called only rarely. The methods of the ASM, however, must be optimized for speed.

Jumping is a special case: it involves considerable overhead, but within the framework it should preferably be a common operation. The jumping mechanisms in this system are optimized for security and flexibility. If they were also to be optimized for speed, protocols for negotiating shared keys between hosts that communicate frequently should be included. In this way secure highways can be made available on the most used jumping paths. Other possibilities for increasing average performance are to use the facilities in the PKIX protocols [Pkix00] for bundling requests to CAs and other hosts, and to buffer the results of common security checks. All these optimizations are possible but non-essential and rather complicated to manage, and have therefore been left out.

When an agent jumps, the following happens: The host on which the agent is located issues a certification request to the host to which the agent is jumping. The host


to which the agent jumps then checks the request. The most essential steps taken by the host to which the agent is sent are the following:

- Decrypt the request.
- Check that the request is well formed.
- Get the certificate of the sending host. This is done by sending an instant request to the host CA.
- Check the signature on the certificate of the sending host.
- Check the signature of the request.









If one of the above steps fails, the request is rejected. Otherwise the request is good enough for an agent to run on the host. It must still be checked, however, which permissions can be granted. Because the permissions always depend on the user, the following additional steps are carried out:

- Check the user.
- Get the certificate of the user.
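The two signature checks in the steps above can be sketched with the standard java.security API. This is a deliberately simplified model and not the SIS implementation: the "certificate" is reduced to the CA's signature over the host's encoded public key instead of a real X.509 certificate, decryption and the instant request to the CA are omitted, and the class name JumpRequestCheck as well as the choice of SHA256withRSA are assumptions.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Sketch: accept a jump request only if certificate and request signatures verify. */
final class JumpRequestCheck {

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] sign(PrivateKey key, byte[] data) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(PublicKey key, byte[] data, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // malformed signature or unusable key: treat as a failed check
        }
    }

    /** Mirrors the checks in order; any failing step rejects the request. */
    static boolean accept(PublicKey hostCaKey, PublicKey sendingHostKey, byte[] certSignature,
                          byte[] request, byte[] requestSignature) {
        if (request == null || request.length == 0) {
            return false; // simplified "well formed" check
        }
        if (!verify(hostCaKey, sendingHostKey.getEncoded(), certSignature)) {
            return false; // signature on the sending host's certificate
        }
        return verify(sendingHostKey, request, requestSignature); // signature of the request
    }
}
```

The user-related steps would follow the same pattern with the user CA and the user's certificate before any permissions are granted.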



4.5 Security Model

In this section, I first describe the complex security interactions in Secure Gypsy (SG) among the components of an Agent Server (AS), namely places, communicators, the Agent Security Manager (ASM) and the SIS; then the structure of permissions is described. Some ways of configuring the security features are also given (e.g., the ideal configuration for intrusion detection, which will be used for the system pilot). At the end of this section, I describe all security protocols used in this system.

Securing an agent platform such that the security structure models reality is a complicated task, and it is particularly important that an agent or a normal user is not exposed to the full complexity. For example, agents are not intended to check signatures. If that is needed, it is a separate subject that can be added, but it is beyond the scope of this work.

Seen from the point of view of a system agent or place, jumping is a simple task, which can be carried out by calling a single method with only a few parameters. Seen from the point of view of the system communicator, which makes use of the SIS module, the world is rather complicated. Agents come in from other hosts or request to be transferred to remote hosts. Such transfers must be carried out simultaneously and securely. The main role of the communicator itself is to synchronize these actions, to call the SIS module, which implements the secure protocols, to transfer permissions between the SIS module and the Agent Security Manager, and to provide the interface between the secure protocols and the Gypsy platform.

Integrating the ThreadAgent class into Gypsy means that the interface GypsyAgent must no longer extend the Runnable interface. Instead the Gypsy classes TaskSpecificAgent and EmbeddedAgent must extend ThreadAgent. This also means that the way threads are created is changed.
Threads in Secure Gypsy (SG) are usually created in a communicator and not in a place.


When an agent calls a function, the System place calls the corresponding method in the ASM in order to check whether the agent is allowed to call that specific function. The class loader will have to be cryptographically secured. If public/private key signatures are too slow for this purpose, the cryptographic security can be built on symmetric keys. With the current structure of SG and the structure of the requests for transferring agents, it will not be difficult to incorporate this security into my system. When the agent terminates or jumps, the instance of ThreadAgent is destroyed. Figure 4.11 gives an overview of the system security.

Figure 4.11: Overview of the system security as UML Class Diagram

[UML class diagram; elements shown: ThreadAgent, Place, Communicator, ASM (ASMAgentPermission class), SIS (SISAgentPermission class); association labels: "sends agent to", "gets agent from", "gets agent SISpermission", "sets agent permissions", "contains", "generated by"; multiplicities 1 and 1..*]

So, compared to Gypsy, in the secured version the thread of an agent is constructed by the communicator. In fact all agents are subclasses of the class ThreadAgent. The dynamic permissions are held by the ThreadAgent class itself, in such a way that they cannot be changed. The permissions are set when the ThreadAgent instance is constructed. The methods of the ASM call a method of the ThreadAgent class in order to check the permissions. The normal calling mechanism is that the agent, running in its ThreadAgent thread, calls a method of the place. The place method calls a method of the ASM. The method of the ASM gets the current thread and calls a ThreadAgent method in order to check that it


has the relevant permission. (Note that such a check takes constant time, independent of the number of agents.)

The communicator only works with serialized objects, except for the permissions structure, which is handed over directly. In particular, the way agents are serialized and the particular secure protocols used are replaceable. The communication uses a three-step protocol. A request for transfer is sent from the host where the agent is currently located to the host to which the agent wants to jump. The request includes the state of the agent as well as the permissions it wants to have. The receiving host replies with a response, which can be affirmative or negative. If it is affirmative, the agent has been allowed to run on the receiving host. Finally, the sending host sends a confirmation to the receiving host, after which the session is closed.

The SIS module has access to a number of certificates and specific access rights, which allow it to distinguish between various types of users and other hosts and to perform additional security checks. The SIS module manages the long-term security data, such as users and permissions. In particular it has a complicated initialization, which makes sure that certificates and keys stored on disk are difficult to modify or replace. It also has methods for introducing new users, removing users and setting permissions. The main task of the SIS module, however, is to handle requests for jumping agents coming in from other hosts and to check whether they can be allowed in. Similarly, it creates requests for transferring agents currently running on its own host. The SIS must then create a truthful request with the maximal chance of being accepted by the next host.

In addition to communicating by sending and receiving agents, the system host allows users to log on from the local machine and by remote logon. The SIS module must provide functionality for securing that as well.
Securing the remote logon will not be prioritized in this system as long as the system design does not prevent the integration with a system for remote logon.
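The calling mechanism described earlier in this section (agent calls place, place calls ASM, ASM inspects the current ThreadAgent) can be sketched as follows. The class bodies are hypothetical: only the names ThreadAgent and the ASM come from the text, while mayCall, checkPermission and the place method getHostlog are illustrative assumptions. The sketch shows permissions fixed at construction time and a check whose cost does not depend on the number of agents.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: dynamic permissions bound to the agent's thread and checked by the ASM. */
final class ThreadPermissions {

    /** Hypothetical ThreadAgent: permissions are set once, at construction. */
    static class ThreadAgent extends Thread {
        private final Set<String> allowedFunctions;

        ThreadAgent(Runnable body, Set<String> allowedFunctions) {
            super(body);
            // Defensive copy wrapped as unmodifiable: nobody can change it afterwards.
            this.allowedFunctions = Collections.unmodifiableSet(new HashSet<>(allowedFunctions));
        }

        final boolean mayCall(String function) {
            return allowedFunctions.contains(function);
        }
    }

    /** Hypothetical ASM check: look at the current thread only, no global search. */
    static void checkPermission(String function) {
        Thread current = Thread.currentThread();
        if (!(current instanceof ThreadAgent) || !((ThreadAgent) current).mayCall(function)) {
            throw new SecurityException("agent may not call " + function);
        }
    }

    /** Hypothetical place method guarded by the ASM. */
    static String getHostlog() {
        checkPermission("getHostlog");
        return "host log contents";
    }

    /** Runs a one-off agent that calls getHostlog() and reports whether it was allowed. */
    static boolean tryGetHostlog(Set<String> permissions) {
        AtomicBoolean allowed = new AtomicBoolean(false);
        ThreadAgent agent = new ThreadAgent(() -> {
            try {
                getHostlog();
                allowed.set(true);
            } catch (SecurityException e) {
                allowed.set(false);
            }
        }, permissions);
        agent.start();
        try {
            agent.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return allowed.get();
    }
}
```

When the thread ends, its permission set becomes unreachable together with it, which matches the requirement that permissions disappear when the agent terminates or jumps.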

4.5.1 The Structure of Permissions

The security builds on permissions. A permission states that an agent coming from a particular host and originally launched by a particular user is allowed to call a particular function. The names of both the host and the user can be replaced by their certificates.

The permissions of an agent during its execution in a place on an Agent Server are determined by an entry in the local permission table, which resides on each host. The permissions (and hence the entry in the local permission table) depend on the host the agent came from and on the owner of the agent. Each agent carries a user identification with it. From that, the system communicator can obtain information about the user from a CA, which is used to determine permissions. Each resource provides access restrictions (an access control list) depending on the agent's origin and the agent's owner.

The permissions are read from the local permission table by the SIS module and are enforced by the ASM. When an agent arrives at a host, the communicator uses the SIS to check the agent's certificates (source host and user ID) and obtains the agent's


permissions, which are handed over to the ASM. As the agent is executed, each sensitive operation is checked by the ASM and only allowed when sufficient permissions are available. A more detailed view of the connections and protocols between the CAS and the Agent Server (communicators, ASM) is given in sub-section 4.5.2.

In addition to the requirement that the user and host launching the agent must be trusted, the code must also come from a trusted code base. This trust is established by having a CA certify class files that are stored at code bases.

Permissions granted to particular users are normally granted to the users who can log in directly, whereas permissions granted to everybody certified by a CA are normally granted to everybody on a network who can launch agents. This is only a guideline, however, not a rule that will be implemented. Thus the system can be configured with users who can log in directly but have only very few permissions, as well as with particular remote users with strong permissions.

Every user, every host and every code base must have a certificate and a private key. However, the permissions given on an individual basis make it possible to have users whose certificates are not signed by a trusted CA. This makes sense mainly because it allows for system administrators who can administer the local host without being able to launch agents.

Dynamic versus Static Permissions

In addition to the static security information administered by the SIS, the host administers dynamic permissions. A dynamic permission states that a particular thread (the thread of an agent) is allowed to call a particular function. The dynamic permissions are generated from the static permissions at the time when an agent is installed. The dynamic permissions, which for each agent are held statically in memory, are handled by the ASM. Permissions for a thread are removed together with the thread when an agent terminates or jumps.
Figure 4.12: Calling mechanisms related to permissions

[Diagram; components shown: SystemTCP/IP Communicator, AS-API, SIS, System Place, Agent Security Manager (ASM)]

The SystemTCP/IP communicator receives permissions for an agent from the SIS when the agent arrives. These permissions are handed over to the Agent Security Manager before the agent is sent to its system place. The system place calls the ASM in order to check whether an agent is allowed to call a function. When the agent jumps, the permissions of

the agent are removed again by a call from the SystemTCP/IP communicator to the Agent Security Manager. This is shown in Figure 4.12. In fact the dynamic permissions are linked to the thread of the agent, so that they are fast to access and disappear together with the agent.

As described above, the permissions for hosts and users can be separated into static and dynamic permissions. The static permissions are stored in a configuration file and are loaded by the SIS (Secure Information Space) module at startup of the host. When an agent arrives, the static permissions are converted into dynamic permissions, which state that a particular agent is allowed to call particular functions.

Configuration of Permissions

The static permissions for agents are stored in a security configuration file administered by the SIS; their syntax is shown in Figure 4.13. This file is read by the SIS module at startup. When the thread of an agent is installed, the dynamic permissions for the thread are generated. They are called "dynamic permissions" because they are removed together with the agent when the agent terminates or jumps.

The part of the security configuration file used by the SIS starts with the tag [Agent Permissions]. Each line in the scope of that tag contains one permission terminated by a semicolon. This syntax is common in configuration files and allows the SIS configuration file to contain other configuration data as well.

Figure 4.13: The general syntax for agent permissions

[Agent Permissions]

Agent coming from   { this host | host “X1” | a host certified by “X1” }
owned by            { user “X2” | a user certified by “X2” | a local user | a local system Adm. }
may call function   “X3”;

Later it will be possible to extend this syntax with, e.g., IP addresses and to apply a greater subset of the RFC 822 standard [Rfc0822], which specifies the format, mainly the headers, of Internet e-mail messages. Figure 4.14 shows example entries in this permissions file.


Figure 4.14: An example of the configuration file with agent permissions

[Agent Permissions]
Agent coming from host “co=commonNameHost” owned by a user certified by “co=commonNameCA” may call function “getNTEventlog”;
Agent coming from this host owned by user “co=commonNameUser” may call function “getHostlog”;
…
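To make the syntax of Figures 4.13 and 4.14 concrete, the following Java sketch parses the [Agent Permissions] section into simple objects. It is an assumption-laden illustration, not the SIS parser: the class names, the decision to keep the host and owner clauses as plain strings, and the normalization of typographic quotes to straight quotes are all choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse the [Agent Permissions] section of a security configuration file. */
final class AgentPermissionParser {

    /** One static permission: where the agent comes from, who owns it, what it may call. */
    static final class Permission {
        final String hostClause;
        final String ownerClause;
        final String function;

        Permission(String hostClause, String ownerClause, String function) {
            this.hostClause = hostClause;
            this.ownerClause = ownerClause;
            this.function = function;
        }
    }

    // One alternative per branch of the syntax in Figure 4.13.
    private static final Pattern ENTRY = Pattern.compile(
        "Agent coming from (this host|host \"[^\"]*\"|a host certified by \"[^\"]*\")"
        + " owned by (user \"[^\"]*\"|a user certified by \"[^\"]*\"|a local user|a local system Adm\\.)"
        + " may call function \"([^\"]*)\";");

    static List<Permission> parse(String config) {
        List<Permission> result = new ArrayList<>();
        boolean inSection = false;
        for (String raw : config.split("\\R")) {
            // Collapse whitespace and map typographic quotes to straight quotes.
            String line = raw.trim().replaceAll("\\s+", " ")
                    .replace('\u201C', '"').replace('\u201D', '"');
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("[")) {
                // Any other tag ends the scope, so the file may hold other settings too.
                inSection = line.equals("[Agent Permissions]");
                continue;
            }
            if (!inSection) {
                continue;
            }
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("malformed permission: " + line);
            }
            result.add(new Permission(m.group(1), m.group(2), m.group(3)));
        }
        return result;
    }
}
```

Rejecting malformed lines with an exception, rather than skipping them, is a conservative choice: a typo in a security configuration should not silently drop or widen a permission.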

The whole configuration of the system security is done partly by the CAs and partly by the administrators of the hosts. A correct configuration of a system depends on the correct users, code bases and hosts being certified by the correct CAs.

Ideal Security Configuration for Intrusion Detection

The ideal security configuration for intrusion detection allows all system tasks to be carried out. Pilot sites that want to use all of the system features can therefore configure themselves for intrusion detection as follows:







- All hosts are certified by the same host CA.
- All users of monitoring hosts are certified by the same user CA (though having more user CAs is no problem).
- At least some monitoring stations are also code bases.
- All hosts trust the same CA.

This model is the best because it permits the models of intrusion detection in which agent-based systems have a competitive advantage over traditional systems. In some cases greater efficiency can be achieved by agents that jump directly between the hosts being monitored rather than only between a central host and the hosts being monitored.

4.5.2 System security protocols

Protocols for communication between SG and CAS

The protocols are dictated by the Certification Authority Space (CAS), which is an existing third-party system. In this system the PKIX standards for the Internet X.509 Public Key Infrastructure (RFC 2459, RFC 2510, RFC 2511) [Pkix00] are used together with the PKCS#10 standard for certification request syntax [Pkcs10]. In addition, messages are wrapped in a simple PKIX CMP (Internet X.509 Public Key Infrastructure Certificate Management Protocols) header [Pkixcmp00].

The CAS supports traditional Certificate Revocation Lists (CRLs). Certificates can expire, or a CA can revoke them. Thus, the CAS maintains CRLs and instant certificates. As the status of certificates may change, it is imperative that a user can get the correct status of a certificate at any time. This is done, e.g., by making a query to the CA where the


certificate is published. As the answer to such a query must be digitally signed for security reasons, two signatures have to be verified in order to get the correct status of a certificate: the signature on the response and the signature on the certificate. Instant certificates support this by allowing a CA to re-sign a certificate at the time of the query, i.e., instantly. An extension in the certificate defines the status of the certificate and the exact time this status is guaranteed. Of these two alternatives, the instant certificate protocol is easier to handle than revocation lists, has better scalability properties and gives a higher degree of security. The higher degree of security is obtained because the reaction time can be shorter.

The alternative to using instant certificates is to use the PKIX Online Certificate Status Protocol [OCSP00], which differs from instant certificates in that only the status of a certificate, rather than the certificate itself, is returned by the CA on request. Some of the protocols between agent servers take advantage of the instant certificate protocol in order to achieve more efficient solutions than could be achieved with OCSP alone. This is because a certificate needs to be neither signed nor sent anywhere unless it is actually needed. Thus what could look like additional overhead at first glance turns out to be important for reducing the overhead in agent servers.

Protocols for communication between the CAS and the LRAs

The protocols used for communication between the CAS and the LRAs are out of scope, since no part of my system software uses them.

Protocols for transmission of Agents

In this sub-section I define a protocol for sending encrypted agents over an insecure network. The protocol builds on the PKCS#7 standard for Cryptographic Message Syntax [Pkcs7], which wraps messages for performing standard operations on agents.
Sophisticated messages can be implemented by launching agents, so the purpose of this protocol is only to make the most basic operations for system administration available in a standardized way. Not all of the protocols will be implemented. Launching and revoking agents and pinging hosts are the most basic operations and should definitely be implemented.

The PKCS#7 messages can be signed and/or encrypted. It is required that authentication and confidentiality are ensured by the PKCS#7 wrapper; however, it is not required that this is ensured in any particular way. For example, a package can be signed, but if a shared secret symmetric key that proves the identity of the sender is used for encryption, the package does not need to be signed. In this way the performance can be increased by reducing the number of public/private key operations when shared secret symmetric keys are available. Also, some of the message types do not always need to be encrypted: those that do not contain confidential data in general do not need to be encrypted. However, a message that is not encrypted must always be signed.

Hosts communicate as follows: one host opens a session by sending an AgentRequest. The addressed host then returns an AgentResponse. This request/response pattern may be repeated a number of times, after which the sender of the


requests sends a confirmation. For some protocols the addressed host replies with a response to the confirmation, and then the session is closed.
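The session pattern described above can be modelled as a small state machine on the sending side. The following Java sketch is hypothetical: the class TransferSession, its states and its method names are assumptions, and the optional reply to the confirmation that some protocols use is left out. It only enforces the order of AgentRequest, AgentResponse and confirmation, including repeated request/response rounds.

```java
/** Sketch of the sender's view of a transfer session (request, response, confirmation). */
final class TransferSession {

    enum State { IDLE, REQUEST_SENT, RESPONSE_RECEIVED, CLOSED }

    private State state = State.IDLE;
    private boolean accepted;

    /** Opens the session with an AgentRequest, or starts another request/response round. */
    void sendRequest() {
        require(state == State.IDLE || state == State.RESPONSE_RECEIVED);
        state = State.REQUEST_SENT;
    }

    /** Records the AgentResponse, which is either affirmative or negative. */
    void receiveResponse(boolean affirmative) {
        require(state == State.REQUEST_SENT);
        accepted = affirmative;
        state = State.RESPONSE_RECEIVED;
    }

    /** Sends the confirmation, after which the session is closed. */
    void sendConfirmation() {
        require(state == State.RESPONSE_RECEIVED);
        state = State.CLOSED;
    }

    State state() {
        return state;
    }

    /** Result of the most recent response; meaningful once a response was received. */
    boolean accepted() {
        return accepted;
    }

    private static void require(boolean legal) {
        if (!legal) {
            throw new IllegalStateException("message not allowed in this session state");
        }
    }
}
```

Making illegal transitions fail loudly keeps a communicator from, for example, confirming a transfer for which no response has arrived.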

4.6 Data Manager (DM)

The Data Manager (DM) is an embedded agent. It is the only system component that is operating system dependent, because it has to access system resources. The resources can be log files, database records, results of port scans, etc. The main functionality of the Data Manager is to provide methods for accessing and modifying data. In particular, it encapsulates the methods used for accessing and modifying data, independently of the mobile agent platform and of the particular type of security used. The DM provides a DM-API for accessing system-dependent data. In some cases one may choose to let the Data Manager filter and convert data to a system-independent format. It must also be possible to program new agents to handle new tasks. Significant parts of the Data Manager may, however, be platform independent. For example, the same application program may be available on several platforms and produce platform-independent log files.

In order to protect sensitive data that has already been processed or accessed against intruders and attacks from insiders, the data ideally has to be encrypted and signed. Furthermore, it has to be logged securely which data are available. Such functionality will not be included in the system pilot but is briefly discussed here. When the system is maintained by a third party, the responsibility for possible breaches of security is difficult to assign. Storing data in an encrypted form that the provider of the system support cannot decrypt solves this problem. For most types of data, however, less security is necessary. Therefore the natural arrangement is to have the system administrator of the host determine which data must be stored securely. Further, some data can only be read by the Data Manager, not modified. Some data can only be passed to an agent in encrypted form, because the agent cannot be allowed to take it to other hosts in plain text.
In special cases an agent can be allowed to process information in plain text but cannot be allowed to travel further with the result. In such cases the data manager must change the security settings when an agent accesses data of this kind. The agent should then be allowed to jump back to its home only, and only in encrypted form. The system modules are designed such that nothing prevents the addition of this feature, but it will not be implemented. If the data manager manages information stored in encrypted form, the encryption key must not be the same as the signature key of the host.

The Design of the Data Manager (DM)

As mentioned earlier, the Data Manager (DM) extends an embedded agent and implements a DataManagerAPI. Thus there can be more than one implementation of the DM, and if Gypsy is used as the mobile agent platform, the DM should extend an embedded agent. The DM collects log data and at the same time scans some ports and remote logins via ftp or telnet for intrusions, and writes this data to scan files. These files are transferred to monitor stations in the surveillance system use case, or they are analyzed immediately at this host in the intrusion detection system use case. All of this Data Manager functionality is configurable with a DataManagerConf file. This is shown in Figure 4.15.

Figure 4.15: Design of Data Manager as UML Class Diagram

[UML class diagram showing the classes and interfaces java.lang.Runnable, java.io.Serializable, GypsyAgent, EmbeddedAgent, DataManagerAPI, DataManager, DataManagerConf, ResultWriter, HTMLResultWriter and SystemOutResultWriter.]
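The class structure of Figure 4.15 can be illustrated with a minimal Java sketch. Only the class names come from the figure; the method names (readEntries, record, report, format), the content of DataManagerConf and the in-memory store are assumptions made for illustration, and EmbeddedAgent is a stand-in for the Gypsy base class rather than its real definition.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Gypsy embedded-agent base class.
abstract class EmbeddedAgent implements Runnable, Serializable { }

// Assumed shape of the DM-API: platform-independent access to collected data.
interface DataManagerAPI {
    List<String> readEntries(String resource);
}

// Output strategy; Figure 4.15 names an HTML and a System.out variant.
interface ResultWriter {
    String format(List<String> entries);
}

class SystemOutResultWriter implements ResultWriter {
    public String format(List<String> entries) {
        return String.join("\n", entries);
    }
}

class HTMLResultWriter implements ResultWriter {
    public String format(List<String> entries) {
        // Note: entries are not HTML-escaped in this sketch.
        StringBuilder sb = new StringBuilder("<ul>");
        for (String e : entries) {
            sb.append("<li>").append(e).append("</li>");
        }
        return sb.append("</ul>").toString();
    }
}

// Stand-in for the DataManagerConf file: which resources to collect.
class DataManagerConf implements Serializable {
    final List<String> resources;
    DataManagerConf(List<String> resources) { this.resources = resources; }
}

class DataManager extends EmbeddedAgent implements DataManagerAPI {
    private final DataManagerConf conf;
    private final ResultWriter writer;
    private final Map<String, List<String>> store = new HashMap<>();

    DataManager(DataManagerConf conf, ResultWriter writer) {
        this.conf = conf;
        this.writer = writer;
    }

    // A real DM would read log files or scan ports here; the sketch only
    // prepares an empty entry list for each configured resource.
    public void run() {
        for (String r : conf.resources) {
            store.computeIfAbsent(r, k -> new ArrayList<>());
        }
    }

    // Adds one collected entry for the given resource.
    void record(String resource, String entry) {
        store.computeIfAbsent(resource, k -> new ArrayList<>()).add(entry);
    }

    // Returns a copy so that callers cannot modify the stored data.
    public List<String> readEntries(String resource) {
        return new ArrayList<>(store.getOrDefault(resource, new ArrayList<>()));
    }

    String report(String resource) {
        return writer.format(readEntries(resource));
    }
}
```

Keeping the operating system specific collection inside run() and exposing only readEntries through the interface mirrors the separation described above: agents see the DM-API, not the platform.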

4.7 Data Analyzer Module (DAM)

The data analysis supposes the use of resources such as relations, classifications and production rules called “if-then” rules. The “if-then” rules basically consist of the following grammar.

Grammar:        'if' ConditionList 'then' ActionList
ConditionList:  Condition | (Condition 'and' | 'or' Condition)
Condition:      Asset Operator Value

The security policy is a set of such rules and directives, which describes the allowed and forbidden activities for different users with respect to certain assets. Thus, the data analysis in this system is based on these security policy rules and has three main stages:
1. The initial stage of the data analysis task is the detailed description of any information which might be monitored operationally and which might be of some interest for data security reasons.
2. Next is the archiving of existing knowledge: systems, structures, hacking, ...
3. Finally, the data analysis algorithms are designed and realized according to the two basic tasks, check and prevention.

As mentioned in previous chapters, the intelligence of this system is stored in the DAM module. The location of the DAM depends on the system use case:
1. central on the monitors for the surveillance system use case, and
2. mobile in the agent code itself for the intrusion detection system use case.

For the first case, the DAM functionality supports two-way communication with the outside world through the SPE module and the HS API: it receives and/or formulates data analysis tasks and sends out the data collecting requests, the data analysis results and the decisions related to the security policy. The internal functionality consists of the following main blocks: translation of security related design time and operational information into algorithmic form; the actual data analysis by means of pattern recognition, knowledge analysis and extraction; and the decision block, which completes the DAM tasks.

For the second case, there were several considerations on how to move the intelligent data analysis and the above tasks to the hosts. I have taken the approach that the rules must be parsed into classes that form a compact representation of the data and the intelligence of the system. The agent is built from such classes (the source code of these classes constitutes the DAM functionality) and will continue to jump until it has proven a security violation or concluded that the system is in a valid state. For this approach it is necessary that the system includes two additional parts, which are described in the next sections:
- the Security Policy Language (SPL), which provides a convenient way to model all system tasks, and
- an Interpreter that is able to process such SPL classes and programs.

It is hard to predict the possible result of such an approach. Instead, the outlined static version of intelligence is a necessary preliminary stage to check the power of intelligence in security policy provision. This experience is needed in order to plan the environment of mobile intelligent agents for information security issues.

4.8 Security Policy Editor (SPE)

The main goal of the Security Policy Editor (SPE) is to define the rules that regulate how an organization manages and protects its information and computing resources to achieve its security objectives. The main functionality of the SPE is shown graphically as a use case diagram with a single external actor. Figure 4.16 shows the user who interacts with the SPE tool to perform data analysis and data management of the system while creating the security rules.

Figure 4.16: SPE Subsystems as UML Use Case Diagram

[UML use case diagram: the actor User and the Security Policy Editor with the use cases Data Analysis, Data Manager and Rules Editor.]

The services the SPE provides are the following:
- Rules Editor: This component allows the user to define security policies.
- Data Manager: This component manages the insertion, deletion and updating of metadata (e.g., resources, threats, safeguards, etc.).
- Data Analysis: This component monitors the different data stored in the SPE database so that the user may analyze, for instance, the vulnerability of a resource, the different threats that could apply to a resource, the different security solutions that could be taken to avoid a threat, etc.

Of course, each of these SPE subsystems has more than one class; these are shown and described in the following class diagrams. For better understanding, I want to repeat the concepts which have already been identified:
- Resource: The object (password, e-mail, …) that could be threatened.
- Resource Group: A resource group includes several resources.
- Threat: An event that can cause damage to a resource.
- ThreatGroup: A threat group includes several threats.
- Vulnerability: It defines how “secure” a resource is.
- Datasource: The place (usually files) where the agent should look, after it “reads” the rule, to check whether the security rule has been broken.
- SafeGuardMechanism: A mechanism that protects a resource against a threat.
- SafeGuardFunction: A safeguard function includes several mechanisms.
- Rule: It regulates how the organization implements, manages, protects and distributes its information and computing resources to achieve security objectives.

Data Analysis-SPE Subsystem

Figure 4.17: Data Analysis-SPE Subsystem as UML Class Diagram

[UML class diagram showing the classes DataAnalysis, SafeGuardsAnalysis, ThreatsAnalysis, ResourcesAnalysis, RulesAnalysis and DataSourcesAnalysis.]

The DataAnalysis class (Figure 4.17) allows the user to view data stored in the database through the GUI. The ThreatsAnalysis class allows the user to view the threats and threat groups data through the GUI. The ResourcesAnalysis class allows the user to view the resources, resource groups and vulnerability data through the GUI. The DataSourcesAnalysis class allows the user to view through the GUI the data sources where the agent may go to check the security rule. The RulesAnalysis class allows the user to view the rules edited through the GUI. The SafeGuardsAnalysis class allows the user to view the safeguard functions and mechanisms through the GUI.


Data Manager-SPE Subsystem Figure 4.18: Data Manager-SPE Subsystem as UML Class Diagram

[UML class diagram showing the classes DataManager, ThreatsManager, ResourcesManager, DataSourcesManager and SafeGuardsManager.]

The DataManager class in SPE (Figure 4.18) allows the user to manage (insert, update, delete) data stored in the database. The ResourcesManager class allows the user to insert, update or delete the resources and resource groups stored in the database. The ThreatsManager class allows the user to insert, update or delete the threats and threat groups stored in the database. The DataSourcesManager class allows the user to insert, update or delete the data sources stored in the database. The SafeGuardsManager class allows the user to insert, update or delete the safeguard mechanisms and functions stored in the database. Rules Editor-SPE Subsystem Figure 4.19: Rules Editor-SPE Subsystem as UML Class Diagram

[UML class diagram showing the classes RulesEditor, Rule, RuleCondition, PrimitiveRuleCondition, RuleAsset, RuleOperator, RuleValue, RuleActions and RuleAction.]

The RulesEditor class (Figure 4.19) allows the user to edit a rule. The RuleAsset class represents the asset that composes the rule line. The RuleOperator class represents the operator that composes the rule line. The RuleValue class represents the value that composes the rule line. The RuleAction class represents the action that composes the rule line. The PrimitiveRuleCondition class represents the condition that composes each rule line. The RuleCondition class represents all the conditions that compose the rule. The RuleActions class represents all the actions that compose the rule. The Rule class represents the rule edited by the user.
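How these classes could fit together is sketched below in Java. The class names follow Figure 4.19, but everything else is an assumption for illustration: the operator set (EQ, GT, LT), integer-valued assets, a single 'and'/'or' connective per rule, actions represented as plain strings, and the method names holds and fire.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// The operator set is an assumption; the thesis does not enumerate it.
enum RuleOperator { EQ, GT, LT }

// One rule line of the form: Asset Operator Value.
class PrimitiveRuleCondition {
    final String asset;
    final RuleOperator operator;
    final int value;

    PrimitiveRuleCondition(String asset, RuleOperator operator, int value) {
        this.asset = asset;
        this.operator = operator;
        this.value = value;
    }

    // Checks the condition against the observed facts (asset -> value).
    boolean holds(Map<String, Integer> facts) {
        Integer observed = facts.get(asset);
        if (observed == null) {
            return false; // nothing observed for this asset
        }
        switch (operator) {
            case EQ: return observed == value;
            case GT: return observed > value;
            default: return observed < value;
        }
    }
}

// All conditions of a rule, joined by a single 'and' or 'or' (a simplification).
class RuleCondition {
    final boolean conjunctive;
    final List<PrimitiveRuleCondition> parts;

    RuleCondition(boolean conjunctive, List<PrimitiveRuleCondition> parts) {
        this.conjunctive = conjunctive;
        this.parts = parts;
    }

    boolean holds(Map<String, Integer> facts) {
        return conjunctive
                ? parts.stream().allMatch(p -> p.holds(facts))
                : parts.stream().anyMatch(p -> p.holds(facts));
    }
}

// 'if' ConditionList 'then' ActionList; actions are plain names here.
class Rule {
    final RuleCondition condition;
    final List<String> actions;

    Rule(RuleCondition condition, List<String> actions) {
        this.condition = condition;
        this.actions = actions;
    }

    // Returns the actions to take, or an empty list if the rule does not apply.
    List<String> fire(Map<String, Integer> facts) {
        return condition.holds(facts) ? actions : Collections.<String>emptyList();
    }
}
```

For example, a rule such as "if failed_logins > 3 and root_login = 1 then raise_alarm" would be built from two PrimitiveRuleCondition objects and fired against the facts an agent has collected on a host.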

4.8.1 Rules Database (RDB)

Security policy rules are stored in the RDB module of the system. The relationships between security policy elements can be stored in different ways. Figure 4.20 shows just one example (a screenshot from the Rational Rose software tool) of how the RDB can be organized:

Figure 4.20: Example Organization of Rules Database (RDB)


4.9 Overall Design of SG Agent Platform

The UML diagram of the agent platform shown in Figure 4.21 gives a general overview of the involved classes, their relationships, and their interactions during important system activities. This UML class diagram involves the following classes with their respective methods:
- AgentServer
- AgentSecurityManager
- Place
- Communicator – UserCommunicator
- DataManager
- SIS

The methods shown represent only a basic interface. The actual system will have to subclass most of these classes to add application-specific functionality. Their role is explained below.

Agent: A class that represents a regular multi-hop agent which migrates from host to host to perform different kinds of tasks (i.e., surveillance and intrusion detection).

SerialAgent: The agent’s state (together with management information like codebase) represented as a bytestream.

AgentInfo: An agent’s attributes which contain agent-specific information (e.g., its name, owner, creation date, a Home Server URL).

AgentType: A class which specifies the name of the Java class that contains the agent’s code.

Thread: A standard Java Thread class which encapsulates an agent, and can be used to run it in a ThreadAgent class which is extended from this class.

Operation: A sensitive operation which accesses operating system resources (e.g., file access, network access) and must be checked by the security subsystem.

Result: This class represents values which are returned by Operations.

Permissions: An access control list which associates an access right with each operation. This class is used to determine whether an agent (running in a thread) is allowed to carry out a certain operation. These are agent permissions defined in the agent permissions table.

Constraints: A Constraint is a generic representation of a certain security policy. A constraint has to be valid (hold) throughout the system installation (e.g., when spoofed IP addresses are detected, raise an alarm). A Constraint object is used to initialize agents and to configure the Data Manager instances (which have to collect data according to the security policy).

UserID: A user identification which allows the system to authenticate the user (with a password and/or a certificate). It is passed to agents when they are initialized and allows them to obtain permissions at hosts which they visit.

Host: An identification of a host (i.e., a kind of resource locator like URL or RMI stub).

Figure 4.21: Design of the SG Agent Platform as UML Class Diagram

[UML class diagram showing the classes UserCommunicator, Place, AgentServer, Communicator, DataManager, AgentSecurityManager and SIS with their associations. Method signatures shown in the diagram:
launchAgent(UserID id, Type t, Constraints c): void
claimResults(UserID id): Result
registerCallback(UserID id): void
transfer(Agent a): void
handleAgent(Thread t, AgentInfo ai): void
performOperation(Operation o): Result
runAgentAtPlace(Thread t, Place p, AgentInfo ai): void
getThread(Agent a): Thread
transfer(SerialAgent a, Host target, Place p): void
receive(SerialAgent a, Host from, Place p): void
assembleThread(Agent a): Thread
doAccess(Thread t, Operation o): Result
setAgentPermissions(Thread t, Permissions p): void
configure(Constraints c): void
checkPermission(Thread t, Operation o): boolean
encodeAgent(SerialAgent a, Host from): SerialAgent
decodeAgent(SerialAgent a, Host to): Agent
getPermissions(UserID id, Host from): Permissions]

Figure 4.22 describes the activities that take place when an agent is received by a communicator. The communicator receives the serialized agent’s state, the place where the agent wants to execute, and the host information where the agent originates from. Figure 4.22: Agent reception as UML Sequence Diagram

[UML sequence diagram. Participants: Communicator, AgentSecurityManager, SIS, AgentServer, Place. Messages in order: receive(SerialAgent, Host, Place); decodeAgent(SerialAgent, Host): Agent; deserialize(SerialAgent): Agent; assembleThread(Agent): Thread; getPermissions(UserID, Host): Permissions; setAgentPermissions(Thread, Permissions); runAgentAtPlace(Thread, AgentInfo, Place); handleAgent(Thread, AgentInfo).]
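The order of calls during agent reception in Figure 4.22 can be traced with a small Java sketch. The method names are those of the diagram; the stand-in types (SerialAgentSketch, AgentSketch, AgentPermissions), the use of strings for hosts and places, and the method bodies are assumptions. All steps are collapsed into one class so that the sequence is easy to follow, whereas in the platform they belong to different components; the component noted in each comment is a reading of the diagram, not a confirmed assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the platform types; their fields are assumptions.
class SerialAgentSketch {
    final String owner;
    final byte[] state;
    SerialAgentSketch(String owner, byte[] state) {
        this.owner = owner;
        this.state = state;
    }
}

class AgentSketch {
    final String owner;
    AgentSketch(String owner) { this.owner = owner; }
}

class AgentPermissions {
    final boolean mayExecute;
    AgentPermissions(boolean mayExecute) { this.mayExecute = mayExecute; }
}

class ReceptionSketch {
    // Records the order of the calls so that the sequence can be inspected.
    final List<String> trace = new ArrayList<>();

    // SIS: would verify and decrypt the message, then deserialize the agent.
    AgentSketch decodeAgent(SerialAgentSketch sa, String fromHost) {
        trace.add("decodeAgent");
        return new AgentSketch(sa.owner);
    }

    // AgentSecurityManager: wraps the agent in a thread of its own.
    Thread assembleThread(AgentSketch a) {
        trace.add("assembleThread");
        return new Thread(() -> { }, "agent-of-" + a.owner);
    }

    // SIS: looks up what the owner's agents may do when coming from this host.
    AgentPermissions getPermissions(String userId, String fromHost) {
        trace.add("getPermissions");
        return new AgentPermissions(true);
    }

    // AgentSecurityManager: binds the permissions to the agent's thread.
    void setAgentPermissions(Thread t, AgentPermissions p) {
        trace.add("setAgentPermissions");
    }

    // AgentServer: hands the thread over to the requested place.
    void runAgentAtPlace(Thread t, String place) {
        trace.add("runAgentAtPlace");
        handleAgent(t);
    }

    // Place: a real place would start the agent thread here.
    void handleAgent(Thread t) {
        trace.add("handleAgent");
    }

    // Communicator: entry point for an agent arriving over the network.
    void receive(SerialAgentSketch sa, String fromHost, String place) {
        trace.add("receive");
        AgentSketch agent = decodeAgent(sa, fromHost);
        Thread thread = assembleThread(agent);
        AgentPermissions permissions = getPermissions(agent.owner, fromHost);
        setAgentPermissions(thread, permissions);
        runAgentAtPlace(thread, place);
    }
}
```

The point of the ordering is that permissions are attached to the thread before the agent is handed to a place, so no agent code runs without an associated permission set.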

Figure 4.23 describes the activities that are carried out when an agent calls a security sensitive function while it is executed at a certain place. The Result object, which is returned from the Data Manager to the Place (and finally passed to the Agent), could also include an exception (when access is denied by the security manager).

Figure 4.23: Agent's function call as UML Sequence Diagram

[UML sequence diagram. Participants: Agent, Place, DataManager, AgentSecurityManager. Messages in order: performOperation(Operation); getThread(Agent): Thread; doAccess(Thread, Operation); checkPermission(Thread, Operation): boolean; actually perform task: Result; the Result is returned to the Place and then to the Agent.]
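The permission check of Figure 4.23, including a Result that carries an exception when access is denied, can be sketched in Java as follows. The method names doAccess, checkPermission and setAgentPermissions come from the class diagram; the class names with the Sketch suffix, the representation of operations as strings and the use of SecurityException are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A result that holds either a value or the exception explaining the failure.
class OperationResult {
    final String value;
    final Exception error;

    private OperationResult(String value, Exception error) {
        this.value = value;
        this.error = error;
    }

    static OperationResult ok(String value) {
        return new OperationResult(value, null);
    }

    static OperationResult failed(Exception error) {
        return new OperationResult(null, error);
    }
}

// Stand-in for the AgentSecurityManager: permissions are kept per agent thread.
class SecurityManagerSketch {
    private final Map<Thread, Set<String>> permissions = new HashMap<>();

    void setAgentPermissions(Thread t, Set<String> allowedOperations) {
        permissions.put(t, new HashSet<>(allowedOperations));
    }

    boolean checkPermission(Thread t, String operation) {
        Set<String> allowed = permissions.get(t);
        return allowed != null && allowed.contains(operation);
    }
}

// Stand-in for the Data Manager side of the call.
class DataManagerAccessSketch {
    private final SecurityManagerSketch security;

    DataManagerAccessSketch(SecurityManagerSketch security) {
        this.security = security;
    }

    // Performs the operation only if the agent's thread is permitted to.
    OperationResult doAccess(Thread t, String operation) {
        if (!security.checkPermission(t, operation)) {
            return OperationResult.failed(
                    new SecurityException("access denied: " + operation));
        }
        // A real Data Manager would touch the system resource here.
        return OperationResult.ok("performed " + operation);
    }
}
```

Returning the exception inside the result, instead of throwing it across the Place boundary, matches the description above that the Result object is what travels back to the agent.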

Figure 4.24 describes the interactions of all involved components when an agent should be sent over the network.

Figure 4.24: Agent sending as UML Sequence Diagram

[UML sequence diagram. Participants: Place, AgentServer, Communicator, SIS, Communicator (remote host). Messages in order: transfer(Agent); transfer(Agent, Host, Place); encodeAgent(Agent, Host): SerialAgent; receive(SerialAgent, Host, Place).]

4.9.1 Security of SG Agent Platform

The SIS module, as described in previous chapters, is also the core of security in the system prototype. Towards the outside, the SIS offers two APIs: one for communicating with agents and agent servers (called AS-API) and the other for communicating with the CAS and changing security settings (called CAS-API). All SIS classes and interfaces are shown in Figure 4.25 as a UML class diagram and are described below. The interface CAS-API provides cryptographic functionality and functionality for constructing messages complying with international standards. It does not keep any secrets. The class SISDB contains methods for storing, modifying and removing security related information. It is an abstract class. The class SISCore implements the CAS-API interface. It implements the creation of messages and the cryptographic operations, and it has to be declared as a private class. The SIS class extends the SISDB class and implements the interface AS-API. It also has a private SISCore instance variable. The AS-API is the interface to the agent server. The SIS class communicates via the CAS-API with the CAS, but it only handles the messages related to the transfer of agents; the system communicator establishes direct contact to the other hosts. The tasks are distributed in this way so that the agent platform becomes independent of the type of PKI solution used and the SIS becomes independent of the agent platform.

Figure 4.25: All SIS classes as UML Class Diagram



[UML class diagram showing SISDB, CAS API, SISCore, AS API and SIS.]

The CAS-API

Following my design, the CAS-API and consequently the whole SIS and CAS modules must provide the following methods:

- methods for initialization and exiting of CAS and SIS,
- method finalize() to clean up the resources allocated by the underlying code,
- methods for cryptographic functionality. The cryptographic functionality provided will be PKCS#7 [Pkcs7] encryption and signing as described earlier (PKCS#7 is a standard for cryptographic message syntax which describes a general syntax for data that may have cryptography applied to it, such as digital signatures and digital envelopes). The keys are PKCS#1 [Pkcs1] public keys. PKCS#1 is a standard for public-key cryptography based on the RSA algorithm, covering the following aspects: cryptographic primitives; encryption schemes; signature schemes with appendix; and ASN.1 syntax for representing keys and for identifying the schemes. Private keys follow the PKCS#8 [Pkcs8] private-key information syntax standard. The PKCS#8 standard describes a syntax for private-key information, including a private key for some public-key algorithm and a set of attributes. The standard also describes a syntax for encrypted private keys. The intention of including a set of attributes is to provide a simple way for a user to establish trust in information such as a distinguished name or a top-level certification authority's public key. PKCS#7 signing of a message uses a PKCS#8 private key protected by a password,
- methods for generating keys and providing random numbers,
- methods for generating keys. These generate a 1024 bit private/public RSA key pair, which is returned in PKCS#8 and PKCS#1 format,
- methods for reading certificates,
- methods for communicating with the CAS. The protocols used by the SIS to communicate with the CAS are the PKIX CMP protocols [Pkixcmp00] together with the instant protocol as described earlier,
- methods for writing and interpreting these PKIX messages. The synchronization is done by the SISCore class,
- methods for creating a PKIX CMP certification request message,
- methods for interpreting the response to a PKIX CMP certification request message. It is checked that the response is correct, in particular that the sender was in possession of both the IAK and the secret key of the CA. The certificate is returned,
- methods for creating an instant request,
- methods for getting an instant certificate from the response to an instant request,
- methods for creating a revocation request. Parameters for these methods are the certificate to be revoked and the certificate of the CA,
- methods for interpreting a response to a revocation request. In particular it is checked that the response is signed by the CA,
- methods for creating a confirmation from a response (any of the responses),
- methods for launching agents. In their structure these functions are similar to those for the communication with the CA. The host on which this SIS resides initiates a communication by creating a request. The other host responds to the request, and the session is closed when this host sends a confirmation and the other host sends a reply to the confirmation. The difference is that the state of an agent must be transferred, the code of the agent must be loaded, possibly from a third host, and the communication must be encrypted. First, a request for sending an agent object must be sent. The agent object is the serialization of an agent. The owner of the agent can be identified by passing the certificate of the owner as a parameter. The message must be PKCS#7 wrapped before it can be sent. The recipient host will respond by returning an AgentRuntimeResponse. If the response is negative, the function checkAgentRuntimeResponse will throw an exception. In the case of a successful transfer of an agent it is not essential that a confirmation and a response to a confirmation are exchanged. Thus the host will close the connection after this exchange. However, if the response is negative, a confirmation and a response to a confirmation must be exchanged,
- methods for receiving agents. The handling of these functions is more complicated than the handling of the functions for which the host itself starts the communication. There are two reasons for this. One is that it is always more difficult to read than to write, because all read and write operations have to be synchronized. This is true in particular for the receipt of messages for which the CA must be contacted. The other reason is the synchronization, which forces the recipient to set up new threads for new incoming messages. Furthermore, a response is created, and if a positive response cannot be made, CreateResponse will throw an exception which can be used for creating a negative response. A negative response is created from a request and an exception. After a negative response, a confirmation is expected. This confirmation must be checked and a response to the confirmation must be created. The function CheckConfirmation returns a response to a confirmation or throws an exception. (If the confirmation is not correct and properly authenticated, the host at the other end may believe that the agent has been successfully launched.),
- methods for requesting code. The parameters are the host from which to request the code, a path or a vector of paths in the location, and the name of the class. The returned request is not PKCS#7 wrapped,
- methods for receiving requests for code and for returning code. None of the messages in this section are PKCS#7 wrapped,
- methods for making a positive and a negative response,
- methods for exchanging information. A number of methods similar to those for requesting and sending files must be implemented, too.
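The key generation method in the list above can be illustrated with the standard Java security API. This is a sketch and not the prototype's code: the class and method names are hypothetical, and the key size is a parameter (the text specifies 1024 bits). Note that the JDK encodes the private key in PKCS#8 form, while the public key is returned as an X.509 SubjectPublicKeyInfo structure, which wraps the PKCS#1 RSAPublicKey rather than being the bare PKCS#1 encoding.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;

class KeyGenSketch {
    // Generates an RSA key pair of the requested modulus size.
    static KeyPair generateRsaKeyPair(int bits) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(bits);
            return generator.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide RSA.
            throw new IllegalStateException(e);
        }
    }

    // DER bytes of the private key in PKCS#8 PrivateKeyInfo form.
    static byte[] privateKeyPkcs8(KeyPair pair) {
        return pair.getPrivate().getEncoded();
    }

    // DER bytes of the public key as X.509 SubjectPublicKeyInfo.
    static byte[] publicKeyX509(KeyPair pair) {
        return pair.getPublic().getEncoded();
    }
}
```

A caller would use generateRsaKeyPair(1024) to match the key size named in the text and store the encoded private key only after encrypting it under the user's or host's password, as described for secret keys.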

The SISDB class

The SISDB is an abstract class which is used for storing security related information. This information includes, among others, trusted certificates and the files containing the security permissions. The file containing the permissions is identical to the configuration file of the Secured Gypsy system, but that is not a necessity for the SIS. There are international standards for certificate stores, of which PKCS#12 [Pkcs12] is the most widely used. This personal information exchange syntax standard specifies a portable format for storing or transporting a user's private keys, certificates, miscellaneous secrets, etc. In this pilot system the certificate store is realized by simply storing the certificates and other security related objects as files, but it can be extended to the PKCS#12 standard. The protection is achieved by having Message Authentication Code (MAC) values [Mac00] computed and stored as a file each time a change in the global security settings is made. The MAC is an authentication tag (also called a checksum) derived by applying an authentication scheme, together with a secret key, to a message. Unlike digital signatures, MACs are computed and verified with the same key, so that they can only be verified by the intended recipient. Since the certificate store is designed in this way, it is not necessary to have a special API for reading the files. Only adding and removing files requires special functions, which compute the new MAC values correctly. The get functions for MAC values of security files are mainly there as a service and for improving the performance. The CAS API takes into account that system administrators can be remote users. This is done by using signed certificates. Thus the remote system administrator can sign the message he wants to pass and call a SISDB function by launching an agent or by calling it directly using RMI. In this way the remote user does not need to send a password over the net. Secret keys need not be protected by a MAC and the SISDB does not handle secret keys. Instead, the secret keys are protected by being encrypted under a password.

The SISCore class

The SISCore class implements the CAS-API interface. So, only the additional methods that are not in the CAS-API are mentioned here:
- methods for reinitialization,
- methods for computing the current disk MAC,
- methods for getting a host's secret key,
- methods for getting a host's password,
- methods for shutting down the system.
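The MAC protection of the file-based certificate store, and the "current disk MAC" computation listed for SISCore, can be illustrated with the standard javax.crypto API. The sketch assumes HMAC-SHA256 as the authentication scheme, which the text does not specify, and the class and method names are hypothetical. The store contents are passed in as a byte array, so file handling is left out.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class StoreMacSketch {
    // Computes the authentication tag over the store contents with the secret key.
    static byte[] computeMac(byte[] key, byte[] storeContents) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(storeContents);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recomputes the tag and compares it with the stored one in constant time.
    static boolean verify(byte[] key, byte[] storeContents, byte[] storedMac) {
        return MessageDigest.isEqual(computeMac(key, storeContents), storedMac);
    }
}
```

Because the same key computes and verifies the tag, only a party holding that key can detect or legitimately produce a change to the store, which is why adding and removing files must go through functions that recompute the MAC.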

The AS-API

As mentioned earlier, the AS-API is an interface for making the system independent from the agent platform and from the used SIS and CAS modules. Since the security checks are carried out by the SIS itself, the AS-API is simpler than the CAS-API. The following methods are provided by the AS-API:
- methods for creating and interpreting messages for sending agents,
- methods for creating and interpreting messages for receiving agents,
- methods for creating and interpreting messages for the agent's permissions.

The SIS Class

The SIS provides methods similar to the CAS-API methods. The difference is that the SIS methods get the secret information from the SISCore class. The private key and password of the host must not be used outside the SIS. Of course, methods equivalent to those in the AS-API must be provided, because the SIS class implements the AS-API:
- methods for launching agents. Notice that in this case, requests and responses will be PKCS#7 wrapped. Thus, even though the functions are almost the same as in the CAS-API, the functionality of these functions is much more complicated. In particular, the functions in the SIS automatically carry out security checks and may go online to communicate with the CAS directly,
- methods for receiving agents. Again, the handling of these functions is more complicated than the handling of the functions in the CAS-API, for which the host itself starts the communication. Further, requests and responses are PKCS#7 wrapped, and these functions carry out security checks and may automatically contact the CA,
- methods for receiving requests for code and for returning code,
- methods for requesting files. These generate a request (not PKCS#7 wrapped),
- methods for receiving requests for files and for returning files,
- methods for making a positive and a negative response,
- methods for exchanging information. A number of methods similar to those for requesting and sending files will be implemented.

4.9.2 Secure Logging on to the System

When a user logs on to the system, he gives his user name and his password to the GUI he uses. The password is kept by the GUI. It is the responsibility of the GUI that the password is, at the very least, not stored on disk. Also, a password must always be hard-deleted after use, i.e., overwritten. Ideally, the GUI only stores the password in encrypted form. The secret key of the user will be stored under the name userkeys/username.key below the host's root directory. Similarly, the certificate of the user can be stored under the name usercerts/SISsecured/username.crt. The user name is the same as the common name in the certificate. When the GUI causes the host to perform an action, it calls a function with a PKCS#7 signed/encrypted parameter.

I now describe in detail the example in which a new user must be added to the system. The certificate of the user is held in a CAS. In order to make the SIS add the certificate to the list of users, you have to prove that you are a system administrator. This is done by a signature. Finally, the signed certificate is passed as a parameter to the function which adds a user to the system (in this case a function from the SIS itself). The SIS should now have accepted the new certificate and associated it with the rights of a user, provided that the holder of the secret key is a system administrator. Various GUIs are necessary, at least for the rules database, for system administration and for system initialization. Some of these share common GUI components; for example, a common password prompt is used. The system initialization differs from the rest since it can be invoked without a user existing in advance.

The Password Prompt

The password prompt is used in order to get the user name, the secret key and the password of a user. The corresponding command interface module must be included, demonstrating the security checks which communicate with the SG to check that the password and the key fit together. All methods of the password prompt are secured such that they can only be called by threads in the root thread group. In particular, no GUI set up directly by an agent can invoke the password prompt. The password prompt terminates if a user logs on successfully. In my system prototype, I have taken the approach that the security policy rules must be parsed into classes that form a compact representation of the data and operations of the rule. These user rules are directions for agents about what to check and what to do if the checks turn out to be positive. The agent is built from such classes and will continue to jump until it has proven a security violation or concluded that the system is in a valid state. It is vital that agents carry as little information as possible on their route between hosts in order to maintain the advantage of reducing the needed bandwidth. Therefore, an agent will only carry with it the capabilities it needs, and it will be able to do remote and distributed data analysis.

HS-API

In addition to the rules editor, a scripting language is provided. This is necessary because the rules editor introduces some limitations, as its design favored simplicity (easy and intuitive to use) over complete expressiveness. The scripting language makes it possible to fully exploit the agents’ potential. In normal operation there will be three levels to work on:

- Monitor user level. Uses the rules editor. Final rules from predefined building blocks.
- Scripting level. Scripts for constructing more sophisticated rules than what is possible with the rules editor and for constructing building blocks for the rules editor.
- System level. New places, fundamental classes for building agents, etc.

The levels are listed in increasing order of the skills required to carry out the changes. Only the first level is available to every user of a monitoring station. Each monitoring station should have at least one user capable of writing scripts, whereas the expertise for creating new fundamental building blocks will only ever be present in specialized IT organizations.

The HS API has two main tasks. First, it is used to create, initialize and start agents. Second, it is needed to claim the results reported by agents after they have finished. External entities (i.e., the Security Policy Editor) send agents to the agent subsystem, can be notified when an agent has returned, may ask about returned agents and are able to claim their results. The following functionality has to be provided:

void start_agent(AgentId id, String script)

This function starts an agent whose functionality is represented by the string script. The script is written according to the Security Policy Language presented above. Additionally, an identifier (id) is passed to the agent platform so that the started agent can be uniquely identified when its results are claimed later. This function returns immediately.

void register_callback(AgentId id, Notification callback)

The register_callback function attaches a listener function (callback) to an agent that has already been started. As soon as the agent returns (or if the agent has already finished its task and is waiting at its home place), the SPE is notified by having the callback function invoked. This function returns immediately.

AgentId[] check_waiting_agents()

This function returns an array of agent identifiers, enumerating all agents that have already finished their task and are currently waiting to deliver their results. This function returns immediately.

String claim_results(AgentId id)

Whenever the SPE wants to obtain results from an agent, the claim_results function can be invoked. By providing an agent identifier, the results of a uniquely determined agent can be retrieved. This function blocks until the agent with the given identifier (id) has returned and delivered its results.
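The four calls can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: AgentId is modelled as a plain int, the callback type is the standard IntConsumer instead of a dedicated Notification type, the Java-style method names and the class InMemoryHsApi are hypothetical, and the "agent" is a stand-in that merely echoes its script.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

/** The four HS-API calls; AgentId is modelled as a plain int (assumption). */
interface HsApi {
    void startAgent(int id, String script);              // returns immediately
    void registerCallback(int id, IntConsumer callback); // returns immediately
    int[] checkWaitingAgents();                          // returns immediately
    String claimResults(int id);                         // blocks until the agent is back
}

/** In-memory stand-in for the agent platform: an "agent" merely echoes its script. */
class InMemoryHsApi implements HsApi {
    private final Map<Integer, CompletableFuture<String>> agents = new ConcurrentHashMap<>();

    @Override
    public void startAgent(int id, String script) {
        // Run the fake agent asynchronously so that the caller is not blocked.
        agents.put(id, CompletableFuture.supplyAsync(() -> "results for: " + script));
    }

    @Override
    public void registerCallback(int id, IntConsumer callback) {
        // Fires when the agent returns, or at once if it is already waiting.
        lookup(id).thenRun(() -> callback.accept(id));
    }

    @Override
    public int[] checkWaitingAgents() {
        // All agents that have finished and whose results were not claimed yet.
        return agents.entrySet().stream()
                .filter(e -> e.getValue().isDone())
                .mapToInt(e -> e.getKey())
                .sorted()
                .toArray();
    }

    @Override
    public String claimResults(int id) {
        String results = lookup(id).join(); // blocks until the agent has returned
        agents.remove(id);                  // assumption: results are claimed only once
        return results;
    }

    private CompletableFuture<String> lookup(int id) {
        CompletableFuture<String> agent = agents.get(id);
        if (agent == null) {
            throw new IllegalArgumentException("unknown agent " + id);
        }
        return agent;
    }
}
```

A caller that prefers "pull" would poll checkWaitingAgents() and then call claimResults(id); a caller that prefers "push" would call registerCallback(id, ...) right after startAgent(id, script).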


4.9.3 Security Policy Language (SPL)

The Security Policy Language (SPL) for this system can be realized in more than one way. In this section I only give the basic idea of the approach and do not describe a real implementation of the language. It can be implemented in, e.g., a "Prolog-like" or "SQL-like" style, but its description is beyond the scope of this work.

A security policy is a set of rules and directives which describe the allowed and forbidden activities of different users with respect to certain assets. Usually, a security policy is formulated in natural language and covers the given constraints in an informal way. In order to actually implement and enforce a certain security policy, its rules must be formulated in an unambiguous and precise language in which statements can automatically be validated and checked against given facts. As the goal of this system is to validate whether a given security policy is followed, it is necessary to express these security constraints in a language that allows agents to verify them. Two different ways are possible: one approach describes legal system states, while the other explicitly names offending states (i.e., policy violations). My system follows the latter approach, as it has the advantage of being able to identify forbidden conditions precisely.

A simple approach for declaring a security policy consists of a collection of “if-then” rules. The “if-clause” specifies offending conditions, while the “then-clause” represents the appropriate actions that must be taken when such a condition occurs. Unfortunately, sophisticated, distributed cases need to consider a number of preconditions in parallel, or recursive constructs, to distinguish between legal and illegal cases. Therefore, my descriptive Security Policy Language (SPL), which can be seen as a small subset of the logic-oriented declarative language Prolog, provides a convenient way to model offending system states.
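The simple "if-then" form mentioned above, in which the "if-clause" names an offending state, can be illustrated with a small Java sketch. It is not SPL, whose concrete syntax is left open in this section; the class names, the fact keys such as telnet_accesses and the representation of facts as a map from names to integers are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** One "if-then" rule: the condition describes an offending state, the action the reaction. */
final class PolicyRule {
    final String name;
    final Predicate<Map<String, Integer>> offendingCondition;
    final String action;

    PolicyRule(String name, Predicate<Map<String, Integer>> offendingCondition, String action) {
        this.name = name;
        this.offendingCondition = offendingCondition;
        this.action = action;
    }
}

final class PolicyChecker {
    /** Returns "rule -> action" for every rule whose offending condition holds for the facts. */
    static List<String> violations(List<PolicyRule> rules, Map<String, Integer> facts) {
        List<String> triggered = new ArrayList<>();
        for (PolicyRule rule : rules) {
            if (rule.offendingCondition.test(facts)) {
                triggered.add(rule.name + " -> " + rule.action);
            }
        }
        return triggered;
    }
}
```

Because each rule describes a violation rather than a legal state, an empty result means that no forbidden condition was found in the given facts. Rules of this flat form cannot express the recursive or multi-host preconditions mentioned above, which is the motivation for the Prolog-like SPL.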
The advantage of a declarative language is the possibility to express situations in a natural way. In contrast to a conventional programming language, which formulates algorithms by defining a sequence of instructions, a declarative approach allows one to state facts (and rules) that describe valid conditions without an implicit ordering. An interpreter processes the given rules and performs a depth-first search within a fact database.

Scenario Description

When an agent is launched, the SPE chooses a new, unique agent identifier (which may easily be represented as an integer value in the actual implementation) and prepares a String with an appropriate security rule (following the grammar shown above). With these parameters, the start_agent function is invoked. The calling process now has two options, depending on whether “pull” or “push” notification should be realized. To implement the “pull” approach, the SPE periodically asks for agents that have returned (via a check_waiting_agents call); when the “push” approach is preferred, it registers a callback function to be informed immediately of returning agents (via a register_callback invocation). As soon as the availability of the agent has been verified, the claim_results call can be used to receive the string holding the agent’s results. Alternatively, the SPE can launch an agent and immediately call claim_results, which blocks until the agent has returned.

Basically, external entities and the agent platform only exchange messages that use strings to represent data. Agents are sent to the platform written as scripts, and results are returned as plain text messages. The communicator that offers the HS-API must contain an agent launcher component, which is able to translate the script representations into agents.


Configuring the security system

The security policy language (SPL) allows the description of different threats. In order to find a compromise between flexibility and maintainability, the whole security system has to be configurable. As shown above, a security policy is represented as rules combined with a certain goal. Even if the same rules are used, different levels of security can be set up by modifying the parameters of a goal or the goal itself.

To make the security system easier to administer, the different checks are partitioned into so-called domains. A domain can be seen as an entity that is covered by a set of rules. For example, there might be a domain called “file system”, which performs checks on the integrity of the local file system. Another domain might be called “NFS” and be responsible for checking NFS connections to other computers. The rules in the NFS domain check whether the permissions assigned in the NFS configuration files are consistent and whether only minimal permissions are assigned. Low-level accesses to resources are done by the data manager, and domains can perform operations on overlapping fact sets.

Users

Because the rules in the database have to be designed and chosen carefully, not everyone is allowed to modify the entries of the rules database. Sometimes a rule’s meaning is not clear at first sight, and only trained persons should administer the rules database. In order to prevent any corruption of the rules, two major groups of users are distinguished, each having different rights. Rules experts have permission to modify entries in the rules database, which allows them to modify rules as well as goals. Normal users, on the other hand, do not have the right to change any rules database entries. They only use the database created by the rules experts.

Normal users can customize the kind of security policy that they want to have enforced. This is done using a graphical interface in which the user can select different security domains. Depending on the desired security level, appropriate goals are selected from the rules database and a new security policy is monitored for the host. By selecting domains in the graphical user interface, the user implicitly chooses which rules are taken from the rules database. Additionally, one could have rules using different goals, resulting in different levels of security.

Security Profiles

In order to impose as little administrative overhead as possible on the normal user, the entered configuration can be saved. This makes it possible to keep several configurations among which the user can switch easily. In case of a detected attack, the local host can thus be secured quickly, for example by choosing a configuration in which remote access to the local host is forbidden. This is an important feature, because the system should not only detect intrusions or security policy violations, but also help to combat them.


4.9.4 Generation and Execution of Agents

Generation

Agents are created and launched into the mobile agent system by an Agent Launcher component, which is integrated into the communicators that can be contacted by external entities like the SPE. Such communicators offer functions via the HS-API that allow the SPE to send scripts (i.e., the actual building plans of agents) to the Launcher and to receive the agents’ results. When an Agent Launcher receives a script, it has to build an agent and release it into the mobile agent system. The following two activities have to take place:

1. The 'goal' of the script specifies which hosts need to be evaluated. The language provides a keyword named all, which represents all machines on the network. When it is used, it has to be expanded to the actual list of hosts that need to be visited. With this information, the agent’s path is initialized.
2. The script has to be parsed and its syntax verified. The rules are stored in a compact form which enables the interpreter to operate on them efficiently. Then the action list and the goal are analyzed by the interpreter and, if the script is valid, the agent class is created.

The agent’s state is thus filled with compact classes that represent the rules, the goal and the possible actions. The agent is then transferred to the first location in its path list and execution starts.

Execution

A basic agent simply implements (or utilizes) an interpreter which is able to process programs (rules, facts and goals) in the given SPL. In contrast to a standard interpreter (e.g., the Java Virtual Machine or the Perl interpreter), which sequentially executes instructions written in the corresponding language and continuously modifies some sort of internal state (e.g., to resolve conditional statements or remember values in variables), my suggested interpreter utilizes a so-called inference machine.
An inference machine is a well-known concept in theoretical computer science and is used to process programs written in a declarative language. The basic principle is as follows. The execution of a sequential program is replaced by a search in a database of facts. The input to the program is given by providing the data itself (i.e., facts) and rules that describe how to combine facts into new information. The output of the program is the result of the searches. As results can be more complex than simple yes/no answers, declarative languages can be used to formulate and solve actual problems. A good example is the language shown above, where results indicate possible security violations.

After an agent has been launched, it only holds rules and the statement that is searched for (i.e., the goal), while the database of facts is initially empty. Therefore, it starts by reading the information stored at the first host (usually the agent’s launch site) and begins to evaluate the goal statement. The facts are gathered using the Data Manager (DM). Instead of letting the agent access the resources directly, the DM provides a generic interface for receiving facts in a general format. When the evaluation fails, the whole fact base is transferred to the next host, where the agent again tries to validate the given goals. The agent’s journey continues until all hosts have been reached or a goal is found to be true. The whole process can be seen as deriving conclusions from a partially complete database. While the agent is roaming the network, it continuously builds up its fact knowledge. When every node has been visited, the complete state of the network is known and results can easily be obtained.

Possible Optimizations

The following optimizations describe possibilities to extend a basic agent to allow it to combat the disadvantages mentioned above.

Special Agents

It is usually a slow process to have a single agent move from node to node and perform simple, basic tasks (like checking local values without dependence on other hosts). Therefore, one can detect basic patterns in rules and substitute them with optimized agents. These agents can utilize worker agents (a Gypsy notion of a subordinate agent that can perform work on behalf of its supervisor) to parallelize tasks, or use precompiled, optimized algorithms that do not need interpretation. Examples are an agent that only has to perform checks for local conditions, or agents that have to build sums of values over every node. Both tasks can easily benefit from multiple agents, which do not have to carry any facts or rules with them.
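The "sum of values of every node" case can be sketched with ordinary Java threads standing in for worker agents. This is an analogy only: real Gypsy worker agents migrate to the hosts, whereas here each worker is a local task, and the class SummingSupervisor and the readLocalValue function are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

final class SummingSupervisor {
    /** Starts one worker per host in parallel and adds up the local values they report. */
    static int sumOverHosts(List<String> hosts, ToIntFunction<String> readLocalValue) {
        ExecutorService workers = Executors.newFixedThreadPool(Math.max(1, hosts.size()));
        try {
            List<Future<Integer>> reports = new ArrayList<>();
            for (String host : hosts) {
                // A worker only reads one local value; it carries no rules or facts.
                Callable<Integer> worker = () -> readLocalValue.applyAsInt(host);
                reports.add(workers.submit(worker));
            }
            int sum = 0;
            for (Future<Integer> report : reports) {
                sum += report.get(); // wait for each worker to report back
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            workers.shutdown();
        }
    }
}
```

The supervisor waits for all workers at once, so the total time is roughly that of the slowest host rather than the sum over all hosts, which is the benefit the paragraph above describes.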

Data driven path selection

When an agent starts to collect data, it might turn out that only a small number of facts are missing to verify a given goal and detect a policy violation. When missing information can potentially be found on a small number of hosts (or even a single one), it would be rational for the agent to visit these nodes immediately. The agent can therefore minimize detection time by carefully choosing its path according to given knowledge.
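One way to picture this choice is a greedy selection of the next host. The sketch below assumes that the agent has a hint table telling it which kinds of facts each host may supply; the text does not say where such knowledge comes from, so the table, the class PathSelector and the fact names used with it are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

final class PathSelector {
    /**
     * Picks the unvisited host that may supply the most missing facts.
     * Returns an empty Optional if no unvisited host can supply any of them.
     */
    static Optional<String> nextHost(Set<String> missingFacts,
                                     Map<String, Set<String>> mayProvide,
                                     Set<String> visited) {
        String best = null;
        int bestCount = 0;
        // A TreeMap gives a fixed iteration order, so ties are broken by host name.
        for (Map.Entry<String, Set<String>> entry : new TreeMap<>(mayProvide).entrySet()) {
            if (visited.contains(entry.getKey())) {
                continue;
            }
            int count = 0;
            for (String fact : entry.getValue()) {
                if (missingFacts.contains(fact)) {
                    count++;
                }
            }
            if (count > bestCount) {
                best = entry.getKey();
                bestCount = count;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

When the result is empty, the agent would fall back to its normal path, since none of the remaining hosts is known to hold a missing fact.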


Useless Fact Elimination

When an agent tries to prove a goal with only partial knowledge of the whole network state (i.e., it has only visited some nodes), it might be able to detect a number of facts that can never be used to prove a goal (independent of possible facts stored at still unvisited nodes). Those useless facts can be forgotten (i.e., eliminated from the fact base) without influencing the result, which reduces the network load.
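The text does not state how useless facts are recognized. A deliberately simple criterion, assumed here purely for illustration, is that a fact is useless if its predicate is referenced neither by any rule nor by the goal; such a fact cannot contribute to a proof regardless of what the unvisited hosts hold. The classes Fact and FactPruner and the predicate names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** A fact in a general format: predicate name, originating host and a value. */
final class Fact {
    final String predicate;
    final String host;
    final String value;

    Fact(String predicate, String host, String value) {
        this.predicate = predicate;
        this.host = host;
        this.value = value;
    }
}

final class FactPruner {
    /** Keeps only facts whose predicate is referenced by some rule or by the goal. */
    static List<Fact> prune(List<Fact> factBase, Set<String> referencedPredicates) {
        List<Fact> kept = new ArrayList<>();
        for (Fact fact : factBase) {
            if (referencedPredicates.contains(fact.predicate)) {
                kept.add(fact);
            }
        }
        return kept;
    }
}
```

Calling prune before each hop would shrink the state that has to be serialized and transferred with the agent. A real inference machine could apply stronger criteria, for example dropping facts that fail a constant comparison in every rule that mentions them.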

4.10 Graphical User Interface (GUI)

By using the Graphical User Interface (GUI) of the system, the users are able to edit rules, send agents and view results. In the system I distinguish two main user roles:

1. Normal users: They launch agents in a simple way, for small and simple tasks, with already existing rules.
2. Expert users: They launch agents for complex tasks with long rules. They can create new rules as well as make a full risk analysis of the system.

When the user enters the system, the page shown in Figure 4.26 will be displayed.

Figure 4.26: MainGUI Page

Here a normal system user can make a simple rule in the following way: he selects one or more domains (e.g., the telnet domain to check all telnet connections in a whole network), a security level (e.g., the high level for strict security control) and a host or group of hosts on which the rule will be checked (e.g., a subnetwork). After that, the user clicks on the “Start” button to launch the agent which will check the rule. The results of the check are displayed in the results text area. An example of a simple task edited in this way could be the following: “Check the web server log files of host 1 with a high security level”.

If the expert user wants to edit, for example, a more complicated rule, a new script or a new resource, he clicks on the “Advanced” button, and the password-protected pages shown in Figure 4.27 and Figure 4.28 are displayed. As already explained for the SPE, the advanced menu on the left side allows the expert user to:

1) Edit a rule or import a rule that already exists. This can be done with the “Rules editor” subsystem, which is shown in Figure 4.27 (AdvancedGUI – Rules Editor Page).
2) Manage the data that is stored in the database. With the “Security data management” subsystem, the user is able to insert, update or delete resources (shown in Figure 4.28, AdvancedGUI – New Resource Page), groups of resources, threats, safeguard functions, safeguard mechanisms, etc.
3) Make statistics from the data stored in the database. With the “Security Data analysis” subsystem, the user is able to analyze the different types of resources, the threats that could apply to a resource, the vulnerability of a resource, etc. This analysis can help the administrator to assess the level of security of the different resources of the company and therefore to apply a better security policy.

Figure 4.27: AdvancedGUI Page – Rules Editor

The screen where the expert user may edit rules consists of two basic parts:

1) The rules editor
2) The script editor

First, the expert user gives a name to his rule or script and then selects the security domain to which it belongs. The user may select the asset, the operator, the value to which the asset is compared, and the action to take when the rule's condition is met. For instance: “If the number of telnet accesses is greater than 3 then send an e-mail”. Of course, this rule belongs to the telnet security domain. Several asset conditions can be joined by the “And” or “Or” connector. One or two actions can also be specified. If the actions are joined by an “And” connector, both are carried out, the upper statement first. “Or” means that the lower statement is carried out if the upper one fails. Each condition can be modified or deleted by clicking on “Modify Condition” or “Delete Condition”, respectively. The action can likewise be modified or deleted by clicking on “Modify Action” or “Delete Action”.

The rule is translated into the Security Policy Language (SPL) when the user clicks on the “Translate Rule into Script” button. The script is then displayed in the script editor part. The edited rule can be saved as a rule with the “Save Rule” button or as a script with the “Save Script” button. The user may also import an existing rule or script that is stored in the database by clicking on the “Import” button. He selects a rule or a script, which is then displayed in the rules editor or script editor part, depending on what he selected. Once the user has edited or imported a rule, he should click on the “back” button to return to the previous screen. There he clicks on the “Start” button, which launches an agent with the rule. The results of the agent's check are displayed in the “Result” text area on the MainGUI Page.

If the expert user wants, for example, to add a new resource to the system, he should click on the "Resource" option in the security data management menu on the left side and then choose or create a type of resource and the resource group to which this resource possibly belongs, as shown in Figure 4.28.

How are the rules stored in the database?

There are several tables in the database that are related to the rules: Rule, RuleData, Action, RuleAction and Script. The Rule table stores information about the rule, such as the rule identifier or name. The RuleData table stores information about each of the lines that a rule consists of. The Action table stores information about the actions to be carried out when the rule's condition is met, such as the action identifier or description. Finally, the RuleAction table stores the actions that have to be applied to each rule.

When the rule is edited and the user clicks on the “Save Rule” button, the rule is stored in the following way: Each rule consists of one or more lines. The rule identifier as well as the rule name are stored in the Rule table. Each line of the rule becomes a row of the RuleData table. The action(s) to be taken when the rule's condition is met are stored in the RuleAction table. If, on the other hand, the user clicks on the “Save Script” button, the script is stored in the Script table.
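The way a saved rule is spread over these tables can be pictured with an in-memory stand-in. Only the table names come from the text; the columns chosen here (a line number in RuleData, a script name as key in Script), the class RuleStore and its methods are assumptions for illustration, and a real implementation would issue database statements instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the rule-related tables; the column choices are assumptions. */
final class RuleStore {
    final Map<Integer, String> rule = new LinkedHashMap<>();   // Rule: id -> name
    final List<String[]> ruleData = new ArrayList<>();         // RuleData: {ruleId, lineNo, line}
    final Map<Integer, String> action = new LinkedHashMap<>(); // Action: id -> description
    final List<int[]> ruleAction = new ArrayList<>();          // RuleAction: {ruleId, actionId}
    final Map<String, String> script = new LinkedHashMap<>();  // Script: name -> script text

    /** "Save Rule": one Rule row, one RuleData row per line, one RuleAction row per action. */
    void saveRule(int ruleId, String name, List<String> lines, List<Integer> actionIds) {
        for (int actionId : actionIds) {
            if (!action.containsKey(actionId)) {
                throw new IllegalArgumentException("unknown action " + actionId);
            }
        }
        rule.put(ruleId, name);
        for (int i = 0; i < lines.size(); i++) {
            ruleData.add(new String[] {String.valueOf(ruleId), String.valueOf(i + 1), lines.get(i)});
        }
        for (int actionId : actionIds) {
            ruleAction.add(new int[] {ruleId, actionId});
        }
    }

    /** "Save Script": the translated script text goes into the Script table. */
    void saveScript(String name, String scriptText) {
        script.put(name, scriptText);
    }
}
```

Keeping actions in their own table and linking them through RuleAction lets several rules share one action, such as sending an e-mail, without duplicating its description.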


Figure 4.28: AdvancedGUI Page – New Resource


5 Implementation of the Web Agent

The implementation of a Web Agent, a new task-specific Java agent in Gypsy, was the practical part of the thesis. In this chapter I describe what the Web Agent can be used for and how it is used, and then give implementation details. First, I show how to use it, with a few screenshots covering everything from launching a Web Agent to viewing its results; then I describe the implementation.

5.1 How to use a Web Agent?

What is the task of the Web Agent? Typical webserver logfiles, in this case those of the Apache webserver [Apache00], are checked for intrusions from disallowed IP addresses. Each webserver creates logfiles recording every access to the sites that belong to it. Such an access log entry contains the exact time of the access, the IP address and browser type of the user who accessed the site, and of course the URL of the web page that was accessed. Since every web access is stored in a logfile, these logfiles can naturally become very large. A network can contain more than one webserver, and each webserver normally has many such logfiles. My Web Agent was tested in a network with three webservers, each of which maintained two or three logfiles. Each logfile had about 2 MB, or 15000 – 20000 access entries.

Here is a sample scenario: An administrator in a company or university has a list of "not trusted" machines, networks and IP addresses; he will not be able to check all the data manually in a short time. With a Web Agent this task can be done easily and in a very short time. First, Gypsy must be started on each webserver that should be checked by the Web Agent. Once all webservers and a user GUI have been started, a Web Agent can be launched from the user GUI as shown in Figure 5.1.


Figure 5.1: Launching Web Agent

After the Web Agent entry in the Launch menu is clicked, an input dialog appears with a prompt to type the place name. The user has to type in the Web place, as shown in Figure 5.2. If the name of another place is typed in, the Web Agent will not be able to perform its task. If a place with the given name does not exist, the user is notified with a corresponding message window.

Figure 5.2: Input of Web placename

After Web is typed in as the name of the place, the next input dialog appears and prompts the user to type in the location of the Web Security file. This Web Security file contains only the list of "not trusted" IP addresses, and it should be stored only on the local machine where the user GUI is started. The administrator can have more than one Web Security file, and with each file he can enforce a different security level in his network. These files and their locations should, of course, be protected. One example is shown in Figure 5.3.


Figure 5.3: Input of Web Security file location

If the filename or directory that was typed in does not exist, or if the file is empty, the user is notified with a message window as shown in Figure 5.4. If the given Web Security file with the "not trusted" IP addresses is not empty, the Web Agent is created and starts its task.

Figure 5.4: Wrong Web Security file or location message

In its task, the Web Agent compares each access entry in each web logfile with all IP addresses from the Web Security file. When all intrusions have been filtered and collected on a webserver, the Web Agent jumps to the next machine and performs the same task there. On each webserver the user is notified which Web Security file was checked by the Web Agent and how much time the task took on this machine. All webservers are registered in the place registry, and the registration order is also the visiting order of the Web Agent. The locations and names of the web access files of all webservers are stored at the home server in a configuration file, which should also be protected. If a new access file should be checked, the user just adds the IP address of the webserver and the location of the new web access file to the configuration file, and the Web Agent will check this file at the given webserver, too. If a web access file at the given location does not exist or is empty, the Web Agent continues with its task and the user is notified in the results. Likewise, if the Web Agent detects a defective line in a web access logfile, the whole line is printed in the results together with the exception.

At the end, when the Web Agent has returned to the home server, the results can be queried with the user GUI. Each user must have his own password, and only he is able to retrieve the results of his agent. A small part of such a Web Agent result is shown in Figure 5.5. These results are written by the WebAgentResultWriter to the console where the user GUI was started. Figure 5.5 shows all Web Agent results from the machine "gauss.infosys.tuwien.ac.at" and some of the results from the machine "w2.infosys.tuwien.ac.at". Two more machines were included in this test, but for the moment I will concentrate on the results from "gauss.infosys.tuwien.ac.at".


As you can see in Figure 5.5, three web access files in the directory "/tmp/WebData/" on "gauss" are checked by the Web Agent. In the first one, "WebAccess1.log", one web intrusion is found. You can see from which machine the intrusion came, at which time it happened and what its target was. In addition, one entry with malformed access log information was found, so the whole access entry is printed together with the corresponding exception. In the second web access logfile on "gauss", named "WebAccess2.log", no intrusions were detected. Finally, the Web Agent tried to check the third webserver access file, but the file named "WebAccess3.log" was not found or was empty on the "gauss" machine. In any case, the Web Agent continued its task on the next machine in this network, named "w2".

Figure 5.5: Web Agent results
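The per-file check described in this section (compare every access entry with the list of "not trusted" IP addresses and report malformed entries separately) can be sketched as follows. This is not the thesis code: the class AccessLogChecker is hypothetical, the regular expression assumes that entries start like Apache's Common Log Format (host, two fields, bracketed time, quoted request), and the IP addresses in any usage are documentation addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AccessLogChecker {
    // host ident user [time] "method target protocol" ...  (Common Log Format prefix)
    private static final Pattern ENTRY =
            Pattern.compile("^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"\\S+ (\\S+)[^\"]*\".*$");

    /** Detected intrusions, keyed by line number so that they print in file order. */
    final TreeMap<Integer, String> intrusions = new TreeMap<>();
    /** Lines that could not be parsed as access entries. */
    final List<String> defectLines = new ArrayList<>();

    /** Checks every log line against the set of "not trusted" IP addresses. */
    void check(List<String> logLines, Set<String> untrustedIps) {
        for (int i = 0; i < logLines.size(); i++) {
            String line = logLines.get(i);
            Matcher entry = ENTRY.matcher(line);
            if (!entry.matches()) {
                defectLines.add(line); // report the whole malformed line
            } else if (untrustedIps.contains(entry.group(1))) {
                intrusions.put(i + 1, entry.group(1) + " at " + entry.group(2)
                        + " requested " + entry.group(3));
            }
        }
    }
}
```

Holding the untrusted addresses in a Set makes each lookup independent of the list length, which matters for logfiles with tens of thousands of entries. Matching whole networks rather than single addresses would need an additional prefix comparison that is not shown here.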

5.2 Description of Web Agent implementation

We have seen what the Web Agent can do on different machines; now we want to see how it was implemented. The whole implementation was done in Gypsy, which was extended with new Java classes. Because the GUI in the final version of Gypsy is expected to be based on a JavaBean-enabled editor, I also had to implement support for this feature in my part. The benefit of using JavaBeans is that the GUI will automatically extract all agents from the generated jar archive and integrate them into the GUI. This means that the programmer of agents does not have to care about the integration into the GUI. However, the Gypsy release that I used for my thesis does not contain a JavaBean-enabled GUI, so I had to provide the code for integrating my Web Agent with the user GUI manually.

First, I created a file WebAgent in the lightweight directory of the Gypsy project, which holds the implementation of my Web Agent, and the corresponding bean description class WebAgentBeanInfo. I used a makefile to ensure that all necessary files are compiled and are also part of the next distribution. The Web Agent is derived from MultiHopGypsyAgent, which enables the Web Agent to travel using a routing plan. This routing plan is the order in which the webservers were registered with Gypsy's place registry.

A Web Agent constructor with the parameters GypsyAgentInfo, fileLocation, secData and confFile is implemented. GypsyAgentInfo, which is needed to identify the Web Agent, contains common information about the agent and its owner, like the agent ID or the URI of the corresponding home server. The fileLocation parameter holds the location of the Security Web file to be used, which is needed to notify the user on each machine which Security Web file is used by this Web Agent. The data read from the Security Web file are stored in the parameter secData, so that they can be compared with each access logfile on each webserver. The confFile parameter holds the information about the locations of the web access files at each webserver. The method doTask(), which is abstract in the super class, is implemented with the functionality of the Web Agent. This method is called automatically by the Web place which executes the agent.
The locations of all access logfiles that should be checked by the Web Agent are given in the method addAccessFiles(). To check a new web access file, just add it together with the IP address of its webserver to the configuration file at the home server (e.g., /tmp/WebSecurity/LocationFile.conf); the current implementation of the Web Agent does everything else, and the results from this new file are added to the Web Agent results. If an access file does not exist on the given webserver, the user is notified in the Web Agent results, as shown for the "WebAccess3.log" file in Figure 5.5. In other words, new files, non-existent files, empty files and files with defective lines are detected and the user is notified. However, the execution of a Web Agent cannot be interrupted, which means that the agent will in any case carry out its tasks to the end.

Each access file is read and compared with each Security Web file entry. Useful data are filtered, all intrusions from the given IP addresses are detected, and all collected intrusions are saved in a Java tree map object. At the end of a hop, the structures holding the collected intrusions are emptied before the Web Agent is transferred to the next webserver on its route list. In addition, a serialization ID is added to the new Web Agent class; it is used by the Gypsy classloader to detect code changes between different versions of the agents.

To enable the representation of the Web Agent in the taskbars of a JavaBean editor, at least two icons for this agent must be provided. An icon "Admir16" of 16x16 pixels and an icon "Admir32" of 32x32 pixels were created and saved in JPEG format in the ...agents\lightweight\images directory of Gypsy. The launch menu entry of the Web Agent and all other launch menu entries of "Admir!" agents contain this icon, as you can see in Figure 5.1. The icon is added in the method WebAgentBeanInfo.getIcon(). The WebAgentBeanInfo bean class uses the WebAgent class, which is specified at the end of this bean class.

As mentioned earlier, the Web place executes the Web Agents; it is implemented in the places directory of Gypsy. The WebPlace is derived from the super class Place and has several constructors. It also has methods to run and terminate a Web Agent, and an important method that reads all access data as a Vector by calling DataManager.readSecurityData(fileLocation). As for the WebAgent, a serialization ID and a WebPlaceBeanInfo were created for the WebPlace.

The Data Manager is implemented as an embedded agent, as described in section 4.6. DataManager also has several constructors and two methods for accessing local data: readLog(fileLocation), which returns the content of a file as a String, and readSecurityData(fileLocation), which returns the file content as a Vector. The WebAgentResultWriter is also created in the embedded directory and is used to write all collected Web Agent results. It has several methods for pretty-printing the Web Agent results, as shown in Figure 5.5. In addition, all JavaBeans of the jar archive have to be defined in the Manifest file in the gypsy directory of the Gypsy project, so that JavaBean editors can recognize and retrieve them.

The current Gypsy version does not contain a JavaBean-enabled user front-end, so the user GUI shown in Figure 5.1 is implemented using the Java Swing classes that come with Java 2. For handling GUI events, such as launching a Web Agent, special action listeners have to be implemented. Since this user GUI is only temporary, the launching of agents is done in one special class in the swing directory, called AgentLaunchAction. First, the name of the Web Agent for its representation in the launch menu item list must be provided.
Then this class must react to the corresponding event (e.g., the user clicks on the launch Web Agent item), which is implemented in the method actionPerformed(). If the incoming event matches the action name, the user chooses the type of the execution place through an input dialog. The default is a Common place, but in our case a Web place is used and the launchWebAgent(webPlace) method is called. The GUI front-end has a special version of a Gypsy server called the AgentManager. This server has a local place registry, which can be queried with the getPlaceList(webPlace) method. The returned waypoints (the route) consist of the name, the place and a list of the corresponding communicators. The user has to type in the location of the Web Security file, which is read by the Data Manager's readSecurityData(fileLocation) method. If the security file at the given location does not exist or is empty, the user is notified; otherwise an instance of the WebAgent is created with the current user settings. If the Web Agent needs to return home, a GypsyAgentInfo with the agent's name and home URI is created, which means that the agent returns to the home server after the last place on its route.


The AgentManager class is derived from GypsyServer. It has methods for registering and unregistering places and methods like getPlaceList() or getAgents(). The most interesting method, however, is createDefaultPlaces(), which creates the minimum set of places from the super class. This method also creates the WebAgentResultWriter and the ReturnPlace for the Web Agent and registers them. In the super class GypsyServer, the default places are created when a Gypsy server is started. One of these places is the Web place, which is also registered in the method createDefaultPlaces(). Finally, the action listener and the corresponding menu items have to be added to the GUI.

6 Conclusions

In this thesis I developed an architecture for a secure mobile agent system and designed a prototype of an application for intrusion detection and for monitoring security policies using mobile agents. As a proof of concept, a mobile agent was implemented which checks all web server log files in a given network.

Mobile agents have well-known advantages for implementing tools that operate on distributed data sources. Thus, mobile agents are a promising technology to ensure that the security policy is enforced consistently in the whole system, and to constantly monitor the system for possible security holes and inconsistencies. The key idea is to use the mobility of mobile agents (MA) to implement an intelligent agent which periodically analyzes data at different hosts. Different types of mobile agents fulfill different types of security management, intrusion/misuse detection and response tasks.

The data analyzed by the agents in this system can be very sensitive. If an outsider gets access to the system, he can use it to obtain precise information about the vulnerabilities of the monitored system. Thus, it is a necessity for the approach that the agent platform is at least as well secured as the general system being monitored. Consequently, the protection of hosts and agents is implemented based on cryptographic mechanisms using PKI.

The basic setup is the following: the Secure Gypsy mobile agent platform is installed on a number of computers (hosts) in a network. The hosts give access to some system resources of the computers. Mobile agents jump from host to host in order to search for irregularities and take action when irregularities are found. All hosts and all users of the system have a public/private key pair and a certificate on their public key. A certification authority server, called Certification Authority Space (CAS), is used for administrating the various keys globally. For a cleaner design, users, agents and hosts are certified by different CAs running on the CAS.
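To illustrate how a host could check that an agent was sent by the holder of such a key pair, the following sketch signs the serialized agent bytes and verifies the signature with the Java security API. It is a minimal sketch, not the Secure Gypsy implementation: the class name AgentSignatureSketch, the RSA key size and the SHA256withRSA algorithm are assumptions, and certificate validation against the CAS as well as encryption are left out.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper; algorithm and key size are assumptions for the example.
class AgentSignatureSketch {

    // Creates a fresh RSA key pair, as a user or host would own one.
    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Signs the serialized agent with the sender's private key.
    static byte[] sign(byte[] agentBytes, PrivateKey senderKey) throws GeneralSecurityException {
        Signature signature = Signature.getInstance("SHA256withRSA");
        signature.initSign(senderKey);
        signature.update(agentBytes);
        return signature.sign();
    }

    // Checks the signature with the sender's public key (taken from the certificate).
    static boolean verify(byte[] agentBytes, byte[] signatureBytes, PublicKey senderKey)
            throws GeneralSecurityException {
        Signature signature = Signature.getInstance("SHA256withRSA");
        signature.initVerify(senderKey);
        signature.update(agentBytes);
        return signature.verify(signatureBytes);
    }
}
```

A receiving host would call verify() before executing the agent and reject it if the result is false, for example because the agent was modified on the way.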
Agents are encrypted and signed during transport, with optimizations that reduce the number of cryptographic operations. The Graphical User Interface (GUI) allows users to regulate how an organization manages and protects its information and computing resources to achieve security objectives. In this system I distinguish two main user roles:

- Normal users: They launch agents in a simple way with already existing rules for small and simple tasks.
- Expert users: They launch agents for complex tasks with complex rules. They can create new rules as well as make a full risk analysis of the system.

After creating an agent and registering with the CAS, a user can start intrusion detection by launching the agent from the GUI. The agent then travels through the network and fulfills its intrusion detection (ID) tasks. After the agent has returned, the results of its analysis can be viewed in the result window. If an intrusion was detected, an alert for the user or an administrator can be generated (e.g., by email or SMS). As a case study for the system I implemented the Web Agent, a new Java-based and task-specific agent in Gypsy. A Web Agent can be used to monitor web server access log files to detect accesses from untrusted domains. A report of all suspicious accesses is delivered to the user.
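The kind of check performed by the Web Agent can be sketched as follows. This is an illustration under assumptions rather than the actual agent code: WebLogCheckSketch is a hypothetical class, the log is assumed to have the remote host as the first whitespace-separated field of each line (as in the Common Log Format, with host names resolved), and the security data is assumed to be a list of trusted domain names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the access log check; not the Web Agent itself.
class WebLogCheckSketch {

    // A host is trusted if it equals a trusted domain or is a subdomain of one.
    static boolean isTrusted(String host, List<String> trustedDomains) {
        String lowerHost = host.toLowerCase();
        for (String domain : trustedDomains) {
            String lowerDomain = domain.toLowerCase();
            if (lowerHost.equals(lowerDomain) || lowerHost.endsWith("." + lowerDomain)) {
                return true;
            }
        }
        return false;
    }

    // Returns all log lines whose remote host is not in a trusted domain.
    static List<String> suspiciousAccesses(List<String> logLines, List<String> trustedDomains) {
        List<String> report = new ArrayList<>();
        for (String line : logLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String host = trimmed.split("\\s+", 2)[0]; // first field: remote host
            if (!isTrusted(host, trustedDomains)) {
                report.add(trimmed);
            }
        }
        return report;
    }
}
```

The suffix test includes the leading dot, so a host such as eviltuwien.ac.at is not mistaken for a member of tuwien.ac.at.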

Bibliography

[Apache00] Apache HTTP Server Project.
[Asaka99] M. Asaka, S. Okazawa, A. Taguchi, and S. Goto. A Method of Tracing Intruders by Use of Mobile Agents. INET '99, June 1999.
[Balasubramaniyan98] J. Balasubramaniyan, J. O. Garcia-Fernandez, D. Isacoff, E. H. Spafford, and D. Zamboni. An Architecture for Intrusion Detection using Autonomous Agents. Department of Computer Sciences, Purdue University; COAST TR 98-05, 1998.
[Emerald97] Event Monitoring Enabling Response to Anomalous Live Disturbances.
[Fanklin96] S. Franklin and A. Graesser. Is it an Agent or just a Program?: A Taxonomy for Autonomous Agents. Third International Workshop on Agent Theories, Architectures and Languages. Springer Verlag, Berlin, 1996.
[Fipa00] The Foundation for Intelligent Physical Agents.
[Frincke98] D. Frincke, D. Tobin, J. McConnell, J. Marconi, and D. Polla. A Framework for Cooperative Intrusion Detection. Proceedings of the 21st National Information Systems Security Conference, pp. 361-373, October 1998.
[Gong98] L. Gong. Secure Java Classloading. IEEE Internet Computing, 2(6), November 1998.
[Grasshopper00] Grasshopper - the agent development platform.
[Green97] S. Green, L. Hurst, B. Nangle, P. Cunningham, F. Somers, and R. Evans. Software Agents: A Review. Technical report, Trinity College, Dublin, Ireland, May 1997.
[Gypsy00] The Gypsy Project on Mobile Agents.
[Ilgun95] K. Ilgun, R. A. Kemmerer, and P. A. Porras. State transition analysis: A rule-based intrusion detection approach. IEEE Transactions on Software Engineering, 21(3):181-199, March 1995.
[Kumer95] S. Kumar and E. H. Spafford. A software architecture to support misuse intrusion detection. In Proceedings of the 18th National Information Security Conference, pp. 194-204, 1995.

[Lange93] D. B. Lange and M. Oshima. Seven Good Reasons for Mobile Agents. Communications of the ACM, 42(3):88-9, March 1999.
[Lee99] W. Lee, S. J. Stolfo, and K. Mok. A Data Mining Framework for Building Intrusion Detection Models. Proceedings of the IEEE Symposium on Security and Privacy, 1999.
[Lunt92] T. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, P. Neumann, H. Javitz, A. Valdes, and T. Garvey. A real-time intrusion detection expert system (IDES) - final technical report. Technical report, Computer Science Laboratory, SRI International, Menlo Park, California, February 1992.
[Mac00] Message Authentication Code.
[Magedanz96] T. Magedanz, K. Rothermel, and S. Krause. Intelligent Agents: An Emerging Technology for Next Generation Telecommunications? IEEE INFOCOM 1996 (San Francisco, USA, March 24-28, 1996). IEEE, March 1996.
[Masif97] Mobile Agent System Interoperability Facilities Specification.
[Milojicic98] D. Milojicic, M. Breugst, I. Busse, J. Campbell, S. Covaci, B. Friedman, K. Kosaka, D. Lange, K. Ono, M. Oshima, C. Tham, S. Virdhagriswaran, and J. White. MASIF: The OMG Mobile Agent System Interoperability Facility. Mobile Agents - Second International Workshop, MA '98 (Stuttgart, Germany, September 1998). Published as Kurt Rothermel and Fritz Hohl, editors, Lecture Notes in Computer Science, 1477. Springer, September 1998.
[Morreale98] P. Morreale. Agents on the Move. IEEE Spectrum, pages 34-41, April 1998.
[OCSP00] The PKIX Online Certificate Status Protocol.
[PGP00] Pretty Good Privacy.
[Pkcs1] RSA Cryptography Standard.
[Pkcs7] Cryptographic Message Syntax.
[Pkcs8] Private-Key Information Syntax Standard.
[Pkcs10] Certification Request Syntax.
[Pkcs12] Personal Information Exchange Syntax Standard.
[PKI00] Public Key Infrastructure.
[Pkix00] PKIX Working Group.

[Pkixcmp00] Public Key Infrastructure Certificate Management Protocols.
[Rfc0822] Standard for the format of ARPA internet text messages.
[Rfc1757] Remote Network Monitoring Management Information Base.
[Rubin98] A. Rubin and D. E. Geer, Jr. Mobile Code Security. IEEE Internet Computing, 2(6), November 1998.
[Selker94] T. Selker. Coach: A Teaching Agent that Learns. Communications of the ACM, 37(7):92-9, 1994.
[SSL00] Apache-SSL.
[Voyager00] Voyager product family.
[VPN00] Virtual Private Network Standards.
