UNOBTRUSIVE TECHNIQUE FOR DETECTING LEAKAGE OF RECORDS

I J I T E • ISSN: 2229-7367 3(1-2), 2012, pp. 79-83

S. MOHAN, K. VENKATACHALAPATHY AND R. KARTHIKA
Department of Computer Science and Engineering, Annamalai University, Tamilnadu, India
[email protected], [email protected], [email protected]

Abstract: In this paper, we introduce a data distribution model that helps identify a guilty agent and estimate the probability of detecting leakages. The method protects the privacy of records against leakage by using fake records that look like original records but contain fabricated details. If an external party attempts to download or open the records without the appropriate permission, a notification is immediately sent to the distributor's e-mail id and mobile.
Keywords: distribution model, data leakage, data privacy, fake records, leakage model.

1. INTRODUCTION
In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing of customer data, and an enterprise may outsource its data processing, so that data must be given to various other companies. We call the owner of the data the distributor and the supposedly trusted third parties the agents. Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data. We consider applications where the original sensitive data cannot be perturbed. Perturbation is a very useful technique in which the data is modified and made "less sensitive" before being handed over to the agents. (For example, one can add random noise to certain attributes, or replace exact values by ranges.) However, in some cases it is important not to alter the original distributor's data. For example, if an outsourcer is doing the payroll, he must have the exact salary and customer bank account numbers. If medical researchers will be treating patients (as opposed to simply computing statistics), they may need accurate data for the patients.

In this paper, we develop a model for assessing the "guilt" of agents. We also present algorithms for distributing objects to agents in a way that improves the chances of identifying a leaker. Finally, we consider the option of adding "fake" objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty. In Section 2, we introduce the problem setup and the notation we use. In Section 3, we present a model for calculating "guilt" probabilities in cases of data leakage. In Sections 4 and 5, we present the optimization problem and strategies for allocating data to agents. Finally, in Section 6, we evaluate the strategies in different data leakage scenarios and check whether they indeed help us to identify a leaker. (Traditionally, leakage detection is handled by watermarking; e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases, but again involve some modification of the original data.)

2. PROBLEM SETUP AND NOTATIONS
A distributor owns a set T = {t1, ..., tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, ..., Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size; e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a subset of objects Ri, determined either by a sample request or an explicit request:
1. Sample request Ri = SAMPLE(T, mi): any subset of mi records from T can be given to Ui.
2. Explicit request Ri = EXPLICIT(T, condi): agent Ui receives all T objects that satisfy condi.
Example: Say T contains customer records for a given company A. Company A hires a marketing agency U1 to do an on-line survey of customers. Since any customers will do for the survey, U1 requests a sample of 1000 customer records. At the same time, company A subcontracts with agent U2 to handle billing for all California customers. Thus, U2 receives all T records that satisfy the condition "state is California."

3. GUILT MODEL ANALYSIS
Let Pr{Gi|S} denote the probability that agent Ui is guilty, given that a set S of leaked objects has been discovered. To compute Pr{Gi|S}, we need an estimate of the probability that values in S can be "guessed" by the target. For instance, say some of the objects in T are e-mail addresses of individuals. We can conduct an experiment and ask a person with approximately the expertise and resources of the target to find the e-mail addresses of, say, 100 individuals. If this person can find, say, 90 of them, then we can reasonably estimate that the probability of finding one e-mail address is 0.9. On the other hand, if the objects in question are bank account numbers, the person may only discover, say, 20, leading to an estimate of 0.2. We call this estimate pt, the probability that object t can be guessed by the target. Our model also assumes that an agent's decision to leak an object is not related to other objects. Our model parameters interact, and to check whether the interactions match our intuition, we study two simple scenarios in this section: the impact of the guessing probability p, and the impact of the overlap between Ri and S. In each scenario we have a target that has obtained all the distributor's objects, i.e., T = S.
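To make the model concrete, the following is a minimal sketch (our illustration, not the paper's code) of one way Pr{Gi|S} can be computed under these assumptions, following the guilt model of the underlying data leakage detection framework [13]: each leaked object is either guessed by the target with a uniform probability p, or leaked with equal likelihood by one of the agents that received it, independently per object.

```python
def guilt_probability(agent_objects, leaked, allocations, p):
    """Estimate Pr{Gi|S}: the probability that an agent is guilty,
    given the leaked set S. Assumes each leaked object was either
    guessed by the target (probability p) or leaked by one of the
    agents that received it, each equally likely, independently.

    agent_objects: set of objects given to this agent (Ri)
    leaked:        set of leaked objects (S)
    allocations:   dict mapping each agent id to its set of objects
    p:             probability that the target guessed an object
    """
    prob_innocent = 1.0
    for t in agent_objects & leaked:
        # |V_t|: number of agents that received object t
        v_t = sum(1 for objs in allocations.values() if t in objs)
        # Probability this agent did NOT leak object t
        prob_innocent *= 1.0 - (1.0 - p) / v_t
    return 1.0 - prob_innocent

# Example: two agents sharing one of two leaked objects
allocations = {"U1": {"t1", "t2"}, "U2": {"t2", "t3"}}
leaked = {"t2", "t3"}
print(guilt_probability(allocations["U1"], leaked, allocations, p=0.5))
print(guilt_probability(allocations["U2"], leaked, allocations, p=0.5))
```

In this toy run, U2 is assigned a higher guilt probability than U1 because it alone received the leaked object t3, matching the intuition that overlap with S drives guilt.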

Figure 1: Guilt Model Analysis
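Before turning to the allocation problem, here is a minimal sketch (our illustration, not from the paper) of how the two request types of Section 2 might be served, assuming data objects are plain Python dictionaries:

```python
import random

def sample_request(T, m, rng=random):
    """SAMPLE(T, m): return any subset of m objects from T."""
    return rng.sample(T, m)

def explicit_request(T, cond):
    """EXPLICIT(T, cond): return all objects in T satisfying cond."""
    return [t for t in T if cond(t)]

# Toy customer table
T = [{"id": 1, "state": "California"},
     {"id": 2, "state": "Texas"},
     {"id": 3, "state": "California"}]

R1 = sample_request(T, 2)                                       # marketing agency U1
R2 = explicit_request(T, lambda t: t["state"] == "California")  # billing agent U2
print(R1, R2)
```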

4. OPTIMIZATION PROBLEM
The distributor's data allocation to agents has one constraint and one objective. The distributor's constraint is to satisfy agents' requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any of his data objects. The distributor may not deny serving an agent request and may not provide agents with different perturbed versions of the same objects. We consider fake object allocation as the only possible constraint relaxation. Our detection objective is ideal and intractable: detection would be assured only if the distributor gave no data object to any agent.

5. MODULE DESCRIPTION
In software, a module is a part of a program. Programs are composed of two or more independently developed modules that are not combined until the program is linked. A single module can contain one or several routines, and only variables that are part of module interfaces can be seen by outside modules or functions.

5.1. Login / Registration
This module provides the authority for a user/agent to access the other modules of the project. A user/agent gains this access authority after registration.

5.2. Data Distributor
A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place. The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. The data distributor provides several replication methods; its built-in recovery mechanism is administration-free and ensures that data is delivered even following downtime of the publishers involved. These choices are not mutually exclusive: it is not uncommon for the same application to use multiple replication types.

5.3. Data Allocation Module
The main focus of our project is the data allocation problem: how can the distributor "intelligently" give data to agents in order to improve the chances of detecting a guilty agent? With sample data requests, agents are not interested in particular objects, so object sharing is not explicitly defined by their requests. The distributor is "forced" to allocate certain objects to multiple agents only if the number of requested objects exceeds the number of objects in set T. The more data objects the agents request in total, the more recipients on average an object has; and the more objects are shared among different agents, the more difficult it is to detect a guilty agent. The goal of our experiments was to see whether fake objects in the distributed data sets yield a significant improvement in our chances of detecting a guilty agent; we also wanted to evaluate our e-optimal algorithm relative to a random allocation.

5.3.1. Algorithms and their Description
The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. Fake objects may, however, impact the correctness of what agents do.

Algorithm 1: Allocation for Explicit Data Requests
Input: R1, ..., Rn; cond1, ..., condn; b1, ..., bn; B
Output: R1, ..., Rn; F1, ..., Fn
1: R ← Ø  // agents that can receive fake objects
2: for i = 1, ..., n do
3:   if bi > 0 then
4:     R ← R ∪ {i}
5:     Fi ← Ø
6: while B > 0 do
7:   i ← SELECTAGENT(R, R1, ..., Rn)
8:   f ← CREATEFAKEOBJECT(Ri, Fi, condi)
9:   Ri ← Ri ∪ {f}
10:  Fi ← Fi ∪ {f}
11:  bi ← bi − 1
12:  if bi = 0 then
13:    R ← R \ {i}
14:  B ← B − 1

Algorithm 2 gives the baseline random allocation for sample data requests, against which we evaluate our e-optimal algorithm.

Algorithm 2: Allocation for Sample Data Requests
1: a ← 0|T|  // a[k]: number of agents who have received object tk
2: R1 ← Ø, ..., Rn ← Ø
3: remaining ← Σi mi
4: while remaining > 0 do
5:   for all i = 1, ..., n : |Ri| < mi do
6:     k ← SELECTOBJECT(i, Ri)  // may also use additional parameters
7:     Ri ← Ri ∪ {tk}
8:     a[k] ← a[k] + 1
9:     remaining ← remaining − 1
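A minimal runnable sketch of Algorithm 1 (our illustration): SELECTAGENT is stubbed with a random choice, and CREATEFAKEOBJECT simply fabricates a labeled record standing in for one that satisfies the agent's condition.

```python
import random

def allocate_explicit(R, conds, b, B, select_agent=None):
    """Algorithm 1 sketch: add up to B fake objects to the agents' sets.

    R:     dict agent_id -> set of objects R_i (mutated in place)
    conds: dict agent_id -> condition label (used only to tag fakes here)
    b:     dict agent_id -> max fake objects agent i may receive (mutated)
    B:     total budget of fake objects
    Returns dict agent_id -> set of fake objects F_i.
    """
    if select_agent is None:
        # Random stand-in for SELECTAGENT; a greedy version can be plugged in
        select_agent = lambda eligible, R: random.choice(sorted(eligible))
    F = {i: set() for i in R}
    eligible = {i for i in R if b[i] > 0}  # agents that can receive fakes
    counter = 0
    # Extra guard beyond the pseudocode: stop if no agent can take more fakes
    while B > 0 and eligible:
        i = select_agent(eligible, R)
        counter += 1
        f = f"fake-{counter}-({conds[i]})"  # CREATEFAKEOBJECT stub
        R[i].add(f)
        F[i].add(f)
        b[i] -= 1
        if b[i] == 0:
            eligible.discard(i)
        B -= 1
    return F

R = {"U1": {"t1", "t2"}, "U2": {"t2", "t3"}}
conds = {"U1": "state=CA", "U2": "state=TX"}
print(allocate_explicit(R, conds, b={"U1": 1, "U2": 2}, B=2))
```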

Combining Algorithms 1 and 2: Optimal Agent Selection
The e-optimal allocation is Algorithm 1 with the procedure SELECTAGENT() below used in place of the random agent selection in line 7. SELECTAGENT() makes a greedy choice by selecting the agent that will yield the greatest improvement in the sum-objective:

function SELECTAGENT(R, R1, ..., Rn)
    i ← argmax over i' in R of (1/|Ri'| − 1/(|Ri'| + 1)) · Σ over j ≠ i' of |Ri' ∩ Rj|
    return i

The cost of this greedy choice is O(n²) in every iteration, so the overall running time of the resulting algorithm is O(n + n²B) = O(n²B).
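A sketch of this greedy selection (our illustration), written so that it plugs into the allocate_explicit() stub above as the select_agent parameter:

```python
def select_agent_greedy(eligible, R):
    """Greedily pick the agent whose next fake object most improves the
    sum-objective. Adding a fake to R_i shrinks each overlap ratio
    |R_i ∩ R_j| / |R_i|; assumes every R_i is non-empty."""
    def improvement(i):
        size = len(R[i])
        overlap = sum(len(R[i] & R[j]) for j in R if j != i)
        # 1/|R_i| - 1/(|R_i|+1) is how much each shared object's
        # contribution to the objective drops when R_i grows by one.
        return (1.0 / size - 1.0 / (size + 1)) * overlap
    return max(eligible, key=improvement)

# Usage with the earlier sketch:
# fakes = allocate_explicit(R, conds, b, B, select_agent=select_agent_greedy)
```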

5.4. Fake Object Module
Fake objects are objects generated by the distributor in order to increase the chances of detecting agents that leak data. Our use of fake objects is inspired by the use of "trace" records in mailing lists. Objects are distributed to agents in a way that improves our chances of identifying a leaker, and adding "fake" objects to the distributed set strengthens this: such objects do not correspond to real entities but appear realistic to the agents, so in a sense the fake objects act as a type of watermark for the entire set.

5.5. Data Leakage Protection Module
In this module, to protect against data leakage, a secret key is sent to the agent who requests files. The secret key is sent to the e-mail id of the registered agent. Without the secret key, the agent cannot access the file sent by the distributor: the key given to the agent permits downloading only the particular files it covers, and otherwise the file cannot be opened.
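A minimal sketch of the secret-key gate described above (our illustration, not the paper's implementation): the distributor issues a random per-file key, mails it to the registered agent, and releases a file only when the presented key matches.

```python
import hmac
import secrets

class KeyGate:
    """Per-agent, per-file secret keys guarding downloads."""
    def __init__(self):
        self._keys = {}  # (agent_id, file_id) -> secret key

    def issue_key(self, agent_id, file_id):
        # Generated when the agent requests the file; in the full
        # system this key would be e-mailed to the registered agent.
        key = secrets.token_urlsafe(16)
        self._keys[(agent_id, file_id)] = key
        return key

    def download(self, agent_id, file_id, presented_key):
        expected = self._keys.get((agent_id, file_id))
        # Constant-time comparison avoids leaking key bytes via timing
        if expected is not None and hmac.compare_digest(expected, presented_key):
            return f"<contents of {file_id}>"
        raise PermissionError("invalid or missing secret key")

gate = KeyGate()
key = gate.issue_key("U1", "records.csv")  # mailed to U1's registered e-mail
print(gate.download("U1", "records.csv", key))
```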

5.6. Finding Guilty Agents Module
In this module, the distributor's data allocation to agents is optimized under one constraint and one objective. The distributor's constraint is to satisfy agents' requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data. Fake objects are stored in the database.

5.7. Mobile Alert
In this module, an alert is sent to the distributor's mobile regarding the guilty agents who leaked the files. It is developed using the NOKIA SDK 5100, a tool built on the Java virtual machine. After obtaining the guilty agent list from the e-mail alert, we copy the uniform resource locator (URL) address from that page and paste it into the location page of the NOKIA SDK developer toolkit. Once the URL is pasted into the mobile alert tool, it displays the leaker names; in this way the particular guilty agent is identified from the list.

6. EXPERIMENTAL RESULTS
Fig. 2 shows the e-mail alert displaying the guilty agent list.

Figure 2: E-mail Alert
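A minimal sketch of how such an e-mail alert might be sent to the distributor (our illustration; the SMTP host and both addresses are placeholders):

```python
import smtplib
from email.message import EmailMessage

def send_leak_alert(guilty_agents, distributor_email,
                    smtp_host="localhost", smtp_port=25):
    """E-mail the distributor the list of suspected leakers."""
    msg = EmailMessage()
    msg["Subject"] = "Data leakage alert: suspected guilty agents"
    msg["From"] = "[email protected]"  # placeholder sender
    msg["To"] = distributor_email
    body = "The following agents are suspected of leaking records:\n"
    body += "\n".join(f"- {agent}" for agent in guilty_agents)
    msg.set_content(body)
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(msg)

# send_leak_alert(["U2"], "[email protected]")
```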

Fig. 3 shows the mobile alert displaying the guilty agent list.

Figure 3: Mobile Alert

7. CONCLUSION AND FUTURE RESEARCH
In a perfect world there would be no need to hand over sensitive data to agents that may leak it, whether unknowingly or intentionally. We have shown that it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and with the data of other agents, and based on the probability that the leaked objects could have been guessed. With this online detection method the data leaker was identified. We also analyzed the manual mobile-alert method, in which the particular guilty agent is found using the URL address. In this paper the distributor maintains a small network; in the future, we will develop the approach for a large network to prevent the data leakage problem. Our future work also includes improving the data allocation strategies, exploring different ways of allocating data so as to prevent leakage by agents.

References
[1] R. Agrawal and J. Kiernan, "Watermarking Relational Databases," Proc. 28th Int'l Conf. Very Large Data Bases (VLDB '02), VLDB Endowment, pp. 155-166, 2002.
[2] P. Bonatti, S.D.C. di Vimercati, and P. Samarati, "An Algebra for Composing Access Control Policies," ACM Trans. Information and System Security, 5(1), pp. 1-35, 2002.
[3] P. Buneman, S. Khanna, and W.C. Tan, "Why and Where: A Characterization of Data Provenance," Proc. Eighth Int'l Conf. Database Theory (ICDT '01), J.V. den Bussche and V. Vianu, eds., pp. 316-330, 2001.
[4] P. Buneman and W.C. Tan, "Provenance in Databases," Proc. ACM SIGMOD, pp. 1171-1173, 2007.
[5] Y. Cui and J. Widom, "Lineage Tracing for General Data Warehouse Transformations," The VLDB J., 12, pp. 41-58, 2003.
[6] S. Czerwinski, R. Fromm, and T. Hodes, "Digital Music Distribution and Audio Watermarking," http://www.scientificcommons.org/43025658, 2007.
[7] F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li, "An Improved Algorithm to Watermark Numeric Relational Data," Information Security Applications, pp. 138-149, Springer, 2006.
[8] F. Hartung and B. Girod, "Watermarking of Uncompressed and Compressed Video," Signal Processing, 66(3), pp. 283-301, 1998.
[9] S. Jajodia, P. Samarati, M.L. Sapino, and V.S. Subrahmanian, "Flexible Support for Multiple Access Control Policies," ACM Trans. Database Systems, 26(2), pp. 214-260, 2001.
[10] Y. Li, V. Swarup, and S. Jajodia, "Fingerprinting Relational Databases: Schemes and Specialties," IEEE Trans. Dependable and Secure Computing, 2(1), pp. 34-45, 2005.
[11] V.N. Murty, "Counting the Integer Solutions of a Linear Equation with Unit Coefficients," Math. Magazine, 54(2), pp. 79-81, 1981.
[12] S.U. Nabar, B. Marthi, K. Kenthapadi, N. Mishra, and R. Motwani, "Towards Robustness in Query Auditing," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB '06), VLDB Endowment, pp. 151-162, 2006.
[13] P. Papadimitriou and H. Garcia-Molina, "Data Leakage Detection," Technical Report, Stanford Univ., 2008.
