Evasion and Hardening of Tree Ensemble Classifiers

arXiv:1509.07892v1 [cs.LG] 25 Sep 2015

Alex Kantchelian UC Berkeley

J. D. Tygar UC Berkeley

Anthony D. Joseph UC Berkeley

Abstract

Recent work has successfully constructed adversarial “evading” instances for differentiable prediction models. However, generating adversarial instances for tree ensembles, a piecewise constant class of models, has remained an open problem. In this paper, we construct both exact and approximate evasion algorithms for tree ensembles: for a given instance x we find the “nearest” instance x′ such that the classifier predictions of x and x′ are different. First, we show that finding such instances is practically possible despite tree ensemble models being non-differentiable and the optimal evasion problem being NP-hard. In addition, we quantify the susceptibility of such models applied to the task of recognizing handwritten digits by measuring the distance between the original instance and the modified instance under the L0, L1, L2 and L∞ norms. We also analyze a wide variety of classifiers including linear and RBF-kernel models, max-ensembles of linear models, and neural networks for comparison purposes. Our analysis shows that tree ensembles produced by a state-of-the-art gradient boosting method are consistently the least robust models notwithstanding their competitive accuracy. Finally, we show that a sufficient number of retraining rounds with L0-adversarial instances makes the hardened model three times harder to evade. This retraining set also marginally improves classification accuracy, but simultaneously makes the model more susceptible to L1, L2 and L∞ evasions.

1

Introduction

Deep neural networks (DNN) represent a prominent success of machine learning. These models can successfully and accurately address difficult learning problems, including classification of audio, video, and natural language, where previous approaches have failed. Yet, the existence of adversarial “evading” instances for the current incarnation of DNNs [1] shows a perhaps surprising brittleness: for virtually any instance x that the model classifies correctly, it is possible to find a negligible perturbation δ such that x + δ evades being correctly classified, that is, receives a (sometimes wildly) inaccurate prediction.

The general study of the evasion problem matters for at least two reasons, one of which is conceptual, the other practical. First, we expect a high-performance learning algorithm to generalize well and be hard to evade: only a “large enough” perturbation δ should be able to alter its decision. The existence of small-δ evading instances shows a defect in the generalization ability of the model, and intuitively calls for either a different model class, or additional model regularization. Second, machine learning is becoming the workhorse of security-oriented applications, the most prominent example being spam email filtering. In those applications, the attacker has a large incentive for finding evading instances. For example, a spammer is interested in finding a small, cheap set of changes to his emails to get them through the machine learning detector. Thus, showing that only a large enough δ will cause a change in the prediction implies that the model is immune to practical evasion attacks.

While prior work extensively studies the evasion problem on differentiable models by means of gradient descent or other approximation heuristics, those results are reported in an essentially qualitative fashion, implicitly defaulting the choice of metric for measuring δ to the L2 norm. Moreover, only limited and sporadic qualitative comparisons between models from different classes and algorithms are available. However, the largest missing piece on evasion is the conspicuous absence of attacks for non-differentiable, non-continuous models. Tree sum-ensembles as produced by boosting or bagging are perhaps the most important models from this class as they are often able to achieve competitive performance and enjoy good adoption rates in both industrial and academic contexts [2].

In this paper, we develop both exact and approximate evasion attacks for sum-ensembles of trees. In the exact (or optimal) evasion attack, we compute the smallest δ according to a given metric such that the model misclassifies x + δ. The approximate attack does not guarantee optimality but runs faster. We apply these to a concrete handwritten digit classification task. Our exact attack relies on a reduction to Mixed Integer Programming and enables precise quantitative L0, L1, L2 and L∞ robustness comparisons with several other models for which we can compute optimal attacks: L1 and L2 regularized logistic regression and max-ensemble of linear classifiers (shallow max-out network). We also provide L1, L2 and L∞ robustness upper-bounds for a 3 hidden layer, sigmoidal activation function neural network and a classic Gaussian kernel SVM. The comparison shows that despite having one of the best classification accuracies, tree ensembles as learned by gradient boosting are consistently the most brittle models for L1, L2 and L∞ distances.

Finally, our approximate attack is based on a fast coordinate descent scheme and enables quick generation of evading instances. We use this ability to generate more than 500,000 L0-attack instances and retrain a gradient boosted tree model on those. This produces a dramatically hardened model: the median number of pixel modifications required for a mislabeling improves from the original 8 to 30 in the retrained model. Additionally, the retraining leads to an increase in testing accuracy, so that retraining on adversarial instances can be understood as a form of model regularization.

2

Related Work

Following a taxonomy introduced in Barreno et al. [3], evasion attacks are part of the larger family of exploratory attacks which occur at testing time, for a fixed classifier. Such attacks have been considered from the onset of the adversarial machine learning subfield. For example, Lowd [4] describes an attack on linear spam filters, where the spammer finds and appends enough benign-weighted terms to avoid detection. Similarly, Šrndić et al. [5] present and evaluate several concrete evasion attacks for a malware detection system based on linear and RBF-SVM under different adversary power assumptions.

Additionally, a number of researchers have considered the theoretical aspect of evasion. Lowd et al. [6] investigate evasion of linear models when the models are kept secret but queries are allowed. Nelson et al. [7] later extend this analysis to any model that induces a convex decision boundary in a continuous feature space. More recently, Fawzi et al. [8] provide theoretical results on the robustness of linear and quadratic models when the models are fully known.

In this paper, we forgo application-specific feature extraction and mapping by working on a direct pixel-to-feature mapping for our handwritten digit recognition benchmark learning task. Similar to Fawzi et al. [8], we do not limit the amount of information available to the adversary. Our goal is to establish the intrinsic evasion robustness of the machine learning models themselves, and thus provide a guaranteed lower-bound on the attacker effort in the worst-case.

Closer to our work, Biggio et al. [9] introduce evasion as a constrained optimization problem and proceed to compute L1-evading instances for RBF-SVM using a projected gradient descent method. We effectively use a similar method to compute evasion attacks for the differentiable models where no optimal evasion algorithm is known: RBF-SVM and (sigmoidal) deep neural network.
Those provide the comparison ground for our main result regarding boosted trees.

Szegedy et al. [1] use a quasi-Newton method to first demonstrate that L2-small perturbations on image recognition tasks significantly impact state-of-the-art deep learning classifiers. This result is later refined in Goodfellow et al. [10], showing that for large-dimensional image data, a single small gradient step on the model score is sufficient to obtain an evading instance. This observation supports the authors’ view that evasion is possible only because practical deep neural networks behave too linearly. Using the single-step strategy, the authors retrain a model on both original and evading instances, and show that both test accuracy and robustness are improved. In this paper, we show a similar positive effect of retraining on L0-evading instances for boosted trees. To do so, we develop both optimal and approximate gradient-free attack algorithms for tree ensemble models. Unlike Goodfellow et al. [10], we show that retraining tree ensembles significantly strengthens the resulting model. We also demonstrate that despite their extreme non-linearity, boosted trees are even more susceptible to evasion attacks than deep neural networks.

Finally, we mention two related works from the deep neural network community. Nguyen et al. [11] study a version of evasion where the small δ constraint is altogether dropped and replaced by a “regularity” constraint on the attack instance. They show that it is possible to generate well structured but meaningless images which obtain very certain predictions on state-of-the-art DNN models. On the counter-measures side, Gu et al. [12] show some promising results by augmenting DNNs with a pre-filtering step based on a form of contractive auto-encoding.

In this paper, we develop a fast evasion attack for tree ensembles which enables investigating the viability of retraining on adversarial instances to increase model robustness. Remarkably, retraining does produce significantly more robust models without loss of accuracy.

3

Evading Tree Ensemble Models

Let $m : X \to Y$ be a classifier (the model). For a given instance $x \in X$ and a given distance function $d : X \times X \to \mathbb{R}_+$, the optimal evasion problem is:

$$\underset{x' \in X}{\text{minimize}} \;\; d(x, x') \quad \text{subject to} \quad m(x) \neq m(x') \tag{1}$$
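To make problem (1) concrete, the following minimal sketch solves it by exhaustive search over a small finite grid. The toy classifier, the grid, and the choice of the L1 distance are illustrative assumptions, not part of the paper; exhaustive search is only viable for tiny candidate sets.

```python
from itertools import product


def brute_force_evasion(m, x, d, candidates):
    """Return (x', d(x, x')) for the candidate closest to x with m(x') != m(x)."""
    best, best_dist = None, float("inf")
    for c in candidates:
        if m(c) != m(x):
            dist = d(x, c)
            if dist < best_dist:
                best, best_dist = c, dist
    return best, best_dist


def toy_classifier(x):
    # Hypothetical stand-in for m: predicts 1 when both coordinates are >= 0.5.
    return 1 if x[0] >= 0.5 and x[1] >= 0.5 else -1


def l1(x, y):
    # L1 distance, used here only as one possible choice for d.
    return sum(abs(a - b) for a, b in zip(x, y))


grid = [i / 4 for i in range(5)]            # 0, 0.25, 0.5, 0.75, 1
candidates = list(product(grid, repeat=2))  # 25 candidate instances
x = (0.25, 0.75)                            # classified as -1
x_adv, dist = brute_force_evasion(toy_classifier, x, l1, candidates)
```

Here the nearest instance with a different label is obtained by raising the first coordinate just enough to cross the decision boundary.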

In this paper, we focus on binary classifiers $Y = \{-1, 1\}$ defined over an $n$-dimensional feature space $X \subset \mathbb{R}^n$. Setting the classifier $m$ aside, the distance function $d$ fully specifies (1), hence we can talk about $d$-adversarial instances, or $d$-robustness. As different measures of instance deformation lead to different solutions, we present results for four representative distances. We briefly describe those and their typical effects on the solution of (1).

• The L0 distance $\sum_{i=1}^{n} I_{x_i \neq x'_i}$, or Hamming distance, encourages the sparsest, most localized adversarial deformations with arbitrary magnitude.
• The L1 distance $\sum_{i=1}^{n} |x_i - x'_i|$ encourages localized adversarial deformations and additionally controls for their magnitude.
• The L2 distance $\sqrt{\sum_{i=1}^{n} (x_i - x'_i)^2}$ encourages less localized but small adversarial deformations.
• The L∞ distance $\max_i |x_i - x'_i|$ encourages uniformly spread adversarial deformations with the smallest possible magnitude.

Note that when $m$ is taken to be a thresholded sum-ensemble of trees, (1) becomes NP-hard. Indeed, 3-SAT is polynomial-time reducible to (1) as follows: first, let $x'$ represent the assignment of values to the variables of the 3-SAT instance. Construct $m$ by arranging each clause as a binary decision tree where the decision nodes are the variables and all leaf predictions are 0 except for the leaf in the path corresponding to the truth of the initial clause. Set the prediction of this leaf to 1. The model has as many decision trees as clauses in the 3-SAT instance, and each tree has a maximal depth of 3. Set the threshold of $m$ to the number of trees, so that $m(x) = 1$ if and only if all trees predict 1, which corresponds to all clauses being satisfied. Finally, remove the objective of (1) and set $x$ to all 0s (if $m(x) = 1$, we are done).

We describe two approaches for generating evading instances on sum-ensembles of trees.
The first approach solves the optimal problem (1) by reformulation into a Mixed Integer Program (MIP). The resulting program can be subsequently solved by an off-the-shelf MIP solver. This method guarantees optimality of the solution and makes quantitative statements about model robustness possible. As a faster alternative to the exact method, we develop a second approach based on coordinate descent. Although this method does not guarantee optimality, we use it to generate evading instances quickly for the retraining experiment.

3.1

Optimal Evasion Attack

We present a reduction of problem (1) into mixed integer programming when $m$ is taken to be a binary classifier derived from a sum of regression trees $\mathcal{T}$ thresholded at 0. Specifically, let $M : X \to \mathbb{R}$ be the signed margin prediction of the model, that is, $m(x) = 1 \Leftrightarrow M(x) \geq 0$. $M$ is simply the sum of each regression tree $T \in \mathcal{T}$. In turn, each regression tree $T$ defines a partition $P_T$ of $X$ where each domain $L \in P_T$ is associated with a prediction value $v_L \in \mathbb{R}$. Let $J$ be the maximum tree depth in the model. As those domains actually correspond to the leaves of $T$, each domain $L$ is defined as the conjunction of at most $J$ atomic propositions $l \in L$ of the form “$x_i < t$” or “$x_i \geq t$” for some feature index $i$ and some threshold $t \in \mathbb{R}$. Thus, $M$ can be written as a weighted sum of indicator functions defined as conjunctions of atomic propositions. We have:

$$M(x) = \sum_{T \in \mathcal{T}} T(x) = \sum_{T \in \mathcal{T}} \sum_{L \in P_T} v_L \, I\Big\{ \bigwedge_{l \in L} x \in l \Big\} \tag{2}$$

where each domain $l$ is of the form $\{x \mid x_i < t\}$ or its complement $X \setminus \{x \mid x_i < t\}$. Our reduction to MIP is based on Form (2).

In what follows, we present the mixed integer program by defining four groups of MIP variables. We then model (2) by the atom and rule consistency families of constraints. The model mislabel constraint enforces the condition that $m(x) \neq m(x')$ in (1). Finally, we reduce the objective of (1) to a set of constraints and an objective in the MIP Objective paragraph.

Program Variables. Our reduction uses four families of variables. For clarity, every MIP variable is bolded.

• $n$ continuous variables $(\mathbf{f}_i)_i$ (features) encoding the values of the coordinates of the feature vector $x'$.
• $\sum_{T \in \mathcal{T}} \sum_{L \in P_T} 1$ binary variables $(\mathbf{r}_i)_i$ (rules) encoding the values of the indicator functions. Each binary decision tree with maximal depth $J$ has at most $2^J$ leaves, so that there can be no more than $|\mathcal{T}| 2^J$ such variables.
• At most $\sum_{T \in \mathcal{T}} \sum_{L \in P_T} |L|$ binary variables $(\mathbf{a}_i)_i$ (atoms) encoding the truth values of all atomic propositions. Our implementation avoids creating redundant $\mathbf{a}$ variables by detecting atoms that are equivalent to, or the negation of, existing atoms (the latter replaced by $1 - \mathbf{a}$) across all trees of the model. Because each binary decision tree with maximal depth $J$ has at most $2^J - 1$ decision nodes, there can be no more than $|\mathcal{T}| (2^J - 1)$ such variables.
• At most $n + 1$ continuous or binary variables $(\mathbf{b}_i)_i$ (bounds) for expressing the distance $d(x, x')$ of problem (1). These variables are first used in the MIP Objective paragraph.

We enforce the logical equivalence between the reduction and the semantics of $M$ by the following constraints.

Atom consistency. Without loss of generality, consider an atom of the form $x'_i < t$. Let $\mathbf{f}_i$ be the feature variable corresponding to $x'_i$ and $\mathbf{a} \in \{0, 1\}$ the atom variable corresponding to the truth value of the atom.
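The following two linear constraints are such that $\mathbf{a} = 1 \Leftrightarrow x'_i < t$:

$$t \, (1 - \mathbf{a}) + f_i^{\min} \, \mathbf{a} \;\leq\; \mathbf{f}_i \;\leq\; (t - \epsilon) \, \mathbf{a} + f_i^{\max} \, (1 - \mathbf{a}) \tag{3}$$

where $\epsilon > 0$ is small enough and $f_i^{\min}$ and $f_i^{\max}$ are constants representing the smallest and largest possible values for feature $i$. When $\mathbf{a} = 1$ the constraints read $f_i^{\min} \leq \mathbf{f}_i \leq t - \epsilon$, and when $\mathbf{a} = 0$ they read $t \leq \mathbf{f}_i \leq f_i^{\max}$, so the upper bound is inactive in the latter case.

Rule consistency. Let an indicator function $\mathrm{ind}_L(x') = I\{ \bigwedge_{l \in L} x' \in l \}$, its associated rule variable $\mathbf{r}_i \in \{0, 1\}$ and atom variables $\mathbf{a}_1, \dots, \mathbf{a}_{|L|}$ corresponding to the truth values of the atomic propositions $(x' \in l)_l$. The following $|L| + 1$ constraints are such that $\mathbf{r}_i = 1 \Leftrightarrow \mathrm{ind}_L(x') = 1$:

$$\forall k \in \{1, \dots, |L|\}, \;\; \mathbf{r}_i \leq \mathbf{a}_k \quad \text{and} \quad \mathbf{r}_i \geq \sum_{k=1}^{|L|} \mathbf{a}_k - |L| + 1 \tag{4}$$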
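The two constraint families can be sanity-checked by enumeration. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the atom constraint is taken with its upper bound relaxed to the largest feature value when the atom variable is 0, and the feature grid is the 256 gray levels scaled to [0, 1]. It verifies that, for each feature value, exactly one value of the atom variable is feasible and that it equals the truth value of "f < t", and similarly that the rule variable is forced to the conjunction of its atoms.

```python
from itertools import product


def atom_feasible(f, a, t, f_min, f_max, eps):
    """Atom consistency for the atom 'f < t' (assumed form, see lead-in)."""
    lower = t * (1 - a) + f_min * a
    upper = (t - eps) * a + f_max * (1 - a)
    return lower <= f <= upper


def rule_feasible(r, atoms):
    """Rule consistency: r <= a_k for all k, and r >= sum(a_k) - |L| + 1."""
    return all(r <= a for a in atoms) and r >= sum(atoms) - len(atoms) + 1


t, f_min, f_max, eps = 0.5, 0.0, 1.0, 1e-6
pixel_values = [i / 255 for i in range(256)]

# For every feature value, the only feasible atom value is int(f < t).
atom_ok = all(
    [a for a in (0, 1) if atom_feasible(f, a, t, f_min, f_max, eps)] == [int(f < t)]
    for f in pixel_values
)

# For every assignment of up to 4 atoms, the only feasible r is their conjunction.
rule_ok = all(
    [r for r in (0, 1) if rule_feasible(r, atoms)] == [int(all(atoms))]
    for size in (1, 2, 3, 4)
    for atoms in product((0, 1), repeat=size)
)
```

The value of eps must be smaller than the gap between the threshold and the largest feature value below it, otherwise some feature values become infeasible for both values of the atom variable.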

Model mislabel. Without loss of generality, consider an original instance $x$ such that $M(x) < 0$. In order for $x'$ to be an evading instance, we must have $M(x') \geq 0$. This condition is captured by the following constraint:

$$\sum_{T \in \mathcal{T}} \sum_{L \in P_T} v_L \, \mathbf{r}_L \geq 0 \tag{5}$$

where for notational brevity we have indexed the rule variables $(\mathbf{r}_i)_i$ by their corresponding domains $L$.
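As an illustration of Form (2) and of the mislabel condition, the sketch below flattens each tree into its leaf rules (a list of atoms plus a prediction value) and evaluates the margin as a sum of rule indicators. The `Node` class, its field names, and the toy two-tree ensemble are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    f: int = -1                      # feature index of the decision atom
    thres: float = 0.0               # threshold t of the atom "x[f] < t"
    left: Optional["Node"] = None    # subtree where the atom is true
    right: Optional["Node"] = None   # subtree where the atom is false
    prediction: float = 0.0          # used by leaves only

    def is_leaf(self):
        return self.left is None and self.right is None


def leaf(value):
    return Node(prediction=value)


def leaf_rules(node, path=()):
    """List the leaves of a tree as (atoms, v_L); an atom is (i, t, is_less)."""
    if node.is_leaf():
        return [(list(path), node.prediction)]
    return (leaf_rules(node.left, path + ((node.f, node.thres, True),))
            + leaf_rules(node.right, path + ((node.f, node.thres, False),)))


def margin(trees, x):
    """M(x): sum of v_L over all leaf rules whose atoms all hold for x."""
    total = 0.0
    for tree in trees:
        for atoms, value in leaf_rules(tree):
            if all((x[i] < t) == is_less for i, t, is_less in atoms):
                total += value
    return total


trees = [
    Node(f=0, thres=0.5, left=leaf(-1.0),
         right=Node(f=1, thres=0.5, left=leaf(-0.5), right=leaf(2.0))),
    Node(f=1, thres=0.25, left=leaf(-1.0), right=leaf(0.5)),
]
x = [0.75, 0.125]          # M(x) = -0.5 + (-1.0) = -1.5, so m(x) = -1
x_evading = [0.75, 0.75]   # M(x') = 2.0 + 0.5 = 2.5 >= 0, so m(x') = 1
```

Exactly one rule per tree is active for any instance, which is what the rule variables encode in the mixed integer program.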

MIP Objective. Finally, we need to translate the objective of problem (1) into an MIP form. In this paper, we consider the four classic distance metrics for $d$: L0, L1, L2 and L∞. We start with the L1 case, and explain how to obtain the objective from there in the other cases. Let $(\mathbf{b}_i)_i$ (bounds) be $n$ non-negative continuous variables. When the sum is minimized, we have $\|x - x'\|_1 = \sum_{1 \leq i \leq n} \mathbf{b}_i$ under the following $2n$ constraints:

$$\forall i \in \{1, \dots, n\}, \;\; -\mathbf{b}_i \leq \mathbf{f}_i - x_i \leq \mathbf{b}_i \tag{6}$$

To obtain the L0 case, it suffices to make all $\mathbf{b}$ variables binary and replace the above constraints by

$$\forall i \in \{1, \dots, n\}, \;\; -(f_i^{\max} - f_i^{\min}) \, \mathbf{b}_i \leq \mathbf{f}_i - x_i \leq (f_i^{\max} - f_i^{\min}) \, \mathbf{b}_i \tag{7}$$

To obtain the L∞ case, we can introduce an additional continuous variable $\mathbf{B}$ and $n$ additional constraints such that $\|x - x'\|_\infty = \mathbf{B}$:

$$\forall i \in \{1, \dots, n\}, \;\; \mathbf{b}_i \leq \mathbf{B} \tag{8}$$

Finally, to obtain the L2 case, we need only to change the objective to the quadratic form $\sum_{1 \leq i \leq n} \mathbf{b}_i^2 = \|x - x'\|_2^2$, at the cost of turning our otherwise purely linear formulation into a harder to solve Mixed Integer Quadratic Problem (MIQP).

In summary, this is a relatively compact reduction for problem (1). For a model which consists of $|\mathcal{T}|$ trees of maximal depth $J$, the model size is in $O(|\mathcal{T}| 2^J)$ and our formulation contains at most $2 |\mathcal{T}| 2^J + 2n - |\mathcal{T}|$ variables and at most $(3 + J) |\mathcal{T}| 2^J + 2n - 2 |\mathcal{T}| + 1$ constraints. Note that the exponential dependency on $J$ is not an issue for practical purposes as tree ensembles and in particular boosted trees are predominantly used with small tree depths, e.g., $J \leq 10$.

3.2

Approximate Evasion Attack

While the above reduction of problem (1) to an MIP is linear in the size of the model $m$, the actual solving time can be anywhere from a few seconds to 30 minutes, even for the state-of-the-art industrial solver we use [13]. We thus develop an approximate evasion algorithm to generate good quality evading instances.

Our approximate evasion attack is based on the following iterative coordinate descent procedure: let $B > 0$ be the maximal amount of adversarial deformation we are willing to tolerate. Without loss of generality, suppose $M(x) < 0$. Starting from the original instance $x' \leftarrow x$, we compute at each iteration the best change of a single dimension of $x'$ so that $M(x')$ increases. The algorithm terminates when $d(x, x') > B$. The critical part of this approach is finding the best dimension of $x'$ to change at every round. Formally, we want to solve the following problem:

$$\underset{\Delta \in \mathbb{R}^n}{\text{maximize}} \;\; M(x' + \Delta) \quad \text{subject to} \quad \|\Delta\|_0 = 1 \tag{9}$$

Because solving (9) is often expensive in practice, we decouple finding the best dimension ($i$ such that $\Delta_i \neq 0$) and finding the optimal step size $\Delta_i$. Thus, we instead compute the optimal dimension for a fixed step $\alpha \in \mathbb{R}$. We consider the following “discretized gradient” problem:

$$\underset{1 \leq i \leq n}{\text{maximize}} \;\; M(x' + \alpha u) \quad \text{subject to} \quad u_i = 1 \text{ and } \forall j \neq i, \; u_j = 0 \tag{10}$$
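The iterative procedure can be sketched as follows, with problem (10) solved by plain enumeration of all n dimensions. This is a minimal illustration under assumptions: the margin is passed in as a callable, the distance is L0, a single fixed step α is used, the loop also stops as soon as the margin becomes non-negative or no single step improves it, and clipping to the feature domain is omitted.

```python
def l0_distance(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)


def best_single_step(margin, x_adv, alpha):
    """Enumerate all dimensions and return (i, M(x' + alpha * e_i)) for the best i."""
    best_i, best_score = None, float("-inf")
    for i in range(len(x_adv)):
        candidate = list(x_adv)
        candidate[i] += alpha
        score = margin(candidate)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def coordinate_descent_attack(margin, x, alpha, budget):
    """Greedy attack for an instance with M(x) < 0.

    Returns x' with M(x') >= 0, or None when the L0 budget is exceeded
    or no single-coordinate step increases the margin.
    """
    x_adv = list(x)
    current = margin(x_adv)
    while current < 0:
        i, score = best_single_step(margin, x_adv, alpha)
        if score <= current:
            return None                      # stuck: no improving step
        x_adv[i] += alpha
        current = score
        if l0_distance(x, x_adv) > budget:
            return None                      # deformation budget exceeded
    return x_adv


def toy_margin(x):
    # Hypothetical linear margin standing in for the ensemble margin M.
    return 2.0 * x[0] + 1.0 * x[1] - 1.5


x_adv = coordinate_descent_attack(toy_margin, [0.0, 0.0], alpha=0.5, budget=2)
```

On this toy margin the attack changes the first coordinate twice, so only one dimension is modified even though two steps are taken.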

A naive approach to (10) is to compute $M(x' + \alpha u)$ for all $n$ dimensions. Unfortunately, this solution becomes too slow in medium or large dimensional feature spaces. We present a fast algorithm for solving (10) which takes advantage of the structure of tree ensemble models. The idea is to consider the prediction path for each tree of the ensemble, i.e. the sequence of decision nodes from the root node down to the leaf node holding the prediction value. As each decision node references a single dimension, it is easy to check whether perturbing this dimension by amount $\alpha$ would produce a different branching decision. If the perturbed decision is different from the original decision, then we follow both resulting prediction paths and update the gradient value of the perturbed dimension by the difference between the perturbed and original predictions.

Algorithm 1 formalizes this idea. Each decision tree is represented by its root node. A node object can be either a leaf node or a decision node. Decision nodes have four attributes: node.f and node.thres represent the feature index $i$ and the corresponding threshold $t$ of the decision atom “$x_i < t$”. node.left points to the subtree where the atom is true, node.right correspondingly points to the false case. Leaf nodes have a single attribute, node.prediction, holding the float prediction value.
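The prediction-path idea described above can be sketched in Python as follows. This is an illustrative reading under assumptions rather than the authors' code: the `Node` class mirrors the attributes named in the text, the traversal is written iteratively, and a feature is marked as considered only once a flipped decision has been found for it on the path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    f: int = -1                      # feature index i of the atom "x_i < t"
    thres: float = 0.0               # threshold t
    left: Optional["Node"] = None    # subtree where the atom is true
    right: Optional["Node"] = None   # subtree where the atom is false
    prediction: float = 0.0          # leaf value

    def is_leaf(self):
        return self.left is None and self.right is None


def leaf(value):
    return Node(prediction=value)


def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.f] < node.thres else node.right
    return node.prediction


def tree_grad(node, x, alpha, g, considered):
    """Walk the prediction path of x and accumulate per-feature changes in g."""
    while not node.is_leaf():
        original = x[node.f] < node.thres
        next_node = node.left if original else node.right
        if node.f not in considered:
            pert = x[node.f] + alpha < node.thres
            if original != pert:
                next_node_p = node.left if pert else node.right
                saved = x[node.f]
                x[node.f] = saved + alpha          # temporarily perturb x
                perturbed = predict(next_node_p, x)
                x[node.f] = saved                  # restore the exact value
                g[node.f] += perturbed - predict(next_node, x)
                considered.add(node.f)
        node = next_node


def grad(model, x, alpha):
    g = [0.0] * len(x)
    for tree_root in model:
        tree_grad(tree_root, x, alpha, g, set())
    return g


model = [
    Node(f=0, thres=0.875,
         left=Node(f=0, thres=0.25, left=leaf(1.0), right=leaf(3.0)),
         right=leaf(5.0)),
    Node(f=1, thres=0.5, left=leaf(-2.0), right=leaf(4.0)),
]
x = [0.125, 0.375]
g = grad(model, x, 0.25)   # change of the margin for a step of 0.25 per feature
```

Saving and restoring the original feature value, instead of adding and then subtracting α, avoids floating-point drift in x. The first toy tree tests feature 0 twice on the same path and only the second test flips, which is the case that motivates marking a feature as considered only after a flip.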

Algorithm 1 Fast discretized gradient for ensemble of trees model.

function GRAD(model, x, α)
    g ← [0, 0, . . . , 0] (vector of n zeros)
    for treeRoot ∈ model do
        TREEGRAD(treeRoot, x, α, g, ∅)
    return g

function TREEGRAD(node, x, α, g, considered)
    if node.isLeaf() then return
    original ← (x[node.f] < node.thres)
    if original then nextNode ← node.left
    else nextNode ← node.right
    if node.f ∉ considered then
        pert ← (x[node.f] + α < node.thres)
        if original ≠ pert then
            if pert then nextNodeP ← node.left
            else nextNodeP ← node.right
            h ← −PREDICT(nextNode, x)
            x[node.f] ← x[node.f] + α
            h ← h + PREDICT(nextNodeP, x)
            x[node.f] ← x[node.f] − α
            g[node.f] ← g[node.f] + h
            considered ← considered ∪ {node.f}
    TREEGRAD(nextNode, x, α, g, considered)

function PREDICT(node, x)
    if node.isLeaf() then return node.prediction
    if x[node.f] < node.thres then
        return PREDICT(node.left, x)
    return PREDICT(node.right, x)

The fast discrete gradient computation runs in $O(|\mathcal{T}| J^2 + n)$: for each tree, TREEGRAD is called at most $J$ times, each call of TREEGRAD calls PREDICT at most twice, and PREDICT recurses at most $J$ times. In contrast, a naive implementation of GRAD would run in $O(n |\mathcal{T}| J)$. As $J \ll \sqrt{n}$, this approach offers considerable speed-up.

We use this method to generate a large number of L0-adversarial instances for the retraining experiment below in subsection 4.2. Specifically, we compute at each round the discretized gradients for all $\alpha \in \{-1, -2^{-1}, \dots, -2^{-8}, 2^{-8}, \dots, 2^{-1}, 1\}$ and select the best combination of dimension and $\alpha$ to change. The algorithm terminates once the budget of dimensions to modify is exhausted.

4

Results

We turn to the experimental evaluation of the robustness of tree ensembles using the above two attack strategies. We start by describing the evaluation dataset and our choice of models for benchmarking purposes, before moving to a quantitative comparison of the robustness of boosted tree models against a garden variety of learning algorithms. We finally show that the brittleness of boosted trees can be effectively addressed by successive iterations of model retraining on evading instances.

Our running binary classification task is to distinguish between handwritten digits “2” and “6”. To this end, we use the subsets of digits labeled “2” or “6” in the training and testing datasets of MNIST [14]. Our training and testing sets respectively include 11,876 and 1,990 images and each image has 28 × 28 gray scale pixels (the gray scale has 256 different possible values). So we have $n = 784$ and $X = [0, 1]^n$. In addition to these two sets, we create an evaluation dataset of a hundred instances from the testing set such that all instances are correctly classified by all of the benchmarked models. These correctly classified instances serve as $x$, the starting point instances in the evasion problem (1).

We choose to use second order gradient boosting (denoted BDT) in the XGBoost [15] implementation as the tree ensemble learner because of its outstanding performance as an off-the-shelf general purpose learning algorithm. Our model uses one thousand trees of maximal depth 4, with a learning rate of 0.05. We also include the classic L1/L2 regularized logistic regressions (denoted Lin. L1 and Lin. L2) and RBF-SVM trained using LibLinear [16] and LibSVM [17] respectively. We introduce two more learning models for benchmark purposes.
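As a worked instance of the size bounds stated at the end of subsection 3.1, the sketch below plugs in the BDT configuration described above (one thousand trees of maximal depth 4) and the n = 784 pixel features of the task. The function name and argument names are hypothetical; the two formulas are copied from the text.

```python
def mip_size_bounds(num_trees, max_depth, n_features):
    """Upper bounds on the number of MIP variables and constraints."""
    leaves = num_trees * 2 ** max_depth          # |T| * 2^J
    variables = 2 * leaves + 2 * n_features - num_trees
    constraints = (3 + max_depth) * leaves + 2 * n_features - 2 * num_trees + 1
    return variables, constraints


variables, constraints = mip_size_bounds(num_trees=1000, max_depth=4, n_features=784)
# variables   = 2 * 16000 + 1568 - 1000     = 32568
# constraints = 7 * 16000 + 1568 - 2000 + 1 = 111569
```

These are worst-case counts; the text notes that the implementation shares atom variables across trees, so the actual program can be smaller.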

To compare the robustness of boosted trees, we contrast it with a deep neural network. We use Theano [18] to implement a sigmoidal (tanh) activation network with three hidden layers (denoted NN) in a 784-40-30-20-1 architecture, and train it by gradient descent with a top logistic regression layer, without pre-training or drop-out.

Figure 1: Testing error rates for all considered models.

Finally, our last benchmark model is the equivalent of a shallow neural network made of two max-out units (one unit for each class) each made of thirty linear classifiers. This model corresponds to the difference of two Convex Polytope Machines [19] (one for each class) and we use the authors’ implementation (CPM). Two factors motivate the choice of CPM. First, previous work has theoretically considered the evasion robustness of such ensembles of linear classifiers and deemed the problem NP-hard in the general case [20]. Second, unlike RBF-SVM and NN, this model can be readily reduced to a Mixed Integer Program, enabling optimal attacks thanks to a MIP solver. As the reduction is considerably simpler than the one presented for tree ensembles above in subsection 3.1, we omit it here. Finally, we use Gurobi [13] as the MI(Q)P solver.

As our main goal is not to compare model accuracies, but rather to obtain the best possible model for each model class, we train on the training set alone and simply tune the hyper-parameters so as to minimize the error on the testing set. Figure 1 shows the testing errors of the learned models. BDT, RBF-SVM and CPM have the lowest, indistinguishable error rates, NN closely follows, and the linear models are far behind, with the L1 regularized classifier being slightly worse than its L2 counterpart.

Figure 3: Optimal or best-effort deformation bounds for different metrics on our evaluation dataset of one hundred samples (panels: L1, L2 and L∞ attack effort for Lin. L1, Lin. L2, NN, CPM, RBF-SVM and BDT). The smallest bounds, 25-50% and 50-75% quartiles and largest bounds are shown. The red line is the median score. Larger scores mean more deformations are necessary to change the model prediction.

4.1

Robustness

For each of the 6 learned models, and for all of the 100 correctly classified testing instances, we compute optimal or best effort evasion attacks under the deformation metrics. The evasions for all linear models and metrics are optimally computed by the MIP solver. We also reduce all evasions for CPM and BDT to a MIP formulation but can only solve to optimality for the L0, L1 and L∞ metrics. For the quadratic L2-evasion, we set a time limit of one hour and use the best solution computed in that time. The time limit is always reached. We use a classic projected gradient descent method for solving the L1, L2 and L∞ evasions of NN and RBF-SVM, but do not attempt to address the combinatorially hard L0-evasion in this paper.

Figure 3 summarizes the obtained adversarial bounds as one boxplot for each combination of model and distance. Although BDT gives the best testing accuracy, it systematically ranks last for robustness, with all of the metrics we consider. Remarkably, negligible L1 or L2 perturbations suffice to evade BDT. On the other end, RBF-SVM is apparently the hardest model to evade, agreeing with results from [10]. While NN and CPM perform generally on par, the L1-regularized linear model exhibits significantly more brittleness than its L2 counterpart. This phenomenon is explained by large weights concentrating in specific dimensions as a result of sparsity. Thus, small modifications in the heavily weighted model dimensions result in potentially large classifier output variations.
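The effect of weight concentration on linear models can be illustrated with the standard dual-norm formula for the distance from a point to a hyperplane. The sketch below is not the procedure used in the experiments (which rely on the MIP solver): it assumes an unconstrained feature space, ignoring the box X = [0, 1]^n, and uses two made-up weight vectors with the same L1 norm and the same margin.

```python
import math


def linear_evasion_distances(w, b, x):
    """Distance from x to the hyperplane w.x + b = 0 under three norms.

    The L1, L2 and Linf distances divide the absolute margin by the
    Linf, L2 and L1 norm of w respectively (dual norms).
    """
    margin = abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return {
        "L1": margin / max(abs(wi) for wi in w),
        "L2": margin / math.sqrt(sum(wi * wi for wi in w)),
        "Linf": margin / sum(abs(wi) for wi in w),
    }


x = [0.0, 0.0, 0.0, 0.0]
sparse = linear_evasion_distances([4.0, 0.0, 0.0, 0.0], -2.0, x)  # one heavy weight
dense = linear_evasion_distances([1.0, 1.0, 1.0, 1.0], -2.0, x)   # spread weights
```

With the weight mass concentrated in one dimension, the L1 and L2 distances to the boundary shrink (0.5 versus 2.0 and 1.0), while the L∞ distance is unchanged, in line with the explanation given above.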

Figure 2: Examples of optimal or best effort evading “6” instances. The last row represents the L0-robustified BDT model after 60 retraining iterations. We do not compute L0 evasions for NN and RBF-SVM models.

Figure 4: Making BDT robust against L0-attacks by retraining. (a) Exact L0 robustness for compatible models (Lin. L1, Lin. L2, CPM, BDT; y-axis: L0 attack effort). (b) L0 robustness of the sequence of retrained BDTs (x-axis: retraining rounds; y-axis: L0 attack effort). (c) Testing errors of the sequence of retrained BDTs (x-axis: retraining rounds; y-axis: number of test errors).

4.2

Robustification

We demonstrate that it is possible to significantly improve the robustness of the BDT model by retraining on adversarial instances. We iteratively generate L0-adversarial instances for the current model and for all the 11,876 original training instances using the approach outlined in subsection 3.2, with maximum modification budget B = 10 pixels. We then train on the original training dataset augmented with all of the adversarial but correctly labeled instances to obtain the next iteration model. We keep the model hyper-parameters constant at 1,000 trees with maximal depth 4. We perform this procedure for 60 rounds to generate more than 700,000 additional training instances, 60 times more adversarial instances than original instances. We note that our fast attack finds a good quality evading instance in less than a second, while solving the L0 problem to optimality in the exact approach can take up to several hours for the models from the later iterations.

Figure 4a shows the exact L0 robustness bounds for the models where this can be computed. Figure 4b shows the evolution of the optimal L0-adversarial bound as retraining occurs. While the first few retraining rounds produce more brittle models, the final model enjoys a median lower-bound of 30 modified pixels, where as few as 8 pixels were initially sufficient to evade the classifier. This is also about twice as good as the CPM bound. We can also observe in Figure 4c that the retrained models tend to make fewer testing errors than the original model.

To see if the gained robustness against L0 deformations translates into robustness against other deformations, we also measured the L1, L2 and L∞ robustness bounds. Unfortunately, we found significantly lower scores on all three metrics compared to the original model: hardening against L0 attacks made the model more sensitive to all other types of attacks.
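The retraining procedure can be summarized by the loop below. This is a schematic sketch under assumptions: `train` and `attack` are hypothetical callables standing in for the boosted tree learner and the approximate L0 attack, and the attack returns None when no evading instance is found within the budget. The stubs at the bottom only exercise the bookkeeping.

```python
def harden(train, attack, X, y, rounds):
    """Iteratively augment the training set with correctly labeled adversarial instances."""
    aug_X, aug_y = list(X), list(y)
    model = train(aug_X, aug_y)
    for _ in range(rounds):
        # Attack the current model from every original training instance.
        for x, label in zip(X, y):
            x_adv = attack(model, x)
            if x_adv is not None:
                aug_X.append(x_adv)
                aug_y.append(label)      # keep the original (correct) label
        model = train(aug_X, aug_y)      # next iteration model
    return model, aug_X, aug_y


# Stubs for illustration only: the "model" is just the training set size,
# and the "attack" always succeeds by shifting the single feature.
toy_X, toy_y = [[0.0], [1.0], [2.0]], [-1, 1, 1]
stub_train = lambda X, y: len(X)
stub_attack = lambda model, x: [x[0] + 0.5]

model, aug_X, aug_y = harden(stub_train, stub_attack, toy_X, toy_y, rounds=60)
added = len(aug_X) - len(toy_X)          # 60 rounds * 3 instances
```

With 11,876 original instances and 60 rounds, the same bookkeeping yields at most 60 × 11,876 = 712,560 added instances when every attack succeeds, which is consistent with the figure of more than 700,000 quoted above.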

5

Conclusion

Gradient boosted trees are simultaneously among the most accurate learning models and the easiest ones to evade. This might have problematic implications when those algorithms are fielded in adversarial settings. On the other hand, boosted trees can be successfully hardened against L0-evasion attacks by successive retraining. The retraining strategy also improves model accuracy and can thus be understood as a form of model regularization.

In future research, it would be interesting to investigate how the tree ensembles generated by random forests compare to boosted decision trees on the robustness benchmark. Investigating if boosted trees can simultaneously be hardened against all four types of attacks is also an interesting open problem. Finally, capitalizing on the regularization effect of retraining on adversarial instances, a significant breakthrough would be to achieve a similar regularization effect without having to generate new training instances and unduly expand already large-scale training sets. Thus, a lightweight adversarial regularization method has the potential to further increase the accuracy of boosted trees and make them potentially more suitable to security-sensitive application domains.


References

[1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint, 2013.
[2] Tim Salimans. Second place winner entry for the Higgs Boson Machine Learning Kaggle Challenge. https://github.com/TimSalimans/HiggsML. Accessed: 2015-06-05.
[3] Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, ASIACCS ’06, 2006.
[4] Daniel Lowd. Good word attacks on statistical spam filters. In Proceedings of the Second Conference on Email and Anti-Spam (CEAS), 2005.
[5] Nedim Šrndić and Pavel Laskov. Practical evasion of a learning-based classifier: A case study. In Proceedings of the 2014 IEEE Symposium on Security and Privacy, S&P ’14, 2014.
[6] Daniel Lowd and Christopher Meek. Adversarial learning. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05, 2005.
[7] Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, and J. D. Tygar. Query strategies for evading convex-inducing classifiers. Journal of Machine Learning Research, 13, May 2012.
[8] Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers robustness to adversarial perturbations. arXiv preprint, 2014.
[9] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Machine Learning and Knowledge Discovery in Databases, volume 8190 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013.
[10] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint, 2014.
[11] A Nguyen, J Yosinski, and J Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition. 2015.
[12] Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. arXiv preprint, 2015.
[13] Inc. Gurobi Optimization. Gurobi optimizer reference manual, 2015.
[14] Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. MNIST dataset, 1998.
[15] Tianqi Chen and Tong He. XGBoost: eXtreme Gradient Boosting. https://github.com/dmlc/xgboost. Accessed: 2015-06-05.
[16] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 2008.
[17] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 2011.
[18] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.
[19] Alex Kantchelian, Michael C. Tschantz, Ling Huang, Peter L. Bartlett, Anthony D. Joseph, and J.D. Tygar. Large-margin convex polytope machine. In Advances in Neural Information Processing Systems, 2014.
[20] David Stevens and Daniel Lowd. On the hardness of evading combinations of linear classifiers. In Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, AISec ’13, 2013.
