THE DISTRIBUTION OF MALICIOUS DOMAINS

THE DOMAINTOOLS REPORT, 2016 EDITION


SUMMARY

In our previous reports, we profiled malicious domains by describing patterns in their registration details: top level domain (TLD), free email provider, Whois privacy provider, and hosting location. In this edition, we compared the distributions of malicious domains vs neutral domains across a measure of age (both of the domain and of the name server domain) and a measure of the entropy of the domain name. We also examined malicious domains across registrars to find additional clues as to how and when these domains were registered.

KEY FINDINGS

DOMAIN AGE

Even among young domains, there are far more neutral than malicious domains. However, when we examine bad domains as a class, many more of them are relatively young. Neutral domains, as a class, show less of a skew toward youth.

DOMAIN NAME ENTROPY

Domain names with high entropy (that is, those that are gibberish combinations of letters and numbers) are more likely to be malicious than linguistically coherent domains. While this may not be surprising, it is informative to see the specific data.

NAME SERVER DOMAIN AGE

Most domains have a name server associated with them. The domains of the name servers themselves can act as a statistical signal; the signal shows that more malicious domains have comparatively young name server domains.

DOMAIN REGISTRAR

Some domain registrars stand out for having high percentages of malicious domains registered through them. And, in one particular case, the registrar also has fairly high absolute numbers of malicious domains in addition to a high percentage.

© Copyright DomainTools, 2016. All Rights Reserved.


OVERVIEW

In the DomainTools Report, we mine DomainTools data in order to discover patterns in domain registrations that may help researchers or security analysts learn more about concentrations of malicious activity. In the first two reports, we examined attributes such as top level domain (TLD), Whois privacy providers, and registration behaviors of domain registrants strongly connected to high-volume malicious activity. We believe that malicious actors behave in a predictable manner, and the more we profile that behavior, the better we can defend against them. Those prior reports found high concentrations of malicious domains in various Japanese and Chinese privacy providers, email providers, and bulk domain registration agents. The data in those reports have helped us paint a broad picture, and we hope the data in this latest report adds to our understanding of cybercriminals.

For this edition of the report, we examined several new attributes, some of which readers may have considered before. They include:

>> Age of the domain (as of Feb 2016)
>> Age of name server domain (as of Feb 2016)
>> Entropy of the domain name composition
>> Registrar of the domain

Like snowflakes or fingerprints, no two domains are exactly alike. At a minimum, each name is unique, but in most cases there are multiple attributes that differ. Some of these differences, individually or in concert with others, may help predict the risk level of the domain.

As in earlier editions of The DomainTools Report, having nearly all of the existing domains’ registration information at our fingertips has allowed us to pull out some interesting patterns. Ultimately, we believe it will be possible to predict the likelihood that a new or previously-unseen domain will be malicious, based on its unique composition of attributes. We will do this via a variety of techniques including machine learning.

NOTE

DomainTools already has a proven algorithm for predicting risk of domains based on how tightly-coupled they are to existing malicious activity. The work described here may be able to complement that algorithm by identifying risky domain profiles even when the domains in question are not closely connected to prior bad behavior.


METHODOLOGY AND CHARTING

THE DATA SET

Readers of earlier editions will recall that our methodology is to look at well-vetted blacklists and at the entire population of active domains. We attempt to find spikes in the relative concentration of known-bad domains versus the overall background levels of badness. For this report, the data set we examined was approximately 140,000,000 domains extracted from passive DNS data. For the set of malicious domains, we used data from high quality blacklist feeds from partners of DomainTools. To be clear, while there are well over 300 million domains currently registered worldwide, we opted to analyze the subset that are actually seen in DNS requests, as we believe the results of the analysis are more relevant when they focus on domains actually receiving traffic.

CALCULATIONS

In previous editions of the DomainTools Report, we introduced what we call “VCP” Charts (Volume, Concentration, Proportion). In this report, we establish a new calculation, which we call “Signal Strength.” This describes how malicious domains are distributed across a certain linear attribute, such as age, compared to how neutral domains are distributed across the same attribute. It is essentially a measurement of how much the distribution skews towards being an indicator of a malicious domain.

For example: of all blacklisted domains, 4.46% are currently between 12 and 13 months old. Of all neutral domains, 1.52% are in this same age range. Therefore, if we compare the two percentages, the rate at which malicious domains fall into that age range is 2.93 times the rate at which neutral domains fall into that same range. We call this a “signal strength” of 2.93. Please note that, to improve legibility, the age data we present in this report is grouped by quarter (3 months, 6 months, 9 months, and so forth).

Why do we measure signal strength across each value in the distribution? While our report could simply compare the two distributions as a whole through standard statistical measures, we wanted our research to help inform our risk scoring and reputation scoring algorithms as to which attributes, and which values within each attribute, indicate maliciousness, and the relative strength of those signals.
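The signal strength arithmetic in the worked example above can be reproduced with a minimal sketch. The function name `signal_strength` is an illustrative assumption, not something taken from DomainTools tooling; the two percentages are the ones quoted in the text.

```python
def signal_strength(pct_malicious: float, pct_neutral: float) -> float:
    """Ratio of the malicious class share to the neutral class share
    for one value (or bucket) of an attribute."""
    if pct_neutral <= 0:
        raise ValueError("neutral share must be positive")
    return pct_malicious / pct_neutral


# Figures quoted in the report: 4.46% of blacklisted domains and 1.52% of
# neutral domains are between 12 and 13 months old.
example = signal_strength(4.46, 1.52)
print(f"{example:.2f}")  # prints 2.93
```

A value above 1 means the attribute value is over-represented among malicious domains relative to neutral ones; a value below 1 means the opposite.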


DOMAIN AGE

Many security professionals are leery of brand-new domains. Some have even suggested that all new domains should go through a mandatory “waiting period” during which they must prove themselves to be free of harmful activity (malware, phishing, etc.) before they can be released into DNS for general use. These ideas are well-intentioned but they aren’t likely to be adopted in the near future. Thus, it falls to the security community to provide effective defenses against harmful domains of any age.

We examined the rates at which domains of various ages appeared on blacklists. The results (depicted in the first chart in this section) tend to support at least some level of “age discrimination” against domains. However, compared to all existing and active domains, these are still comparatively small numbers. It makes the most sense from an analytical standpoint to look at the age distribution of all domains in a classification, the classes being “malicious” and “neutral.”

Given the huge population of domains over the history of the Internet, and because a lot of malicious domains are ephemeral (see note), it is logical that neutral domains skew older than malicious ones. Of all malicious domains reported by our blacklist feeds, over 75% are under 13 months old. However, it is still worth noting that, as a percentage of all youthful domains, malicious ones are a relatively small share.

NOTE

Malicious domains don’t tend to stay around as long as neutral ones. Some of them are taken down by law enforcement, ICANN, research/white-hat sinkholes, etc. Others are used for a brief time and then discarded as they appear on blacklists and become less effective. Blacklisted domains don’t necessarily get taken down per se, but they often are not renewed by their owners and thus drop out of DNS after a year or two.

KEY TAKEAWAYS

As we can see, the distributions show a significantly younger average for malicious domains than for neutral domains. Malicious domains tend to be younger, and they tend not to remain active for extended periods of time. For anyone who has studied spam or phishing campaigns, this may not be surprising. Domains used for those activities are often registered, used, and discarded over a very brief period of time, sometimes well under one day.

The last chart in this section provides another way of seeing that maliciousness is heavily skewed towards younger domains. This is especially true at 21 months and younger, as these have signal strength well above the average.
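As a rough illustration of how the per-class age distributions and the per-bucket signal strength discussed in this section could be computed, here is a minimal sketch. The bucket edges (ages 0 to 2 months fall in the "3" bucket, 3 to 5 in the "6" bucket, and so on), the function names, and the toy age lists are all assumptions made for the example; the report does not publish its exact binning rule or raw data.

```python
from collections import Counter


def quarter_bucket(age_months: int) -> int:
    """Upper edge of the 3-month bucket an age falls into (assumed binning)."""
    return (age_months // 3 + 1) * 3


def class_distribution(ages_months):
    """Share of a class (malicious or neutral) that lands in each bucket."""
    counts = Counter(quarter_bucket(a) for a in ages_months)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one domain age")
    return {bucket: n / total for bucket, n in counts.items()}


def signal_by_bucket(malicious_ages, neutral_ages):
    """Signal strength per bucket, for buckets with a non-zero neutral share."""
    mal = class_distribution(malicious_ages)
    neu = class_distribution(neutral_ages)
    return {b: mal.get(b, 0.0) / neu[b] for b in sorted(neu)}


# Toy, made-up ages in months; not data from the report.
toy_malicious = [1, 2, 4, 14]
toy_neutral = [1, 10, 40, 100]
print(signal_by_bucket(toy_malicious, toy_neutral))
```

Because each class is normalized by its own size, the result compares the shapes of the two distributions and does not depend on how many more neutral domains there are than malicious ones.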

[Chart: Age Distribution of Neutral Domains (by month). X-axis: domain age in months, 3 to 285; Y-axis: % of distribution, 0% to 25%.]

[Chart: Age Distribution of Malicious Domains (by month). X-axis: domain age in months, 3 to 285; Y-axis: % of distribution, 0% to 25%.]


The two preceding charts show the age distributions of malicious and neutral domains. The next chart overlays both distributions. It is important to note that the Y-axis is “contribution to the class,” not absolute numbers. In other words, the taller red bars do not mean there are more bad domains than good domains. The last chart shows the signal strength calculation for maliciousness as a function of domain age, including a line indicating the average and a 95% confidence interval.

[Chart: Age Distribution of Domains - Malicious vs Neutral (by month). X-axis: domain age in months; Y-axis: % of distribution, 0% to 25%.]

[Chart: Age of Domains - Signal Strength (by month). X-axis: domain age in months; Y-axis: signal strength, 0.0 to 4.0, with the average marked.]


NAME SERVER DOMAIN AGE

Almost every domain in DNS requires a name server host that is authoritative for the domain. Name servers, in turn, include a domain; for example, in ns1.domaincontrol.com, the name server domain is “domaincontrol.com.” We applied the same analytical framework to the name server domain age as we did with domain age to identify whether or not there was a strong signal of maliciousness.

The chart in this section shows an interesting pattern. It turns out that if a domain is malicious, it is much more likely that it has a young name server domain. Aside from a few outliers, 4 years seems to mark a fairly strong threshold for this signal. In other words, for domains with name server domains older than 4 years, there’s not much of a signal, except for a couple of isolated spikes that represent large volumes associated with specific name server domains that came online during specific, brief intervals. The one at 51 months is particularly high, and merited further investigation.

It turned out that 51 months ago, in late 2011, several “sinkhole” name servers were activated for the Conficker worm. Conficker and its variants were widespread from 2008-2011 or so (and have reappeared occasionally since), but the sinkholes were a key part of fighting back against the worm. Because part of Conficker’s “lifecycle” was to generate large numbers of domains, the sinkholes accumulated the high numbers that contribute to the spike seen here. Sinkholes are a method that researchers and white-hat hackers use to neutralize command-and-control infrastructure or to study malware. The sinkhole causes the malware either to connect to non-routable IP addresses (effectively halting it) or to connect to servers under the researcher’s control.

KEY TAKEAWAYS

As we compare the mean and median of the age of name server domains linked to badness, we see that they are significantly younger than those corresponding to neutral domains. The signal degradation over time is not as clear as it is for the age of the domain itself. Nonetheless, we can confidently say that younger name server domains correlate to more malicious activity than older ones.
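To make the notion of a name server domain and its age concrete, here is a minimal sketch. It assumes a single-label public suffix such as .com; suffixes like .co.jp need the extra `suffix_labels` argument or, in practice, a lookup against the Public Suffix List. The function names and the reference date of 1 February 2016 (the report only says "as of Feb 2016") are assumptions for illustration.

```python
from datetime import date


def name_server_domain(ns_host: str, suffix_labels: int = 1) -> str:
    """Return the registered domain of a name server host name.

    suffix_labels is the number of labels in the public suffix
    (1 for "com", 2 for "co.jp"); it is a simplifying assumption here.
    """
    labels = ns_host.rstrip(".").lower().split(".")
    keep = suffix_labels + 1
    if len(labels) < keep:
        raise ValueError(f"not enough labels in {ns_host!r}")
    return ".".join(labels[-keep:])


def age_in_months(created: date, as_of: date = date(2016, 2, 1)) -> int:
    """Whole calendar months between a creation date and the reference date."""
    return (as_of.year - created.year) * 12 + (as_of.month - created.month)


print(name_server_domain("ns1.domaincontrol.com"))  # prints domaincontrol.com
print(age_in_months(date(2011, 11, 15)))            # prints 51
```

The second call shows why a name server domain created in November 2011 would land in the 51-month bucket that the text singles out.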

[Chart: Age of Name Server Domain - Signal Strength (by month). X-axis: name server domain age in months, 3 to 297; Y-axis: signal strength, 0 to 35, with the average marked.]

ENTROPY IN DOMAIN NAMES

Most security analysts have likely seen high entropy domain names. “Entropy” in this context refers to linguistically chaotic patterns of characters in domain names. For example, the name “domaintools.com” has very low entropy, because the combinations of letters that make up the name are not random and they appear frequently in English and other languages. A name like “fqwqxqyqkxqfz.com,” on the other hand, has high entropy. A human can spot a high-entropy domain name at a glance, but to analyze millions of domains, we let computers do the work. We created algorithms that calculated the entropy of all active domains, and compared the neutral and malicious pools.

In the vast majority of cases, gibberish domain names have no beneficial purpose. They are typically auto-generated and used for machine-to-machine communication such as botnet command and control channels, spam campaigns, or other malicious activity. The only legitimate use of such constructions that we have ever encountered is domains used in secure encrypted communication products; but this is a very low incidence compared to the numbers of malicious high-entropy domain names.

In our calculations of entropy, the higher the number, the more randomness there is in the domain name. The first chart in this section tells the tale. There is a well-defined curve in the neutral domains (by far the largest pool), where domain names of low entropy form the bulk of the distribution, and the numbers diminish sharply as entropy increases. They level off at low numbers as the names become increasingly random. Malicious domains, taken collectively, have a slightly different profile, showing more in the high-entropy region than the neutral domains.

The second chart breaks out three different categories of malicious activity: spam, phishing, and malware. Here the curves diverge, and each of the malicious categories has a slightly different distribution. The spam domains show a secondary peak in the region of high entropy. Since spammers use and discard high volumes of domains, they often use domain generation algorithms (DGAs) to efficiently generate large numbers of domain names. DGAs often produce high entropy domain names, which likely explains that secondary peak.
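The report does not describe how its entropy score is computed, so the following is only a sketch of one plausible stand-in for "linguistic" randomness: the average surprisal of each letter pair under a character bigram model trained on ordinary words. A plain character-frequency entropy ignores letter order, which is why a bigram model is used here instead. The tiny word list, the add-one smoothing, the alphabet size, and all names are assumptions for illustration; a real system would train on a large corpus.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny reference corpus (an assumption).
REFERENCE_WORDS = ["domain", "tools", "internet", "report",
                   "security", "server", "network", "online"]

# a-z, 0-9 and hyphen: the characters allowed in a DNS label.
ALPHABET_SIZE = 37


def train_bigrams(words):
    """Count letter pairs and how often each letter starts a pair."""
    pair_counts, context_counts = Counter(), Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts


def mean_surprisal(label, pair_counts, context_counts):
    """Average bits per letter transition, using add-one smoothing."""
    pairs = list(zip(label, label[1:]))
    if not pairs:
        raise ValueError("label needs at least two characters")
    bits = 0.0
    for a, b in pairs:
        p = (pair_counts[(a, b)] + 1) / (context_counts[a] + ALPHABET_SIZE)
        bits -= math.log2(p)
    return bits / len(pairs)


def score(domain, model):
    """Score the leftmost label of a domain; higher means more gibberish-like."""
    label = domain.lower().split(".")[0]
    return mean_surprisal(label, *model)


MODEL = train_bigrams(REFERENCE_WORDS)
for name in ("domaintools.com", "fqwqxqyqkxqfz.com"):
    print(name, round(score(name, MODEL), 2))
```

With this toy model the gibberish example from the text scores higher than domaintools.com, because none of its letter pairs occur in the reference words.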

[Chart: Entropy of Domain Names (Malicious vs Neutral). X-axis: entropy score, 43 to 113; Y-axis: % of distribution, 0% to 9%.]

KEY TAKEAWAYS

As we look at the signal strength, we see a cluster of above-average signals in the higher entropy ranges, corresponding to higher rates of badness. We also notice a difference in the distributions of different types of maliciousness across the entropy spectrum, with spam skewing toward much higher entropy than phishing and malware. Phishing domains are most similar in entropy to neutral domains, and this makes sense because phishing domains are intended to imitate legitimate domains.

[Chart: Entropy of Domain Names (by Category). Series: Malware, Phishing, Spam. X-axis: entropy score, 43 to 113; Y-axis: % of distribution, 0% to 8%.]

[Chart: Entropy of Domain Names (Signal Strength). X-axis: entropy score, 43 to 113; Y-axis: signal strength, 0 to 7, with the average marked.]

DOMAIN REGISTRARS

The domain registrar, readily visible in a Whois record, can be analyzed in the way we examined attributes in previous reports. We broke out the pool of domains by registrar to see whether specific ones showed “hot spots” of malicious domains. Concentration is more important in this case than absolute number, because many registrars have lots of bad domains, but their overall rates of badness are relatively low. GoDaddy is a good example; as a signal, GoDaddy registration doesn’t tell us much about the domain’s risk level. A lot of bad domains are registered through GoDaddy, but much higher numbers of neutral domains are as well. Similarly, as shown in the original DomainTools report, many malicious domains are registered using a gmail.com email address, but the concentration of badness tied to gmail addresses is well below the average.

For this analysis, we return to the VCP Chart format. The chart that follows the tables shows how registrars compare in terms of volume (absolute numbers of domains), concentration (rate of malicious versus neutral domains), and proportion of the different malicious activity types for each registrar. Note the ones above both averages, particularly the ones above the average concentration.

In addition to the chart, the tables below show the top 10 registrars by total malicious domains and by concentration of malicious domains (minimum of 1,000 malicious domains).

A few registrars stand out for having comparatively high percentages of malicious registrations, but at low absolute numbers. There is one notable outlier, with a relatively high volume and high rate of maliciousness. This registrar (GMO) seems to be favored by certain spammers who use the domains, mainly registered with .co.jp email addresses, for large spam campaigns.

KEY TAKEAWAYS

With large-scale access to domain registration data, and some automation kung-fu, one could theoretically create a security rule that blocked or quarantined messages sent from domains tied to a specific registrar. In practice, such a rule would almost certainly block some legitimate traffic, causing headaches for users. But as one attribute among many that collectively compose a risk profile, the registrar attribute does provide a discernible signal, as there are some registrars with very high concentrations of malicious domains.

TOP 10 REGISTRARS BY TOTAL MALICIOUS DOMAINS

      REGISTRAR                              MALICIOUS    %
  1   GMO Internet Inc.                      307,046      11.86%
  2   GoDaddy.com, LLC                       170,356       0.87%
  3   PublicDomainRegistry.com                82,464       2.77%
  4   eNom, Inc.                              78,442       1.44%
  5   Alpnames Limited                        57,337       6.06%
  6   Xiamen Nawang Technology                35,924      17.02%
  7   Xin Net Technology Corporation          31,848       3.46%
  8   Chengdu West Dimension                  28,762       1.34%
  9   Name.com, Inc.                          28,432       3.69%
 10   HiChina Zhicheng Technology             25,992       1.21%

TOP 10 REGISTRARS BY CONCENTRATION OF MALICIOUS DOMAINS

      REGISTRAR                              %
  1   Nanjing Imperiosus                     32.67%
  2   Xiamen Nawang Technology               17.02%
  3   DomainContext, Inc.                    12.50%
  4   GMO Internet Inc.                      11.86%
  5   Todaynic.com Inc.                       9.99%
  6   Shanghai Meicheng Technology            9.25%
  7   TLD Registrar Solutions Ltd.            8.94%
  8   Chengdu Fly-Digital Technology          8.05%
  9   Alpnames Limited                        6.06%
 10   Xiamen ChinaSource Internet Service     6.01%
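The two rankings above can be thought of as two sorts over the same per-registrar counts. The sketch below shows that idea, taking concentration to be malicious domains divided by all domains seen for a registrar, which is an assumption about the report's definition. The registrar names and counts in `SAMPLE` are hypothetical placeholders, not figures from the report; only the 1,000-domain minimum comes from the text.

```python
from typing import NamedTuple


class RegistrarStats(NamedTuple):
    name: str
    malicious: int  # blacklisted domains seen for this registrar
    total: int      # all domains seen for this registrar

    @property
    def concentration(self) -> float:
        """Share of the registrar's domains that are malicious."""
        return self.malicious / self.total


def top_by_volume(stats, n=10):
    """Rank registrars by absolute number of malicious domains."""
    return sorted(stats, key=lambda r: r.malicious, reverse=True)[:n]


def top_by_concentration(stats, n=10, min_malicious=1000):
    """Rank by concentration, ignoring registrars below the volume floor."""
    eligible = [r for r in stats if r.malicious >= min_malicious]
    return sorted(eligible, key=lambda r: r.concentration, reverse=True)[:n]


# Hypothetical counts for illustration only.
SAMPLE = [
    RegistrarStats("big-registrar.example", 170_000, 19_500_000),
    RegistrarStats("mid-registrar.example", 36_000, 211_000),
    RegistrarStats("tiny-registrar.example", 400, 800),
]

print([r.name for r in top_by_volume(SAMPLE)])
print([r.name for r in top_by_concentration(SAMPLE)])
```

The volume floor keeps a registrar with a handful of domains, half of them bad, from dominating the concentration ranking.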


[Chart: Malicious Domains by Registrar (Volume vs Concentration). X-axis: malicious domains, 1,000 to 200,000; Y-axis: concentration, 0.07% to 40.00%; each registrar is a pie chart split into malware, phishing, and spam; average lines are drawn for both axes.]

ABOUT VCP CHARTS

This chart plots the total number of malicious domains on the X-axis vs the concentration of malicious domains on the Y-axis, using a logarithmic scale on both axes. Each mark is a pie chart showing the relative proportion of types of malicious activity. The size of each pie chart represents the relative volume of malicious domains. The crossing gray lines show 95% confidence intervals around the averages for each axis.


BUILDING A COMPOSITE PICTURE

The signals here are all relatively subtle within the approximately 140 million total domains we surfaced via passive DNS data. With the possible exception of name entropy, none of the signals by themselves are strong enough to be dispositive. Even with entropy, if one were to block all high-entropy domain names, there could be (rare) false positives in the form of the encrypted communications domains mentioned earlier. Similar caveats would apply to security rules based on any of the other attributes taken one at a time.

However, these signals may prove extremely valuable in combination. An ongoing DomainTools project seeks to use machine learning and other techniques to analyze various composites of attribute signals to develop high-confidence domain risk assessment.

In the meantime, we hope that these analyses are helpful to security professionals, researchers, and anyone else interested in better understanding large-scale patterns in domain registration data with respect to nefarious activities.
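One simple way to see how several weak signals might add up is to treat each signal strength as a likelihood ratio (share of malicious domains with a given attribute value divided by share of neutral domains with it) and multiply the ratios onto prior odds, as a naive Bayes model would. This is only a sketch under the strong assumption that the attributes are independent within each class; it is not the DomainTools risk algorithm, and the prior and example ratios below are hypothetical.

```python
import math


def combined_probability(prior: float, signal_strengths) -> float:
    """Combine per-attribute signal strengths into one probability.

    prior            -- assumed base rate of malicious domains (0 < prior < 1)
    signal_strengths -- likelihood ratios, one per observed attribute value
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1 - prior))
    for strength in signal_strengths:
        if strength <= 0:
            raise ValueError("signal strengths must be positive")
        log_odds += math.log(strength)
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# Hypothetical inputs: a 1% base rate, a young domain (2.93, the age example
# from the methodology section) and a second, made-up signal of 2.0.
print(round(combined_probability(0.01, [2.93, 2.0]), 4))
```

Each individual ratio moves the estimate only modestly, which matches the observation that no single attribute is dispositive; attributes that are correlated with one another would be double-counted by this approach.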

ABOUT DOMAINTOOLS

DomainTools is the leader in domain name, DNS and Internet OSINT-based cyber threat intelligence and cybercrime forensics products and data. With over 14 years of domain name, DNS and related ‘cyber fingerprint’ data across the Internet, DomainTools helps companies assess security threat risks, profile attackers, investigate online fraud and crimes, and map cyber activity in order to stop attacks.

Our goal is to stop security threats to your organization before they happen, using domain/DNS data, predictive analysis, and monitoring of trends on the Internet. We collect and retain Open Source Intelligence (OSINT) data from many sources and we index and analyze the data based on various connection algorithms to deliver actionable intelligence, including domain scoring and forensic mapping.

DomainTools uses over 10 billion related DNS data points to build a map of ‘who’s doing what’ on the Internet. Government agencies, Fortune 500 companies and leading security firms use our data as a critical ingredient in their threat investigation and cybercrime forensics work. For more information about DomainTools' data and products, please visit our website at www.domaintools.com.

WORLD’S LARGEST DNS FORENSICS DATABASE **

>> Over 300 million known domains in DNS
>> 10 billion+ current and historical Whois records
>> 4.5 billion+ IP address change events
>> 1.8 billion+ registrar change events
>> 3 billion+ name server change events

** These figures are from Q1 2016, but they are inherently out of date, as we add about 5M records a day.

[email protected]
206.838.9020
www.domaintools.com
