THE DOMAINTOOLS REPORT, 2016 EDITION
THE DISTRIBUTION OF MALICIOUS DOMAINS
SUMMARY
In our previous reports, we profiled malicious domains by describing patterns in their registration details: top level domain (TLD), free email provider, Whois privacy provider, and hosting location. In this edition, we compared the distributions of malicious domains versus neutral domains across two measures of age (the age of the domain and the age of its name server domain) and a measure of the entropy of the domain name. We also examined malicious domains across registrars to find additional clues as to how and when these domains were registered.
KEY FINDINGS
DOMAIN AGE
Even among young domains, there are far more neutral than malicious domains. However, when we examine bad domains as a class, a much larger proportion of them are relatively young. Neutral domains, as a class, show less of a skew toward youth.
DOMAIN NAME ENTROPY
Domain names with high entropy (that is, gibberish combinations of letters and numbers) are more likely to be malicious than linguistically coherent domains. While this may not be surprising, it is informative to see the specific data.
NAME SERVER DOMAIN AGE
Most domains have a name server associated with them. The domains of the name servers themselves can act as a statistical signal: malicious domains are more likely than neutral ones to have comparatively young name server domains.
DOMAIN REGISTRAR
Some domain registrars stand out for having high percentages of malicious domains registered through them. In one particular case, the registrar also has a fairly high absolute number of malicious domains in addition to a high percentage.
© Copyright DomainTools, 2016. All Rights Reserved.
OVERVIEW
In the DomainTools Report, we mine DomainTools data in order to discover patterns in domain registrations that may help researchers or security analysts learn more about concentrations of malicious activity. In the first two reports, we examined attributes such as top level domain (TLD), Whois privacy providers, and registration behaviors of domain registrants strongly connected to high-volume malicious activity. We believe that malicious actors behave in a predictable manner, and the more we profile that behavior, the better we can defend against them. Those prior reports found high concentrations of malicious domains among various Japanese and Chinese privacy providers, email providers, and bulk domain registration agents. The data in those reports have helped us paint a broad picture, and we hope the data in this latest report will add to our understanding of cybercriminals. For this edition of the report, we examined several new attributes, some of which readers may have considered before. They include:
>> Age of the domain (as of Feb 2016)
>> Age of name server domain (as of Feb 2016)
>> Entropy of the domain name composition
>> Registrar of the domain
Like snowflakes or fingerprints, no two domains are exactly alike. At a minimum, each name is unique, but in most cases there are multiple attributes that differ. Some of these differences, individually or in concert with others, may help predict the risk level of the domain.
As in earlier editions of The DomainTools Report, having nearly all of the existing domains’ registration information at our fingertips has allowed us to pull out some interesting patterns. Ultimately, we believe it will be possible to predict the likelihood that a new or previously unseen domain will be malicious, based on its unique composition of attributes. We plan to do this through a variety of techniques, including machine learning.
NOTE DomainTools already has a proven algorithm for predicting the risk of domains based on how tightly coupled they are to existing malicious activity. The work described here may be able to complement that algorithm by identifying risky domain profiles even when the domains in question are not closely connected to prior bad behavior.
METHODOLOGY AND CHARTING
THE DATA SET
Readers of earlier editions will recall that our methodology is to look at well-vetted blacklists and at the entire population of active domains. We attempt to find spikes in the relative concentration of known-bad domains versus the overall background levels of badness. For this report, the data set we examined was approximately 140,000,000 domains extracted from passive DNS data. For the set of malicious domains, we used data from high-quality blacklist feeds from partners of DomainTools. To be clear, while there are well over 300 million domains currently registered worldwide, we opted to analyze the subset that is actually seen in DNS requests, as we believe the results of the analysis are more relevant when they focus on domains actually receiving traffic.
CALCULATIONS
In previous editions of the DomainTools Report, we introduced what we call “VCP” Charts (Volume, Concentration, Proportion). In this report, we establish a new calculation, which we call “Signal Strength.” This describes how malicious domains are distributed across a certain linear attribute, such as age, compared to how neutral domains are distributed across the same attribute. In effect, it measures how strongly each value of the attribute skews toward indicating a malicious domain.
For example: of all blacklisted domains, 4.46% are currently between 12 and 13 months old. Of all neutral domains, 1.52% are in this same age range. Comparing the two percentages, the rate at which malicious domains fall into that age range is 2.93 times the rate at which neutral domains fall into it (4.46 / 1.52 ≈ 2.93). We call this a “signal strength” of 2.93. Please note that, to improve legibility, the age data we present in this report is grouped by quarter (3 months, 6 months, 9 months, and so forth).
Why do we measure signal strength across each value in the distribution? While we could simply compare the two distributions as a whole through standard statistical measures, we wanted our research to inform our risk scoring and reputation scoring algorithms about which attributes, and which values within each attribute, indicate maliciousness, and about the relative strength of those signals.
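The signal strength calculation described above can be expressed in a few lines of Python. This is a minimal sketch for illustration, not the production implementation: the function name `signal_strength` and the input format (one attribute value, such as an age bucket, per domain) are assumptions.

```python
from collections import Counter


def signal_strength(malicious_values, neutral_values):
    """Return {attribute value: signal strength} for one attribute.

    Each input holds one attribute value per domain (for example, an age
    bucket in months). Signal strength for a value is the share of malicious
    domains with that value divided by the share of neutral domains with it:
    1.0 means no signal, and values above 1.0 mean the value is
    over-represented among malicious domains.
    """
    mal = Counter(malicious_values)
    neu = Counter(neutral_values)
    mal_total = sum(mal.values())
    neu_total = sum(neu.values())
    strengths = {}
    # Values never seen among neutral domains are skipped, because their
    # ratio would require dividing by zero.
    for value, neu_count in neu.items():
        mal_share = mal[value] / mal_total
        neu_share = neu_count / neu_total
        strengths[value] = mal_share / neu_share
    return strengths


# The worked example from the text, using the published percentages directly.
print(round(4.46 / 1.52, 2))  # 2.93

# Toy data (hypothetical): one age bucket, in months, per domain.
print(signal_strength([3, 3, 6, 12], [3, 6, 6, 12, 12, 24]))
```

In the toy data, half of the malicious domains but only one sixth of the neutral domains fall in the 3-month bucket, so that bucket gets a signal strength of 3.0.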
DOMAIN AGE
Many security professionals are leery of brand-new domains. Some have even suggested that all new domains should go through a mandatory “waiting period” during which they must prove themselves to be free of harmful activity (malware, phishing, etc.) before they can be released into DNS for general use. These ideas are well-intentioned, but they aren’t likely to be adopted in the near future. Thus, it falls to the security community to provide effective defenses against harmful domains of any age.
NOTE Malicious domains don’t tend to stay around as long as neutral ones. Some of them are taken down by law enforcement, ICANN, research/white-hat sinkholes, etc. Others are used for a brief time and then discarded as they appear on blacklists and become less effective. Blacklisted domains don’t necessarily get taken down per se, but they often are not renewed by their owners and thus drop out of DNS after a year or two.
We examined the rates at which domains of various ages appeared on blacklists. The results (depicted in the first chart in this section) tend to support at least some level of “age discrimination” against domains. However, the numbers of blacklisted domains are still small compared to the population of all existing and active domains. It makes the most sense from an analytical standpoint to look at the age distribution of all domains in each class, the classes being “malicious” and “neutral.”
KEY TAKEAWAYS
As we can see, the distributions show a significantly younger average age for malicious domains than for neutral domains. Malicious domains tend to be younger, and they tend not to remain active for extended periods of time. For anyone who has studied spam or phishing campaigns, this may not be surprising. Domains used for those activities are often registered, used, and discarded over a very brief period of time, sometimes well under one day.
Given the huge population of domains over the history of the Internet, and because a lot of malicious domains are ephemeral (see note), it is logical that neutral domains skew older than malicious ones. Of all malicious domains reported by our blacklist feeds, over 75% are under 13 months old. However, it is still worth noting that malicious domains make up a relatively small share of all young domains.
The last chart on the next page provides another way of seeing that maliciousness is heavily skewed towards younger domains. This is especially true at 21 months and younger, where signal strength is well above the average.
Chart: Age Distribution of Malicious Domains (by month). X-axis: domain age in months (3 to 285); Y-axis: % of distribution (0% to 25%).
Chart: Age Distribution of Neutral Domains (by month). X-axis: domain age in months (3 to 285); Y-axis: % of distribution (0% to 25%).
The two charts on the previous page show the distribution by age of malicious and neutral domains respectively. The first chart on this page overlays both distributions. It is important to note that the Y-axis is “contribution to the class” (the percentage of that class’s domains), not absolute numbers; in other words, the taller red bars do not mean there are more bad domains than good domains. The last chart shows the signal strength for maliciousness as a function of domain age, including a line indicating the average and a 95% confidence interval.
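The report does not state how the average line and its 95% confidence interval were computed. One common approach, assumed here, is a normal approximation: the mean of the per-bucket signal strengths plus or minus 1.96 standard errors. The function name `mean_with_ci95` and the input values are hypothetical.

```python
import math
import statistics


def mean_with_ci95(strengths):
    """Return (mean, lower bound, upper bound) for a list of signal strengths.

    Uses a normal approximation: mean +/- 1.96 * (sample standard deviation
    / sqrt(n)). Requires at least two values.
    """
    values = list(strengths)
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


# Hypothetical signal strengths for five age buckets.
mean, low, high = mean_with_ci95([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"average {mean:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Buckets whose signal strength sits above the upper bound are the ones a chart like this would flag as clearly above average.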
Chart: Age Distribution of Domains - Malicious vs Neutral (by month). X-axis: domain age in months; Y-axis: % of distribution (0% to 25%).
Chart: Age of Domains - Signal Strength (by month). X-axis: domain age in months; Y-axis: signal strength (0.0 to 4.0), with the average marked.
NAME SERVER DOMAIN AGE
Almost every domain in DNS requires a name server host that is authoritative for the domain. Name servers, in turn, include a domain; for example, in ns1.domaincontrol.com, the name server domain is “domaincontrol.com.” We applied the same analytical framework to the name server domain age as we did with domain age to identify whether or not there was a strong signal of maliciousness.
The chart on this page shows an interesting pattern. It turns out that a malicious domain is much more likely than a neutral one to have a young name server domain. Aside from a few outliers, 4 years seems to mark a fairly strong threshold for this signal. In other words, for domains with name server domains older than 4 years, there’s not much of a signal, except for a couple of isolated spikes that represent large volumes associated with specific name server domains that came online during specific, brief intervals. The one at 51 months is particularly high, and merited further investigation. It turned out that 51 months ago, in late 2011, several “sinkhole” name servers were activated for the Conficker worm.
Conficker and its variants were widespread from 2008 to 2011 or so (and have reappeared occasionally since), but the sinkholes were a key part of fighting back against the worm. Because part of Conficker’s “lifecycle” was to generate large numbers of domains, the sinkholes accumulated the high numbers that contribute to the spike seen here. Sinkholes are a method that researchers and white-hat hackers use to neutralize command-and-control infrastructure or to study malware. The sinkhole causes the malware either to connect to non-routable IP addresses (effectively halting it) or to connect to servers under the researcher’s control.
KEY TAKEAWAYS
As we compare the mean and median ages of name server domains linked to badness, we see that they are significantly younger than those corresponding to neutral domains. The signal degradation over time is not as clear as it is for the age of the domain itself. Nonetheless, we can confidently say that younger name server domains correlate with more malicious activity than older ones.
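The two steps this section relies on, reducing a name server host such as ns1.domaincontrol.com to its name server domain and expressing that domain’s age in months as of Feb 2016, can be sketched in Python as below. This is illustrative only: the helper names are hypothetical, keeping the last two labels is a simplification (a real implementation would consult the Public Suffix List so that names under suffixes like .co.jp are handled correctly), and the reference day of February 1, 2016 is an assumption. The creation date would come from the name server domain’s Whois record; here it is supplied by hand.

```python
from datetime import date

# Assumed reference date; the report only says ages are "as of Feb 2016".
AS_OF = date(2016, 2, 1)


def name_server_domain(host):
    """Reduce a name server host to its domain by keeping the last two labels.

    Simplification: this is wrong for multi-label public suffixes such as
    .co.jp; production code would use the Public Suffix List instead.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def age_in_months(created, as_of=AS_OF):
    """Whole calendar months between a creation date and the reference date."""
    months = (as_of.year - created.year) * 12 + (as_of.month - created.month)
    if as_of.day < created.day:
        months -= 1  # the final month is not yet complete
    return months


print(name_server_domain("ns1.domaincontrol.com"))  # domaincontrol.com
print(age_in_months(date(2011, 11, 1)))  # 51
```

A name server domain created on November 1, 2011 comes out at 51 months old, consistent with the late-2011 Conficker sinkhole spike discussed above.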
Chart: Age of Name Server Domain - Signal Strength (by month). X-axis: name server domain age in months (3 to 297); Y-axis: signal strength (0 to 35), with the average marked.
ENTROPY IN DOMAIN NAMES
Most security analysts have likely seen high-entropy domain names. “Entropy” in this context refers to linguistically chaotic patterns of characters in domain names. For example, the name “domaintools.com” has very low entropy, because the combinations of letters that make up the name are not random and they appear frequently in English and other languages. A name like “fqwqxqyqkxqfz.com,” on the other hand, has high entropy. A human can spot a high-entropy domain name at a glance, but to analyze millions of domains, we let computers do the work. We created algorithms that calculated the entropy of all active domains, and compared the neutral and malicious pools.
In the vast majority of cases, gibberish domain names have no beneficial purpose. They are typically auto-generated and used for machine-to-machine communication such as botnet command-and-control channels, spam campaigns, or other malicious activity. The only legitimate use of such constructions that we have ever encountered is domains used in secure, encrypted communication products, but their incidence is very low compared to the number of malicious high-entropy domain names.
In our calculations of entropy, the higher the number, the more randomness there is in the domain name. The first chart in this section tells the tale. There is a well-defined curve in the neutral domains (by far the largest pool), where domain names of low entropy form the bulk of the distribution and the numbers diminish sharply as entropy increases. They level off at low numbers as the names become increasingly random. Malicious domains, taken collectively, have a slightly different profile, with a larger share in the high-entropy region than the neutral domains.
The second chart (next page) breaks out three different categories of malicious activity: spam, phishing, and malware. Here the curves diverge, and each of the malicious categories has a slightly different distribution. The spam domains show a secondary peak in the high-entropy region. Since spammers use and discard high volumes of domains, they often use domain generation algorithms (DGAs) to efficiently generate large numbers of domain names. DGAs often produce high-entropy domain names, which likely explains that secondary peak.
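The report does not publish the exact entropy formula used. Plain Shannon entropy of a name’s character frequencies is a common starting point, but it only measures how evenly characters are used, not whether the letter sequences resemble real words; by that measure “domaintools” (about 3.03 bits per character) actually scores higher than “fqwqxqyqkxqfz” (about 2.50). Because the definition above stresses linguistic coherence, the sketch below instead scores a name by its average surprisal under a character-bigram model, a different but related technique. The tiny training text and the names `train_bigrams` and `randomness_score` are hypothetical; a realistic model would be trained on a large word list or on known-benign domain names.

```python
import math
from collections import Counter

# Hypothetical stand-in corpus; a real model needs far more text.
TRAINING_TEXT = """
the quick brown fox jumps over the lazy dog domain name tools registration
security analysts study malicious domains and neutral domains on the internet
information technology online services network cloud mail search news
"""

# Characters allowed in a DNS label plus the end marker "$"; its size is
# the vocabulary size used for add-one (Laplace) smoothing.
VOCAB = set("abcdefghijklmnopqrstuvwxyz0123456789-$")


def train_bigrams(text):
    """Count character bigrams, padding each word with ^ (start) and $ (end)."""
    bigrams, context = Counter(), Counter()
    for word in text.lower().split():
        padded = f"^{word}$"
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            context[a] += 1
    return bigrams, context


def randomness_score(name, bigrams, context):
    """Average surprisal, in bits per character transition, of the first label.

    Higher scores mean the letter sequence looks less like the training text.
    """
    label = name.lower().split(".")[0]  # "domaintools.com" -> "domaintools"
    padded = f"^{label}$"
    bits = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (bigrams[(a, b)] + 1) / (context[a] + len(VOCAB))
        bits += -math.log2(p)
    return bits / (len(padded) - 1)


model = train_bigrams(TRAINING_TEXT)
for name in ("domaintools.com", "fqwqxqyqkxqfz.com"):
    print(name, round(randomness_score(name, *model), 2))
```

Even with this toy model, “domaintools.com” scores noticeably lower (fewer bits per transition) than “fqwqxqyqkxqfz.com,” mirroring the contrast drawn in the text.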
Chart: Entropy of Domain Names (Malicious vs Neutral). X-axis: entropy; Y-axis: % of distribution (0% to 9%).
Chart: Entropy of Domain Names (Signal Strength). X-axis: entropy; Y-axis: signal strength, with the average marked.
KEY TAKEAWAYS
As we look at the signal strength, we see a cluster of above-average signals in the higher entropy ranges, corresponding to higher rates of badness. We also notice differences in how the types of maliciousness are distributed across the entropy spectrum, with spam far more prevalent at high entropy than phishing and malware. Phishing domains are most similar in entropy to neutral domains, which makes sense because phishing domains are intended to imitate legitimate domains.
Chart: Entropy of Domain Names (by Category), with series for malware, phishing, and spam. X-axis: entropy; Y-axis: % of distribution.
Chart: Entropy of Domain Names (Signal Strength). X-axis: entropy; Y-axis: signal strength (0 to 7), with the average marked.
DOMAIN REGISTRARS
The domain registrar, readily visible in a Whois record, can be analyzed in the way we examined attributes in previous reports. We broke out the pool of domains by registrar to see whether specific ones showed “hot spots” of malicious domains. Concentration is more important in this case than absolute number, because many registrars have lots of bad domains, but their overall rates of badness are relatively low. GoDaddy is a good example; as a signal, GoDaddy registration doesn’t tell us much about the domain’s risk level. A lot of bad domains are registered through GoDaddy, but much higher numbers of neutral domains are as well. Similarly, as shown in the original DomainTools report, many malicious domains are registered using a gmail.com email address, but the concentration of badness tied to gmail addresses is well below the average.
For this analysis, we return to the VCP Chart format. The chart (next page) shows how registrars compare in terms of volume (absolute numbers of malicious domains), concentration (rate of malicious versus neutral domains), and proportion of the different malicious activity types for each registrar. Note the registrars above both averages, particularly those above the average concentration.
A few registrars stand out for having comparatively high percentages of malicious registrations, but at low absolute numbers. There is one notable outlier, with both a relatively high volume and a high rate of maliciousness. This registrar (GMO) seems to be favored by certain spammers who use its domains, mainly registered with .co.jp email addresses, for large spam campaigns.
KEY TAKEAWAYS
With large-scale access to domain registration data, and some automation kung-fu, one could theoretically create a security rule that blocked or quarantined messages sent from domains tied to a specific registrar. In practice, such a rule would almost certainly block some legitimate traffic, causing headaches for users. But as one attribute among many that collectively compose a risk profile, the registrar attribute does provide a discernible signal, as there are some registrars with very high concentrations of malicious domains.
In addition to the chart on the next page, the tables below show the top 10 registrars by total malicious domains and by concentration of malicious domains (minimum of 1,000 malicious domains).
TOP 10 REGISTRARS BY TOTAL MALICIOUS DOMAINS
RANK  REGISTRAR                          MALICIOUS DOMAINS  CONCENTRATION
1     GMO Internet Inc.                  307,046            11.86%
2     GoDaddy.com, LLC                   170,356            0.87%
3     PublicDomainRegistry.com           82,464             2.77%
4     eNom, Inc.                         78,442             1.44%
5     Alpnames Limited                   57,337             6.06%
6     Xiamen Nawang Technology           35,924             17.02%
7     Xin Net Technology Corporation     31,848             3.46%
8     Chengdu West Dimension             28,762             1.34%
9     Name.com, Inc.                     28,432             3.69%
10    HiChina Zhicheng Technology        25,992             1.21%
TOP 10 REGISTRARS BY CONCENTRATION (MINIMUM 1,000 MALICIOUS DOMAINS)
RANK  REGISTRAR                              CONCENTRATION
1     Nanjing Imperiosus                     32.67%
2     Xiamen Nawang Technology               17.02%
3     DomainContext, Inc.                    12.50%
4     GMO Internet Inc.                      11.86%
5     Todaynic.com Inc.                      9.99%
6     Shanghai Meicheng Technology           9.25%
7     TLD Registrar Solutions Ltd.           8.94%
8     Chengdu Fly-Digital Technology         8.05%
9     Alpnames Limited                       6.06%
10    Xiamen ChinaSource Internet Service    6.01%
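A ranking like the one in these tables can be sketched as below. The function name, the input format, and the definition of concentration as malicious domains divided by all (malicious plus neutral) domains at the registrar are assumptions for illustration; the report describes concentration only as the rate of malicious versus neutral domains. The 1,000-domain minimum mirrors the threshold stated for the concentration table; for simplicity the sketch applies it to both lists.

```python
def top_registrars(stats, min_malicious=1000, n=10):
    """Return (top by volume, top by concentration) as lists of
    (registrar, malicious_count, concentration) tuples.

    `stats` maps a registrar name to (malicious_count, total_count), where
    total_count covers both malicious and neutral domains. Registrars with
    fewer than `min_malicious` malicious domains are left out of both lists.
    """
    rows = [
        (name, malicious, malicious / total)
        for name, (malicious, total) in stats.items()
        if malicious >= min_malicious and total > 0
    ]
    by_volume = sorted(rows, key=lambda row: row[1], reverse=True)[:n]
    by_concentration = sorted(rows, key=lambda row: row[2], reverse=True)[:n]
    return by_volume, by_concentration


# Hypothetical registrars and counts, for illustration only.
example = {
    "Registrar A": (5_000, 20_000),
    "Registrar B": (300_000, 10_000_000),
    "Registrar C": (800, 1_000),  # excluded: under 1,000 malicious domains
}
volume, concentration = top_registrars(example)
for name, malicious, rate in concentration:
    print(f"{name}: {malicious:,} malicious, {rate:.2%}")
```

As with GoDaddy in the tables, the hypothetical high-volume registrar tops the volume list while ranking below the smaller, more concentrated one.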
Chart: Malicious Domains by Registrar (Volume vs Concentration). X-axis: number of malicious domains (1,000 to 200,000, log scale); Y-axis: concentration (0.07% to 40%, log scale). Each registrar is shown as a pie chart of malware, phishing, and spam, with average lines on both axes.
ABOUT VCP CHARTS
This chart plots the total number of malicious domains on the X-axis versus the concentration of malicious domains on the Y-axis, using a logarithmic scale on both axes. Each mark is a pie chart showing the relative proportion of types of malicious activity, and the size of each pie represents the relative volume of malicious domains. The crossing gray lines show 95% confidence intervals around the averages for each axis.
BUILDING A COMPOSITE PICTURE
The signals here are all relatively subtle within the approximately 140 million total domains we surfaced via passive DNS data. With the possible exception of domain name entropy, none of the signals is strong enough on its own to be dispositive. Even with entropy, if one were to block all high-entropy domain names, there could be (rare) false positives in the form of the encrypted communications domains mentioned earlier. Similar caveats would apply to security rules based on any of the other attributes taken one at a time.
However, these signals may prove extremely valuable in combination. An ongoing DomainTools project seeks to use machine learning and other techniques to analyze various composites of attribute signals to develop high-confidence domain risk assessments.
In the meantime, we hope that these analyses are helpful to security professionals, researchers, and anyone else interested in better understanding large-scale patterns in domain registration data with respect to nefarious activities.
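As one illustration of how several signals could be combined, and not a description of the DomainTools machine-learning project, note that a signal strength is the ratio P(value | malicious) / P(value | neutral), which is a likelihood ratio. Under a naive-Bayes assumption that attributes are independent within each class, the ratios can be multiplied and applied to prior odds. The function name, the 1% prior, and the example strengths below are hypothetical; the report does not give a base rate, and correlated attributes (such as domain age and name server domain age) will violate the independence assumption.

```python
import math


def combined_probability(prior, strengths):
    """Combine per-attribute signal strengths into a probability of maliciousness.

    Treats each signal strength as a likelihood ratio and assumes the
    attributes are independent within each class (naive Bayes). `prior` is
    the assumed base rate of malicious domains, strictly between 0 and 1.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * math.prod(strengths)  # math.prod: Python 3.8+
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability


# Hypothetical domain: 1% prior, signal strength 2.93 for its age bucket
# and 2.0 for its entropy bucket (illustrative numbers only).
print(round(combined_probability(0.01, [2.93, 2.0]), 3))
```

Each signal alone moves the estimate only modestly, which is the point made above: the value lies in the combination, and a learned model can also account for correlations that this naive product ignores.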
ABOUT DOMAINTOOLS
DomainTools is the leader in domain name, DNS and Internet OSINT-based cyber threat intelligence and cybercrime forensics products and data. With over 14 years of domain name, DNS and related ‘cyber fingerprint’ data across the Internet, DomainTools helps companies assess security threat risks, profile attackers, investigate online fraud and crimes, and map cyber activity in order to stop attacks.
Our goal is to stop security threats to your organization before they happen, using domain/DNS data, predictive analysis, and monitoring of trends on the Internet. We collect and retain Open Source Intelligence (OSINT) data from many sources and we index and analyze the data based on various connection algorithms to deliver actionable intelligence, including domain scoring and forensic mapping.
DomainTools uses over 10 billion related DNS data points to build a map of ‘who’s doing what’ on the Internet. Government agencies, Fortune 500 companies and leading security firms use our data as a critical ingredient in their threat investigation and cybercrime forensics work. For more information about DomainTools’ data and products, please visit our website at www.domaintools.com.
WORLD’S LARGEST DNS FORENSICS DATABASE **
>> Over 300 million known domains in DNS
>> 10 Billion+ current and historical Whois records
>> 4.5 Billion+ IP address change events
>> 1.8 Billion+ Registrar change events
>> 3 Billion+ name server change events
** These figures are from Q1 2016, but they are inherently out of date, as we add about 5M records a day.
[email protected]
206.838.9020
www.domaintools.com