Machine learning – an introduction
Josefin Rosén, Senior Analytical Expert, SAS Institute
[email protected] Twitter: @rosenjosefin #SASFORUMSE
Copyright © 2015, SAS Institute Inc. All rights reserved.
Agenda
• What is machine learning?
• When, where, and how is machine learning used?
• Example – deep learning
• Machine learning in SAS
Machine learning – what is it?
Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
SAS: Machine learning is a branch of artificial intelligence that automates the building of systems that learn from data, identify patterns, and make decisions – with minimal human intervention.
What is what, really?
[Figure: diagram of overlapping fields: Statistics, Pattern Recognition, Data Science, Data Mining, Machine Learning, Databases, Information Retrieval, Computational Neuroscience, AI]
Machine learning – what is it?
“Complicated methods, but useful results”
When is machine learning used?
• When the model's predictive accuracy matters more than the interpretation of the model
• When traditional approaches do not fit, e.g. when you have:
  • more variables than observations
  • many correlated variables
  • unstructured data
  • fundamentally nonlinear or unusual phenomena
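As an illustration of the "more variables than observations" case, the sketch below uses ridge regression, which is one possible regularized alternative and not necessarily the method the speaker had in mind. The data, the helper names (`solve`, `ridge_fit`) and the penalty `lam=0.01` are assumptions made for this example only. With two observations and three variables the ordinary least-squares system is singular, while the ridge system can be solved.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # illustrative tolerance
            raise ValueError("system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Return weights w solving (X^T X + lam * I) w = X^T y."""
    columns = list(zip(*X))
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    for i in range(len(gram)):
        gram[i][i] += lam
    rhs = [sum(a * t for a, t in zip(col, y)) for col in columns]
    return solve(gram, rhs)


# Hypothetical data: 2 observations, 3 variables.
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y = [1.0, 2.0]

try:
    ridge_fit(X, y, lam=0.0)  # lam = 0 is ordinary least squares
except ValueError as err:
    print("ordinary least squares failed:", err)

weights = ridge_fit(X, y, lam=0.01)
predictions = [sum(w * x for w, x in zip(weights, row)) for row in X]
print("ridge predictions:", predictions)
```

The penalty term makes the system solvable at the cost of a small bias in the fitted values.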
[Figure panels: Training data, Regression, Decision tree, Neural network]
Where is machine learning used? Some examples:
• Recommendation applications
• Fraud detection
• Predictive maintenance
• Text analytics
• Pattern and image recognition
• The self-driving Google car
Machine Learning and Data Mining

SUPERVISED LEARNING (know y)
• Regression: LASSO regression, Logistic regression, Ridge regression
• Decision tree: Gradient boosting, Random forests
• Neural networks
• SVM
• Naïve Bayes
• Neighbors
• Gaussian processes

UNSUPERVISED LEARNING (don't know y)
• A priori rules
• Clustering: k-means clustering, Mean shift clustering, Spectral clustering
• Kernel density estimation
• Nonnegative matrix factorization
• PCA: Kernel PCA, Sparse PCA
• Singular value decomposition
• SOM

SEMI-SUPERVISED LEARNING (sometimes know y)
• Prediction and classification*
• Clustering*
• EM
• TSVM
• Manifold regularization
• Autoencoders: Multilayer perceptron, Restricted Boltzmann machines

*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.
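The difference between "know y" and "don't know y" can be sketched on a small data set. The points, labels and function names below are invented for illustration. The supervised part uses a one-nearest-neighbor rule and the unsupervised part uses k-means, two of the listed techniques.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Supervised: y is known, so return the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def closest_center(point, centers):
    """Index of the center nearest to the point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def k_means(points, centers, iterations=10):
    """Unsupervised: no y; alternate cluster assignment and centroid update."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[closest_center(p, centers)].append(p)
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[j]  # keep an empty cluster's old center
            for j, members in enumerate(clusters)
        ]
    return centers, [closest_center(p, centers) for p in points]


# Hypothetical data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]  # only the supervised method sees these

predicted = nearest_neighbor_predict(points, labels, (4.5, 4.7))
centers, assignments = k_means(points, centers=[points[0], points[3]])
print(predicted, assignments)
```

The supervised method returns one of the given labels, whereas k-means only returns cluster indices whose meaning has to be interpreted afterwards.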
Deep learning
• Deep learning: using neural networks with more than two hidden layers
• Used successfully in pattern recognition, among other areas
• Good at extracting features from a data set
MNIST training data
• 784 variables form a 28x28 pixel grid
• 784-dimensional input vector X = (x1,…,x784)
• Grayscale values ranging from 0 to 255
• 60,000 training images with labels
• 10,000 test images without labels
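The mapping from a 28x28 image to the 784-dimensional input vector can be sketched as follows. The row-major pixel order and the rescaling to [0, 1] are assumptions (a common preprocessing choice that the slides do not state), and the image is a made-up example, not real MNIST data.

```python
ROWS, COLS = 28, 28


def flatten(image):
    """Turn a 28x28 grid into a 784-element vector, row by row."""
    if len(image) != ROWS or any(len(row) != COLS for row in image):
        raise ValueError("expected a 28x28 image")
    return [pixel for row in image for pixel in row]


def scale(vector):
    """Rescale grayscale values from 0..255 to 0..1 (assumed preprocessing)."""
    return [pixel / 255.0 for pixel in vector]


def pixel_index(row, col):
    """Position of grid cell (row, col) in the flattened vector (0-based)."""
    return row * COLS + col


# Hypothetical image: all black except one white pixel.
image = [[0] * COLS for _ in range(ROWS)]
image[3][5] = 255

x = flatten(image)
x_scaled = scale(x)
print(len(x), x[pixel_index(3, 5)], x_scaled[pixel_index(3, 5)])
```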
MNIST example
• Train a stacked denoising autoencoder
• Extract representative features from the MNIST data
• Compare with PCA, using two principal components
Stacked denoising autoencoder
[Figure: network diagram. An input layer receives partially corrupted input features, five hidden layers of hidden neurons (h1 to h5) follow, and a target layer holds the uncorrupted output features. The middle hidden layer, h3, provides the extractable features.]
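The pairing of partially corrupted inputs with uncorrupted targets can be sketched as below. The slides do not say which kind of corruption was used, so masking noise (setting a random subset of features to zero) is an assumption, as are the function names, the corruption rate and the toy records.

```python
import random


def corrupt(vector, corruption_rate, rng):
    """Masking noise: set each feature to 0 with probability corruption_rate."""
    return [0.0 if rng.random() < corruption_rate else value for value in vector]


def make_training_pairs(records, corruption_rate=0.3, seed=0):
    """Return (corrupted input, clean target) pairs for a denoising autoencoder."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    return [(corrupt(record, corruption_rate, rng), list(record)) for record in records]


# Hypothetical records with features already scaled to 0..1.
records = [[0.2, 0.5, 0.9, 0.1], [1.0, 0.0, 0.3, 0.7]]
pairs = make_training_pairs(records, corruption_rate=0.5, seed=1)

for noisy_input, clean_target in pairs:
    print(noisy_input, "->", clean_target)
```

During training the network sees the corrupted vector as input but is scored against the clean vector, which pushes the hidden layers towards features that are robust to the noise.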
Record ID | Hidden Unit 1 | Hidden Unit 2
1 | 0.98754 | 0.32453
2 | 0.76854 | 0.87345
3 | 0.87435 | 0.05464
⋮ | ⋮ | ⋮
[Figure: the encoding part of the network. Partially corrupted input features enter the input layer and pass through hidden layers h1 and h2 to h3, whose hidden neurons provide the extractable features.]
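Record ID | Pixel 1 | Pixel 2 | Pixel 3 | Pixel 4 | Pixel 5 | Pixel 6 | Pixel 7 | Pixel 8 | Pixel 9 | Pixel 10 | …
1 | 0 | 0 | 0 | 0 | 0 | 5 | 8 | 11 | 6 | 3 | …
2 | 0 | 0 | 0 | 0 | 10 | 20 | 45 | 46 | 36 | 24 | …
3 | 0 | 25 | 37 | 32 | 40 | 64 | 107 | 200 | 67 | 46 | …
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋱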
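Extracting features amounts to a forward pass from the pixel values through h1 and h2 to the two units of h3. The sketch below shows only that mechanism, under several assumptions: the layer sizes are shrunk to the ten pixels shown in the table, the activation is a logistic sigmoid, and the weights are random and untrained, so the printed values will not match the hidden unit values on the slide.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real encoder would use trained weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def extract_features(pixels, encoder_layers):
    """Scale pixels to 0..1 and pass them through every encoder layer in turn."""
    activation = [p / 255.0 for p in pixels]
    for weights, biases in encoder_layers:
        activation = dense(activation, weights, biases)
    return activation


rng = random.Random(0)
layer_sizes = [10, 6, 4, 2]  # hypothetical: input -> h1 -> h2 -> h3
encoder = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

records = {
    1: [0, 0, 0, 0, 0, 5, 8, 11, 6, 3],
    2: [0, 0, 0, 0, 10, 20, 45, 46, 36, 24],
    3: [0, 25, 37, 32, 40, 64, 107, 200, 67, 46],
}
features = {rid: extract_features(px, encoder) for rid, px in records.items()}
for rid, (unit1, unit2) in features.items():
    print(rid, round(unit1, 5), round(unit2, 5))
```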
Feature extraction – denoising autoencoder
Feature extraction - PCA
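For the PCA side of the comparison, reducing each record to two principal component scores can be sketched in plain Python. Power iteration with deflation is used here only to keep the example dependency-free; it is not a claim about how SAS computes PCA. The data are invented, and the fixed start vector would fail in the unusual case where it is exactly orthogonal to a leading component.

```python
import math


def center(data):
    """Subtract each column's mean."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[v - m for v, m in zip(row, means)] for row in data]


def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def power_iteration(matrix, iterations=500):
    """Approximate the dominant eigenvalue and eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(m * x for m, x in zip(row, v)) for row in matrix]
    return sum(a * b for a, b in zip(v, mv)), v


def pca(data, n_components=2):
    """Return the principal directions and each record's scores on them."""
    centered = center(data)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        eigenvalue, v = power_iteration(cov)
        components.append(v)
        # Deflation: remove the direction just found before the next search.
        cov = [
            [cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
            for i in range(len(v))
        ]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return components, scores


# Hypothetical data: five records with three variables each.
data = [[2.0, 1.0, 0.1], [4.0, 2.1, -0.1], [6.0, 2.9, 0.0],
        [8.0, 4.0, 0.1], [10.0, 5.1, -0.1]]
components, scores = pca(data, n_components=2)
for row in scores:
    print([round(s, 4) for s in row])
```

Unlike the autoencoder features, these scores are linear combinations of the input variables.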
SAS machine learning algorithms
• Neural networks
• Decision trees
• Random forests
• Associations and sequence discovery
• Gradient boosting and bagging
• Support vector machines
• Nearest-neighbor mapping
• K-means clustering
• DBSCAN
• Self-organizing maps
• Local search optimization techniques such as genetic algorithms
• Expectation maximization
• Multivariate adaptive regression splines
• Bayesian networks
• Kernel density estimation
• Principal components analysis
• Singular value decomposition
• Gaussian mixture models
• Sequential covering rule building
• Model ensembles
• Recommendations
SAS products that use machine learning
• SAS Enterprise Miner
• SAS Text Miner
• SAS In-Memory Statistics for Hadoop
• SAS Visual Statistics
• SAS/STAT
• SAS/OR
• SAS Factory Miner
Supervised learning algorithms

Algorithm | SAS EM nodes | SAS procedures
Regression | High Performance Regression, LARS, Partial Least Squares, Regression | ADAPTIVEREG, GAM, GENMOD, GLMSELECT, HPGENSELECT, HPLOGISTIC, HPQUANTSELECT, HPREG, LOGISTIC, QUANTREG, QUANTSELECT, REG
Decision tree | Decision Tree, High Performance Tree | ARBORETUM, HPSPLIT
Random forest | High Performance Tree | HPFOREST
Gradient boosting | Gradient Boosting | ARBORETUM
Neural networks | AutoNeural, DMNeural, High Performance Neural, Neural Network | HPNEURAL, NEURAL
Support vector machine | High Performance Support Vector Machine | HPSVM
Naïve Bayes | | HPBNET*
Neighbors | Memory Based Reasoning | DISCRIM

*PROC HPBNET can learn different network structures (naïve, TAN, PC, and MB) and automatically select the best model.
Unsupervised learning algorithms

Algorithm | SAS EM nodes | SAS procedures
A priori rules | Association, Link Analysis |
K-means clustering | Cluster, High Performance Cluster | FASTCLUS, HPCLUS
Spectral clustering | | Custom solution using Base SAS and the DISTANCE and PRINCOMP procedures
Kernel density estimation | | KDE
Kernel PCA | | Custom solution using Base SAS and the CORR, PRINCOMP, and SCORE procedures
Singular value decomposition | | HPTMINE, IML
Self-organizing maps | SOM/Kohonen |
Semi-supervised learning algorithms

Algorithm | SAS EM nodes | SAS procedures
Denoising autoencoders | | HPNEURAL, NEURAL
Why has interest in machine learning increased?
• Big data
• Computing resources
• Powerful computers
• Cheap data storage
“Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is”
Douglas Adams in “The Hitchhiker's Guide to the Galaxy”
Further reading
• White papers
http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html
http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
• SAS links
http://www.sas.com/en_us/insights/analytics/machine-learning.html
http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-weknew.html
• SAS Data Mining Community
https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/
• Big Data Matters Webinar Series
www.sas.com/bigdatamatters
Thank you!