Machine learning: an introduction
Josefin Rosén, Senior Analytical Expert, SAS Institute
[email protected]
Twitter: @rosenjosefin #SASFORUMSE

Copyright © 2015, SAS Institute Inc. All rights reserved.

Agenda
• What is machine learning?
• When, where, and how is machine learning used?
• Example: deep learning
• Machine learning in SAS

What is machine learning?

Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.

SAS: Machine learning is a branch of artificial intelligence that automates the building of systems that learn from data, identify patterns, and make decisions – with minimal human intervention.

What is what, really?

[Figure: diagram relating the fields Statistics, Pattern Recognition, Data Science, Data Mining, Machine Learning, Databases, Information Retrieval, Computational Neuroscience, and AI]

What is machine learning?

“Complicated methods, but useful results”

When is machine learning used?
• When the model's predictive accuracy matters more than the interpretation of the model
• When traditional approaches are a poor fit, for example when you have:
  • more variables than observations
  • many correlated variables
  • unstructured data
  • fundamentally nonlinear or unusual phenomena

[Figure: panels labeled Training data, Regression, Decision tree, and Neural network]

Where is machine learning used?
Some examples:
• Recommendation applications
• Fraud detection
• Predictive maintenance
• Text analytics
• Pattern and image recognition
• The self-driving Google car


Machine Learning / Data Mining

SUPERVISED LEARNING (know y)
• Regression, LASSO regression, logistic regression, ridge regression
• Decision tree, gradient boosting, random forests
• Neural networks, SVM, Naïve Bayes, neighbors, Gaussian processes

UNSUPERVISED LEARNING (don't know y)
• A priori rules
• Clustering, k-means clustering, mean shift clustering, spectral clustering
• Kernel density estimation, nonnegative matrix factorization
• PCA, kernel PCA, sparse PCA
• Singular value decomposition, SOM

SEMI-SUPERVISED LEARNING (sometimes know y)
• Prediction and classification*, clustering*
• EM, TSVM, manifold regularization
• Autoencoders, multilayer perceptron, restricted Boltzmann machines

*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.
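The "know y" versus "don't know y" distinction can be illustrated with a minimal Python sketch. It is not taken from the slides: the toy points, the labels, and the function names `nearest_neighbor_predict` and `k_means` are assumptions made for illustration. The first function uses the known labels (supervised), the second groups the same points without any labels (unsupervised).

```python
import math

def nearest_neighbor_predict(train_x, train_y, query):
    """Supervised: y is known, so predict the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]

def k_means(points, centers, iterations=10):
    """Unsupervised: no y is used; points are grouped around k centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each point goes to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    assignments = [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points
    ]
    return centers, assignments

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]  # the known y, used only in the supervised case

predicted = nearest_neighbor_predict(points, labels, (4.5, 4.7))
centers, assignments = k_means(points, centers=[points[0], points[3]])
```

In the semi-supervised case only some of the entries in `labels` would be available, which is why the slide notes that prediction and clustering are often combined there.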

Deep learning
• Deep learning: using neural networks with more than two hidden layers
• Used successfully in pattern recognition, among other areas
• Good at extracting features from a data set
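As a rough illustration of "more than two hidden layers", the following Python sketch runs a forward pass through a network with three hidden layers. The layer sizes, the random untrained weights, and the helper names `dense`, `init_layer`, and `forward` are assumptions for illustration only. It shows how the activations of any hidden layer can be read out as features.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer with a logistic (sigmoid) activation."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Return the activations of every layer: hidden layers first, output last."""
    activations = []
    for weights, biases in layers:
        x = dense(x, weights, biases)
        activations.append(x)
    return activations

rng = random.Random(0)
sizes = [8, 6, 4, 6, 3]  # hypothetical: 8 inputs, hidden layers of 6, 4, 6 units, 3 outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
activations = forward([rng.random() for _ in range(sizes[0])], layers)
hidden_features = activations[1]  # activations of the second hidden layer, usable as features
```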

MNIST training data
• 784 variables form a 28x28 digital grid
• 784-dimensional input vector X = (x1,…,x784)
• Grayscale values ranging from 0 to 255
• 60,000 labeled training images
• 10,000 test images without labels
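The relation between the 784-dimensional input vector and the 28x28 grid can be sketched in Python as follows. The row-major pixel order, the scaling to the range 0 to 1, and the function names `to_grid` and `scale` are assumptions for illustration. The example image is synthetic, not real MNIST data.

```python
def to_grid(pixels, side=28):
    """Reshape a flat vector of side*side values into a side x side grid (row-major order assumed)."""
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} values, got {len(pixels)}")
    return [pixels[r * side:(r + 1) * side] for r in range(side)]

def scale(pixels, max_value=255):
    """Map grayscale values 0..255 to the range 0.0..1.0."""
    return [p / max_value for p in pixels]

# Synthetic stand-in for one image: a diagonal stroke, not real MNIST data.
x = [0] * 784
for i in range(28):
    x[i * 28 + i] = 255

grid = to_grid(scale(x))
```

Scaling the inputs is a common preprocessing choice for neural networks. The slides do not say whether it was applied in the SAS example.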

MNIST example
• Train a stacked denoising autoencoder
• Extract representative features from the MNIST data
• Compare with PCA, two principal components (PCs)

Stacked denoising autoencoder

[Figure: network diagram, from top to bottom:
  Target layer: uncorrupted output features
  Hidden layers h5, h4, h3, h2, h1 (hidden neurons); h3 holds the extractable features
  Input layer: partially corrupted input features]

Extractable features (hidden layer h3):

Record ID  Hidden Unit 1  Hidden Unit 2
1          0.98754        0.32453
2          0.76854        0.87345
3          0.87435        0.05464

Partially corrupted input features (input layer):

Record ID  Pixel 1  Pixel 2  Pixel 3  Pixel 4  Pixel 5  Pixel 6  Pixel 7  Pixel 8  Pixel 9  Pixel 10
1          0        0        0        0        0        5        8        11       6        3
2          0        0        0        0        10       20       45       46       36       24
3          0        25       37       32       40       64       107      200      67       46
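The idea of a denoising autoencoder can be sketched in plain Python: corrupt the input, encode it, and train the network to reconstruct the clean input. This is a simplified illustration, not the SAS implementation behind the slides. It swaps in greedy layer-wise training of two small autoencoders on toy data, whereas the slide shows a five-hidden-layer network. The class name, layer sizes, corruption rate, and learning rate are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, rate, rng):
    """Masking noise: set each input to 0 with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

class DenoisingAutoencoder:
    def __init__(self, n_in, n_hidden, rng):
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.w_dec, self.b_dec)]

    def train_step(self, x, rate, lr, rng):
        """One SGD step: encode the corrupted input, reconstruct the clean input."""
        noisy = corrupt(x, rate, rng)
        h = self.encode(noisy)
        y = self.decode(h)
        # Gradient of 0.5 * sum((y - x)^2) with respect to the decoder pre-activations.
        d_out = [(yi - xi) * yi * (1.0 - yi) for yi, xi in zip(y, x)]
        # Backpropagate to the hidden pre-activations (before the decoder is updated).
        d_hid = [sum(d_out[i] * self.w_dec[i][j] for i in range(len(x))) * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for i, d in enumerate(d_out):
            for j in range(len(h)):
                self.w_dec[i][j] -= lr * d * h[j]
            self.b_dec[i] -= lr * d
        for j, d in enumerate(d_hid):
            for k in range(len(x)):
                self.w_enc[j][k] -= lr * d * noisy[k]
            self.b_enc[j] -= lr * d
        return 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))

rng = random.Random(0)
# Toy data: 8-value records drawn from two patterns (not MNIST).
patterns = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
data = [[float(v) for v in rng.choice(patterns)] for _ in range(40)]

# Greedy layer-wise stacking: train layer 1 on the data, layer 2 on layer 1's features.
layer1 = DenoisingAutoencoder(n_in=8, n_hidden=4, rng=rng)
for _ in range(50):
    for x in data:
        layer1.train_step(x, rate=0.25, lr=0.5, rng=rng)

features1 = [layer1.encode(x) for x in data]
layer2 = DenoisingAutoencoder(n_in=4, n_hidden=2, rng=rng)
for _ in range(50):
    for h in features1:
        layer2.train_step(h, rate=0.25, lr=0.5, rng=rng)

# Two extracted features per record, analogous to "Hidden Unit 1" and "Hidden Unit 2".
extracted = [layer2.encode(h) for h in features1]
```

Each row of `extracted` plays the role of one record in the hidden-unit table: the original pixels are replaced by a small number of learned features.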

Feature extraction – denoising autoencoder


Feature extraction - PCA
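For comparison, extracting two principal components can be sketched in plain Python with power iteration and deflation. This is an illustrative stand-in, not the procedure used for the slides. The function names and the toy data are assumptions, and a numerical library would normally be used instead.

```python
import math
import random

def center(rows):
    """Subtract the column means so that every variable has mean zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]

def leading_eigenvector(matrix, rng, iterations=200):
    """Power iteration: repeatedly multiply by the matrix and renormalize."""
    v = [rng.gauss(0.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(len(v)))
                     for i in range(len(v)))
    return eigenvalue, v

def pca_two_components(rows, rng):
    x = center(rows)
    cov = covariance(x)
    val1, pc1 = leading_eigenvector(cov, rng)
    # Deflation: remove the first component so the next power iteration finds the second.
    size = len(cov)
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(size)] for i in range(size)]
    _, pc2 = leading_eigenvector(deflated, rng)
    # Scores: each record projected onto the two components.
    scores = [[sum(a * b for a, b in zip(row, pc)) for pc in (pc1, pc2)] for row in x]
    return pc1, pc2, scores

# Toy data: most variance along (1, 1, 0), less along (0, 0, 1).
rows = [[-2.0, -2.0, -1.0], [-2.0, -2.0, 1.0], [2.0, 2.0, -1.0], [2.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
pc1, pc2, scores = pca_two_components(rows, random.Random(0))
```

Unlike the autoencoder's hidden units, the two score columns are linear combinations of the inputs, which is the main difference the comparison is meant to bring out.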


SAS machine learning algorithms
• Neural networks
• Decision trees
• Random forests
• Associations and sequence discovery
• Gradient boosting and bagging
• Support vector machines
• Nearest-neighbor mapping
• K-means clustering
• DBSCAN
• Self-organizing maps
• Local search optimization techniques such as genetic algorithms
• Expectation maximization
• Multivariate adaptive regression splines
• Bayesian networks
• Kernel density estimation
• Principal components analysis
• Singular value decomposition
• Gaussian mixture models
• Sequential covering rule building
• Model ensembles
• Recommendations

SAS products that use machine learning
• SAS Enterprise Miner
• SAS Text Miner
• SAS In-Memory Statistics for Hadoop
• SAS Visual Statistics
• SAS/STAT
• SAS/OR
• SAS Factory Miner

Supervised learning algorithms

Regression
  SAS EM nodes: High Performance Regression, LARS, Partial Least Squares, Regression
  SAS procedures: ADAPTIVEREG, GAM, GENMOD, GLMSELECT, HPGENSELECT, HPLOGISTIC, HPQUANTSELECT, HPREG, LOGISTIC, QUANTREG, QUANTSELECT, REG

Decision tree
  SAS EM nodes: Decision Tree, High Performance Tree
  SAS procedures: ARBORETUM, HPSPLIT

Random forest
  SAS EM nodes: High Performance Tree
  SAS procedures: HPFOREST

Gradient boosting
  SAS EM nodes: Gradient Boosting
  SAS procedures: ARBORETUM

Neural networks
  SAS EM nodes: AutoNeural, DMNeural, High Performance Neural, Neural Network
  SAS procedures: HPNEURAL, NEURAL

Support vector machine
  SAS EM nodes: High Performance Support Vector Machine
  SAS procedures: HPSVM

Naïve Bayes
  SAS procedures: HPBNET*

Neighbors
  SAS EM nodes: Memory Based Reasoning
  SAS procedures: DISCRIM

*PROC HPBNET can learn different network structures (naïve, TAN, PC, and MB) and automatically select the best model.

Unsupervised learning algorithms

A priori rules
  SAS EM nodes: Association, Link Analysis

K-means clustering
  SAS EM nodes: Cluster, High Performance Cluster
  SAS procedures: FASTCLUS, HPCLUS

Spectral clustering
  Custom solution using Base SAS and the DISTANCE and PRINCOMP procedures

Kernel density estimation
  SAS procedures: KDE

Kernel PCA
  Custom solution using Base SAS and the CORR, PRINCOMP, and SCORE procedures

Singular value decomposition
  SAS procedures: HPTMINE, IML

Self-organizing maps
  SAS EM nodes: SOM/Kohonen

Semi-supervised learning algorithms

Denoising autoencoders
  SAS procedures: HPNEURAL, NEURAL

Why has interest in machine learning increased?
• Big data
• Computing resources
• Powerful computers
• Cheap data storage

“Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is”
Douglas Adams, The Hitchhiker's Guide to the Galaxy

Further reading
• White papers
  http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html
  http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
• SAS links
  http://www.sas.com/en_us/insights/analytics/machine-learning.html
  http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-weknew.html
• SAS Data Mining Community
  https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/
• Big Data Matters Webinar Series
  www.sas.com/bigdatamatters

Thank you!