Hochschule Düsseldorf University of Applied Scienses
Fachbereich Wirtschaftswissenschaften Faculty of Business Studies
HSD W Business Analytics (M.Sc.) IT in Business Analytics
IT Applications in Business Analytics SS2016 / Lecture 03 – Tools, Technologies, Data Sources Thomas Zeutschler
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
1
Let’s get started…
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
2
Sequence of Lectures 1
Introduction 1st April 2016
2
Methodology and process model for analytics (CRISP DM)
3
Tools, technologies and data sources
4
The R Programming Language
5
KNIME
6
Case Study 1
9
Case Study 2
12 Case Study 3 15 Wrap Up HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
8th
July 2016
Theory
Tools Training Hands On Case Studies
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
3
Database Systems – Introduction
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
4
Basic Concept A Database is a collection of data organized in some way grouped on what they refer to (business), and on data types (technology) relationships between pieces of data
EmployeeID Example: Think of records (paper, electronic) describing employee LastName FirstName Phone The organization of data is explained by Metadata. Data = symbols for recording and communication
Number Text Text Number
Employee is a Class of data (what data refer to), which has 1…N Attributes, and each attribute belongs to a specific data type (text, numeric…).
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
5
Basic Concept – Database Management System A Database Management System (DBMS) is a software for creating databases, storing & retrieving data, creating user interface (forms), creating reports, and administering a DB system (security, access, etc.). DBMS Product (a specific DBMS software; same as “DBMS Brand”). Database System (DBMS Application, Application)*: An implementation of a DBMS product including one or more databases, logic (business rules), some code, and user interface. Broader meaning: Supporting various information needs. An information system that results from “applying” a DBMS or a database. Narrower meaning: Supporting specific information needs.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
6
Basic Concept – Attributes The set of allowed values for each attribute is called the Domain of the attribute Attribute values are (normally) required to be atomic; that is, indivisible The special value null is a member of every domain. Indicated that the value is “unknown” The null value is problematic in the definition of many database operations. HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
attributes (or columns)
tuples (or rows)
7
Basic Concept – Attribute Relations A1, A2, …, An are attributes R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
Formally, given sets D1, D2, …. Dn, a relation r is a subset of
D1 x D2 x … x Dn Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai Di The current values (relation instance) of a relation are specified by a table An element t of r is a tuple, represented by a row in a table
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
8
Basic Concept – Tuples
Order of tuples is irrelevant. Tuples may be stored in an arbitrary order. Example: Instructor relation with unordered tuples.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
9
Basic Concept – Keys Let K R K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R) Example: {ID} and {ID,name} are both superkeys of instructor.
Superkey K is a candidate key if K is minimal Example: {ID} is a candidate key for Instructor
One of the candidate keys is selected to be the primary key. Which one?
Foreign key constraint: Value in one relation must appear in another Referencing relation Referenced relation Example – dept_name in instructor is a foreign key from instructor referencing department
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
10
Basic Concept – Database Schema
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
11
Basic Concept – SQL SQL Structured Query Language is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS) SQL is based upon relational algebra and tuple relational calculus. https://en.wikipedia.org/wiki/Relational_algebra SQL defines 3 language aspects: data definition language (DDL) …to define database schemas data manipulation language … selecting, inserting, deleting and updating data data control language …to control access rights in databases
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
12
Basic Concept – SQL Statement
Online RDBMS: http://sqlfiddle.com/ Online SQL editor: http://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
13
Basic Concept – SQL can get complex
Both queries return the same result. Do they? HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
14
Basic Concept – SQL Joins
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
15
Basic Concept – Learning SQL http://www.w3schools.com/sql SQL is an essential skill in data science and business analytics. SQL is not only used in relational database management systems but also for many other data management technologies like Hadoop and No-SQL databases.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
16
Database System Classification
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
17
Database System Classification – Overview Relational Store – RDBMS, as discussed before Column Store each storage block contains data from only one column Document Store stores documents made up of tagged elements Key-Value Store hash table of keys XML Databases Graph Databases Codasyl Databases Object Oriented Databases and other…
NoSQL
Databases
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
18
Database System Classification SQL Databases Predefined Schema Standard definition and interface language Tight consistency Well defined semantics NoSQL Database No predefined Schema Per-product definition and interface language Getting an answer quickly is more important than getting a correct answer HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
19
Graph Database A Graph Database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.
An Ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
20
Graph Database – Sample
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
21
Graph Database Sample Largest, most actively used graph today.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
22
OLAP Database Online analytical processing, or OLAP is a computer-based technique of analyzing data to look for insights by answering multidimensional analytical (MDA) queries easily.
An OLAP cube is a term that typically refers to multi-dimensional array of data. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than 3. HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
23
OLAP Database
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
24
Hadoop Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework. Hadoop was designed by the web company Yahoo out of necessity for efficiently storing and analyzing the inherently enormous data set of web scale applications. HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
25
Hadoop Map Reduce MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
26
Hadoop Map Reduce – WordCount Sample
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
27
Hadoop Hardware Cluster
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
28
Spark
Apache Spark™ is a fast and general engine for large-scale data processing.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
29
Big Data Framework A pre-customized and pre-compiled collection of tools and technologies required for big data processing based on Hadoop.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
30
Analytical Data Processing
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
31
Analytical Data Processing Platform An Analytical Data Processing Platform is a software that provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development and supports all steps of the data mining process including data preparation, results visualization, validation and optimization. HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
32
Analytics Data Processing – Sample: Knime
www.knime.org
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
33
Analytics Data Processing – Sample
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
34
Analytics Data Processing – Sample
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
35
Cloud Platforms
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
36
Cloud Platforms – Sample
And many other (smaller) vendors… HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
37
Data Management Platform – Microsoft From Data to Decision and Action
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
38
Cloud Platform Infrastructure Data centers around the globe
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
39
Cloud Data Center Microsoft Azure Data Center in Dublin
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
40
Cloud Data Center Inside a Data Center
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
41
Cloud Data Management Platform – Microsoft
Applications
Orchestration
Relational
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
Reports
Dashboards
Information management
Complex event processing
Non-relational
NoSQL
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
Natural language query
Mobile
Modeling
Machine learning
Streaming
Internal & external 42
Cloud Data Management Platform – Microsoft Big Data for IoT (Internet of Things)
Business apps
People
Custom apps
Sensors and devices
DATA
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
Automated Systems
INTELLIGENCE SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
ACTION 43
Cloud Data Management Platform – Microsoft Azure Data Factory – Data Orchestration Connect to data sources Orchestrate data movement & data processing Publish to users as a searchable data view Operationalize workflows (schedule, manage, debug) Lifecycle management, monitoring
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
MapReduce Hive Pig C# Stored Procedures Azure Machine Learning
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
44
Cloud Data Management Platform – Microsoft Azure Stream Analytics – Data Orchestration
Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications
Performs time-sensitive analysis using SQL-like language against multiple real-time streams and reference data Outputs to persistent stores, dashboards or back to devices
POS Terminals
Smart Phones
Kiosks
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
Servers
ATM
Security Kinect
PCs/ Laptops Self Checkout Stations
Slates/ Tablets
Automation Devices
Point of Service Devices
Digital Signs
Vending Machines Thin Clients Handhelds
Logic Controllers
Remote Medical Monitors
Specialized Devices
Diagnostic Equipment
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
45
Cloud Data Management Platform – Microsoft Azure Machine Learning Get started with just a browser Requires no provisioning; simply log on to your Azure subscription or try it for free off azure.com/ml Experience the power of choice Choose from hundreds of algorithms and packages from R and Python or drop in your own custom code Take advantage of business-tested algorithms from Xbox and Bing Deploy solutions in minutes With the click of a button, deploy the finished model as a web service that can connect to any data, anywhere
Microsoft Azure Machine Learning Studio Modeling environment (shown) Microsoft Azure Machine Learning API service Model in production as a web service
Microsoft Azure Machine Learning Marketplace APIs and solutions for broad use
Connect to the world Brand and monetize solutions on our global Machine Learning Marketplace https://datamarket.azure.com/
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
46
Data Sources
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
47
Data Sources – Public Data Sets Das Datenportal für Deutschland https://www.govdata.de European Union Open Data Portal https://open-data.europa.eu Google Public Data http://www.google.com/publicdata/directory?hl=en_US Where can I find (very) large public datasets? https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
48
Data Sources – Public Data Sets Kaggle.com https://www.kaggle.com/datasets Google – Search for Data Sets https://cse.google.com/cse/publicurl?cx=002720237717066476899:v2wv26idk7m Data Sets for Classroom Use http://www2.stetson.edu/~jrasp/data.htm List of free data sets https://r-dir.com/reference/datasets.html List of Reference Data Sets for Statistics https://r-dir.com/reference/datasets.html
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
49
Lecture Summary & Homework
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
50
Lessons Learned The basic concept of relational databases is known. New data management technologies are emerging. Hadoop as a platform for big data management. Analytical Data Processing requires adequate tools and processes. Cloud as platform for data storage and computing.
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
51
Resources Database Design (eBook) Chapter 1 - 7 https://opentextbc.ca/dbdesign/ Database Management Systems (eBook) http://sirpabs.ilahas.com/ebooks/Computer%20&%20Technology/Databas e%20Mgmt/Database.Management.Systems.2nd.Edition.pdf
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
52
Get Prepared (Homework) Train yourself in R. Take the course: http://tryr.codeschool.com/ Train yourself in SQL. Take the course: http://www.w3schools.com/sql Open a second window with the online SQL database: http://www.w3schools.com/sql/trysql.asp?filename=trysql_select_al Take the SQL Quiz and try to get at least 20 points. http://www.w3schools.com/sql/sql_quiz.asp
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
53
Any Questions?
HSD
Faculty of Business Studies Thomas Zeutschler Associate Lecturer
SS 2016 - IT Applications in Business Analytics - 3. Tools, Technologies and Data Sources
54