USING SOCIAL MEDIA DATA TO PROPEL YOUR BUSINESS. A 3-Step Guide to Ingest, Process & Visualize Twitter Data BROUGHT TO YOU BY

Author: Bethanie Burns

Table of Contents
INTRO: SOCIAL MEDIA AS A GAME CHANGER FOR BUSINESS
OVERCOMING BIG DATA CHALLENGES: AN AGILE APPROACH
3 SIMPLE STEPS TO INTEGRATING SOCIAL MEDIA DATA
CROSSING THE FINISH LINE WITH VISUALIZATION
CONCLUSION

Intro: Social Media as a Game Changer for Business

Social media has revolutionized how we use mobile devices, but it can also provide a lot of value to organizations. The annual MIT Sloan Management Review and Deloitte global survey on social business polled nearly 5,000 professionals across 109 countries and 26 industries on the use of social media in their organizations. Nearly 75% of respondents say that social business is important or somewhat important to their business today, up from 52% when the study started three years ago. More than 90% of participants from maturing social business companies say their leaders believe it can create powerful and positive change.

Global organizations are using social media data to overcome a variety of challenges, gain valuable new insights and streamline communications to propel their businesses forward. For example:

→→IMPROVE THE CUSTOMER EXPERIENCE: Domino’s Pizza [1] used social media to turn consumer dissatisfaction into a groundswell of support, showing customers they were listening to their complaints – even sharing a photo of a messy pizza that an unhappy customer posted on the company’s website – and taking corrective action. In a less public but equally effective approach, Wells Fargo [2] established a social media command center to coordinate analysis and response to customer feedback gleaned through social media channels around the world.

[1] “How Social Media Can Influence High-Stakes Business Decisions,” by Kim S. Nash, CIO Magazine, October 2014
[2] “How Social Media Can Influence High-Stakes Business Decisions,” by Kim S. Nash, CIO Magazine, October 2014


→→INFLUENCE POLITICAL DECISIONS: Virgin America [3] used social media to influence Dallas city leaders to give it, rather than a competitor, control of two gates at Dallas Love Field airport. Through social media the airline was able to demonstrate widespread public support, which opened up access to new markets and averted a crisis.

→→UNCOVER IDEAS FOR NEW PRODUCTS AND SERVICES: Ford Motor Co. [4] studied consumer behavior through social media when it decided to build its popular hands-free lift gate. By taking a closer look at feedback from Ford enthusiasts, the automaker was able to conduct an accurate cost-benefit analysis and even gained information on why customers wanted the feature, which helped sway approval for the project.

→→COMPETITIVE RESEARCH: While monitoring customer conversations on social networks, T-Mobile USA [5] discovered that its lack of an iPhone offering was sending customers elsewhere. The company identified at-risk customers using names and the geo-location of their tweets. Tying this back to its CRM system, it reached out to those subscribers whose contracts were about to expire, stressing the advantages of T-Mobile. In 90 days T-Mobile reduced customer attrition by 50%.

→→COMPANY PRODUCTIVITY: BASF [6] uses social business to improve productivity through better communications across its 88 global and regional business units and more than 112,000 employees worldwide. Through ‘connect.BASF’ it has 4,500 communities working together on various business initiatives.

Social media won’t replace current sources for intelligence gathering, but it is a strong complement and can help organizations make better, more informed business decisions. The challenge is in understanding how to incorporate this data, which is quite different in format from the typical data we gather, store and analyze within an Enterprise Data Warehouse, and then process it effectively.

The good news is you don’t need significant investments in new skills or technologies to start leveraging this new source of information. This guide presents three simple steps you can take to make sense of this data in a quick and easy way. No Hadoop, Java or specialized programming skills required.

[3] “How Social Media Can Influence High-Stakes Business Decisions,” by Kim S. Nash, CIO Magazine, October 2014
[4] “How Social Media Can Influence High-Stakes Business Decisions,” by Kim S. Nash, CIO Magazine, October 2014
[5] “Moving Beyond Marketing, Generating Social Business Value Across the Enterprise,” MIT Sloan Management Review and Deloitte, 2014
[6] “Moving Beyond Marketing, Generating Social Business Value Across the Enterprise,” MIT Sloan Management Review and Deloitte, 2014

Overcoming Big Data Challenges: An Agile Approach

One of the challenges in making better business decisions is the ability to keep pace with the “Three V’s” – the Volume, Velocity and Variety of data. While social media data is pushing the boundaries of the “Three V’s”, it can provide extremely valuable, relevant and timely insights. The trick is figuring out how to capture the data and leverage it with the skills your developers already have and tools they know how to use.

Today, while organizations strive to harness the power of data for competitive advantage, the reality is that the high total cost of ownership, ongoing tuning and maintenance efforts and performance limitations of current approaches stand in the way. Many organizations are using a combination of conventional ETL tools and SQL coding for data integration. Adding a new data source, like social data, requires specialized programming skills, a strong IT background and significant time – often up to two to three months – to create a data model and determine the best approach for incorporating the data into the data warehouse. Performance bottlenecks often occur due to scalability issues and tuning requirements that further bog down traditional approaches.

Social media creates additional challenges in dealing with the “Three V’s.” For example:

→→VOLUME: Twitter volumes average more than 500 million tweets a day, or roughly 6,000 tweets per second (TPS). However, during the busiest Twitter second on record, that number spiked to a staggering 143,199 TPS [1]. Meanwhile, Facebook takes in about 600 terabytes of data per day [2].

→→VARIETY: Social media is a new type of data and introduces relatively new formats that need to be integrated. JavaScript Object Notation (JSON) is one popular format that must be read and prepared for analysis.

→→VELOCITY: Data is coming at increased speed and organizations need to figure out how to stay ahead. Some companies set thresholds; for example, when social media posts on a specific topic rise to 100 posts in 60 minutes, a company response is triggered. Other companies will go so far as to cite social media’s ability to outstrip the speed of response as a caution in the risk factors section of their annual reports. For better or worse, tweets that go viral in a matter of minutes can have a huge impact on an organization.
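To make the threshold idea concrete, here is a minimal Python sketch of a sliding-window counter that flags a topic once it reaches 100 posts within 60 minutes. It is only an illustration of the rule described above, not how any particular company or product implements it: the `TopicAlert` class, its parameter names and the sample burst of posts are all hypothetical, and the sketch assumes posts arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta


class TopicAlert:
    """Flags a topic once it reaches `threshold` posts inside a sliding window."""

    def __init__(self, threshold=100, window=timedelta(minutes=60)):
        self.threshold = threshold
        self.window = window
        self.timestamps = deque()  # post times still inside the window

    def record(self, posted_at):
        """Register one post; return True if the threshold is now met."""
        self.timestamps.append(posted_at)
        # Drop posts that have aged out of the window.
        cutoff = posted_at - self.window
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.threshold


# Hypothetical burst: one post every 30 seconds on the same topic.
start = datetime(2014, 10, 1, 9, 0)
alert = TopicAlert()
fired_at = None
for i in range(200):
    when = start + timedelta(seconds=30 * i)
    if alert.record(when):
        fired_at = when  # the moment a company response would be triggered
        break
print(fired_at)
```

A steady trickle of one post per minute never holds more than 60 posts in the window, so the same counter stays quiet until a genuine spike occurs.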

[1] “Twitter Usage Statistics,” http://www.internetlivestats.com/twitter-statistics/
[2] “How Facebook Manages a 300-Petabyte Data Warehouse, 600 Terabytes Per Day,” April 11, 2014, http://allfacebook.com/orcfile_b130817

A targeted solution for ingesting, processing and distributing data can help organizations keep pace with the three “V’s.” It shifts the burden of handling common and repetitive tasks as well as performance tuning from individuals to software. It leverages algorithms, optimizations and smart technology to intelligently accelerate performance on-the-fly. A scalable architecture, designed for today’s dynamic business requirements and environments, uses efficient processing methods dynamically executed as needed. Innovations like Direct I/O enable more efficient transfer of larger blocks of data. And, of particular relevance to this topic, pervasive connectivity to a wide variety of sources and targets enables organizations to collect, prepare, aggregate and combine social data just as they would any data from any other source.

Syncsort DMX is a light-weight, high-performance software solution with a graphical user interface and built-in connectivity that provides an intuitive and fast approach to extract, transform and load data from Twitter or other social tools in less time with fewer resources and less money.

Social media is of the moment and the data it provides enables unprecedented opportunities for more relevant response. Now more than ever, organizations need agile solutions like Syncsort DMX to help gain greater flexibility, faster time-to-insight and optimize decision making for their business.

Download “The Ultimate Checklist for High-Performance ETL” for more details on the key capabilities of high-performance ETL.

There are a lot of aspects to consider when leveraging social media data. But the first step is to actually collect, prepare and make this data available for analytics along with other valuable corporate data sources. In the next two sections we’ll take a closer look at how you can easily factor social data into your business decisions using Syncsort DMX.

3 Simple Steps to Integrating Social Media Data

You’re likely accustomed to dealing with data from a range of sources and targets, including relational databases, files, mainframes, CRM systems, web logs and HDFS. Unlocking the insights from this data is at the heart of the value to be gained from Big Data. However, social media introduces data formats, such as JavaScript Object Notation (JSON), that are new to many organizations. JSON is a low overhead alternative to XML that is primarily used to transmit data between a server and a web application. Twitter feeds use the JSON data format.

The ability to handle JSON data is critical to enhancing business agility, enabling you to react quickly to market dynamics, changes in customer behavior, new competitive forces, etc. Unlike the typical flat data from data warehouses, JSON data tends to have more structure and sections with repeating groups of data that include demographics (age range end and start), likes and interests and location (country, state, city).

JSON is one of the numerous data formats Syncsort DMX seamlessly plugs into so you can get started quickly – without any new programming skills or additional technical proficiencies required. Using Twitter as an example, it is a simple 3-step process.

Step 1 – Read the JSON File

To begin incorporating insights from this type of social media data, you need to start by reading the data, converting it to tabular format and generating metadata. As shown below in Figure 1, once you get your data from Twitter (left), you can use Syncsort DMX to convert it into a layout (center) that you can easily understand, and transform the data (right) using DMX’s built-in functions and data transformations.

Each tweet has a wealth of metadata stored alongside it: geolocation data, hashtags, mentions, images, etc. The Syncsort DMX JSON Reader allows you to read and parse the data, automatically creating the tabular record layout based on the JSON fields. From here, you can choose the fields you wish to preserve as well as define the horizontal or vertical expansion of repeated elements.

FIGURE 1
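For readers who want to see what reading JSON into a tabular layout amounts to outside a graphical tool, the following Python sketch flattens one tweet and shows both ways of expanding a repeated element. It is an illustration under stated assumptions, not a description of how the Syncsort DMX JSON Reader works internally: the sample tweet is hypothetical and heavily trimmed, its field names only loosely follow Twitter's tweet object, and the helper functions are invented for this example.

```python
import json

# Hypothetical, heavily trimmed tweet; real Twitter payloads carry many more fields.
raw = '''
{
  "id": 1,
  "text": "Loving the new lift gate! #ford #handsfree",
  "user": {"screen_name": "car_fan", "location": "Chicago, IL"},
  "entities": {"hashtags": [{"text": "ford"}, {"text": "handsfree"}]}
}
'''


def base_row(tweet):
    """Pull the non-repeating fields into one flat record."""
    return {
        "id": tweet["id"],
        "text": tweet["text"],
        "screen_name": tweet["user"]["screen_name"],
        "location": tweet["user"]["location"],
    }


def expand_vertical(tweet):
    """Vertical expansion: one output row per repeated element (per hashtag)."""
    tags = [h["text"] for h in tweet["entities"]["hashtags"]] or [None]
    return [dict(base_row(tweet), hashtag=tag) for tag in tags]


def expand_horizontal(tweet, max_tags=3):
    """Horizontal expansion: one row per tweet with a fixed set of hashtag columns."""
    tags = [h["text"] for h in tweet["entities"]["hashtags"]]
    row = base_row(tweet)
    for i in range(max_tags):
        row[f"hashtag_{i + 1}"] = tags[i] if i < len(tags) else None
    return row


tweet = json.loads(raw)
vertical_rows = expand_vertical(tweet)
horizontal_row = expand_horizontal(tweet)
for row in vertical_rows:
    print(row)
print(horizontal_row)
```

Vertical expansion suits counting or joining on individual hashtags, while horizontal expansion keeps one record per tweet at the cost of a fixed column count.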

Step 2 – Specify Your Transformations

The next step consists of filtering or transforming the Twitter data. You can select any type of transformation you wish to apply to your JSON data. In Figure 2 we are outputting to a text file, but there are many different ways to transform the data. You can define field-level transformations (combining two fields, such as first and last name, into a single field) or set-level transformations (such as joining one data set to another, aggregations or sorts). Use these transformations to enrich the data with attributes such as the topic of the tweet, geo-location, gender and volume of retweets for further analysis. With this level of granular control you can zero in on the areas where a new product announcement is most popular, understand if men or women are most interested in a competitor’s latest move, identify emerging trends in key markets, etc.

FIGURE 2
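The difference between field-level and set-level transformations can be sketched in a few lines of Python. This is a hand-written illustration of the concepts only, not the output of or a substitute for the DMX transformations: the tweet rows, the customer records and every field name below are hypothetical.

```python
from collections import Counter

# Hypothetical rows, as they might look after the JSON has been flattened.
tweets = [
    {"user_id": 1, "topic": "launch", "retweets": 12},
    {"user_id": 2, "topic": "launch", "retweets": 3},
    {"user_id": 1, "topic": "pricing", "retweets": 7},
]
# Hypothetical customer records from another source, such as a CRM extract.
customers = {
    1: {"first_name": "Ana", "last_name": "Lopez"},
    2: {"first_name": "Raj", "last_name": "Patel"},
}

# Set-level transformation: join each tweet to its customer record.
# Field-level transformation: combine first and last name into one field.
joined = []
for t in tweets:
    c = customers[t["user_id"]]
    joined.append(dict(t, full_name=f'{c["first_name"]} {c["last_name"]}'))

# Set-level transformation: aggregate retweets per topic.
retweets_by_topic = Counter()
for row in joined:
    retweets_by_topic[row["topic"]] += row["retweets"]

# Set-level transformation: sort topics by retweet volume, highest first.
ranked = sorted(retweets_by_topic.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```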

Step 3 – Run the Job

In the final step, shown in Figure 3, you select where the job will run, the format and location of the job as well as when the job will run. It can run on a regular schedule (monthly, daily, or even every minute for near real-time analysis!) or as a one-time job. The extremely lightweight Syncsort DMX run-time engine performs the transformations specified and creates the output file. The parsed, filtered data is now available for use, and DMX can even send a notification when complete. While we’ve chosen to output our results to a tabular file, this data can easily be inserted into your data warehouse, Hadoop or even a visualization tool directly from DMX.

FIGURE 3
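As a rough picture of what the tabular output file contains, this Python sketch writes parsed rows as delimited text with a header line. It is an assumption-laden stand-in for the job's output step rather than anything DMX generates: the rows and the `write_rows` helper are hypothetical, and an in-memory buffer takes the place of the output file.

```python
import csv
import io

# Hypothetical rows, as they might look once the tweets are parsed and filtered.
rows = [
    {"id": 1, "location": "Chicago, IL", "hashtag": "ford"},
    {"id": 2, "location": "Atlanta, GA", "hashtag": "handsfree"},
]


def write_rows(rows, handle):
    """Write dict rows as comma-delimited text, header first."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_rows(rows, buffer)
output_lines = buffer.getvalue().splitlines()
for line in output_lines:
    print(line)
```

To produce a real file, the same function can be handed a file opened with `open(path, "w", newline="")`; values that contain the delimiter, such as the location field here, are quoted automatically.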

Crossing the Finish Line with Visualization

The ability to consume data in a more intuitive and agile format is important to making that data even more useful for business decisions. Syncsort DMX gives users a fast way to ingest, prepare, transform and combine social media with other sources of information, getting it ready for engaging data discovery and visualizations. DMX supports any number of data visualization tools but has recently formed a relationship and tight integration with Tableau. This integration enables users to output their data, including social media data, with one click into Tableau to provide a more natural, end-user oriented approach to explore data and facilitate advanced analytics and visualization.

As shown in Figure 4, creating a Tableau Data Extract (TDE) is as simple as creating any data flow and selecting TDE as the target file type using the Syncsort DMX point-and-click user interface. Since DMX comes with the Tableau API, the TDE file type is automatically included in the drop down file type menu. Use DMX to specify the fields you wish to visualize and pre-build your common transformations and filters into the dataset prior to loading into Tableau.

FIGURE 4

Simply run the job to generate the visualization. Figure 5 is an example of one way to visualize the data in Tableau, in this instance focusing on geo-location and concentration. In this view it is readily apparent that the heaviest volumes of tweets are coming from Illinois (Chicago), Georgia (Atlanta) and California (Los Angeles). Business users can easily utilize Tableau to select other views of this data, perhaps by age range or male/female, or of other data. All of this is done without writing a single line of code, freeing up ETL developers to focus on more complex, sophisticated data flows.

FIGURE 5

Conclusion

Global organizations are gaining tremendous business value from social media data – improving the customer experience, influencing key decision makers, uncovering ideas for new products and services, gaining competitive insights and improving productivity. While incorporating this data into the mix of other sources provides valuable insights, it is putting new pressures on IT teams and data management infrastructures. These teams are already taxed with cumbersome processes and technologies, and ingesting this new data source additionally requires a strong IT background, specialized programming skills and significant time. Social media creates more challenges in dealing with the three “V’s” – breaking new boundaries in volume, adding to variety with new data formats like JSON and pushing many organizations’ limits on handling velocity with tweets going viral in minutes.

From Data Integration to Data Discovery & Visualization

Syncsort understands these challenges and has incorporated a quick and easy way to make sense of all your data, including social media data. Syncsort DMX delivers a light footprint, end-to-end solution to ingest, prepare, understand and combine data from virtually any source without the need for highly specialized programming skills. DMX takes you all the way from raw data to data discovery and visualization.

Using Syncsort DMX and the same skills you already have within your organization, you can now easily factor social data like Twitter into your business decisions. It’s a simple 3-step process to read this type of data, convert it to tabular format and combine it with other sources of information to gather new insights. It’s all done within an intuitive, graphical user interface without writing a single line of code.

Going a step further, you can even output the data quickly to your favorite visualization tool. Realizing the importance of visualization in helping make business decisions, Syncsort has developed an API for Tableau, a leading visualization tool. The Tableau API comes with Syncsort DMX. Creating a Tableau Data Extract (TDE) is as simple as creating any data flow and selecting Tableau Data Extract (TDE) as the target file type in the drop down menu. From there business users can select various views of the data, freeing up ETL developers to focus on more complex, sophisticated data requests from across the organization.

Many organizations are struggling to keep up with today’s accelerated pace of business. In this environment, social media has become a critical source of information. Ignoring it can have fast and devastating effects. However, with the right tools organizations can start using social data in minimum time, to gain that competitive edge and amplify success regardless of their size and scope.

Overcoming Big Data Challenges with Syncsort: From Raw Data to Data Visualization

Challenge: Data Ingestion
HOW SYNCSORT CAN HELP:

→→One tool to connect all data sources and targets including relational databases, appliances, files, XML, cloud and even mainframe
→→Built-in JSON Reader to ingest, understand, prepare and combine social media data with other corporate data sources

Challenge: Skills
HOW SYNCSORT CAN HELP:

→→A “no coding” approach; complex SQL, Java, Perl code is replaced with a powerful, easy-to-use graphical development environment
→→Comprehensive built-in functions and transformations to process all your data

Challenge: Productivity & Reusability
HOW SYNCSORT CAN HELP:

→→A library of Use Case Accelerators to quickly develop common data flows such as CDC, aggregations, joins and more
→→Built-in metadata capabilities for increased re-usability, impact analysis and data lineage

Challenge: Scalability & Performance
HOW SYNCSORT CAN HELP:

→→Linear scalability with up to 75% less CPU and memory utilization to handle social data volumes
→→Up to 25x faster elapsed processing time than conventional tools and hand coding
→→Self-optimized engine with no tuning required

Challenge: Analytics & Visualization Ready
HOW SYNCSORT CAN HELP:

→→Full integration with leading analytic platforms allows you to load datasets into Vertica, Netezza, Greenplum, Teradata, HDFS and more
→→Create Tableau Data Extracts (TDEs) directly from the graphical user interface without the need for additional software

Syncsort provides fast, secure, enterprise-grade software spanning Big Data solutions in Hadoop to Big Iron on mainframes. We help customers around the world to collect, process and distribute more data in less time, with fewer resources and lower costs. 87 of the Fortune 100 companies are Syncsort customers, and Syncsort’s products are used in more than 85 countries to offload expensive and inefficient legacy data workloads, speed data warehouse and mainframe processing and optimize cloud data integration. Experience Syncsort at www.syncsort.com

Learn More!
EBOOK: 5 Tips to Break Through ELT Roadblocks
WHITE PAPER: Syncsort DMX Technical White Paper
EBOOK: 5 Pitfalls to Avoid with Hadoop

© 2014 Syncsort Incorporated. All rights reserved. Company and product names used herein may be the trademarks of their respective companies.