Strategic Prototyping for Developing Big Data Systems

Author: Martha Park
Authors: Hong-Mei Chen, Rick Kazman, and Serge Haziyev, 2016 [1]
Presenter: Gustavo Fortes Tondello

2016-10-04

CS 846 - SOFTWARE ENGINEERING FOR BIG DATA


Agenda

Risks of Big Data System Development
RASP (Risk-Based, Architecture-Centric Strategic Prototyping)
Toward Strategic Prototyping
The Case Studies
An Architecture-Centric Approach
Applying RASP
Takeaways
Discussion


Risks of Big Data System Development

The five V's (volume, variety, velocity, veracity, value)
Paradigm shifts
Rapid proliferation of big data technology
Rapid technology changes
The difficulty of selecting big data technology
The complex integration of new and old systems
The short history of big data system development


RASP (Risk-Based, Architecture-Centric Strategic Prototyping)

Goal: to provide cost-effective, systematic risk management in agile big data system development [1]
An embedded multiple-case study of nine big data projects at SoftServe, a global outsourcing firm, validated RASP


Toward Strategic Prototyping

Rapid Application Development in small data relies on horizontal evolutionary prototypes
All system functions are developed with limited but increasing functionality per release
Evolutionary prototyping is needed when:
◦ requirements are uncertain,
◦ technologies are new,
◦ no comparable system has been previously developed, or
◦ experimentation or design evaluation is necessary to assess solutions


Toward Strategic Prototyping

Horizontal evolutionary prototyping does not work well with big data because of scale and complexity
It is not possible to measure performance on a small data set and assume it will scale linearly
Creating prototypes can be expensive and time-consuming in big data
Prototyping cannot identify some risks, such as:
◦ vendor or product survivability for commercial products
◦ the community's size and vigor for open source products
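The point about linear scaling can be illustrated with a small calculation. The sketch below is only an illustration: the sample size, the measured runtime, and the n·log(n) cost model are all assumed values chosen for the example, not figures from the paper.

```python
import math


def linear_estimate(sample_size: int, sample_seconds: float, target_size: int) -> float:
    """Naive estimate: assume runtime grows in direct proportion to data size."""
    return sample_seconds * target_size / sample_size


def nlogn_estimate(sample_size: int, sample_seconds: float, target_size: int) -> float:
    """Estimate under an assumed n*log(n) cost model (e.g. a sort-dominated job)."""
    sample_cost = sample_size * math.log2(sample_size)
    target_cost = target_size * math.log2(target_size)
    return sample_seconds * target_cost / sample_cost


# Hypothetical measurement: 10 seconds on one million records.
SAMPLE_SIZE = 1_000_000
SAMPLE_SECONDS = 10.0
TARGET_SIZE = 1_000_000_000  # assumed production volume

linear = linear_estimate(SAMPLE_SIZE, SAMPLE_SECONDS, TARGET_SIZE)
nlogn = nlogn_estimate(SAMPLE_SIZE, SAMPLE_SECONDS, TARGET_SIZE)

print(f"linear extrapolation:  {linear:,.0f} s")
print(f"n*log(n) extrapolation: {nlogn:,.0f} s")
print(f"underestimate factor:   {nlogn / linear:.2f}x")
```

Even this mild superlinear model diverges from the linear guess, and it still ignores effects such as network shuffles or memory spills that only appear at full scale.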


Toward Strategic Prototyping

Alternatives to prototyping:
◦ Simulations: viable in mature domains
◦ Formal analytical models: used in domains with critical requirements, but expensive
◦ Architecture analysis: relatively inexpensive way to gain early insight into a system's properties and risks
Architecture analysis is viable for big data and helps identify:
◦ Architectural risks: performance, availability, security, integrity
◦ Process risks: requirements, tools, methods
◦ Organizational risks: business goals, awareness, scope, needs


Toward Strategic Prototyping

Strategic prototyping: combining architecture analysis and prototyping
◦ Architecture analysis is less expensive, but it cannot verify all risks
◦ Prototyping is more expensive, but it can detect a larger variety of risks
Types of prototypes:
◦ Throwaway prototypes: rapid prototypes or proofs of concept
◦ Vertical evolutionary prototypes: one or more system components are developed with full functionality in each release
◦ Minimum viable product (MVP): an evolutionary prototype that has only those core features that allow product deployment, minimizing the time spent on an iteration


The Case Studies

The nine case studies conducted at SoftServe covered:
◦ network security and intrusion prevention
◦ online-coupon Web analytics
◦ a cloud-based mobile-app development platform
◦ a telecom e-tailing platform
◦ Web analytics and marketing optimization
◦ healthcare insurance operational intelligence
◦ an ultra-large-scale travel site
◦ an operations intelligence platform for a content delivery network, and
◦ a big-data-as-a-service platform


The Case Studies

The case studies showed that big data system development differs qualitatively from small data system development
Most of the schedule and effort went into:
◦ value discovery (from big data)
◦ architecture analysis
◦ prototyping
◦ orchestration of off-the-shelf commercial and open source components
In contrast, in small data system development, a far higher percentage of schedule and effort goes straight into programming


An Architecture-Centric Approach

The authors integrate RASP (Risk-Based, Architecture-Centric Strategic Prototyping) with BDD (Big Data Design) [2,3]
BDD aims to address big data's inherent complexity
Architecture is the key to dealing with complexity: complex systems that exhibit near-decomposability are easier to describe and understand and have simplified behavior [4]
An architectural approach to technology selection and system integration is critical because big data development focuses primarily on orchestration instead of coding


[Figure not reproduced.] Source: [1]

Applying RASP in the case studies

The authors have developed questions that architects should answer and a decision flowchart for them to follow before they embark on a (relatively costly) prototyping effort
The questions focused on whether:
◦ the technology was new
◦ the team had successfully used it before
◦ significant new requirements existed
◦ objective evidence substantiated product performance claims
The answers to these questions helped the authors determine whether to prototype and how to evaluate the prototyping results (in their case studies)
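The four questions above can be captured as a simple checklist in code. The sketch below is a hypothetical illustration only: the class name, field names, and the decision rule are assumptions made for this example and do not reproduce the authors' flowchart, which is not recoverable from these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnologyAssessment:
    """Answers to the four questions for one candidate technology (names are illustrative)."""
    technology_is_new: bool
    team_has_used_it_successfully: bool
    has_significant_new_requirements: bool
    has_objective_performance_evidence: bool


def recommend_next_step(a: TechnologyAssessment) -> str:
    """Assumed rule of thumb, NOT the paper's flowchart:
    skip prototyping only when every answer indicates low risk."""
    low_risk = (
        not a.technology_is_new
        and a.team_has_used_it_successfully
        and not a.has_significant_new_requirements
        and a.has_objective_performance_evidence
    )
    return "architecture analysis only" if low_risk else "consider prototyping"


# Example: a familiar, well-evidenced technology with stable requirements.
familiar = TechnologyAssessment(
    technology_is_new=False,
    team_has_used_it_successfully=True,
    has_significant_new_requirements=False,
    has_objective_performance_evidence=True,
)
# Example: a new technology with unverified vendor performance claims.
unproven = TechnologyAssessment(
    technology_is_new=True,
    team_has_used_it_successfully=False,
    has_significant_new_requirements=False,
    has_objective_performance_evidence=False,
)

print(recommend_next_step(familiar))
print(recommend_next_step(unproven))
```

A real decision procedure would likely weigh the answers rather than require all of them, and would also say which kind of prototype to build; the all-or-nothing rule here just keeps the example short.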


[Figure not reproduced.] Source: [1]

Applying RASP in the case studies

For two of the case studies, which required quick feedback on the budget and schedule for business decision making, architecture analysis alone sufficed
In the remaining seven case studies, creating vertical evolutionary prototypes and throwaway prototypes took from two to six weeks


Applying RASP – Guidelines (1)

Employ architecture analysis to make early decisions that have a project-wide scope. Then, employ throwaway prototypes to make technology choices and demonstrate feasibility.
Architecture analysis alone is insufficient to prove many important system properties.
Architecture analysis complements vertical evolutionary prototyping. Analysis helps you select candidate technologies; prototyping validates those choices.
An evolutionary prototype can effectively mitigate risk if it is implemented as a skeleton: an infrastructure into which components and technologies can be integrated.
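One way to picture the skeleton idea is a thin pipeline with named slots into which candidate components are plugged and later swapped. The sketch below is a minimal illustration under assumed names (`PipelineSkeleton`, `plug`, the slot names, and the stub stages are all hypothetical); it is not taken from the paper or from any real framework.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterable[Record]]


class PipelineSkeleton:
    """Minimal skeleton: ordered, named slots whose components can be replaced."""

    def __init__(self) -> None:
        self._stages: Dict[str, Stage] = {}
        self._order: List[str] = []

    def plug(self, slot: str, stage: Stage) -> None:
        """Register a component in a slot, or replace the one already there."""
        if slot not in self._stages:
            self._order.append(slot)
        self._stages[slot] = stage

    def run(self, records: Iterable[Record]) -> List[Record]:
        """Pass the records through every slot in registration order."""
        data: Iterable[Record] = records
        for slot in self._order:
            data = self._stages[slot](data)
        return list(data)


def drop_empty(records: Iterable[Record]) -> Iterable[Record]:
    """Stub ingest stage: discard records without a value."""
    return (r for r in records if r.get("value") is not None)


def stub_store(records: Iterable[Record]) -> Iterable[Record]:
    """Placeholder for a storage technology that is still being evaluated."""
    return records


def tagging_store(records: Iterable[Record]) -> Iterable[Record]:
    """Stand-in for a later candidate: marks each record as stored."""
    return ({**r, "stored": True} for r in records)


skeleton = PipelineSkeleton()
skeleton.plug("ingest", drop_empty)
skeleton.plug("store", stub_store)
first_run = skeleton.run([{"value": 1}, {"value": None}])

# Swap the storage component without touching the rest of the skeleton.
skeleton.plug("store", tagging_store)
second_run = skeleton.run([{"value": 1}, {"value": None}])
```

Because the slots stay fixed while their contents change, each technology choice can be validated in place, which is the kind of risk mitigation the guideline describes.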


Applying RASP – Guidelines (2)

Vertical evolutionary prototyping can help answer questions about system-wide properties but might need to be augmented with throwaway prototypes when requirements are volatile.
Throwaway prototypes work best to quickly evaluate a technology.
Whether to create an MVP is more of a business decision than a decision driven simply by technological risk.


Takeaways (from the paper)

An architecture-centric approach (RASP) can help make risk management explicit, systematic, and cost-effective.
The need for strategic prototyping is critical for big data system development, which involves high risk, high complexity, and immature technologies.
An architecture-centric approach provides a more intuitive, efficient, and effective way to identify, mitigate, and continuously monitor risks.
However, no design or development method can guarantee success. The architectural agility created and the architecture analyses performed with RASP rely on the architect's training and discipline.


References

[1] H.-M. Chen, R. Kazman, and S. Haziyev. “Strategic Prototyping for Developing Big Data Systems”. IEEE Software, 33:2 (March/April 2016), pp. 36-43. DOI: http://dx.doi.org/10.1109/MS.2016.36
[2] H.-M. Chen, R. Kazman, and S. Haziyev. “Agile Big Data Analytics Development: An Architecture-Centric Approach”. Proc. HICSS 16, 2016, pp. 5378-5387.
[3] H.-M. Chen et al. “Big Data System Development: An Embedded Case Study with a Global Outsourcing Firm”. Proc. BIGDSE 15, 2015, pp. 44-50.
[4] H. Simon. “The Architecture of Complexity”. Proc. Am. Philosophical Soc., vol. 106, no. 6, 1962, pp. 467–482.


Discussion – strengths and weaknesses

Strengths
◦ The authors propose a well-documented process for designing big data applications
◦ The arguments in favour of the process are strong and well presented
◦ The process is grounded in the authors' experience with several case studies
Weaknesses
◦ The authors frequently compared their method with small data development processes, but they did not compare it with other big data development approaches
◦ The effectiveness of the suggested process was not empirically validated


Discussion – related papers

Vanauer, Böhle, and Hellingrath. “Guiding the Introduction of Big Data in Organizations: A Methodology with Business- and Data-Driven Ideation and Enterprise Architecture Management-Based Implementation”. Proc. HICSS 2015, IEEE, pp. 908-917.
◦ This paper also discusses a methodology for analyzing and designing big data systems
◦ However, it focuses more on analyzing business value or data attributes than on architectural analysis and prototyping
Chen, Wu, and Wang. “The Evolvement of Big Data Systems: From the Perspective of an Information Security Application”. Big Data Research 2 (2015), Elsevier, pp. 65-73.
◦ This paper discusses how requirements and technology evolve over time in big data systems
◦ However, it focuses more on technological issues than on the process


Discussion – future work

The suggested processes and methods must be further validated by:
◦ Tests outside the research institution and company that developed them
◦ Comparison with other similar approaches (or even with development projects that don't use a standard process) to assess the effectiveness of the suggested process
The work could also be expanded by providing more information on how the process can be adapted to different project types or requirements.


Discussion

What are the differences between big data and small data system development?
Do you agree that throwaway and vertical evolutionary prototypes are better for big data development than horizontal evolutionary prototypes?
The authors argue that selecting and orchestrating technology is more important than coding in big data projects; do you agree?
What could be different approaches for designing and developing big data applications?
