Big Data Fundamentals: Concepts, Drivers & Techniques

Prentice Hall
Thomas Erl / Wajid Khattak / Paul Buhler  
January 2016

Product detail

Product: Big Data Fundamentals: Concepts, Drivers & Techniques
Price (CHF): 44.50
Availability: approx. 7-9 days


Big Data Fundamentals offers a comprehensive, easy-to-understand, and up-to-date introduction to Big Data for business professionals and technologists. Leading enterprise technology author Thomas Erl introduces Big Data concepts, theory, terminology, technologies, and key analysis/analytics techniques, all logically organized, presented in plain English, and supported by easy-to-understand diagrams and case study examples.


  • Presents vendor-neutral coverage of concepts, theory, terminology, technologies, key analysis/analytics techniques, and more
  • Illuminates fundamental and advanced principles with hundreds of images, diagrams, and real case studies
  • Clarifies the linkages between Big Data and existing enterprise technologies, analytics capabilities, and business intelligence systems
  • Clear, consistent, logically organized, and up-to-date
  • The newest title in The Prentice Hall Service Technology Series from Thomas Erl

Table of Contents

Acknowledgments   xvii
Reader Services   xviii
Chapter 1: Understanding Big Data   3
Concepts and Terminology   5
Datasets   5
Data Analysis   6
Data Analytics   6
Descriptive Analytics   8
Diagnostic Analytics   9
Predictive Analytics   10
Prescriptive Analytics   11
Business Intelligence (BI)   12
Key Performance Indicators (KPI)   12
Big Data Characteristics   13
Volume   14
Velocity   14
Variety   15
Veracity   16
Value   16
Different Types of Data   17
Structured Data   18
Unstructured Data   19
Semi-structured Data   19
Metadata   20
Case Study Background   20
History   20
Technical Infrastructure and Automation Environment   21
Business Goals and Obstacles   22
Case Study Example   24
Identifying Data Characteristics   26
Volume   26
Velocity   26
Variety   26
Veracity   26
Value   27
Identifying Types of Data   27
Chapter 2: Business Motivations and Drivers for Big Data Adoption   29
Marketplace Dynamics   30
Business Architecture   33
Business Process Management   36
Information and Communications Technology   37
Data Analytics and Data Science   37
Digitization   38
Affordable Technology and Commodity Hardware   38
Social Media   39
Hyper-Connected Communities and Devices   40
Cloud Computing   40
Internet of Everything (IoE)   42
Case Study Example   43
Chapter 3: Big Data Adoption and Planning Considerations   47
Organization Prerequisites   49
Data Procurement   49
Privacy   49
Security   50
Provenance   51
Limited Realtime Support   52
Distinct Performance Challenges   53
Distinct Governance Requirements   53
Distinct Methodology   53
Clouds   54
Big Data Analytics Lifecycle   55
Business Case Evaluation   56
Data Identification   57
Data Acquisition and Filtering   58
Data Extraction   60
Data Validation and Cleansing   62
Data Aggregation and Representation   64
Data Analysis   66
Data Visualization   68
Utilization of Analysis Results   69
Case Study Example   71
Big Data Analytics Lifecycle   73
Business Case Evaluation   73
Data Identification   74
Data Acquisition and Filtering   74
Data Extraction   74
Data Validation and Cleansing   75
Data Aggregation and Representation   75
Data Analysis   75
Data Visualization   76
Utilization of Analysis Results   76
Chapter 4: Enterprise Technologies and Big Data Business Intelligence   77
Online Transaction Processing (OLTP)   78
Online Analytical Processing (OLAP)   79
Extract Transform Load (ETL)   79
Data Warehouses   80
Data Marts   81
Traditional BI   82
Ad-hoc Reports   82
Dashboards   82
Big Data BI   84
Traditional Data Visualization   84
Data Visualization for Big Data   85
Case Study Example   86
Enterprise Technology   86
Big Data Business Intelligence   87
Chapter 5: Big Data Storage Concepts   91
Clusters   93
File Systems and Distributed File Systems   93
NoSQL   94
Sharding   95
Replication   97
Master-Slave   98
Peer-to-Peer   100
Sharding and Replication   103
Combining Sharding and Master-Slave Replication   104
Combining Sharding and Peer-to-Peer Replication   105
CAP Theorem   106
ACID   108
BASE   113
Case Study Example   117
Chapter 6: Big Data Processing Concepts   119
Parallel Data Processing   120
Distributed Data Processing   121
Hadoop   122
Processing Workloads   122
Batch   123
Transactional   123
Cluster   124
Processing in Batch Mode   125
Batch Processing with MapReduce   125
Map and Reduce Tasks   126
Map   127
Combine   127
Partition   129
Shuffle and Sort   130
Reduce   131
A Simple MapReduce Example   133
Understanding MapReduce Algorithms   134
Processing in Realtime Mode   137
Speed Consistency Volume (SCV)   137
Event Stream Processing   140
Complex Event Processing   141
Realtime Big Data Processing and SCV   141
Realtime Big Data Processing and MapReduce   142
Case Study Example   143
Processing Workloads   143
Processing in Batch Mode   143
Processing in Realtime   144
Chapter 7: Big Data Storage Technology   145
On-Disk Storage Devices   147
Distributed File Systems   147
RDBMS Databases   149
NoSQL Databases   152
Characteristics   152
Rationale   153
Types   154
Key-Value   156
Document   157
Column-Family   159
Graph   160
NewSQL Databases   163
In-Memory Storage Devices   163
In-Memory Data Grids   166
Read-through   170
Write-through   170
Write-behind   172
Refresh-ahead   172
In-Memory Databases   175
Case Study Example   179
Chapter 8: Big Data Analysis Techniques   181
Quantitative Analysis   183
Qualitative Analysis   184
Data Mining   184
Statistical Analysis   184
A/B Testing   185
Correlation   186
Regression   188
Machine Learning   190
Classification (Supervised Machine Learning)   190
Clustering (Unsupervised Machine Learning)   191
Outlier Detection   192
Filtering   193
Semantic Analysis   195
Natural Language Processing   195
Text Analytics   196
Sentiment Analysis   197
Visual Analysis   198
Heat Maps   198
Time Series Plots   200
Network Graphs   201
Spatial Data Mapping   202
Case Study Example   204
Correlation   204
Regression   204
Time Series Plot   205
Clustering   205
Classification   205
Appendix A: Case Study Conclusion   207
About the Authors   211
Thomas Erl   211
Wajid Khattak   211
Paul Buhler   212
Index   213


Thomas Erl is a top-selling IT author, founder of Arcitura Education, and series editor of the Prentice Hall Service Technology Series from Thomas Erl. With more than 200,000 copies in print worldwide, his books have become international bestsellers and have been formally endorsed by senior members of major IT organizations, including IBM, Microsoft, Oracle, Intel, Accenture, IEEE, HL7, MITRE, SAP, Cisco, and HP. As CEO of Arcitura Education Inc., Thomas has led the development of curricula for the internationally recognized Big Data Science Certified Professional (BDSCP), Cloud Certified Professional (CCP), and SOA Certified Professional (SOACP) accreditation programs, which have established a series of formal, vendor-neutral industry certifications obtained by thousands of IT professionals around the world. Thomas has toured more than 20 countries as a speaker and instructor. More than 100 of his articles and interviews have appeared in publications including The Wall Street Journal and CIO Magazine.

Wajid Khattak is a Big Data researcher and trainer at Arcitura Education Inc. His areas of interest include Big Data engineering and architecture, data science, machine learning, analytics and SOA. He has extensive .NET software development experience in the domains of business intelligence reporting solutions and GIS.

Wajid completed his MSc in Software Engineering and Security with distinction from Birmingham City University in 2008. Prior to that, in 2003, he earned his BSc (Hons) degree in Software Engineering from Birmingham City University with first-class recognition. He holds MCAD & MCTS (Microsoft), SOA Architect, Big Data Scientist, Big Data Engineer and Big Data Consultant (Arcitura) certifications.

Dr. Paul Buhler is a seasoned professional who has worked in commercial, government and academic environments. He is a respected researcher, practitioner and educator of service-oriented computing concepts, technologies and implementation methodologies. His work in XaaS naturally extends to cloud, Big Data and IoE areas. Dr. Buhler’s more recent work has been focused on closing the gap between business strategy and process execution by leveraging responsive design principles and goal-based execution.

As Chief Scientist at Modus21, Dr. Buhler is responsible for aligning corporate strategy with emerging trends in business architecture and process execution frameworks. He also holds an Affiliate Professorship at the College of Charleston, where he teaches both graduate and undergraduate computer science courses. Dr. Buhler earned his Ph.D. in Computer Engineering at the University of South Carolina. He also holds an MS degree in Computer Science from Johns Hopkins University and a BS in Computer Science from The Citadel.