Hadoop 2 Quick-Start Guide

Series: Addison-Wesley
Author: Douglas Eadline
Publisher: Addison-Wesley
Cover: Softcover
Edition: 1
Language: English
Total pages: 304
Pub. date: October 2015
ISBN13: 9780134049946
ISBN: 0134049942
Product detail

Product: 9780134049946, Hadoop 2 Quick-Start Guide
Price (CHF): 39.30

Description

An easy, accessible guide to Big Data technology, this book covers all the basics students need to install and use Hadoop 2 on both personal computers and servers, and to navigate the entire Apache Hadoop ecosystem. The guide demystifies Hadoop 2: it explains the problems Hadoop solves, shows how it relates to Big Data, and demonstrates how both administrators and users work with it. From its Getting Started checklist/flowchart to its roadmap of additional resources, Hadoop 2 Quick-Start Guide is the perfect Hadoop 2 starting point for students to master Big Data.

Features

  • Helps students get Hadoop up and running fast with clear, well-tested beginner-level instructions and examples
  • Includes hands-on coverage: HDFS, running programs, benchmarking, MapReduce, higher-level tools, YARN, administration, and more
  • Demystifies Hadoop 2

Table of Contents

Foreword
Preface
Acknowledgments
About the Author

Chapter 1: Background and Concepts
  • Defining Apache Hadoop
  • A Brief History of Apache Hadoop
  • Defining Big Data
  • Hadoop as a Data Lake
  • Using Hadoop: Administrator, User, or Both
  • First There Was MapReduce
  • Moving Beyond MapReduce with Hadoop V2
  • The Apache Hadoop Project Ecosystem
  • Summary and Additional Resources

Chapter 2: Installation Recipes
  • Core Hadoop Services
  • Planning Your Resources
  • Installing on a Desktop or Laptop
  • Installing Hadoop with Ambari
  • Installing Hadoop in the Cloud Using Apache Whirr
  • Summary and Additional Resources

Chapter 3: Hadoop Distributed File System Basics
  • Hadoop Distributed File System Design Features
  • HDFS Components
  • HDFS User Commands
  • HDFS Web GUI
  • Using HDFS in Programs
  • Summary and Additional Resources

Chapter 4: Running Example Programs and Benchmarks
  • Running MapReduce Examples
  • Running Basic Hadoop Benchmarks
  • Summary and Additional Resources

Chapter 5: Hadoop MapReduce Framework
  • The MapReduce Model
  • MapReduce Parallel Data Flow
  • Fault Tolerance and Speculative Execution
  • Summary and Additional Resources

Chapter 6: MapReduce Programming
  • Compiling and Running the Hadoop WordCount Example
  • Using the Streaming Interface
  • Using the Pipes Interface
  • Compiling and Running the Hadoop Grep Chaining Example
  • Debugging MapReduce
  • Summary and Additional Resources

Chapter 7: Essential Hadoop Tools
  • Using Apache Pig
  • Using Apache Hive
  • Using Apache Sqoop to Acquire Relational Data
  • Using Apache Flume to Acquire Data Streams
  • Manage Hadoop Workflows with Apache Oozie
  • Using Apache HBase
  • Summary and Additional Resources

Chapter 8: Hadoop YARN Applications
  • YARN Distributed-Shell
  • Using the YARN Distributed-Shell
  • Structure of YARN Applications
  • YARN Application Frameworks
  • Summary and Additional Resources

Chapter 9: Managing Hadoop with Apache Ambari
  • Quick Tour of Apache Ambari
  • Managing Hadoop Services
  • Changing Hadoop Properties
  • Summary and Additional Resources

Chapter 10: Basic Hadoop Administration Procedures
  • Basic Hadoop YARN Administration
  • Basic HDFS Administration
  • Capacity Scheduler Background
  • Hadoop Version 2 MapReduce Compatibility
  • Summary and Additional Resources

Appendix A: Book Webpage and Code Download

Appendix B: Getting Started Flowchart and Troubleshooting Guide
  • Getting Started Flowchart
  • General Hadoop Troubleshooting Guide

Appendix C: Summary of Apache Hadoop Resources by Topic
  • General Hadoop Information
  • Hadoop Installation Recipes
  • HDFS
  • Examples
  • MapReduce
  • MapReduce Programming
  • Essential Tools
  • YARN Application Frameworks
  • Ambari Administration
  • Basic Hadoop Administration

Appendix D: Installing the Hue Hadoop GUI
  • Hue Installation
  • Starting Hue
  • Hue User Interface

Appendix E: Installing Apache Spark
  • Spark Installation on a Cluster
  • Starting Spark across the Cluster
  • Installing and Starting Spark on the Pseudo-distributed Single-Node Installation
  • Run Spark Examples

Index

Author

Douglas Eadline began his career as a practitioner and a chronicler of the Linux cluster HPC revolution and now documents Big Data analytics. Starting with the first Beowulf Cluster how-to document, Doug has written hundreds of articles, white papers, and instructional documents covering virtually all aspects of High Performance Computing (HPC). Prior to starting and editing the popular ClusterMonkey.net website in 2005, he served as editor-in-chief for ClusterWorld Magazine and was senior HPC editor for Linux Magazine. Currently, he is a writer and consultant to the HPC/Data Analytics industry and leader of the Limulus Personal Cluster Project (limulus.basement-supercomputing.com). He is the author of Hadoop Fundamentals LiveLessons, Second Edition (2015) and Apache Hadoop YARN LiveLessons (2014), and coauthor of Apache Hadoop™ YARN (2014), all from Addison-Wesley.