Updated: 2021-07-09 20:03:08
Cover
Copyright Information
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Preface
Chapter 1. Getting Started with Hadoop 2.X
Introduction
Installing a single-node Hadoop cluster
Installing a multi-node Hadoop cluster
Adding new nodes to existing Hadoop clusters
Executing the balancer command for uniform data distribution
Entering and exiting from the safe mode in a Hadoop cluster
Decommissioning DataNodes
Performing benchmarking on a Hadoop cluster
Chapter 2. Exploring HDFS
Loading data from a local machine to HDFS
Exporting HDFS data to a local machine
Changing the replication factor of an existing file in HDFS
Setting the HDFS block size for all the files in a cluster
Setting the HDFS block size for a specific file in a cluster
Enabling transparent encryption for HDFS
Importing data from another Hadoop cluster
Recycling deleted data from trash to HDFS
Saving compressed data in HDFS
Chapter 3. Mastering MapReduce Programs
Writing a MapReduce program in Java to analyze web log data
Executing the MapReduce program in a Hadoop cluster
Adding support for a new writable data type in Hadoop
Implementing a user-defined counter in a MapReduce program
MapReduce program to find the top X
MapReduce program to find distinct values
MapReduce program to partition data using a custom partitioner
Writing MapReduce results to multiple output files
Performing reduce-side joins using MapReduce
Unit testing the MapReduce code using MRUnit
Chapter 4. Data Analysis Using Hive, Pig, and HBase
Storing and processing Hive data in the Sequence file format
Storing and processing Hive data in the RC file format
Storing and processing Hive data in the ORC file format
Storing and processing Hive data in the Parquet file format
Performing FILTER BY queries in Pig
Performing GROUP BY queries in Pig
Performing ORDER BY queries in Pig
Performing JOINs in Pig
Writing a user-defined function in Pig
Analyzing web log data using Pig
Performing HBase operations in the CLI
Performing HBase operations in Java
Executing a MapReduce program with an HBase table
Chapter 5. Advanced Data Analysis Using Hive
Processing JSON data in Hive using JSON SerDe
Processing XML data in Hive using XML SerDe
Processing Hive data in the Avro format
Writing a user-defined function in Hive
Performing table joins in Hive
Executing map side joins in Hive
Performing context n-gram analysis in Hive
Call Data Record Analytics using Hive
Twitter sentiment analysis using Hive
Implementing Change Data Capture using Hive
Multiple table inserting using Hive
Chapter 6. Data Import/Export Using Sqoop and Flume
Importing data from RDBMS to HDFS using Sqoop
Exporting data from HDFS to RDBMS
Using the query operator in Sqoop import
Importing data using Sqoop in compressed format
Performing an atomic export using Sqoop
Importing data into Hive tables using Sqoop
Importing data into HDFS from mainframes
Incremental import using Sqoop
Creating and executing a Sqoop job
Importing data from RDBMS to HBase using Sqoop
Importing Twitter data into HDFS using Flume
Importing data from Kafka into HDFS using Flume
Importing web log data into HDFS using Flume
Chapter 7. Automation of Hadoop Tasks Using Oozie
Implementing a Sqoop action job using Oozie
Implementing a MapReduce action job using Oozie
Implementing a Java action job using Oozie
Implementing a Hive action job using Oozie
Implementing a Pig action job using Oozie
Implementing an e-mail action job using Oozie
Executing parallel jobs using Oozie (fork)
Scheduling a job in Oozie
Chapter 8. Machine Learning and Predictive Analytics Using Mahout and R
Setting up the Mahout development environment
Creating an item-based recommendation engine using Mahout
Creating a user-based recommendation engine using Mahout
Predictive analytics on bank data using Mahout
Text data clustering using K-Means in Mahout