Updated: 2021-08-20 10:27:33
Cover
Copyright page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Chapter 1. Installing Spark and Setting Up Your Cluster
Directory organization and convention
Installing the prebuilt distribution
Building Spark from source
Spark topology
A single machine
Running Spark on EC2
Deploying Spark with Chef (Opscode)
Deploying Spark on Mesos
Spark on YARN
Spark standalone mode
References
Summary
Chapter 2. Using the Spark Shell
The Spark shell
Loading a simple text file
Interactively loading data from S3
Chapter 3. Building and Running a Spark Application
Building Spark applications
Data wrangling with iPython
Developing Spark with Eclipse
Developing Spark with other IDEs
Building your Spark job with Maven
Building your Spark job with something else
Chapter 4. Creating a SparkSession Object
SparkSession versus SparkContext
Building a SparkSession object
SparkContext - metadata
Shared Java and Scala APIs
Python
iPython
Reference
Chapter 5. Loading and Saving Data in Spark
Spark abstractions
Data modalities
Data modalities and Datasets/DataFrames/RDDs
Loading data into an RDD
Saving your data
Chapter 6. Manipulating Your RDD
Manipulating your RDD in Scala and Java
Manipulating your RDD in Python
Chapter 7. Spark 2.0 Concepts
Code and Datasets for the rest of the book
The data scientist and Spark features
Spark v2.0 and beyond
Apache Spark - evolution
Apache Spark - the full stack
The art of a big data store - Parquet
Chapter 8. Spark SQL
The Spark SQL architecture
Spark SQL how-to in a nutshell
Spark SQL programming
Chapter 9. Foundations of Datasets/DataFrames – The Proverbial Workhorse for Data Scientists
Datasets - a quick introduction
Dataset APIs - an overview
Dataset interfaces and functions
Chapter 10. Spark with Big Data
Parquet - an efficient and interoperable big data format
HBase
Chapter 11. Machine Learning with Spark ML Pipelines
Spark's machine learning algorithm table
Spark machine learning APIs - ML pipelines and MLlib
ML pipelines
Spark ML examples
The API organization
Basic statistics
Linear regression
Classification
Clustering
Recommendation