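Synopsis · · · · · ·
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by some of the creators of the open-source cluster-computing framework.
With an emphasis on the improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break Spark topics into distinct sections, each with its own goals.
You will explore the basic operations and common functions of Spark's structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and will explore machine learning techniques and deployment scenarios for MLlib, Spark's scalable machine learning library.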
Table of Contents · · · · · ·
Preface
Part I. Gentle Overview of Big Data and Spark
1. What Is Apache Spark?
Apache Spark's Philosophy
Context: The Big Data Problem
History of Spark
The Present and Future of Spark
Running Spark
Downloading Spark Locally
Launching Spark's Interactive Consoles
Running Spark in the Cloud
Data Used in This Book
2. A Gentle Introduction to Spark
Spark's Basic Architecture
Spark Applications
Spark's Language APIs
Spark's APIs
Starting Spark
The SparkSession
DataFrames
Partitions
Transformations
Lazy Evaluation
Actions
Spark UI
An End-to-End Example
DataFrames and SQL
Conclusion
3. A Tour of Spark's Toolset
Running Production Applications
Datasets: Type-Safe Structured APIs
Structured Streaming
Machine Learning and Advanced Analytics
Lower-Level APIs
SparkR
Spark's Ecosystem and Packages
Conclusion
Part II. Structured APIs: DataFrames, SQL, and Datasets
4. Structured API Overview
DataFrames and Datasets
Schemas
Overview of Structured Spark Types
DataFrames Versus Datasets
Columns
Rows
Spark Types
Overview of Structured API Execution
Logical Planning
Physical Planning
Execution
Conclusion
5. Basic Structured Operations
Schemas
Columns and Expressions
Columns
Expressions
Records and Rows
Creating Rows
DataFrame Transformations
Creating DataFrames
select and selectExpr
Converting to Spark Types (Literals)
Adding Columns
……
6. Working with Different Types of Data
7. Aggregations
8. Joins
9. Data Sources
10. Spark SQL
11. Datasets
Part III. Low-Level APIs
12. Resilient Distributed Datasets (RDDs)
13. Advanced RDDs
14. Distributed Shared Variables
Part IV. Production Applications
15. How Spark Runs on a Cluster
16. Developing Spark Applications
17. Deploying Spark
18. Monitoring and Debugging
19. Performance Tuning
Part V. Streaming
20. Stream Processing Fundamentals
21. Structured Streaming Basics
22. Event-Time and Stateful Processing
23. Structured Streaming in Production
Part VI. Advanced Analytics and Machine Learning
24. Advanced Analytics and Machine Learning Overview
25. Preprocessing and Feature Engineering
26. Classification
27. Regression
28. Recommendation
29. Unsupervised Learning
30. Graph Analytics
31. Deep Learning
Part VII. Ecosystem
32. Language Specifics: Python (PySpark) and R (SparkR and sparklyr)
33. Ecosystem and Community
Index
Reviews of Spark: The Definitive Guide (reprint edition) · · · · · · (3 total)
Lives up to the name "Spark: The Definitive Guide"
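Pros: First, the book was officially published in February 2018. Like Hadoop: The Definitive Guide, it does not go very deep on performance tuning, but it does touch on nearly everything: data skew, null handling, the external shuffle service, dynamic allocation, and even how to debug JVM memory from the logs of a Spark program, as well as DataFrame vs DataSet vs R...
Excellent content, passable translation
Still reading it. The first edition came out in April and I bought it that same month, because it is based on Spark 2.0 and such books are still rare in China. My impression is that the content is excellent and the translation is passable. I have read the first seven chapters so far: the early ones are fine, but the translation of Chapter 7 is mediocre and has some obvious errors. I will see how the rest holds up. As for the content, for a beginner-level user like me, it is very...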
Other editions of this book · · · · · · (3 total)
- China Electric Power Press (2020), rated 7.7, read by 35 people
- O'Reilly Media (2017), rated 8.4, read by 78 people
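0 helpful z00c 2019-07-25 17:33:55
Essential for getting started. The preface alone surveys how and why Spark developed, making clear the problems Spark mainly sets out to address. It cleared up quite a few conceptual confusions for me.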