Author: [UK] Nick Pentreath
Publisher: Southeast University Press
Original title: Machine Learning With Spark
Publication date: 2016-1-1
Pages: 319
Price: CNY 68.00
Binding: Paperback
Series: Packt Publishing Reprint Edition Series
ISBN: 9787564160913
Douban rating
No ratings yet
Synopsis · · · · · ·
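Apache Spark is a newly developed distributed framework, optimized in particular for low-latency tasks and in-memory data storage. It combines speed, scalability, in-memory processing, and fault tolerance, making it one of the very few frameworks suited to parallel computing, while also being very easy to program, with a flexible, expressive, and powerful API design.
Machine Learning with Spark (reprint edition, in English) guides you through the basics of the Spark API for loading and processing data, and shows how to prepare suitable input data for various machine learning models. Detailed examples and real-world case studies help you learn common machine learning models, including recommender systems, classification, regression, clustering, and dimensionality reduction. You will also see advanced topics such as large-scale text processing, approaches to online machine learning, and model evaluation using Spark Streaming.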
About the author · · · · · ·
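Nick Pentreath is a co-founder of Graphflow, a big data and machine learning company focused on user-centric recommender systems and customer service intelligence. Nick has a background in financial markets, machine learning, and software development. He worked at Goldman Sachs, then as a research scientist at the online advertising startup Cognitive Match Limited in London, and later led the data science and analytics team at Mxit, Africa's largest social network. Nick is a member of the Apache Spark Project Management Committee.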
Table of contents · · · · · ·
Preface
Chapter 1: Getting Up and Running with Spark
Installing and setting up Spark locally
Spark clusters
The Spark programming model
SparkContext and SparkConf
The Spark shell
Resilient Distributed Datasets
Creating RDDs
Spark operations
Caching RDDs
Broadcast variables and accumulators
The first step to a Spark program in Scala
The first step to a Spark program in Java
The first step to a Spark program in Python
Getting Spark running on Amazon EC2
Launching an EC2 Spark cluster
Summary
Chapter 2: Designing a Machine Learning System
Introducing MovieStream
Business use cases for a machine learning system
Personalization
Targeted marketing and customer segmentation
Predictive modeling and analytics
Types of machine learning models
The components of a data-driven machine learning system
Data ingestion and storage
Data cleansing and transformation
Model training and testing loop
Model deployment and integration
Model monitoring and feedback
Batch versus real time
An architecture for a machine learning system
Practical exercise
Summary
Chapter 3: Obtaining, Processing, and Preparing Data with Spark
Accessing publicly available datasets
The MovieLens 100k dataset
Exploring and visualizing your data
Exploring the user dataset
Exploring the movie dataset
Exploring the rating dataset
Processing and transforming your data
Filling in bad or missing data
Extracting useful features from your data
Numerical features
Categorical features
Derived features
Transforming timestamps into categorical features
Text features
Simple text feature extraction
Normalizing features
Using MLlib for feature normalization
Using packages for feature extraction
Summary
Chapter 4: Building a Recommendation Engine with Spark
Types of recommendation models
Content-based filtering
Collaborative filtering
Matrix factorization
Extracting the right features from your data
Extracting features from the MovieLens 100k dataset
Training the recommendation model
Training a model on the MovieLens 100k dataset
Training a model using implicit feedback data
Using the recommendation model
User recommendations
Generating movie recommendations from the MovieLens 100k dataset
Item recommendations
Generating similar movies for the MovieLens 100k dataset
Evaluating the performance of recommendation models
Mean Squared Error
Mean average precision at K
Using MLlib's built-in evaluation functions
RMSE and MSE
MAP
Summary
Chapter 5: Building a Classification Model with Spark
Types of classification models
Linear models
Logistic regression
Linear support vector machines
The naïve Bayes model
Decision trees
Extracting the right features from your data
Extracting features from the Kaggle/StumbleUpon evergreen classification dataset
Training classification models
Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset
Using classification models
Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset
Evaluating the performance of classification models
Accuracy and prediction error
Precision and recall
ROC curve and AUC
Improving model performance and tuning parameters
Feature standardization
Additional features
Using the correct form of data
Tuning model parameters
Linear models
Decision trees
The naïve Bayes model
Cross-validation
Summary
Chapter 6: Building a Regression Model with Spark
Types of regression models
Least squares regression
Decision trees for regression
Extracting the right features from your data
Extracting features from the bike sharing dataset
Creating feature vectors for the linear model
Creating feature vectors for the decision tree
Training and using regression models
Training a regression model on the bike sharing dataset
Evaluating the performance of regression models
Mean Squared Error and Root Mean Squared Error
Mean Absolute Error
Root Mean Squared Log Error
The R-squared coefficient
Computing performance metrics on the bike sharing dataset
Linear model
Decision tree
Improving model performance and tuning parameters
Transforming the target variable
Impact of training on log-transformed targets
Tuning model parameters
Creating training and testing sets to evaluate parameters
The impact of parameter settings for linear models
The impact of parameter settings for the decision tree
Summary
Chapter 7: Building a Clustering Model with Spark
Types of clustering models
K-means clustering
Initialization methods
Variants
Mixture models
Hierarchical clustering
Extracting the right features from your data
Extracting features from the MovieLens dataset
Extracting movie genre labels
Training the recommendation model
Normalization
Training a clustering model
Training a clustering model on the MovieLens dataset
Making predictions using a clustering model
Interpreting cluster predictions on the MovieLens dataset
Interpreting the movie clusters
Evaluating the performance of clustering models
Internal evaluation metrics
External evaluation metrics
Computing performance metrics on the MovieLens dataset
Tuning parameters for clustering models
Selecting K through cross-validation
Summary
Chapter 8: Dimensionality Reduction with Spark
Types of dimensionality reduction
Principal Components Analysis
Singular Value Decomposition
Relationship with matrix factorization
Clustering as dimensionality reduction
Extracting the right features from your data
Extracting features from the LFW dataset
Exploring the face data
Visualizing the face data
Extracting facial images as vectors
Normalization
Training a dimensionality reduction model
Running PCA on the LFW dataset
Visualizing the Eigenfaces
Interpreting the Eigenfaces
Using a dimensionality reduction model
Projecting data using PCA on the LFW dataset
The relationship between PCA and SVD
Evaluating dimensionality reduction models
Evaluating k for SVD on the LFW dataset
Summary
Chapter 9: Advanced Text Processing with Spark
What's so special about text data?
Extracting the right features from your data
Term weighting schemes
Feature hashing
Extracting the TF-IDF features from the 20 Newsgroups dataset
Exploring the 20 Newsgroups data
Applying basic tokenization
Improving our tokenization
Removing stop words
Excluding terms based on frequency
A note about stemming
Training a TF-IDF model
Analyzing the TF-IDF weightings
Using a TF-IDF model
Document similarity with the 20 Newsgroups dataset and TF-IDF features
Training a text classifier on the 20 Newsgroups dataset using TF-IDF
Evaluating the impact of text processing
Comparing raw features with processed TF-IDF features on the 20 Newsgroups dataset
Word2Vec models
Word2Vec on the 20 Newsgroups dataset
Summary
Chapter 10: Real-time Machine Learning with Spark Streaming
Online learning
Stream processing
An introduction to Spark Streaming
Input sources
Transformations
Actions
Window operators
Caching and fault tolerance with Spark Streaming
Creating a Spark Streaming application
The producer application
Creating a basic streaming application
Streaming analytics
Stateful streaming
Online learning with Spark Streaming
Streaming regression
A simple streaming regression program
Creating a streaming data producer
Creating a streaming regression model
Streaming K-means
Online model evaluation
Comparing model performance with Spark Streaming
Summary
Index
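Series information · · · · · ·
Packt Publishing Reprint Edition Series (4 volumes in total). Other titles in this series: "Building Machine Learning Systems with Python, 2nd Edition (reprint edition)", "Mastering the R Language (reprint edition)", "Learning Highcharts 4 (reprint edition)".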
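Reviews of Machine Learning with Spark (reprint edition, in English) · · · · · · ( 5 in total )
Spark 2.x Machine Learning in Practice (Algorithms: Building Six Major Algorithm Models on Kaggle Competition Datasets)
Spark 2.x Machine Learning in Practice (Algorithms: Building Six Major Algorithm Models on Kaggle Competition Datasets) Baidu Netdisk download link: https://pan.baidu.com/s/1UYHu1gqhqDfHacNAKH7Yvg extraction code: kzxw Backup link (Tencent Weiyun): https://share.weiyun.com/5fALwJu password: ih4u5s This course mainly covers machine learning based on Spark 2.x...
Reading "Scala Machine Learning"
First, thanks again to the Big Data public platform for giving me a chance to learn, so that I had the good fortune to read the well-known IT book "Scala Machine Learning". I wish the Big Data public account ever greater success. My graduate studies are in intelligent computing and its applications, with a focus on machine learning and artificial intelligence; although I know my abilities are limited, I have not given up my passion for this field. Getting back to the point, here are some brief thoughts on this book...
Spark Machine Learning Videos
Spark Machine Learning in Practice Made Simple (User Behavior Analysis) Course viewing link: http://www.xuetuwuyou.com/course/144 The course comes from Xuetu Wuyou: http://www.xuetuwuyou.com 1. Course objectives: become proficient in the various operations of SparkSQL and gain a deep understanding of Spark's internal implementation; gain a deep understanding of building and running the various SparkML machine learning algorithm models...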
Other editions of this book · · · · · · ( 3 in total )
- Posts & Telecom Press (2015), rated 7.7, read by 91 people
- Packt Publishing - ebooks Account (2014), no rating yet, read by 10 people