Publisher: O'Reilly Media
Subtitle: Straight Talk from the Frontline
Publication date: 2013-10-30
Pages: 352
Price: USD 44.99
Binding: Paperback
ISBN: 9781449358655
Synopsis
Now that answering complex and compelling questions with data can make the difference in an election or a business model, data science is an attractive discipline. But how can you learn this wide-ranging, interdisciplinary field? With this book, you’ll get material from Columbia University’s "Introduction to Data Science" class in an easy-to-follow format.
Each chapter-long lecture features a guest data scientist from a prominent company such as Google, Microsoft, or eBay teaching new algorithms, methods, or models by sharing case studies and actual code they use. You’ll learn what’s involved in the lives of data scientists and be able to use the techniques they present.
Guest lectures focus on topics such as:
Machine learning and data mining algorithms
Statistical models and methods
Prediction vs. description
Exploratory data analysis
Communication and visualization
Data processing
Big data
Programming
Ethics
Asking good questions
If you’re familiar with linear algebra, probability and statistics, and have some programming experience, this book will get you started with data science.
Doing Data Science is a collaboration between course instructor Rachel Schutt (also employed by Google) and data science consultant Cathy O’Neil (a former quantitative analyst at D.E. Shaw), who attended and blogged about the course.
About the Authors
Cathy O’Neil earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York startup scene, writes a blog at mathbabe.org, and is involved with Occupy Wall Street.
Rachel Schutt is a Senior Research Scientist at Johnson Research Labs and was most recently a Senior Statistician at Google Research in the New York office. She is also an adjunct assistant professor in the Department of Statistics at Columbia University, where she taught Introduction to Data Science. She earned a PhD in statistics from Columbia University, and master's degrees in mathematics and operations research from the Courant Institute and Stanford University, respectively. Her statistical research interests include modeling and analyzing social networks, epidemiology, hierarchical modeling, and Bayesian statistics. Her education-related research interests include curriculum design.
Rachel enjoys designing and creating complex, thought-provoking situations for other people. She won the Howard Levene Outstanding Teaching Award at Columbia, and also taught probability and statistics at Cooper Union and remedial math as a high school teacher in San Jose, CA. She was a mathematics curriculum expert for the Princeton Review, and won a game design award for best family game at the Come Out and Play Festival in New York.
Reviews of Doing Data Science (4 in total)
Doing Data Science
Reading Notes
YellowOrgans (Were the rites ever made for the likes of us?)

Being humanist in the context of data science means recognizing the role your own humanity plays in building models and algorithms, thinking about qualities you have as a human that a computer does not have (which includes the ability to make ethical decisions), and thinking about the humans whose lives you are impacting when you unleash a model onto the world.
2013-11-23 21:55

panco (What?)
Linear regression:
1. Concepts
2. In R: lm(y ~ x)
3. Adding errors
4. Evaluation metrics: R-squared, p-value, cross-validation
5. Assumptions:
   5.1 Linearity
   5.2 Errors are normally distributed with mean 0
   5.3 Errors are independent
   5.4 Errors have constant variance
   5.5 The predictors are the right ones
kNN:
1. Concepts
2. Process
3. Determining similarity: cosine similarity, Jaccard similarity, Mahalanobis distance, Hamming distance, Manhattan distance
4. Evaluation metrics: sensitivity, specificity, precision, and accuracy
5. In R: knn(train, test, cl, k)
6. Assumptions:
   6.1 The data lives in a feature space where "distance" makes sense
   6.2 The training data is already labeled with two or more classes
k-means:
1. Process
2. Issues: choosing k, the solution may not exist, the answer may not make sense
3. In R: kmeans(x, centers, iter.max, nstart, algorithm)
2013-11-20 00:04
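The linear-regression part of panco's outline (fit `lm(y ~ x)`, then judge the fit with R-squared and cross-validation) can be sketched in a few lines. This is a minimal Python sketch, not the book's R code. It assumes a single predictor, `x` and `y` given as Python lists, and closed-form least squares. The helper names `fit_simple_ols`, `r_squared`, and `kfold_mse` are hypothetical. It does not compute p-values, which need the t distribution that R's `summary()` of an `lm` fit reports.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1 * x (one predictor, like lm(y ~ x))."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def r_squared(x, y, b0, b1):
    """Fraction of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def kfold_mse(x, y, k=5):
    """Held-out mean squared error, averaged over k contiguous folds.

    Assumes x and y are lists with len(x) >= k; shuffle the rows first
    if they are sorted or grouped.
    """
    n = len(x)
    fold_errors = []
    for f in range(k):
        lo, hi = f * n // k, (f + 1) * n // k  # slice bounds of the held-out fold
        b0, b1 = fit_simple_ols(x[:lo] + x[hi:], y[:lo] + y[hi:])
        test_pairs = zip(x[lo:hi], y[lo:hi])
        fold_errors.append(mean((yi - (b0 + b1 * xi)) ** 2 for xi, yi in test_pairs))
    return mean(fold_errors)
```

On the noise-free line y = 2x + 1, the fit recovers b0 = 1 and b1 = 2 with an R-squared of 1. Adding noise to y (step 3, "Adding errors") typically pulls R-squared below 1 and makes the cross-validated error positive.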
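The kNN steps in the outline (measure similarity, take the k closest training points, let them vote) can be illustrated with a small pure-Python sketch. This is not the `knn()` implementation from R's class package. The function names are hypothetical, and Euclidean distance is assumed as the default. Mahalanobis distance is left out because it needs an inverse covariance matrix. Cosine and Jaccard are similarities (higher means closer), while the other measures are distances (lower means closer).

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors; 1 means same direction."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def jaccard_similarity(s, t):
    """|intersection| / |union| of two sets (assumes at least one is nonempty)."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)


def knn_predict(train, labels, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train)), key=lambda i: distance(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    # Counter keeps first-seen order, so a tie goes to the label whose
    # member appears nearest to the query.
    return votes.most_common(1)[0][0]
```

Swapping in `manhattan` only changes how "closest" is measured. To vote with a similarity such as cosine, pass `distance=lambda a, b: -cosine_similarity(a, b)`, so that the ascending sort still puts the most similar points first.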
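The evaluation metrics listed for kNN (sensitivity, specificity, precision, accuracy) all come from the four cells of a confusion matrix. Below is a minimal sketch that assumes a binary problem with one label designated as positive. The function name is hypothetical. Each ratio raises `ZeroDivisionError` if its denominator is zero, for example when there are no actual positives.

```python
def classification_metrics(actual, predicted, positive):
    """Confusion-matrix metrics for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # true positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # true negative
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # share of positive calls that are right
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For `actual = [1, 1, 1, 0, 0, 0, 0, 1]` and `predicted = [1, 0, 1, 0, 1, 1, 0, 1]` with `positive=1`, the result is a sensitivity of 0.75, specificity of 0.5, precision of 0.6, and accuracy of 0.625.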
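The k-means process in the outline (pick k centers, assign each point to its nearest center, move each center to the mean of its points, repeat) is Lloyd's algorithm. Here is a minimal pure-Python sketch of it. The parameters `iter_max` and `seed` are hypothetical and only loosely mirror `iter.max` in R's `kmeans()`, which uses the Hartigan-Wong algorithm by default rather than this one. Because the starting centers are random, different seeds or a poorly chosen k can give different, sometimes meaningless, clusterings. That is part of why R offers `nstart` for multiple restarts.

```python
import math
import random


def kmeans(points, k, iter_max=10, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k of the input points as starting centers

    def nearest(p):
        # Index of the current center closest to p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iter_max):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step
            clusters[nearest(p)].append(p)
        new_centers = []
        for j, members in enumerate(clusters):  # update step
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return centers, [nearest(p) for p in points]
```

On two well-separated groups, `[(0, 0), (0, 1), (1, 0)]` and `[(10, 10), (10, 11), (11, 10)]`, with k = 2, the sketch settles on centers at roughly (1/3, 1/3) and (31/3, 31/3).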
panco (What?)
Organization:
Chapter 1: Introduction to data science
Chapters 2-3: Overview of statistical modeling and machine learning algorithms as a foundation for the rest of the book
Chapters 4-6 and 8: Specific examples of models and algorithms in context
Chapter 7: Extracting meaning from data and creating features to incorporate into models
Chapters 9-10: Data visualization and social networks
Chapters 11-12: Causality analysis
Chapters 13-14: Data preparation and engineering
Chapter 15: Students' feedback
Chapter 16: The future of data science
BTW, the supplemental reading list can serve as a good source, or map, of the subject.
2013-11-03 13:03

Forum
So this book is basically the print version of that online course. (from panco, 2 replies, 2013-11-05)
Other Editions of This Book (2 in total)

Posts & Telecom Press edition (2015): rated 8.2, read by 156 people
Recommended in the Following Douban Lists
Kaizhi Book Club reading list (开智学堂)
data science (今天_晴)
Learning BigData (视界)
ML&IR&DA (神雕侠觅侣)
Things I Happened Upon… (明小生)
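0 helpful · beren · 2016-04-05
All sorts of data scientists come out to share their firsthand experience; I got a lot out of it.
0 helpful · 菊 · 2015-10-22
Read the Chinese edition; its take on understanding data is fairly interesting.
1 helpful · 阿道克 · 2014-12-23
Explaining data science clearly in a single 400-page book is a tall order. Still, someone has to be the first to write a book treating data science as a complete subject, right?
0 helpful · F  GTI · 2019-07-10
This one is good enough as a popular-science primer.
0 helpful · 蝉 · 2013-12-09
(no comment)
0 helpful · 安吉 · 2020-07-02
Back in college, we had Cathy O'Neil come and give us a lecture.
0 helpful · 辣条 · 2019-09-27
A very good introductory book.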
0 helpful · Bodhin · 2017-07-16
Covers a lot of ground, and the writing is concise and easy to understand.
0 helpful · YZ · 2017-02-24
"Data scientists should become problem solvers and question askers, to think deeply about appropriate design and process, and to use data responsibly and make the world better, not worse. "