Publisher: O'Reilly Media
Subtitle: Straight Talk from the Frontline
Publication date: 2013-10-30
Pages: 352
Price: USD 44.99
Binding: Paperback
ISBN: 9781449358655
Synopsis · · · · · ·
Now that answering complex and compelling questions with data can make the difference in an election or a business model, data science is an attractive discipline. But how can you learn this wide-ranging, interdisciplinary field? With this book, you’ll get material from Columbia University’s "Introduction to Data Science" class in an easy-to-follow format.
Each chapter-long lecture features a guest data scientist from a prominent company such as Google, Microsoft, or eBay teaching new algorithms, methods, or models by sharing case studies and actual code they use. You’ll learn what’s involved in the lives of data scientists and be able to use the techniques they present.
Guest lectures focus on topics such as:
Machine learning and data mining algorithms
Statistical models and methods
Prediction vs. description
Exploratory data analysis
Communication and visualization
Data processing
Big data
Programming
Ethics
Asking good questions
If you’re familiar with linear algebra, probability and statistics, and have some programming experience, this book will get you started with data science.
Doing Data Science is a collaboration between course instructor Rachel Schutt (also employed by Google) and data science consultant Cathy O’Neil (former quantitative analyst for D.E. Shaw), who attended and blogged about the course.
About the authors · · · · · ·
Cathy O’Neil earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York startup scene, writes a blog at mathbabe.org, and is involved with Occupy Wall Street.
Rachel Schutt is a Senior Research Scientist at Johnson Research Labs, and most recently was a Senior Statistician at Google Research in the New York office. She is also an adjunct assistant professor in the Department of Statistics at Columbia University, where she taught Introduction to Data Science. She earned a PhD from Columbia University in statistics, and master's degrees in mathematics and operations research from the Courant Institute and Stanford University, respectively. Her statistical research interests include modeling and analyzing social networks, epidemiology, hierarchical modeling and Bayesian statistics. Her education-related research interests include curriculum design.
Rachel enjoys designing and creating complex, thought-provoking situations for other people. She won the Howard Levene Outstanding Teaching Award at Columbia and also taught probability and statistics at Cooper Union, and remedial math as a high school teacher in San Jose, CA. She was a mathematics curriculum expert for the Princeton Review, and won a game design award for best family game at the Come Out and Play Festival in New York.
Reading notes · · · · · ·
QzMxOUVDOEFFRU (Hey, you're blocking my sunlight!)
Being humanist in the context of data science means recognizing the role your own humanity plays in building models and algorithms, thinking about qualities you have as a human that a computer does not have (which includes the ability to make ethical decisions), and thinking about the humans whose lives you are impacting when you unleash a model onto the world.
2013-11-23 21:55
panco (What?)
Linear Regression:
1. Concepts
2. In R: lm(y ~ x)
3. Adding errors
4. Evaluation metric: R-squared, p-value, cross-validation
5. Assumptions:
5.1 Linearity
5.2 Errors normally distributed with mean 0
5.3 Errors are independent
5.4 Errors have constant variance
5.5 The predictors are the right ones
kNN:
1. Concepts
2. Processes
3. Determining similarity: Cosine Similarity, Jaccard Similarity, Mahalanobis Distance, Hamming Distance, Manhattan Distance
4. Evaluation metric: sensitivity, specificity, precision, and accuracy
5. In R: knn(train, test, cl, k)
6. Assumptions:
6.1 Data is in a feature space where "distance" makes sense
6.2 Training data is classified into 2+ classes
k-means:
1. Processes
2. Issues: choosing k, solution may not exist, the answer doesn't make sense
3. In R: kmeans(x, centers, iter.max, nstart, algorithm)
2013-11-20 00:04
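The kNN items in the note above can be illustrated with a small sketch. The note refers to R's knn(train, test, cl, k); the version below is written in Python using only the standard library and borrows that argument order, but it is an illustrative re-implementation, not the R function. It assumes Manhattan distance, a made-up two-class toy dataset, and hypothetical helper names (manhattan, knn_predict, binary_metrics).

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, test, cl, k):
    """Label each test point by majority vote among its k nearest training points.

    The argument order mirrors knn(train, test, cl, k) from the note above;
    this is a hypothetical stand-in, not the R implementation.
    """
    predictions = []
    for point in test:
        # Indices of the k training points closest to the current test point.
        nearest = sorted(range(len(train)),
                         key=lambda i: manhattan(train[i], point))[:k]
        votes = Counter(cl[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(actual, predicted, positive):
    """Sensitivity, specificity, precision and accuracy for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


# Made-up toy data: two well-separated clusters labeled "low" and "high".
train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.5)]
cl = ["low", "low", "low", "high", "high", "high"]
test = [(1.2, 1.5), (6.8, 6.9), (2.5, 2.0), (5.5, 5.0)]
actual = ["low", "high", "low", "high"]

predicted = knn_predict(train, test, cl, k=3)
metrics = binary_metrics(actual, predicted, positive="high")
print(predicted)
print(metrics)
```

Because the two toy clusters are far apart, all four test points are labeled correctly with k = 3 and every metric comes out to 1.0; real data will rarely be this clean. Swapping manhattan for another of the measures listed in the note would only change the key function used for sorting.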
panco (What?)
Organization:
Chap 1: Introduction to data science
Chap 2-3: Overview of statistical modeling and machine learning algorithms as a foundation for the rest of the book
Chap 4-6, 8: Specific examples of models and algorithms in context
Chap 7: Extract meaning from data and create features to incorporate in models
Chap 9, 10: Data visualization and social networks
Chap 11, 12: Causality analysis
Chap 13, 14: Data preparation and engineering
Chap 15: Students' feedback
Chap 16: The future of data science
BTW, the supplemental reading list can be a good source or map of the subject.
2013-11-03 13:03

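Forum · · · · · ·
So this book is basically the print version of that online course. From panco, 2 replies, 2013-11-05
Other editions of this book · · · · · · ( 2 in total )
Posts & Telecom Press (人民邮电出版社) edition, 2015-3 / read by 131 people / available for purchase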
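0 helpful  Bing  2014-01-01
Not a hardcore technical book, and several of the code snippets it gives have bugs. It reads more like a collection of interviews with, and recollections from, top practitioners. Quite a few chapters are worth reading again and again; mind-altering! The copyediting is not very strict: I recall about ten typos from my read-through.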
0 helpful  ww  2015-09-03
Attempts to define data science and the data scientist, and briefly introduces the skills and application areas of data science.
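1 helpful  大啸  2015-08-07
Covers a bit of everything; a skim through is good for finding and filling gaps. My complaint is the lack of code and visualization. Plus: #DidNotCureMyInsomnia#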
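0 helpful  Selaginella  2013-11-06
Skimmed through most of it. It is hard to explain the ideas, methods, and main application areas of doing data science clearly in such a thin book; it can only hit the key points, so it suits people who want a first look at data analysis. I focused on the chapters on biostatistics, which point out the problems of experimental design and poor reproducibility, along with three very pertinent core recommendations (p. 305). But it is really not enough if you want to know how to work in each specific field; for that you need more targeted books. Finally, many thanks to teacher 船长 for the book!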
1 helpful  cc  2015-08-11
A textbook for business schools: lots of case studies, little on models, too simple.
0 helpful  Bodhin  2017-07-16
Covers a lot of ground; the language is concise and easy to understand.
0 helpful  syzdemonhunter  2017-02-24
"Data scientists should become problem solvers and question askers, to think deeply about appropriate design and process, and to use data responsibly and make the world better, not worse. "
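0 helpful  来自徳勒姆市  2016-10-24
I now have a rough idea of what DS is about. As an introductory book it is pretty good, but after finishing it I seriously doubt whether I am up to the job, especially the CS part...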
0 helpful  beren  2016-04-05
Various data scientists come forward to share their experience firsthand; quite rewarding.
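0 helpful  在电脑前打喷嚏  2016-07-13
The original edition of 《数据科学实战》; the writing is great and can serve as a reference when writing a PS (personal statement) 😌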