Now that answering complex and compelling questions with data can make the difference in an election or a business model, data science is an attractive discipline. But how can you learn this wide-ranging, interdisciplinary field? With this book, you’ll get material from Columbia University’s "Introduction to Data Science" class in an easy-to-follow format.
Each chapter-long lecture features a guest data scientist from a prominent company such as Google, Microsoft, or eBay teaching new algorithms, methods, or models by sharing case studies and actual code they use. You’ll learn what’s involved in the lives of data scientists and be able to use the techniques they present.
Guest lectures focus on topics such as:
Machine learning and data mining algorithms
Statistical models and methods
Prediction vs. description
Exploratory data analysis
Communication and visualization
Data processing
Big data
Programming
Ethics
Asking good questions
If you’re familiar with linear algebra, probability and statistics, and have some programming experience, this book will get you started with data science.
Doing Data Science is a collaboration between course instructor Rachel Schutt (also employed by Google) and data science consultant Cathy O’Neil (former quantitative analyst for D.E. Shaw), who attended and blogged about the course.
About the Authors
Cathy O’Neil earned a Ph.D. in math from Harvard, was a postdoc in the MIT math department, and was a professor at Barnard College, where she published a number of research papers in arithmetic algebraic geometry. She then chucked it and switched over to the private sector. She worked as a quant for the hedge fund D.E. Shaw in the middle of the credit crisis, and then for RiskMetrics, a risk software company that assesses risk for the holdings of hedge funds and banks. She is currently a data scientist on the New York start-up scene, writes a blog at mathbabe.org, and is involved with Occupy Wall Street.
Rachel Schutt is a Senior Research Scientist at Johnson Research Labs, and most recently was a Senior Statistician at Google Research in the New York office. She is also an adjunct assistant professor in the Department of Statistics at Columbia University, where she taught Introduction to Data Science. She earned a PhD in statistics from Columbia University, and master's degrees in mathematics and operations research from the Courant Institute and Stanford University, respectively. Her statistical research interests include modeling and analyzing social networks, epidemiology, hierarchical modeling, and Bayesian statistics. Her education-related research interests include curriculum design.
Rachel enjoys designing and creating complex, thought-provoking situations for other people. She won the Howard Levene Outstanding Teaching Award at Columbia and also taught probability and statistics at Cooper Union, and remedial math as a high school teacher in San Jose, CA. She was a mathematics curriculum expert for the Princeton Review, and won a game design award for best family game at the Come Out and Play Festival in New York.
Excerpts from the Book
Exploratory data analysis
Visualization (for exploratory data analysis and reporting)
Dashboards and metrics
Find business insights
Data-driven decision making
Data engineering/Big Data (Mapreduce, Hadoop, Hive, and Pig)
Get the data themselves
Build data pipelines (logs→mapreduce→dataset→join with other data→mapreduce→scrape some data→join)
Build products instead of describing existing product usage
Hack
Patent writing
Detective work
Predict future behavior or performance
Write up findings in reports, presentations, and journals
Programming (proficiency in R, Python, C, Java, etc.)
Conditional probability
Optimization
Algorithms, statistical models, and machine learning
Tell and interpret stories
Ask good questions
Investigation
Research
Make inferences from data
Build data products
Find ...
Being humanist in the context of data science means recognizing the role your own humanity plays in building models and algorithms, thinking about qualities you have as a human that a computer does not have (which includes the ability to make ethical decisions), and thinking about the humans whose lives you are impacting when you unleash a model onto the world.
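0 helpful beren 2016-04-05 17:59:13
All sorts of data scientists step up to share their experience firsthand; quite useful.
0 helpful crackcell 2013-11-25 17:07:36
Combines case studies with firsthand accounts from frontline practitioners; a good fit as an introduction. BTW, the typography and layout are nice.
0 helpful 碳基体 2015-08-01 11:11:09
Uses R to teach data science, with both algorithms and worked examples. Pretty good.
1 helpful cc 2015-08-11 08:37:46
A textbook for business schools: lots of case studies, little coverage of models, too simple.
0 helpful 在电脑前打喷嚏 2016-07-13 05:33:46
The original edition of 《数据科学实战》. The language is excellent, and it is a useful reference when writing a PS 😌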