Chapter 1 Introduction
Overview
Data Science Is OSEMN
Intermezzo Chapters
What Is the Command Line?
Why Data Science at the Command Line?
A Real-World Use Case
Further Reading
Chapter 2 Getting Started
Overview
Setting Up Your Data Science Toolbox
Essential Concepts and Tools
Further Reading
Chapter 3 Obtaining Data
Overview
Copying Local Files to the Data Science Toolbox
Decompressing Files
Converting Microsoft Excel Spreadsheets
Querying Relational Databases
Downloading from the Internet
Calling Web APIs
Further Reading
Chapter 4 Creating Reusable Command-Line Tools
Overview
Converting One-Liners into Shell Scripts
Creating Command-Line Tools with Python and R
Further Reading
Chapter 5 Scrubbing Data
Overview
Common Scrub Operations for Plain Text
Working with CSV
Working with HTML/XML and JSON
Common Scrub Operations for CSV
Further Reading
Chapter 6 Managing Your Data Workflow
Overview
Introducing Drake
Installing Drake
Obtain Top Ebooks from Project Gutenberg
Every Workflow Starts with a Single Step
Well, That Depends
Rebuilding Specific Targets
Discussion
Further Reading
Chapter 7 Exploring Data
Overview
Inspecting Data and Its Properties
Computing Descriptive Statistics
Creating Visualizations
Further Reading
Chapter 8 Parallel Pipelines
Overview
Serial Processing
Parallel Processing
Distributed Processing
Discussion
Further Reading
Chapter 9 Modeling Data
Overview
More Wine, Please!
Dimensionality Reduction with Tapkee
Clustering with Weka
Regression with SciKit-Learn Laboratory
Classification with BigML
Further Reading
Chapter 10 Conclusion
Let’s Recap
Three Pieces of Advice
Where to Go from Here?
Getting in Touch
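0 helpful Nova 2014-12-06 03:24:49
Just started reading. It covers nothing but tools developed in recent years. With no computer at hand, it all feels very unfamiliar :(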
0 helpful 大嘴巴灵机一动 2015-11-02 19:59:46
So much about CSV and JSON, and I have no use for any of it...
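0 helpful wavefancy 2019-11-05 00:47:39
An excellent book, highly recommended. It matches my ten-plus years of data analysis experience very closely. Not every tool suits every reader, but the underlying philosophy fits well. Because the data each person analyzes varies so much, you can build a toolset tailored entirely to yourself. That is the power of experience and ideas.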
2 helpful 阿道克 2015-10-20 21:51:50
A personal, lightweight approach to data processing.
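1 helpful 沂水弦歌 2015-11-29 21:16:56
The power of the command line is beyond doubt, and the simplicity and efficiency of a text interface feel remarkably comfortable. That said, I don't care much for the tool the book introduces and still prefer Anaconda. The book also sets up its environment with Vagrant, whereas I prefer Docker.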