Chapter 1: Introduction
Nostril (Freedom Through Truth for Service)
Supervised learning: the most successful kind of machine learning (generalizes from known input/output examples)
Row: sample, data point, observation
Column: feature, variable, field
Feature engineering = feature extraction = building a good representation of data
Python: the power of a general-purpose programming language + the ease of use of a domain-specific scripting language (e.g. MATLAB, R); can also be used to create GUIs and web services
NumPy: for scientific computing
- functionality for multidimensional arrays
- high-level math functions (e.g. linear algebra operations, the Fourier transform, and pseudorandom number generators)
- the NumPy array: the fundamental data structure in scikit-learn
- the ndarray class
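
A minimal sketch of the NumPy features listed above, assuming NumPy 1.17 or later (for np.random.default_rng). The array values and variable names are made up for illustration.

```python
import numpy as np

# A 2D ndarray: 2 rows (samples) by 3 columns (features).
x = np.array([[1, 2, 3], [4, 5, 6]])
print(type(x).__name__, x.shape)

# Linear algebra: matrix product of x with its transpose.
gram = x @ x.T

# Fourier transform of a constant signal: all energy lands in the first bin.
spectrum = np.fft.fft(np.ones(4))

# Pseudorandom number generator with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
noise = rng.normal(size=3)
print(gram, spectrum, noise, sep="\n")
```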
SciPy: for scientific computing
- advanced linear algebra routines
- mathematical function optimization
- statistical distributions
- scipy.sparse: sparse matrices (2D arrays that contain mostly zeros)
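
A small sketch of the SciPy pieces listed above. The identity matrix, the quadratic being minimized, and all variable names are illustrative choices, not taken from the notes.

```python
import numpy as np
from scipy import optimize, sparse, stats

# Dense 4x4 identity matrix: mostly zeros, so a good candidate for sparse storage.
eye = np.eye(4)

# CSR format stores only the nonzero entries.
sparse_matrix = sparse.csr_matrix(eye)
print(sparse_matrix.nnz)  # number of stored nonzero entries

# The same matrix built directly in COO format from (value, (row, column)) triples.
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))

# Function optimization: minimize (t - 2)^2, whose minimum is at t = 2.
result = optimize.minimize_scalar(lambda t: (t - 2) ** 2)

# Statistical distribution: the standard normal CDF evaluated at its mean.
half = stats.norm.cdf(0.0)
print(result.x, half)
```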
matplotlib: for scientific plotting
- line charts, histograms, scatter plots...
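- %matplotlib inline: show figures directly in a Jupyter notebook
- %matplotlib notebook: same, with an interactive environment
- plt.show(): needed to display figures when not using the notebook or these magic commands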
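
A minimal line-chart sketch for use outside a notebook. The "Agg" backend line is an assumption so the script also runs on a machine without a display. Drop that line and call plt.show() to get an interactive window. The sine data and the in-memory buffer are illustrative choices.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced points between -10 and 10, and their sine values.
x = np.linspace(-10, 10, 100)
y = np.sin(x)

# Line chart of one array against the other.
fig, ax = plt.subplots()
ax.plot(x, y, marker="x")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")

# Render the figure to PNG bytes in memory instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a Jupyter notebook, the %matplotlib inline magic would replace both the backend line and the explicit save.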
pandas: for data wrangling and analysis
- DataFrame (modeled after the R DataFrame)
- can ingest many file formats and databases: SQL, Excel, CSV...
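
A short sketch of a DataFrame and a CSV round trip. The table contents are made-up sample data, and the CSV is kept in memory (io.StringIO) rather than written to a real file.

```python
import io

import pandas as pd

# Each row is one sample; each column is one feature.
data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Location": ["New York", "Paris", "Berlin", "London"],
    "Age": [24, 13, 53, 33],
}
data_pandas = pd.DataFrame(data)

# Query the table: keep only the rows where Age is greater than 30.
older = data_pandas[data_pandas.Age > 30]
print(older)

# Write the table out as CSV text and read it back in.
csv_text = data_pandas.to_csv(index=False)
reloaded = pd.read_csv(io.StringIO(csv_text))
```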
scikit-learn: for machine learning
- nomenclature: capital X = data (2-dimensional array, samples x features); lowercase y = labels (1-dimensional array)
- A pair plot (scatter matrix) of the features: pd.scatter_matrix in older pandas, pd.plotting.scatter_matrix in current versions
- Algorithm: e.g. KNN
- Model: e.g. KNN + training data (built by calling fit)
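
The X / y convention and the algorithm-versus-model distinction can be sketched as follows. The iris dataset, n_neighbors=1, and random_state=0 are assumptions made for illustration. The notes themselves do not name a dataset or parameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Capital X: 2D array of samples x features. Lowercase y: 1D array of labels.
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape, y.shape)

# Hold out part of the data to evaluate the model on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The algorithm: k-nearest neighbors, here with a single neighbor.
knn = KNeighborsClassifier(n_neighbors=1)

# The model: the algorithm after fit has stored the training data.
knn.fit(X_train, y_train)

# Fraction of held-out samples that are classified correctly.
accuracy = knn.score(X_test, y_test)
print(accuracy)
```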
Action:
- Browse the scikit-learn user guide and API documentation
- Read the SciPy Lecture Notes to get familiar with NumPy and matplotlib