Lecture 1
There is no textbook, but the videos and lecture notes are available, so let's learn from those.

If you find the following components in your field, then you have a problem machine learning can be applied to. The essence of machine learning:
1. A pattern exists. If no pattern exists, there is nothing to learn. (But how do we know whether there is a pattern?)
2. We cannot pin it down mathematically.
3. We have DATA on it. Without data, you have nothing.

Components of learning:
Input: x, a d-dimensional vector
Output: y, the decision or prediction
Target function: f: X -> Y (the magic formula which we don't know and will never know in any machine learning problem. If we knew it, we wouldn't need learning!)
Data: (x1, y1), (x2, y2), ..., (xN, yN) (could be historical records)
Hypothesis: g: X -> Y, the formula we derive from the data to APPROXIMATE the target function.
The goal of learning is for the output of g to approximate that of f well.
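As a hypothetical illustration of these components (the target f, the data, and the hypothesis g below are all made up for the sketch; in a real problem f would be unknown):

```python
import random

# Hypothetical illustration of the components of learning.
# In reality the target function f is unknown; we "know" it here
# only so that we can generate example data.

def f(x):
    """The "unknown" target: is the sum of the features positive?"""
    return 1 if sum(x) > 0 else -1

random.seed(0)
# Data: N = 20 input-output pairs (x_n, y_n), each x_n a d = 3 dimensional vector
X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
data = [(x, f(x)) for x in X]

# Hypothesis g: a candidate chosen to approximate f. This one is a
# deliberately crude guess that looks only at the first feature.
def g(x):
    return 1 if x[0] > 0 else -1

# The goal of learning: g should agree with f on as many points as possible.
agreement = sum(g(x) == y for x, y in data) / len(data)
print(agreement)
```

In a real learning problem the only thing we ever see is `data`; the learning algorithm's job is to pick a better `g` from the hypothesis set using those pairs alone.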
We want to know what the target function is, but that is impossible; we can only estimate an approximating function. To do so, we learn from data: the data are a concrete manifestation of the target function. The learning process uses the data to pick, from a particular hypothesis set, the candidate that is closest to the target function. Restricting the hypothesis set is exactly what we do when choosing a method (say, linear regression or SVM). Of course, if you are bold enough, you can put every possible hypothesis into the set. Note that even after fixing a method, such as SVM, we still have infinitely many hypotheses, because every distinct setting of the SVM's parameters is a new hypothesis! The hypothesis set and the learning algorithm together constitute the learning model. For example, choosing neural networks as your hypothesis set and using back propagation as the learning algorithm (to search for the best-fitting network) constitutes one learning model.
The example in the video picks a hypothesis h of the form shown in the slide. A particular h is determined by how you choose the weights w and the threshold; these are the factors that distinguish one h from the others (here I wonder whether this should say "from other hypotheses of the same type"), and they are also what the learning algorithm searches over (by minimizing an error function).
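A minimal sketch of that perceptron hypothesis, h(x) = sign(sum_i w_i * x_i - threshold); the concrete weight values below are made up. Folding the threshold into the weights as a bias w_0 (paired with a constant coordinate x_0 = 1) gives the compact form h(x) = sign(w . x):

```python
# Perceptron hypothesis sketch: each choice of w (including the bias
# that encodes the threshold) is a different hypothesis from the same set.

def h(w, x):
    """Perceptron hypothesis: the sign of the weighted sum w . x."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

x = [1.0, 1.0, 1.0]            # x_0 = 1 is the constant coordinate
print(h([0.0, 1.0, 1.0], x))   # → 1
print(h([-2.5, 1.0, 1.0], x))  # → -1 (a larger threshold flips the decision)
```

Changing w or the threshold changes which hypothesis we have; the learning algorithm's search over these parameters is the search over the hypothesis set.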
In the PLA slide, the w in w + yx should be the red vector. If y = +1 but the point is misclassified, the angle between w and x is greater than 90 degrees; to correct this, we update w so that its angle with x becomes less than 90 degrees, by adding yx to w and taking the sum as the new w. The y = -1 case works the same way. Basic premise of learning:
observations in our case are the data, while the underlying process is the target function.
Q: How do we know whether the data are linearly separable?
A: Usually we don't. In practice we assume the data are NOT linearly separable, because they are very unlikely to be. We have algorithms that can map non-linearly-separable data into linearly separable form (next lecture), as well as methods that deal with such data directly.
On linearly separable data, PLA will converge and find a solution. On non-separable data you are out of luck: there is always a misclassified point, so you keep adjusting the weights forever, and when does that end?
To be clear, PLA is not a particularly good method; it is used here because it is simple and quick to explain.
How much data is enough to learn from?
Practical answer: it's not up to you, you can't control it, you use whatever you can get... fine.
Theoretical answer: covered later.
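A minimal PLA sketch (an assumed implementation, not the course's code) makes this convergence behaviour concrete: on linearly separable data the loop exits once nothing is misclassified; on non-separable data it would cycle forever, which is why a max-iteration cap is added here. The toy data are made up and linearly separable by construction.

```python
def pla(data, max_iters=1000):
    """Perceptron Learning Algorithm.

    data: list of (x, y) pairs, where x includes the constant
    coordinate x_0 = 1 and y is +1 or -1.
    """
    d = len(data[0][0])
    w = [0.0] * d
    for _ in range(max_iters):
        misclassified = [
            (x, y) for x, y in data
            if (1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1) != y
        ]
        if not misclassified:
            return w  # converged: every point is classified correctly
        x, y = misclassified[0]
        # The geometric update from the lecture: adding y * x rotates w
        # toward a misclassified +1 point, or away from a -1 point.
        w = [wi + y * xi for wi, xi in zip(w, x)]
    return w  # cap reached (the data may not be linearly separable)

# Linearly separable toy data: the label is the sign of the second feature.
data = [([1.0, 2.0], 1), ([1.0, -1.0], -1), ([1.0, 0.5], 1), ([1.0, -3.0], -1)]
w = pla(data)
print(w)
```

Running the same loop without the `max_iters` cap on non-separable data would never terminate, which is exactly the problem described above.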
All of Steelwings' notes on this book · · · · · ·