Lecture 1
There is no textbook, but the videos and lecture notes are available, so let's learn from those.

If you find the following components in your field, then you have a problem machine learning can be applied to. The essence of machine learning:
1. A pattern exists. If no pattern exists, there is nothing to learn. (But how do we know whether there is a pattern?)
2. We cannot pin it down mathematically.
3. We have DATA on it. Without data, you have nothing.

Components of learning:
Input: x, a d-dimensional vector
Output: y, the decision or prediction
Target function: f: X -> Y (the magic formula which we don't know and will never know in any machine learning problem. If we knew it, we wouldn't need learning!)
Data: (x1, y1), (x2, y2), ..., (xN, yN) (could be historical records)
Hypothesis: g: X -> Y, the formula we derive from the data to APPROXIMATE the target function.
The goal of learning is for the output of g to approximate that of f well.
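As a hypothetical illustration of these components (the target f, the data, and the hypothesis g below are all made up for the sketch; in a real problem f would be unknown):

```python
import random

# Hypothetical illustration of the components of learning.
# In reality the target function f is unknown; we "know" it here
# only so that we can generate example data.

def f(x):
    """The "unknown" target: is the sum of the features positive?"""
    return 1 if sum(x) > 0 else -1

random.seed(0)
# Data: N = 20 input-output pairs (x_n, y_n), each x_n a d = 3 dimensional vector
X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
data = [(x, f(x)) for x in X]

# Hypothesis g: a candidate chosen to approximate f. This one is a
# deliberately crude guess that looks only at the first feature.
def g(x):
    return 1 if x[0] > 0 else -1

# The goal of learning: g should agree with f on as many points as possible.
agreement = sum(g(x) == y for x, y in data) / len(data)
print(agreement)
```

In a real learning problem the only thing we ever see is `data`; the learning algorithm's job is to pick a better `g` from the hypothesis set using those pairs alone.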
We want to know what the target function is, but that is impossible; we can only estimate an approximating function. To do so, we learn from data: the data are a concrete manifestation of the target function. The learning process uses the data to pick, from a particular hypothesis set, the candidate that is closest to the target function. Restricting the hypothesis set is exactly what we do when choosing a method (say, linear regression or SVM). Of course, if you are bold enough, you can put every possible hypothesis into the set. Note that even after fixing a method, such as SVM, we still have infinitely many hypotheses, because every distinct setting of the SVM's parameters is a new hypothesis! The hypothesis set and the learning algorithm together constitute the learning model. For example, choosing neural networks as your hypothesis set and using back propagation as the learning algorithm (to search for the best-fitting network) constitutes one learning model.
The example in the video picks a hypothesis h of the form shown in the slide. A particular h is determined by how you choose the weights w and the threshold; these are the factors that distinguish one h from the others (here I wonder whether this should say "from other hypotheses of the same type"), and they are also what the learning algorithm searches over (by minimizing an error function).
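A minimal sketch of that perceptron hypothesis, h(x) = sign(sum_i w_i * x_i - threshold); the concrete weight values below are made up. Folding the threshold into the weights as a bias w_0 (paired with a constant coordinate x_0 = 1) gives the compact form h(x) = sign(w . x):

```python
# Perceptron hypothesis sketch: each choice of w (including the bias
# that encodes the threshold) is a different hypothesis from the same set.

def h(w, x):
    """Perceptron hypothesis: the sign of the weighted sum w . x."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

x = [1.0, 1.0, 1.0]            # x_0 = 1 is the constant coordinate
print(h([0.0, 1.0, 1.0], x))   # → 1
print(h([-2.5, 1.0, 1.0], x))  # → -1 (a larger threshold flips the decision)
```

Changing w or the threshold changes which hypothesis we have; the learning algorithm's search over these parameters is the search over the hypothesis set.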
In the PLA slide, the w in w + yx should be the red vector. If y = +1 but the point is misclassified, the angle between w and x is greater than 90 degrees; to correct this, we update w so that its angle with x becomes less than 90 degrees, by adding yx to w and taking the sum as the new w. The y = -1 case works the same way. Basic premise of learning:
observations in our case are the data, while the underlying process is the target function.
Q: How do we know whether the data are linearly separable?
A: Usually we don't. In practice we assume the data are NOT linearly separable, because they are very unlikely to be. We have algorithms that can map non-linearly-separable data into linearly separable form (next lecture), as well as methods that deal with such data directly.
On linearly separable data, PLA will converge and find a solution. On non-separable data you are out of luck: there is always a misclassified point, so you keep adjusting the weights forever, and when does that end?
To be clear, PLA is not a particularly good method; it is used here because it is simple and quick to explain.
How much data is enough to learn from?
Practical answer: it's not up to you, you can't control it, you use whatever you can get... fine.
Theoretical answer: covered later.
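A minimal PLA sketch (an assumed implementation, not the course's code) makes this convergence behaviour concrete: on linearly separable data the loop exits once nothing is misclassified; on non-separable data it would cycle forever, which is why a max-iteration cap is added here. The toy data are made up and linearly separable by construction.

```python
def pla(data, max_iters=1000):
    """Perceptron Learning Algorithm.

    data: list of (x, y) pairs, where x includes the constant
    coordinate x_0 = 1 and y is +1 or -1.
    """
    d = len(data[0][0])
    w = [0.0] * d
    for _ in range(max_iters):
        misclassified = [
            (x, y) for x, y in data
            if (1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1) != y
        ]
        if not misclassified:
            return w  # converged: every point is classified correctly
        x, y = misclassified[0]
        # The geometric update from the lecture: adding y * x rotates w
        # toward a misclassified +1 point, or away from a -1 point.
        w = [wi + y * xi for wi, xi in zip(w, x)]
    return w  # cap reached (the data may not be linearly separable)

# Linearly separable toy data: the label is the sign of the second feature.
data = [([1.0, 2.0], 1), ([1.0, -1.0], -1), ([1.0, 0.5], 1), ([1.0, -3.0], -1)]
w = pla(data)
print(w)
```

Running the same loop without the `max_iters` cap on non-separable data would never terminate, which is exactly the problem described above.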
All of Steelwings' notes on this book · · · · · ·