# Notes on 《Machine Learning》 – p. 201

###### Ms.蔬菜 (新的开始)

- Chapter: Chapter 7
- Page: 201, 2015-02-13 12:26:47

This chapter answers two questions: (1) Under what conditions is successful learning possible, and when is it impossible? (2) Under what conditions is a particular learning algorithm assured of learning successfully? In short, it explains why and when machine learning works.

Hypothesis class: the set of classifiers with which we can partition the data. We compare the sample (training) error, e.g. 30%, with the true (population) error, e.g. 45%; the gap between them is the $\varepsilon$ we want to control. To guarantee with probability at least $1-\delta$ that $error(h) \leq \varepsilon$, it suffices to have

$$m \geq \frac{1}{\varepsilon}\left[\ln|H| + \ln\frac{1}{\delta}\right]$$

The growth function: $\Pi_H(m) = \max_{|S|=m}|H[S]|$, the maximum number of distinct labelings the hypothesis class can realize on any set of $m$ points. Intuitively, the growth function looks for the point set on which the classifiers can split the points in the most ways; it measures the effective size of the hypothesis class. For an infinite hypothesis class, the sample complexity is stated in terms of the VC dimension $VC(H)$; the standard form is

$$m \geq \frac{1}{\varepsilon}\left[4\log_2\frac{2}{\delta} + 8\,VC(H)\log_2\frac{13}{\varepsilon}\right]$$

Premises of PAC learning theory: (1) the data points must be i.i.d.; (2) the hypothesis class is finite; (3) the learner finds a hypothesis consistent with the training data, i.e., its in-sample (training) error is 0.

An interesting question to consider: what is the probability that a bad hypothesis gets $m$ data points right? Suppose a hypothesis has $error_{true}(h) \geq \varepsilon$. The probability that it gets one data point right is $\leq 1-\varepsilon$; since by assumption the $m$ data points are i.i.d., the probability that it gets all $m$ right is $\leq (1-\varepsilon)^m$.

The second question we consider: what is the probability that the learner chooses a bad classifier?
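The per-point argument above can be checked numerically. A minimal sketch (the values of `eps` and `m` are illustrative, not from the note):

```python
# A "bad" hypothesis has true error at least eps, so the probability
# that it classifies one i.i.d. point correctly is at most 1 - eps,
# and the probability that it gets all m points right is at most (1 - eps)^m.

def prob_bad_hypothesis_consistent(eps: float, m: int) -> float:
    """Upper bound on P(a fixed bad hypothesis is consistent with m points)."""
    return (1.0 - eps) ** m

if __name__ == "__main__":
    for m in (10, 50, 100):
        # The bound decays exponentially in m: more data makes it ever
        # less likely that a bad hypothesis survives.
        print(m, prob_bad_hypothesis_consistent(0.1, m))
```

Note the exponential decay in $m$: this is exactly what makes the union bound over the whole class workable.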
Union bound: $P(h_1 \vee h_2 \vee \cdots \vee h_n) \leq P(h_1) + P(h_2) + \cdots + P(h_n)$. Summing the per-hypothesis bound over the whole class gives an upper bound that is easy to work with:

$$P(h_1 \vee h_2 \vee \cdots \vee h_n) \leq \sum_{i} P(h_i) \leq |H|(1-\varepsilon)^m \leq |H|e^{-\varepsilon m},$$

where the last step uses $1-\varepsilon \leq e^{-\varepsilon}$. Here $m$ is the number of data points and $|H|$ is the size of the hypothesis class.

Three trends follow: as $m$ increases, as the allowed true error $\varepsilon$ increases, and as the complexity of the model class increases, the probability of picking a bad model is respectively decreasing, decreasing, and increasing. The first seems very straightforward: with more data points, we can judge the model with better accuracy. The second holds because raising $\varepsilon$ loosens our criterion for what counts as a bad model. Lastly, the more complex the hypothesis class, the higher the probability of choosing a bad model.

PAC bound: $P(error_{true}(h) > \varepsilon) \leq |H|e^{-m\varepsilon} \leq \delta$.

With this bound we can calculate several useful quantities. The required amount of training data: $m \geq \frac{\ln|H| + \ln\frac{1}{\delta}}{\varepsilon}$ — fixing $\varepsilon$ (the error rate) and $\delta$, we get an estimate of how much data we need. Inverting it gives the error bound: $\varepsilon \geq \frac{\ln|H| + \ln\frac{1}{\delta}}{m}$. This is Haussler's bound. Its limitation: the training error must be zero.

What if the classifier does not fit the training data perfectly? Let the true error be $\theta$ and its estimate $\widehat{\theta}$. Then Hoeffding's bound applies: simply put, it bounds how far the estimate can be from the true parameter, $P(error_{true}(h) > error_{train}(h) + \varepsilon) \leq e^{-2m\varepsilon^2}$.
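The sample-complexity formulas above are easy to evaluate. A sketch, with an assumed example class size $|H| = 1000$, $\varepsilon = 0.1$, $\delta = 0.05$ (illustrative values, not from the note):

```python
import math

def sample_complexity_consistent(H_size: int, eps: float, delta: float) -> int:
    """Haussler's bound for a consistent learner (zero training error):
    m >= (ln|H| + ln(1/delta)) / eps."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / eps)

def error_bound_consistent(H_size: int, m: int, delta: float) -> float:
    """The same bound solved for the error: eps >= (ln|H| + ln(1/delta)) / m."""
    return (math.log(H_size) + math.log(1.0 / delta)) / m

def sample_complexity_agnostic(H_size: int, eps: float, delta: float) -> int:
    """Hoeffding version, when training error need not be zero:
    m >= (ln|H| + ln(1/delta)) / (2 eps^2)."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / (2.0 * eps ** 2))

if __name__ == "__main__":
    # Dropping the zero-training-error assumption costs roughly a
    # factor of 1/(2 eps) more data.
    print(sample_complexity_consistent(1000, 0.1, 0.05))
    print(sample_complexity_agnostic(1000, 0.1, 0.05))
```

Note the dependence: $1/\varepsilon$ in the consistent case versus $1/\varepsilon^2$ in the Hoeffding (agnostic) case, which is the practical cost of allowing nonzero training error.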

