Notes on 《The Master Algorithm》 - Ch.III Hume's problem of induction
- Chapter: Ch.III Hume's problem of induction
- 2018-02-18 05:12:02
At this point you crumple your notes in frustration and fling them into the wastebasket. There's no way to know! What can you do? The ghost of Hume nods sadly over your shoulder. You have no basis to pick one generalization over another.
That's harsh. Tom Mitchell, a leading symbolist, calls it "the futility of bias-free learning". In ordinary life, bias is a pejorative word: preconceived notions are bad. But in machine learning, preconceived notions are indispensable; you can't learn without them. In fact, preconceived notions are also indispensable to human cognition, but they're hardwired into the brain, and we take them for granted. It's biases over and beyond those that are questionable.
Stereotypes or prejudices are prerequisites for any decision in life. The difference between fixed mindsets and open mindsets is only the allowance and tolerance for exceptions and new rules. Newton's principle is the first unwritten rule of machine learning. We induce the most widely applicable rules we can and reduce their scope only when the data forces us to. At first sight this may seem ridiculously overconfident, but it's been working for science for over three hundred years. It's certainly possible to imagine a universe so varied and capricious that Newton's principle would systematically fail, but that's not our universe.
The author has made this point before: perhaps in another universe it would be wrong, but what is that to us? The only universe we care about is this one. Computers are the ultimate idiot savants: they can remember everything with no trouble at all, but that's not what we want them to do. The problem is not limited to memorizing instances wholesale. Whenever a learner finds a pattern in the data that is not actually true in the real world, we say that it has overfit the data. Overfitting is the central problem in machine learning. [...] Thus a good learner is forever walking the narrow path between blindness and hallucination. [...] It's even been said that data mining means "torturing the data until it confesses."
Over-interpretation. Overfitting happens when you have too many hypotheses and not enough data to tell them apart.
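The memorization failure mode described above can be seen in a few lines. The sketch below (with a made-up toy target, not an example from the book) stores every training instance verbatim: it never errs on data it has seen, and learns nothing that transfers to unseen data.

```python
import random

random.seed(0)

# Toy target: the label is 1 when a majority of 5 boolean attributes are true.
def target(x):
    return int(sum(x) >= 3)

def example():
    return tuple(random.randint(0, 1) for _ in range(5))

train = [(x, target(x)) for x in (example() for _ in range(8))]
test = [(x, target(x)) for x in (example() for _ in range(200))]

# A pure memorizer: perfect recall on seen instances, a coin flip otherwise.
memory = dict(train)
def memorizer(x):
    return memory.get(x, random.randint(0, 1))

train_acc = sum(memorizer(x) == y for x, y in train) / len(train)
test_acc = sum(memorizer(x) == y for x, y in test) / len(test)
print(train_acc)  # 1.0: the memorizer never errs on data it has seen
print(test_acc)   # well below 1.0: nothing general was learned
```

The memorizer has, in effect, one hypothesis per possible instance, and no amount of training data can tell those hypotheses apart on the instances it hasn't seen.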
I genuinely doubt myself here: statistically, how many samples does it take to be convincing? How should samples be selected to optimize the output? Why have these questions never been raised? Even if consulting isn't a science, that's no excuse for not understanding science. Bottom line: learning is a race between the amount of data you have and the number of hypotheses you consider. More data exponentially reduces the number of hypotheses that survive, but if you start with a lot of them, you may still have some bad ones left at the end. As a rule of thumb, if the learner only considers an exponential number of hypotheses (for example, all possible conjunctive concepts), then the data's exponential payoff cancels it and you're OK, provided you have plenty of examples and not too many attributes. On the other hand, if it considers a doubly exponential number (for example, all possible rule sets), then the data cancels only one of the exponentials and you're still in trouble.
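The "race" can be made concrete with the book's own example of conjunctive concepts. The sketch below (hypothetical attributes and target concept) enumerates all 3^n conjunctions over n boolean attributes — each attribute required true, required false, or ignored — and counts how many stay consistent as random labeled examples arrive:

```python
import itertools
import random

random.seed(1)
n = 7  # number of boolean attributes

# A conjunctive hypothesis assigns each attribute 1 (must be true),
# 0 (must be false), or None (don't care): 3**n hypotheses in total.
hypotheses = list(itertools.product([0, 1, None], repeat=n))

def predict(h, x):
    return int(all(c is None or c == xi for c, xi in zip(h, x)))

# Hidden target concept: attributes 0 and 1 must be true, rest irrelevant.
target = (1, 1) + (None,) * (n - 2)

for i in range(1, 11):
    x = tuple(random.randint(0, 1) for _ in range(n))
    y = predict(target, x)
    hypotheses = [h for h in hypotheses if predict(h, x) == y]
    print(i, len(hypotheses))  # survivors shrink rapidly with each example
```

Each labeled example throws out every hypothesis that disagrees with it, so the survivor count drops quickly — but only because the hypothesis space was "merely" exponential to begin with.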
A hypothesis-driven approach becomes even more important. Machine learning, so far, is still a tool. And like every other tool, it needs a purpose, and it needs a hypothesis. Accuracy on previously unseen data is a pretty stringent test; so much so, in fact, that a lot of science fails it. That does not make it useless, because science is not just about prediction; it's also about explanation and understanding. But ultimately, if your models don't make accurate predictions on new data, you can't be sure you've truly understood or explained the underlying phenomena. And for machine learning, testing on unseen data is indispensable because it's the only way to tell whether the learner has overfit or not.
Bias (High/Low) and Variance (High/Low)
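Both points — the bias/variance trade-off and why only unseen data exposes overfitting — show up when fitting polynomials of different degrees to a small noisy sample. This is a sketch with made-up data; the degrees and noise level are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# True signal is linear; observations carry Gaussian noise.
def sample(m):
    x = rng.uniform(-1, 1, m)
    y = 2 * x + rng.normal(0, 0.3, m)
    return x, y

x_train, y_train = sample(10)
x_test, y_test = sample(200)

def mse(deg):
    coeffs = np.polyfit(x_train, y_train, deg)
    err = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return err(x_train, y_train), err(x_test, y_test)

# Degree 0 (high bias) underfits both sets; degree 9 (high variance)
# interpolates all 10 training points but fails on fresh data.
results = {deg: mse(deg) for deg in (0, 1, 9)}
for deg, (tr, te) in sorted(results.items()):
    print(deg, round(tr, 3), round(te, 3))
```

Training error alone would crown the degree-9 model; only the held-out test set reveals that it has memorized the noise.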
Start with basic knowledge >> use inverse deduction to hypothesize >> test, revise the hypotheses, and repeat. Another limitation of inverse deduction is that it's very computationally intensive, which makes it hard to scale to massive data sets. For these, the symbolist algorithm of choice is decision tree induction. Decision trees can be viewed as an answer to the question of what to do if rules of more than one concept match an instance. [...] Sets of concepts with this property are called sets of classes, and the algorithm that predicts them is a classifier. A single concept implicitly defines two classes: the concept itself and its negation. Classifiers are the most widespread form of machine learning.
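Decision tree induction itself fits in a few dozen lines. Below is a minimal ID3-style sketch that greedily splits on the attribute with the highest information gain; the weather-style toy dataset is hypothetical, not one from the book:

```python
import math
from collections import Counter

# Toy data: (attributes) -> class label "yes"/"no" (play outside or not).
data = [
    ({"outlook": "sunny", "windy": False}, "yes"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "rainy", "windy": False}, "yes"),
    ({"outlook": "rainy", "windy": True}, "no"),
    ({"outlook": "overcast", "windy": True}, "yes"),
    ({"outlook": "overcast", "windy": False}, "yes"),
]

def entropy(rows):
    counts = Counter(label for _, label in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows))
                for c in counts.values())

def build(rows, attrs):
    labels = {label for _, label in rows}
    if len(labels) == 1 or not attrs:
        return Counter(l for _, l in rows).most_common(1)[0][0]  # leaf
    # Greedily pick the attribute with the highest information gain.
    def gain(a):
        parts = {}
        for x, y in rows:
            parts.setdefault(x[a], []).append((x, y))
        rem = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
        return entropy(rows) - rem
    best = max(attrs, key=gain)
    branches = {}
    for x, y in rows:
        branches.setdefault(x[best], []).append((x, y))
    return (best, {v: build(sub, [a for a in attrs if a != best])
                   for v, sub in branches.items()})

def classify(tree, x):
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[x[attr]]
    return tree

tree = build(data, ["outlook", "windy"])
print(classify(tree, {"outlook": "sunny", "windy": False}))   # yes
print(classify(tree, {"outlook": "overcast", "windy": True})) # yes
```

Each internal node tests one attribute and each leaf names a class, so exactly one leaf — one class — fires for any instance, which is how the tree resolves the "more than one concept matches" problem.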
Decision trees. Shortcomings... The number of possible inductions is vast, and unless we stay close to our initial knowledge, it's easy to get lost in space. Inverse deduction is easily confused by noise: how do we figure out what the missing deductive steps are, if the premises or conclusions are themselves wrong? Most seriously, real concepts can seldom be concisely defined by a set of rules. They're not black and white: there's a large gray area between, say, spam and nonspam. They require weighing and accumulating weak evidence until a clear picture emerges. Diagnosing an illness involves giving more weight to some symptoms than others, and being OK with incomplete evidence. No one has ever succeeded in learning a set of rules that will recognize a cat by looking at the pixels in an image, and probably no one ever will.