私's notes on The Master Algorithm (7)

The Master Algorithm
  • Title: The Master Algorithm
  • Author: Pedro Domingos
  • Subtitle: How the Quest for the Ultimate Learning Machine Will Remake Our World
  • Pages: 352
  • Publisher: Basic Books
  • Publication date: 2015-9-22
  • Ch. I The Machine Learning Revolution
    The solution is to marry machine learning with game theory, something I've worked on in the past: don't just learn to defeat what your opponent does now; learn to parry what he might do against your learner. Factoring in the costs and benefits of different actions, as game theory does, can also help strike the right balance between privacy and security.

    This is a truly enlightening insight.

    2018-02-18 02:53:30
  • Ch. II The Master Algorithm
    P and NP are the two most important classes of problems in computer science. A problem is in P if we can solve it efficiently, and it's in NP if we can efficiently check its solution. The famous P = NP question is whether every efficiently checkable problem is also efficiently solvable.

    Knowing is easy; doing is hard.
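
    To make the asymmetry concrete for myself, here's a tiny Python sketch (my own toy, using subset-sum as the NP problem; not from the book): checking a proposed answer takes one pass, while the naive solver has to enumerate all 2^n subsets.

    from itertools import combinations

    def check_certificate(nums, subset, target):
        """Verification is cheap: one pass over the proposed subset."""
        return sum(subset) == target and all(x in nums for x in subset)

    def solve_by_search(nums, target):
        """Naive solving tries every subset -- exponential in len(nums)."""
        for r in range(len(nums) + 1):
            for subset in combinations(nums, r):
                if sum(subset) == target:
                    return list(subset)
        return None

    nums = [3, 7, 1, 8, -2]
    print(check_certificate(nums, [7, 1], 8))   # True, verified instantly
    print(solve_by_search(nums, 8))             # found only by searching subsets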

    The future belongs to those who understand at a very deep level how to combine their unique expertise with what algorithms do best.

    I want to become a superconsultant lol

    Besides, knowledge is not just a long list of facts. Knowledge is general, and has structure.

    So wise!

    2018-02-18 03:18:52
  • Ch. III Hume's Problem of Induction
    At this point you crumple your notes in frustration and fling them into the wastebasket. There's no way to know! What can you do? The ghost of Hume nods sadly over your shoulder. You have no basis to pick one generalization over another.

    That's harsh.

    Tom Mitchell, a leading symbolist, calls it "the futility of bias-free learning". In ordinary life, bias is a pejorative word: preconceived notions are bad. But in machine learning, preconceived notions are indispensable; you can't learn without them. In fact, preconceived notions are also indispensable to human cognition, but they're hardwired into the brain, and we take them for granted. It's biases over and beyond those that are questionable.

    Stereotypes or prejudices are a prerequisite for any decision in life. The difference between a fixed mindset and an open mindset is only in the allowance and tolerance for exceptions and new rules.

    Newton's principle is the first unwritten rule of machine learning. We induce the most widely applicable rules we can and reduce their scope only when the data forces us to. At first sight this may seem ridiculously overconfident, but it's been working for science for over three hundred years. It's certainly possible to imagine a universe so varied and capricious that Newton's principle would systematically fail, but that's not our universe.

    The author made this point earlier too: maybe it's wrong in some other universe, but what does that have to do with us? The only universe we care about is this one.

    Computers are the ultimate idiot savants: they can remember everything with no trouble at all, but that's not what we want them to do. The problem is not limited to memorizing instances wholesale. Whenever a learner finds a pattern in the data that is not actually true in the real world, we say that it has overfit the data. Overfitting is the central problem in machine learning. [...] Thus a good learner is forever walking the narrow path between blindness and hallucination. [...] It's even been said that data mining means "torturing the data until it confesses."

    Over-interpretation.

    Overfitting happens when you have too many hypotheses and not enough data to tell them apart.

    I really doubt myself here: statistically, how many samples does it actually take to be convincing, and how should samples be chosen to optimize the output? Why have these questions never been brought up? Even if consulting isn't a science, that's no excuse for not understanding science, is it?

    Bottom line: learning is a race between the amount of data you have and the number of hypotheses you consider. More data exponentially reduces the number of hypotheses that survive, but if you start with a lot of them, you may still have some bad ones left at the end. As a rule of thumb, if the learner only considers an exponential number of hypotheses (for example, all possible conjunctive concepts), then the data's exponential payoff cancels it and you're OK, provided you have plenty of examples and not too many attributes. On the other hand, if it considers a doubly exponential number (for example, all possible rule sets), then the data cancels only one of the exponentials and you're still in trouble.

    A hypothesis-driven approach becomes even more important. Machine learning, so far, is still a tool, and like every other tool it needs a purpose and a hypothesis.
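
    A rough sketch of this "race" (my own toy setup in Python, not the book's): enumerate every conjunctive concept over a handful of boolean attributes and watch how each labeled example cuts down the set of hypotheses still consistent with the data.

    import itertools, random

    random.seed(0)
    n = 8  # boolean attributes; 3**n candidate conjunctions (require True / require False / don't care)

    def matches(hyp, x):
        return all(h is None or h == xi for h, xi in zip(hyp, x))

    hypotheses = list(itertools.product([True, False, None], repeat=n))
    target = (True, None, False, None, None, True, None, None)   # hidden concept generating the labels

    survivors = hypotheses
    for m in range(1, 21):
        x = tuple(random.choice([True, False]) for _ in range(n))
        y = matches(target, x)
        survivors = [h for h in survivors if matches(h, x) == y]
        print(f"after {m:2d} examples: {len(survivors)} consistent hypotheses")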

    Accuracy on previously unseen data is a pretty stringent test; so much so, in fact, that a lot of science fails it. That does not make it useless, because science is not just about prediction; it's also about explanation and understanding. But ultimately, if your models don't make accurate predictions on new data, you can't be sure you've truly understood or explained the underlying phenomena. And for machine learning, testing on unseen data is indispensable because it's the only way to tell whether the learner has overfit or not.

    Bias (High/Low) and Variance (High/Low)
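
    A minimal sketch of why unseen data is the real test (my own toy, assuming numpy is available): a straight line underfits (high bias, bad on both sets), while a 9th-degree polynomial nails the training points but misses new ones (high variance, i.e. overfitting).

    import numpy as np

    rng = np.random.default_rng(0)
    true_f = np.sin

    x_train = rng.uniform(0, 6, 12)
    y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)
    x_test = rng.uniform(0, 6, 200)
    y_test = true_f(x_test) + rng.normal(0, 0.2, x_test.size)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)      # fit on the training data only
        mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
        print(f"degree {degree}: train MSE {mse(x_train, y_train):.3f}, test MSE {mse(x_test, y_test):.3f}")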

    Start with basic knowledge >> use inverse deduction to hypothesize >> test, revise the hypotheses, and repeat

    Another limitation of inverse deduction is that it's very computationally intensive, which makes it hard to scale to massive data sets. For these, the symbolist algorithm of choice is decision tree induction. Decision trees can be viewed as an answer to the question of what to do if rules of more than one concept match an instance. [...] Sets of concepts with this property are called sets of classes, and the algorithm that predicts them is a classifier. A single concept implicitly defines two classes: the concept itself and its negation. Classifiers are the most widespread form of machine learning.

    Decision trees
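
    A minimal decision-tree-as-classifier sketch (my own toy "play tennis?" data, assuming scikit-learn is installed):

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Features: [outlook (0=sunny, 1=rain), humidity (0=normal, 1=high)]
    X = [[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]]
    y = ["no", "yes", "no", "yes", "no", "yes"]     # a concept and its negation: two classes

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["outlook", "humidity"]))
    print(tree.predict([[0, 0]]))                   # classify a new instance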

    Shortcomings...The number of possible inductions is vast, and unless we stay close to our initial knowledge, it's easy to get lost in space. Inverse deduction is easily confused by noise: how do we figure out what the missing deductive steps are, if the premises or conclusions are themselves wrong? Most seriously, real concepts can seldom be concisely defined by a set of rules. They're not black and white: there's a large gray area between, say span and nonspam. They require weighing and accumulating weak evidence until a clear picture emerges. Diagnosing an illness involves giving more weight to some symptoms than others, and being OK with incomplete evidence. No one has ever succeeded in learning a set of rules that will recognize a cat by looking at the pixels in an image, and probably no one ever will.

    2018-02-18 15:59:27
  • Ch. IV How Does Your Brain Learn?
    Hebb's rule, as it has come to be known, is the cornerstone of connectionism. Indeed, the field derives its name from the belief that knowledge is stored in the connections between neurons.

    From this chapter on I'm starting to get a bit lost in the fog; it takes some imagination detached from the ground to follow...
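
    Hebb's rule itself is tiny: the connection between two neurons strengthens in proportion to how often they fire together. A sketch with my own toy numbers (assuming numpy):

    import numpy as np

    rng = np.random.default_rng(1)
    eta = 0.1                 # learning rate
    w = np.zeros(4)           # weights from 4 input neurons onto one output neuron

    for _ in range(100):
        x = rng.integers(0, 2, 4)        # presynaptic activities (0/1)
        y = x[0] and x[1]                # the output neuron fires when inputs 0 and 1 co-fire
        w += eta * y * x                 # Hebb's rule: strengthen connections that fire together
    print(w)   # weights 0 and 1 grow the most -- the correlation is stored in the connections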

    In symbolist learning, there is a one-to-one correspondence between symbols and the concepts they represent. In contrast, connectionist representations are distributed: each concept is represented by many neurons, and each neuron participates in representing many different concepts.
    [...] Another difference between symbolist and connectionist learning is that the former is sequential, while the latter is parallel.
    [...] In Hemingway's The Sun Also Rises, when Mike Campbell is asked how he went bankrupt, he replies, "Two ways. Gradually and then suddenly." The same could be said of Lehman Brothers. That's the essence of an S curve.

    Both machines and brains follow the S-curve model. Learning something new is always slow at first, then after a while progress surges and a breakthrough opens up, and after that it slows down again. What's the relationship between this and bottlenecks... And surely not everyone can hold out until the turning point where the S curve picks up?
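
    The S curve is basically the logistic function; a quick numeric sketch (mine, not the book's) shows the "gradually, then suddenly, then gradually again" shape:

    import math

    logistic = lambda t: 1 / (1 + math.exp(-t))

    prev = logistic(-6)
    for t in range(-5, 7):
        cur = logistic(t)
        print(f"t={t:+d}  value={cur:.3f}  gain over last step={cur - prev:+.3f}")
        prev = cur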

    A living cell is a quintessential example of a non-linear system. The cell performs all of its functions by turning raw materials into end products through a complex web of chemical reactions. We can discover the structure of this network using symbolist methods like inverse deduction [...] this is difficult because there is no simple linear relationship between these quantities. Rather, the cell maintains its stability through interlocking feedback loops, leading to very complex behavior. Backpropagation is well suited to this problem because of its ability to efficiently learn nonlinear functions.

    The world is rarely linear.
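
    A minimal backpropagation sketch in numpy (my own toy, not the cell example): one hidden layer learning XOR, a function no linear model can represent.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR: nonlinear in the inputs

    sigmoid = lambda z: 1 / (1 + np.exp(-z))
    W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)        # input -> hidden
    W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)        # hidden -> output

    for step in range(20000):
        h = sigmoid(X @ W1 + b1)                          # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)               # backward pass: output error...
        d_h = (d_out @ W2.T) * h * (1 - h)                # ...propagated back to the hidden layer
        W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
        W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

    print(out.round(3).ravel())   # typically close to [0, 1, 1, 0]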

    2018-02-24 15:50:27
  • Ch. V Evolution: Nature's Learning Algorithm
    ...the stage was set for the second coming of evolution: in silico instead of in vivo, and a billion times faster. [...] The key input to a genetic algorithm, as Holland's creation came to be known, is a fitness function. Given a candidate program and some purpose it is meant to fill, the fitness function assigns the program a numeric score reflecting how well it fits the purpose.

    Doesn't this sound familiar? Like Deep Blue, and like Alpha too.

    Notice how much genetic algorithms differ from multilayer perceptrons. Backprop entertains a single hypothesis at any given time, and the hypothesis changes gradually until it settles into a local optimum. Genetic algorithms consider an entire population of hypotheses at each step, and these can make big jumps from one generation to the next, thanks to crossover. Backprop proceeds deterministically after setting the initial weights to small random values. Genetic algorithms, in contrast, are full of random choices: which hypotheses to keep alive and cross over (with fitter hypotheses being more likely candidates), where to cross two strings, which bits to mutate. Backprop learns weights for a predefined network architecture; denser networks are more flexible but also harder to learn. Genetic algorithms make no a priori assumptions about the structures they will learn, other than their general form.

    Expanding from a single hypothesis to a population of random hypotheses doesn't make thinking unnecessary; on the contrary, it matters even more: how do you make sure the building blocks you set up at the start are reasonably meaningful?
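
    A bare-bones genetic algorithm sketch (my own OneMax toy, not the book's code): the fitness function counts ones, tournament selection keeps fitter strings alive, and crossover plus mutation produce the next generation.

    import random

    random.seed(0)
    LENGTH, POP, GENS = 30, 40, 60
    fitness = lambda bits: sum(bits)                 # fitness function: how many ones

    def select(pop):
        """Tournament selection: fitter hypotheses are more likely to survive."""
        return max(random.sample(pop, 3), key=fitness)

    pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
    for gen in range(GENS):
        nxt = []
        while len(nxt) < POP:
            a, b = select(pop), select(pop)
            cut = random.randrange(1, LENGTH)        # crossover point
            child = a[:cut] + b[cut:]
            for i in range(LENGTH):                  # mutation: flip bits with small probability
                if random.random() < 0.01:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    print("best fitness after evolution:", max(map(fitness, pop)), "out of", LENGTH)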

    The better a slot machine looks, the more you should play it, but never completely give up on the other one, in case it turns out to be the best one after all.

    Find the most interesting thing to devote your energy, time, and resources (money) to, but never give it everything you have, and stay open to other things that might come along later and look even more interesting?
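
    The slot machines are the classic exploration/exploitation trade-off; a tiny epsilon-greedy sketch with made-up payout probabilities:

    import random

    random.seed(0)
    true_payout = [0.4, 0.6]     # hidden win probabilities of the two machines
    pulls, wins = [0, 0], [0, 0]
    epsilon = 0.1                # fraction of the time we still try the worse-looking machine

    for _ in range(5000):
        if random.random() < epsilon or 0 in pulls:
            arm = random.randrange(2)        # explore: never completely give up on either machine
        else:
            arm = max((0, 1), key=lambda a: wins[a] / pulls[a])   # exploit the better-looking one
        pulls[arm] += 1
        wins[arm] += random.random() < true_payout[arm]

    print("estimated payouts:", [round(w / p, 3) for w, p in zip(wins, pulls)])
    print("pulls per machine:", pulls)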

    2018-02-24 22:34:03
  • Ch. VI In the Church of the Reverend Bayes
    Real probability distributions are usually very peaked, with vast wastelands of minuscule probability punctuated by sudden Everests. The Markov chain then converges to the nearest peak and stays there, leading to very biased probability estimates. It's as if the drunkard followed the scent of alcohol to the nearest tavern and stayed there all night, instead of wandering all around the city like we wanted him to.

    The Master Algorithm will likely depend on multiple different disciplines, and maybe some of the "knots" haven't been tackled yet.
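
    The drunkard here is an MCMC sampler; a rough Metropolis sketch (my own made-up two-peak density) shows the chain settling on whichever peak it starts near:

    import math, random

    random.seed(0)
    # Two narrow peaks at -5 and +5, separated by a wasteland of minuscule probability.
    density = lambda x: math.exp(-(x + 5) ** 2 / 0.1) + math.exp(-(x - 5) ** 2 / 0.1)

    x, samples = -5.0, []
    for _ in range(20000):
        proposal = x + random.gauss(0, 0.5)                 # a small random step
        if random.random() < min(1.0, density(proposal) / density(x)):
            x = proposal                                    # Metropolis acceptance rule
        samples.append(x)

    print("fraction of samples near the starting peak:",
          sum(s < 0 for s in samples) / len(samples))       # close to 1: the chain never crosses over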

    All of the tribes we've met so far have one thing in common: they learn an explicit model of the phenomenon under consideration, whether it's a set of rules, a multilayer perceptron, a genetic program, or a Bayesian network. When they don't have enough data to do that, they're stumped. But analogizers can learn from as little as one example because they never form a model. Let's see what they do instead.

    Introduction to Ch. VII
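
    A preview sketch of what the analogizers do instead: the simplest version is nearest-neighbor, which never builds a model at all, just compares the new case to stored examples (my own made-up data):

    import math

    # No model is ever formed: we just keep the examples and compare.
    examples = [((170, 60), "A"), ((180, 90), "B"), ((165, 55), "A")]

    def predict(query):
        _, label = min(examples, key=lambda ex: math.dist(ex[0], query))
        return label

    print(predict((172, 62)))   # "A" -- the label of the most similar stored example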

    2018-03-04 16:38:07
  • Ch. VII You Are What You Resemble
    In fact, no learner is immune to the curse of dimensionality. It's the second worst problem in machine learning, after overfitting. The term curse of dimensionality was coined by Richard Bellman, a control theorist, in the fifties. He observed that control algorithms that worked fine in three dimensions became hopelessly inefficient in higher-dimensional spaces, such as when you want to control every joint in a robot arm or every knob in a chemical plant. But in machine learning the problem is more than just computational cost - it's that learning itself becomes harder and harder as the dimensionality goes up.

    This is science. Something weird happens in higher-dimensional spaces that needs to be explained by theory and proven by data scientists.

    Am I going into that space? No. But I could be an applied data scientist.
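
    A quick numeric sketch of the curse (mine, assuming numpy): for random points in a unit hypercube, the nearest and farthest neighbors become almost equally far away as the dimension grows, so "similarity" stops carrying information.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        points = rng.random((500, d))                   # 500 random points in the unit hypercube
        dists = np.linalg.norm(points[1:] - points[0], axis=1)
        print(f"d={d:5d}  nearest/farthest distance ratio = {dists.min() / dists.max():.3f}")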

    2018-03-10 17:37:08
