Excerpts from 《Machine Learning》

  • The inductive learning hypothesis. Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
    长脸方 1 like 2013-11-15 10:08:29
    —— Quoted from chapter: Concept Learning and the General-to-Specific Ordering
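    One way to read this more formally (my paraphrase, using the error measures the book defines in a later chapter):

      % Reading of the inductive learning hypothesis (paraphrase): small error on a
      % sufficiently large training sample S is taken as evidence of small error over
      % the whole instance distribution D the examples are drawn from.
      \[
        \operatorname{error}_S(h) = \frac{1}{|S|} \sum_{x \in S} \mathbb{1}\bigl[h(x) \neq c(x)\bigr]
        \;\approx\;
        \operatorname{error}_{\mathcal{D}}(h) = \Pr_{x \sim \mathcal{D}}\bigl[h(x) \neq c(x)\bigr]
        \qquad \text{for sufficiently large } |S|.
      \]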
  • We shall see that most current theory of machine learning rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples. Concept learning. Inferring a boolean-valued function from training examples of its input and output.
    长脸方 1 like 2013-11-15 10:08:29
    —— Quoted from chapter: Concept Learning and the General-to-Specific Ordering
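    A compact restatement of the two points above (my condensation, in standard notation):

      % Concept learning: infer a boolean-valued target concept c from labeled examples,
      % under the assumption that training and test instances are drawn from the same
      % distribution D over the instance space X.
      \[
        c : X \to \{0,1\}, \qquad
        D_{\text{train}} = \{\langle x_i, c(x_i)\rangle\}_{i=1}^{n}, \qquad
        x_i \sim \mathcal{D} \ \text{(same } \mathcal{D} \text{ at test time)},
      \]
      \[
        \text{goal: find } h \in H \text{ with } h(x) = c(x) \text{ for (almost) all } x \sim \mathcal{D}.
      \]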
  • As illustrated by these first two steps, positive training examples may force the S boundary of the version space to become increasingly general. Negative training examples play the complementary role of forcing the G boundary to become increasingly specific.
    长脸方 1 like 2013-11-15 10:08:29
    —— Quoted from chapter: Concept Learning and the General-to-Specific Ordering
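    A sketch (not the book's pseudocode) of the S-boundary step described above, using the conjunctive attribute-value hypotheses of the book's EnjoySport example: each positive example replaces the current maximally specific hypothesis with its least generalization that still covers the example.

      # Minimal generalization of the S-boundary hypothesis by a positive example
      # (sketch only; the G-boundary update for negative examples is not shown).
      def generalize(hypothesis, positive_example):
          new_h = []
          for h_val, x_val in zip(hypothesis, positive_example):
              if h_val == "0":                 # "0" = matches nothing yet (initial S)
                  new_h.append(x_val)
              elif h_val == x_val or h_val == "?":
                  new_h.append(h_val)
              else:                            # conflicting specific values -> generalize to "?"
                  new_h.append("?")
          return tuple(new_h)

      # Attributes: Sky, AirTemp, Humidity, Wind, Water, Forecast (the first two positive examples).
      s = ("0",) * 6
      s = generalize(s, ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"))
      s = generalize(s, ("Sunny", "Warm", "High", "Strong", "Warm", "Same"))
      print(s)   # ('Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same')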
  • When gradient descent falls into a local minimum with respect to one of these weights, it will not necessarily be in a local minimum with respect to the other weights. In fact, the more weights in the network, the more dimensions that might provide "escape routes" for gradient descent to fall away from the local minimum with respect to this single weight.
    长脸方 1 like 2014-05-03 20:55:53
    —— Quoted from chapter: Artificial Neural Networks
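    For reference, the update rule this passage is reasoning about, in the book's gradient-descent notation: the step is taken jointly over all weights, so the search is trapped only at points that are minima along every weight dimension at once.

      % Standard gradient-descent weight update: every weight moves simultaneously along
      % the negative gradient of the training error E(w); eta is the learning rate.
      \[
        \Delta w_{ji} = -\eta\,\frac{\partial E}{\partial w_{ji}},
        \qquad
        \vec{w} \leftarrow \vec{w} + \Delta \vec{w}.
      \]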
  • The proof of this involves showing that any function can be approximated by a linear combination of [many localized functions that have value 0 everywhere except for] some small region, and then showing that two layers of sigmoid units are sufficient to produce good local approximations.
    长脸方 1 like 2014-05-03 20:55:53
    —— Quoted from chapter: Artificial Neural Networks
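    A small numeric sketch (mine, not from the book) of the proof idea in the quote: a difference of two steep sigmoids forms a "bump" that is near 1 on a small region and near 0 elsewhere, and a weighted sum of such bumps tracks a target function region by region. In one input dimension a single layer of sigmoids already yields the bumps; the theorem's two-layer construction handles higher-dimensional inputs.

      # Local approximation by sigmoid units (illustration only).
      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def bump(x, lo, hi, steepness=50.0):
          # Difference of two steep sigmoids: ~1 on [lo, hi], ~0 outside it.
          return sigmoid(steepness * (x - lo)) - sigmoid(steepness * (x - hi))

      # Approximate sin(x) on [0, 2*pi] by a weighted sum of narrow bumps, one per small
      # region, each weighted by the target value at that region's center.
      xs = np.linspace(0.0, 2.0 * np.pi, 400)
      edges = np.linspace(0.0, 2.0 * np.pi, 41)          # 40 small regions
      centers = 0.5 * (edges[:-1] + edges[1:])
      approx = sum(np.sin(c) * bump(xs, lo, hi)
                   for lo, hi, c in zip(edges[:-1], edges[1:], centers))

      print("max abs error:", np.abs(approx - np.sin(xs)).max())  # small; shrinks as regions narrow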
  • The only likely impact on the final error is that different error-minimization procedures may fall into different local minima. Bishop (1996) contains a general discussion of several parameter optimization methods for training networks. A variety of methods have been proposed to dynamically grow or shrink the number of network units and interconnections in an attempt to improve generalization accuracy and training efficiency.
    长脸方 1 like 2014-05-03 20:55:53
    —— Quoted from chapter: Artificial Neural Networks
  • Every hypothesis consistent with D is a MAP hypothesis.
    长脸方 4 replies 2014-04-21 11:42:34
    —— Quoted from chapter: Bayesian Learning
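    This statement rests on the assumptions in the surrounding text: noise-free training data D, a target concept contained in H, and a uniform prior over H. Under those assumptions the Bayes-rule algebra goes through as follows (condensed):

      % With P(h) = 1/|H| and P(D|h) = 1 iff h is consistent with D (0 otherwise),
      % every consistent hypothesis gets the same, maximal posterior.
      \[
        P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}
                    = \frac{1 \cdot \tfrac{1}{|H|}}{\tfrac{|VS_{H,D}|}{|H|}}
                    = \frac{1}{|VS_{H,D}|}
        \qquad \text{for every } h \text{ consistent with } D,
      \]
      while \(P(h \mid D) = 0\) for every inconsistent \(h\), so all consistent hypotheses are MAP hypotheses.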
  • By identifying probability distributions P(h) and P(D|h) under which the algorithm outputs optimal (i.e., MAP) hypotheses, we can characterize the implicit assumptions under which this algorithm behaves optimally.
    长脸方 4 replies 2014-04-21 11:42:34
    —— Quoted from chapter: Bayesian Learning
  • here the implicit assumptions that we attribute to the learner are assumptions of the form "the prior probabilities over H are given by the distribution P(h), and the strength of data in rejecting or accepting a hypothesis is given by P(D|h)"
    长脸方 4 replies 2014-04-21 11:42:34
    —— Quoted from chapter: Bayesian Learning
  • A straightforward Bayesian analysis will show that under certain assumptions any learning algorithm that minimizes the squared error between the output hypothesis predictions and the training data will output a maximum likelihood hypothesis.
    长脸方 4 replies 2014-04-21 11:42:34
    —— Quoted from chapter: Bayesian Learning
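    The "certain assumptions" are spelled out in the book as training values corrupted by independent zero-mean Gaussian noise around a deterministic target function; under them the likelihood collapses to a squared-error criterion (condensed sketch):

      % Assume d_i = f(x_i) + e_i with e_i ~ N(0, sigma^2) independent; maximizing the
      % likelihood of the observed targets then equals minimizing the squared error.
      \[
        h_{ML} = \operatorname*{arg\,max}_{h \in H} \prod_{i=1}^{m}
                 \frac{1}{\sqrt{2\pi\sigma^2}}
                 \exp\!\Bigl(-\frac{(d_i - h(x_i))^2}{2\sigma^2}\Bigr)
               = \operatorname*{arg\,min}_{h \in H} \sum_{i=1}^{m} \bigl(d_i - h(x_i)\bigr)^2 .
      \]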
  • Does this prove once and for all that short hypotheses are best? No. What we have shown is only that if a representation of hypotheses is chosen so that the size of hypothesis h is -log_2 P(h), and if a representation for exceptions is chosen so that the encoding length of D given h is equal to -log_2 P(D | h), then the MDL principle produces MAP hypotheses.
    长脸方 4 replies 2014-04-21 11:42:34
    —— Quoted from chapter: Bayesian Learning
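    The equivalence referred to here is just a log transform of the MAP definition (sketch; L_C1 and L_C2 denote description lengths under the two chosen encodings):

      % Taking -log_2 of the MAP objective turns "maximize the posterior" into
      % "minimize hypothesis code length plus data-given-hypothesis code length".
      \[
        h_{MAP} = \operatorname*{arg\,max}_{h \in H} P(D \mid h)\,P(h)
                = \operatorname*{arg\,min}_{h \in H} \bigl[-\log_2 P(D \mid h) - \log_2 P(h)\bigr]
                = \operatorname*{arg\,min}_{h \in H} \bigl[L_{C_2}(D \mid h) + L_{C_1}(h)\bigr].
      \]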
  • No other classification method using the same hypothesis space and same prior knowledge can outperform this method on average.
    长脸方 4 replies 2014-04-21 11:42:34
    —— Quoted from chapter: Bayesian Learning
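    The method in question is the Bayes optimal classifier, which weights each hypothesis' prediction by that hypothesis' posterior (standard form, restated here for reference):

      % Bayes optimal classification of a new instance: choose the value v that maximizes
      % the posterior-weighted vote over all hypotheses in H.
      \[
        v_{OB} = \operatorname*{arg\,max}_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D).
      \]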
  • Exact inference of probabilities in general for an arbitrary Bayesian network is known to be NP-hard (Cooper 1990).
    长脸方 4 replies 2014-04-21 11:42:34
    —— Quoted from chapter: Bayesian Learning
  • We call these methods lazy because they defer the decision of how to generalize beyond the training data until each new query instance is encountered.
    长脸方 2014-05-07 10:57:44
    —— Quoted from chapter: Instance-Based Learning
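    A minimal k-nearest-neighbor sketch (mine, not the book's pseudocode) of the lazy pattern: training merely stores the examples, and all generalization work is deferred until a query arrives.

      # Lazy learning: training only stores examples; generalization happens at query time.
      from collections import Counter

      def euclidean(a, b):
          return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

      class KNearestNeighbor:
          def __init__(self, k=3):
              self.k = k
              self.examples = []                             # (instance, label) pairs

          def train(self, instances, labels):
              self.examples = list(zip(instances, labels))   # no model is built here

          def classify(self, query):
              # All the work happens now: find the k stored examples nearest the query
              # and return their majority label.
              nearest = sorted(self.examples, key=lambda ex: euclidean(ex[0], query))[: self.k]
              return Counter(label for _, label in nearest).most_common(1)[0][0]

      knn = KNearestNeighbor(k=3)
      knn.train([(0, 0), (0, 1), (5, 5), (6, 5), (1, 0)], ["neg", "neg", "pos", "pos", "neg"])
      print(knn.classify((5, 6)))   # -> "pos"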
  • We call this method eager because it generalizes beyond the training data before observing the new query, committing at training time to the network structure and weights that define its approximation to the target function.
    长脸方 2014-05-07 10:57:44
    —— Quoted from chapter: Instance-Based Learning
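    For contrast with the lazy sketch above, a minimal eager learner (my sketch; a least-squares line stands in for the network mentioned in the quote): the fit is committed at training time, and a query only evaluates the already-fixed parameters.

      # Eager learning: the approximation is fixed at training time, before any query is seen.
      class LeastSquaresLine:
          """Commits to slope and intercept during training."""
          def train(self, xs, ys):
              n = len(xs)
              mean_x, mean_y = sum(xs) / n, sum(ys) / n
              cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
              var = sum((x - mean_x) ** 2 for x in xs)
              self.slope = cov / var
              self.intercept = mean_y - self.slope * mean_x   # fixed before any query arrives

          def predict(self, x):
              return self.slope * x + self.intercept          # queries only evaluate fixed parameters

      model = LeastSquaresLine()
      model.train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
      print(model.predict(4.0))   # -> 9.0 (this data is exactly y = 2x + 1)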
  • This is because using a single sample S eliminates the variance due to random differences in the compositions [of the samples; the result will] generally be an overly conservative, but still correct, interval.
    长脸方 2014-05-19 22:27:09
    —— Quoted from chapter: Evaluating Hypotheses
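    For reference, the basic form of the approximate N% confidence interval used throughout this chapter, for an error measured on a sample of n independently drawn test examples (z_N is the two-sided critical value for confidence level N%):

      % Approximate N% confidence interval for the true error of h, given the sample error
      % observed on n independently drawn test examples.
      \[
        \operatorname{error}_S(h) \;\pm\; z_N \sqrt{\frac{\operatorname{error}_S(h)\bigl(1 - \operatorname{error}_S(h)\bigr)}{n}}
      \]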