孔明's notes on "Pattern Recognition and Machine Learning" (15)

孔明 (Find it yourself.)

Currently reading Pattern Recognition and Machine Learning

Title: Pattern Recognition and Machine Learning
Author: Christopher Bishop
Pages: 738
Publisher: Springer
Published: 2007-10-1

Page 434, 9. Mixture Models and EM

A further issue in finding maximum likelihood solutions arises from the fact that for any given maximum likelihood solution, a K-component mixture will have a total of K! equivalent solutions corresponding to the K! ways of assigning K sets of parameters to K components.

Quoted from 9. Mixture Models and EM

I don't understand this passage yet; marking it for later.
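One way to read the quoted passage: the likelihood of a mixture only depends on the set of (weight, mean, variance) triples, not on which triple is called component 1, 2, ..., K, so every relabeling of a solution is another solution with the same likelihood. The sketch below checks this numerically for a 1-D Gaussian mixture. The parameter values and the helper name `gmm_log_likelihood` are made up for illustration and are not from the book.

```python
import itertools
import math

import numpy as np


def gmm_log_likelihood(x, weights, means, stds):
    """Log-likelihood of 1-D data x under a Gaussian mixture."""
    x = np.asarray(x)[:, None]                      # shape (N, 1)
    dens = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    # sum over data points of log sum_k pi_k N(x_n | mu_k, sigma_k)
    return float(np.sum(np.log(dens @ weights)))


rng = np.random.default_rng(0)
x = rng.normal(size=50)

# Hypothetical parameters for K = 3 components (not from the book).
weights = np.array([0.2, 0.3, 0.5])
means = np.array([-1.0, 0.0, 2.0])
stds = np.array([0.5, 1.0, 1.5])

K = len(weights)
values = []
for perm in itertools.permutations(range(K)):
    p = list(perm)
    # Reassign the same K parameter sets to the K component slots.
    values.append(gmm_log_likelihood(x, weights[p], means[p], stds[p]))

print(len(values), math.factorial(K))   # number of relabelings vs K!
print(np.allclose(values, values[0]))   # do all relabelings give the same value?
```

If this reading is right, the K! solutions differ only in labeling and describe the same density.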

2013-09-29 21:52:50

Page 703, Appendix D. Calculus of Variations

In the same way, we can define a functional F[y] to be an operator that takes a function y(x) and returns an output value F.

Quoted from Appendix D. Calculus of Variations

In Python terms, a functional is a function whose argument is itself a function (and whose return value is a number).
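To make the Python analogy concrete, here is a minimal sketch of one functional: the arc length of a curve y(x) on an interval. The name `arc_length`, the interval, and the piecewise-linear approximation are my own choices for illustration, not anything defined in the appendix.

```python
import math


def arc_length(y, a=0.0, b=1.0, n=10_000):
    """Hypothetical functional F[y]: arc length of the curve y(x) on [a, b].

    Takes a function y and returns a single number, approximated by
    summing the lengths of n straight segments.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * h, a + (i + 1) * h
        total += math.hypot(x1 - x0, y(x1) - y(x0))
    return total


print(arc_length(lambda x: x))       # straight line from (0, 0) to (1, 1)
print(arc_length(lambda x: x ** 2))  # parabola between the same endpoints
```

Different input functions give different output numbers, which is the sense in which F depends on the whole function y rather than on a single value of x.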

2013-09-30 16:56:18

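Page 186, Chapter 4. Linear Models for Classification

One way to view a linear classification model is in terms of dimensionality reduction.

Quoted from Chapter 4. Linear Models for Classification

For the simplest case,
($$
y = w^Tx
$$)
the vector ($w$) reduces the D-dimensional ($x$) to one dimension, that is, a scalar. This can be compared with PCA.

An interesting way to think about it!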
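A small sketch of the comparison, under my own assumptions: random data, a random ($w$) standing in for a learned weight vector, and PCA computed through an SVD of the centred data. Both map each D-dimensional point to a scalar; the difference is how the direction is chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 5, 200
X = rng.normal(size=(N, D))          # N points, each D-dimensional

# Hypothetical weight vector; in a classifier it would be learned from labels.
w = rng.normal(size=D)
y = X @ w                            # each D-dim x is mapped to one scalar w^T x

# PCA also projects onto a direction, but picks it without labels:
# the unit vector along which the projected data has maximal variance.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u1 = Vt[0]                           # first principal direction
z = Xc @ u1

print(X.shape, y.shape, z.shape)
```

The classifier's direction is chosen to separate classes, while the PCA direction is chosen to preserve variance, so the two need not coincide.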

2013-10-17 21:54:04

Page 303, 6. Kernel Methods

The Bayesian viewpoint: everything has a distribution.

2013-11-15 16:47:47

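Page 305, 6. Kernel Methods

On Gaussian processes: for linear regression, a Gaussian process amounts to writing down a distribution directly over the outputs y (which is later used as the prior over y). Assuming y has zero mean, the covariance matrix of y is a Gram matrix. With N samples the Gram matrix is N×N, so y should be N-dimensional. Does that mean a sample drawn from this Gaussian process can only ever be N-dimensional?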
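A tentative answer to the question above, as far as I understand it: a Gaussian process specifies a joint Gaussian over the function values at any finite set of inputs, so the dimension of a sample follows the number of inputs you choose to evaluate, not a fixed N. The sketch below draws prior samples at 5 and at 50 inputs. The RBF kernel, the length scale, the jitter term, and the function names are all my assumptions.

```python
import numpy as np


def rbf_gram(x, length_scale=1.0):
    """Gram matrix K[n, m] = exp(-(x_n - x_m)^2 / (2 * length_scale^2))."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_gp_prior(x, rng, jitter=1e-6):
    """Draw one sample of y at inputs x from a zero-mean GP prior."""
    K = rbf_gram(x) + jitter * np.eye(len(x))   # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(len(x)), K)


rng = np.random.default_rng(0)
for n_points in (5, 50):
    x = np.linspace(-3.0, 3.0, n_points)
    y = sample_gp_prior(x, rng)
    print(n_points, y.shape)                    # sample length follows the input grid
```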

2013-11-16 21:16:09

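Page 389, 8. Graphical Models

In this classic example of applying an MRF, Bishop still does not make it clear why the energy functions for the cliques ($\{x_i, y_i\}$) and ($\{x_i, x_j\}$) are ($-\eta x_iy_i$) and ($-\beta x_ix_j$), or why the term ($hx_i$) has to be added.
The definition of the potential function seems rather arbitrary! Would other energy functions also achieve image denoising?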
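One reading of the three terms, with x and y taking values in {-1, +1}: ($-\eta x_iy_i$) lowers the energy when a pixel agrees with its noisy observation, ($-\beta x_ix_j$) lowers it when neighbouring pixels agree, and ($hx_i$) biases pixels towards one sign. The sketch below runs iterated conditional modes (ICM) on that energy for a toy image, which at least gives a place to try other energy functions. The toy image, the noise level, and the function names are my assumptions; the defaults h = 0, β = 1.0, η = 2.1 are, as I recall, the values used for the book's figure.

```python
import numpy as np


def local_field(x, y, i, j, h, beta, eta):
    """Coefficient c such that the terms of E involving x[i, j] equal c * x[i, j]."""
    rows, cols = x.shape
    nb = 0
    if i > 0:
        nb += x[i - 1, j]
    if i < rows - 1:
        nb += x[i + 1, j]
    if j > 0:
        nb += x[i, j - 1]
    if j < cols - 1:
        nb += x[i, j + 1]
    return h - beta * nb - eta * y[i, j]


def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=5):
    """ICM on E = h*sum(x_i) - beta*sum(x_i x_j) - eta*sum(x_i y_i)."""
    x = y.copy()
    for _ in range(sweeps):
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                c = local_field(x, y, i, j, h, beta, eta)
                x[i, j] = -1 if c > 0 else 1   # pick the sign giving lower energy
    return x


rng = np.random.default_rng(0)
clean = -np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = 1                         # toy image: a square on a background
flip = rng.random(clean.shape) < 0.1          # flip roughly 10% of the pixels
noisy = np.where(flip, -clean, clean)

restored = icm_denoise(noisy)
print("wrong pixels before:", int(np.sum(noisy != clean)))
print("wrong pixels after: ", int(np.sum(restored != clean)))
```

Swapping in a different `local_field` is the easiest way to experiment with the question of whether other energies denoise as well.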

2014-07-02 10:08:39

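Page 373, 8. Graphical Models

At last I can say I have a fairly complete understanding of D-separation. I first heard of it in an artificial intelligence course in my first year of graduate school, where the lecturer ended up confusing himself (heh...). I did not realize how important D-separation was at the time, so I just muddled along. Cramming before the exam, I found out it would be tested, searched online for introductions to D-separation, and scraped through. The second time was last year, in a pattern recognition course, where the lecturer presented the three cases of D-separation in detail but did not explain why it is these three cases; I assumed they were simply defined that way... This time I understood it through PRML: Bishop lays out the whole story of D-separation very clearly, starting from three examples, working through the probability formulas, and considering in each case whether the conditioning variable is observed, thereby connecting D-separation to conditional independence.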
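As a small numerical companion to one of the three cases, the sketch below takes the head-to-head graph a -> c <- b and checks the factorization directly from the joint distribution, once with c unobserved and once with c observed. The probability tables are made-up numbers for illustration only.

```python
import numpy as np

# Head-to-head graph a -> c <- b with binary variables.
# Hypothetical probability tables, chosen only for illustration.
p_a = np.array([0.7, 0.3])                    # p(a)
p_b = np.array([0.6, 0.4])                    # p(b)
p_c1_given_ab = np.array([[0.1, 0.8],         # p(c=1 | a, b), indexed [a, b]
                          [0.7, 0.95]])

# Joint p(a, b, c) = p(a) p(b) p(c | a, b), indexed [a, b, c].
p_c_given_ab = np.stack([1 - p_c1_given_ab, p_c1_given_ab], axis=-1)
joint = p_a[:, None, None] * p_b[None, :, None] * p_c_given_ab

# c unobserved: marginalise c and compare p(a, b) with p(a) p(b).
p_ab = joint.sum(axis=2)
independent_marginally = np.allclose(p_ab, np.outer(p_a, p_b))

# c observed (c = 1): compare p(a, b | c) with p(a | c) p(b | c).
p_ab_c1 = joint[:, :, 1] / joint[:, :, 1].sum()
p_a_c1 = p_ab_c1.sum(axis=1)
p_b_c1 = p_ab_c1.sum(axis=0)
independent_given_c = np.allclose(p_ab_c1, np.outer(p_a_c1, p_b_c1))

print(independent_marginally, independent_given_c)
```

With these tables, a and b factorize when c is marginalised out but not once c is observed, which matches the head-to-head rule.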

2014-07-02 11:39:07

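Page 89, 2. Probability Distributions

Today I reread the Gaussian distribution part and worked through the derivations by hand, strictly following the book's line of reasoning (exercises not included). I was struck by how beautiful the results for the conditional Gaussian distribution and the marginal Gaussian distribution are.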
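For reference, the results in question, written from memory for a joint Gaussian over ($x = (x_a, x_b)$) with mean ($(\mu_a, \mu_b)$) and covariance blocks ($\Sigma_{aa}, \Sigma_{ab}, \Sigma_{ba}, \Sigma_{bb}$); the notation is my assumption of the book's partitioned form:

($$
p(x_a \mid x_b) = \mathcal{N}(x_a \mid \mu_{a|b}, \Sigma_{a|b}), \quad
\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b), \quad
\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
$$)

($$
p(x_a) = \mathcal{N}(x_a \mid \mu_a, \Sigma_{aa})
$$)

The conditional mean is linear in ($x_b$) and the conditional covariance does not depend on ($x_b$) at all, while the marginal simply keeps the corresponding blocks of the mean and covariance.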

2015-05-13 20:35:46

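Page 120, 2. Probability Distributions

On non-parametric methods: non-parametric and parametric methods are two approaches to probability density estimation. The difference is that parametric methods are "having specific functional forms governed by a small number of parameters whose values are to be determined from a data set", for example the mean and variance parameters of a Gaussian, whereas non-parametric methods contain no parameters that govern the form of the density function, for example KDE and k-nearest neighbours. Note that a non-parametric method may still have parameters, such as k in the k-nearest-neighbour method, but k does not control the functional form of the density. Although these two terms come up often, in my experience many people cannot explain the difference between them.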
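The distinction can be sketched on toy data: a fitted Gaussian is fully described by two numbers, while a kernel density estimate keeps every data point and only has a smoothing parameter (the bandwidth, playing a role similar to k). The bimodal data, the bandwidth value, and the function names below are my assumptions.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def fit_gaussian(data):
    """Parametric: the whole density is summarised by (mean, std)."""
    return data.mean(), data.std()


def kde(x, data, bandwidth):
    """Non-parametric: one Gaussian bump per data point, averaged."""
    return gaussian_pdf(x[:, None], data[None, :], bandwidth).mean(axis=1)


rng = np.random.default_rng(0)
# Bimodal toy data (hypothetical), which a single Gaussian cannot represent.
data = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])

grid = np.linspace(-5.0, 5.0, 1001)           # grid[500] is x = 0
mu, sigma = fit_gaussian(data)
p_param = gaussian_pdf(grid, mu, sigma)       # shape fixed in advance: one bell curve
p_kde = kde(grid, data, bandwidth=0.3)        # shape follows the data

print("values stored: parametric", 2, "KDE", data.size)
print("density at x = 0: parametric", p_param[500], "KDE", p_kde[500])
```

Changing the bandwidth changes how smooth the KDE is, but not the fact that its shape is dictated by the data rather than by a fixed functional form.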

2015-05-18 10:11:44

Page 228, 5. Neural Networks

The difference between NNs and graphical models:

... the internal nodes represent deterministic variables rather than stochastic ones.

Quoted from 5. Neural Networks

2015-10-28 10:55:30