05_bayesian_learning
Bayesian learning
conditional probability
The probability that event A occurs given that event B has occurred: $P(A|B)=\frac{P(AB)}{P(B)}$
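A quick worked example (the setup is purely illustrative): flip two fair coins, let $B$ = "the first coin is heads" and $A$ = "both coins are heads". Then $P(A|B)=\frac{P(AB)}{P(B)}=\frac{1/4}{1/2}=\frac{1}{2}$.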
Bayes' theorem
We know $P(A|B)$, but what we actually care about is $P(B|A)$: $P(B|A)=\frac{P(A|B)P(B)}{P(A)}$
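A numerical illustration (the numbers are made up for illustration only): let $B$ = "has the condition" and $A$ = "test is positive", with $P(A|B)=0.99$, $P(B)=0.01$, and $P(A|\neg B)=0.05$. Then $P(B|A)=\frac{0.99\times 0.01}{0.99\times 0.01+0.05\times 0.99}=\frac{0.0099}{0.0594}\approx 0.167$, so a positive result is still far from certain because the prior $P(B)$ is small.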
optimal classifier
If $P(y_k|x)=\max\{P(y_1|x),P(y_2|x),\cdots,P(y_K|x)\}$, then $x\in y_k$.
According to Bayes' theorem: $P(y_i|x)=\frac{P(x|y_i)P(y_i)}{P(x)}$
For every $P(y_i|x)$, the denominator $P(x)$ is the same, so only the numerators matter. Under the naive Bayes assumption that the features are conditionally independent given the class: $P(y_i|x)\varpropto P(y_i)\prod_j P(x_j|y_i)$
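A minimal Python sketch of this decision rule (the priors and per-feature conditional probabilities are assumed to be supplied by the caller; all names are illustrative, and log-probabilities are summed to avoid underflow):

```python
import math

def predict(x, priors, cond_prob):
    """Return the class y maximizing log P(y) + sum_j log P(x_j | y).

    priors    : dict mapping class label -> P(y)
    cond_prob : function(label, j, x_j) -> P(x_j | y=label)
    """
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        score = math.log(prior)
        for j, x_j in enumerate(x):
            score += math.log(cond_prob(label, j, x_j))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Since $P(x)$ is dropped, the scores are unnormalized log-posteriors; only their ranking matters for classification.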
Beta distribution
A probability distribution over a probability itself, supported on the interval $(0,1)$. Probability density function:
$Beta(\theta;a,b)=\frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}$
expectation: $\frac{a}{a+b}$
Binomial distribution ($z$ successes in $N$ trials): $P(data|\theta)\varpropto \theta^z(1-\theta)^{N-z}$
Beta distribution: $Beta(a,b)\varpropto \theta^{a-1}(1-\theta)^{b-1}$
In Bayesian estimation we need to infer $\theta$ from the given data. Substituting the Beta distribution for the prior $P(\theta)$ and the binomial distribution for the likelihood $P(data|\theta)$:
$P(\theta|data)\varpropto P(data|\theta)P(\theta)\varpropto \theta^{a+z-1}(1-\theta)^{b+N-z-1}$
The resulting Bayesian estimate follows a $Beta(a',b')$ distribution with $a'=a+z,\ b'=b+N-z$, i.e. "the Beta distribution is the conjugate prior of the binomial." Normalizing it with the $B$ function gives the posterior density:
$P(\theta|data)=\frac{\theta^{a'-1}(1-\theta)^{b'-1}}{B(a',b')}$
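A tiny Python sketch of this conjugate update (the prior hyperparameters and observed counts below are placeholder values):

```python
# Beta(a, b) prior on theta; data: z successes in N Bernoulli trials.
a, b = 2.0, 2.0      # prior hyperparameters (illustrative values)
z, N = 7, 10         # observed counts (illustrative values)

# Conjugacy: the posterior is Beta(a', b') with a' = a + z, b' = b + N - z.
a_post, b_post = a + z, b + (N - z)

posterior_mean = a_post / (a_post + b_post)   # E[theta | data] = a'/(a'+b')
print(a_post, b_post, posterior_mean)         # -> 9.0 5.0 0.642857...
```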
- maximum likelihood estimation (MLE): choose the value that maximizes the probability of the observed data
- maximum a posteriori estimation (MAP): choose the value that is most probable given the observed data and the prior belief (the two are compared on the coin example below)
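For the coin-flip setting above (Beta$(a,b)$ prior, $z$ successes in $N$ trials), the two estimates can be written out explicitly:
$\hat\theta_{MLE}=\frac{z}{N},\qquad \hat\theta_{MAP}=\arg\max_\theta P(\theta|data)=\frac{z+a-1}{N+a+b-2}$
The MAP estimate is the mode of the $Beta(a+z,\ b+N-z)$ posterior. With a uniform prior ($a=b=1$) it reduces to the MLE; a stronger prior pulls it toward the prior's mode.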
Naive Bayes
Naive Bayes assumption: the features are conditionally independent of one another given the class (e.g., in a bag-of-words spam filter, each word's presence is modeled independently of the others once the class is fixed).
GNB
GNB: Gaussian Naive Bayes, which is designed for continuous features.
Assumptions
- $Y$ is boolean, governed by a Bernoulli distribution, with parameter $\pi=P(Y=1)$
- each $x_i$ is a continuous random variable
- for each $x_i$, $P(x_i|Y=y_k)$ is a Gaussian distribution of the form $N(\mu_{ik},\sigma_{ik})$ (written out after this list)
- the $x_i$ are conditionally independent of one another given $Y$
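Concretely, the assumed class-conditional density for each feature is
$P(x_i|Y=y_k)=\frac{1}{\sqrt{2\pi}\,\sigma_{ik}}\exp\left(-\frac{(x_i-\mu_{ik})^2}{2\sigma_{ik}^2}\right)$
where $\mu_{ik}$ and $\sigma_{ik}$ are estimated from the training examples of class $y_k$.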
With the conditional independence assumption: $P(x_1,\ldots,x_n|Y)=\prod_i P(x_i|Y)$
For boolean features, define: $\theta_{i1}=P(X_i=1|Y=1),\quad \theta_{i0}=P(X_i=1|Y=0)$, and then:
So:
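To make the GNB part concrete, here is a minimal from-scratch sketch under the assumptions listed above (all function and variable names are illustrative, not from the source; a small constant is added to the standard deviations for numerical stability):

```python
import numpy as np

def fit_gnb(X, y):
    """Estimate GNB parameters: class priors and per-class, per-feature mean/std."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),        # P(Y = c)
            "mu": Xc.mean(axis=0),            # mean of each feature given Y = c
            "sigma": Xc.std(axis=0) + 1e-9,   # std of each feature given Y = c
        }
    return params

def predict_gnb(X, params):
    """Classify each row by the largest log P(Y=c) + sum_i log N(x_i; mu_ic, sigma_ic)."""
    labels = list(params)
    scores = []
    for c in labels:
        p = params[c]
        log_lik = -0.5 * np.log(2 * np.pi * p["sigma"] ** 2) \
                  - (X - p["mu"]) ** 2 / (2 * p["sigma"] ** 2)
        scores.append(np.log(p["prior"]) + log_lik.sum(axis=1))
    return np.array(labels)[np.argmax(scores, axis=0)]

# usage (X: (n, d) float array, y: (n,) label array):
# params = fit_gnb(X_train, y_train)
# y_pred = predict_gnb(X_test, params)
```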