Poisson Regression and Exponential Family


(a)

Consider the Poisson distribution parameterized by λ:

p(y;λ) = \frac{e^{-λ} λ^y}{y!}

Show that the Poisson distribution is in the exponential family, and state what b(y), η, T(y) and a(η) are.

Exponential family format:

p(y;η) = b(y)\exp(η^T T(y) - a(η))

Poisson:

p(y;λ) = \frac{1}{y!} e^{-λ} λ^y
= \frac{1}{y!} \exp(-λ) \exp(\ln(λ^y))
= \frac{1}{y!} \exp(y \ln λ - λ)

Matching this against the exponential family format gives b(y) = \frac{1}{y!}, η = \ln λ, T(y) = y, and a(η) = λ = e^η.
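As a quick sanity check, here is a minimal sketch in plain Python (the rate λ = 3.7 is an arbitrary choice) confirming that the exponential-family form with this b(y), η, T(y), a(η) reproduces the Poisson pmf:

```python
import math

lam = 3.7  # arbitrary rate λ for the check

for y in range(10):
    # Poisson pmf: e^{-λ} λ^y / y!
    pmf = math.exp(-lam) * lam**y / math.factorial(y)

    # Exponential-family form with b(y) = 1/y!, η = ln λ, T(y) = y, a(η) = e^η
    eta = math.log(lam)
    expfam = (1.0 / math.factorial(y)) * math.exp(eta * y - math.exp(eta))

    assert abs(pmf - expfam) < 1e-12

print("exponential-family form matches the Poisson pmf for y = 0..9")
```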

(b)

Consider performing regression using a GLM with a Poisson response variable. What is the canonical response function for the family?

(A Poisson random variable with parameter λ has mean λ)

h_Θ(x) = E[y|x;Θ]
= g(η)
= E[T(y);η]
= λ
= e^η
= e^{Θ^T x}
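In code, the canonical response function is just the exponential of the linear predictor. A minimal sketch (assuming NumPy; the parameter and feature values are made up):

```python
import numpy as np

theta = np.array([0.2, -0.5, 1.0])  # hypothetical parameters Θ
x = np.array([1.0, 2.0, 0.3])       # a feature vector (first entry acts as an intercept)

# Canonical response for the Poisson family: h_Θ(x) = E[y|x;Θ] = e^{Θᵀx}
h = np.exp(theta @ x)
print(h)  # the predicted mean (rate λ) of the Poisson response
```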

(c)

For a training set \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}, let the log-likelihood of an example be \log p(y^{(i)}|x^{(i)}). By taking the derivative of the log-likelihood with respect to Θ_j, derive the stochastic gradient ascent rule for learning using a GLM with Poisson response y and the canonical response function.

l(Θ) = \log\left(\frac{1}{y!} e^{-λ} λ^y\right)
= y \log(λ) - \log(y!) - λ
= y \log(e^{Θ^T x}) - \log(y!) - e^{Θ^T x}
= y Θ^T x - \log(y!) - e^{Θ^T x}

 

\frac{\partial l}{\partial Θ_i} = y x_i - e^{Θ^T x} x_i = (y - e^{Θ^T x}) x_i
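This gradient can be sanity-checked numerically against a finite-difference approximation of the log-likelihood above. A minimal sketch, assuming NumPy; the parameters, feature vector, and count are arbitrary:

```python
import numpy as np
from math import lgamma

def log_likelihood(theta, x, y):
    # l(Θ) = yΘᵀx − log(y!) − e^{Θᵀx} for a single example
    return y * (theta @ x) - lgamma(y + 1) - np.exp(theta @ x)

def grad(theta, x, y):
    # Analytic gradient: (y − e^{Θᵀx}) x
    return (y - np.exp(theta @ x)) * x

theta = np.array([0.1, -0.2, 0.4])
x = np.array([1.0, 0.5, -1.5])
y = 2

# Central finite differences along each coordinate Θ_i
eps = 1e-6
numeric = np.array([
    (log_likelihood(theta + eps * e, x, y) - log_likelihood(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(len(theta))
])

print(np.allclose(numeric, grad(theta, x, y), atol=1e-6))  # expect True
```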

Stochastic Gradient Ascent rule:

for each (x^{(j)}, y^{(j)}) in the training set {

Θ_i := Θ_i - α(e^{Θ^T x^{(j)}} - y^{(j)}) x_i^{(j)}

}
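Putting this together, a minimal sketch of the learning loop on synthetic data (assuming NumPy; the learning rate, number of epochs, and data sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y ~ Poisson(exp(Θ_trueᵀx)); bounded features keep exp() well-behaved
theta_true = np.array([0.5, -0.3, 0.8])
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = rng.poisson(np.exp(X @ theta_true))

theta = np.zeros(3)
alpha = 0.01  # learning rate

# Stochastic gradient ascent: Θ_i := Θ_i + α (y^{(j)} − e^{Θᵀx^{(j)}}) x_i^{(j)}
for epoch in range(50):
    for x_j, y_j in zip(X, y):
        theta += alpha * (y_j - np.exp(theta @ x_j)) * x_j

print(theta)  # should land close to theta_true (up to sampling and SGD noise)
```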

Surprisingly, it has the same form as the logistic regression update rule, just with a different hypothesis h_Θ(x)!!!

(d)

Consider using a GLM with a response variable from any member of the exponential family in which T(y) = y, and the canonical response function h(x) for the family. Show that stochastic gradient ascent on the log-likelihood \log p(y|x;Θ) results in the update rule Θ_i := Θ_i - α(h(x) - y)x_i.

l(Θ) = \log p(y|x;Θ)
= \log p(y;η)
= \log\left[b(y)\exp(η^T T(y) - a(η))\right]
= \log\left[b(y)\exp(ηy - a(η))\right]
= \log(b(y)) + ηy - a(η)

 

Since η = Θ^T x, we have \frac{\partial η}{\partial Θ_i} = x_i, so

\frac{\partial l}{\partial Θ_i} = y x_i - \frac{\partial a}{\partial η} x_i
= \left(y - \frac{\partial a}{\partial η}\right) x_i

Almost there, so what is \frac{\partial a}{\partial η}??

\int p(y;η)\,dy = 1
\frac{\partial}{\partial η} \int p(y;η)\,dy = 0
\int \frac{\partial}{\partial η} p(y;η)\,dy = 0
\int b(y)\exp(ηy - a(η))\left(y - \frac{\partial a}{\partial η}\right)dy = 0
\int p(y;η)\left(y - \frac{\partial a}{\partial η}\right)dy = 0
\int p(y;η)\,y\,dy - \frac{\partial a}{\partial η}\int p(y;η)\,dy = 0
E[y;η] = \frac{\partial a}{\partial η}

Since the canonical response function is h(x) = E[y|x;Θ] = E[y;η], this means \frac{\partial a}{\partial η} is exactly h(x)!!
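For the Poisson case this identity is easy to check numerically: a(η) = e^η, so ∂a/∂η = e^η = λ, which should equal E[y]. A minimal sketch in plain Python (the value of λ is arbitrary, and the expectation is a truncated sum):

```python
import math

lam = 2.5                # arbitrary rate λ
eta = math.log(lam)      # natural parameter η = ln λ

da_deta = math.exp(eta)  # ∂a/∂η for a(η) = e^η, which is just λ

# E[y] from the pmf, with the pmf written as exp(yη − a(η) − log y!)
mean = sum(y * math.exp(y * eta - math.exp(eta) - math.lgamma(y + 1)) for y in range(100))

print(abs(mean - da_deta) < 1e-9)  # expect True
```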

\frac{\partial l}{\partial Θ_i} = \left(y - \frac{\partial a}{\partial η}\right) x_i = (y - h(x)) x_i

So the Stochastic Gradient Ascent rule:

for each (x^{(j)}, y^{(j)}) in the training set {

Θ_i := Θ_i - α(h(x^{(j)}) - y^{(j)}) x_i^{(j)}

}
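To make the generality concrete, here is a minimal sketch (assuming NumPy) of this single update rule with a pluggable canonical response function h; choosing the identity, the sigmoid, or the exponential for h recovers least-squares, logistic, and Poisson regression respectively. The function and variable names are illustrative only.

```python
import numpy as np

def sga_epoch(theta, X, y, h, alpha=0.01):
    """One pass of stochastic gradient ascent: Θ := Θ + α (y^{(j)} − h(Θᵀx^{(j)})) x^{(j)}."""
    for x_j, y_j in zip(X, y):
        theta = theta + alpha * (y_j - h(theta @ x_j)) * x_j
    return theta

# Canonical response functions for three exponential-family members
def identity(z): return z                          # Gaussian  -> linear regression
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))    # Bernoulli -> logistic regression
def poisson_h(z): return np.exp(z)                 # Poisson   -> Poisson regression

# Example: logistic regression via the same generic rule, on made-up data
rng = np.random.default_rng(1)
theta_true = np.array([1.0, -0.5])
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (rng.uniform(size=200) < sigmoid(X @ theta_true)).astype(float)

theta = np.zeros(2)
for _ in range(100):
    theta = sga_epoch(theta, X, y, sigmoid)

print(theta)  # should land in the neighborhood of theta_true (up to sampling noise)
```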