Poisson Regression and Exponential Family


(a)

Consider the Poisson distribution parameterized by λ:

p(y;λ) = \frac{e^{-λ} λ^y}{y!}

Show that the Poisson distribution is in the exponential family, and state what b(y), η, T(y) and a(η) are.

Exponential family format:

p(y;η) = b(y)\exp(η^T T(y) - a(η))

Poisson:

p(y;λ) = \frac{1}{y!} e^{-λ} λ^y
= \frac{1}{y!} \exp(-λ) \exp(\ln(λ^y))
= \frac{1}{y!} \exp(y \ln λ - λ)

Matching this against the exponential family format gives b(y) = \frac{1}{y!}, η = \ln λ, T(y) = y, and a(η) = λ = e^η.
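As a quick sanity check, here is a minimal sketch in plain Python (the rate λ = 3.7 is an arbitrary choice) confirming that the exponential-family form with this b(y), η, T(y), a(η) reproduces the Poisson pmf:

```python
import math

lam = 3.7  # arbitrary rate λ for the check

for y in range(10):
    # Poisson pmf: e^{-λ} λ^y / y!
    pmf = math.exp(-lam) * lam**y / math.factorial(y)

    # Exponential-family form with b(y) = 1/y!, η = ln λ, T(y) = y, a(η) = e^η
    eta = math.log(lam)
    expfam = (1.0 / math.factorial(y)) * math.exp(eta * y - math.exp(eta))

    assert abs(pmf - expfam) < 1e-12

print("exponential-family form matches the Poisson pmf for y = 0..9")
```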

(b)

Consider performing regression using a GLM with a Poisson response variable. What is the canonical response function for the family?

(A Poisson random variable with parameter λ has mean λ)

h_Θ(x) = E[y|x;Θ]
= g(η)
= E[T(y);η]
= λ
= e^η
= e^{Θ^T x}
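In code, the canonical response function is just the exponential of the linear predictor. A minimal sketch (assuming NumPy; the parameter and feature values are made up):

```python
import numpy as np

theta = np.array([0.2, -0.5, 1.0])  # hypothetical parameters Θ
x = np.array([1.0, 2.0, 0.3])       # a feature vector (first entry acts as an intercept)

# Canonical response for the Poisson family: h_Θ(x) = E[y|x;Θ] = e^{Θᵀx}
h = np.exp(theta @ x)
print(h)  # the predicted mean (rate λ) of the Poisson response
```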

(c)

For a training set \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}, let the log-likelihood of an example be \log p(y^{(i)}|x^{(i)}). By taking the derivative of the log-likelihood with respect to Θ_j, derive the stochastic gradient ascent rule for learning using a GLM with Poisson response y and the canonical response function.

l(Θ) = \log\left(\frac{1}{y!} e^{-λ} λ^y\right)
= y \log(λ) - \log(y!) - λ
= y \log(e^{Θ^T x}) - \log(y!) - e^{Θ^T x}
= y Θ^T x - \log(y!) - e^{Θ^T x}

 

\frac{\partial l}{\partial Θ_i} = y x_i - e^{Θ^T x} x_i = (y - e^{Θ^T x}) x_i
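This gradient can be sanity-checked numerically against a finite-difference approximation of the log-likelihood above. A minimal sketch, assuming NumPy; the parameters, feature vector, and count are arbitrary:

```python
import numpy as np
from math import lgamma

def log_likelihood(theta, x, y):
    # l(Θ) = yΘᵀx − log(y!) − e^{Θᵀx} for a single example
    return y * (theta @ x) - lgamma(y + 1) - np.exp(theta @ x)

def grad(theta, x, y):
    # Analytic gradient: (y − e^{Θᵀx}) x
    return (y - np.exp(theta @ x)) * x

theta = np.array([0.1, -0.2, 0.4])
x = np.array([1.0, 0.5, -1.5])
y = 2

# Central finite differences along each coordinate Θ_i
eps = 1e-6
numeric = np.array([
    (log_likelihood(theta + eps * e, x, y) - log_likelihood(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(len(theta))
])

print(np.allclose(numeric, grad(theta, x, y), atol=1e-6))  # expect True
```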

Stochastic Gradient Ascent rule:

for each (x^{(j)}, y^{(j)}) in the training set {

Θ_i := Θ_i - α(e^{Θ^T x^{(j)}} - y^{(j)}) x_i^{(j)}

}
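Putting this together, a minimal sketch of the learning loop on synthetic data (assuming NumPy; the learning rate, number of epochs, and data sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y ~ Poisson(exp(Θ_trueᵀx)); bounded features keep exp() well-behaved
theta_true = np.array([0.5, -0.3, 0.8])
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = rng.poisson(np.exp(X @ theta_true))

theta = np.zeros(3)
alpha = 0.01  # learning rate

# Stochastic gradient ascent: Θ_i := Θ_i + α (y^{(j)} − e^{Θᵀx^{(j)}}) x_i^{(j)}
for epoch in range(50):
    for x_j, y_j in zip(X, y):
        theta += alpha * (y_j - np.exp(theta @ x_j)) * x_j

print(theta)  # should land close to theta_true (up to sampling and SGD noise)
```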

Surprisingly, it has the same form as the logistic regression update rule, just with a different hypothesis h_Θ(x)!!!

(d)

Consider using a GLM with a response variable from any member of the exponential family in which T(y) = y, and the canonical response function h(x) for the family. Show that stochastic gradient ascent on the log-likelihood \log p(y|x;Θ) results in the update rule Θ_i := Θ_i - α(h(x) - y)x_i.

l(Θ) = \log p(y|x;Θ)
= \log p(y;η)
= \log\left[b(y)\exp(η^T T(y) - a(η))\right]
= \log\left[b(y)\exp(ηy - a(η))\right]
= \log(b(y)) + ηy - a(η)

 

Since η = Θ^T x, we have \frac{\partial η}{\partial Θ_i} = x_i, so

\frac{\partial l}{\partial Θ_i} = y x_i - \frac{\partial a}{\partial η} x_i
= \left(y - \frac{\partial a}{\partial η}\right) x_i

Almost there, so what is \frac{\partial a}{\partial η}??

\int p(y;η)\,dy = 1
\frac{\partial}{\partial η} \int p(y;η)\,dy = 0
\int \frac{\partial}{\partial η} p(y;η)\,dy = 0
\int b(y)\exp(ηy - a(η))\left(y - \frac{\partial a}{\partial η}\right)dy = 0
\int p(y;η)\left(y - \frac{\partial a}{\partial η}\right)dy = 0
\int p(y;η)\,y\,dy - \frac{\partial a}{\partial η}\int p(y;η)\,dy = 0
E[y;η] = \frac{\partial a}{\partial η}

Since the canonical response function is h(x) = E[y|x;Θ] = E[y;η], this means \frac{\partial a}{\partial η} is exactly h(x)!!
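For the Poisson case this identity is easy to check numerically: a(η) = e^η, so ∂a/∂η = e^η = λ, which should equal E[y]. A minimal sketch in plain Python (the value of λ is arbitrary, and the expectation is a truncated sum):

```python
import math

lam = 2.5                # arbitrary rate λ
eta = math.log(lam)      # natural parameter η = ln λ

da_deta = math.exp(eta)  # ∂a/∂η for a(η) = e^η, which is just λ

# E[y] from the pmf, with the pmf written as exp(yη − a(η) − log y!)
mean = sum(y * math.exp(y * eta - math.exp(eta) - math.lgamma(y + 1)) for y in range(100))

print(abs(mean - da_deta) < 1e-9)  # expect True
```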

\frac{\partial l}{\partial Θ_i} = \left(y - \frac{\partial a}{\partial η}\right) x_i = (y - h(x)) x_i

So the Stochastic Gradient Ascent rule:

for each (x^{(j)}, y^{(j)}) in the training set {

Θ_i := Θ_i - α(h(x^{(j)}) - y^{(j)}) x_i^{(j)}

}
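To make the generality concrete, here is a minimal sketch (assuming NumPy) of this single update rule with a pluggable canonical response function h; choosing the identity, the sigmoid, or the exponential for h recovers least-squares, logistic, and Poisson regression respectively. The function and variable names are illustrative only.

```python
import numpy as np

def sga_epoch(theta, X, y, h, alpha=0.01):
    """One pass of stochastic gradient ascent: Θ := Θ + α (y^{(j)} − h(Θᵀx^{(j)})) x^{(j)}."""
    for x_j, y_j in zip(X, y):
        theta = theta + alpha * (y_j - h(theta @ x_j)) * x_j
    return theta

# Canonical response functions for three exponential-family members
def identity(z): return z                          # Gaussian  -> linear regression
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))    # Bernoulli -> logistic regression
def poisson_h(z): return np.exp(z)                 # Poisson   -> Poisson regression

# Example: logistic regression via the same generic rule, on made-up data
rng = np.random.default_rng(1)
theta_true = np.array([1.0, -0.5])
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (rng.uniform(size=200) < sigmoid(X @ theta_true)).astype(float)

theta = np.zeros(2)
for _ in range(100):
    theta = sga_epoch(theta, X, y, sigmoid)

print(theta)  # should land in the neighborhood of theta_true (up to sampling noise)
```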