
CS229 Homework 0


1. Gradients and Hessians

a.

We differentiate \(f(x) = \frac{1}{2} x^T A x + b^T x\) term by term. The first term is:

$$ f_1(x) = \frac{1}{2} x^T A x $$

Using the identity:

$$ \nabla_x(x^T A x) = (A + A^T)x $$
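
This identity can be verified componentwise. As a brief sketch, writing \(a_{ij}\) for the entries of \(A\) and differentiating with respect to an arbitrary coordinate \(x_k\):

$$ x^T A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j, \qquad \frac{\partial}{\partial x_k}(x^T A x) = \sum_{j=1}^n a_{kj} x_j + \sum_{i=1}^n a_{ik} x_i = (Ax)_k + (A^T x)_k $$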

Therefore:

$$ \nabla_x\left( \frac{1}{2} x^T A x \right) = \frac{1}{2}(A + A^T)x $$

Since \(A\) is symmetric (\(A^T = A\)), this simplifies to:

$$ \frac{1}{2}(A + A)x = Ax $$

The second term is:

$$ f_2(x) = b^T x = \sum_i b_i x_i $$

Its gradient is:

$$ \nabla_x(b^T x) = b $$

Adding the two gradients gives:

$$ \nabla f(x) = Ax + b $$
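
As a sanity check, the result can be compared against a finite-difference approximation. The sketch below is only illustrative: the matrix `A`, the vectors `b` and `x`, and the helper names `f`, `analytic_grad`, and `numeric_grad` are assumptions chosen for this example, not part of the original problem.

```python
def f(x, A, b):
    """Evaluate f(x) = 1/2 x^T A x + b^T x using plain Python lists."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return 0.5 * quad + lin


def analytic_grad(x, A, b):
    """Closed-form gradient Ax + b (valid when A is symmetric)."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]


def numeric_grad(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad


# Illustrative values (assumed for this example); A must be symmetric.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
x = [0.5, -2.0]

g_exact = analytic_grad(x, A, b)
g_approx = numeric_grad(lambda v: f(v, A, b), x)
max_err = max(abs(p - q) for p, q in zip(g_exact, g_approx))
print(g_exact, g_approx, max_err)
```

If `A` were not symmetric, the comparison should instead use \(\frac{1}{2}(A + A^T)x + b\), in line with the general identity above.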

b.

Let \(z = h(x)\), so that:

$$ f(x) = g(z) = g(h(x)) $$

By the chain rule, each component satisfies:

$$ \frac{\partial f}{\partial x_i} = g'(h(x)) \frac{\partial h(x)}{\partial x_i} $$

Stacking the components gives:

$$ \nabla f(x) = \begin{pmatrix} g'(h(x)) \frac{\partial h}{\partial x_1} \\ g'(h(x)) \frac{\partial h}{\partial x_2} \\ \vdots \\ g'(h(x)) \frac{\partial h}{\partial x_n} \end{pmatrix} = g'(h(x)) \nabla h(x) $$
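
The chain-rule formula can also be checked numerically. In this minimal sketch, the choices \(g(z) = \sin z\) and \(h(x) = \sum_i x_i^2\) are hypothetical examples picked only because their derivatives are easy to write down; the problem itself allows any differentiable \(g\) and \(h\).

```python
import math


def h(x):
    """Hypothetical inner function h: R^n -> R (squared Euclidean norm)."""
    return sum(v * v for v in x)


def grad_h(x):
    """Gradient of the squared norm, which is 2x."""
    return [2.0 * v for v in x]


def g(z):
    """Hypothetical outer function g: R -> R."""
    return math.sin(z)


def g_prime(z):
    """Derivative of g."""
    return math.cos(z)


def chain_rule_grad(x):
    """Gradient of f(x) = g(h(x)) computed as g'(h(x)) * grad h(x)."""
    scale = g_prime(h(x))
    return [scale * d for d in grad_h(x)]


def numeric_grad(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad


x = [0.3, -0.7, 1.1]  # arbitrary test point
grad_exact = chain_rule_grad(x)
grad_approx = numeric_grad(lambda v: g(h(v)), x)
max_err = max(abs(p - q) for p, q in zip(grad_exact, grad_approx))
print(grad_exact, grad_approx, max_err)
```

The scalar `scale` plays the role of \(g'(h(x))\): it is computed once and multiplies every component of \(\nabla h(x)\).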

c.

From part a, the \(i\)-th component of the gradient is:

$$ (\nabla f(x))_i = \sum_{j=1}^n a_{ij} x_j + b_i $$

The \((i, j)\) entry of the Hessian is:

$$ (\nabla^2 f(x))_{ij} = \frac{\partial}{\partial x_j} \left( \sum_{k=1}^n a_{ik} x_k + b_i \right) $$

Using the following, where \(\delta_{kj}\) is the Kronecker delta (1 if \(k = j\), 0 otherwise):

$$ \frac{\partial}{\partial x_j}(a_{ik} x_k) = a_{ik}\delta_{kj} $$

we obtain:

$$ (\nabla^2 f(x))_{ij} = \sum_{k=1}^n a_{ik}\delta_{kj} = a_{ij} $$

Therefore:

$$ \nabla^2 f(x) = A $$
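
One way to double-check this is to finite-difference the gradient \(Ax + b\) and compare the result with \(A\). The following is a rough sketch under assumed values: `A`, `b`, `x`, and the helper names `grad_f` and `numeric_hessian` are illustrative and do not come from the original problem.

```python
def grad_f(x, A, b):
    """Gradient of f(x) = 1/2 x^T A x + b^T x for symmetric A: Ax + b."""
    n = len(x)
    return [sum(A[i][k] * x[k] for k in range(n)) + b[i] for i in range(n)]


def numeric_hessian(grad, x, eps=1e-5):
    """Central-difference Hessian: entry (i, j) approximates d(grad_i)/dx_j."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        g_plus = grad(x_plus)
        g_minus = grad(x_minus)
        for i in range(n):
            H[i][j] = (g_plus[i] - g_minus[i]) / (2 * eps)
    return H


# Illustrative values (assumed for this example); A is symmetric.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
x = [0.5, -2.0]

H = numeric_hessian(lambda v: grad_f(v, A, b), x)
max_err = max(abs(H[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(H, max_err)
```

Because the gradient is linear in \(x\), the approximation should match \(A\) up to rounding error at any test point, which reflects the fact that the Hessian does not depend on \(x\).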