1. Gradients and Hessians
a.
Write
$$ f(x) = f_1(x) + f_2(x), \qquad f_1(x) = \frac{1}{2} x^T A x, \quad f_2(x) = b^T x, $$
and differentiate the two terms separately. For the first term, since
$$ \nabla_x(x^T A x) = (A + A^T)x, $$
it follows that
$$ \nabla_x\left( \frac{1}{2} x^T A x \right) = \frac{1}{2}(A + A^T)x. $$
Because \(A\) is symmetric (\(A^T = A\)), this simplifies to
$$ \frac{1}{2}(A + A)x = Ax. $$
For the second term,
$$ f_2(x) = b^T x = \sum_i b_i x_i, $$
the gradient is
$$ \nabla_x(b^T x) = b. $$
Therefore:
$$ \nabla f(x) = Ax + b $$
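As a quick numerical sanity check of this formula (a minimal sketch, not part of the original solution: the symmetric matrix \(A\), vector \(b\), test point \(x\), step size, and tolerance are arbitrary choices of mine, and the check uses central finite differences):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2          # symmetrize, since the problem assumes A = A^T
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: 0.5 * v @ A @ v + b @ v

# Central finite differences approximate each partial derivative of f.
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_fd, A @ x + b, atol=1e-5)
```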
b.
Let \(z = h(x)\), so that
$$ f(x) = g(z) = g(h(x)). $$
By the chain rule, for each component:
$$ \frac{\partial f}{\partial x_i} = g'(h(x)) \frac{\partial h(x)}{\partial x_i} $$
Therefore:
$$
\nabla f(x)
=
\begin{pmatrix}
g'(h(x)) \frac{\partial h}{\partial x_1} \\
g'(h(x)) \frac{\partial h}{\partial x_2} \\
\vdots \\
g'(h(x)) \frac{\partial h}{\partial x_n}
\end{pmatrix}
=
g'(h(x)) \nabla h(x)
$$
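The following small check illustrates this chain rule numerically (a sketch; the concrete choices \(g = \sin\) and \(h(x) = x^T x\) are example functions of mine, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.standard_normal(n)

# Example choices (assumptions, not from the problem):
# g(t) = sin(t) with g'(t) = cos(t), and h(x) = x^T x with grad h(x) = 2x.
g, g_prime = np.sin, np.cos
h = lambda v: v @ v
grad_h = lambda v: 2 * v

f = lambda v: g(h(v))

# Central finite differences approximate each partial derivative of f.
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_fd, g_prime(h(x)) * grad_h(x), atol=1e-5)
```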
c.
From part (a):
$$ (\nabla f(x))_i = \sum_{j=1}^n a_{ij} x_j + b_i $$
The \((i, j)\) entry of the Hessian is:
$$
(\nabla^2 f(x))_{ij}
=
\frac{\partial}{\partial x_j}
\left(
\sum_{k=1}^n a_{ik} x_k + b_i
\right)
$$
Using
$$ \frac{\partial}{\partial x_j}(a_{ik} x_k) = a_{ik}\delta_{kj}, $$
where \(\delta_{kj}\) is the Kronecker delta, we get:
$$ (\nabla^2 f(x))_{ij} = \sum_{k=1}^n a_{ik}\delta_{kj} = a_{ij} $$
Therefore:
$$ \nabla^2 f(x) = A $$
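As in part (a), a finite-difference check confirms the result (a sketch with arbitrary symmetric \(A\), \(b\), and \(x\) of my choosing; second-order central differences of \(f\) should reproduce \(A\) entrywise):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2              # the problem assumes A is symmetric
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: 0.5 * v @ A @ v + b @ v

# Second-order central differences approximate each Hessian entry.
eps = 1e-4
E = np.eye(n)
H = np.array([[
    (f(x + eps * (E[i] + E[j])) - f(x + eps * (E[i] - E[j]))
     - f(x - eps * (E[i] - E[j])) + f(x - eps * (E[i] + E[j]))) / (4 * eps**2)
    for j in range(n)] for i in range(n)])

assert np.allclose(H, A, atol=1e-4)
```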