We have seen that in the general situation we cannot do much; for example, it is impossible to guarantee convergence even to a local minimum. Thus we have to restrict our class of functions. In Chapter 2, smooth (smooth enough) functions are considered, and (strongly) convex functions are introduced. For these objectives we can derive lower complexity bounds and propose optimal gradient-based algorithms.
Convex Functions
In general, the gradient method converges only to a stationary point because of the weakness of the first-order optimality condition (it is only a necessary condition). Guided by this observation, the following assumptions are imposed on the hypothetical class $\mathcal{F}$:
- For any $f \in \mathcal{F}$, the first-order optimality condition is sufficient for a point to be a global solution.
- If $f_1, f_2 \in \mathcal{F}$, then $\alpha f_1 + \beta f_2 \in \mathcal{F}, \forall \alpha, \beta \geq 0$
- Any linear function $f(x) = \alpha + \langle a, x\rangle $ belongs to $\mathcal{F}$.
These assumptions define the so-called convex functions:
Definition: A continuously differentiable function $f(x)$ is called convex on $\mathbb{R}^n$ (notation $f \in \mathcal{F}^1(\mathbb{R}^n)$) if for any $x, y \in \mathbb{R}^n$ we have
$$
f(y) \geq f(x) + \langle f'(x), y - x \rangle
$$
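For example, a quadratic $f(x) = \frac{1}{2}\langle Ax, x\rangle - \langle b, x\rangle$ with symmetric $A \succeq 0$ satisfies this inequality, since $f'(x) = Ax - b$ and
$$
f(y) - f(x) - \langle f'(x), y - x \rangle = \frac{1}{2} \langle A(y - x), y - x \rangle \geq 0
$$
In particular, any linear function ($A = 0$) satisfies the inequality with equality, in agreement with the third assumption above.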
The most important property of a smooth convex function is that any stationary point is a global minimum. If the function is strictly convex, that is, the inequality in the definition above holds strictly whenever $x \neq y$, then the stationary point is the unique global minimum.
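Indeed, if $f'(x^*) = 0$, then applying the definition with $x = x^*$ gives, for every $y$,
$$
f(y) \geq f(x^*) + \langle f'(x^*), y - x^* \rangle = f(x^*)
$$
so $x^*$ is a global minimum.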
Lower complexity bounds for (strongly) convex functions
Smooth convex functions
Consider an iterative method $\mathcal{M}$ that generates a sequence of test points $\{x_k\}$ such that
$$
x_k \in x_0 + \text{Lin} \{ f'(x_0), \ldots, f'(x_{k-1})\}, \quad k \geq 1
$$
For such methods we can prove the following theorem:
Theorem: For any $k$, $1 \leq k \leq \frac{1}{2}(n-1)$, and any $x_0 \in \mathbb{R}^n$, there exists a function $f \in \mathcal{F}_L^{\infty, 1}(\mathbb{R}^n)$ such that for any method $\mathcal{M}$ of the above form we have
$$
\begin{align}
f(x_k) - f(x^*) \geq \frac{3L \|x_0 - x^* \|^2}{32(k+1)^2} \\
\|x_k - x^* \|^2 \geq \frac{1}{8} \| x_0 - x^* \|^2
\end{align}
$$
where $x^*$ is the solution of the minimization problem $\min_{x} f(x)$.
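In particular, within the range $k \leq \frac{1}{2}(n-1)$ where the theorem applies, the first inequality implies that no method of this type can guarantee $f(x_k) - f(x^*) \leq \epsilon$ unless the number of iterations satisfies
$$
k + 1 \geq \sqrt{\frac{3 L \|x_0 - x^*\|^2}{32\,\epsilon}}
$$
that is, the iteration count must grow at least proportionally to $\sqrt{L \|x_0 - x^*\|^2 / \epsilon}$.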
Actually, the “worst function in the world” introduced in the book is
$$
f_{p}(x) = \frac{L}{4} \left\{ \frac{1}{2} \left[ (x^1)^2 + \sum_{i=1}^{p-1} (x^i - x^{i+1})^2 + (x^p)^2 \right] - x^1\right\}
$$
with $p = 2k+1$.
Although this bound is valid only when the number of iterations is not too large ($k \leq \frac{1}{2}(n-1)$), it characterizes the potential performance of numerical methods at the initial stage of the minimization process. Note that the lower bound for the objective is rather optimistic (it decreases quadratically in $k$), while the convergence of the iterates to the optimal point can be arbitrarily slow.
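As a quick numerical illustration (a minimal sketch, not from the book; the function names `f_p` and `grad_f_p`, the step size $1/L$, and the problem sizes are my own choices), the snippet below implements $f_p$ and its gradient and runs plain gradient descent from $x_0 = 0$. Because the gradient of $f_p$ is tridiagonal, each iteration can extend the support of the iterate by at most one coordinate, so after $k$ steps only the first $k$ coordinates can be nonzero; this is exactly the mechanism behind the lower bound.

```python
import numpy as np

def f_p(x, L=1.0):
    """Worst-case function: L/4 * ( (1/2)[x_1^2 + sum_i (x_i - x_{i+1})^2 + x_p^2] - x_1 )."""
    quad = x[0] ** 2 + np.sum((x[:-1] - x[1:]) ** 2) + x[-1] ** 2
    return L / 4 * (0.5 * quad - x[0])

def grad_f_p(x, L=1.0):
    """Gradient L/4 * (A x - e_1), where A = tridiag(-1, 2, -1)."""
    g = 2 * x.copy()
    g[:-1] -= x[1:]   # -x_{i+1} terms
    g[1:] -= x[:-1]   # -x_{i-1} terms
    g[0] -= 1.0       # -e_1
    return L / 4 * g

L_const, k = 1.0, 50
n = 2 * k + 1                        # p = n = 2k + 1, so k = (n - 1) / 2
x = np.zeros(n)                      # start at the origin
for _ in range(k):
    x = x - (1.0 / L_const) * grad_f_p(x, L_const)   # gradient step with step size 1/L

x_star = 1.0 - np.arange(1, n + 1) / (n + 1)         # minimizer: x*_i = 1 - i/(p+1)
gap = f_p(x, L_const) - f_p(x_star, L_const)
lower = 3 * L_const * np.linalg.norm(x_star) ** 2 / (32 * (k + 1) ** 2)
print(f"f(x_k) - f* = {gap:.4e} >= lower bound {lower:.4e}")
print("nonzero coordinates of x_k:", np.count_nonzero(np.abs(x) > 1e-12), "out of", n)
```

The printed gap should stay above the theoretical lower bound, since gradient descent belongs to the class of methods covered by the theorem, and the nonzero-coordinate count confirms that the iterate never leaves the span of the first $k$ coordinates.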