Why Are Convex Optimisation Problems Considered to Be Easy to Solve
Convex Problems
- Mathematical programming problems
- Convex optimization problems
- Optimality
- Equivalent problems
- Complexity
Mathematical programming problems
Definition
A mathematical programming problem is one of the form
\[
p^\ast := \min_x \; f_0(x) \quad \mbox{subject to} \quad f_i(x) \le 0, \;\; i=1,\ldots,m,
\]
where
- \(x \in \mathbf{R}^n\) is the decision variable;
- \(f_0 : \mathbf{R}^n \rightarrow \mathbf{R}\) is the objective function;
- \(f_i : \mathbf{R}^n \rightarrow \mathbf{R}\), \(i=1,\ldots,m\), represent the constraints;
- \(p^\ast\) is the optimal value.
The term "programming" (or "program") does not refer to a computer code. It is used mainly for historical purposes. A more rigorous (but less popular) term is "optimization problem". The term "subject to" is often replaced by a colon.
By convention, the optimal value \(p^\ast\) is set to \(+\infty\) if the problem is not feasible. The optimal value can also assume the value \(-\infty\), in which case we say that the problem is unbounded below. An example of a problem that is unbounded below is an unconstrained problem with \(f_0(x) = \log x\), with domain \(\mathbf{R}_{++}\).
The set
\[
{\cal X} = \left\{ x \in \mathbf{R}^n : f_i(x) \le 0, \;\; i=1,\ldots, m \right\}
\]
is called the feasible set. Any \(x \in {\cal X}\) is called feasible (with respect to the specific optimization problem at hand).
Alternate representations
Sometimes, the model is described in terms of the feasible set, as follows:
\[
p^\ast := \min_{x \in {\cal X}} \; f_0(x) .
\]
This representation is not strictly equivalent to the more explicit form above, since the set \({\cal X}\) may have several equivalent representations in terms of explicit constraints. (Think for example about a polytope described by its vertices or as the intersection of half-spaces.)
Also, sometimes a maximization problem is considered:
\[
p^\ast := \max_x \; f_0(x) \quad \mbox{subject to} \quad f_i(x) \le 0, \;\; i=1,\ldots,m.
\]
Of course, we can always change the sign of \(f_0\) and transform the maximization problem into a minimization one. For a maximization problem, the optimal value is set by convention to \(-\infty\) if the problem is not feasible.
Feasibility problems
In some instances, we do not care about any objective function, and simply seek a feasible point. This so-called feasibility problem can be formulated in the standard form, using a zero (or constant) objective.
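As an added illustration, here is a minimal sketch of a feasibility problem posed in standard form with a constant objective; it assumes the cvxpy package and made-up data, neither of which comes from the original notes.
```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
A = np.random.randn(5, 3)
b = A @ np.random.randn(3) + 1.0   # constructed so that a feasible point exists

x = cp.Variable(3)
# Zero (constant) objective: we only ask for a point satisfying A x <= b.
problem = cp.Problem(cp.Minimize(0), [A @ x <= b])
problem.solve()
print(problem.status, x.value)     # status "optimal" means a feasible point was found
```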
Convex optimization problems
Standard form
The problem
\[
p^\ast := \min_x \; f_0(x) \quad \mbox{subject to} \quad f_i(x) \le 0, \;\; i=1,\ldots,m, \quad Ax = b,
\]
is called a convex optimization problem if the objective function \(f_0\) is convex; the functions \(f_i\), \(i=1,\ldots,m\), defining the inequality constraints are convex; and \(A \in \mathbf{R}^{p \times n}\), \(b \in \mathbf{R}^p\) define the affine equality constraints. Note that, in the convex optimization model, we do not tolerate equality constraints unless they are affine.
Examples:
- Least-squares problem:
\[
\min_x \; \|Ax - b\|_2^2 ,
\]
where \(A \in \mathbf{R}^{m \times n}\), \(b \in \mathbf{R}^m\), and \(\|\cdot\|_2\) denotes the Euclidean norm. This problem arises in many situations, for example in statistical estimation problems such as linear regression. The problem dates back many years, at least to Gauss (1777-1855), who solved it to predict the trajectory of the planetoid Ceres.
- Linear programming problem:
\[
\min_x \; c^Tx \quad \mbox{subject to} \quad a_i^Tx \le b_i, \;\; i=1,\ldots,m,
\]
where \(c \in \mathbf{R}^n\), \(a_i \in \mathbf{R}^n\), \(b_i \in \mathbf{R}\) (\(i=1,\ldots,m\)). This corresponds to the case where the functions \(f_i\) (\(i=0,\ldots,m\)) in the standard form above are all affine (that is, linear plus a constant term). This problem was introduced by Dantzig in the 1940s in the context of logistical problems arising in military operations. It is perhaps the most widely used optimization model today.
- Quadratic programming problem:
\[
\min_x \; x^TQx + c^Tx \quad \mbox{subject to} \quad a_i^Tx \le b_i, \;\; i=1,\ldots,m,
\]
where \(Q\) is symmetric and positive semi-definite (all of its eigenvalues are non-negative). This model can be thought of as a generalization of both the least-squares and linear programming problems. QPs are popular in many areas, such as finance, where the linear term in the objective refers to the expected negative return on an investment, and the squared term corresponds to the risk (or variance of the return). This model was introduced by Markowitz (who was a student of Dantzig) in the 1950s, to model investment problems. (Markowitz won the Nobel Prize in Economics in 1990, mainly for this work.)
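The three examples above translate almost verbatim into code. The following sketch is an added illustration, not part of the original notes; it assumes the cvxpy modeling package and uses arbitrary random data, with the LP cost vector chosen so that the problem is bounded below.
```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n = 20, 5
A = np.random.randn(m, n)
b = A @ np.random.randn(n) + np.random.rand(m)   # A x <= b has a feasible point
c = -A.T @ np.ones(m)                            # keeps the LP bounded below
x = cp.Variable(n)

# Least-squares: minimize ||A x - b||_2^2 (unconstrained).
ls = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)))

# Linear program: minimize c^T x subject to a_i^T x <= b_i.
lp = cp.Problem(cp.Minimize(c @ x), [A @ x <= b])

# Quadratic program: Q symmetric positive semi-definite by construction.
Q = A.T @ A
qp = cp.Problem(cp.Minimize(cp.quad_form(x, Q) + c @ x), [A @ x <= b])

print(ls.solve(), lp.solve(), qp.solve())
```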
Optimality
Local and global optima
A feasible point \(x^\ast\) is globally optimal (optimal for short) if \(f_0(x^\ast) = p^\ast\).
A feasible point \(x^\ast\) is locally optimal if there exists \(R > 0\) such that \(f_0(x^\ast)\) equals the optimal value of the problem above with the added constraint \(\|x - x^\ast\|_2 \le R\). That is, \(x^\ast\) solves the problem ''locally''.
For convex problems, any locally optimal point is globally optimal.
Indeed, let \(x^\ast\) be a local minimizer of \(f_0\) on the set \({\cal X}\), and let \(y \in {\cal X}\). By definition, \(x^\ast \in {\bf dom}\, f_0\). We need to prove that \(f_0(y) \ge f_0(x^\ast)\). There is nothing to prove if \(f_0(y) = +\infty\), so let us assume that \(y \in {\bf dom}\, f_0\). By convexity of \(f_0\) and \({\cal X}\), we have \(x_\theta := \theta y + (1-\theta) x^\ast \in {\cal X}\) for every \(\theta \in [0,1]\), and:
\[
f_0(x_\theta) - f_0(x^\ast) \le \theta \left( f_0(y) - f_0(x^\ast) \right) .
\]
Since \(x^\ast\) is a local minimizer, the left-hand side in this inequality is nonnegative for all small enough values of \(\theta > 0\). We conclude that the right-hand side is nonnegative, i.e., \(f_0(y) \ge f_0(x^\ast)\), as claimed.
Optimal set
The optimal set, \({\cal X}^{\rm opt}\), is the set of optimal points. This set may be empty: for example, the feasible set may be empty. Another example is when the optimal value is only reached in the limit; think for example of the case when \(f_0(x) = e^x\), \(n = 1\), and there are no constraints.
In any case, the optimal set is convex, since it can be written
\[
{\cal X}^{\rm opt} = \left\{ x \in \mathbf{R}^n : f_0(x) \le p^\ast, \;\; x \in {\cal X} \right\} .
\]
Optimality condition
When \(f_0\) is differentiable, then we know that for every \(x, y \in {\bf dom}\, f_0\),
\[
f_0(y) \ge f_0(x) + \nabla f_0(x)^T (y - x) .
\]
Then a feasible \(x\) is optimal if and only if
\[
\nabla f_0(x)^T (y - x) \ge 0 \quad \mbox{for every } y \in {\cal X} .
\]
If \(\nabla f_0(x) \ne 0\), then it defines a supporting hyperplane to the feasible set at \(x\).
Examples:
- For unconstrained problems, the optimality condition reduces to \(\nabla f_0(x) = 0\).
- For problems with equality constraints only (\(Ax = b\)), the condition is that there exists a vector \(\nu\) such that
\[
x \in {\bf dom}\, f_0, \quad Ax = b, \quad \nabla f_0(x) = A^T\nu .
\]
Indeed, the optimality condition can be written as: \(\nabla f_0(x)^T (y - x) \ge 0\) for every \(y\) such that \(Ay = b\), which is the same as \(\nabla f_0(x)^T z \ge 0\) for every \(z\) in the nullspace of \(A\). In turn, since the nullspace is a subspace, the latter means that \(\nabla f_0(x)\) belongs to its orthogonal complement, the range of \(A^T\), as claimed. The optimality conditions given above might be hard to solve. We will return to this issue later.
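To make the equality-constrained condition concrete, here is a small numerical sketch (an added illustration; the choice \(f_0(x) = \|x - c\|_2^2\) and the random data are assumptions, not from the notes). The condition \(\nabla f_0(x) = A^T\nu\) together with \(Ax = b\) is a linear system in \((x, \nu)\):
```python
import numpy as np

np.random.seed(0)
n, p = 5, 2
A = np.random.randn(p, n)
b = np.random.randn(p)
c = np.random.randn(n)

# Minimize f0(x) = ||x - c||^2 subject to A x = b.
# Optimality: 2 (x - c) = A^T nu and A x = b, a linear system in (x, nu).
K = np.block([[2.0 * np.eye(n), -A.T],
              [A, np.zeros((p, p))]])
rhs = np.concatenate([2.0 * c, b])
sol = np.linalg.solve(K, rhs)
x, nu = sol[:n], sol[n:]

print(np.allclose(A @ x, b))                 # feasibility holds
print(np.allclose(2.0 * (x - c), A.T @ nu))  # the gradient lies in the range of A^T
```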
Equivalent problems
We can transform a convex problem into an equivalent one via a number of transformations. Sometimes the transformation is useful to obtain an explicit solution, or is done for algorithmic purposes. The transformation does not necessarily preserve the convexity properties of the problem. Here is a list of transformations that do preserve convexity.
Epigraph form
Sometimes it is convenient to work with the equivalent epigraph form:
\[
\min_{x, t} \; t \quad \mbox{subject to} \quad f_0(x) \le t, \;\; f_i(x) \le 0, \;\; i=1,\ldots,m, \quad Ax = b,
\]
in which we observe that we can always assume the cost function to be differentiable (in fact, linear), at the cost of adding one scalar variable \(t\).
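As a quick sanity check (an added sketch on made-up data, assuming the cvxpy package), the least-squares problem and its epigraph form return the same optimal value:
```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
A = np.random.randn(10, 3)
b = np.random.randn(10)

x = cp.Variable(3)
t = cp.Variable()

# Direct form: minimize ||A x - b||_2^2.
direct = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)))
# Epigraph form: minimize t subject to ||A x - b||_2^2 <= t (linear objective).
epigraph = cp.Problem(cp.Minimize(t), [cp.sum_squares(A @ x - b) <= t])

print(direct.solve(), epigraph.solve())   # the two optimal values coincide
```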
Implicit constraints
Even though some problems appear to be unconstrained, they might contain implicit constraints. Precisely, the problem above has an implicit constraint \(x \in {\cal D}\), where \({\cal D}\) is the problem's domain
\[
{\cal D} := {\bf dom}\, f_0 \cap \bigcap_{i=1}^m {\bf dom}\, f_i .
\]
For example, the problem
\[
\min_x \; -\sum_{i=1}^m \log(b_i - a_i^Tx),
\]
where \(b \in \mathbf{R}^m\) and the \(a_i^T\)'s are the rows of \(A \in \mathbf{R}^{m \times n}\), arises as an important sub-problem in some linear optimization algorithms. This problem has the implicit constraint that \(x\) should belong to the interior of the polyhedron \(\{x : Ax \le b\}\).
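A minimal sketch of the implicit constraint, on made-up data (added here for illustration): the objective below is finite only when \(x\) lies strictly inside the polyhedron, so the domain acts as a constraint even though none is stated explicitly.
```python
import numpy as np

np.random.seed(0)
m, n = 8, 2
A = np.random.randn(m, n)
b = np.random.rand(m) + 0.1          # x = 0 is strictly feasible: A @ 0 = 0 < b

def barrier_objective(x):
    """f0(x) = -sum_i log(b_i - a_i^T x); +inf outside the open polyhedron."""
    slack = b - A @ x
    if np.any(slack <= 0):
        return np.inf                # the implicit constraint A x < b is violated
    return -np.sum(np.log(slack))

print(barrier_objective(np.zeros(n)))   # finite: inside the domain
x_bad = 100.0 * A[0]                    # far along the first constraint normal
print(barrier_objective(x_bad))         # inf: outside the domain
```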
Making explicit constraints implicit
The problem in standard form can also be written in a form that makes the constraints that are explicit in the original problem, implicit. Indeed, an equivalent formulation is the unconstrained convex problem
\[
\min_x \; g(x),
\]
where \(g\) is the sum of the original objective \(f_0\) and the indicator function of the feasible set \({\cal X}\):
\[
g(x) := f_0(x) + \mathbf{1}_{\cal X}(x), \quad \mbox{where} \quad \mathbf{1}_{\cal X}(x) := \left\{ \begin{array}{ll} 0 & \mbox{if } x \in {\cal X}, \\ +\infty & \mbox{otherwise.} \end{array} \right.
\]
In the unconstrained problem above, the constraints are implicit. One of the main differences with the original, constrained problem is that now the objective function may not be differentiable, even if all the functions \(f_i\) are.
A less radical approach involves the convex problem with one inequality constraint
\[
\min_x \; f_0(x) \quad \mbox{subject to} \quad f(x) \le 0, \quad \mbox{where} \quad f(x) := \max_{1 \le i \le m} f_i(x),
\]
which is equivalent to the original problem. In the above formulation, the structure of the inequality constraints is made implicit. Here, the reduction to a single constraint has a cost, since the function \(f\) may not be differentiable, even though all the \(f_i\)'s are.
The above transformations show the versatility of the convex optimization model. They are also useful in the analysis of such problems.
Equality constraint elimination
We can eliminate the equality constraints \(Ax = b\) by writing them as \(x = x_0 + Nz\), with \(x_0\) a particular solution to the equality constraints, and the columns of \(N\) spanning the nullspace of \(A\). Then we can rewrite the problem as one without equality constraints:
\[
\min_z \; f_0(x_0 + Nz) \quad \mbox{subject to} \quad f_i(x_0 + Nz) \le 0, \;\; i=1,\ldots,m.
\]
This transformation preserves the convexity of the functions involved. In practice, it may not be a good idea to perform this elimination. For example, if \(A\) is sparse, the original problem has a sparse structure that may be exploited by some algorithms. In contrast, the reduced problem above does not inherit the sparsity characteristics, since in general the matrix \(N\) is dense.
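Here is a small numerical sketch of the elimination (an added illustration on an equality-constrained least-squares instance; the data are arbitrary and scipy's null_space is used to build \(N\)):
```python
import numpy as np
from scipy.linalg import null_space

np.random.seed(0)
m, n, p = 12, 6, 2
A_ls = np.random.randn(m, n)    # objective data: minimize ||A_ls x - b_ls||^2
b_ls = np.random.randn(m)
A = np.random.randn(p, n)       # equality constraints A x = b
b = np.random.randn(p)

x0 = np.linalg.lstsq(A, b, rcond=None)[0]   # a particular solution of A x = b
N = null_space(A)                           # columns of N span the nullspace of A

# Reduced, unconstrained problem in z: minimize ||A_ls (x0 + N z) - b_ls||^2.
z = np.linalg.lstsq(A_ls @ N, b_ls - A_ls @ x0, rcond=None)[0]
x = x0 + N @ z

print(np.allclose(A @ x, b))    # the eliminated equality constraints hold at x
```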
Introducing equality constraints
We can also introduce equality constraints in the problem. There might be several justifications for doing so: to reduce a given problem to a standard form used by off-the-shelf algorithms, or to enable decomposition methods for large-scale optimization.
The following example shows that introducing equality constraints may allow us to exploit sparsity patterns inherent to the problem. Consider
\[
\min_{x_1, x_2, y} \; f_1(x_1) + f_2(x_2) \quad \mbox{subject to} \quad g_1(x_1, y) \le 0, \;\; g_2(x_2, y) \le 0.
\]
In the above, the objective involves different optimization variables, which are coupled via the presence of the ''coupling'' variable \(y\) in each constraint. We can introduce new variables \(y_1, y_2\), replacing \(y\) in the first and second constraints respectively, and rewrite the problem as
\[
\min_{x_1, x_2, y_1, y_2} \; f_1(x_1) + f_2(x_2) \quad \mbox{subject to} \quad g_1(x_1, y_1) \le 0, \;\; g_2(x_2, y_2) \le 0, \;\; y_1 = y_2.
\]
Now the objective and inequality constraints are all independent (they involve different optimization variables). The only coupling constraint is now an equality constraint. This can be exploited in parallel algorithms for large-scale optimization.
Slack variables
Sometimes it is useful to introduce slack variables. For example, the problem with affine inequalities
\[
\min_x \; f_0(x) \quad \mbox{subject to} \quad a_i^Tx \le b_i, \;\; i=1,\ldots,m,
\]
can be written
\[
\min_{x, s} \; f_0(x) \quad \mbox{subject to} \quad a_i^Tx + s_i = b_i, \;\; s_i \ge 0, \;\; i=1,\ldots,m.
\]
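A short numerical check (added here; the data are arbitrary but constructed so that both linear programs are feasible and bounded) that the slack-variable form gives the same optimal value, using scipy.optimize.linprog:
```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.random(m)   # a strictly feasible point exists
c = -A.T @ rng.random(m)                         # keeps the minimization bounded below

# Original form: minimize c^T x subject to A x <= b, x free.
res_ineq = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * n)

# Slack form: minimize c^T x subject to A x + s = b, s >= 0, over (x, s).
c_ext = np.concatenate([c, np.zeros(m)])
A_eq = np.hstack([A, np.eye(m)])
bounds = [(None, None)] * n + [(0, None)] * m
res_slack = linprog(c_ext, A_eq=A_eq, b_eq=b, bounds=bounds)

print(res_ineq.fun, res_slack.fun)   # the two optimal values agree
```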
Minimizing over some variables
We may ''eliminate'' some variables of the problem and reduce it to one with fewer variables. This operation preserves convexity. Specifically, if \(f\) is a convex function of the variable \(x \in \mathbf{R}^n\), and \(x\) is partitioned as \(x = (x_1, x_2)\), with \(x_1 \in \mathbf{R}^{n_1}\), \(x_2 \in \mathbf{R}^{n_2}\), \(n_1 + n_2 = n\), then the function
\[
\tilde{f}(x_1) := \min_{x_2} \; f(x_1, x_2)
\]
is convex (look at the epigraph of this function). Hence the problem of minimizing \(f\) over \(x\) can be reduced to one involving \(x_1\) only:
\[
\min_{x_1} \; \tilde{f}(x_1) .
\]
The reduction may not be easy to carry out explicitly.
Here is an example where it is: consider the problem
\[
\min_{x_1, x_2} \; x^TQx, \quad x = (x_1, x_2),
\]
where
\[
Q = \left( \begin{array}{cc} Q_{11} & Q_{12} \\ Q_{12}^T & Q_{22} \end{array} \right)
\]
is positive semi-definite, \(Q_{22}\) is positive-definite, and \(Q_{11} \in \mathbf{R}^{n_1 \times n_1}\), \(Q_{12} \in \mathbf{R}^{n_1 \times n_2}\). Since \(Q \succeq 0\), the above problem is convex. Furthermore, since the problem has no constraints on \(x_2\), it is possible to solve the minimization with respect to \(x_2\) analytically: setting the gradient with respect to \(x_2\) to zero gives \(x_2 = -Q_{22}^{-1}Q_{12}^Tx_1\). We end up with
\[
\min_{x_1} \; x_1^T\tilde{Q}x_1 ,
\]
with \(\tilde{Q} := Q_{11} - Q_{12}Q_{22}^{-1}Q_{12}^T\) (the Schur complement of \(Q_{22}\) in \(Q\)). From the reasoning above, we infer that \(\tilde{Q}\) is positive semi-definite, since the objective function of the reduced problem is convex.
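A small numerical check (added here on made-up data) that the reduced matrix \(\tilde{Q}\) is indeed positive semi-definite, and that the analytic minimization over \(x_2\) reproduces the original objective value:
```python
import numpy as np

np.random.seed(0)
n1, n2 = 3, 2
M = np.random.randn(n1 + n2, n1 + n2)
Q = M.T @ M                      # symmetric and positive semi-definite by construction
Q11, Q12, Q22 = Q[:n1, :n1], Q[:n1, n1:], Q[n1:, n1:]

# Schur complement of Q22 in Q: the reduced objective is x1^T Q_tilde x1.
Q_tilde = Q11 - Q12 @ np.linalg.solve(Q22, Q12.T)
print(np.all(np.linalg.eigvalsh(Q_tilde) >= -1e-10))   # Q_tilde is positive semi-definite

# At any x1, the minimizer over x2 is x2 = -Q22^{-1} Q12^T x1.
x1 = np.random.randn(n1)
x2 = -np.linalg.solve(Q22, Q12.T @ x1)
x = np.concatenate([x1, x2])
print(np.isclose(x @ Q @ x, x1 @ Q_tilde @ x1))         # reduced value matches
```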
Complexity
Complexity of an optimization problem refers to the difficulty of solving the problem on a computer. Roughly speaking, complexity estimates provide the number of flops needed to solve the problem to a given accuracy \(\epsilon\) on the objective. That number is usually expressed in terms of the problem size, as well as the accuracy \(\epsilon\).
The complexity usually refers to a specific method to solve the problem. For example, we may say that the effort of solving a dense linear optimization problem with \(n\) variables and \(m\) constraints to accuracy \(\epsilon\) using an ''interior-point'' method (a term that refers to a class of methods that are provably efficient for a large class of convex optimization problems) grows polynomially in \(m\) and \(n\), and only as \(\log(1/\epsilon)\) in the accuracy.
The complexity of an optimization problem also depends on its structure. Two seemingly similar problems may require widely different computational efforts to solve. Some problems are "NP-hard", which roughly means that they cannot be solved in reasonable time on a computer. As an example, the quadratic programming problem seen above is "easy" to solve; however, the apparently similar problem obtained by allowing \(Q\) to have negative eigenvalues is NP-hard.
In the early days of optimization, it was thought that linearity was what distinguished a hard problem from an easy one. Today, it appears that convexity is the relevant notion. Roughly speaking, a convex problem is easy. In the next section, we will refine this statement.
Source: https://inst.eecs.berkeley.edu/~ee227a/fa10/login/l_cvx_pbs.html