Ellipsoid Method and Its Amazing Oracles 🔮
When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
Sir Arthur Conan Doyle, stated by Sherlock Holmes
📖 Introduction
Common Perspective of Ellipsoid Method
-
It is widely believed to be inefficient in practice for large-scale problems.
-
Convergent rate is slow, even when using deep cuts.
-
Cannot exploit sparsity.
-
-
It has since then supplanted by the interior-point methods.
-
Used only as a theoretical tool to prove polynomial-time solvability of some combinatorial optimization problems.
But...
-
The ellipsoid method works very differently compared with the interior point methods.
-
It only requires a separation oracle that provides a cutting plane.
-
Although the ellipsoid method cannot take advantage of the sparsity of the problem, the separation oracle is capable of take advantage of certain structural types.
Consider the ellipsoid method when...
-
The number of design variables is moderate, e.g. ECO flow, analog circuit sizing, parametric problems
-
The number of constraints is large, or even infinite
-
Oracle can be implemented effectively.
🥥 Cutting-plane Method Revisited
Convex Set
- Let be a convex set 🥚.
- Consider the feasibility problem:
- Find a point in ,
- or determine that is empty (i.e., there is no feasible solution)
🔮 Separation Oracle
- When a separation oracle is queried at , it either
- asserts that , or
- returns a separating hyperplane between and :
🔮 Separation Oracle (cont'd)
-
is called a cutting-plane, or cut, because it eliminates the half-space from our search.
-
If ( is on the boundary of halfspace that is cut), the cutting-plane is called neutral cut.
-
If ( lies in the interior of halfspace that is cut), the cutting-plane is called deep cut.
-
If ( lies in the exterior of halfspace that is cut), the cutting-plane is called shallow cut.
Subgradient
-
is usually given by a set of inequalities or for , where is a convex function.
-
A vector is called a subgradient of a convex function at if .
-
Hence, the cut is given by
Remarks:
- If is differentiable, we can simply take
Key components of Cutting-plane method
- A cutting plane oracle
- A search space initially large enough to cover , e.g.
- Polyhedron =
- Interval = (for one-dimensional problem)
- Ellipsoid =
Outline of Cutting-plane method
- Given initial known to contain .
- Repeat
- Choose a point in
- Query the cutting-plane oracle at
- If , quit
- Otherwise, update to a smaller set that covers:
- If or it is small enough, quit.
Corresponding Python code
def cutting_plane_feas(omega, space, options=Options()):
for niter in range(options.max_iters):
cut = omega.assess_feas(space.xc()) # query the oracle
if cut is None: # feasible sol'n obtained
return space.xc(), niter
status = space.update_deep_cut(cut) # update space
if status != CutStatus.Success or space.tsq() < options.tol:
return None, niter
return None, options.max_iters
From Feasibility to Optimization
-
The optimization problem is treated as a feasibility problem with an additional constraint .
-
could be a convex or a quasiconvex function.
-
is also called the best-so-far value of .
Convex Optimization Problem
-
Consider the following general form:
where is the -sublevel set of .
-
👉 Note: if and only if (monotonicity)
-
One easy way to solve the optimization problem is to apply the binary search on .
Shrinking
-
Another possible way is, to update the best-so-far whenever a feasible solution is found by solving the equation:
-
If the equation is difficuit to solve but is also convex w.r.t. , then we may create a new varaible, say and let .
Outline of Cutting-plane method (Optim)
- Given initial known to contain .
- Repeat
- Choose a point in
- Query the separation oracle at
- If , update such that .
- Update to a smaller set that covers:
- If or it is small enough, quit.
Corresponding Python code
def cutting_plane_optim(omega, S, gamma, options=Options()):
x_best = None
for niter in range(options.max_iters):
cut, gamma1 = omega.assess_optim(space.xc(), gamma)
if gamma1 is not None: # better \gamma obtained
gamma = gamma1
x_best = copy.copy(space.xc())
status = space.update_central_cut(cut)
else:
status = space.update_deep_cut(cut)
if status != CutStatus.Success or space.tsq() < options.tol:
return x_best, target, niter
return x_best, gamma, options.max_iters
Example - Profit Maximization Problem
This example is taken from [@Aliabadi2013Robust].
- : Cobb-Douglas production function
- : the market price per unit
- : the scale of production
- : the output elasticities
- : input quantity
- : output price
- : a given constant that restricts the quantity of
Example - Profit maximization (cont'd)
- The formulation is not in the convex form.
- Rewrite the problem in the following form:
Profit maximization in Convex Form
-
By taking the logarithm of each variable:
- , .
-
We have the problem in a convex form:
Corresponding Python code
class ProfitOracle:
def __init__(self, params, elasticities, price_out):
unit_price, scale, limit = params
self.log_pA = math.log(unit_price * scale)
self.log_k = math.log(limit)
self.price_out = price_out
self.el = elasticities
def assess_optim(self, y, gamma):
if (fj := y[0] - self.log_k) > 0.0: # constraint
return (np.array([1.0, 0.0]), fj), None
log_Cobb = self.log_pA + self.el.dot(y)
q = self.price_out * np.exp(y)
qsum = q[0] + q[1]
if (fj := math.log(gamma + qsum) - log_Cobb) > 0.0:
return (q / (gamma + qsum) - self.el, fj), None
Cobb = np.exp(log_Cobb) # shrinking
return (q / Cobb - self.el, 0.0), Cobb - qsum
Main program
import numpy as np
from ellalgo.cutting_plane import cutting_plane_optim
from ellalgo.ell import Ell
from ellalgo.oracles.profit_oracle import ProfitOracle
p, A, k = 20.0, 40.0, 30.5
params = p, A, k
alpha, beta = 0.1, 0.4
v1, v2 = 10.0, 35.0
el = np.array([alpha, beta])
v = np.array([v1, v2])
r = np.array([100.0, 100.0]) # initial ellipsoid (sphere)
ellip = Ell(r, np.array([0.0, 0.0]))
omega = ProfitOracle(params, el, v)
xbest, \gamma, num_iters = cutting_plane_optim(omega, ellip, 0.0)
Area of Applications
- Robust convex optimization
- oracle technique: affine arithmetic
- Semidefinite programming
- oracle technique: Cholesky or factorization
- Parametric network potential problem
- oracle technique: negative cycle detection
Robust Convex Optimization
Robust Optimization Formulation
-
Consider:
where represents a set of varying parameters.
-
The problem can be reformulated as:
Example - Profit Maximization Problem (convex)
- Now assume that:
- and vary and respectively.
- , , , and all vary .
Example - Profit Maximization Problem (oracle)
By detail analysis, the worst case happens when:
- ,
- , ,
- if , , else
- if , , else
Corresponding Python code
class ProfitRbOracle(OracleOptim):
def __init__(self, params, elasticities, price_out, vparams):
e1, e2, e3, e4, e5 = vparams
self.elasticities = elasticities
self.e = [e1, e2]
unit_price, scale, limit = params
params_rb = unit_price - e3, scale, limit - e4
self.omega = ProfitOracle(params_rb, elasticities,
price_out + np.array([e5, e5]))
def assess_optim(self, y, gamma):
el_rb = copy.copy(self.elasticities)
for i in [0, 1]:
el_rb[i] += -self.e[i] if y[i] > 0.0 else self.e[i]
self.omega.el = el_rb
return self.omega.assess_optim(y, gamma)
🔮 Oracle in Robust Optimization Formulation
- The oracle only needs to determine:
- If for some and ,
then
- the cut =
- If for some
, then
- the cut =
- Otherwise, is feasible, then
- Let .
- .
- The cut =
- If for some and ,
then
Remark: for more complicated problems, affine arithmetic could be used [@liu2007robust].
Matrix Inequalities
Problems With Matrix Inequalities
Consider the following problem:
- : a matrix-valued function
- denotes is positive semidefinite.
Problems With Matrix Inequalities
- Recall that a matrix is positive semidefinite if and only if for all .
- The problem can be transformed into:
- Consider is concave for all w. r. t. , then the above problem is a convex programming.
- Reduce to semidefinite programming if is linear w.r.t. , i.e.,
LDLT factorization
-
The LDLT factorization of a symmetric positive definite matrix is the factorization , where is lower triangular with unit diagonal elements and is a diagonal matrix.
-
For example,
Naïve implementation
- Then, start with , the basic algorithm of LDLT factorization is:
- Invoke FLOP's, where is the place the algorithm stops.
Storage representation
First, we pack the solution and the intermediate storage on a single matrix such that:
- For example,
Improved implementation
- Then, start with , the improved implementation of LDLT factorization is:
- Invoke FLOP's (same as Cholesky factorization's), where is the place the algorithm stops.
Witness of indefiniteness
-
In the case of failure, a vector can be constructed to certify that .
-
Let denote the partial sub-matrix where is the row of failure.
-
Then , where
-
Start with , the basic algorithm is:
🔮 Oracle in Matrix Inequalities
The oracle only needs to:
- Perform a row-based LDLT factorization such that .
- Let denotes a submatrix .
- If the process fails at row ,
- there exists a vector
, such
that
- , and
- .
- The cut =
- there exists a vector
, such
that
Lazy evaluation
-
Don't construct the full matrix at each iteration!
-
Only O() per iteration, independent of !
class LMIOracle:
def __init__(self, F, B):
self.F = F
self.F0 = B
self.Q = LDLTMgr(len(B))
def assess_feas(self, x: Arr) -> Optional[Cut]:
def get_elem(i, j):
return self.F0[i, j] - sum(
Fk[i, j] * xk for Fk, xk in zip(self.F, x))
if self.Q.factor(get_elem):
return None
ep = self.Q.witness()
g = np.array([self.Q.sym_quad(Fk) for Fk in self.F])
return g, ep
Google Benchmark 📊 Comparison
2: ----------------------------------------------------------
2: Benchmark Time CPU Iterations
2: ----------------------------------------------------------
2: BM_LMI_Lazy 131235 ns 131245 ns 4447
2: BM_LMI_old 196694 ns 196708 ns 3548
2/4 Test #2: Bench_BM_lmi ..................... Passed 2.57 sec
Example - Matrix Norm Minimization
- Let
- Problem can be reformulated as
- Binary search on can be used for this problem.
Example - Estimation of Correlation Function
-
Let , where
- 's are the unknown coefficients to be fitted
- 's are a family of basis functions.
-
The covariance matrix can be recast as:
where
🧪 Experimental Result
: Data Sample (kern=0.5)
: Least Square Result
🧪 Experimental Result II
: Data Sample (kern=1.0)
🧪 Experimental Result III
: Data Sample (kern=2.0)
: Least Square Result
Multi-parameter Network Problem
Parametric Network Problem
Given a network represented by a directed graph .
Consider:
-
is the concave function of edge ,
-
Assume: network is large, but the number of parameters is small.
Network Potential Problem (cont'd)
Given , the problem has a feasible solution if and only if contains no negative cycle. Let be a set of all cycles of .
-
is a cycle of
-
.
Negative Cycle Finding
There are lots of methods to detect negative cycles in a weighted graph [@cherkassky1999negative], in which Tarjan's algorithm [@Tarjan1981negcycle] is one of the fastest algorithms in practice [@alg:dasdan_mcr; @cherkassky1999negative].
🔮 Oracle in Network Potential Problem
- The oracle only needs to determine:
- If there exists a negative cycle under , then
- the cut =
- Otherwise, the shortest path solution gives the value of .
- If there exists a negative cycle under , then
🐍 Python Code
class NetworkOracle:
def __init__(self, G, u, h):
self._G = G
self._u = u
self._h = h
self._S = NegCycleFinder(G)
def update(self, gamma):
self._h.update(gamma)
def assess_feas(self, x) -> Optional[Cut]:
def get_weight(e):
return self._h.eval(e, x)
for Ci in self._S.find_neg_cycle(self._u, get_weight):
f = -sum(self._h.eval(e, x) for e in Ci)
g = -sum(self._h.grad(e, x) for e in Ci)
return g, f # use the first Ci only
return None
Example - Optimal Matrix Scaling [@orlin1985computing]
-
Given a sparse matrix .
-
Find another matrix where is a nonnegative diagonal matrix, such that the ratio of any two elements of in absolute value is as close to 1 as possible.
-
Let . Under the min-max-ratio criterion, the problem can be formulated as:
Optimal Matrix Scaling (cont'd)
By taking the logarithms of variables, the above problem can be transformed into:
where denotes and .
class OptScalingOracle:
class Ratio:
def __init__(self, G, get_cost):
self._G = G
self._get_cost = get_cost
def eval(self, e, x: Arr) -> float:
u, v = e
cost = self._get_cost(e)
return x[0] - cost if u < v else cost - x[1]
def grad(self, e, x: Arr) -> Arr:
u, v = e
return np.array([1.0, 0.0] if u < v else [0.0, -1.0])
def __init__(self, G, u, get_cost):
self._network = NetworkOracle(G, u, self.Ratio(G, get_cost))
def assess_optim(self, x: Arr, gamma: float):
s = x[0] - x[1]
g = np.array([1.0, -1.0])
if (fj := s - gamma) >= 0.0:
return (g, fj), None
if (cut := self._network.assess_feas(x)):
return cut, None
return (g, 0.0), s
Example - clock period & yield-driven co-optimization
- 👉 Note that is not concave in general in .
- Fortunately, we are most likely interested in optimizing circuits for high yield rather than the low one in practice.
- Therefore, by imposing an additional constraint to , say , the problem becomes convex.
Example - clock period & yield-driven co-optimization
The problem can be reformulated as:
🫒 Ellipsoid Method Revisited
📝 Abstract
This lecture provides a brief history of the ellipsoid method. Then it discusses implementation issues of the ellipsoid method, such as utilizing parallel cuts to update the search space and enhance computation time. In some instances, parallel cuts can drastically reduce computation time, as observed in FIR filter design. Discrete optimization is also investigated, illustrating how the ellipsoid method can be applied to problems that involve discrete design variables. An oracle implementation is required solely for locating the nearest discrete solutions
Some History of Ellipsoid Method [@BGT81]
-
Introduced by Shor and Yudin and Nemirovskii in 1976
-
Used to show that linear programming (LP) is polynomial-time solvable (Kachiyan 1979), settled the long-standing problem of determining the theoretical complexity of LP.
-
In practice, however, the simplex method runs much faster than the method, although its worst-case complexity is exponential.
Basic Ellipsoid Method
- An ellipsoid is specified as a set where is the center of the ellipsoid.
Updating the ellipsoid (deep-cut)
Calculation of minimum volume ellipsoid covering:
Updating the ellipsoid (deep-cut)
-
Let , .
-
If (shallow cut), no smaller ellipsoid can be found.
-
If , intersection is empty.
Otherwise,
where
Updating the ellipsoid (cont'd)
-
Even better, split into two variables
-
Let , , .
-
Reduce multiplications per iteration.
-
👉 Note:
- The determinant of decreases monotonically.
- The range of is .
Central Cut
-
A Special case of deep cut when
-
Deserve a separate implement because it is much simplier.
-
Let , ,
Central Cut
Calculation of minimum volume ellipsoid covering:
🪜 Parallel Cuts
🪜 Parallel Cuts
-
Oracle returns a pair of cuts instead of just one.
-
The pair of cuts is given by and such that:
for all .
-
Only linear inequality constraint can produce such parallel cut:
-
Usually provide faster convergence.
🪜 Parallel Cuts
Calculation of minimum volume ellipsoid covering:
Updating the ellipsoid (old)
-
Let , .
-
If , intersection is empty.
-
If , no smaller ellipsoid can be found.
-
If , it reduces to deep-cut with
-
Otherwise, where
\begin{array}{lll} \zeta_0 &=& \tau^2 - \beta_0^2 \ \zeta_1 &=& \tau^2 - \beta_1^2 \ \xi &=& \sqrt{4\zeta_0\zeta_1 + n^2(\beta_1^2 - \beta_0^2)^2}, \ \sigma &=& (n + (2\tau^2 + 2\beta_0\beta_1 - \xi){\color{red}(\beta_0 + \beta_1)^2} ) / (n + 1), \ \rho &=& \sigma(\beta_0 + \beta_1) / 2, \ \delta &=& (n^2/2(n^2-1)) (\zeta_0 + \zeta_1 + \xi/n) / \tau^2 . \end{array}
Updating the ellipsoid (new)
- Let , .
- If , intersection is empty.
- If , no smaller ellipsoid can be found.
- If , it reduces to deep-cut with
- Otherwise, where
Parallel Central Cuts
Calculation of minimum volume ellipsoid covering:
Updating the ellipsoid
- Let , .
- If , it reduces to central-cut
- Otherwise, where
Example - FIR filter design
- The time response is:
Example - FIR filter design (cont'd)
-
The frequency response:
-
The magnitude constraints on frequency domain are expressed as
where and are the lower and upper (nonnegative) bounds at frequency respectively.
-
The constraint is non-convex in general.
Example - FIR filter design (II)
-
However, via spectral factorization [@goodman1997spectral], it can transform into a convex one [@wu1999fir]:
where
- are the autocorrelation coefficients.
Example - FIR filter design (III)
-
can be determined by :
where for or .
-
The whole problem can be formulated as:
#🧪 Experiment
📊 Google Benchmark Result
3: ------------------------------------------------------------------
3: Benchmark Time CPU Iterations
3: ------------------------------------------------------------------
3: BM_Lowpass_single_cut 627743505 ns 621639313 ns 1
3: BM_Lowpass_parallel_cut 30497546 ns 30469134 ns 24
3/4 Test #3: Bench_BM_lowpass ................. Passed 1.72 sec
Discrete Optimization
Why Discrete Convex Programming
-
Many engineering problems can be formulated as a convex/geometric programming, e.g. digital circuit sizing
-
Yet in an ASIC design, often there is only a limited set of choices from the cell library. In other words, some design variables are discrete.
-
The discrete version can be formulated as a Mixed-Integer Convex programming (MICP) by mapping the design variables to integers.
What's Wrong w/ Existing Methods?
-
Mostly based on relaxation.
-
Then use the relaxed solution as a lower bound and use the branch--and--bound method for the discrete optimal solution.
- 👉 Note: the branch-and-bound method does not utilize the convexity of the problem.
-
What if I can only evaluate constraints on discrete data? Workaround: convex fitting?
Mixed-Integer Convex Programming
Consider:
where
- and are "convex"
- Some design variables are discrete.
🔮 Oracle Requirement
-
The oracle looks for the nearby discrete solution of with the cutting-plane:
-
👉 Note: the cut may be a shallow cut.
-
Suggestion: use different cuts as possible for each iteration (e.g. round-robin the evaluation of constraints)