Ellipsoid Method and Its Amazing Oracles 🔮

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

Sir Arthur Conan Doyle, stated by Sherlock Holmes

📖 Introduction

Common Perspective of Ellipsoid Method

  • It is widely believed to be inefficient in practice for large-scale problems.

    • Convergent rate is slow, even when using deep cuts.

    • Cannot exploit sparsity.

  • It has since then supplanted by the interior-point methods.

  • Used only as a theoretical tool to prove polynomial-time solvability of some combinatorial optimization problems.

But...

  • The ellipsoid method works very differently compared with the interior point methods.

  • It only requires a separation oracle that provides a cutting plane.

  • Although the ellipsoid method cannot take advantage of the sparsity of the problem, the separation oracle is capable of take advantage of certain structural types.

Consider the ellipsoid method when...

  • The number of design variables is moderate, e.g. ECO flow, analog circuit sizing, parametric problems

  • The number of constraints is large, or even infinite

  • Oracle can be implemented effectively.

🥥 Cutting-plane Method Revisited

Convex Set

  • Let be a convex set 🥚.
  • Consider the feasibility problem:
    • Find a point in ,
    • or determine that is empty (i.e., there is no feasible solution)

image

🔮 Separation Oracle

  • When a separation oracle is queried at , it either
  • asserts that , or
  • returns a separating hyperplane between and :

image

🔮 Separation Oracle (cont'd)

  • is called a cutting-plane, or cut, because it eliminates the half-space from our search.

  • If ( is on the boundary of halfspace that is cut), the cutting-plane is called neutral cut.

  • If ( lies in the interior of halfspace that is cut), the cutting-plane is called deep cut.

  • If ( lies in the exterior of halfspace that is cut), the cutting-plane is called shallow cut.

Subgradient

  • is usually given by a set of inequalities or for , where is a convex function.

  • A vector is called a subgradient of a convex function at if .

  • Hence, the cut is given by

Remarks:

  • If is differentiable, we can simply take

Key components of Cutting-plane method

  • A cutting plane oracle
  • A search space initially large enough to cover , e.g.
    • Polyhedron =
    • Interval = (for one-dimensional problem)
    • Ellipsoid =

Outline of Cutting-plane method

  • Given initial known to contain .
  • Repeat
    • Choose a point in
    • Query the cutting-plane oracle at
    • If , quit
    • Otherwise, update to a smaller set that covers:
    • If or it is small enough, quit.

Corresponding Python code

def cutting_plane_feas(omega, space, options=Options()):
    for niter in range(options.max_iters):
        cut = omega.assess_feas(space.xc())  # query the oracle
        if cut is None:  # feasible sol'n obtained
            return space.xc(), niter
        status = space.update_deep_cut(cut)  # update space
        if status != CutStatus.Success or space.tsq() < options.tol:
            return None, niter
    return None, options.max_iters

From Feasibility to Optimization

  • The optimization problem is treated as a feasibility problem with an additional constraint .

  • could be a convex or a quasiconvex function.

  • is also called the best-so-far value of .

Convex Optimization Problem

  • Consider the following general form:

    where is the -sublevel set of .

  • 👉 Note: if and only if (monotonicity)

  • One easy way to solve the optimization problem is to apply the binary search on .

Shrinking

  • Another possible way is, to update the best-so-far whenever a feasible solution is found by solving the equation:

  • If the equation is difficuit to solve but is also convex w.r.t. , then we may create a new varaible, say and let .

Outline of Cutting-plane method (Optim)

  • Given initial known to contain .
  • Repeat
    • Choose a point in
    • Query the separation oracle at
    • If , update such that .
    • Update to a smaller set that covers:
    • If or it is small enough, quit.

Corresponding Python code

def cutting_plane_optim(omega, S, gamma, options=Options()):
    x_best = None
    for niter in range(options.max_iters):
        cut, gamma1 = omega.assess_optim(space.xc(), gamma)
        if gamma1 is not None:  # better \gamma obtained
            gamma = gamma1
            x_best = copy.copy(space.xc())
            status = space.update_central_cut(cut)
        else:
            status = space.update_deep_cut(cut)
        if status != CutStatus.Success or space.tsq() < options.tol:
            return x_best, target, niter
    return x_best, gamma, options.max_iters

Example - Profit Maximization Problem

This example is taken from [@Aliabadi2013Robust].

  • : Cobb-Douglas production function
  • : the market price per unit
  • : the scale of production
  • : the output elasticities
  • : input quantity
  • : output price
  • : a given constant that restricts the quantity of

Example - Profit maximization (cont'd)

  • The formulation is not in the convex form.
  • Rewrite the problem in the following form:

Profit maximization in Convex Form

  • By taking the logarithm of each variable:

    • , .
  • We have the problem in a convex form:

Corresponding Python code

class ProfitOracle:
    def __init__(self, params, elasticities, price_out):
        unit_price, scale, limit = params
        self.log_pA = math.log(unit_price * scale)
        self.log_k = math.log(limit)
        self.price_out = price_out
        self.el = elasticities

    def assess_optim(self, y, gamma):
        if (fj := y[0] - self.log_k) > 0.0:  # constraint
            return (np.array([1.0, 0.0]), fj), None

        log_Cobb = self.log_pA + self.el.dot(y)
        q = self.price_out * np.exp(y)
        qsum = q[0] + q[1]
        if (fj := math.log(gamma + qsum) - log_Cobb) > 0.0:
            return (q / (gamma + qsum) - self.el, fj), None

        Cobb = np.exp(log_Cobb) # shrinking
        return (q / Cobb - self.el, 0.0), Cobb - qsum

Main program

import numpy as np
from ellalgo.cutting_plane import cutting_plane_optim
from ellalgo.ell import Ell
from ellalgo.oracles.profit_oracle import ProfitOracle

p, A, k = 20.0, 40.0, 30.5
params = p, A, k
alpha, beta = 0.1, 0.4
v1, v2 = 10.0, 35.0
el = np.array([alpha, beta])
v = np.array([v1, v2])
r = np.array([100.0, 100.0])  # initial ellipsoid (sphere)

ellip = Ell(r, np.array([0.0, 0.0]))
omega = ProfitOracle(params, el, v)
xbest, \gamma, num_iters = cutting_plane_optim(omega, ellip, 0.0)

Area of Applications

  • Robust convex optimization
    • oracle technique: affine arithmetic
  • Semidefinite programming
    • oracle technique: Cholesky or factorization
  • Parametric network potential problem
    • oracle technique: negative cycle detection

Robust Convex Optimization

Robust Optimization Formulation

  • Consider:

    where represents a set of varying parameters.

  • The problem can be reformulated as:

Example - Profit Maximization Problem (convex)

  • Now assume that:
    • and vary and respectively.
    • , , , and all vary .

Example - Profit Maximization Problem (oracle)

By detail analysis, the worst case happens when:

  • ,
  • , ,
  • if , , else
  • if , , else

Corresponding Python code

class ProfitRbOracle(OracleOptim):
    def __init__(self, params, elasticities, price_out, vparams):
        e1, e2, e3, e4, e5 = vparams
        self.elasticities = elasticities
        self.e = [e1, e2]
        unit_price, scale, limit = params
        params_rb = unit_price - e3, scale, limit - e4
        self.omega = ProfitOracle(params_rb, elasticities,
                                  price_out + np.array([e5, e5]))

    def assess_optim(self, y, gamma):
        el_rb = copy.copy(self.elasticities)
        for i in [0, 1]:
            el_rb[i] += -self.e[i] if y[i] > 0.0 else self.e[i]
        self.omega.el = el_rb
        return self.omega.assess_optim(y, gamma)

🔮 Oracle in Robust Optimization Formulation

  • The oracle only needs to determine:
    • If for some and , then
      • the cut =
    • If for some , then
      • the cut =
    • Otherwise, is feasible, then
      • Let .
      • .
      • The cut =

Remark: for more complicated problems, affine arithmetic could be used [@liu2007robust].

Matrix Inequalities

Problems With Matrix Inequalities

Consider the following problem:

  • : a matrix-valued function
  • denotes is positive semidefinite.

Problems With Matrix Inequalities

  • Recall that a matrix is positive semidefinite if and only if for all .
  • The problem can be transformed into:
  • Consider is concave for all w. r. t. , then the above problem is a convex programming.
  • Reduce to semidefinite programming if is linear w.r.t. , i.e.,

LDLT factorization

  • The LDLT factorization of a symmetric positive definite matrix is the factorization , where is lower triangular with unit diagonal elements and is a diagonal matrix.

  • For example,

Naïve implementation

  • Then, start with , the basic algorithm of LDLT factorization is:

  • Invoke FLOP's, where is the place the algorithm stops.

Storage representation

First, we pack the solution and the intermediate storage on a single matrix such that:

  • For example,

Improved implementation

  • Then, start with , the improved implementation of LDLT factorization is:

  • Invoke FLOP's (same as Cholesky factorization's), where is the place the algorithm stops.

Witness of indefiniteness

  • In the case of failure, a vector can be constructed to certify that .

  • Let denote the partial sub-matrix where is the row of failure.

  • Then , where

  • Start with , the basic algorithm is:

🔮 Oracle in Matrix Inequalities

The oracle only needs to:

  • Perform a row-based LDLT factorization such that .
  • Let denotes a submatrix .
  • If the process fails at row ,
    • there exists a vector , such that
      • , and
      • .
    • The cut =

Lazy evaluation

  • Don't construct the full matrix at each iteration!

  • Only O() per iteration, independent of !

class LMIOracle:
    def __init__(self, F, B):
        self.F = F
        self.F0 = B
        self.Q = LDLTMgr(len(B))

    def assess_feas(self, x: Arr) -> Optional[Cut]:
        def get_elem(i, j):
            return self.F0[i, j] - sum(
                Fk[i, j] * xk for Fk, xk in zip(self.F, x))

        if self.Q.factor(get_elem):
            return None
        ep = self.Q.witness()
        g = np.array([self.Q.sym_quad(Fk) for Fk in self.F])
        return g, ep

Google Benchmark 📊 Comparison

2: ----------------------------------------------------------
2: Benchmark                Time             CPU   Iterations
2: ----------------------------------------------------------
2: BM_LMI_Lazy         131235 ns       131245 ns         4447
2: BM_LMI_old          196694 ns       196708 ns         3548
2/4 Test #2: Bench_BM_lmi .....................   Passed    2.57 sec

Example - Matrix Norm Minimization

  • Let
  • Problem can be reformulated as
  • Binary search on can be used for this problem.

Example - Estimation of Correlation Function

  • Let , where

    • 's are the unknown coefficients to be fitted
    • 's are a family of basis functions.
  • The covariance matrix can be recast as:

    where

🧪 Experimental Result

image : Data Sample (kern=0.5)

image : Least Square Result

🧪 Experimental Result II

image : Data Sample (kern=1.0)

image

🧪 Experimental Result III

image : Data Sample (kern=2.0)

image : Least Square Result

Multi-parameter Network Problem

Parametric Network Problem

Given a network represented by a directed graph .

Consider:

  • is the concave function of edge ,

  • Assume: network is large, but the number of parameters is small.

Network Potential Problem (cont'd)

Given , the problem has a feasible solution if and only if contains no negative cycle. Let be a set of all cycles of .

  • is a cycle of

  • .

Negative Cycle Finding

There are lots of methods to detect negative cycles in a weighted graph [@cherkassky1999negative], in which Tarjan's algorithm [@Tarjan1981negcycle] is one of the fastest algorithms in practice [@alg:dasdan_mcr; @cherkassky1999negative].

🔮 Oracle in Network Potential Problem

  • The oracle only needs to determine:
    • If there exists a negative cycle under , then
      • the cut =
    • Otherwise, the shortest path solution gives the value of .

🐍 Python Code

class NetworkOracle:
    def __init__(self, G, u, h):
        self._G = G
        self._u = u
        self._h = h
        self._S = NegCycleFinder(G)

    def update(self, gamma):
        self._h.update(gamma)

    def assess_feas(self, x) -> Optional[Cut]:
        def get_weight(e):
            return self._h.eval(e, x)

        for Ci in self._S.find_neg_cycle(self._u, get_weight):
            f = -sum(self._h.eval(e, x) for e in Ci)
            g = -sum(self._h.grad(e, x) for e in Ci)
            return g, f  # use the first Ci only
        return None

Example - Optimal Matrix Scaling [@orlin1985computing]

  • Given a sparse matrix .

  • Find another matrix where is a nonnegative diagonal matrix, such that the ratio of any two elements of in absolute value is as close to 1 as possible.

  • Let . Under the min-max-ratio criterion, the problem can be formulated as:

Optimal Matrix Scaling (cont'd)

By taking the logarithms of variables, the above problem can be transformed into:

where denotes and .

class OptScalingOracle:
    class Ratio:
        def __init__(self, G, get_cost):
            self._G = G
            self._get_cost = get_cost

        def eval(self, e, x: Arr) -> float:
            u, v = e
            cost = self._get_cost(e)
            return x[0] - cost if u < v else cost - x[1]

        def grad(self, e, x: Arr) -> Arr:
            u, v = e
            return np.array([1.0, 0.0] if u < v else [0.0, -1.0])

    def __init__(self, G, u, get_cost):
        self._network = NetworkOracle(G, u, self.Ratio(G, get_cost))

    def assess_optim(self, x: Arr, gamma: float):
        s = x[0] - x[1]
        g = np.array([1.0, -1.0])
        if (fj := s - gamma) >= 0.0:
            return (g, fj), None
        if (cut := self._network.assess_feas(x)):
            return cut, None
        return (g, 0.0), s

Example - clock period & yield-driven co-optimization

  • 👉 Note that is not concave in general in .
  • Fortunately, we are most likely interested in optimizing circuits for high yield rather than the low one in practice.
  • Therefore, by imposing an additional constraint to , say , the problem becomes convex.

Example - clock period & yield-driven co-optimization

The problem can be reformulated as:

🫒 Ellipsoid Method Revisited

📝 Abstract

This lecture provides a brief history of the ellipsoid method. Then it discusses implementation issues of the ellipsoid method, such as utilizing parallel cuts to update the search space and enhance computation time. In some instances, parallel cuts can drastically reduce computation time, as observed in FIR filter design. Discrete optimization is also investigated, illustrating how the ellipsoid method can be applied to problems that involve discrete design variables. An oracle implementation is required solely for locating the nearest discrete solutions

Some History of Ellipsoid Method [@BGT81]

  • Introduced by Shor and Yudin and Nemirovskii in 1976

  • Used to show that linear programming (LP) is polynomial-time solvable (Kachiyan 1979), settled the long-standing problem of determining the theoretical complexity of LP.

  • In practice, however, the simplex method runs much faster than the method, although its worst-case complexity is exponential.

Basic Ellipsoid Method

  • An ellipsoid is specified as a set where is the center of the ellipsoid.

ellipsoid

Updating the ellipsoid (deep-cut)

Calculation of minimum volume ellipsoid covering:

Deep-cut

Updating the ellipsoid (deep-cut)

  • Let , .

  • If (shallow cut), no smaller ellipsoid can be found.

  • If , intersection is empty.

Otherwise,

where

Updating the ellipsoid (cont'd)

  • Even better, split into two variables

  • Let , , .

  • Reduce multiplications per iteration.

  • 👉 Note:

    • The determinant of decreases monotonically.
    • The range of is .

Central Cut

  • A Special case of deep cut when

  • Deserve a separate implement because it is much simplier.

  • Let , ,

Central Cut

Calculation of minimum volume ellipsoid covering:

Central-cut

🪜 Parallel Cuts

🪜 Parallel Cuts

  • Oracle returns a pair of cuts instead of just one.

  • The pair of cuts is given by and such that:

    for all .

  • Only linear inequality constraint can produce such parallel cut:

  • Usually provide faster convergence.

🪜 Parallel Cuts

Calculation of minimum volume ellipsoid covering:

Parallel Cut

Updating the ellipsoid (old)

  • Let , .

  • If , intersection is empty.

  • If , no smaller ellipsoid can be found.

  • If , it reduces to deep-cut with

  • Otherwise, where

    \begin{array}{lll} \zeta_0 &=& \tau^2 - \beta_0^2 \ \zeta_1 &=& \tau^2 - \beta_1^2 \ \xi &=& \sqrt{4\zeta_0\zeta_1 + n^2(\beta_1^2 - \beta_0^2)^2}, \ \sigma &=& (n + (2\tau^2 + 2\beta_0\beta_1 - \xi){\color{red}(\beta_0 + \beta_1)^2} ) / (n + 1), \ \rho &=& \sigma(\beta_0 + \beta_1) / 2, \ \delta &=& (n^2/2(n^2-1)) (\zeta_0 + \zeta_1 + \xi/n) / \tau^2 . \end{array}

Updating the ellipsoid (new)

  • Let , .
  • If , intersection is empty.
  • If , no smaller ellipsoid can be found.
  • If , it reduces to deep-cut with
  • Otherwise, where

Parallel Central Cuts

Calculation of minimum volume ellipsoid covering:

Updating the ellipsoid

  • Let , .
  • If , it reduces to central-cut
  • Otherwise, where

Example - FIR filter design

A typical structure of an FIR filter @mitra2006digital.

  • The time response is:

Example - FIR filter design (cont'd)

  • The frequency response:

  • The magnitude constraints on frequency domain are expressed as

    where and are the lower and upper (nonnegative) bounds at frequency respectively.

  • The constraint is non-convex in general.

Example - FIR filter design (II)

  • However, via spectral factorization [@goodman1997spectral], it can transform into a convex one [@wu1999fir]:

    where

    • are the autocorrelation coefficients.

Example - FIR filter design (III)

  • can be determined by :

    where for or .

  • The whole problem can be formulated as:

#🧪 Experiment

Result

📊 Google Benchmark Result

3: ------------------------------------------------------------------
3: Benchmark                        Time             CPU   Iterations
3: ------------------------------------------------------------------
3: BM_Lowpass_single_cut    627743505 ns    621639313 ns            1
3: BM_Lowpass_parallel_cut   30497546 ns     30469134 ns           24
3/4 Test #3: Bench_BM_lowpass .................   Passed    1.72 sec

Discrete Optimization

Why Discrete Convex Programming

  • Many engineering problems can be formulated as a convex/geometric programming, e.g. digital circuit sizing

  • Yet in an ASIC design, often there is only a limited set of choices from the cell library. In other words, some design variables are discrete.

  • The discrete version can be formulated as a Mixed-Integer Convex programming (MICP) by mapping the design variables to integers.

What's Wrong w/ Existing Methods?

  • Mostly based on relaxation.

  • Then use the relaxed solution as a lower bound and use the branch--and--bound method for the discrete optimal solution.

    • 👉 Note: the branch-and-bound method does not utilize the convexity of the problem.
  • What if I can only evaluate constraints on discrete data? Workaround: convex fitting?

Mixed-Integer Convex Programming

Consider:

where

  • and are "convex"
  • Some design variables are discrete.

🔮 Oracle Requirement

  • The oracle looks for the nearby discrete solution of with the cutting-plane:

  • 👉 Note: the cut may be a shallow cut.

  • Suggestion: use different cuts as possible for each iteration (e.g. round-robin the evaluation of constraints)

Discrete Cut

Discrete Cut

Example - Multiplier-less FIR filter design (nnz=3)

Lowpass