Ellipsoid Method and Its Amazing Oracles 🔮

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

Sir Arthur Conan Doyle, stated by Sherlock Holmes

📖 Introduction

Common Perspective of Ellipsoid Method

It is widely believed to be inefficient in practice for large-scale problems.
- The convergence rate is slow, even when using deep cuts.
- It cannot exploit sparsity.
It has since been supplanted by interior-point methods.
It is used only as a theoretical tool to prove polynomial-time solvability of some combinatorial optimization problems.

But...

The ellipsoid method works very differently compared with the interior point methods.
It only requires a separation oracle that provides a cutting plane.
Although the ellipsoid method cannot take advantage of the sparsity of the problem, the separation oracle is capable of taking advantage of certain structural types.

Consider the ellipsoid method when...

The number of design variables is moderate, e.g. ECO flow, analog circuit sizing, parametric problems
The number of constraints is large, or even infinite
Oracle can be implemented effectively.

🥥 Cutting-plane Method Revisited

Convex Set

Let $K \subseteq R^{n}$ be a convex set 🥚.
Consider the feasibility problem:
- Find a point $x^{*} \in R^{n}$ in $K$ ,
- or determine that $K$ is empty (i.e., there is no feasible solution)

🔮 Separation Oracle

When a separation oracle $Ω$ is queried at $x_{0}$ , it either
asserts that $x_{0} \in K$ , or
returns a separating hyperplane between $x_{0}$ and $K$ : $g^{T} (x - x_{0}) + β \leq 0, β \geq 0, g \neq 0, \forall x \in K$

🔮 Separation Oracle (cont'd)

$(g, β)$ is called a cutting-plane, or cut, because it eliminates the half-space $\{x ∣ g^{T} (x - x_{0}) + β > 0\}$ from our search.
If $β = 0$ ( $x_{0}$ is on the boundary of the half-space that is cut), the cutting-plane is called a neutral cut.
If $β > 0$ ( $x_{0}$ lies in the interior of the half-space that is cut), the cutting-plane is called a deep cut.
If $β < 0$ ( $x_{0}$ lies in the exterior of the half-space that is cut), the cutting-plane is called a shallow cut.

Subgradient

$K$ is usually given by a set of inequalities $f_{j} (x) \leq 0$ or $f_{j} (x) < 0$ for $j = 1 \dots m$ , where $f_{j} (x)$ is a convex function.
A vector $g \equiv \partial f (x_{0})$ is called a subgradient of a convex function $f$ at $x_{0}$ if $f (z) \geq f (x_{0}) + g^{T} (z - x_{0})$ for all $z$ .
Hence, the cut $(g, β)$ is given by $(\partial f (x_{0}), f (x_{0}))$

Remarks:

If $f (x)$ is differentiable, we can simply take $\partial f (x_{0}) = \nabla f (x_{0})$
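
As a minimal sketch of how a subgradient yields a cut, the snippet below builds a separation oracle from a list of differentiable constraints $f_{j} (x) \leq 0$. The helper name `make_oracle` and the toy set $K$ (a unit disc intersected with a half-plane) are assumptions made for illustration only; they are not part of any library.

```python
import numpy as np


def make_oracle(constraints):
    """Build a separation oracle from (f, grad_f) pairs describing f_j(x) <= 0."""

    def oracle(x0):
        for f, grad in constraints:
            fj = f(x0)
            if fj > 0.0:  # violated constraint: x0 is not in K
                return grad(x0), fj  # cut (g, beta) = (grad f_j(x0), f_j(x0))
        return None  # every constraint holds: x0 is in K

    return oracle


# Toy K: the unit disc intersected with the half-plane x[0] + x[1] >= 0.5
constraints = [
    (lambda x: x @ x - 1.0, lambda x: 2.0 * x),
    (lambda x: 0.5 - x[0] - x[1], lambda x: np.array([-1.0, -1.0])),
]
oracle = make_oracle(constraints)
```

Querying `oracle(np.array([2.0, 0.0]))` returns the deep cut produced by the first violated constraint, while a point inside $K$ returns `None`.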

Key components of Cutting-plane method

A cutting plane oracle $Ω$
A search space $S$ initially large enough to cover $K$ , e.g.
- Polyhedron $P$ = $\{z ∣ C z ⪯ d\}$
- Interval $I$ = $[l, u]$ (for one-dimensional problem)
- Ellipsoid $E$ = $\{z ∣ (z - x_{c})^{T} P^{- 1} (z - x_{c}) \leq 1\}$

Outline of Cutting-plane method

Given initial $S$ known to contain $K$ .
Repeat
- Choose a point $x_{0}$ in $S$
- Query the cutting-plane oracle at $x_{0}$
- If $x_{0} \in K$ , quit
- Otherwise, update $S$ to a smaller set that covers: $S^{+} = S \cap \{z ∣ g^{T} (z - x_{0}) + β \leq 0\}$
- If $S^{+} = \emptyset$ or it is small enough, quit.

Corresponding Python code

def cutting_plane_feas(omega, space, options=Options()):
    for niter in range(options.max_iters):
        cut = omega.assess_feas(space.xc())  # query the oracle
        if cut is None:  # feasible sol'n obtained
            return space.xc(), niter
        status = space.update_deep_cut(cut)  # update space
        if status != CutStatus.Success or space.tsq() < options.tol:
            return None, niter
    return None, options.max_iters

From Feasibility to Optimization

$\begin{array}{ll} \text{minimize} & f_{0} (x), \\ \text{subject to} & x \in K. \end{array}$

The optimization problem is treated as a feasibility problem with an additional constraint $f_{0} (x) \leq γ$ .
$f_{0} (x)$ could be a convex or a quasiconvex function.
$γ$ is also called the best-so-far value of $f_{0} (x)$ .

Convex Optimization Problem

Consider the following general form:

$\begin{array}{ll} \text{minimize} & γ, \\ \text{subject to} & Φ (x, γ) \leq 0, \\ & x \in K, \end{array}$

where $K_{γ}^{'} = \{x ∣ Φ (x, γ) \leq 0\}$ is the $γ$ -sublevel set of $\{x ∣ f_{0} (x) \leq γ\}$ .
👉 Note: $K^{'}_{γ} \subseteq K^{'}_{ϵ}$ if and only if $γ \leq ϵ$ (monotonicity)
One easy way to solve the optimization problem is to apply the binary search on $γ$ .
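
A minimal sketch of the binary search on $γ$ follows. It relies on the monotonicity noted above: if the sublevel set is nonempty for some $γ$, it is nonempty for every larger value. The names `bisect_gamma` and `is_feasible` are hypothetical, and the toy feasibility test (a grid scan) merely stands in for a full feasibility solve such as a cutting-plane run.

```python
def bisect_gamma(is_feasible, lo, hi, tol=1e-6):
    """Shrink [lo, hi], assuming is_feasible(hi) is True and is_feasible(lo) is False."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            hi = mid  # sublevel set is nonempty: the optimum is at most mid
        else:
            lo = mid  # sublevel set is empty: the optimum exceeds mid
    return hi


def is_feasible(gamma):
    # Toy problem: minimize (x - 3)^2 + 2 over x in [0, 1], checked on a grid
    xs = [i / 1000 for i in range(1001)]
    return any((x - 3.0) ** 2 + 2.0 <= gamma for x in xs)


gamma_opt = bisect_gamma(is_feasible, 0.0, 20.0)
```

For this toy problem the minimum is attained at $x = 1$, so `gamma_opt` converges to 6 within the tolerance.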

Shrinking

Another possible way is to update the best-so-far $γ$ whenever a feasible solution $x^{'}$ is found, by solving the equation: $Φ (x^{'}, γ_{new}) = 0 .$
If the equation is difficult to solve but $γ$ is also convex w.r.t. $Φ$ , then we may create a new variable, say $z$ , and let $z \leq γ$ .

Outline of Cutting-plane method (Optim)

Given initial $S$ known to contain $K_{γ}$ .
Repeat
- Choose a point $x_{0}$ in $S$
- Query the separation oracle at $x_{0}$
- If $x_{0} \in K_{γ}$ , update $γ$ such that $Φ (x_{0}, γ) = 0$ .
- Update $S$ to a smaller set that covers: $S^{+} = S \cap \{z ∣ g^{T} (z - x_{0}) + β \leq 0\}$
- If $S^{+} = \emptyset$ or it is small enough, quit.

Corresponding Python code

import copy


def cutting_plane_optim(omega, space, gamma, options=Options()):
    x_best = None
    for niter in range(options.max_iters):
        cut, gamma1 = omega.assess_optim(space.xc(), gamma)  # query the oracle
        if gamma1 is not None:  # better \gamma obtained
            gamma = gamma1
            x_best = copy.copy(space.xc())
            status = space.update_central_cut(cut)
        else:
            status = space.update_deep_cut(cut)
        if status != CutStatus.Success or space.tsq() < options.tol:
            return x_best, gamma, niter
    return x_best, gamma, options.max_iters

Example - Profit Maximization Problem

This example is taken from [@Aliabadi2013Robust].

$\begin{array}{ll} \text{maximize} & p (A x_{1}^{α} x_{2}^{β}) - v_{1} x_{1} - v_{2} x_{2}, \\ \text{subject to} & x_{1} \leq k. \end{array}$

$p (A x_{1}^{α} x_{2}^{β})$ : Cobb-Douglas production function
$p$ : the market price per unit
$A$ : the scale of production
$α, β$ : the output elasticities
$x$ : input quantity
$v$ : output price
$k$ : a given constant that restricts the quantity of $x_{1}$

Example - Profit maximization (cont'd)

The formulation is not in the convex form.
Rewrite the problem in the following form: $\begin{array}{ll} \text{maximize} & γ, \\ \text{subject to} & γ + v_{1} x_{1} + v_{2} x_{2} \leq p A x_{1}^{α} x_{2}^{β}, \\ & x_{1} \leq k. \end{array}$

Profit maximization in Convex Form

By taking the logarithm of each variable:
- $y_{1} = \log x_{1}$ , $y_{2} = \log x_{2}$ .
We have the problem in a convex form:

$\begin{array}{ll} \text{max} & γ, \\ \text{s.t.} & \log (γ + v_{1} e^{y_{1}} + v_{2} e^{y_{2}}) - (α y_{1} + β y_{2}) \leq \log (p A), \\ & y_{1} \leq \log k. \end{array}$

Corresponding Python code

import math

import numpy as np


class ProfitOracle:
    def __init__(self, params, elasticities, price_out):
        unit_price, scale, limit = params
        self.log_pA = math.log(unit_price * scale)
        self.log_k = math.log(limit)
        self.price_out = price_out
        self.el = elasticities

    def assess_optim(self, y, gamma):
        if (fj := y[0] - self.log_k) > 0.0:  # constraint
            return (np.array([1.0, 0.0]), fj), None

        log_Cobb = self.log_pA + self.el.dot(y)
        q = self.price_out * np.exp(y)
        qsum = q[0] + q[1]
        if (fj := math.log(gamma + qsum) - log_Cobb) > 0.0:
            return (q / (gamma + qsum) - self.el, fj), None

        Cobb = np.exp(log_Cobb) # shrinking
        return (q / Cobb - self.el, 0.0), Cobb - qsum

Main program

import numpy as np
from ellalgo.cutting_plane import cutting_plane_optim
from ellalgo.ell import Ell
from ellalgo.oracles.profit_oracle import ProfitOracle

p, A, k = 20.0, 40.0, 30.5
params = p, A, k
alpha, beta = 0.1, 0.4
v1, v2 = 10.0, 35.0
el = np.array([alpha, beta])
v = np.array([v1, v2])
r = np.array([100.0, 100.0])  # initial ellipsoid (sphere)

ellip = Ell(r, np.array([0.0, 0.0]))
omega = ProfitOracle(params, el, v)
xbest, gamma, num_iters = cutting_plane_optim(omega, ellip, 0.0)

Area of Applications

Robust convex optimization
- oracle technique: affine arithmetic
Semidefinite programming
- oracle technique: Cholesky or $L D L^{T}$ factorization
Parametric network potential problem
- oracle technique: negative cycle detection

Robust Convex Optimization

Robust Optimization Formulation

Consider:

$\begin{array}{ll} \text{minimize} & \sup_{q \in Q} f_{0} (x, q), \\ \text{subject to} & f_{j} (x, q) \leq 0, \ \forall q \in Q, \ j = 1, 2, \dots, m, \end{array}$ where $q$ represents a set of varying parameters.
The problem can be reformulated as: $\begin{array}{ll} \text{minimize} & γ, \\ \text{subject to} & f_{0} (x, q) < γ, \\ & f_{j} (x, q) \leq 0, \ \forall q \in Q, \ j = 1, 2, \dots, m. \end{array}$

Example - Profit Maximization Problem (convex)

$\begin{array}{ll} \text{max} & γ, \\ \text{s.t.} & \log (γ + \hat{v}_{1} e^{y_{1}} + \hat{v}_{2} e^{y_{2}}) - (\hat{α} y_{1} + \hat{β} y_{2}) \leq \log (\hat{p} A), \\ & y_{1} \leq \log \hat{k}. \end{array}$

Now assume that:
- $\hat{α}$ and $\hat{β}$ vary within $\bar{α} \pm e_{1}$ and $\bar{β} \pm e_{2}$ respectively.
- $\hat{p}$ , $\hat{k}$ , $\hat{v}_{1}$ , and $\hat{v}_{2}$ all vary by $\pm e_{3}$ .

Example - Profit Maximization Problem (oracle)

By detailed analysis, the worst case happens when:

$p = \bar{p} - e_{3}$ , $k = \bar{k} - e_{3}$
$v_{1} = \bar{v}_{1} + e_{3}$ , $v_{2} = \bar{v}_{2} + e_{3}$ ,
if $y_{1} > 0$ , $α = \bar{α} - e_{1}$ , else $α = \bar{α} + e_{1}$
if $y_{2} > 0$ , $β = \bar{β} - e_{2}$ , else $β = \bar{β} + e_{2}$

Corresponding Python code

class ProfitRbOracle(OracleOptim):
    def __init__(self, params, elasticities, price_out, vparams):
        e1, e2, e3, e4, e5 = vparams
        self.elasticities = elasticities
        self.e = [e1, e2]
        unit_price, scale, limit = params
        params_rb = unit_price - e3, scale, limit - e4
        self.omega = ProfitOracle(params_rb, elasticities,
                                  price_out + np.array([e5, e5]))

    def assess_optim(self, y, gamma):
        el_rb = copy.copy(self.elasticities)
        for i in [0, 1]:
            el_rb[i] += -self.e[i] if y[i] > 0.0 else self.e[i]
        self.omega.el = el_rb
        return self.omega.assess_optim(y, gamma)

🔮 Oracle in Robust Optimization Formulation

The oracle only needs to determine:
- If $f_{j} (x_{0}, q) > 0$ for some $j$ and $q = q_{0}$ , then
  - the cut $(g, β)$ = $(\partial f_{j} (x_{0}, q_{0}), f_{j} (x_{0}, q_{0}))$
- If $f_{0} (x_{0}, q) \geq γ$ for some $q = q_{0}$ , then
  - the cut $(g, β)$ = $(\partial f_{0} (x_{0}, q_{0}), f_{0} (x_{0}, q_{0}) - γ)$
- Otherwise, $x_{0}$ is feasible, then
  - Let $q_{\max} = \arg\max_{q \in Q} f_{0} (x_{0}, q)$ .
  - $γ := f_{0} (x_{0}, q_{\max})$ .
  - The cut $(g, β)$ = $(\partial f_{0} (x_{0}, q_{\max}), 0)$

Remark: for more complicated problems, affine arithmetic could be used [@liu2007robust].

Matrix Inequalities

Problems With Matrix Inequalities

Consider the following problem:

$\begin{array}{ll} \text{find} & x, \\ \text{subject to} & F (x) ≻ 0, \end{array}$

$F (x)$ : a matrix-valued function
$A ≻ 0$ denotes that $A$ is positive definite.

Problems With Matrix Inequalities

Recall that a matrix $A$ is positive definite if and only if $v^{T} A v > 0$ for all $v \in R^{N} \setminus \{0\}$ .
The problem can be transformed into: $\begin{array}{ll} \text{find} & x, \\ \text{subject to} & v^{T} F (x) v > 0, \ \forall v \in R^{N} \setminus \{0\}. \end{array}$
If $v^{T} F (x) v$ is concave w.r.t. $x$ for all $v \in R^{N}$ , then the above problem is a convex programming problem.
It reduces to semidefinite programming if $F (x)$ is linear w.r.t. $x$ , i.e., $F (x) = F_{0} + x_{1} F_{1} + \dots + x_{n} F_{n}$

LDLT factorization

The LDLT factorization of a symmetric positive definite matrix $A$ is the factorization $A = L D L^{T}$ , where $L$ is lower triangular with unit diagonal elements and $D$ is a diagonal matrix.
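For example, $\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 1 & 2 \\ 1 & 1 & 3 & 1 \\ 1 & 2 & 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} .$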

Naïve implementation

Starting with $d_{11} = a_{11}$ , the basic algorithm of LDLT factorization is:

for $i = 1 : n$
    for $j = 1 : i - 1$
        $s = a_{ij} - \sum_{k = 1}^{j - 1} d_{kk} l_{ik} l_{jk}$
        $l_{ij} = s / d_{jj}$
    end
    $d_{ii} = a_{ii} - \sum_{k = 1}^{i - 1} d_{kk} l_{ik}^{2}$
end

This requires $p^{3}$ FLOPs, where $p$ is the row at which the algorithm stops.
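
The row-based recurrence can be sketched in Python as follows. This is an illustrative, unoptimized version written for this note; the function name `ldlt_naive` and its return convention (the 0-based row index of the first non-positive pivot, or `n` on success) are assumptions, not the API of any library.

```python
import numpy as np


def ldlt_naive(A):
    """Row-based LDL^T factorization of a symmetric matrix A.

    Returns (L, d, p): p == n on success, otherwise the 0-based row
    at which a non-positive pivot was met and the process stopped.
    """
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        for j in range(i):
            s = A[i, j] - sum(d[k] * L[i, k] * L[j, k] for k in range(j))
            L[i, j] = s / d[j]
        d[i] = A[i, i] - sum(d[k] * L[i, k] ** 2 for k in range(i))
        if d[i] <= 0.0:
            return L, d, i  # A is not positive definite
    return L, d, n
```

Because rows are processed one at a time, the work done before an early stop depends only on the stopping row, not on the full matrix size.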

Storage representation

First, we pack the solution and the intermediate storage on a single matrix $T$ such that:

$t_{ij} = \begin{cases} d_{ii} & \text{if } i = j, \\ l_{ij} & \text{if } i > j, \\ d_{ii} l_{ji} & \text{if } i < j. \end{cases}$

For example, $T = \begin{pmatrix} d_{11} & d_{11} l_{21} & d_{11} l_{31} & d_{11} l_{41} \\ l_{21} & d_{22} & d_{22} l_{32} & d_{22} l_{42} \\ l_{31} & l_{32} & d_{33} & d_{33} l_{43} \\ l_{41} & l_{42} & l_{43} & d_{44} \end{pmatrix} .$

Improved implementation

Starting with $t_{11} = a_{11}$ , the improved implementation of LDLT factorization is:

for $i = 1 : n$
    for $j = 1 : i - 1$
        $t_{ji} = a_{ij} - \sum_{k = 1}^{j - 1} t_{ik} t_{kj}$
        $t_{ij} = t_{ji} / t_{jj}$
    end
    $t_{ii} = a_{ii} - \sum_{k = 1}^{i - 1} t_{ik} t_{ki}$
end

This requires $\frac{p ^{3}}{2}$ FLOPs (the same as Cholesky factorization), where $p$ is the row at which the algorithm stops.

Witness of indefiniteness

In the case of failure, a vector $v$ can be constructed to certify that $v^{T} A v \leq 0$ .
Let $L_{1 : p}$ denote the partial sub-matrix $L (1 : p, 1 : p)$ where $p$ is the row of failure.
Then $v = [L_{1 : p}^{- T} e_{p}, 0, \dots, 0]^{T}$ , where $e_{p} = [0, \dots, 0, 1]^{T} \in R^{p}$
Starting with $v = e_{p}$ , the basic algorithm is:

for $i = p - 1$ downto $1$
    for $k = i + 1$ to $p$
        $v_{i} = v_{i} - t_{k, i} v_{k}$
    end
end
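
To make the witness construction concrete, here is a small self-contained sketch that factorizes row by row, stops at the first non-positive pivot, and then solves $L_{1 : p}^{T} v = e_{p}$ by back substitution. The function name `ldlt_with_witness` is hypothetical, and the code stores $L$ and $d$ separately rather than in the packed matrix $T$, purely for readability.

```python
import numpy as np


def ldlt_with_witness(A):
    """Return None if A is positive definite, else a vector v with v^T A v <= 0."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        for j in range(i):
            s = A[i, j] - sum(d[k] * L[i, k] * L[j, k] for k in range(j))
            L[i, j] = s / d[j]
        d[i] = A[i, i] - sum(d[k] * L[i, k] ** 2 for k in range(i))
        if d[i] <= 0.0:  # failure at row p = i + 1 (1-based)
            p = i + 1
            v = np.zeros(n)
            v[p - 1] = 1.0  # start with v = e_p, padded with zeros
            for r in range(p - 2, -1, -1):  # back substitution on L^T v = e_p
                v[r] = -sum(L[k, r] * v[k] for k in range(r + 1, p))
            return v
    return None
```

With this choice of $v$, the quadratic form $v^{T} A v$ equals the failing pivot $d_{pp}$, which is why it certifies that $A$ is not positive definite.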

🔮 Oracle in Matrix Inequalities

The oracle only needs to:

Perform a row-based LDLT factorization such that $F (x_{0}) = L D L^{T}$ .
Let $A_{p, p}$ denote the submatrix $A (1 : p, 1 : p) \in R^{p \times p}$ .
If the process fails at row $p$ ,
- let $e_{p} = (0, 0, \dots, 0, 1)^{T} \in R^{p}$ ; then
  - $v = L_{p, p}^{- T} e_{p}$ satisfies
  - $v^{T} F_{p, p} (x_{0}) v \leq 0$ .
- The cut $(g, β)$ = $(- v^{T} \partial F_{p, p} (x_{0}) v, - v^{T} F_{p, p} (x_{0}) v)$

Lazy evaluation

Don't construct the full matrix at each iteration!
Only O( $p^{3}$ ) per iteration, independent of $N$ !

class LMIOracle:
    def __init__(self, F, B):
        self.F = F
        self.F0 = B
        self.Q = LDLTMgr(len(B))

    def assess_feas(self, x: Arr) -> Optional[Cut]:
        def get_elem(i, j):
            return self.F0[i, j] - sum(
                Fk[i, j] * xk for Fk, xk in zip(self.F, x))

        if self.Q.factor(get_elem):
            return None
        ep = self.Q.witness()
        g = np.array([self.Q.sym_quad(Fk) for Fk in self.F])
        return g, ep

Google Benchmark 📊 Comparison

2: ----------------------------------------------------------
2: Benchmark                Time             CPU   Iterations
2: ----------------------------------------------------------
2: BM_LMI_Lazy         131235 ns       131245 ns         4447
2: BM_LMI_old          196694 ns       196708 ns         3548
2/4 Test #2: Bench_BM_lmi .....................   Passed    2.57 sec

Example - Matrix Norm Minimization

Let $A (x) = A_{0} + x_{1} A_{1} + \dots + x_{n} A_{n}$
The problem $\min_{x} ∥ A (x) ∥$ can be reformulated as $\begin{array}{ll} \text{minimize} & γ, \\ \text{subject to} & \begin{pmatrix} γ I & A (x) \\ A^{T} (x) & γ I \end{pmatrix} ≻ 0. \end{array}$
Binary search on $γ$ can be used for this problem.
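
The link between the block matrix and the spectral norm can be checked numerically. The sketch below is only a sanity check with a fixed toy matrix; the helper name `norm_lmi_min_eig` is an assumption. The block matrix has eigenvalues $γ \pm σ_{i}$, so its smallest eigenvalue changes sign exactly at $γ = ∥ A ∥_{2}$.

```python
import numpy as np


def norm_lmi_min_eig(A, gamma):
    """Smallest eigenvalue of the block matrix [[gamma*I, A], [A^T, gamma*I]]."""
    m, n = A.shape
    M = np.block([[gamma * np.eye(m), A], [A.T, gamma * np.eye(n)]])
    return np.linalg.eigvalsh(M)[0]  # eigvalsh returns eigenvalues in ascending order


A = np.array([[3.0, 0.0], [4.0, 5.0]])
sigma_max = np.linalg.norm(A, 2)  # spectral norm (largest singular value)
```

A binary search on $γ$ therefore only needs the sign of this eigenvalue, or equivalently the success or failure of the factorization.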

Example - Estimation of Correlation Function

$\begin{array}{ll} \min_{κ, p} & ∥ Σ (p) + κ I - Y ∥ \\ \text{s.t.} & Σ (p) ≻ 0, \ κ \geq 0. \end{array}$

Let $ρ (h) = \sum_{i}^{n} p_{i} Ψ_{i} (h)$ , where
- $p_{i}$ 's are the unknown coefficients to be fitted
- $Ψ_{i}$ 's are a family of basis functions.
The covariance matrix $Σ (p)$ can be recast as: $Σ (p) = p_{1} F_{1} + \dots + p_{n} F_{n}$

where $\{F_{k}\}_{i, j} = Ψ_{k} (∥ s_{j} - s_{i} ∥_{2})$

🧪 Experimental Result

Figure: Data Sample (kern=0.5)

Figure: Least Square Result

🧪 Experimental Result III

Figure: Data Sample (kern=2.0)

Figure: Least Square Result

Multi-parameter Network Problem

Parametric Network Problem

Given a network represented by a directed graph $G = (V, E)$ .

Consider:

$\begin{array}{ll} \text{find} & x, u \\ \text{subject to} & u_{j} - u_{i} \leq h_{ij} (x), \ \forall (i, j) \in E, \end{array}$

$h_{ij} (x)$ is the concave function of edge $(i, j)$ ,
Assume: network is large, but the number of parameters is small.

Network Potential Problem (cont'd)

Given $x$ , the problem has a feasible solution if and only if $G$ contains no negative cycle. Let $C$ be a set of all cycles of $G$ .

$\begin{array}{ll} \text{find} & x \\ \text{subject to} & w_{k} (x) \geq 0, \ \forall C_{k} \in C, \end{array}$

$C_{k}$ is a cycle of $G$
$w_{k} (x) = \sum_{(i, j) \in C_{k}} h_{ij} (x)$ .

There are lots of methods to detect negative cycles in a weighted graph [@cherkassky1999negative], in which Tarjan's algorithm [@Tarjan1981negcycle] is one of the fastest algorithms in practice [@alg:dasdan_mcr; @cherkassky1999negative].
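
For illustration only, the sketch below detects a negative cycle with the textbook Bellman-Ford relaxation. This is a deliberately simple stand-in, not Tarjan's algorithm and not the method used by any particular library. The function name `find_negative_cycle` and the edge format `(i, j, w)`, encoding the constraint $u_{j} - u_{i} \leq w$, are assumptions.

```python
def find_negative_cycle(n, edges):
    """edges: list of (i, j, w) encoding u_j - u_i <= w on nodes 0..n-1.

    Returns a list of edges forming a negative cycle, or None if there is none.
    """
    dist = [0.0] * n  # acts like a virtual source linked to every node
    pred = [None] * n
    x = None
    for _ in range(n):
        x = None
        for (i, j, w) in edges:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                pred[j] = (i, j, w)
                x = j
        if x is None:
            return None  # nothing relaxed: dist is a feasible potential u
    # A relaxation in the n-th pass implies a negative cycle;
    # walking back n steps guarantees that we land on it.
    for _ in range(n):
        x = pred[x][0]
    cycle, v = [], x
    while True:
        e = pred[v]
        cycle.append(e)
        v = e[0]
        if v == x:
            break
    return cycle[::-1]
```

When the function returns `None`, the final distances satisfy every constraint, which matches the statement that the shortest path solution gives the value of $u$.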

🔮 Oracle in Network Potential Problem

The oracle only needs to determine:
- If there exists a negative cycle $C_{k}$ under $x_{0}$ , then
  - the cut $(g, β)$ = $(- \partial w_{k} (x_{0}), - w_{k} (x_{0}))$
- Otherwise, the shortest path solution gives the value of $u$ .

🐍 Python Code

class NetworkOracle:
    def __init__(self, G, u, h):
        self._G = G
        self._u = u
        self._h = h
        self._S = NegCycleFinder(G)

    def update(self, gamma):
        self._h.update(gamma)

    def assess_feas(self, x) -> Optional[Cut]:
        def get_weight(e):
            return self._h.eval(e, x)

        for Ci in self._S.find_neg_cycle(self._u, get_weight):
            f = -sum(self._h.eval(e, x) for e in Ci)
            g = -sum(self._h.grad(e, x) for e in Ci)
            return g, f  # use the first Ci only
        return None

Example - Optimal Matrix Scaling [@orlin1985computing]

Given a sparse matrix $A = [a_{ij}] \in R^{N \times N}$ .
Find another matrix $B = U A U^{- 1}$ where $U$ is a nonnegative diagonal matrix, such that the ratio of any two elements of $B$ in absolute value is as close to 1 as possible.
Let $U = diag ([u_{1}, u_{2}, \dots, u_{N}])$ . Under the min-max-ratio criterion, the problem can be formulated as:

$\begin{array}{ll} \text{minimize} & π / ψ \\ \text{subject to} & ψ \leq u_{i} ∣ a_{ij} ∣ u_{j}^{- 1} \leq π, \ \forall a_{ij} \neq 0, \\ & π, ψ, u \text{ positive}, \\ \text{variables} & π, ψ, u. \end{array}$

Optimal Matrix Scaling (cont'd)

By taking the logarithms of variables, the above problem can be transformed into:

$\begin{array}{ll} \text{minimize} & γ \\ \text{subject to} & π^{'} - ψ^{'} \leq γ, \\ & u_{i}^{'} - u_{j}^{'} \leq π^{'} - a_{ij}^{'}, \ \forall a_{ij} \neq 0, \\ & u_{j}^{'} - u_{i}^{'} \leq a_{ij}^{'} - ψ^{'}, \ \forall a_{ij} \neq 0, \\ \text{variables} & π^{'}, ψ^{'}, u^{'}. \end{array}$

where $k^{'}$ denotes $\log (∣ k ∣)$ and $x = (π^{'}, ψ^{'})^{T}$ .

class OptScalingOracle:
    class Ratio:
        def __init__(self, G, get_cost):
            self._G = G
            self._get_cost = get_cost

        def eval(self, e, x: Arr) -> float:
            u, v = e
            cost = self._get_cost(e)
            return x[0] - cost if u < v else cost - x[1]

        def grad(self, e, x: Arr) -> Arr:
            u, v = e
            return np.array([1.0, 0.0] if u < v else [0.0, -1.0])

    def __init__(self, G, u, get_cost):
        self._network = NetworkOracle(G, u, self.Ratio(G, get_cost))

    def assess_optim(self, x: Arr, gamma: float):
        s = x[0] - x[1]
        g = np.array([1.0, -1.0])
        if (fj := s - gamma) >= 0.0:
            return (g, fj), None
        if (cut := self._network.assess_feas(x)):
            return cut, None
        return (g, 0.0), s

Example - clock period & yield-driven co-optimization

$\begin{array}{ll} \text{minimize} & T_{CP} / β \\ \text{subject to} & u_{i} - u_{j} \leq T_{CP} - F_{ij}^{- 1} (β), \ \forall (i, j) \in E_{s}, \\ & u_{j} - u_{i} \leq F_{ij}^{- 1} (1 - β), \ \forall (j, i) \in E_{h}, \\ & T_{CP} \geq 0, \ 0 \leq β \leq 1, \\ \text{variables} & T_{CP}, β, u. \end{array}$

👉 Note that $F_{ij}^{- 1} (x)$ is not concave in general in $[0, 1]$ .
Fortunately, in practice we are mostly interested in optimizing circuits for high yield rather than low yield.
Therefore, by imposing an additional constraint on $β$ , say $β \geq 0.8$ , the problem becomes convex.

Example - clock period & yield-driven co-optimization

The problem can be reformulated as:

$\begin{array}{ll} \text{minimize} & γ \\ \text{subject to} & T_{CP} - β γ \leq 0, \\ & u_{i} - u_{j} \leq T_{CP} - F_{ij}^{- 1} (β), \ \forall (i, j) \in E_{s}, \\ & u_{j} - u_{i} \leq F_{ij}^{- 1} (1 - β), \ \forall (j, i) \in E_{h}, \\ & T_{CP} \geq 0, \ 0 \leq β \leq 1, \\ \text{variables} & T_{CP}, β, u. \end{array}$

🫒 Ellipsoid Method Revisited

📝 Abstract

This lecture provides a brief history of the ellipsoid method. Then it discusses implementation issues of the ellipsoid method, such as utilizing parallel cuts to update the search space and reduce computation time. In some instances, parallel cuts can drastically reduce computation time, as observed in FIR filter design. Discrete optimization is also investigated, illustrating how the ellipsoid method can be applied to problems that involve discrete design variables. An oracle implementation is required solely for locating the nearest discrete solutions.

Some History of Ellipsoid Method [@BGT81]

Introduced by Shor and Yudin and Nemirovskii in 1976
Used to show that linear programming (LP) is polynomial-time solvable (Khachiyan 1979), settling the long-standing problem of determining the theoretical complexity of LP.
In practice, however, the simplex method runs much faster than the ellipsoid method, although its worst-case complexity is exponential.

Basic Ellipsoid Method

An ellipsoid $E (x_{c}, P)$ is specified as a set $\{x ∣ (x - x_{c})^{T} P^{- 1} (x - x_{c}) \leq 1\},$ where $x_{c}$ is the center of the ellipsoid.

ellipsoid

Updating the ellipsoid (deep-cut)

Calculation of minimum volume ellipsoid $E^{+}$ covering:

$E \cap \{z ∣ g^{T} (z - x_{c}) + β \leq 0\} .$

Deep-cut

Updating the ellipsoid (deep-cut)

Let $\tilde{g} = P g$ , $τ^{2} = g^{T} P g$ .
If $τ + n \cdot β < 0$ (shallow cut), no smaller ellipsoid can be found.
If $β > τ$ , intersection is empty.

Otherwise,

$x_{c}^{+} = x_{c} - \frac{ρ}{τ ^{2}} \tilde{g}, P^{+} = δ \cdot (P - \frac{σ}{τ ^{2}} \tilde{g} \tilde{g}^{T}) .$

where

$ρ = \frac{τ + n \cdot β}{n + 1}, σ = \frac{2 ρ}{τ + β}, δ = \frac{n ^{2} ( τ + β ) ( τ - β )}{( n ^{2} - 1 ) τ ^{2}} .$
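
A minimal NumPy sketch of one deep-cut update is given below, using the parameters $ρ$, $σ$, $δ$ defined above with $\tilde{g} = P g$ as the update direction. The function name `deep_cut_update` is hypothetical, the code assumes $n \geq 2$, and it keeps the full matrix $P$ for clarity rather than speed.

```python
import numpy as np


def deep_cut_update(xc, P, g, beta):
    """One deep-cut update of E(xc, P) for the cut g^T (z - xc) + beta <= 0.

    Returns (xc_new, P_new), or None when the intersection is empty or
    the cut is too shallow to yield a smaller ellipsoid. Assumes n >= 2.
    """
    n = len(xc)
    Pg = P @ g  # g-tilde
    tau = np.sqrt(g @ Pg)
    if beta > tau:  # the half-space misses the ellipsoid
        return None
    if tau + n * beta < 0.0:  # shallow cut: no smaller ellipsoid
        return None
    rho = (tau + n * beta) / (n + 1)
    sigma = 2.0 * rho / (tau + beta)
    delta = n**2 * (tau + beta) * (tau - beta) / ((n**2 - 1) * tau**2)
    xc_new = xc - (rho / tau**2) * Pg
    P_new = delta * (P - (sigma / tau**2) * np.outer(Pg, Pg))
    return xc_new, P_new
```

For the unit disc cut by $z_{1} + 0.5 \leq 0$, the update returns an ellipse centered at $(-2/3, 0)$ with semi-axes $1/3$ and $1$, which touches the disc at $(-1, 0)$ and passes through the two points where the cut meets the circle.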

Updating the ellipsoid (cont'd)

Even better, split $P$ into two variables: $P = κ \cdot Q$ .
Let $\tilde{g} = Q \cdot g$ , $ω = g^{T} \tilde{g}$ , $τ^{2} = κ \cdot ω$ .

$x_{c}^{+} = x_{c} - \frac{ρ}{ω} \tilde{g}, Q^{+} = Q - \frac{σ}{ω} \tilde{g} \tilde{g}^{T}, κ^{+} = δ \cdot κ .$
This saves $n^{2}$ multiplications per iteration.
👉 Note:
- The determinant of $Q$ decreases monotonically.
- The range of $δ$ is $(0, \frac{n ^{2}}{n ^{2} - 1})$ .

Central Cut

A special case of deep cut when $β = 0$
It deserves a separate implementation because it is much simpler.
Let $\tilde{g} = Q g$ , $τ^{2} = κ \cdot ω$ ,

$ρ = \frac{τ}{n + 1}, σ = \frac{2}{n + 1}, δ = \frac{n ^{2}}{n ^{2} - 1} .$

Central Cut

Calculation of minimum volume ellipsoid $E^{+}$ covering:

$E \cap \{z ∣ g^{T} (z - x_{c}) \leq 0\} .$

Central-cut

🪜 Parallel Cuts

The oracle returns a pair of cuts instead of just one.
The pair of cuts is given by $g$ and $(β_{0}, β_{1})$ such that:

$g^{T} (x - x_{c}) + β_{0} \leq 0, g^{T} (x - x_{c}) + β_{1} \geq 0,$ for all $x \in K$ .
Only linear inequality constraints can produce such parallel cuts: $l \leq a^{T} x + b \leq u, L ⪯ F (x) ⪯ U .$
Parallel cuts usually provide faster convergence.

🪜 Parallel Cuts

Calculation of minimum volume ellipsoid $E^{+}$ covering:

$E \cap \{z ∣ g^{T} (z - x_{c}) + β_{0} \leq 0 \land g^{T} (z - x_{c}) + β_{1} \geq 0\} .$

Parallel Cut

Updating the ellipsoid (old)

Let $\tilde{g} = Q g$ , $τ^{2} = κ \cdot ω$ .
If $β_{0} > β_{1}$ , intersection is empty.
If $β_{0} β_{1} \leq - τ^{2} / n$ , no smaller ellipsoid can be found.
If $β_{1}^{2} > τ^{2}$ , it reduces to deep-cut with $β = β_{0}$
Otherwise, $x_{c}^{+} = x_{c} - \frac{ρ}{ω} \tilde{g}, Q^{+} = Q - \frac{σ}{ω} \tilde{g} \tilde{g}^{T}, κ^{+} = δ κ .$ where

$\begin{array}{lll} ζ_{0} &=& τ^{2} - β_{0}^{2}, \\ ζ_{1} &=& τ^{2} - β_{1}^{2}, \\ ξ &=& \sqrt{4 ζ_{0} ζ_{1} + n^{2} (β_{1}^{2} - β_{0}^{2})^{2}}, \\ σ &=& \left( n + (2 τ^{2} + 2 β_{0} β_{1} - ξ) / (β_{0} + β_{1})^{2} \right) / (n + 1), \\ ρ &=& σ (β_{0} + β_{1}) / 2, \\ δ &=& \frac{n^{2}}{2 (n^{2} - 1)} \cdot \frac{ζ_{0} + ζ_{1} + ξ / n}{τ^{2}} . \end{array}$

Updating the ellipsoid (new)

Let $\tilde{g} = Q g$ , $τ^{2} = κ \cdot ω$ .
If $β_{0} > β_{1}$ , intersection is empty.
If $τ^{2} + n β_{0} β_{1} \leq 0$ , no smaller ellipsoid can be found.
If $β_{1}^{2} > τ^{2}$ , it reduces to deep-cut with $β = β_{0}$
Otherwise, $x_{c}^{+} = x_{c} - \frac{ρ}{ω} \tilde{g}, Q^{+} = Q - \frac{σ}{ω} \tilde{g} \tilde{g}^{T}, κ^{+} = δ κ .$ where $\begin{array}{lll} η &=& τ^{2} + n β_{0} β_{1}, \\ \bar{β} &=& (β_{0} + β_{1}) / 2, \\ h &=& \frac{1}{2} (τ^{2} + β_{0} β_{1}) + n \bar{β}^{2}, \\ k &=& h + \sqrt{h^{2} - (n + 1) η \bar{β}^{2}}, \\ σ &=& η / k, \quad ρ = σ \bar{β}, \\ δ &=& 1 + \frac{η}{τ ^{2} ( k - η )} (\bar{β}^{2} σ - β_{0} β_{1}) . \end{array}$
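
As a sanity check on the parallel-cut parameters, the sketch below computes $ρ$, $σ$, $δ$ from $η$, $\bar{β}$, $h$ and $k$ (with a square root in $k$), and compares them with the single deep-cut parameters in the limiting case $β_{1} = τ$, where the second cut is tangent to the ellipsoid and should have no effect. The function names are hypothetical, and the code assumes $β_{0} < β_{1}$, $β_{1}^{2} \leq τ^{2}$ and $η > 0$.

```python
import math


def parallel_cut_params(n, tsq, beta0, beta1):
    """rho, sigma, delta for a parallel cut; tsq is tau^2.

    Assumes beta0 < beta1, beta1**2 <= tsq and tsq + n*beta0*beta1 > 0.
    """
    b0b1 = beta0 * beta1
    eta = tsq + n * b0b1
    bavg = 0.5 * (beta0 + beta1)
    h = 0.5 * (tsq + b0b1) + n * bavg**2
    k = h + math.sqrt(h * h - (n + 1) * eta * bavg**2)
    sigma = eta / k
    rho = sigma * bavg
    delta = 1.0 + eta / (tsq * (k - eta)) * (bavg**2 * sigma - b0b1)
    return rho, sigma, delta


def deep_cut_params(n, tau, beta):
    """rho, sigma, delta for a single deep cut, used here only for comparison."""
    rho = (tau + n * beta) / (n + 1)
    sigma = 2.0 * rho / (tau + beta)
    delta = n**2 * (tau**2 - beta**2) / ((n**2 - 1) * tau**2)
    return rho, sigma, delta
```

With $n = 4$, $τ = 2$, $β_{0} = 0.5$ and $β_{1} = τ$, both functions return the same triple, which is a quick way to catch sign or index slips when implementing the update.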

Parallel Central Cuts

Calculation of minimum volume ellipsoid $E^{+}$ covering:

$E \cap \{z ∣ g^{T} (z - x_{c}) \leq 0 \land g^{T} (z - x_{c}) + β_{1} \geq 0\} .$

Updating the ellipsoid

Let $\tilde{g} = Q g$ , $τ^{2} = κ \cdot ω$ .
If $β_{1}^{2} > τ^{2}$ , it reduces to central-cut
Otherwise, $x_{c}^{+} = x_{c} - \frac{ρ}{ω} \tilde{g}, Q^{+} = Q - \frac{σ}{ω} \tilde{g} \tilde{g}^{T}, κ^{+} = δ κ .$ where $\begin{array}{lll} α^{2} &=& β_{1}^{2} / τ^{2}, \\ h &=& \frac{n}{2} α^{2}, \\ r &=& h + \sqrt{h^{2} + 1 - α^{2}}, \\ ρ &=& \frac{β_{1}}{r + 1}, \\ σ &=& \frac{2}{r + 1}, \\ δ &=& \frac{r}{r - 1/ n} . \end{array}$

Example - FIR filter design

A typical structure of an FIR filter [@mitra2006digital].

The time response is: $y [t] = \sum_{k = 0}^{n - 1} h [k] u [t - k] .$

Example - FIR filter design (cont'd)

The frequency response: $H (ω) = \sum_{m = 0}^{n - 1} h (m) e^{- jmω} .$
The magnitude constraints on frequency domain are expressed as

$L (ω) \leq ∣ H (ω) ∣ \leq U (ω), \forall ω \in (- \infty, + \infty) .$

where $L (ω)$ and $U (ω)$ are the lower and upper (nonnegative) bounds at frequency $ω$ respectively.
The constraint is non-convex in general.

Example - FIR filter design (II)

However, via spectral factorization [@goodman1997spectral], it can be transformed into a convex one [@wu1999fir]: $L^{2} (ω) \leq R (ω) \leq U^{2} (ω), \forall ω \in (0, π),$

where
- $R (ω) = \sum_{t = - n + 1}^{n - 1} r (t) e^{- j ω t} = ∣ H (ω) ∣^{2}$
- $r = (r (- n + 1), r (- n + 2), ..., r (n - 1))$ are the autocorrelation coefficients.

Example - FIR filter design (III)

$r$ can be determined by $h$ :

$r (t) = \sum_{i = - n + 1}^{n - 1} h (i) h (i + t), \ t \in Z,$

where $h (t) = 0$ for $t < 0$ or $t > n - 1$ .
The whole problem can be formulated as:

$\begin{array}{ll} \text{min} & γ \\ \text{s.t.} & L^{2} (ω) \leq R (ω) \leq U^{2} (ω), \ \forall ω \in [0, π], \\ & R (ω) > 0, \ \forall ω \in [0, π]. \end{array}$
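
The relation between the filter taps $h$, the autocorrelation coefficients $r$, and the identity $R (ω) = ∣ H (ω) ∣^{2}$ can be checked numerically. The sketch below is illustrative only; the helper names are assumptions, and it relies on `numpy.correlate` in `"full"` mode to form $r (t)$ for $t = -n + 1, \dots, n - 1$ with real-valued taps.

```python
import numpy as np


def autocorr(h):
    """r(t) = sum_i h(i) h(i + t) for t = -n+1, ..., n-1 (h is zero outside 0..n-1)."""
    return np.correlate(h, h, mode="full")


def freq_response(h, w):
    """H(w) = sum_m h(m) exp(-j m w)."""
    m = np.arange(len(h))
    return np.sum(h * np.exp(-1j * m * w))


def power_response(r, w):
    """R(w) = sum_t r(t) exp(-j w t); real-valued because r(t) = r(-t)."""
    n = (len(r) + 1) // 2
    t = np.arange(-n + 1, n)
    return np.sum(r * np.exp(-1j * t * w)).real
```

Since $R (ω)$ is linear in $r$, the magnitude bounds become linear inequalities in $r$ at each frequency, which is what makes the reformulated problem convex.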

🧪 Experiment

Result

📊 Google Benchmark Result

3: ------------------------------------------------------------------
3: Benchmark                        Time             CPU   Iterations
3: ------------------------------------------------------------------
3: BM_Lowpass_single_cut    627743505 ns    621639313 ns            1
3: BM_Lowpass_parallel_cut   30497546 ns     30469134 ns           24
3/4 Test #3: Bench_BM_lowpass .................   Passed    1.72 sec

Discrete Optimization

Why Discrete Convex Programming

Many engineering problems can be formulated as convex/geometric programs, e.g. digital circuit sizing.
Yet in an ASIC design, often there is only a limited set of choices from the cell library. In other words, some design variables are discrete.
The discrete version can be formulated as a mixed-integer convex program (MICP) by mapping the design variables to integers.

What's Wrong w/ Existing Methods?

Mostly based on relaxation.
The relaxed solution is then used as a lower bound, and the branch-and-bound method is used to find the discrete optimal solution.
- 👉 Note: the branch-and-bound method does not utilize the convexity of the problem.
What if the constraints can only be evaluated on discrete data? Workaround: convex fitting?

Mixed-Integer Convex Programming

Consider:

$\begin{array}{ll} \text{minimize} & f_{0} (x), \\ \text{subject to} & f_{j} (x) \leq 0, \ \forall j = 1, 2, \dots, \\ & x \in D, \end{array}$

where

$f_{0} (x)$ and $f_{j} (x)$ are "convex"
Some design variables are discrete.

🔮 Oracle Requirement

The oracle looks for a nearby discrete solution $x_{d}$ of $x_{c}$ with the cutting-plane: $g^{T} (x - x_{d}) + β \leq 0, β \geq 0, g \neq 0$
👉 Note: the cut may be a shallow cut.
Suggestion: use as many different cuts as possible across iterations (e.g. round-robin the evaluation of constraints)
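
The following sketch illustrates why a cut generated at the discrete point $x_{d}$ may be shallow with respect to the ellipsoid center $x_{c}$. Rounding to the nearest grid point is just one possible way to pick $x_{d}$ and is an assumption here, as are the names `discrete_cut` and `grid`. The cut is re-expressed around $x_{c}$ via $g^{T} (x - x_{d}) + β = g^{T} (x - x_{c}) + (β + g^{T} (x_{c} - x_{d}))$.

```python
import numpy as np


def discrete_cut(oracle, xc, grid):
    """Round xc to the grid, query the oracle at the rounded point x_d,
    and express the resulting cut relative to xc.

    oracle(x) returns None if x is feasible, else (g, beta) such that
    g^T (z - x) + beta <= 0 holds for every feasible z.
    """
    xd = np.round(xc / grid) * grid  # nearest grid point (one possible choice)
    cut = oracle(xd)
    if cut is None:
        return xd, None
    g, beta = cut
    beta_c = beta + g @ (xc - xd)  # offset of the same hyperplane measured from xc
    return xd, (g, beta_c)  # beta_c may be negative, i.e. a shallow cut


def toy_oracle(x):
    # Toy constraint: x[0] - 0.9 <= 0
    f = x[0] - 0.9
    return (np.array([1.0, 0.0]), f) if f > 0.0 else None


xd, cut = discrete_cut(toy_oracle, np.array([0.6, 0.0]), 1.0)
```

Here $x_{c} = (0.6, 0)$ is itself feasible, but its rounded neighbor $x_{d} = (1, 0)$ is not, so the cut seen from $x_{c}$ has a negative offset and must be handled by an update that accepts shallow cuts.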

Discrete Cut

Example - Multiplier-less FIR filter design (nnz=3)

Lowpass

Algorithms for Design-for-Manufacturability