When "Convex Optimization" Meets "Network Flow"

📖 Introduction

Overview

Network flow problems can be solved efficiently and have a wide range of applications.
Unfortunately, some problems carry additional constraints that put them beyond the reach of standard network flow techniques.
In addition, in some problems, the objective function is quasi-convex rather than convex.
In this lecture, we will investigate some problems that can still be solved by network flow techniques with the help of convex optimization.

Parametric Potential Problems

Parametric potential problems

Consider:

$\begin{array}{ll} \text{maximize} & g(β), \\ \text{subject to} & y \leq d(β), \\ & A u = y, \end{array}$

where $g (β)$ and $d (β)$ are concave.

Note: the parametric flow problems can be defined in a similar way.

Network flow says:

For fixed $β$ , the problem is feasible precisely when there exists no negative cycle
Negative cycle detection can be done efficiently using Bellman-Ford-like methods
If a negative cycle $C$ is found, then $\sum_{(i, j) \in C} d_{ij} (β) < 0$

Convex Optimization says:

If both sub-gradients of $g (β)$ and $d (β)$ are known, then the bisection method can be used for solving the problem efficiently.
Also, for multi-parameter problems, the ellipsoid method can be used.

Quasi-convex Minimization

Consider: $\begin{array}{ll} \text{minimize} & f(β), \\ \text{subject to} & y \leq d(β), \\ & A u = y, \end{array}$

where $f (β)$ is quasi-convex and $d (β)$ is concave.

Example of Quasi-Convex Functions

$∣ y ∣$ is quasi-convex on $R$
$\log (y)$ is quasi-linear on $R_{++}$
$f (x, y) = x y$ is quasi-concave on $R_{++}^{2}$
Linear-fractional function:
- $f (x)$ = $(a^{T} x + b) / (c^{T} x + d)$
- dom $f$ = $\{x ∣ c^{T} x + d > 0\}$
Distance ratio function:
- $f (x)$ = $∥ x - a ∥_{2} /∥ x - b ∥_{2}$
- dom $f$ = $\{x ∣ ∥ x - a ∥_{2} \leq ∥ x - b ∥_{2}\}$

Convex Optimization says:

If $f$ is quasi-convex, there exists a family of functions $ϕ_{t}$ such that:

$ϕ_{t} (β)$ is convex w.r.t. $β$ for fixed $t$
$ϕ_{t} (β)$ is non-increasing w.r.t. $t$ for fixed $β$
$t$ -sublevel set of $f$ is $0$ -sublevel set of $ϕ_{t}$ , i.e., $f (β) \leq t$ iff $ϕ_{t} (β) \leq 0$

For example:

$f (β) = p (β) / q (β)$ with $p$ convex, $q$ concave, $p (β) \geq 0$ , $q (β) > 0$ on dom $f$ ,
can take $ϕ_{t} (β)$ = $p (β) - t \cdot q (β)$

Convex Optimization says:

Consider a convex feasibility problem: $\begin{array}{ll} \text{find} & β, \\ \text{s.t.} & ϕ_{t}(β) \leq 0, \\ & y \leq d(β), \\ & A u = y, \end{array}$

If feasible, we conclude that $t \geq p^{*}$ ;
If infeasible, $t < p^{*}$ .

Binary search on $t$ can be used for obtaining $p^{*}$ .
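
The binary search on $t$ can be sketched in a few lines. This is only an illustration under assumptions that are not in the lecture: a single scalar parameter $β \in [0, 2]$, no network constraints, and the hypothetical choice $p (β) = β^{2} + 1$ , $q (β) = β + 1$ , for which the feasibility test has a closed form.

```python
def p(beta):
    return beta * beta + 1.0      # convex and nonnegative (illustrative choice)

def q(beta):
    return beta + 1.0             # affine, hence concave, and positive on [0, 2]

def is_feasible(t, lo=0.0, hi=2.0):
    """Return True if some beta in [lo, hi] satisfies phi_t(beta) <= 0."""
    # phi_t(beta) = beta^2 + 1 - t*(beta + 1) is a convex parabola in beta,
    # so its minimizer over [lo, hi] is the vertex beta = t/2 clipped to the interval.
    beta = min(max(t / 2.0, lo), hi)
    return p(beta) - t * q(beta) <= 0.0

def bisect_quasiconvex(t_lo, t_hi, tol=1e-9):
    """Bisection on t; assumes t_lo is infeasible and t_hi is feasible."""
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        if is_feasible(t):
            t_hi = t      # feasible: t >= p*
        else:
            t_lo = t      # infeasible: t < p*
    return t_hi

p_star = bisect_quasiconvex(0.0, 5.0)   # approximates min of p(beta)/q(beta)
```

For this toy instance the minimizer is $β = \sqrt{2} - 1$ , so `p_star` should approach $2 \sqrt{2} - 2$ . In the network setting, `is_feasible` would be replaced by a negative cycle test.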

Quasi-convex Network Problem

Again, the feasibility problem above can be solved efficiently by the bisection method or the ellipsoid method, together with the negative cycle detection technique.
Any EDA applications (???)

Monotonic Minimization

Consider the following problem: $\begin{array}{ll} \text{minimize} & \max_{ij} f_{ij}(y_{ij}), \\ \text{subject to} & A u = y, \end{array}$

where $f_{ij} (y_{ij})$ is non-decreasing.
The problem can be recast as:

$\begin{array}{ll} \text{minimize} & β, \\ \text{subject to} & y \leq f^{-1}(β), \\ & A u = y, \end{array}$

where $f^{- 1} (β)$ is non-decreasing w.r.t. $β$ .

E.g. Yield-driven Optimization

Consider the following problem:

$\begin{array}{ll} \text{maximize} & \min_{ij} Pr(y_{ij} \leq \tilde{d}_{ij}), \\ \text{subject to} & A u = y, \end{array}$

where $\tilde{d}_{ij}$ is a random variable.
Equivalent to the problem:

$\begin{array}{ll} \text{maximize} & β, \\ \text{subject to} & β \leq Pr(y_{ij} \leq \tilde{d}_{ij}), \\ & A u = y, \end{array}$

where $f_{ij}^{- 1} (β)$ is non-decreasing w.r.t. $β$ .

E.g. Yield-driven Optimization (II)

Let $F (x)$ be the CDF of $\tilde{d}$ .
Then: $\begin{array}{lll} & & β \leq Pr(y_{ij} \leq \tilde{d}_{ij}) \\ & \Rightarrow & β \leq 1 - F_{ij}(y_{ij}) \\ & \Rightarrow & y_{ij} \leq F_{ij}^{-1}(1 - β) \end{array}$
The problem becomes:

$\begin{array}{ll} \text{maximize} & β, \\ \text{subject to} & y_{ij} \leq F_{ij}^{-1}(1 - β), \\ & A u = y, \end{array}$
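
For a Gaussian $\tilde{d}_{ij}$ the bound $F_{ij}^{- 1} (1 - β)$ can be evaluated with the Python standard library. The sketch below is illustrative only; the function name `skew_bound` and the numbers are assumptions, not part of the lecture.

```python
from statistics import NormalDist

def skew_bound(mean, std, beta):
    """Largest y_ij with Pr(y_ij <= d_ij) >= beta, for Gaussian d_ij."""
    # F^{-1}(1 - beta), where F is the CDF of N(mean, std^2).
    return NormalDist(mu=mean, sigma=std).inv_cdf(1.0 - beta)

bound_50 = skew_bound(mean=2.0, std=0.5, beta=0.5)   # equals the mean
bound_90 = skew_bound(mean=2.0, std=0.5, beta=0.9)   # tighter than the mean

# Sanity check: the probability of meeting the tighter bound is about 0.9.
yield_90 = 1.0 - NormalDist(mu=2.0, sigma=0.5).cdf(bound_90)
```

A higher target yield $β$ gives a smaller right-hand side, which is why the constraint tightens as $β$ grows.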

Network flow says

Monotonic problems can be solved efficiently using cycle-cancelling methods such as Howard's algorithm.

Min-cost flow problems

Min-Cost Flow Problem (linear)

Consider:

$\begin{array}{ll} \text{min} & d^{T} x + p \\ \text{s.t.} & c^{-} \leq x \leq c^{+}, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$

Some $c^{+}$ could be $+ \infty$ , and some $c^{-}$ could be $- \infty$ .
$A^{T}$ is the incidence matrix of a network $G$ .

Conventional Algorithms

Augmenting-path based:
- Start with an infeasible solution
- Inject minimal flow into the augmenting path while maintaining infeasibility in each iteration
- Stop when there is no flow to inject into the path.
Cycle cancelling based:
- Start with a feasible solution $x_{0}$
- Find a better solution $x_{1} = x_{0} + α △ x$ , where $α$ is positive and $△ x$ is a negative cycle indicator.

General Descent Method

Input: a starting $x \in$ dom $f$
Output: $x^{*}$
repeat
1. Determine a descent direction $p$ .
2. Line search. Choose a step size $α > 0$ .
3. Update. $x := x + α p$
until a stopping criterion is satisfied.

Some Common Descent Directions

For convex problems, the search direction must satisfy $\nabla f (x)^{T} p < 0$ .
Gradient descent:
- $p = - \nabla f (x)^{T}$
Steepest descent:
- $△ x^{nsd} = \arg\min \{\nabla f (x)^{T} v ∣ ∥ v ∥ = 1\}$ .
- $△ x^{sd}$ = $∥\nabla f (x) ∥△ x^{nsd}$ (un-normalized)
Newton's method:
- $p = - \nabla^{2} f (x)^{- 1} \nabla f (x)$

Network flow says (II)

Here, there is a better way to choose $p$ !
Let $x := x + α p$ , then we have: $\begin{array}{lll} \text{min} & d^{T} x_{0} + α d^{T} p & \Rightarrow d^{T} p < 0 \\ \text{s.t.} & - x_{0} \leq α p \leq c - x_{0} & \Rightarrow \text{residual graph} \\ & A^{T} p = 0 & \Rightarrow p \text{ is a cycle!} \end{array}$
In other words, choose $p$ to be a negative cycle with cost $d$ !
- Simple negative cycle, or
- Minimum mean cycle

Network flow says (III)

Step size is limited by the capacity constraints:
- $α_{1} = \min_{ij} \{c^{+} - x_{0}\}$ , for $△ x_{ij} > 0$
- $α_{2} = \min_{ij} \{x_{0} - c^{-}\}$ , for $△ x_{ij} < 0$
- $α_{lin} = \min \{α_{1}, α_{2}\}$
If $α_{lin} = + \infty$ , the problem is unbounded.
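
The step-size rule can be sketched as follows. The function name `max_step` is hypothetical, and the code divides by $∣ △ x_{ij} ∣$ so that it also covers directions whose entries are not $\pm 1$ ; for a cycle indicator vector it reduces to the formulas above.

```python
import math

def max_step(x0, dx, c_minus, c_plus):
    """Largest alpha with c_minus <= x0 + alpha*dx <= c_plus (componentwise)."""
    alpha = math.inf
    for x, d, lo, hi in zip(x0, dx, c_minus, c_plus):
        if d > 0:
            alpha = min(alpha, (hi - x) / d)    # alpha_1: room below the upper bound
        elif d < 0:
            alpha = min(alpha, (x - lo) / -d)   # alpha_2: room above the lower bound
    return alpha                                # math.inf means an unbounded direction

# Two arcs on the cycle (one forward, one backward), one arc off the cycle.
alpha = max_step([1.0, 2.0, 0.0], [1, -1, 0], [0.0, 0.0, 0.0], [4.0, 5.0, 1.0])
```

Here the forward arc allows a step of 3 and the backward arc a step of 2, so the smaller one limits the update.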

Network flow says (IV)

An initial feasible solution can be obtained by a similar construction of the residual graph and cost vector.
The LEMON package implements this cycle cancelling algorithm.

Min-Cost Flow Convex Problem

Problem Formulation: $\begin{array}{ll} \text{min} & f (x) \\ \text{s.t.} & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$

Common Types of Line Search

Exact line search:
- $t = \arg\min_{t > 0} f (x + t △ x)$
Backtracking line search (with parameters $α \in (0, 1/2), β \in (0, 1)$ )
- starting from $t = 1$ , repeat $t := βt$ until $f (x + t △ x) < f (x) + α t \nabla f (x)^{T} △ x$
- graphical interpretation: backtrack until $t \leq t_{0}$
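
A minimal sketch of backtracking line search inside a plain gradient-descent loop is given below. The quadratic test function and the parameter values are assumptions chosen for illustration; in the network setting the direction would be a negative cycle rather than the negative gradient.

```python
def backtracking(f, grad_fx, x, dx, alpha=0.25, beta=0.5):
    """Shrink t from 1 until f(x + t*dx) < f(x) + alpha*t*grad^T dx."""
    slope = sum(g * d for g, d in zip(grad_fx, dx))   # grad f(x)^T dx, negative for descent
    fx = f(x)
    t = 1.0
    while f([xi + t * di for xi, di in zip(x, dx)]) >= fx + alpha * t * slope:
        t *= beta
    return t

def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2     # illustrative convex function

def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]

x = [1.0, 1.0]
for _ in range(200):
    g = grad(x)
    if sum(gi * gi for gi in g) < 1e-20:    # stop before the direction degenerates
        break
    dx = [-gi for gi in g]                  # gradient-descent direction
    t = backtracking(f, g, x, dx)
    x = [xi + t * di for xi, di in zip(x, dx)]
```

The loop condition is the negation of the acceptance test stated above, so the returned `t` is the first trial step that achieves sufficient decrease.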

Network flow says (V)

The step size is further limited by the following:
- $α_{cvx} = \min \{α_{lin}, t\}$
In each iteration, choose $△ x$ as a negative cycle of $G_{x}$ , with cost $\nabla f (x)$ such that $\nabla f (x)^{T} △ x < 0$

Quasi-convex Minimization (new)

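Problem Formulation: $\begin{array}{ll} \text{min} & f (x) \\ \text{s.t.} & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$
The problem can be recast as: $\begin{array}{ll} \text{min} & t \\ \text{s.t.} & f (x) \leq t, \\ & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$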

Convex Optimization says (II)

Consider a convex feasibility problem: $\begin{array}{ll} \text{find} & x \\ \text{s.t.} & ϕ_{t} (x) \leq 0, \\ & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$
- If feasible, we conclude that $t \geq p^{*}$ ;
- If infeasible, $t < p^{*}$ .
Binary search on $t$ can be used for obtaining $p^{*}$ .

Network flow says (VI)

Choose $△ x$ as a negative cycle of $G_{x}$ with cost $\nabla ϕ_{t} (x)$
If no negative cycle is found, and $ϕ_{t} (x) > 0$ , we conclude that the problem is infeasible.
Iterate until $x$ becomes feasible, i.e. $ϕ_{t} (x) \leq 0$ .

E.g. Linear-Fractional Cost

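Problem Formulation: $\begin{array}{ll} \text{min} & (e^{T} x + f) / (g^{T} x + h) \\ \text{s.t.} & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$
The problem can be recast as: $\begin{array}{ll} \text{min} & t \\ \text{s.t.} & (e^{T} x + f) - t (g^{T} x + h) \leq 0, \\ & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$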

Convex Optimization says (III)

Consider a convex feasibility problem: $\begin{array}{ll} \text{find} & x \\ \text{s.t.} & (e - t \cdot g)^{T} x + (f - t \cdot h) \leq 0, \\ & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$
- If feasible, we conclude that $t \geq p^{*}$ ;
- If infeasible, $t < p^{*}$ .
Binary search on $t$ can be used for obtaining $p^{*}$ .

Network flow says (VII)

Choose $△ x$ to be a negative cycle of $G_{x}$ with cost $(e - t \cdot g)$ , i.e. $(e - t \cdot g)^{T} △ x < 0$
If no negative cycle is found, and $(e - t \cdot g)^{T} x_{0} + (f - t \cdot h) > 0$ , we conclude that the problem is infeasible.
Iterate until $(e - t \cdot g)^{T} x_{0} + (f - t \cdot h) \leq 0$ .

E.g. Statistical Optimization

Consider the quasi-convex problem:

$\begin{array}{ll} \text{min} & Pr (d^{T} x > α) \\ \text{s.t.} & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$
- $d$ is a random vector with mean $\bar{d}$ and covariance $Σ$ .
- Hence, $d^{T} x$ is a random variable with mean $\bar{d}^{T} x$ and variance $x^{T} Σ x$ .

📈 Statistical Optimization

The problem can be recast as: $\begin{array}{ll} \text{min} & t \\ \text{s.t.} & Pr (d^{T} x > α) \leq t, \\ & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$

👉 Note: $\begin{array}{lll} & & Pr (d^{T} x > α) \leq t \\ & \Rightarrow & \bar{d}^{T} x + F^{- 1} (1 - t) ∥ Σ^{1/2} x ∥_{2} \leq α \end{array}$ (convex quadratic constraint w.r.t. $x$ )

Recall...

Recall that the gradient of $\bar{d}^{T} x + F^{- 1} (1 - t) ∥ Σ^{1/2} x ∥_{2}$ is $\bar{d} + F^{- 1} (1 - t) (∥ Σ^{1/2} x ∥_{2})^{- 1} Σ x$ .
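
As a quick check of the norm term in the gradient above, here is a short derivation sketch, assuming $Σ$ is symmetric positive definite and $x \neq 0$ :

$\begin{array}{lll} \nabla_{x} ∥ Σ^{1/2} x ∥_{2} & = & \nabla_{x} (x^{T} Σ x)^{1/2} \\ & = & \frac{1}{2} (x^{T} Σ x)^{- 1/2} \cdot 2 Σ x \\ & = & (∥ Σ^{1/2} x ∥_{2})^{- 1} Σ x \end{array}$

The linear term contributes its coefficient vector, and the norm term is scaled by the constant $F^{- 1} (1 - t)$ .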

Problem w/ additional Constraints (new)

Problem Formulation: $\begin{array}{ll} \text{min} & f (x) \\ \text{s.t.} & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0, \\ & s^{T} x \leq γ \end{array}$

E.g. Yield-driven Delay Padding

Consider the following problem: $\begin{array}{ll} \text{maximize} & γ β - c^{T} p, \\ \text{subject to} & β \leq Pr (y_{ij} \leq \tilde{d}_{ij} + p_{ij}), \\ & A u = y, \; p \geq 0 \end{array}$
- $p$ : delay padding
- $γ$ : weight (determined by a trade-off curve of yield and buffer cost)
- $\tilde{d}_{ij}$ : Gaussian random variable with mean $d_{ij}$ and variance $s_{ij}$ .

E.g. Yield-driven Delay Padding (II)

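The problem is equivalent to: $\begin{array}{ll} \text{max} & γ β - c^{T} p, \\ \text{s.t.} & y \leq d - β s + p, \\ & A u = y, \; p \geq 0 \end{array}$
or its dual: $\begin{array}{ll} \text{min} & d^{T} x \\ \text{s.t.} & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0, \\ & s^{T} x \leq γ \end{array}$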

Recall ...

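Yield-driven CSS: $\begin{array}{ll} \text{max} & β, \\ \text{s.t.} & y \leq d - β s, \\ & A u = y, \end{array}$
Delay padding: $\begin{array}{ll} \text{max} & - c^{T} p, \\ \text{s.t.} & y \leq d + p, \\ & A u = y, \; p \geq 0 \end{array}$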

Considering Barrier Method

Approximation via logarithmic barrier:

$\begin{array}{ll} \text{min} & f (x) + (1/ t) ϕ (x) \\ \text{s.t.} & 0 \leq x \leq c, \\ & A^{T} x = b, \; b (V) = 0 \end{array}$
- where $ϕ (x) = - \log (γ - s^{T} x)$
- Approximation improves as $t \to \infty$
- Here, $\nabla ϕ (x) = s / (γ - s^{T} x)$

Barrier Method

Input: a feasible $x$ , $t := t^{(0)}$ , $μ > 1$ , tolerance $ε > 0$
Output: $x^{*}$
repeat
1. Centering step. Compute $x^{*} (t)$ by minimizing $t f + ϕ$
2. Update $x := x^{*} (t)$ .
3. Increase $t$ . $t := μ t$
until $1/ t < ε$ .

👉 Note: Centering is usually done by Newton's method.
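
The barrier loop can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration: $f (x) = x^{2}$ , the single side constraint $s x \leq γ$ with $s = - 1$ , $γ = - 1$ (that is, $x \geq 1$ ), and a Newton centering step that is damped only to stay strictly feasible.

```python
def centering(t, x, s=-1.0, gamma=-1.0, iters=100):
    """Minimize t*x^2 - log(gamma - s*x) by Newton steps kept strictly feasible."""
    for _ in range(iters):
        slack = gamma - s * x                 # must stay positive
        grad = 2.0 * t * x + s / slack        # t*f'(x) + phi'(x)
        hess = 2.0 * t + (s / slack) ** 2     # t*f''(x) + phi''(x)
        step = -grad / hess
        a = 1.0
        while gamma - s * (x + a * step) <= 0.0:   # halve until strictly feasible
            a *= 0.5
        x += a * step
    return x

def barrier(x, t=1.0, mu=10.0, eps=1e-8):
    """Barrier method: center, update x, increase t, until 1/t < eps."""
    while True:
        x = centering(t, x)     # steps 1 and 2
        t *= mu                 # step 3
        if 1.0 / t < eps:
            break
    return x

x_star = barrier(2.0)           # the constrained minimizer of x^2 over x >= 1 is x = 1
```

The iterate approaches the boundary $x = 1$ from the interior as $t$ grows, which is the behavior the approximation argument predicts.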

Network flow says (VIII)

In the centering step, instead of using the Newton descent direction, we can replace it with a negative cycle on the residual graph.

Useful Skew Design Flow

Useful Skew Design: Why vs. Why Not

Why not

Some common challenges when implementing useful skew design include:

the need for more engineer training
difficulty in building a balanced clock tree
uncertainty in how to handle process variation and multi-corner multi-mode issues, etc.

Why

If these challenges are overcome and useful skew design is implemented correctly, it can

reduce the time spent on timing issues
deliver better chip performance or yield

Clock Arrival Time vs. Clock Skew

Clock signal runs periodically.
Thus, absolute clock arrival time $u_{i}$ is not so important.
Instead, the skew $y_{ij} = u_{i} - u_{j}$ is more important in this scenario.

Useful Skew Design vs. Zero-Skew Design

"Critical cycle" instead of "critical path".
"Negative cycle" instead of "negative slack".
If there is a negative cycle, no positive-slack solution exists no matter how the clock is scheduled.
Others are pretty much the same.
Same design principle:
- Always tackle the most critical one first!

Linear Programming vs. Network Flow Formulation

Linear programming formulation
- can handle more complex constraints
Network flow formulation
- usually more efficient
- return the most critical cycle as a bonus
- can handle quantized buffer delay (???)
Anyway, timing analysis is much more time-consuming than the optimization solving.

Target Skew vs. Actual Skew

Don't mess up these two concepts:

Target skew:
- the skew we want to achieve in the scheduling stage.
- Usually deterministic (we schedule a meeting at 10:00, rather than 10:00 $\pm$ 34 minutes, right?)
Actual skew
- the skew that the clock tree actually generates.
- Can be formulated as a random variable.

A Simple Case

To warm up, let us start with a simple case:

Assume equal path delay variations.
Single-corner.
Before a clock tree is built.
No adjustable delay buffer (ADB).

Network

Definition (Network)

A network is a collection of finite-dimensional vector spaces of nodes and edges/arcs:

$V = \{v_{1}, v_{2}, \dots, v_{N}\}$ , where $∣ V ∣ = N$
$E = \{e_{1}, e_{2}, e_{3}, \dots, e_{M}\}$ , where $∣ E ∣ = M$

which satisfies 2 requirements:

The boundary of each edge is comprised of the union of nodes
The intersection of any edges is either empty or a boundary node of both edges.

Example

\begin{figure}[hp]
\centering
\input{lec07.files/network.tikz}
\caption{A network}%
\label{fig:network}
\end{figure}

Orientation

Definition (Orientation)

An orientation of an edge is an ordering of its boundary nodes $(s, t)$ , where

$s$ is called a source/initial node
$t$ is called a target/terminal node

Definition (Coherent)

Two orientations are called coherent if they are the same.

Node-edge Incidence Matrix

Definition (Incidence Matrix)

An $N \times M$ matrix $A^{T}$ is a node-edge incidence matrix with entries: $A (i, j) = \begin{cases} + 1 & \text{if } e_{i} \text{ is coherent with } v_{j}, \\ - 1 & \text{if } e_{i} \text{ is not coherent with } v_{j}, \\ 0 & \text{otherwise.} \end{cases}$
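
A small sketch of building $A$ for a three-node cycle follows. The sign convention is an assumption: $+ 1$ at the source node and $- 1$ at the target node of each edge, chosen so that $A u$ gives $u_{s} - u_{t}$ for every edge $(s, t)$ . The function name and the data are illustrative.

```python
def incidence_matrix(num_nodes, edges):
    """Return A as a list of rows, one row per edge (s, t)."""
    A = []
    for s, t in edges:
        row = [0] * num_nodes
        row[s] = 1      # +1 at the source node (assumed convention)
        row[t] = -1     # -1 at the target node
        A.append(row)
    return A

edges = [(0, 1), (1, 2), (2, 0)]           # a directed 3-cycle
A = incidence_matrix(3, edges)

u = [0.25, 0.75, 0.0]                      # node potentials (e.g. arrival times)
y = [sum(a * ui for a, ui in zip(row, u)) for row in A]   # y = A u
```

Because the edges form a cycle, the entries of `y` sum to zero, whatever `u` is.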

Timing Constraint

Setup time constraint: $y_{skew} (i, f) \leq T_{CP} - D_{i f} - T_{setup} = u_{i f}$ . When this constraint is violated, a cycle time violation (zero clocking) occurs.
Hold time constraint: $y_{skew} (i, f) \geq T_{hold} - d_{i f} = l_{i f}$ . When this constraint is violated, a race condition (double clocking) occurs.

Timing Constraint Graph

Create a graph (network) by
- replacing the hold time constraint with an h-edge with cost $- (T_{hold} - d_{ij})$ from $FF_{i}$ to $FF_{j}$ , and
- replacing the setup time constraint with an s-edge with cost $T_{CP} - D_{ij} - T_{setup}$ from $FF_{j}$ to $FF_{i}$ .
Two sets of constraints stemming from clock skew definition:
- The sum of skews along paths with the same starting and ending flip-flops must be the same;
- The sum of clock skews around every cycle must be zero.
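
The graph construction above can be sketched as a small helper. The function name `build_tcg`, the tuple layout, and the sample numbers are assumptions for illustration; the edge directions and costs follow the two replacement rules stated above.

```python
def build_tcg(paths, t_cp, t_setup, t_hold):
    """paths: list of (i, j, D_ij, d_ij) = (FF_i, FF_j, max delay, min delay).

    Returns a list of weighted edges (from, to, cost).
    """
    edges = []
    for i, j, d_max, d_min in paths:
        edges.append((i, j, -(t_hold - d_min)))        # h-edge from FF_i to FF_j
        edges.append((j, i, t_cp - d_max - t_setup))   # s-edge from FF_j to FF_i
    return edges

# One combinational path from FF_0 to FF_1 (illustrative numbers).
tcg = build_tcg([(0, 1, 3.0, 1.0)], t_cp=5.0, t_setup=0.5, t_hold=0.25)
```

Each register-to-register path therefore contributes a two-edge cycle, and the sum of its two costs is the timing margin available on that path.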

Timing Constraint Graph (TCG)

Example circuit

\begin{figure}[h!]
\centering
\input{lec05.files/tcgraph.tikz}
\end{figure}

First Things First

Meet all timing constraints

Find $y$ in $\{y \in R^{n} ∣ y \leq d, A u = y\}$
How to solve:
1. Find a negative cycle, fix it.
2. Iterate until no negative cycle is found.
Bellman-Ford-like algorithms (many variants are publicly available):
- Strongly suggest "Lazy Evaluation":
  - Don't do full timing analysis on the whole timing graph at the beginning!
  - Instead, perform timing analysis only when the algorithm needs it.
- Stop immediately whenever a negative cycle is detected.
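
A plain Bellman-Ford sketch that returns a negative cycle is shown below. It is the textbook variant, not the lazy-evaluation scheme suggested above, and it does not stop at the earliest possible moment; the function name and the edge-list format are assumptions.

```python
def find_negative_cycle(num_nodes, edges):
    """edges: list of (u, v, w). Return a negative cycle as a node list, or None."""
    dist = [0.0] * num_nodes          # as if a virtual source reached every node
    pred = [None] * num_nodes
    last = None
    for _ in range(num_nodes):
        last = None
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                last = v
        if last is None:
            return None               # a full pass without relaxation: feasible
    # A relaxation in the n-th pass implies a negative cycle.
    for _ in range(num_nodes):        # walk back n steps to land on the cycle
        last = pred[last]
    cycle, v = [last], pred[last]
    while v != last:
        cycle.append(v)
        v = pred[v]
    cycle.reverse()                   # list the nodes in edge direction
    return cycle

bad = find_negative_cycle(3, [(0, 1, 1.0), (1, 2, -2.0), (2, 0, 0.5)])
good = find_negative_cycle(3, [(0, 1, 1.0), (1, 2, -2.0), (2, 0, 1.5)])
```

The first graph has a cycle of total weight $- 0.5$ and is reported as infeasible; raising the last weight to $1.5$ makes the cycle weight positive and the function returns `None`.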

Delay Padding (DP)

Delay padding is a technique that fixes timing issues solely by intentionally "increasing" delays.
Usually formulated as:
- Find $p, y$ in $\{p, y \in R^{n} ∣ y \leq d + p, A u = y, p \geq 0\}$
If the objective is to minimize the sum of $p$ , then the problem is the dual of the standard min-cost flow problem, which can be solved efficiently by the network simplex algorithm (publicly available).
Beautiful, right?

Delay Padding (II)

No, the above formulation is impractical.
In modern design, "inserting" a delay may mean swapping a faster cell with a slower cell from the cell library. Thus, no need to minimize the sum of $p$ .
More importantly, it may not be possible to find a position to insert delay for some delay paths.
Some papers only allow inserting delays into the max-delay path. Others only allow inserting delays into both the max- and min-delay paths together. Neither approach is perfect.

Delay Padding (III)

My suggestion: instead of calculating the necessary $p$ 's and then looking for suitable positions to insert them, it is easier (and more flexible) to determine the positions first and then calculate suitable values.
This can be achieved by modifying the timing graph and solving a feasibility problem. Easy enough!
Quantized delay can be handled too (???).

Four possible ways to insert delay

\begin{figure}[htpb]
\centering
\subfigure[No delay can be inserted]{
\input{lec07.files/no_delay.tikz}
}
\subfigure[$p_s$, $p_h$ independently]{
\input{lec07.files/independent.tikz}
}
\subfigure[$p_s = p_h$]{
\input{lec07.files/same_delay.tikz}
}
\subfigure[$p_s \geq p_h$]{
\input{lec07.files/setup_greater.tikz}
}
\caption{}
\end{figure}

Delay Padding (cont'd)

If there exists a negative cycle in the modified timing graph, the timing problem cannot be fixed by delay padding alone.
- Then, try decreasing $D_{ij}$ , or increasing $T_{CP}$
Make sure that the min-delay path is still the min-delay path after a certain amount of delay is inserted (how???).

Variation Issue

Yield-driven Clock Skew Scheduling

Assume all timing issues are fixed.
Now, how to schedule the arrival times to maximize yield?
According to the critical-first principle, we seek the most critical cycle first.
The problem can be formulated as:
- $\max \{β \in R ∣ y \leq d - β, A u = y\}$ .
It is equivalent to the minimum mean cycle problem, which can be solved efficiently by, for example, Howard's algorithm (publicly available).
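
For small graphs the minimum cycle mean can also be computed with Karp's algorithm, which is used in the sketch below instead of Howard's algorithm because it fits in a few lines. The function name and the edge-list format are assumptions.

```python
def min_mean_cycle(num_nodes, edges):
    """Karp's algorithm. edges: list of (u, v, w). Return the minimum cycle mean, or None."""
    n = num_nodes
    INF = float("inf")
    # D[k][v] = minimum weight of a walk with exactly k edges ending at v,
    # starting anywhere (D[0][v] = 0 plays the role of a virtual source).
    D = [[INF] * n for _ in range(n + 1)]
    D[0] = [0.0] * n
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best = None
    for v in range(n):
        if D[n][v] == INF:
            continue                  # no n-edge walk ends here
        worst = max((D[n][v] - D[k][v]) / (n - k)
                    for k in range(n) if D[k][v] < INF)
        if best is None or worst < best:
            best = worst
    return best                       # None if the graph has no cycle

beta_star = min_mean_cycle(3, [(0, 1, 1.0), (1, 2, -2.0), (2, 0, 0.5)])
```

With edge weights $d$ , the returned value is the largest $β$ for which subtracting $β$ from every edge leaves no negative cycle.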

Minimum Balancing Algorithm

Then we evenly distribute the slack on this cycle.
To continue with the next most critical cycle, we contract the first one into a "super vertex" and repeat the process.
The process stops when the timing graph is reduced to a single vertex.
The overall method is known as minimum balancing (MB) algorithm in the literature.

Example: Most timing-critical cycle

The most vulnerable timing constraint

\input{lec05.files/tcgraph2.tikz}

Example: Distribute the slack

Distribute the slack evenly along the most timing-critical cycle.

\input{lec05.files/tcgraph3.tikz}

Example: Distribute the slack (cont'd)

To determine the optimal slacks and skews for the rest of the graph, we replace the critical cycle with a super vertex.

\input{lec05.files/tcgraph4.tikz}
\input{lec05.files/tcgraph5.tikz}

Repeat the process iteratively

\input{lec05.files/tcgraph6.tikz}

Repeat the process iteratively (II)

\input{lec05.files/tcgraph7.tikz}

Final result

Skew $_{12}$ = 0.75
Skew $_{23}$ = -0.25
Skew $_{31}$ = -0.5
Slack $_{12}$ = 1.75
Slack $_{23}$ = 1.75
Slack $_{31}$ = 1

where Slack $_{ij}$ = CP - D $_{ij}$ - T $_{setup}$ - Skew $_{ij}$

\begin{tikzpicture}
\def \radius {2cm}

\node[draw, circle, fill=cyan!20] at ({30}:\radius) (n1) {0.25};
\node[draw, circle, fill=cyan!20] at ({150}:\radius) (n2) {0.75};
\node[draw, circle, fill=cyan!20] at ({270}:\radius) (n3) {0};

\path[->, >=latex] (n2) edge [bend left=45] node[above]{0.5} (n1);
\path[->, >=latex] (n3) edge [bend left=45] node[left]{2.5} (n2);
\path[->, >=latex] (n1) edge [bend left=45] node[right]{1.5} (n3);

\path[dashed, ->, >=latex] (n1) edge [bend left=15] node[above]{1.5} (n2);
\path[dashed, ->, >=latex] (n2) edge [bend left=15] node[left]{2} (n3);
\path[dashed, ->, >=latex] (n3) edge [bend left=15] node[right]{3} (n1);

\end{tikzpicture}

What does the MB algorithm really give us?

The MB algorithm gives us not only the scheduling solution, but also a tree topology that represents the order of "criticality"!

\begin{figure}
\centering
\input{lec05.files/hierachy.tikz}
\end{figure}

Clock-tree Synthesis and Placement

I strongly suggest that the topology of the clock-tree precisely follows the order of "criticality"!
- since the lower branch of clock-tree has smaller skew variation.
I also suggest that the placer should follow the topology of the clock-tree:
- Physically place the registers of the same branch together.
- The locality implies stronger correlation of variations and implies even smaller skew variation due to the cancellation effect.
- Note that the current SSTA does not provide the correlation information, so this is the best you can do!

Second Example: Yield-driven Clock Skew Scheduling

Now assume that SSTA (or STA+OCV, POCV, AOCV) is performed.
Let ( $\bar{d}$ , $s$ ) be the (mean, variance) of $d$ .
The most critical cycle can be obtained by solving:
- $\max \{β \in R ∣ y \leq \bar{d} - β s, A u = y\}$
It is equivalent to the minimum cost-to-time ratio cycle problem, which can be solved efficiently by, for example, Howard's algorithm (publicly available).
Gaussian distribution is assumed. For arbitrary distribution, see my DAC'08 paper.
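
The ratio problem can also be solved by bisection on $β$ with a negative cycle test, which is what the sketch below does instead of Howard's algorithm. It assumes $s > 0$ on every edge, a known bracket for $β$ , and an edge format `(u, v, d, s)` that is made up for this example.

```python
def has_negative_cycle(num_nodes, edges, beta):
    """Bellman-Ford test on edge weights d - beta*s. edges: (u, v, d, s)."""
    dist = [0.0] * num_nodes
    for _ in range(num_nodes):
        changed = False
        for u, v, d, s in edges:
            w = d - beta * s
            if dist[u] + w < dist[v] - 1e-12:   # small tolerance against round-off
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True                                  # still relaxing after n passes

def max_beta(num_nodes, edges, lo, hi, tol=1e-9):
    """Bisection; assumes beta = lo is feasible and beta = hi is infeasible."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if has_negative_cycle(num_nodes, edges, mid):
            hi = mid
        else:
            lo = mid
    return lo

edges = [(0, 1, 1.0, 0.5), (1, 0, 2.0, 1.0),    # cycle ratio 3.0 / 1.5 = 2
         (1, 2, 4.0, 1.0), (2, 1, 2.0, 1.0)]    # cycle ratio 6.0 / 2.0 = 3
beta_star = max_beta(3, edges, lo=0.0, hi=10.0)
```

The result approaches the smaller of the two cycle ratios, and the cycle attaining it is the most critical one.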

What About the Correlation?

In the above formulation, we minimize the maximum probability of a timing violation over the individual timing constraints. So only the individual delay distributions are needed.
Yes, the objective function is not the true timing-yield. But it is reasonable, easy to solve, and is the best you can do so far.

Multi-Corner Issue

Meet all timing constraints in Multi-Corner

Assume no Adjustable Delay Buffer (ADB)
Find $y$ in $\{y \in R^{n} ∣ y \leq d^{(k)}, A u = y, \forall k \in [1.. K]\}$
Equivalent to finding $y$ in $\{y \in R^{n} ∣ y \leq \min_{k} \{d^{(k)}\}, A u = y\}$
Feasibility problem
How to solve:
1. Find a negative cycle, fix it.
2. Iterate until no negative cycle is found.
It is better to avoid fixing timing issues corner by corner, since that induces a ping-pong effect.

Delay padding (DP) in Multi-Corner

The problem CANNOT be formulated as a network flow problem. But still you can solve it by a linear programming formulation.
Or, decompose the problem into sub-problems for each corner.
Again use the modified timing graph technique.
Then, $y$ 's are shared variables of sub-problems.
If we solve each sub-problem individually, the solutions will not agree with each other, which induces a ping-pong effect.
Need something to drive the agreement.

Delay Padding (DP) in Multi-Corner (cont'd)

Follow the idea of dual decomposition: if a solution is above the average, then introduce a punishment cost. If a solution is below the average, then introduce a rewarding cost.
Then, each subproblem is a min-cost potential problem, which can be solved efficiently.
If some subproblems do not have feasible solutions, it implies that the problem cannot be fixed by simply delay padding.
The process repeats until all solutions converge. If not, it implies that the problem cannot be fixed by simply delay padding.

Yield-driven Clock Skew Scheduling

$\max \{β \in R ∣ y \leq d^{(k)} - β s, A u = y, \forall k \in [1.. K]\}$
More or less the same as in Single Corner.

Clock-Tree Issue

Clock Tree Synthesis (CTS)

Construct merging location
- DME algorithm, Elmore delay, buffer insertion
Some research on bounded-skew DME algorithm. But the algorithm is too complicated in my opinion.
If the previous stage is over-optimized, the clock tree is hard to implement. If that happens, some budgeting techniques should be invoked (an engineering issue).
After a clock tree is constructed, more detailed timing (rather than Elmore delay) can be obtained via timing analysis.

Co-optimization Issue

After a clock tree is built, we have a clearer picture.
Should I perform the re-scheduling? And how?
Some papers suggest adding a factor to the timing constraint, say: $1.2 u_{i} - 0.8 u_{j} \leq w_{ij}$ .
Then the formulation is not a kind of network-flow, but may still be solvable by linear programming.
Need to investigate more deeply.

Adjustable Delay Buffer Issue

Adjustable delay buffers in Multi-Mode

Assume adjustable delay buffers are added solely to the clock tree
Hence, each mode can have a different set of arrival times.
Easier for clock skew scheduling, harder for clock-tree synthesis.

Meet timing constraint in Multi-Mode:

Find $y^{(m)}$ in $\{y^{(m)} \in R^{n} ∣ y^{(m)} \leq d^{(m)}, A u^{(m)} = y^{(m)}, \forall m \in [1.. M]\}$
Can be done in parallel.
Find a negative cycle and fix it for every mode in parallel (there is no need to know all $d_{i}^{(m)}$ at the beginning).

Delay Padding (DP) in Multi-mode

Again use a modified timing graph technique.
NOT a network flow problem. Use LP, or
Dual decomposition -> min-cost potential problem for each mode
- Only $p$ 's are shared variables.
- Initial feasible solution obtained by the single-mode method
  - A negative cycle => problem cannot be fixed by DP
No convergence => problem cannot be fixed by DP
- Try decreasing $D_{ij}$ , or increasing $T_{CP}$

Yield-driven Clock Skew Scheduling

$\max \{β \in R ∣ y^{(m)} \leq d^{(m)} - β s, A u^{(m)} = y^{(m)}, \forall m \in [1.. M]\}$
Pretty much the same as Single-Mode.

Difficulty in ADB Multi-Mode Design

How to design the clock-tree?
What is the order of criticality?
How to determine the minimum range of ADB?

Algorithms for Design-for-Manufacturability