When "Convex Optimization" Meets "Network Flow"

📖 Introduction

Overview

  • Network flow problems can be solved efficiently and have a wide range of applications.

  • Unfortunately, some problems carry additional constraints that put them beyond the reach of standard network flow techniques.

  • In addition, in some problems, the objective function is quasi-convex rather than convex.

  • In this lecture, we will investigate some problems that can still be solved by network flow techniques with the help of convex optimization.

Parametric Potential Problems

Parametric potential problems

Consider:

where and are concave.

Note: the parametric flow problems can be defined in a similar way.

Network flow says:

  • For fixed , the problem is feasible precisely when there exists no negative cycle

  • Negative cycle detection can be done efficiently using Bellman-Ford-like methods

  • If a negative cycle is found, then
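As a minimal sketch of the negative cycle test above, the following Bellman-Ford-style routine either certifies feasibility or returns one negative cycle. The constraint form `pot[v] - pot[u] <= w` and the name `find_negative_cycle` are assumptions made for illustration only.

```python
def find_negative_cycle(num_nodes, edges):
    """Return the nodes of one negative cycle (in edge order), or None.

    edges: list of (u, v, w), read as the constraint pot[v] - pot[u] <= w.
    """
    dist = [0.0] * num_nodes      # all-zero start acts as a virtual source
    pred = [None] * num_nodes     # pred[v] = node that last relaxed v
    last = None
    for _ in range(num_nodes):
        last = None
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                pred[v] = u
                last = v
        if last is None:
            return None           # no relaxation: dist is a feasible potential
    # A relaxation in the n-th pass implies a negative cycle.
    # Walking back n predecessor links is guaranteed to land on it.
    node = last
    for _ in range(num_nodes):
        node = pred[node]
    cycle = [node]
    cur = pred[node]
    while cur != node:
        cycle.append(cur)
        cur = pred[cur]
    cycle.reverse()
    return cycle


# Hypothetical data: the cycle 0 -> 1 -> 2 -> 0 has total weight -1.
bad = [(0, 1, 1.0), (1, 2, -3.0), (2, 0, 1.0)]
good = [(0, 1, 1.0), (1, 2, -3.0), (2, 0, 3.0)]
cycle = find_negative_cycle(3, bad)
```

The returned cycle is the certificate of infeasibility; a caller would use it to decide how to adjust the parameter.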

Convex Optimization says:

  • If both sub-gradients of and are known, then the bisection method can be used for solving the problem efficiently.

  • Also, for multi-parameter problems, the ellipsoid method can be used.
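The single-parameter case can be sketched as a bisection whose oracle is the negative cycle test. This is an illustration under assumptions not stated above: each constraint is taken as `pot[v] - pot[u] <= h(beta)` with `h` non-increasing in `beta`, the goal is the largest feasible `beta`, and the names `has_negative_cycle` and `max_parameter` are hypothetical.

```python
def has_negative_cycle(num_nodes, edges):
    """Bellman-Ford test: True if the weighted graph has a negative cycle."""
    dist = [0.0] * num_nodes
    for _ in range(num_nodes):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True


def max_parameter(num_nodes, edge_fns, lo, hi, tol=1e-9):
    """Bisection on beta; assumes feasibility at lo and infeasibility at hi.

    edge_fns: list of (u, v, h) where h(beta) is the edge weight.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        edges = [(u, v, h(mid)) for u, v, h in edge_fns]
        if has_negative_cycle(num_nodes, edges):
            hi = mid          # infeasible: beta is too large
        else:
            lo = mid          # feasible: try a larger beta
    return lo


# Hypothetical instance: h(beta) = w - beta on a 3-cycle with weights 2, 3, 4.
# The cycle weight 9 - 3*beta turns negative once beta exceeds 3.
weights = [(0, 1, 2.0), (1, 2, 3.0), (2, 0, 4.0)]
edge_fns = [(u, v, (lambda beta, w=w: w - beta)) for u, v, w in weights]
beta_star = max_parameter(3, edge_fns, lo=0.0, hi=10.0)
```

Bisection needs only a yes/no oracle here; with sub-gradient information, each infeasible cycle could also supply a tighter cut on the parameter.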

Quasi-convex Minimization

Consider:

where is quasi-convex and are concave.

Example of Quasi-Convex Functions

  • is quasi-convex on

  • is quasi-linear on

  • is quasi-concave on

  • Linear-fractional function:

    • =

    • dom =

  • Distance ratio function:

    • =

    • dom =

Convex Optimization says:

If is quasi-convex, there exists a family of functions such that:

  • is convex w.r.t. for fixed

  • is non-increasing w.r.t. for fixed

  • -sublevel set of is -sublevel set of , i.e., iff

For example:

  • with convex, concave , on dom ,

  • can take =

Convex Optimization says:

Consider a convex feasibility problem:

  • If feasible, we conclude that ;

  • If infeasible, .

Binary search on can be used for obtaining .
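The bisection idea can be illustrated on a one-dimensional toy problem. The sketch below assumes a ratio objective p(x)/q(x) with p convex and q concave and positive, uses p(x) - t q(x) as the convex family, and solves each feasibility problem by ternary search; the function names and the test instance are hypothetical.

```python
def argmin_convex(phi, a, b, iters=200):
    """Ternary search for a minimizer of a convex function on [a, b]."""
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if phi(m1) <= phi(m2):
            b = m2
        else:
            a = m1
    return 0.5 * (a + b)


def quasiconvex_bisection(p, q, a, b, t_lo, t_hi, tol=1e-8):
    """Minimize p(x)/q(x) on [a, b]; assumes t_lo <= optimum <= t_hi."""
    best_x = None
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        # Convex feasibility problem: is there x with p(x) - t*q(x) <= 0 ?
        x = argmin_convex(lambda z: p(z) - t * q(z), a, b)
        if p(x) - t * q(x) <= 0.0:
            t_hi, best_x = t, x     # feasible: optimum <= t
        else:
            t_lo = t                # infeasible: optimum >= t
    return t_hi, best_x


# Hypothetical instance: f(x) = (x**2 + 1) / x on [0.5, 4], minimized at x = 1.
t_star, x_star = quasiconvex_bisection(
    p=lambda x: x * x + 1.0, q=lambda x: x,
    a=0.5, b=4.0, t_lo=0.0, t_hi=2.5)
```

Each bisection step halves the interval for the optimal value, so the number of feasibility problems grows only logarithmically with the required accuracy.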

Quasi-convex Network Problem

  • Again, the feasibility problem ([eq:quasi]) can be solved efficiently by the bisection method or the ellipsoid method, together with the negative cycle detection technique.

  • Any EDA applications ???

Monotonic Minimization

  • Consider the following problem:

    where is non-decreasing.

  • The problem can be recast as:

    where is non-decreasing w.r.t. .

E.g. Yield-driven Optimization

  • Consider the following problem:

    where is a random variable.

  • Equivalent to the problem:

    where is non-decreasing w.r.t. .

E.g. Yield-driven Optimization (II)

  • Let be the cdf of .

  • Then:

  • The problem becomes:

Network flow says

  • Monotonic problems can be solved efficiently using cycle-cancelling methods such as Howard's algorithm.

Min-cost flow problems

Min-Cost Flow Problem (linear)

Consider:

  • some could be some could be .
  • is the incidence matrix of a network .

Conventional Algorithms

  • Augmenting-path based:
    • Start with an infeasible solution
    • Inject minimal flow into the augmenting path while maintaining infeasibility in each iteration
    • Stop when there is no flow to inject into the path.
  • Cycle-cancelling based:
    • Start with a feasible solution
    • Find a better solution , where is positive and is a negative cycle indicator.

General Descent Method

  1. Input: a starting dom
  2. Output:
  3. repeat
    1. Determine a descent direction .
    2. Line search. Choose a step size .
    3. Update.
  4. until a stopping criterion is satisfied.

Some Common Descent Directions

  • For convex problems, the search direction must satisfy .
  • Gradient descent:
  • Steepest descent:
    • .
    • = (un-normalized)
  • Newton's method:

Network flow says (II)

  • Here, there is a better way to choose !
  • Let , then we have:
  • In other words, choose to be a negative cycle with cost !
    • Simple negative cycle, or
    • Minimum mean cycle

Network flow says (III)

  • Step size is limited by the capacity constraints:
    • , for
    • , for
    • = min
  • If , the problem is unbounded.

Network flow says (IV)

  • An initial feasible solution can be obtained by a similar construction of the residual graph and cost vector.
  • The LEMON package implements this cycle cancelling algorithm.
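The cycle-cancelling steps above (residual graph, negative cycle as descent direction, step size capped by capacities) can be sketched in plain Python. This is an illustration only, not LEMON's API: the arc format `(u, v, capacity, cost)` with zero lower bounds, the function names, and the test network are assumptions.

```python
def find_negative_cycle(n, redges):
    """redges: (u, v, cost, arc_id, sign). Return cycle edges or None."""
    dist = [0.0] * n
    pred = [None] * n             # pred[v] = residual edge that last relaxed v
    last = None
    for _ in range(n):
        last = None
        for e in redges:
            u, v, c = e[0], e[1], e[2]
            if dist[u] + c < dist[v] - 1e-12:
                dist[v], pred[v], last = dist[u] + c, e, v
        if last is None:
            return None
    node = last
    for _ in range(n):            # walk back n links to land on the cycle
        node = pred[node][0]
    cycle, cur = [], node
    while True:
        e = pred[cur]
        cycle.append(e)
        cur = e[0]
        if cur == node:
            return cycle


def cycle_cancel(n, arcs, flow):
    """Improve a feasible flow until the residual graph has no negative cycle."""
    flow = list(flow)
    while True:
        redges = []
        for k, (u, v, cap, cost) in enumerate(arcs):
            if flow[k] < cap:
                redges.append((u, v, cost, k, +1))    # room to push more
            if flow[k] > 0:
                redges.append((v, u, -cost, k, -1))   # room to push back
        cycle = find_negative_cycle(n, redges)
        if cycle is None:
            return flow
        # Step size: the tightest residual capacity on the cycle.
        alpha = min(arcs[k][2] - flow[k] if s > 0 else flow[k]
                    for _, _, _, k, s in cycle)
        if alpha == float("inf"):
            raise ValueError("problem is unbounded")
        for _, _, _, k, s in cycle:
            flow[k] += s * alpha


# Hypothetical network: ship 2 units from node 0 to node 3.
arcs = [(0, 1, 2, 1), (0, 2, 2, 4), (1, 3, 2, 1), (2, 3, 2, 1)]
start = [0, 2, 0, 2]              # feasible but expensive: route 0 -> 2 -> 3
best = cycle_cancel(4, arcs, start)
cost = sum(f * a[3] for f, a in zip(best, arcs))
```

Picking a minimum mean cycle instead of an arbitrary negative cycle is the usual refinement; the sketch uses whichever cycle Bellman-Ford finds.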

Min-Cost Flow Convex Problem

  • Problem Formulation:
  • Exact line search:
  • Backtracking line search (with parameters )
    • starting from , repeat until
    • graphical interpretation: backtrack until
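A small sketch of a descent loop with backtracking line search follows. It uses the negative gradient as the direction and the common textbook parameter names `alpha` and `beta`; those names, the default values, and the quadratic test function are assumptions made for illustration.

```python
def backtracking(f, x, dx, grad, alpha=0.3, beta=0.8):
    """Shrink t from 1 until the sufficient-decrease condition holds."""
    slope = sum(g * d for g, d in zip(grad, dx))   # directional derivative < 0
    fx = f(x)
    t = 1.0
    while f([xi + t * di for xi, di in zip(x, dx)]) > fx + alpha * t * slope:
        t *= beta
    return t


def gradient_descent(f, grad_f, x0, tol=1e-8, max_iter=10000):
    """General descent method with the negative gradient as direction."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        if sum(gi * gi for gi in g) ** 0.5 <= tol:   # stopping criterion
            break
        dx = [-gi for gi in g]                       # descent direction
        t = backtracking(f, x, dx, g)                # line search
        x = [xi + t * di for xi, di in zip(x, dx)]   # update
    return x


# Hypothetical test function with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad_f = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
x_min = gradient_descent(f, grad_f, [0.0, 0.0])
```

In the network setting the direction would be a negative cycle rather than a gradient, and the starting step would be capped by the residual capacities.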

Network flow says (V)

  • The step size is further limited by the following:
  • In each iteration, choose as a negative cycle of , with cost such that

Quasi-convex Minimization (new)

  • Problem Formulation:

  • The problem can be recast as:

Convex Optimization says (II)

  • Consider a convex feasibility problem:
    • If feasible, we conclude that ;
    • If infeasible, .
  • Binary search on can be used for obtaining .

Network flow says (VI)

  • Choose as a negative cycle of with cost
  • If no negative cycle is found, and , we conclude that the problem is infeasible.
  • Iterate until becomes feasible, i.e. .

E.g. Linear-Fractional Cost

  • Problem Formulation:

  • The problem can be recast as:

Convex Optimization says (III)

  • Consider a convex feasibility problem:
    • If feasible, we conclude that ;
    • If infeasible, .
  • Binary search on can be used for obtaining .

Network flow says (VII)

  • Choose to be a negative cycle of with cost , i.e. 
  • If no negative cycle is found, and , we conclude that the problem is infeasible.
  • Iterate until .

E.g. Statistical Optimization

  • Consider the quasi-convex problem:

    • is a random vector with mean and covariance .
    • Hence, is a random variable with mean and variance .

📈 Statistical Optimization

  • The problem can be recast as:

👉 Note: (convex quadratic constraint w.r.t )

Recall...

Recall that the gradient of is .

Problem w/ additional Constraints (new)

  • Problem Formulation:

E.g. Yield-driven Delay Padding

  • Consider the following problem:

    • : delay padding
    • : weight (determined by a trade-off curve of yield and buffer cost)
    • : Gaussian random variable with mean and variance .

E.g. Yield-driven Delay Padding (II)

  • The problem is equivalent to:

  • or its dual:

Recall ...

  • Yield-driven CSS:

  • Delay padding

Considering Barrier Method

  • Approximation via logarithmic barrier:

    • where
    • Approximation improves as
    • Here,

Barrier Method

  • Input: a feasible , , , tolerance
  • Output:
  • repeat
    1. Centering step. Compute by minimizing
    2. Update .
    3. Increase .
  • until .

👉 Note: Centering is usually done by Newton's method.
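To make the loop concrete, here is a sketch of the barrier method on a one-variable toy problem: minimize x² subject to x ≥ 1, so there is a single constraint (m = 1). Centering uses a damped Newton iteration. The problem instance, parameter defaults, and function names are assumptions made for illustration.

```python
import math


def center(t, x, max_iter=100):
    """Minimize t*x**2 - log(x - 1) by damped Newton, keeping x > 1."""
    for _ in range(max_iter):
        grad = 2.0 * t * x - 1.0 / (x - 1.0)
        hess = 2.0 * t + 1.0 / (x - 1.0) ** 2
        if grad * grad / hess <= 1e-12:      # squared Newton decrement
            break
        step = -grad / hess
        fx = t * x * x - math.log(x - 1.0)
        s = 1.0
        # Backtrack until the point is strictly feasible and decreases enough.
        while (x + s * step <= 1.0 or
               t * (x + s * step) ** 2 - math.log(x + s * step - 1.0)
               > fx + 0.25 * s * grad * step):
            s *= 0.5
        x += s * step
    return x


def barrier_method(x, t=1.0, mu=10.0, eps=1e-5, m=1):
    """Repeat centering and increase t until the gap bound m/t is below eps."""
    while True:
        x = center(t, x)          # centering step, warm-started
        if m / t < eps:
            return x
        t *= mu


x_opt = barrier_method(2.0)       # strictly feasible start; optimum is x = 1
```

The iterate approaches the boundary x = 1 from the interior, and the suboptimality at termination is bounded by m/t.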

Network flow says (VIII)

In the centering step, instead of using the Newton descent direction, we can replace it with a negative cycle on the residual graph.

Useful Skew Design Flow

Useful Skew Design: Why vs. Why Not

Why not

Some common challenges when implementing useful skew design include:

  • need more engineer training
  • difficulty in building a balanced clock-tree
  • uncertainty in how to handle process variation, multi-corner multi-mode issues, etc.

Why

If these challenges are overcome and useful skew design is implemented correctly, it can lead to:

  • less time spent on timing issues
  • better chip performance or yield

Clock Arrival Time vs. Clock Skew

  • Clock signal runs periodically.

  • Thus, absolute clock arrival time is not so important.

  • Instead, the skew is more important in this scenario.

Useful Skew Design vs. Zero-Skew Design

  • "Critical cycle" instead of "critical path".
  • "Negative cycle" instead of "negative slack".
  • If there is a negative cycle, there is no positive-slack solution no matter how the clock is scheduled.
  • Others are pretty much the same.
  • Same design principle:
    • Always tackle the most critical one first!

Linear Programming vs. Network Flow Formulation

  • Linear programming formulation
    • can handle more complex constraints
  • Network flow formulation
    • usually more efficient
    • returns the most critical cycle as a bonus
    • can handle quantized buffer delay (???)
  • Anyway, timing analysis is much more time-consuming than the optimization solving.

Target Skew vs. Actual Skew

Don't mess up these two concepts:

  • Target skew:
    • the skew we want to achieve in the scheduling stage.
    • Usually deterministic (we schedule a meeting at 10:00, rather than 10:00 ± 34 minutes, right?)
  • Actual skew
    • the skew that the clock tree actually generates.
    • Can be formulated as a random variable.

A Simple Case

To warm up, let us start with a simple case:

  • Assume equal path delay variations.
  • Single-corner.
  • Before a clock tree is built.
  • No adjustable delay buffer (ADB).

Network

Definition (Network)

A network is a collection of finite-dimensional vector spaces of nodes and edges/arcs:

  • , where
  • , where

which satisfies 2 requirements:

  1. The boundary of each edge is a union of nodes.
  2. The intersection of any two edges is either empty or a boundary node of both edges.

Example

\begin{figure}[hp]
\centering
\input{lec07.files/network.tikz}
\caption{A network}%
\label{fig:network}
\end{figure}

Orientation

Definition (Orientation)

An orientation of an edge is an ordering of its boundary nodes , where

  • is called a source/initial node
  • is called a target/terminal node

Definition (Coherent)

Two orientations are called coherent if they are the same.

Node-edge Incidence Matrix

Definition (Incidence Matrix)

A matrix is a node-edge incidence matrix with entries:

Timing Constraint

  • Setup time constraint. When this constraint is violated, a cycle time violation (zero clocking) occurs.
  • Hold time constraint. When this constraint is violated, a race condition (double clocking) occurs.

Timing Constraint Graph

  • Create a graph (network) by
    • replacing the hold time constraint with an h-edge with cost from to , and
    • replacing the setup time constraint with an s-edge with cost from to .
  • Two sets of constraints stemming from clock skew definition:
    • The sum of skews along paths with the same starting and ending flip-flops must be the same;
    • The sum of clock skews around every cycle must be zero.

Timing Constraint Graph (TCG)

Example circuit

\begin{figure}[h!]
\centering
\input{lec05.files/tcgraph.tikz}
\end{figure}

First Things First

Meet all timing constraints

  • Find in
  • How to solve:
    1. Find a negative cycle, fix it.
    2. Iterate until no negative cycle is found.
  • Bellman-Ford-like algorithms (these and their variants are publicly available):
    • Strongly suggest "Lazy Evaluation":
      • Don't do full timing analysis on the whole timing graph at the beginning!
      • Instead, perform timing analysis only when the algorithm needs it.
    • Stop immediately whenever a negative cycle is detected.
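The feasibility check can be sketched as a system of difference constraints solved by Bellman-Ford. The sketch assumes skew(i, j) = t[i] - t[j], a setup bound CP - D - T_setup, a hold bound d - T_hold, and the edge directions chosen below; all names and numbers are hypothetical.

```python
def build_tcg(paths, cp, t_setup, t_hold):
    """Build constraint edges (u, v, w), read as t[v] - t[u] <= w.

    paths: list of (i, j, max_delay, min_delay) for a data path from
    flip-flop i to flip-flop j.
    """
    edges = []
    for i, j, d_max, d_min in paths:
        edges.append((j, i, cp - d_max - t_setup))   # s-edge: t[i]-t[j] <= ...
        edges.append((i, j, d_min - t_hold))         # h-edge: t[j]-t[i] <= ...
    return edges


def arrival_times(num_ffs, edges):
    """Return feasible clock arrival times, or None if a negative cycle exists."""
    t = [0.0] * num_ffs
    for _ in range(num_ffs):
        changed = False
        for u, v, w in edges:
            if t[u] + w < t[v] - 1e-12:
                t[v] = t[u] + w
                changed = True
        if not changed:
            return t              # every constraint t[v] - t[u] <= w holds
    return None                   # still relaxing after n passes


# Hypothetical three-flip-flop ring.
paths = [(0, 1, 3.0, 1.0), (1, 2, 4.0, 2.0), (2, 0, 2.0, 1.0)]
relaxed = build_tcg(paths, cp=5.0, t_setup=0.5, t_hold=0.5)
tight = build_tcg(paths, cp=2.0, t_setup=0.5, t_hold=0.5)
schedule = arrival_times(3, relaxed)
```

With the shorter clock period the s-edge and h-edge between flip-flops 0 and 1 form a negative cycle, so no schedule exists.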

Delay Padding (DP)

  • Delay padding is a technique that fixes timing issues solely by intentionally "increasing" delays.
  • Usually formulated as:
    • Find in
  • If the objective is to minimize the sum of , then the problem is the dual of the standard min-cost flow problem, which can be solved efficiently by the network simplex algorithm (publicly available).
  • Beautiful, right?

Delay Padding (II)

  • No, the above formulation is impractical.
  • In modern design, "inserting" a delay may mean swapping a faster cell for a slower cell from the cell library. Thus, there is no need to minimize the sum of .
  • More importantly, it may not be possible to find a position to insert delay for some delay paths.
  • Some papers allow delays to be inserted only on the max-delay path. Others allow delays to be inserted only on the max- and min-delay paths together. Neither approach is perfect.

Delay Padding (III)

  • My suggestion: instead of calculating the necessary and then looking for a suitable position to insert it, it is easier (and more flexible) to determine the position first and then calculate suitable values.
  • This can be achieved by modifying the timing graph and solving a feasibility problem. Easy enough!
  • Quantized delay can be handled too (???).

Four possible ways to insert delay

\begin{figure}[htpb]
\centering
\subfigure[No delay can be inserted]{
\input{lec07.files/no_delay.tikz}
}
\subfigure[$p_s$, $p_h$ independently]{
\input{lec07.files/independent.tikz}
}
\subfigure[$p_s = p_h$]{
\input{lec07.files/same_delay.tikz}
}
\subfigure[$p_s \geq p_h$]{
\input{lec07.files/setup_greater.tikz}
}
\caption{}
\end{figure}

Delay Padding (cont'd)

  • If there exists a negative cycle in the modified timing graph, the timing problem cannot be fixed by delay padding alone.
    • Then, try decreasing , or increasing
  • Make sure that the min-delay path is still the min-delay path after a certain amount of delay is inserted (how???).

Variation Issue

Yield-driven Clock Skew Scheduling

  • Assume all timing issues are fixed.
  • Now, how to schedule the arrival times to maximize yield?
  • According to the critical-first principle, we seek the most critical cycle first.
  • The problem can be formulated as:
    • .
  • It is equivalent to the minimum mean cycle problem, which can be solved efficiently by, for example, Howard's algorithm (publicly available).
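For illustration, the minimum mean cycle value can be computed with Karp's algorithm, which is swapped in here as a simpler alternative to Howard's algorithm because it fits in a few lines. The function name and the example graph are hypothetical.

```python
def min_mean_cycle(num_nodes, edges):
    """Karp's algorithm. Return the minimum cycle mean, or inf if acyclic.

    edges: list of (u, v, w).
    d[k][v] = minimum weight of a walk with exactly k edges ending at v,
    starting anywhere (equivalent to a zero-cost virtual source).
    """
    inf = float("inf")
    n = num_nodes
    d = [[inf] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = inf
    for v in range(n):
        if d[n][v] == inf:
            continue              # no walk with n edges ends at v
        worst = max((d[n][v] - d[k][v]) / (n - k)
                    for k in range(n) if d[k][v] < inf)
        best = min(best, worst)
    return best


# Hypothetical graph: the 3-cycle has mean 3, the 2-cycle 0 -> 1 -> 0 has mean 1.5.
edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 0, 4.0), (1, 0, 1.0)]
mean = min_mean_cycle(3, edges)
```

Karp's method runs in O(nm) time and returns only the value; Howard's algorithm is usually faster in practice and also yields the critical cycle itself.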

Minimum Balancing Algorithm

  • Then we evenly distribute the slack on this cycle.
  • To continue the next most critical cycle, we contract the first one into a "super vertex" and repeat the process.
  • The process stops when the timing graph is reduced to a single vertex.
  • The overall method is known as the minimum balancing (MB) algorithm in the literature.

Example: Most timing-critical cycle

The most vulnerable timing constraint

\input{lec05.files/tcgraph2.tikz}

Example: Distribute the slack

  • Distribute the slack evenly along the most timing-critical cycle.
\input{lec05.files/tcgraph3.tikz}

Example: Distribute the slack (cont'd)

  • To determine the optimal slacks and skews for the rest of the graph, we replace the critical cycle with a super vertex.
\input{lec05.files/tcgraph4.tikz}
\input{lec05.files/tcgraph5.tikz}

Repeat the process iteratively

\input{lec05.files/tcgraph6.tikz}

Repeat the process iteratively (II)

\input{lec05.files/tcgraph7.tikz}

Final result

  • Skew = 0.75

  • Skew = -0.25

  • Skew = -0.5

  • Slack = 1.75

  • Slack = 1.75

  • Slack = 1

    where Slack = CP - D - T - Skew

\begin{tikzpicture}
\def \radius {2cm}

\node[draw, circle, fill=cyan!20] at ({30}:\radius) (n1) {0.25};
\node[draw, circle, fill=cyan!20] at ({150}:\radius) (n2) {0.75};
\node[draw, circle, fill=cyan!20] at ({270}:\radius) (n3) {0};

\path[->, >=latex] (n2) edge [bend left=45] node[above]{0.5} (n1);
\path[->, >=latex] (n3) edge [bend left=45] node[left]{2.5} (n2);
\path[->, >=latex] (n1) edge [bend left=45] node[right]{1.5} (n3);

\path[dashed, ->, >=latex] (n1) edge [bend left=15] node[above]{1.5} (n2);
\path[dashed, ->, >=latex] (n2) edge [bend left=15] node[left]{2} (n3);
\path[dashed, ->, >=latex] (n3) edge [bend left=15] node[right]{3} (n1);

\end{tikzpicture}

What does the MB algorithm really give us?

  • The MB algorithm gives us not only the scheduling solution, but also a tree topology that represents the order of "criticality"!
\begin{figure}
\centering
\input{lec05.files/hierachy.tikz}
\end{figure}

Clock-tree Synthesis and Placement

  • I strongly suggest that the topology of the clock tree precisely follow the order of "criticality",
    • since the lower branches of the clock tree have smaller skew variation.
  • I also suggest that the placer should follow the topology of the clock-tree:
    • Physically place the registers of the same branch together.
    • The locality implies stronger correlation of variations and implies even smaller skew variation due to the cancellation effect.
    • Note that the current SSTA does not provide the correlation information, so this is the best you can do!

Second Example: Yield-driven Clock Skew Scheduling

  • Now assume that SSTA (or STA+OCV, POCV, AOCV) is performed.
  • Let (, ) be the (mean, variance) of
  • The most critical cycle can be obtained by solving:
  • It is equivalent to the minimum cost-to-time ratio cycle problem, which can be solved efficiently by, for example, Howard's algorithm (publicly available).
  • Gaussian distribution is assumed. For arbitrary distribution, see my DAC'08 paper.

What About the Correlation?

  • In the above formulation, we minimize the maximum probability of a timing violation over the individual timing constraints, so only the individual delay distributions are needed.
  • Yes, the objective function is not the true timing-yield. But it is reasonable, easy to solve, and is the best you can do so far.

Multi-Corner Issue

Meet all timing constraints in Multi-Corner

  • Assume no Adjustable Delay Buffer (ADB)
  • Find in
  • Equivalent to finding in
  • Feasibility problem
  • How to solve:
    1. Find a negative cycle, fix it.
    2. Iterate until no negative cycle is found.
  • Avoid fixing timing issues corner by corner; doing so induces a ping-pong effect.

Delay padding (DP) in Multi-Corner

  • The problem CANNOT be formulated as a network flow problem, but it can still be solved with a linear programming formulation.
  • Or, decompose the problem into sub-problems for each corner.
  • Again use the modified timing graph technique.
  • Then, the 's are the shared variables of the sub-problems.
  • If we solve each sub-problem individually, the solutions will not agree with each other, which induces a ping-pong effect.
  • Need something to drive the agreement.

Delay Padding (DP) in Multi-Corner (cont'd)

  • Follow the idea of dual decomposition: if a solution is above the average, introduce a punishment cost; if a solution is below the average, introduce a rewarding cost.
  • Then, each subproblem is a min-cost potential problem, which can be solved efficiently.
  • If some subproblems do not have feasible solutions, the problem cannot be fixed by delay padding alone.
  • The process repeats until all solutions converge. If they do not converge, the problem cannot be fixed by delay padding alone.
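The price-update mechanism can be shown on a deliberately tiny stand-in. In this toy sketch each "corner" has a scalar quadratic cost (p - a)² instead of a min-cost potential problem, so the subproblem has a closed-form solution; the function name, the step size, and the targets are all hypothetical.

```python
def consensus_by_prices(targets, step=1.0, tol=1e-9, max_iter=1000):
    """Drive per-corner solutions to agreement via prices (dual variables).

    Each subproblem minimizes (p - a)**2 + price * p, whose solution is
    p = a - price / 2. Returns (agreed value, solutions), or None when the
    iteration does not converge.
    """
    prices = [0.0] * len(targets)
    for _ in range(max_iter):
        sols = [a - lam / 2.0 for a, lam in zip(targets, prices)]
        avg = sum(sols) / len(sols)
        if max(abs(p - avg) for p in sols) <= tol:
            return avg, sols
        # Above the average: the price rises (punishment).
        # Below the average: the price falls (reward).
        prices = [lam + step * (p - avg) for lam, p in zip(prices, sols)]
    return None


# Hypothetical per-corner preferences for one shared padding value.
result = consensus_by_prices([1.0, 2.0, 6.0])
agreed, per_corner = result
```

For these quadratic costs the agreed value is the mean of the targets; in the delay-padding setting each subproblem solve would be replaced by a min-cost potential solve under the current prices.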

Yield-driven Clock Skew Scheduling

  • More or less the same as in Single Corner.

Clock-Tree Issue

Clock Tree Synthesis (CTS)

  • Construct merging location
    • DME algorithm, Elmore delay, buffer insertion
  • There is some research on bounded-skew DME algorithms, but they are too complicated in my opinion.
  • If the previous stage is over-optimized, the clock tree is hard to implement. If that happens, some budgeting techniques should be invoked (an engineering issue).
  • After a clock tree is constructed, more detailed timing (rather than Elmore delay) can be obtained via timing analysis.

Co-optimization Issue

  • After a clock tree is built, we have a clearer picture.
  • Should I perform the re-scheduling? And how?
  • Some papers suggest adding a factor to the timing constraint, say: .
  • Then the formulation is not a kind of network-flow, but may still be solvable by linear programming.
  • Need to investigate more deeply.

Adjustable Delay Buffer Issue

Adjustable delay buffers in Multi-Mode

  • Assume adjustable delay buffers are added solely to the clock tree
  • Hence, each mode can have a different set of arrival times.
  • Easier for clock skew scheduling, harder for clock-tree synthesis.

Meet timing constraint in Multi-Mode:

  • find in
  • Can be done in parallel.
  • Find a negative cycle and fix it (no need to know all at the beginning), for every mode in parallel.

Delay Padding (DP) in Multi-mode

  • Again use a modified timing graph technique.
  • NOT a network flow problem. Use LP, or
  • Dual decomposition -> min-cost potential problem for each mode
    • Only 's are shared variables.
    • Initial feasible solution obtained by the single-mode method
      • A negative cycle => problem cannot be fixed by DP
  • No convergence => problem cannot be fixed by DP
    • Try decreasing , or increasing

Yield-driven Clock Skew Scheduling

  • Pretty much the same as Single-Mode.

Difficulty in ADB Multi-Mode Design

  • How to design the clock-tree?
  • What is the order of criticality?
  • How to determine the minimum range of ADB?