Lecture 1a: Algorithms for Design for Manufacturability
Course Overview
- Instructor: Wai-Shing Luk (陆伟成), 📪 Contact: luk@fudan.edu.cn, 📍 Office: Room 383, Microelectronics Building. 📆 Office hours: F6-F8 or by appointment
- Lecture: 📆 W8-W10, 📍 Z2212
- Lecture notes will be available at https://luk036.github.io/algo4dfm/
👓 Course Objectives
- Understand the development of design for manufacturability (DFM) in VLSI
- Master some practical algorithms for DFM automation and their underlying principles
- Quality over quantity -- avoid the "no-time-to-think" syndrome
Course Content
- Introduction: an overview of DFM development and the impact of process-parameter variations on chip performance
- Basic software development principles; electronic design automation
- Basic algorithm principles: algorithmic paradigms, algorithm complexity, and an introduction to optimization algorithms
- Statistical and spatial correlation extraction: parametric and non-parametric methods
- Robust circuit optimization algorithms, affine arithmetic, and robust geometric programming
- Clock skew scheduling based on statistical timing analysis
- Introduction to alternating phase-shift masks, the layout phase-assignment problem, and Hadlock's algorithm
- Lithography issues; double/multiple patterning techniques
- Hybrid lithography techniques
- Redundant Via Insertion
📚 Reference books
- Michael Orshansky, Sani R. Nassif, and Duane Boning (2008) Design for Manufacturability and Statistical Design: A Constructive Approach
- Artur Balasinski (2014) Design for Manufacturability
- Bei Yu and David Z. Pan (2016) Design for Manufacturability with Advanced Lithography
- G. Ausiello et al. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties, Springer-Verlag, 1999.
- N. Sherwani, Algorithms for VLSI Physical Design Automation (3rd version), KAP, 2004.
Assessment and Grading
Component | Weight | Criteria |
---|---|---|
Attendance | 10% | Participation in class |
In-class performance | 10% | Asking and answering questions in class |
Homework/labs | 40% | Slide presentations |
Term paper | 40% | Paper-reading report |
About the Instructor
- Working on "DfM" for over 10 years.
- Working on large-scale software development for almost 20 years.
- Working on algorithm design for over 20 years.
📜 My Publications (DfM related) I
- Ye Zhang, Wai-Shing Luk et al. Network flow based cut redistribution and insertion for advanced 1D layout design, Proceedings of 2017 Asia and South Pacific Design Automation Conference (ASP-DAC), (awarded best paper nomination)
- Yunfeng Yang, Wai-Shing Luk et al. Layout Decomposition Co-optimization for Hybrid E-beam and Multiple Patterning Lithography, in Proceeding of the 20th Asia and South Pacific Design Automation Conference (2015)
- Xingbao Zhou, Wai-Shing Luk, et. al. "Multi-Parameter Clock Skew Scheduling." Integration, the VLSI Journal (accepted).
- Ye Zhang, Wai-Shing Luk et al. Layout Decomposition with Pairwise Coloring for Multiple Patterning Lithography, Proceedings of 2013 International Conference on Computer Aided-Design (awarded best paper nomination)
- 魏晗一,陆伟成,一种用于双成像光刻中的版图分解算法,《复旦学报(自然科学版)》,2013
📜 My Publications (DfM related) II
- Yanling Zhi, Wai-Shing Luk, Yi Wang, Changhao Yan, Xuan Zeng, Yield-Driven Clock Skew Scheduling for Arbitrary Distributions of Critical Path Delays, IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E95-A, No.12, pp.2172-2181, 2012.
- 李佳宁,陆伟成,片内偏差空间相关性的非参数化估计方法,《复旦学报(自然科学版)》 Non-parametric Approach for Spatial Correlation Estimation of Intra-die Variation, 2012,vol. 51, no 1, pp. 27-32
- Wai-Shing Luk and Huiping Huang, Fast and Lossless Graph Division Method for Layout Decomposition Using SPQR-Tree, Proceedings of 2010 International Conference on Computer Aided-Design, pp. 112-115, 2010
📜 My Publications (DfM related) III
- Qiang Fu, Wai-Shing Luk et al., Intra-die Spatial Correlation Extraction with Maximum Likelihood Estimation Method for Multiple Test Chips, IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E92-A,No.12,pp.-,Dec. 2009.
- Qiang Fu, Wai-Shing Luk et al., Characterizing Intra-Die Spatial Correlation Using Spectral Density Fitting Method, IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, Vol. 92-A(7): 1652-1659, 2009.
- Yi Wang, Wai-Shing Luk, et al., Timing Yield Driven Clock Skew Scheduling Considering non-Gaussian Distributions of Critical Path Delays, Proceedings of the 45th Design Automation Conference, USA, pp. 223-226, 2008.
- 宋宇, 刘学欣, 陆伟成, 唐璞山, 一种鲁棒性几何规划新方法设计两级运放, 微电子学与计算机, 2008 年 25 卷 3 期, 175-181 页.
📜 My Publications (DfM related) IV
- 方君, 陆伟成, 赵文庆. 工艺参数变化下的基于统计时序分析的时钟偏差安排, 计算机辅助设计与图形学报,第 19 卷,第 9 期,pp.1172~1177,2007 年 9 月
- FANG Jun, LUK Wai-Shing et al., True Worst-Case Clock Skew Estimation under Process Variations Using Affine Arithmetic, Chinese Journal of Electronics, vol. 16, no. 4, pages 631-636, 2007.
- Xuexin Liu, Wai-Shing Luk et al., Robust Analog Circuit Sizing Using Ellipsoid Method and Affine Arithmetic, in Proceeding of the 12th Asia and South Pacific Design Automation Conference, pages 203-208, 2007.
- J. Fang, W.-S. Luk and W. Zhao. A Novel Statistical Clock Skew Estimation Method, in The Proceedings of 8th International Conference on Solid-state and Integrated Circuit Technology, pp.1928-1930, 2006.
Lecture 1b: DFM For Dummies
📝 Abstract
DFM optimizes the ease of manufacturing and the production cost of ICs while meeting performance, power, and reliability requirements. As ICs become increasingly miniaturized and complex, the manufacturing process grows more sensitive to variations and defects; if these are not addressed through Design for Manufacturing (DFM), chip quality and functionality may suffer. DFM can be applied to multiple aspects of IC design, including circuit design, logic design, layout design, verification, and testing, to mitigate manufacturing issues. The lecture provides general guidelines for DFM. Following DFM best practices in IC layout design can reduce design iterations, improve collaboration with foundries, enhance product performance and functionality, lower production costs, and achieve faster time to market, so applying DFM techniques in the physical design stage can greatly benefit IC development. The lecture covers the challenges of DFM as well as its market share, and then discusses DFM analysis and verification, enhancement, optimization, and the algorithms used to solve DFM problems. The course structure focuses on the problems that arise from DFM and presents them in mathematical form.
Faster, smaller & smarter
Silicon Gold Rush?
Current Transistors
- High-K dielectrics, Metal Gate (HKMG)
- "3D" gate
The significance of High-K dielectrics is that they have a higher dielectric constant than traditional silicon dioxide (SiO2) dielectrics. This allows for a thicker gate oxide layer to be used without increasing the gate capacitance, which can improve the transistor's performance and reduce leakage current.
Metal Gate refers to the use of a metal material (such as tungsten or tantalum) for the gate electrode, instead of the traditional polysilicon material. This is significant because metal gates can provide better control over the transistor's threshold voltage, which can improve its performance and reduce variability.
Lithography
- Photo-resist coating
- Illumination
- Exposure
- Etching
- Impurities Doping
- Metal connection
Process-Design Gap
Problem Visualization
One of the main impacts of lithography is that it can cause variations in the dimensions and shapes of the IC's features, which can negatively impact the performance and yield of the IC. This is because lithography is a complex process that involves the use of light to transfer a pattern from a mask to a wafer. Variations in the intensity, wavelength, and angle of the light can cause deviations in the dimensions and shapes of the features, which can lead to process-induced variation.
Chemical Mechanical Polishing
Chemical Mechanical Polishing (CMP) is a process used in semiconductor manufacturing to planarize the surface of a wafer. CMP is one of the steps involved in the fabrication of integrated circuits, specifically in the metal connection stage.
Chemical Mechanical Polishing
In terms of bridging the Process-Design Gap, CMP can help address the issue of process-induced variation by improving the uniformity of the wafer surface. This is important because process-induced variation can cause deviations in the dimensions and electrical properties of the transistors, which can negatively impact the performance and yield of the IC.
ECP & CMP
By using CMP to planarize the wafer surface, designers can reduce the variability in the thickness of the metal layers, which can improve the accuracy and consistency of the IC's electrical properties. This, in turn, can help bridge the Process-Design Gap by ensuring that the ICs are manufactured according to the intended design specifications.
Process Variation
Total Thickness Variation Per Node
"Slippery Fish" at 45nm
- Process variation, impacting yield and performance
- More restricted design rules (RDRs)
- +3 or more rules at 45nm
- +100 or more rules at 32nm
- +250 or more rules at 22nm
- More rules implies larger die size, lower performance
- 10nm is no longer science fiction, thanks to FinFET technology
DFM
What is DFM?
- Design for 💰?
- Design for Manufacturing
- Design for Manufacturability
- Refers to a group of challenges at technology nodes below 130nm
- The engineering practice of designing integrated circuits (ICs) to optimize their manufacturing ease and production cost given performance, power and reliability requirements
- A set of techniques to modify the design of ICs to improve their functional yield, parametric yield or their reliability
Why is it important?
- Achieving high-yielding designs in the state-of-the-art VLSI technology is extremely challenging due to the miniaturization and complexity of leading-edge products
- The manufacturing process becomes more sensitive to variations and defects, which can degrade the quality and functionality of the chips
- DFM can help to address various manufacturing issues, such as lithography hotspots, CMP dishing and erosion, antenna effects, electromigration, stress effects, layout-dependent effects and more
How is it applied?
- DFM can be applied to various aspects of IC design, such as circuit design, logic design, layout design, verification and testing
- Each aspect has its own specific DFM guidelines and best practices that designers should follow to ensure manufacturability
- For example, some general DFM guidelines for layout design are:
- Use regular and uniform layout structures
- Avoid narrow or long metal wires
- Avoid acute angles or jogs in wires
- Avoid isolated or floating features
- Use dummy fill to improve planarity and density uniformity
- Use recommended design rules and constraints from foundries
What are the benefits?
- By applying DFM techniques in the physical design stage of IC
development, designers can:
- Reduce the number of design iterations
- Improve the collaboration with foundries
- Enhance the product performance and functionality
- Achieve faster time to market and lower production costs
💹 DFM Market Share 2008
💹 DFM Forecast 2009 in $M
Increasing Importance of DFM
DFM Analysis and Verification
- Critical area analysis
- CMP modeling
- Statistical timing analysis
- Pattern matching
- Lithography simulation
- Lithographic hotspot verification
2D Pattern Matching in DRC+
Contour Based Extraction
DFM Enhancement and Optimization
- Wire spreading
- Dummy Filling
- Redundant Via Insertion
- Optical proximity correction (OPC)
- Phase Shift Masking (PSM)
- Double/Triple/Multiple Patterning
- Statistical timing and power optimization
Dummy Filling
"Smart" Filling
Redundant Via Insertion
- Also known as double via insertion.
- Post-routing RVI (many EDA tools already have this feature)
- Considering RVI during routing
- Looks good, right?
- But in practice, only a few people use it!
- Why?
Multiple Patterning (MPL)
- Instead of exposing the photoresist layer once under one mask, MPL splits the mask into "k" parts, each with less dense features, and exposes the layer k times (a toy decomposition sketch follows).
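The mask-splitting step above is usually modeled as coloring a conflict graph. Below is a minimal, hypothetical Python sketch (not from any EDA tool) for the double-patterning case (k = 2): features that are too close to share a mask are connected in `conflict_graph`, and a 2-mask decomposition exists exactly when that graph is bipartite.

```python
from collections import deque

def two_color(conflict_graph):
    """Assign each layout feature to one of two masks so that no two
    conflicting features share a mask.  `conflict_graph` maps a feature id
    to the ids it conflicts with (every feature appears as a key).
    Returns None if no 2-mask decomposition exists (an odd conflict cycle)."""
    color = {}
    for start in conflict_graph:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in conflict_graph[u]:
                if v not in color:
                    color[v] = 1 - color[u]     # put on the opposite mask
                    queue.append(v)
                elif color[v] == color[u]:      # odd cycle: no 2-coloring
                    return None
    return color

# toy usage: a chain of three features can be split into 2 masks,
# but three mutually conflicting features cannot
print(two_color({0: [1], 1: [0, 2], 2: [1]}))        # {0: 0, 1: 1, 2: 0}
print(two_color({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # None
```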
What are the challenges of DFM?
- DFM is not a fixed set of rules, but rather a flexible and evolving methodology that depends on the product requirements, the manufacturing technology and the industry standards
- DFM can also be combined with other design methodologies, such as DFT, DFR, DFLP and DFS, to create a holistic approach to product development
- DFM requires strong capabilities in research, supply chain, talent, IP protection and government policies
Course Structure
- Describe the problems that arise from DFM.
- Abstract the problems in mathematical forms
- Describe the algorithms that solve the problems
- Discuss alternative algorithms and possible improvements.
- Discuss whether the algorithms can be applied to other areas.
- Only describe the key idea in lectures. Details are left for paper reading if necessary.
🙈 Not covered
- Algorithms for 3D problems
- Packaging
- Machine learning/AI-based algorithms
Lecture 2a: Open-Source Software Development Flow
💬 Messages
- About 99% of projects fail.
- Software is "soft"; Hardware is "hard"
- Automation is hard
- Nightly build concept (Microsoft)
- Agile software development
- Pair programming
- Extreme programming
- Open-source projects - Continuous Integration
Platforms
- https://github.com
- gitpod.io - ☁️ cloud-based
- GitHub's Codespaces - ☁️ cloud-based
- Lubuntu
- Windows - MSVC++
- FydeOS (ChromeOS) - g++-13
- Android's Termux - clang-17
Open-source Work Flow (Python)
Open-source Work Flow (C++)
Pull Request
GitHub, Git
git clone https://github.com/luk036/csdigit
cd csdigit
(edit)
git status
git diff
git diff README.md
git pull
git add .
git commit -m "message"
git push
git tag
git branch # list all branches
git branch develop # create a new branch
git switch develop
git switch master
Example - git status
ubuntu@ubuntu:~/github/ellpy$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
<span style="color:red;">modified: .pytest_cache/v/cache/lastfailed</span>
<span style="color:red;">modified: .pytest_cache/v/cache/nodeids</span>
Untracked files:
(use "git add <file>..." to include in what will be committed)
<span style="color:red;">ellpy/</span>
<span style="color:red;">test.html</span>
no changes added to commit (use "git add" and/or "git commit -a")
Example - git pull
lubuntu@lubuntu:~/github/luk036.github.io$ git pull
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 19 (delta 14), reused 16 (delta 11), pack-reused 0
Unpacking objects: 100% (19/19), done.
From ssh://github.com/luk036/luk036.github.io
461191c..d335266 master -> origin/master
Updating 461191c..d335266
Fast-forward
algo4dfm/swdevtips.html | 36 <span style="color:green;">++++++++</span><span style="color:red;">--</span>
algo4dfm/swdevtips.md | 27 <span style="color:green;">+++++++</span><span style="color:red;">-</span>
algo4dfm/swdevtools.html | 22 <span style="color:green;">+++++</span><span style="color:red;">--</span>
algo4dfm/swdevtools.md | 89 <span style="color:green;">++++++++++++++++++++++++</span><span style="color:red;">-</span>
markdown/remark.html | 45 <span style="color:green;">++++++++</span><span style="color:red;">-----</span>
5 files changed, 251 insertions(+), 198 deletions(-)
GitHub, gh
gh repo create csdigit --public
gh repo clone csdigit
gh run list
gh run view
gh release list
gh release create
gh issue list
gh issue create
gh search repos digraphx
🐍 Python
- Create a new project
pip install pyscaffold[all]
putup -i --markdown --github-actions csdigit
- ⚙️ Setup
cd csdigit
pip install -e .
pip install -r requirements.txt
- 🧪 Unit Testing
pytest
pytest --doctest-modules src
- ⛺ Code Coverage
pytest --cov=src/csdigit
🐍 Python
- 🪄 Formatting and static check
pip install pre-commit
pre-commit run --all-files
- 📝 Documentation
pip install -r docs/requirements.txt
cd docs
make html
python -m http.server
- 📊 Benchmarking
pytest benches/test_bench.py
📊 Benchmarking Example
ubuntu@ubuntu:~/github/ellpy$ pytest tests/test_lmi.py
<span style="font-weight:bold;">============================= test session starts ==============================</span>
platform linux -- Python 3.7.3, pytest-5.1.2, py-1.8.0, pluggy-0.13.0 -- /media/ubuntu/casper-rw/miniconda3/bin/python
cachedir: .pytest_cache
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /media/ubuntu/casper-rw/github/ellpy, inifile: setup.cfg
plugins: benchmark-3.2.2, cov-2.7.1
<span style="font-weight:bold;">collecting ... </span>collected 2 items
tests/test_lmi.py::test_lmi_lazy <span style="color:green;">PASSED</span><span style="color:teal;"> [ 50%]</span>
tests/test_lmi.py::test_lmi_old <span style="color:green;">PASSED</span><span style="color:teal;"> [100%]</span><span style="color:red;"></span>
<span style="color:olive;">------------ benchmark: 2 tests -----------</span>
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
<span style="color:olive;">-------------------------------------------</span>
test_lmi_lazy <span style="color:green;"></span><span style="font-weight:bold;color:green;"> 13.0504 (1.0) </span><span style="color:green;"></span><span style="font-weight:bold;color:green;"> 13.2587 (1.0) </span><span style="color:green;"></span><span style="font-weight:bold;color:green;"> 13.1461 (1.0) </span><span style="color:red;"></span><span style="font-weight:bold;color:red;"> 0.0432 (1.91) </span><span style="color:green;"></span><span style="font-weight:bold;color:green;"> 13.1471 (1.0) </span><span style="color:red;"></span><span style="font-weight:bold;color:red;"> 0.0514 (1.66) </span> 25;1<span style="color:green;"></span><span style="font-weight:bold;color:green;"> 76.0682 (1.0) </span> 75 1
test_lmi_old <span style="color:red;"></span><span style="font-weight:bold;color:red;"> 13.6855 (1.05) </span><span style="color:red;"></span><span style="font-weight:bold;color:red;"> 13.7888 (1.04) </span><span style="color:red;"></span><span style="font-weight:bold;color:red;"> 13.7279 (1.04) </span><span style="color:green;"></span><span style="font-weight:bold;color:green;"> 0.0225 (1.0) </span><span style="color:red;"></span><span style="font-weight:bold;color:red;"> 13.7271 (1.04) </span><span style="color:green;"></span><span style="font-weight:bold;color:green;"> 0.0310 (1.0) </span> 24;1<span style="color:red;"></span><span style="font-weight:bold;color:red;"> 72.8445 (0.96) </span> 72 1
<span style="color:olive;">-------------------------------------------</span>
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
<span style="color:green;"></span><span style="font-weight:bold;color:green;">============================== 2 passed in 3.27s ===============================</span>
🦀 Rust
- Create a new project
cargo install cargo-generate
cargo generate -o --init --git https://github.com/rust-github/template.git
- ⚙️ Setup
cd csd-rs
cargo build
- 🧪 Unit Testing
cargo test
cargo test --lib
cargo test --doc
- ⛺ Code Coverage
cargo tarpaulin (Windows)
🦀 Rust
- 🪄 Formatting and static check
cargo fmt
cargo clippy
cargo clippy --fix
- 📝 Documentation
cargo doc
cd target/doc
python -m http.server
- 📊 Benchmarking
cargo bench
C++ (CMake + CPM)
-
Create a new project
Use GitHub's ModernCppStarter template,
-
⚙️ Setup
cd csd-cpp
cmake -Sall -Bbuild -DCMAKE_BUILD_TYPE=Release
cmake --build build
- 🧪 Unit Testing
cmake --build build --target test
- ⛺ Code Coverage
??
C++ (CMake + CPM)
- 🪄 Formatting and static check
pip install cmake-format clang-format
cmake -Sall -Bbuild -DCMAKE_BUILD_TYPE=Release
cmake --build build --target fix-format
- 📝 Documentation
cmake --build build --target GenerateDocs
- 📊 Benchmarking
./build/bench/BM_switch
C++ (XMake)
- Create a new project
xmake create -t static lds-cpp
xmake create -t console csd-cpp
- ⚙️ Setup
xmake f -m debug
xmake
- 🧪 Unit Testing
xmake run test_csd
- ⛺ Code Coverage
??
C++ (XMake)
- 🪄 Formatting and static check
xmake format
- 📝 Documentation
xmake doxygen
- 📊 Benchmarking
xmake run test_bench
Lecture 2b: Programming in the Age of AI 🤖
Coding Tips 💡
- Test, test, test!!!
- Write cleaner code
- Refactor repeat codes
- Object oriented programming
- Generic programming
- Design Pattern
- Coroutine is your friend
- Learn from good code, not bad code.
- The last resort: Google search.
Code generation
- AWS CodeWhisperer (VSCode's extension)
- generate testing code
Documentation generation
Mintlify (VSCode's extension)
- Naming
- a, i, p, n ❌
- A x = b
- x: unknown, x_axis
- x, y, z
Use better variable names
- p: point, polygon, polynomial, prev
- t: time, target, temp
- c: cost, cycle, coefficient
- d: distance, distribution
- e: edge
- v: vertex
- u, v, w: vertex1, vertex2
- i: index
- i, j: row, col
- i, j, k
- l, m: line1, line2
- n: dimension, node, next
- n, m: ndim, mdim
- w: weight, frequency (omega)
🚀 Performance Tips 💡
- Avoid string comparison
- Use sentinel
- Use cheaper measure, avoid
sqrt()
,sin()
,cos()
- Lazy evaluation
- Table lookup
- Avoid sequence search:
- Backward pointers
- Hash Table/Dictionary/Map
Avoid string comparison
Bad 👎:
if pin == "input":
# ...
elif pin == "output":
# ...
elif pin == "in_out":
# ...
elif pin == "dont_care":
# ...
else:
# ...
Better ⚡:
pin_type = {"input": 0, "output": 1,
            "in_out": 2, "dont_care": 3}
...
id = pin_type.get(pin, -1)
if id == 0:
# ...
elif id == 1:
# ...
elif id == 2:
# ...
elif id == 3:
# ...
else:
# ...
Use Sentinel
Bad 👎:
max = 0
bckt = [Dllist() for _ in range(high)]
# ...
def popleft():
res = bckt[max].popleft()
while max >= 0 and bckt[max].empty():
max -= 1
return res
Better ⚡:
max = 0
sentinel = Dllink()
bckt = [Dllist() for _ in range(high+1)]
bckt[0].append(sentinel) # sentinel
# ...
def popleft():
res = bckt[max].popleft()
while bckt[max].empty():
max -= 1
return res
# Saved a boundary check `max >= 0`
Use cheaper measure
Bad 👎:
mind = 10000
maxd = 0
for u, v in G.edges():
t = vec[u] - vec[v]
* d = sqrt(t.dot(t))
if mind > d: mind = d
if maxd < d: maxd = d
*return maxd - mind
Better ⚡:
minq = 10000
maxq = 0
for u, v in G.edges():
t = vec[u] - vec[v]
* q = t.dot(t)
if minq > q: minq = q
if maxq < q: maxq = q
*return sqrt(maxq) - sqrt(minq)
Another Example
Bad 👎:
mind = 10000
maxd = 0
for u, v in G.edges():
* t = 1 - vec[u].dot(vec[v])
* d = arcsin(sqrt(t))
if mind > d: mind = d
if maxd < d: maxd = d
*return maxd - mind
Better ⚡:
minq = 10000
maxq = 0
for u, v in G.edges():
* q = 1 - vec[u].dot(vec[v])
if minq > q: minq = q
if maxq < q: maxq = q
*return arcsin(sqrt(maxq)) \
* - arcsin(sqrt(minq))
Optimization Tips 💡
-
Convex optimization
-
Network optimization
-
Primal-dual paradigm
Lecture 2c: Introduction to Convex Programming
📝 Abstract
This lecture provides an introduction to convex programming and covers various aspects of optimization. The lecture begins with an overview of optimization, including linear and nonlinear programming, duality and convexity, and approximation techniques. It then delves into more specific topics within continuous optimization, such as linear programming problems and their standard form, transformations to standard form, and the duality of linear programming problems. The lecture also touches on nonlinear programming, discussing the standard form of an NLPP (nonlinear programming problem) and the necessary conditions of optimality known as the Karush-Kuhn-Tucker (KKT) conditions. Convexity is another important concept explored in the lecture, with explanations of the definition of convex functions and their properties. The lecture also discusses the duality of convex optimization problems and their usefulness in computation. Finally, the lecture briefly mentions various unconstrained optimization techniques, descent methods, and approximation methods under constraints.
🗺️ Overview
- Introduction
- Linear programming
- Nonlinear programming
- Duality and Convexity
- Approximation techniques
- Convex Optimization
- Books and online resources.
Classification of Optimizations
- Continuous
- Linear vs Non-linear
- Convex vs Non-convex
- Discrete
- Polynomial time Solvable
- NP-hard
- Approximatable
- Non-approximatable
- Mixed
Continuous Optimization
Linear Programming Problem
- An LPP in standard form is: $\min\; c^\mathsf{T} x$ subject to $A x = b$, $x \geq 0$.
- The ingredients of an LPP are:
  - An $m \times n$ matrix $A$, with $n > m$
  - A vector $b \in \mathbb{R}^m$
  - A vector $c \in \mathbb{R}^n$
Example
Transformations to Standard Form
- Theorem: Any LPP can be transformed into the standard form.
- Variables not restricted in sign:
  - Decompose $x$ into two new variables: $x = x^+ - x^-$, with $x^+, x^- \geq 0$
- Transforming inequalities into equalities:
  - By introducing a slack variable $y \geq 0$
  - Set $A x + y = b$
- Transforming a max into a min
  - max(expression) = $-$min($-$expression)
Duality of LPP
- If the primal problem of the LPP is: $\min\; c^\mathsf{T} x$ subject to $A x = b$, $x \geq 0$,
- its dual is: $\max\; y^\mathsf{T} b$ subject to $A^\mathsf{T} y \leq c$.
- If the primal problem is: $\min\; c^\mathsf{T} x$ subject to $A x \geq b$, $x \geq 0$,
- its dual is: $\max\; y^\mathsf{T} b$ subject to $A^\mathsf{T} y \leq c$, $y \geq 0$ (a small numerical check follows).
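As a quick numerical check of this duality (with made-up data, not from the lecture), the following sketch solves a tiny primal/dual pair with `scipy.optimize.linprog` and prints the two optimal values, which coincide by strong duality.

```python
import numpy as np
from scipy.optimize import linprog

# primal:  min c^T x  s.t.  A x = b,  x >= 0
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# dual:  max y^T b  s.t.  A^T y <= c  (solved by minimizing -b^T y, y free)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)])

print(primal.fun, -dual.fun)   # both optimal values are 0.0
```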
Nonlinear Programming
- The standard form of an NLPP is: $\min\; f(x)$ subject to $g_j(x) \leq 0$, $j = 1, \ldots, m$.
- Necessary conditions of optimality, the Karush-Kuhn-Tucker (KKT)
conditions:
  - $\nabla f(x) + \sum_j \mu_j \nabla g_j(x) = 0$, $\mu_j \geq 0$,
  - $\mu_j g_j(x) = 0$, $g_j(x) \leq 0$, for $j = 1, \ldots, m$.
Convexity
-
A function $f: K \to \mathbb{R}$ is convex if $K$ is a convex set and $f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)$ for all $x, y \in K$ and $\lambda \in [0, 1]$.
-
Theorem: Assume that $f$ and $g_j$ are convex differentiable functions. If the pair $(x, \mu)$ satisfies the KKT conditions above, then $x$ is an optimal solution of the problem. If, in addition, $f$ is strictly convex, then $x$ is the only solution of the problem.
(Local minimum = global minimum)
Duality and Convexity
-
The dual is the NLPP: $\max_{\mu \geq 0} \theta(\mu)$, where $\theta(\mu) = \min_x \{ f(x) + \sum_j \mu_j g_j(x) \}$ is the Lagrangian dual function.
-
The dual problem is always convex.
-
Useful for computing a lower/upper bound.
Applications
- Statistics
- Filter design
- Power control
- Machine learning
- SVM classifier
- logistic regression
Convexify the non-convex's
Change of curvature: square
Transform: into:
👉 Note that are monotonic concave functions in .
Generalization:
- Consider (power) instead of (magnitude).
- square root -> Spectral factorization
Change of curvature: square
Transform: into: Then:
Change of curvature: sine
Transform: into: Then:
👉 Note that are monotonic concave functions in .
Change of curvature: log
Transform: into: where .
Then:
Generalization:
- Geometric programming
Change of curvature: inverse
Transform: into:
Then:
👉 Note that , , and are monotonic functions.
Generalize to matrix inequalities
Transform: into:
Then:
Change of variables
Transform:
into: where .
Then:
Generalize to matrix inequalities
Transform:
into: where .
Then:
Some operations that preserve convexity
- $f$ is concave if and only if $-f$ is convex.
- Nonnegative weighted sums:
  - if $w_1, \ldots, w_m \geq 0$ and $f_1, \ldots, f_m$ are all convex, then so is $w_1 f_1 + \cdots + w_m f_m$. In particular, the sum of two convex functions is convex.
- Composition:
  - If $f$ and $g$ are convex functions and $g$ is non-decreasing over a univariate domain, then $h(x) = g(f(x))$ is convex. As an example, if $f$ is convex, then so is $e^{f(x)}$, because $e^x$ is convex and monotonically increasing.
  - If $f$ is concave and $g$ is convex and non-increasing over a univariate domain, then $h(x) = g(f(x))$ is convex.
- Convexity is invariant under affine maps.
Other thoughts
- Minimizing any quasi-convex function subject to convex constraints can easily be transformed into a convex program.
- Replace a non-convex constraint with a sufficient condition (such as its lower bound). Less optimal.
- Relaxation + heuristic
- Decomposition
Unconstrained Techniques
- Line search methods
- Fixed or variable step size
- Interpolation
- Golden section method (see the sketch after this list)
- Fibonacci's method
- Gradient methods
- Steepest descent
- Quasi-Newton methods
- Conjugate Gradient methods
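As an illustration of the golden-section line-search method listed above, here is a minimal, self-contained sketch; the function name and tolerance are assumptions for this example, not the lecture's notation.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [a, b].
    (Re-evaluates f at both interior points each iteration for clarity.)"""
    invphi = (math.sqrt(5) - 1) / 2                 # 1/phi ~ 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):       # minimizer lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                 # minimizer lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

# toy usage: minimize (x - 2)^2 on [0, 5]
print(golden_section(lambda x: (x - 2.0) ** 2, 0.0, 5.0))   # ~2.0
```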
General Descent Method
- Input: a starting point $x \in$ dom $f$
- Output: $x^*$
- repeat
  - Determine a descent direction $\Delta x$.
  - Line search. Choose a step size $t > 0$.
  - Update. $x := x + t \Delta x$.
- until the stopping criterion is satisfied (a minimal sketch follows).
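Below is a minimal, self-contained sketch of the general descent method just outlined, using the gradient direction and a backtracking line search; the parameter names (`alpha`, `beta`) and defaults are illustrative assumptions.

```python
import numpy as np

def descent(f, grad, x0, alpha=0.25, beta=0.5, tol=1e-8, max_iters=200):
    """General descent method: gradient direction + backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stopping criterion
            break
        dx = -g                              # descent direction
        t = 1.0                              # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * g @ dx:
            t *= beta
        x = x + t * dx                       # update
    return x

# toy usage: minimize ||x - 1||^2, minimizer at (1, 1)
print(descent(lambda x: np.sum((x - 1.0) ** 2),
              lambda x: 2.0 * (x - 1.0),
              [5.0, -3.0]))
```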
Some Common Descent Directions
- Gradient descent: $\Delta x = -\nabla f(x)$
- Steepest descent:
  - $\Delta x_{nsd} = \text{argmin}\{\nabla f(x)^\mathsf{T} v : \|v\| = 1\}$
  - $\Delta x_{sd} = \|\nabla f(x)\|_* \Delta x_{nsd}$ (un-normalized)
- Newton's method:
  - $\Delta x = -\nabla^2 f(x)^{-1} \nabla f(x)$
- Conjugate gradient method:
  - $\Delta x$ is "orthogonal" to all previous $\Delta x$'s
- Stochastic subgradient method:
  - $\Delta x$ is calculated from a set of sample data (instead of using all data)
- Network flow problems:
  - $\Delta x$ is given by a "negative cycle" (or "negative cut").
Approximation Under Constraints
- Penalization and barriers
- Dual method
- Interior Point method
- Augmented Lagrangian method
📚 Books and Online Resources
- Pablo Pedregal. Introduction to Optimization, Springer. 2003 (O224 P371)
- Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Dec. 2002
- Mittlemann, H. D. and Spellucci, P. Decision Tree for Optimization Software, World Wide Web, http://plato.la.asu.edu/guide.html, 2003
Non-Parametric Spatial Correlation Estimation
Abstract
This lecture discusses non-parametric spatial correlation estimation and its importance in analyzing the variability in semiconductor devices. The intra-die variation in these devices can exhibit spatially correlated patterns, which require accurate statistical analysis during the design stage. Anisotropic models are used to allow for variations in gate length, which exhibit stronger correlation in the horizontal direction than the vertical direction. Non-parametric approaches make sense for correlation functions, as earlier studies that used parametric forms were limited by the assumptions made about the correlation function. This lecture goes on to describe random fields and the properties of correlation functions before diving into problem formulation and solutions using maximum likelihood estimation and least squares estimation.
🗺️ Overview
- Motivation:
- Why is spatial correlation important?
- Why anisotropic models?
- Why do non-parametric approaches make sense?
- Problem Formulation
- Non-parametric estimation
- Least squares estimation
- Maximum Likelihood estimation
- Numerical experiment
- Conclusion
Why Spatial Correlation?
- As the minimum feature size of semiconductor devices continues to shrink,
- Process variations are inevitable. It is desirable to develop more accurate statistical analysis during the design stage.
- Intra-die variation exceeds inter-die variation
- Becomes dominant over total process variation
- Often exhibits spatially correlated patterns.
- Applications:
- Statistical timing analysis -> Clock Skew Scheduling
- Power/leakage minimization
Why Anisotropic Model?
- The isotropic assumption says that the correlation depends only on the distance between two random variables. It was made to simplify the computation.
- Certain variations, such as variations in gate length, exhibit significantly stronger correlation in the horizontal direction than in the vertical direction.
Why Non-Parametric Approaches?
- In earlier studies, the parametric form of the correlation function was simple, such as an exponential, Gaussian or Matérn function:
- Pros: guaranteed to be positive definite.
- Cons:
- non-convex; may be stuck in a local minimum
- The actual correlation function may not necessarily be of this form.
- isotropic model
Related research
- Piecewise linearization method (imprecise, not positive definite)
- Parametric method (non-convex, too smooth, isotropic)
- Exponential function
- Gaussian function
- Matérn function
- Non-parametric method
- Polynomial fitting
- B-spline
Random Field
- A random field is an indexed family of random variables denoted $\{Z(s) : s \in D\}$, where $D \subseteq \mathbb{R}^d$.
- Covariance: $C(s_i, s_j) = \mathrm{cov}(Z(s_i), Z(s_j)) = \mathrm{E}[(Z(s_i) - \mathrm{E}[Z(s_i)])(Z(s_j) - \mathrm{E}[Z(s_j)])]$
- Correlation: $\rho(s_i, s_j) = C(s_i, s_j)/\sqrt{C(s_i, s_i)\, C(s_j, s_j)}$
- The field is stationary, or homogeneous, if the distribution is unchanged when the point set is translated.
- The field is isotropic if the distribution is invariant under any rotation.
- In a homogeneous isotropic field (HIF), let $h = \|s_i - s_j\|$, so that the correlation can be written as $\rho(h)$.
Properties of Correlation Function
- Even function, i.e., its Fourier transform is real.
- Positive definiteness (PD): its Fourier transform is nonnegative (Bochner's theorem).
- Monotonicity: correlations decrease with distance 🤔
- Nonnegativeness: no negative correlation 🤔
- Discontinuity at the origin: nugget effect.
The nugget effect refers to the discontinuity at the origin in the correlation function of spatially correlated patterns. It indicates the presence of a small, non-zero correlation value between points that are very close to each other. In other words, it represents the variance component that cannot be explained by spatial correlation and is attributed to purely random variation.
Problem Formulation
- Intra-die variation $z = z_{det} + z_{cor} + z_{rnd}$
  - $z_{det}$: deterministic component
  - $z_{cor}$: correlated random component
  - $z_{rnd}$: purely random component
- Given $M$ samples $(z_1, z_2, \ldots, z_M)$ of $z$.
- Measured covariance matrix $Y$:
  - $Y = \frac{1}{M} \sum_{i=1}^{M} (z_i - \bar{z})(z_i - \bar{z})^\mathsf{T}$ (unlikely PD)
- In MATLAB, simply call `cov(Zs',1)` to obtain $Y$.
- In Python, simply call `np.cov(Zs, bias=True)` to obtain $Y$.
Nearest PD Matrix Problem
-
Given $Y$. Find a nearest matrix $\Sigma$ that is positive definite: $\min_\Sigma \; \|\Sigma - Y\|_F$ subject to $\Sigma \succeq 0$, where $\|\cdot\|_F$ denotes the Frobenius norm and $\succeq 0$ denotes positive semidefinite.
-
👉 Note:
- the problem is convex 😃
- the problem can be solved easily using CVX 😃 (a Python sketch follows)
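As a minimal Python counterpart of the CVX remark above (the function name `nearest_psd` is ours, and `Y` is assumed to be symmetric), the nearest-PSD problem can be written directly with the cvxpy package.

```python
import cvxpy as cp
import numpy as np

def nearest_psd(Y):
    """Find the PSD matrix closest to the symmetric matrix Y in Frobenius norm.
    Requires an SDP-capable solver (SCS, installed with cvxpy by default)."""
    n = Y.shape[0]
    Sigma = cp.Variable((n, n), PSD=True)          # symmetric PSD variable
    prob = cp.Problem(cp.Minimize(cp.norm(Sigma - Y, "fro")))
    prob.solve()
    return Sigma.value

# toy usage: an indefinite "measured" covariance matrix
Y = np.array([[1.0, 0.9], [0.9, -0.2]])
print(nearest_psd(Y))
```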
Maximum Likelihood Estimation
- Maximum likelihood estimation (MLE): \begin{array}{ll} \text{maximize} & \log \det \Sigma^{-1} - \mathrm{Tr}(\Sigma^{-1}Y) \\ \text{subject to} & \Sigma \succeq 0 \end{array} where $\mathrm{Tr}(A)$ denotes the trace of $A$.
- 👉 Note: 1st term is concave 😭, 2nd term is convex
Maximum Likelihood Estimation (cont'd)
-
Having $S = \Sigma^{-1}$, the problem becomes convex 😃:
\begin{array}{ll} \text{minimize} & -\log \det S + \mathrm{Tr}(S Y) \\ \text{subject to} & S \succeq 0 \end{array}
-
👉 Note: the problem can be solved easily using MATLAB with the CVX package, or using Python with the cvxpy package.
Matlab Code of CVX
function Sig = log_mle_solver(Y);
ndim = size(Y,1);
cvx_quiet(false);
cvx_begin sdp
variable S(ndim, ndim) symmetric
maximize(log_det(S) - trace(S*Y))
subject to
S >= 0;
cvx_end
Sig = inv(S);
🐍 Python Code
from cvxpy import *
from scipy import linalg
def mle_corr_mtx(Y):
ndim = len(Y)
S = Semidef(ndim)
prob = Problem(Maximize(log_det(S) - trace(S*Y)))
prob.solve()
if prob.status != OPTIMAL:
raise Exception('CVXPY Error')
return linalg.inv(S.value)
Correlation Function (I)
-
Let $\rho(h) = \sum_i^m p_i \Psi_i(h)$, where
- $p_i$'s are the unknown coefficients to be fitted
- $\Psi_i$'s are a family of basis functions.
-
Let $\{F_k\}_{i,j} = \Psi_k(\| s_i - s_j \|_2)$.
-
The covariance matrix $\Omega(p)$ can be recast as: \Omega(p) = p_1 F_1 + \cdots + p_m F_m
-
Note 1: affine transformation preserved convexity
-
Note 2: inverse of matrix unfortunately cannot be expressed in convex form.
Correlation Function (II)
- Choice of $\Psi_i(h)$:
- Polynomial $P_i(h)$:
- Easy to understand 👍
- No guarantee of monotonicity; unstable for higher-order polynomials.
- B-spline function $B_i(h)$
- Shapes are easier to control 👍
- No guarantee of positive definite 👎
Correlation Function (III)
- To ensure that the resulting function is PD, additional constraints can be imposed according to Bochner's theorem, e.g.:
- real(FFT(${\Psi_i(h_k)}$)) $\geq 0$
Bochner's theorem states that a continuous function is a valid covariance function if and only if its Fourier transform is a non-negative measure. In other words, a function can be a valid covariance function if and only if its Fourier transform is positive definite. This theorem is important in spatial statistics because it provides a way to check whether a given covariance function is valid or not.
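A minimal numerical sketch of this check (the sampling grid and basis function below are made up for illustration): the real part of the FFT of the sampled basis must stay nonnegative for the fitted correlation to remain positive definite.

```python
import numpy as np

def bochner_ok(psi_samples, tol=1e-9):
    """Return True if the sampled basis function has an (approximately)
    nonnegative real Fourier transform, as the constraint above requires."""
    return bool(np.all(np.real(np.fft.fft(psi_samples)) >= -tol))

# toy check on an assumed Gaussian-shaped basis sampled over h in [0, 4]
h = np.linspace(0.0, 4.0, 64)
print(bochner_ok(np.exp(-0.5 * h**2)))
```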
Non-Parametric Estimation
-
Least squares estimation
\begin{array}{ll} \min_{\kappa, p} & \| \Omega(p) + \kappa I - Y \|_F \\ \text{s.t.} & \Omega(p) \succeq 0, \; \kappa \geq 0 \end{array}
-
Maximum likelihood estimation
\begin{array}{ll} \min_{\kappa, p} & \log \det (\Omega(p) + \kappa I) + \mathrm{Tr}((\Omega(p) + \kappa I)^{-1}Y) \\ \text{s.t.} & \Omega(p) \succeq 0, \; \kappa \geq 0 \end{array}
👉 Note:
- The 1st term is concave 😭, the 2nd term is convex
- However, the problem is geodesically convex.
- If enough samples are available, then $Y \succeq 0$. Furthermore, the MLE is a convex problem in $Y \preceq \Omega(p) + \kappa I \preceq 2Y$
Isotropic Case I
Figure: Data Sample
Figure: Least Squares Result
Isotropic Case II
Figure: Data Sample
Figure: Least Squares Result
Convex Concave Procedure
-
Let $\Sigma = \Omega + \kappa I$. Log-likelihood function is:
- $\log \det \Sigma^{-1} - \mathrm{Tr}(\Sigma^{-1}Y)$
-
Convexify the first term using the fact:
- $\log \det \Sigma^{-1} \geq \log \det \Sigma_0^{-1} + \mathrm{Tr}(\Sigma_0^{-1} (\Sigma - \Sigma_0))$
- minimize: $-\log \det \Sigma_0^{-1} + \mathrm{Tr}(\Sigma_0^{-1} (\Sigma - \Sigma_0)) + \mathrm{Tr}(\Sigma^{-1}Y)$
-
At each iteration $k$, the following convex problem is solved:
\begin{array}{ll} \min & \mathrm{Tr}(\Sigma_k^{-1} (\Sigma - \Sigma_k)) + \mathrm{Tr}(S Y) \\ \text{s.t.} & \left( \begin{array}{cc} \Sigma & I_n \\ I_n & S \end{array} \right) \succeq 0, \; \kappa \geq 0 \end{array}
👉 Note: Convergence to an optimal solution is not guaranteed, but it works well in practice.
MATLAB Code
% Geometric anisotropic parameters
alpha = 2; % scaling factor
theta = pi/3; % angle
Sc = [1 0; 0 alpha];
R = [sin(theta) cos(theta); -cos(theta) sin(theta)];
T = Sc*R;
Sig = ones(n,n);
for i=1:n-1,
for j=i+1:n,
dt = s(j,:)' - s(i,:)';
d = T*dt; % become isotropic after the location transformation
Sig(i,j) = exp(-0.5*(d'*d)/(sdkern*sdkern)/2);
Sig(j,i) = Sig(i,j);
end
end
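Since the future-work list below mentions porting the MATLAB code to Python, here is an assumed numpy port of the kernel construction above (the array layout `s[i] = (x_i, y_i)` and function name are assumptions, not existing project code).

```python
import numpy as np

def aniso_cov(s, sdkern=1.0, alpha=2.0, theta=np.pi / 3):
    """Geometric-anisotropic Gaussian kernel: scale one axis by `alpha`,
    rotate by `theta`, then apply an isotropic kernel to the transformed
    distances, mirroring the MATLAB snippet above."""
    Sc = np.diag([1.0, alpha])                       # scaling
    R = np.array([[np.sin(theta), np.cos(theta)],    # rotation
                  [-np.cos(theta), np.sin(theta)]])
    T = Sc @ R
    n = len(s)
    Sig = np.ones((n, n))
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = T @ (s[j] - s[i])   # isotropic after the location transform
            Sig[i, j] = np.exp(-0.5 * (d @ d) / (sdkern * sdkern) / 2)
            Sig[j, i] = Sig[i, j]
    return Sig
```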
Anisotropic Data
Isotropic Result
Anisotropic Result
Future Work
- Porting MATLAB code to Python
- Real data, not computer-generated data
- Barycentric B-spline.
- Sampling method optimization.
Lecture 4: Robust Analog Circuit Sizing Under Process Variations
@luk036
2022-10-12
📝 Abstract
The lecture focuses on the robust sizing of analog circuits under process variations. It analyzes the challenges that arise when designing analog circuits at the 20nm process node, including double-patterning, layout-dependent effects, new local interconnect layers, and the use of FinFET transistors. The lecture stresses the importance of designing circuits with robustness in mind by factoring in process variations in the specification requirements. The lecture presents the concept of formulating the analog circuit sizing problem and identifies the difficulties involved, such as the high level of flexibility and susceptibility to variations. The lecture also explores various approaches to computer-aided design (CAD) for analog circuits, including knowledge-based and optimization-based techniques. The lecture discusses emerging techniques in geometric programming (GP), introducing a new method for solving robust GP problems using the affine arithmetic and the ellipsoid technique. An example of CMOS two-stage operational amplifier design demonstrates the application of the robust geometric programming approach.
🔑 Keywords
- Analog circuit 模拟电路
- Design for robustness 鲁棒性设计
- Worst-case scenarios 最坏情景
- Affine arithmetic 仿射运算
- Convex programming 凸规划
- Geometric programming 几何规划
- Posynomial 正项式 (Positive + polynomial)
- Ellipsoid method 椭球法
🗺️ Overview
-
Challenges of 20nm Analog Design
-
Design for variability
-
Design for robustness
-
Analog circuit sizing problem formulation
-
Robust geometric programming
-
Affine arithmetic for worst case scenarios
-
Design examples
📖 Introduction
Costs | 28nm | 20nm |
---|---|---|
Fab Costs | 3B | 4B - 7B |
Process R&D | 1.2B | 2.1B - 3B |
Mask Costs | 2M - 3M | 5M - 8M |
Design Costs | 50M - 90M | 120M - 500M |
: Fab, process, mask, and design costs are much higher at 20nm (IBS, May 2011)
Challenges at 20 nm
-
Double-patterning aware
-
Layout-dependent effects
-
New local interconnect layers
-
>5,000 design rules
-
Device variation and sensitivity
-
New type of transistor - FinFET
👫 Double Patterning
Overlay Error (Mask Shift)
-
Parasitic matching becomes very challenging
Layout-Dependent Effects
Layout-Dependent Effects | > 40nm | At 40nm | >= 28nm |
---|---|---|---|
Well Proximity Effect (WPE) | x | x | x |
Poly Spacing Effect (PSE) | x | x | |
Length of Diffusion (LOD) | x | x | x |
OD to OD Spacing Effect (OSE) | x | x |
New Local Interconnect Layers
New Transistor Type: FinFET
Design for Robustness
- Process variations must be included in the specification.
Basic Design Flow
Top-down Design Phases
Basic Flow of Analog Synthesis
Analog Circuit Sizing Problem
- Problem definition:
- Given a circuit topology, a set of specification requirements and technology, find the values of design variables that meet the specifications and optimize the circuit performance.
- Difficulties:
- High degrees of freedom
- Performance is sensitive to variations
Main Approaches in CAD
- Knowledge-based
- Rely on circuit understanding, design heuristics
- Optimization based
- Equation based
- Establish circuit equations and use numerical solvers
- Simulation based
- Rely on circuit simulation
- Equation based
In practice, you mix and match them whenever appropriate.
Geometric Programming
- In recent years, techniques of using geometric programming (GP) are emerging.
- In this lecture, we present a new idea of solving robust GP problems using ellipsoid method and affine arithmetic.
Lecture 04b - Robust Geometric Programming
Outline
- Problem Definition for Robust Analog Circuit Sizing
- Robust Geometric Programming
- Affine Arithmetic
- Example: CMOS Two-stage Op-Amp
- Numerical Result
- Conclusions
Robust Analog Circuit Sizing Problem
- Given a circuit topology and a set of specification requirements:
Constraint | Spec. | Units |
---|---|---|
Device Width | m | |
Device Length | m | |
Estimated Area | minimize | m |
CMRR | dB | |
Neg. PSRR | dB | |
Power | mW |
- Find the values of the design variables that meet the specification requirements under worst-case variations and optimize the circuit performance.
Robust Optimization Formulation
- Consider where
- represents a set of design variables (such as , ),
- represents a set of varying parameters (such as )
- represents the th specification requirement (such as phase margin ).
Geometric Programming in Standard Form
- We further assume that 's are convex for all .
- Geometric programming is an optimization problem that takes the following standard form:
where
- 's are posynomial functions and 's are monomial functions.
Posynomial and Monomial Functions
- A monomial function is simply: $f(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$,
where
  - $c$ is non-negative and $a_i \in \mathbb{R}$.
- A posynomial function is a sum of monomial functions: $f(x) = \sum_{k=1}^{K} c_k\, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}}$
- A monomial can also be viewed as a special case of a posynomial in which the sum has only one term.
Geometric Programming in Convex Form
- Many engineering problems can be formulated as a GP.
- On Boyd's website there is a MATLAB package "GGPLAB" and excellent tutorial material.
- GP can be converted into a convex form by the change of variables $y_i = \log x_i$ and by replacing each posynomial $f$ with $\log f(e^{y})$, which is a log-sum-exp of affine functions of $y$ and hence convex (a small sketch follows).
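As a small illustration of this change of variables (with an assumed data layout: `c[k]` and exponent matrix `A[k, i]` describe the posynomial; these names are not the lecture's notation), the log of a posynomial evaluated at x = exp(y) is a log-sum-exp of affine functions of y.

```python
import numpy as np

def log_posynomial(y, c, A):
    """Log of f(x) = sum_k c[k] * prod_i x[i]**A[k, i] at x = exp(y):
    a log-sum-exp of affine functions of y, hence convex in y."""
    terms = np.log(c) + A @ y            # log of each monomial term
    return np.log(np.sum(np.exp(terms)))

# toy usage: f(x) = 2*x1*x2 + 0.5*x1**-1*x2**2 at x = (1, 2) -> log(6)
c = np.array([2.0, 0.5])
A = np.array([[1.0, 1.0], [-1.0, 2.0]])
print(log_posynomial(np.log([1.0, 2.0]), c, A))
```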
Robust GP
- GP in the convex form can be solved efficiently by interior-point methods.
- In robust version, coefficients are functions of .
- The robust problem is still convex. However, there is an infinite number of constraints.
- Alternative approach: Ellipsoid Method.
Example - Profit Maximization Problem
This example is taken from [@Aliabadi2013Robust].
- : Cobb-Douglas production function
- : the market price per unit
- : the scale of production
- : the output elasticities
- : input quantity
- : output price
- : a given constant that restricts the quantity of
Example - Profit maximization (cont'd)
- The formulation is not in the convex form.
- Rewrite the problem in the following form:
Profit maximization in Convex Form
-
By taking the logarithm of each variable:
- , .
-
We have the problem in a convex form:
class profit_oracle:
def __init__(self, params, a, v):
p, A, k = params
self.log_pA = np.log(p * A)
self.log_k = np.log(k)
self.v = v
self.a = a
def __call__(self, y, t):
fj = y[0] - self.log_k # constraint
if fj > 0.:
g = np.array([1., 0.])
return (g, fj), t
log_Cobb = self.log_pA + self.a @ y
x = np.exp(y)
vx = self.v @ x
te = t + vx
fj = np.log(te) - log_Cobb
if fj < 0.:
te = np.exp(log_Cobb)
t = te - vx
fj = 0.
g = (self.v * x) / te - self.a
return (g, fj), t
# Main program
import numpy as np
from ellpy.cutting_plane import cutting_plane_dc
from ellpy.ell import ell
from .profit_oracle import profit_oracle
p, A, k = 20., 40., 30.5
params = p, A, k
alpha, beta = 0.1, 0.4
v1, v2 = 10., 35.
a = np.array([alpha, beta])
v = np.array([v1, v2])
y0 = np.array([0., 0.]) # initial x0
r = np.array([100., 100.]) # initial ellipsoid (sphere)
E = ell(r, y0)
P = profit_oracle(params, a, v)
yb1, ell_info = cutting_plane_dc(P, E, 0.)
print(ell_info.value, ell_info.feasible)
Example - Profit Maximization Problem (convex)
- Now assume that:
- and vary and respectively.
- , , , and all vary .
Example - Profit Maximization Problem (oracle)
By detailed analysis, the worst case happens when:
- ,
- , ,
- if , , else
- if , , else
class profit_rb_oracle:
def __init__(self, params, a, v, vparams):
e1, e2, e3, e4, e5 = vparams
self.a = a
self.e = [e1, e2]
p, A, k = params
params_rb = p - e3, A, k - e4
self.P = profit_oracle(params_rb, a, v + e5)
def __call__(self, y, t):
a_rb = self.a.copy()
for i in [0, 1]:
a_rb[i] += self.e[i] if y[i] <= 0 else -self.e[i]
self.P.a = a_rb
return self.P(y, t)
🔮 Oracle in Robust Optimization Formulation
- The oracle only needs to determine:
- If for some and ,
then
- the cut =
- If for some
, then
- the cut =
- Otherwise, is feasible, then
- Let .
- .
- The cut =
- If for some and ,
then
Remark:
- for more complicated problems, affine arithmetic could be used [@liu2007robust].
Lecture 04c - Affine Arithmetic
A Simple Area Problem
- Suppose the points $p$, $q$, and $r$ vary within the regions of 3 given rectangles.
- Q: What are the upper and lower bounds on the area of the triangle $pqr$?
Method 1: Corner-based
- Calculate all the areas of triangles with different corners.
- Problems:
- In practical applications, there may be many corners.
- What's more, in practical applications, the worst-case scenario may not be at the corners at all.
Method 2: Monte Carlo
- Monte-Carlo or Quasi Monte-Carlo:
- Calculate the area of triangles for different sampling points.
- Advantage: more accurate when there are more sampling points.
- Disadvantage: time consuming
Interval Arithmetic vs. Affine Arithmetic
Method 3: Interval Arithmetic
- Interval arithmetic (IA) estimation:
- Let px = [2, 3], py = [3, 4]
- Let qx = [-5, -4], qy = [-6, -5]
- Let rx = [6, 7] , ry = [-5, -4]
- Area of triangle:
- = ((qx - px)(ry - py) - (qy - py)(rx - px))/2
- = [33 .. 61] (actually [36.5 .. 56.5])
- Problem: cannot handle correlation between variables.
Method 4: Affine Arithmetic
- (Definition to be given shortly)
- More accurate estimation than IA:
- Area = [35 .. 57] in the previous example.
- Take care of first-order correlation.
- Usually faster than Monte-Carlo, but ....
- becomes inaccurate if the variations are large.
- libaffa.a/YALAA package is publicly available:
- Provides functions like +, -, *, /, sin(), cos(), pow(), etc. (a toy Python illustration of the idea follows)
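To make the affine-form idea concrete before the formal definition, here is a toy Python sketch of an affine form (an illustration only; it is not the libaffa/YALAA API, and it keeps only first-order correlation plus one new error symbol per multiplication).

```python
class AAF:
    """Affine form x0 + sum_i x_i * eps_i with eps_i in [-1, 1]."""
    _next = 0   # counter for fresh noise symbols

    def __init__(self, x0, dev=None):
        self.x0 = x0                    # central value
        self.dev = dict(dev or {})      # partial deviations {symbol: coeff}

    @classmethod
    def from_interval(cls, lo, hi):
        k, cls._next = cls._next, cls._next + 1
        return cls((lo + hi) / 2, {k: (hi - lo) / 2})

    def rad(self):                      # total deviation radius
        return sum(abs(v) for v in self.dev.values())

    def bounds(self):
        return self.x0 - self.rad(), self.x0 + self.rad()

    def __add__(self, other):           # affine operation: exact
        dev = dict(self.dev)
        for k, v in other.dev.items():
            dev[k] = dev.get(k, 0.0) + v
        return AAF(self.x0 + other.x0, dev)

    def __sub__(self, other):           # affine operation: exact
        return self + AAF(-other.x0, {k: -v for k, v in other.dev.items()})

    def __mul__(self, other):           # non-affine: first-order approximation
        dev = {k: other.x0 * v for k, v in self.dev.items()}
        for k, v in other.dev.items():
            dev[k] = dev.get(k, 0.0) + self.x0 * v
        k, AAF._next = AAF._next, AAF._next + 1
        dev[k] = self.rad() * other.rad()   # new noise symbol for the error
        return AAF(self.x0 * other.x0, dev)

# toy usage: shared noise symbols make x - x exactly 0, unlike interval arithmetic
x = AAF.from_interval(2.0, 3.0)
print((x - x).bounds())   # (0.0, 0.0)
```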
Analog Circuit Example
- Unit Gain bandwidth
GBW = sqrt(A*Kp*Ib*(W2/L2))/(2*pi*Cc)
where some parameters are varying
Enabling Technologies
-
C++ template and operator overloading features greatly simplify the coding effort:
-
E.g., the following code can be applied to both
<double>
and<AAF>
template <typename Tp>
Tp area(const Tp& px, const Tp& qx, const Tp& rx,
        const Tp& py, const Tp& qy, const Tp& ry) {
    return ((qx - px) * (ry - py) - (qy - py) * (rx - px)) / 2;
}
-
In other words, some existing code can be reused with minimal modification.
Applications of AA
- Analog Circuit Sizing
- Worst-Case Timing Analysis
- Statistical Static Timing Analysis
- Parameter Variation Interconnect Model Order Reduction [CMU02]
- Clock Skew Analysis
- Bounded Skew Clock Tree Synthesis
Limitations of AA
- Something AA can't replace
<double>
:- Iterative methods (no fixed point in AA)
- No Multiplicative inverse operation (no LU decomposition)
- Not total ordering, can't sort (???)
- AA can only handle linear correlation, which means you can't expect an accurate approximation of
abs(x)
near zero. - Fortunately the ellipsoid method is one of the few algorithms that works with AA.
Circuit Sizing for Op. Amp.
- Geometric Programming formulation for CMOS Op. Amp.
- Min-max convex programming under Parametric variations (PVT)
- Ellipsoid Method
What is Affine Arithmetic?
- Represents a quantity x with an affine form (AAF):
where
- noise symbols
- central value
- partial deviations
- The number of noise symbols is not fixed; new noise symbols are generated during the computation process.
- IA -> AA :
Geometry of AA
- Affine forms that share noise symbols are dependent:
- The region containing (x, y) is:
- This region is a centrally symmetric convex polygon called "zonotope".
Affine Arithmetic
How to find efficiently?
- is in general difficult to obtain.
- Provided that variations are small or nearly linear, we propose using Affine Arithmetic (AA) to solve this problem.
- Features of AA:
- Handle correlation of variations by sharing noise symbols.
- Enabling technology: template and operator overloading features of C++.
- A C++ package "YALAA" is publicly available.
Affine Arithmetic for Worst Case Analysis
- An uncertain quantity is represented in an affine form (AAF):
where
- is called noise symbol.
- Exact results for affine operations ($x + y$, $x - y$, and $\alpha x$)
- Results of non-affine operations (such as $x \cdot y$, $1/x$, $\sin(x)$) are approximated in an affine form.
- AA has been applied to a wide range of applications recently when process variations are considered.
Affine Arithmetic for Optimization
In our robust GP problem:
- First, represent every elements in in affine forms.
- For each ellipsoid iteration, is obtained by approximating in an affine form:
- Then the maximum of is determined by:
Performance Specification
Constraint | Spec. | Units |
---|---|---|
Device Width | m | |
Device Length | m | |
Estimated Area | minimize | m |
Input CM Voltage | x | |
Output Range | x | |
Gain | dB | |
Unity Gain Freq. | MHz | |
Phase Margin | degree | |
Slew Rate | V/s | |
CMRR | dB | |
Neg. PSRR | dB | |
Power | mW | |
Noise, Flicker | nV/Hz |
Open-Loop Gain (Example)
-
Open-loop gain can be approximated as a monomial function:
where and are monomial functions.
-
Corresponding C++ code fragment:
// Open Loop Gain
monomial<aaf> OLG = 2*COX/square(LAMBDAN+LAMBDAP)
    * sqrt(KP*KN*W[1]/L[1]*W[6]/L[6]/I1/I6);
Results of Design Variables
Variable | Units | GGPLAB | Our | Robust |
---|---|---|---|---|
m | 44.8 | 44.9 | 45.4 | |
m | 3.94 | 3.98 | 3.8 | |
m | 2.0 | 2.0 | 2.0 | |
m | 2.0 | 2.0 | 2.1 | |
m | 1.0 | 1.0 | 1.0 | |
m | 1.0 | 1.0 | 1.0 | |
m | 1.0 | 1.0 | 1.0 | |
m | 1.0 | 1.0 | 1.0 | |
N/A | 10.4 | 10.3 | 12.0 | |
N/A | 61.9 | 61.3 | 69.1 | |
pF | 1.0 | 1.0 | 1.0 | |
A | 6.12 | 6.19 | 5.54 |
Performances
Performance (units) | Spec. | Std. | Robust |
---|---|---|---|
Estimated Area (m) | minimize | 5678.4 | 6119.2 |
Output Range (x ) | [0.1, 0.9] | [0.07, 0.92] | [0.07, 0.92] |
Comm Inp Range (x ) | [0.45, 0.55] | [0.41, 0.59] | [0.39, 0.61] |
Gain (dB) | 80 | [80.0, 81.1] | |
Unity Gain Freq. (MHz) | 50 | [50.0, 53.1] | |
Phase Margin (degree) | 86.5 | [86.1, 86.6] | |
Slew Rate (V/s) | 64 | [66.7, 66.7] | |
CMRR (dB) | 77.5 | [77.5, 78.6] | |
Neg. PSRR (dB) | 83.5 | [83.5, 84.6] | |
Power (mW) | 1.5 | [1.5, 1.5] | |
Noise, Flicker (nV/Hz) | [578, 616] |
Conclusions
- Our ellipsoid method is fast enough for practical analog circuit sizing (it takes < 1 sec. running on a 3GHz Intel CPU for our example).
- Our method is reliable, in the sense that the solution, once produced, always satisfies the specification requirement in the worst case.
Comments
- The marriage of AA (algebra) and Zonotope (geometry) has the potential to provide us with a powerful tool for algorithm design.
- AA does not solve all problems. E.g., iterative methods do not apply to AA because AA does not form a Banach space (the fixed-point theorem does not hold).
- AA multiplication and addition do not obey the distributive law (cf. floating-point arithmetic).
- AA can only perform first-order approximations. In other words, it can only be applied to nearly linear variations.
- In practice, we still need to combine AA with other methods, such as statistical method or the (quasi-) Monte Carlo method.
Ellipsoid Method and Its Amazing Oracles 🔮
When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
Sir Arthur Conan Doyle, stated by Sherlock Holmes
📖 Introduction
Common Perspective of Ellipsoid Method
-
It is widely believed to be inefficient in practice for large-scale problems.
-
Its convergence rate is slow, even when using deep cuts.
-
Cannot exploit sparsity.
-
It has since been supplanted by interior-point methods.
-
Used only as a theoretical tool to prove polynomial-time solvability of some combinatorial optimization problems.
But...
-
The ellipsoid method works very differently compared with the interior point methods.
-
It only requires a separation oracle that provides a cutting plane.
-
Although the ellipsoid method cannot take advantage of the sparsity of the problem, the separation oracle is capable of taking advantage of certain structural properties.
Consider the ellipsoid method when...
-
The number of design variables is moderate, e.g. ECO flow, analog circuit sizing, parametric problems
-
The number of constraints is large, or even infinite
-
Oracle can be implemented effectively.
🥥 Cutting-plane Method Revisited
Convex Set
- Let $K \subseteq \mathbb{R}^n$ be a convex set 🥚.
- Consider the feasibility problem:
  - Find a point $x^*$ in $K$,
  - or determine that $K$ is empty (i.e., there is no feasible solution)
🔮 Separation Oracle
- When a separation oracle $\Omega$ is queried at $x_0$, it either
  - asserts that $x_0 \in K$, or
  - returns a separating hyperplane between $x_0$ and $K$: $g^\mathsf{T} (x - x_0) + \beta \leq 0$, with $\beta \geq 0$, $g \neq 0$, for all $x \in K$
🔮 Separation Oracle (cont'd)
-
$(g, \beta)$ is called a cutting-plane, or cut, because it eliminates the half-space $\{x \mid g^\mathsf{T} (x - x_0) + \beta > 0\}$ from our search.
-
If $\beta = 0$ ($x_0$ is on the boundary of the half-space that is cut), the cutting-plane is called a neutral cut.
-
If $\beta > 0$ ($x_0$ lies in the interior of the half-space that is cut), the cutting-plane is called a deep cut.
-
If $\beta < 0$ ($x_0$ lies in the exterior of the half-space that is cut), the cutting-plane is called a shallow cut.
Subgradient
-
$K$ is usually given by a set of inequalities $f_j(x) \leq 0$ or $f_j(x) \leq \gamma$ for $j = 1, \ldots, m$, where $f_j$ is a convex function.
-
A vector $g \in \mathbb{R}^n$ is called a subgradient of a convex function $f$ at $x_0$ if $f(z) \geq f(x_0) + g^\mathsf{T} (z - x_0)$ for all $z$.
-
Hence, the cut $(g, \beta)$ is given by $(\partial f(x_0), f(x_0))$.
Remarks:
- If $f$ is differentiable, we can simply take $g = \nabla f(x_0)$.
Key components of Cutting-plane method
- A cutting-plane oracle $\Omega$
- A search space $\mathcal{S}$ initially large enough to cover $K$, e.g.
  - Polyhedron $\mathcal{P}$ = $\{z \mid C z \preceq d\}$
  - Interval $\mathcal{I}$ = $[l, u]$ (for one-dimensional problems)
  - Ellipsoid $\mathcal{E}$ = $\{z \mid (z - x_c)^\mathsf{T} P^{-1} (z - x_c) \leq 1\}$ (a minimal update sketch follows)
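For the ellipsoid choice above, here is a minimal, self-contained sketch of the classical central-cut update (an illustration only, not the `Ell` class from ellalgo used later in this lecture).

```python
import numpy as np

class MinimalEll:
    """Search space {x | (x - xc)^T P^{-1} (x - xc) <= 1} with the
    classical central-cut ellipsoid update."""
    def __init__(self, radius, xc):
        n = len(xc)
        self.P = np.eye(n) * radius**2
        self.xc = np.asarray(xc, dtype=float)

    def update_central_cut(self, g):
        """Shrink to the minimum-volume ellipsoid containing the current
        ellipsoid intersected with {x | g^T (x - xc) <= 0}."""
        n = len(self.xc)
        Pg = self.P @ g
        gamma = np.sqrt(g @ Pg)
        gt = Pg / gamma                      # normalized update direction
        self.xc -= gt / (n + 1)
        self.P = (n**2 / (n**2 - 1.0)) * (self.P - (2.0 / (n + 1)) * np.outer(gt, gt))
```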
Outline of Cutting-plane method
- Given an initial $\mathcal{S}$ known to contain $K$.
- Repeat
  - Choose a point $x_0$ in $\mathcal{S}$
  - Query the cutting-plane oracle at $x_0$
  - If $x_0 \in K$, quit
  - Otherwise, update $\mathcal{S}$ to a smaller set that covers $\mathcal{S} \cap \{x \mid g^\mathsf{T} (x - x_0) + \beta \leq 0\}$
  - If $\mathcal{S} = \emptyset$ or it is small enough, quit.
Corresponding Python code
def cutting_plane_feas(omega, space, options=Options()):
for niter in range(options.max_iters):
cut = omega.assess_feas(space.xc()) # query the oracle
if cut is None: # feasible sol'n obtained
return space.xc(), niter
status = space.update_deep_cut(cut) # update space
if status != CutStatus.Success or space.tsq() < options.tol:
return None, niter
return None, options.max_iters
From Feasibility to Optimization
-
The optimization problem is treated as a feasibility problem with an additional constraint $\Phi(x) \leq \gamma$.
-
$\Phi(x)$ could be a convex or a quasiconvex function.
-
$\gamma$ is also called the best-so-far value of $\Phi(x)$.
Convex Optimization Problem
-
Consider the following general form:
where is the -sublevel set of .
-
👉 Note: if and only if (monotonicity)
-
One easy way to solve the optimization problem is to apply the binary search on .
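As a quick illustration of the binary-search idea, here is a minimal sketch; the helper `feasible(gamma)` is a hypothetical stand-in for running the cutting-plane feasibility phase on the gamma-sublevel set (it is not part of the ellalgo package).

```python
def bisection_optim(feasible, gamma_lo, gamma_hi, tol=1e-6):
    """Bisection on the best-so-far value gamma.

    `feasible(gamma)` is assumed to return True iff the gamma-sublevel
    set is non-empty (e.g. by running a cutting-plane feasibility phase).
    """
    while gamma_hi - gamma_lo > tol:
        gamma = (gamma_lo + gamma_hi) / 2.0
        if feasible(gamma):
            gamma_hi = gamma  # a feasible point exists; try a smaller objective
        else:
            gamma_lo = gamma  # sublevel set empty; gamma was too ambitious
    return gamma_hi
```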
Shrinking
- Another possible way is to update the best-so-far value whenever a feasible solution is found, by solving the equation:
- If the equation is difficult to solve but is also convex w.r.t. , then we may introduce a new variable, say , and let .
Outline of Cutting-plane method (Optim)
- Given initial known to contain .
- Repeat
- Choose a point in
- Query the separation oracle at
- If , update such that .
- Update to a smaller set that covers:
- If or it is small enough, quit.
Corresponding Python code
def cutting_plane_optim(omega, space, gamma, options=Options()):
    x_best = None
    for niter in range(options.max_iters):
        cut, gamma1 = omega.assess_optim(space.xc(), gamma)  # query the oracle
        if gamma1 is not None:  # better gamma obtained
            gamma = gamma1
            x_best = copy.copy(space.xc())
            status = space.update_central_cut(cut)
        else:
            status = space.update_deep_cut(cut)
        if status != CutStatus.Success or space.tsq() < options.tol:
            return x_best, gamma, niter
    return x_best, gamma, options.max_iters
Example - Profit Maximization Problem
This example is taken from [@Aliabadi2013Robust].
- : Cobb-Douglas production function
- : the market price per unit
- : the scale of production
- : the output elasticities
- : input quantity
- : output price
- : a given constant that restricts the quantity of
Example - Profit maximization (cont'd)
- The formulation is not in the convex form.
- Rewrite the problem in the following form:
Profit maximization in Convex Form
-
By taking the logarithm of each variable:
- , .
-
We have the problem in a convex form:
Corresponding Python code
class ProfitOracle:
def __init__(self, params, elasticities, price_out):
unit_price, scale, limit = params
self.log_pA = math.log(unit_price * scale)
self.log_k = math.log(limit)
self.price_out = price_out
self.el = elasticities
def assess_optim(self, y, gamma):
if (fj := y[0] - self.log_k) > 0.0: # constraint
return (np.array([1.0, 0.0]), fj), None
log_Cobb = self.log_pA + self.el.dot(y)
q = self.price_out * np.exp(y)
qsum = q[0] + q[1]
if (fj := math.log(gamma + qsum) - log_Cobb) > 0.0:
return (q / (gamma + qsum) - self.el, fj), None
Cobb = np.exp(log_Cobb) # shrinking
return (q / Cobb - self.el, 0.0), Cobb - qsum
Main program
import numpy as np
from ellalgo.cutting_plane import cutting_plane_optim
from ellalgo.ell import Ell
from ellalgo.oracles.profit_oracle import ProfitOracle
p, A, k = 20.0, 40.0, 30.5
params = p, A, k
alpha, beta = 0.1, 0.4
v1, v2 = 10.0, 35.0
el = np.array([alpha, beta])
v = np.array([v1, v2])
r = np.array([100.0, 100.0]) # initial ellipsoid (sphere)
ellip = Ell(r, np.array([0.0, 0.0]))
omega = ProfitOracle(params, el, v)
xbest, gamma, num_iters = cutting_plane_optim(omega, ellip, 0.0)
Area of Applications
- Robust convex optimization
- oracle technique: affine arithmetic
- Semidefinite programming
- oracle technique: Cholesky or factorization
- Parametric network potential problem
- oracle technique: negative cycle detection
Robust Convex Optimization
Robust Optimization Formulation
-
Consider:
where represents a set of varying parameters.
-
The problem can be reformulated as:
Example - Profit Maximization Problem (convex)
- Now assume that:
- and vary and respectively.
- , , , and all vary .
Example - Profit Maximization Problem (oracle)
By detailed analysis, the worst case happens when:
- ,
- , ,
- if , , else
- if , , else
Corresponding Python code
class ProfitRbOracle(OracleOptim):
def __init__(self, params, elasticities, price_out, vparams):
e1, e2, e3, e4, e5 = vparams
self.elasticities = elasticities
self.e = [e1, e2]
unit_price, scale, limit = params
params_rb = unit_price - e3, scale, limit - e4
self.omega = ProfitOracle(params_rb, elasticities,
price_out + np.array([e5, e5]))
def assess_optim(self, y, gamma):
el_rb = copy.copy(self.elasticities)
for i in [0, 1]:
el_rb[i] += -self.e[i] if y[i] > 0.0 else self.e[i]
self.omega.el = el_rb
return self.omega.assess_optim(y, gamma)
🔮 Oracle in Robust Optimization Formulation
- The oracle only needs to determine:
- If for some and ,
then
- the cut =
- If for some , then
- the cut =
- Otherwise, is feasible, then
- Let .
- .
- The cut =
Remark: for more complicated problems, affine arithmetic could be used [@liu2007robust].
Matrix Inequalities
Problems With Matrix Inequalities
Consider the following problem:
- : a matrix-valued function
- denotes is positive semidefinite.
Problems With Matrix Inequalities
- Recall that a matrix is positive semidefinite if and only if for all .
- The problem can be transformed into:
- If is concave for all w.r.t. , then the above problem is a convex program.
- It reduces to semidefinite programming if is linear w.r.t. , i.e.,
LDLT factorization
-
The LDLT factorization of a symmetric positive definite matrix is the factorization , where is lower triangular with unit diagonal elements and is a diagonal matrix.
-
For example,
Naïve implementation
- Then, start with , the basic algorithm of LDLT factorization is:
- Invoke FLOP's, where is the place the algorithm stops.
Storage representation
First, we pack the solution and the intermediate storage on a single matrix such that:
- For example,
Improved implementation
- Then, start with , the improved implementation of LDLT factorization is:
- Invoke FLOP's (same as Cholesky factorization's), where is the place the algorithm stops.
Witness of indefiniteness
-
In the case of failure, a vector can be constructed to certify that .
-
Let denote the partial sub-matrix where is the row of failure.
-
Then , where
-
Start with , the basic algorithm is:
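A minimal sketch of such a row-based LDL^T factorization with early termination and witness construction is given below; it is illustrative only (it is not the lecture's LDLTMgr class), and it accesses the matrix lazily through a user-supplied `get_elem(i, j)`.

```python
import numpy as np

def ldlt_with_witness(get_elem, n):
    """Return (None, None) if the symmetric matrix (accessed lazily through
    get_elem) is positive definite; otherwise (p, v), where the factorization
    stops at row p and v certifies v' A[:p+1, :p+1] v <= 0."""
    L = np.eye(n)      # unit lower-triangular factor
    d = np.zeros(n)    # diagonal of D
    for i in range(n):
        for k in range(i):
            s = get_elem(i, k) - L[i, :k] @ (d[:k] * L[k, :k])
            L[i, k] = s / d[k]
        d[i] = get_elem(i, i) - L[i, :i] @ (d[:i] * L[i, :i])
        if d[i] <= 0.0:  # not positive definite: build the witness vector
            v = np.zeros(i + 1)
            v[i] = 1.0
            for j in range(i - 1, -1, -1):  # back-substitute L[:i+1,:i+1]^T v = e_i
                v[j] = -L[j + 1:i + 1, j] @ v[j + 1:i + 1]
            return i, v                     # then v' A v = d[i] <= 0
    return None, None
```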
🔮 Oracle in Matrix Inequalities
The oracle only needs to:
- Perform a row-based LDLT factorization such that .
- Let denote a submatrix .
- If the process fails at row ,
- there exists a vector , such that
- , and
- .
- The cut =
Lazy evaluation
-
Don't construct the full matrix at each iteration!
-
Only O() per iteration, independent of !
class LMIOracle:
def __init__(self, F, B):
self.F = F
self.F0 = B
self.Q = LDLTMgr(len(B))
def assess_feas(self, x: Arr) -> Optional[Cut]:
def get_elem(i, j):
return self.F0[i, j] - sum(
Fk[i, j] * xk for Fk, xk in zip(self.F, x))
if self.Q.factor(get_elem):
return None
ep = self.Q.witness()
g = np.array([self.Q.sym_quad(Fk) for Fk in self.F])
return g, ep
Google Benchmark 📊 Comparison
2: ----------------------------------------------------------
2: Benchmark Time CPU Iterations
2: ----------------------------------------------------------
2: BM_LMI_Lazy 131235 ns 131245 ns 4447
2: BM_LMI_old 196694 ns 196708 ns 3548
2/4 Test #2: Bench_BM_lmi ..................... Passed 2.57 sec
Example - Matrix Norm Minimization
- Let
- Problem can be reformulated as
- Binary search on can be used for this problem.
Example - Estimation of Correlation Function
-
Let , where
- 's are the unknown coefficients to be fitted
- 's are a family of basis functions.
-
The covariance matrix can be recast as:
where
🧪 Experimental Result
Figure: Data Sample (kern=0.5)
Figure: Least Square Result
🧪 Experimental Result II
Figure: Data Sample (kern=1.0)
🧪 Experimental Result III
Figure: Data Sample (kern=2.0)
Figure: Least Square Result
Multi-parameter Network Problem
Parametric Network Problem
Given a network represented by a directed graph .
Consider:
-
is the concave function of edge ,
-
Assume: network is large, but the number of parameters is small.
Network Potential Problem (cont'd)
Given , the problem has a feasible solution if and only if contains no negative cycle. Let be a set of all cycles of .
-
is a cycle of
-
.
Negative Cycle Finding
There are lots of methods to detect negative cycles in a weighted graph [@cherkassky1999negative], in which Tarjan's algorithm [@Tarjan1981negcycle] is one of the fastest algorithms in practice [@alg:dasdan_mcr; @cherkassky1999negative].
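For a quick experiment, a plain Bellman-Ford-based check is already available in networkx; the tiny graph below is only an illustration and is unrelated to the lecture's NegCycleFinder implementation.

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 2.0),
    ("b", "c", -1.0),
    ("c", "a", -2.0),   # the cycle a -> b -> c -> a has total weight -1
])
print(nx.negative_edge_cycle(G))  # True: a negative cycle exists
```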
🔮 Oracle in Network Potential Problem
- The oracle only needs to determine:
- If there exists a negative cycle under , then
- the cut =
- Otherwise, the shortest path solution gives the value of .
🐍 Python Code
class NetworkOracle:
def __init__(self, G, u, h):
self._G = G
self._u = u
self._h = h
self._S = NegCycleFinder(G)
def update(self, gamma):
self._h.update(gamma)
def assess_feas(self, x) -> Optional[Cut]:
def get_weight(e):
return self._h.eval(e, x)
for Ci in self._S.find_neg_cycle(self._u, get_weight):
f = -sum(self._h.eval(e, x) for e in Ci)
g = -sum(self._h.grad(e, x) for e in Ci)
return g, f # use the first Ci only
return None
Example - Optimal Matrix Scaling [@orlin1985computing]
-
Given a sparse matrix .
-
Find another matrix where is a nonnegative diagonal matrix, such that the ratio of any two elements of in absolute value is as close to 1 as possible.
-
Let . Under the min-max-ratio criterion, the problem can be formulated as:
Optimal Matrix Scaling (cont'd)
By taking the logarithms of variables, the above problem can be transformed into:
where denotes and .
class OptScalingOracle:
class Ratio:
def __init__(self, G, get_cost):
self._G = G
self._get_cost = get_cost
def eval(self, e, x: Arr) -> float:
u, v = e
cost = self._get_cost(e)
return x[0] - cost if u < v else cost - x[1]
def grad(self, e, x: Arr) -> Arr:
u, v = e
return np.array([1.0, 0.0] if u < v else [0.0, -1.0])
def __init__(self, G, u, get_cost):
self._network = NetworkOracle(G, u, self.Ratio(G, get_cost))
def assess_optim(self, x: Arr, gamma: float):
s = x[0] - x[1]
g = np.array([1.0, -1.0])
if (fj := s - gamma) >= 0.0:
return (g, fj), None
if (cut := self._network.assess_feas(x)):
return cut, None
return (g, 0.0), s
Example - clock period & yield-driven co-optimization
- 👉 Note that is not concave in general in .
- Fortunately, in practice we are mostly interested in optimizing circuits for high yield rather than low yield.
- Therefore, by imposing an additional constraint to , say , the problem becomes convex.
Example - clock period & yield-driven co-optimization
The problem can be reformulated as:
🫒 Ellipsoid Method Revisited
📝 Abstract
This lecture provides a brief history of the ellipsoid method. Then it discusses implementation issues of the ellipsoid method, such as utilizing parallel cuts to update the search space and enhance computation time. In some instances, parallel cuts can drastically reduce computation time, as observed in FIR filter design. Discrete optimization is also investigated, illustrating how the ellipsoid method can be applied to problems that involve discrete design variables. An oracle implementation is required solely for locating the nearest discrete solutions.
Some History of Ellipsoid Method [@BGT81]
- Introduced by Shor, Yudin, and Nemirovskii in 1976.
- Used to show that linear programming (LP) is polynomial-time solvable (Khachiyan 1979), settling the long-standing question of the theoretical complexity of LP.
- In practice, however, the simplex method runs much faster than the ellipsoid method, even though its worst-case complexity is exponential.
Basic Ellipsoid Method
- An ellipsoid is specified as a set where is the center of the ellipsoid.
Updating the ellipsoid (deep-cut)
Calculation of minimum volume ellipsoid covering:
Updating the ellipsoid (deep-cut)
-
Let , .
-
If (shallow cut), no smaller ellipsoid can be found.
-
If , intersection is empty.
Otherwise,
where
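For reference, the standard deep-cut update (as derived in the usual localization-method notes) can be coded as follows; this is only a sketch, not the lecture's Ell class, and it assumes n >= 2. The central cut is recovered as the special case beta = 0.

```python
import numpy as np

def deep_cut_update(x, P, g, beta):
    """Update ellipsoid {z | (z-x)' P^{-1} (z-x) <= 1} with cut g'(z-x) + beta <= 0."""
    n = len(x)
    tau = np.sqrt(g @ P @ g)
    alpha = beta / tau
    if alpha >= 1.0:
        return "no-solution", x, P      # intersection is empty
    if alpha < -1.0 / n:
        return "no-effect", x, P        # shallow cut: no smaller ellipsoid exists
    rho = (1.0 + n * alpha) / (n + 1.0)
    sigma = 2.0 * (1.0 + n * alpha) / ((n + 1.0) * (1.0 + alpha))
    delta = n * n * (1.0 - alpha * alpha) / (n * n - 1.0)
    Pg = P @ g
    x_new = x - (rho / tau) * Pg
    P_new = delta * (P - (sigma / tau**2) * np.outer(Pg, Pg))
    return "success", x_new, P_new
```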
Updating the ellipsoid (cont'd)
-
Even better, split into two variables
-
Let , , .
-
Reduce multiplications per iteration.
-
👉 Note:
- The determinant of decreases monotonically.
- The range of is .
Central Cut
- A special case of the deep cut when
- It deserves a separate implementation because it is much simpler.
- Let , ,
Central Cut
Calculation of minimum volume ellipsoid covering:
🪜 Parallel Cuts
🪜 Parallel Cuts
- The oracle returns a pair of cuts instead of just one.
- The pair of cuts is given by and such that:
for all .
- Only a linear inequality constraint can produce such a parallel cut:
- It usually provides faster convergence.
🪜 Parallel Cuts
Calculation of minimum volume ellipsoid covering:
Updating the ellipsoid (old)
-
Let , .
-
If , intersection is empty.
-
If , no smaller ellipsoid can be found.
-
If , it reduces to deep-cut with
-
Otherwise, where
$$\begin{array}{lll}
\zeta_0 &=& \tau^2 - \beta_0^2 \\
\zeta_1 &=& \tau^2 - \beta_1^2 \\
\xi &=& \sqrt{4\zeta_0\zeta_1 + n^2(\beta_1^2 - \beta_0^2)^2}, \\
\sigma &=& (n + (2\tau^2 + 2\beta_0\beta_1 - \xi){\color{red}(\beta_0 + \beta_1)^2} ) / (n + 1), \\
\rho &=& \sigma(\beta_0 + \beta_1) / 2, \\
\delta &=& (n^2/2(n^2-1)) (\zeta_0 + \zeta_1 + \xi/n) / \tau^2 .
\end{array}$$
Updating the ellipsoid (new)
- Let , .
- If , intersection is empty.
- If , no smaller ellipsoid can be found.
- If , it reduces to deep-cut with
- Otherwise, where
Parallel Central Cuts
Calculation of minimum volume ellipsoid covering:
Updating the ellipsoid
- Let , .
- If , it reduces to central-cut
- Otherwise, where
Example - FIR filter design
- The time response is:
Example - FIR filter design (cont'd)
-
The frequency response:
-
The magnitude constraints on frequency domain are expressed as
where and are the lower and upper (nonnegative) bounds at frequency respectively.
-
The constraint is non-convex in general.
Example - FIR filter design (II)
-
However, via spectral factorization [@goodman1997spectral], it can transform into a convex one [@wu1999fir]:
where
- are the autocorrelation coefficients.
Example - FIR filter design (III)
-
can be determined by :
where for or .
-
The whole problem can be formulated as:
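To make the relation between the filter taps and concrete, here is a small numerical illustration, assuming the usual autocorrelation convention r(t) = sum_k h(k) h(k+t) with h(k) = 0 outside 0..n-1.

```python
import numpy as np

def autocorrelation(h):
    n = len(h)
    return np.array([np.dot(h[: n - t], h[t:]) for t in range(n)])

h = np.array([0.5, 0.3, 0.2])
print(autocorrelation(h))   # [0.38, 0.21, 0.10]
```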
🧪 Experiment
📊 Google Benchmark Result
3: ------------------------------------------------------------------
3: Benchmark Time CPU Iterations
3: ------------------------------------------------------------------
3: BM_Lowpass_single_cut 627743505 ns 621639313 ns 1
3: BM_Lowpass_parallel_cut 30497546 ns 30469134 ns 24
3/4 Test #3: Bench_BM_lowpass ................. Passed 1.72 sec
Discrete Optimization
Why Discrete Convex Programming
-
Many engineering problems can be formulated as a convex/geometric programming, e.g. digital circuit sizing
-
Yet in an ASIC design, often there is only a limited set of choices from the cell library. In other words, some design variables are discrete.
-
The discrete version can be formulated as a Mixed-Integer Convex programming (MICP) by mapping the design variables to integers.
What's Wrong w/ Existing Methods?
- Mostly based on relaxation.
- Then use the relaxed solution as a lower bound and apply the branch-and-bound method to find the discrete optimal solution.
- 👉 Note: the branch-and-bound method does not utilize the convexity of the problem.
-
What if I can only evaluate constraints on discrete data? Workaround: convex fitting?
Mixed-Integer Convex Programming
Consider:
where
- and are "convex"
- Some design variables are discrete.
🔮 Oracle Requirement
-
The oracle looks for the nearby discrete solution of with the cutting-plane:
-
👉 Note: the cut may be a shallow cut.
-
Suggestion: use as many different cuts as possible in each iteration (e.g. round-robin the evaluation of constraints)
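One hedged way to realize such an oracle is to wrap an existing continuous oracle: round the query point to a nearby lattice point, evaluate there, and re-anchor the returned cut at the original query point (which is exactly how a cut can turn shallow). The wrapper below is illustrative only; the names mimic, but are not, the lecture's code.

```python
import numpy as np

class DiscreteOracleWrapper:
    def __init__(self, continuous_oracle):
        self.oracle = continuous_oracle

    def assess_optim(self, xc, gamma):
        xd = np.round(xc)                     # nearest discrete candidate
        cut, gamma_new = self.oracle.assess_optim(xd, gamma)
        g, beta = cut
        # re-express g'(z - xd) + beta <= 0 at the query point xc;
        # the adjusted offset may be negative, i.e. a shallow cut
        return (g, beta + g @ (xc - xd)), gamma_new
```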
Discrete Cut
Example - Multiplier-less FIR filter design (nnz=3)
Lecture 05a - ⏳ Clock Skew Scheduling Under Process Variations
📝 Abstract
The main topic of the lecture is clock skew scheduling under process variations. The lecture discusses various techniques and methods for optimizing clock skew to improve circuit performance or minimize timing failures.
The lecture begins with an overview of the problem and background of clock skew scheduling. It then explains the concept of clock skew and the difference between zero skew and useful skew designs. The importance of meeting timing constraints, such as setup time and hold time, is discussed, along with the potential problems that can occur if these constraints are violated.
The lecture presents various approaches to clock skew scheduling, such as traditional scheduling, yield-driven scheduling, and minimum cost-to-time ratio formulation. It also examines various methods for finding the optimal clock period and the corresponding skew schedule, including linear programming and the use of the Bellman-Ford algorithm.
The lecture goes on to discuss primitive solutions and their shortcomings, such as pre-allocating timing margins and using the Least Center Error Square (LCES) problem formulation. The lecture also introduces more advanced techniques such as slack maximization (EVEN) and prop-based methods that distribute slack along the most timing-critical cycle based on Gaussian models. The drawbacks of these methods are highlighted, particularly their assumptions about gate delay distributions.
Finally, statistical static timing analysis (SSTA) and the use of statistical methods to account for process variations are discussed. The concept of the most critical cycle is introduced, and the lecture provides experimental results to demonstrate the effectiveness of various clock skew scheduling techniques.
🔑 Keywords
- Static timing analysis, STA 静态时序分析
- Statistical STA 统计静态时序分析
- Clock skew 时钟偏差/偏斜
- Zero skew design 零偏差设计
- Critical paths 关键路径
- Negative slack 负时序裕量
- Useful skew design 有效偏差设计
- Critical cycles 关键环
- Negative cycles 负环
- Clock skew scheduling ⏳ (CSS) 时钟偏差安排/规划
- Yield-driven CSS 产品率驱动时钟偏差安排
🗺️ Overview
-
Background
-
Problem formulation
-
Traditional clock skew scheduling ⏳
-
Yield-driven clock skew scheduling ⏳
-
Minimum cost-to-time ratio formulation
Sequential Logic
-
Local data path
Sequential Logic (cont'd)
-
Graph
Clock Skew
- , where
- : clock signal delay at the initial register
- : clock signal delay at the final register
Timing Constraint
- Setup time constraint: when this constraint is violated, a cycle-time violation (zero clocking) occurs.
- Hold time constraint: when this constraint is violated, a race condition (double clocking) occurs.
Zero skew vs. Useful skew
-
Zero skew () : Relatively easy to implement.
-
Useful skew. Improve:
- The performance of the circuit by permitting a higher maximum clock frequency, or
- The safety margins of the clock skew within the permissible ranges.
- Max./min. path delays are obtained from static timing analysis (STA).
Timing Constraint Graph
- Create a graph by
- replacing the hold time constraint with a h-edge with cost from to , and
- replacing the setup time constraint with an s-edge with cost from to .
- Two sets of constraints stemming from clock skew definition:
- The sum of skews for paths having the same starting and ending flip-flop to be the same;
- The sum of clock skews of all cycles to be zero
Timing Constraint Graph (TCG)
Timing Constraint Graph (TCG)
Assume = 0
Clock period is feasible if and only if current graph contains no negative cost cycles.
Minimize Clock Period
-
Linear programming (LP) formulation
where and are sequential adjacent
-
The above constraints form a so-called system of difference constraints (see Introduction to Algorithms, MIT):
-
👉 Note: easy to check if a feasible solution exists by detecting negative cycle using for example Bellman-Ford algorithm.
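As a small, self-contained illustration (a generic difference-constraint system, not a real timing graph): each constraint x_j - x_i <= w becomes an edge i -> j of weight w; the system is feasible iff the graph, augmented with a zero-weight virtual source, contains no negative cycle, and the Bellman-Ford distances then give one feasible assignment.

```python
import networkx as nx

constraints = [("a", "b", 3.0), ("b", "c", -2.0), ("c", "a", -0.5)]  # x_j - x_i <= w
G = nx.DiGraph()
G.add_weighted_edges_from((i, j, w) for i, j, w in constraints)
nodes = list(G.nodes)
G.add_weighted_edges_from(("_src", v, 0.0) for v in nodes)  # virtual source

if nx.negative_edge_cycle(G):
    print("infeasible: the constraint graph contains a negative cycle")
else:
    dist = nx.single_source_bellman_ford_path_length(G, "_src")
    print({v: dist[v] for v in nodes})  # one feasible assignment of x
```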
Basic Bellman-Ford Algorithm
function BellmanFord(list vertices, list edges, vertex source)
// Step 1: initialize graph
for each vertex i in vertices:
if i is source then u[i] := 0
else u[i] := inf
predecessor[i] := null
// Step 2: relax edges repeatedly
for i from 1 to size(vertices)-1:
for each edge (i, j) with weight d in edges:
if u[j] > u[i] + d[i,j]:
    u[j] := u[i] + d[i,j]
    predecessor[j] := i
// Step 3: check for negative-weight cycles
for each edge (i, j) with weight d in edges:
if u[j] > u[i] + d[i,j]:
error "Graph contains a negative-weight cycle"
return u[], predecessor[]
Problems with Bellman-Ford Algorithm
- The algorithm is originally used for finding the shortest paths.
- Detecting negative cycle is just a side product of the algorithm.
- The algorithm is simple, but...
- detects a negative cycle only at the end.
- has to compute all `d[i,j]`.
- restarts the initialization with `u[i] := inf`.
- requires the input graph to have a source node.
Various improvements have been proposed extensively.
Minimize clock period (I)
- Fast algorithm for solving the LP:
- Use binary search method for finding the minimum clock period.
- In each iteration, the Bellman-Ford algorithm is called to detect whether the timing constraint graph contains a negative-weight cycle.
- 👉 Note: the Bellman-Ford algorithm was originally used to find shortest paths in a graph.
Minimize clock period (II)
- When the optimal clock period is solved, the corresponding skew schedule is obtained simultaneously.
-
However, many skew values are on the bounds of feasible range.
Yield-driven Clock Skew Scheduling
-
As process variations increase, timing-failure-induced yield loss becomes a significant problem.
-
Yield-driven Clock Skew Scheduling becomes important.
-
Primary goal of this scheduling is to minimize the yield loss instead of minimizing the clock period.
Timing Yield Definition
- The circuit is called functionally correct if all the setup- and hold-time constraints are satisfied under a given set of process parameters.
- Timing Yield = (number of functionally correct samples) / (total number of samples) * 100%
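A toy Monte-Carlo estimate of this definition is sketched below; the slack model (three independent Gaussian slacks) is made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
num_samples = 10_000
nominal_slack = np.array([0.3, 0.1, 0.5])   # three timing checks (toy numbers)
sigma = 0.2
samples = nominal_slack + sigma * rng.standard_normal((num_samples, 3))
correct = np.all(samples >= 0.0, axis=1)    # all setup/hold checks satisfied
print(f"timing yield = {100.0 * correct.mean():.1f}%")
```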
Primitive solution (1)
-
Pre-allocate timing margins (usually equivalent to maximum timing uncertainty) at both ends of the FSR's (Feasible Skew Region).
-
Then perform clock period optimization.
Problems with this method
- The maximum timing uncertainty is too pessimistic, so some performance is lost;
- is fixed; it does not consider data-path delay differences between cycle edges.
📑 References (1)
-
"Clock skew optimization", IEEE Trans. Computers, 1990
-
"A graph-theoretic approach to clock skew optimization", ISCAS'94
-
"Cycle time and slack optimization for VLSI-chips", ICCAD'99
-
"Clock scheduling and clocktree construction for high performance Asics", ICCAD'03
-
"ExtensiveSlackBalance: an Approach to Make Front-end Tools Aware of Clock Skew Scheduling", DAC'06
Primitive solution (2)
-
Formulate as LCES (Least Center Error Square) problem
- A simple observation suggests that, to maximize slack, skew values should be chosen as close as possible to the middle points of their FSR's.
📑 References (2)
- Graph-based algorithm
- (J. L. Neves and E. G. Friedman, "Optimal Clock Skew Scheduling Tolerant to Process Variations", DAC'96)
- Quadratic Programming method
- (I. S. Kourtev and E. G. Friedman, "Clock skew scheduling ⏳ for improved reliability via quadratic programming", ICCAD'99)
Shortcoming: it might reduce some slacks to zero in order to minimize the total CES. This is not optimal for yield.
Primitive solution (3)
-
Incremental Slack Distribution
- (Xinjie Wei, Yici CAI and Xianlong Hong, "Clock skew scheduling ⏳ under process variations", ISQED'06)
-
Advantage: checks all skew constraints
-
Disadvantage: does not take path delay differences into consideration
Minimum Mean Cycle Based
-
Even: solve the slack optimization problem using a minimum mean cycle formulation.
-
Prop: distribute slack along the most timing-critical cycle proportional to path delays
-
FP-Prop: use sensitizable-critical-path search algorithm for clock skew scheduling.
Slack Maximization (EVEN)
-
Slack Maximization Scheduling
-
Equivalent to the so-called minimum mean cycle problem (MMC), where : critical cycle (first negative cycle)
-
Can be solved efficiently by the above method.
Even - iterative slack optimization
-
Identify the circuit's most timing-critical cycle,
-
Distribute the slack along the cycle,
-
Freeze the clock skews on the cycle, and
-
Repeat the process iteratively.
Most timing-critical cycle
Identify the timing-critical cycle
-
Identify the circuit's most timing-critical cycle
-
Solve the minimum mean-weight cycle problem by
- Karp's algorithm
- A. Dasdan and R.K.Gupta, "Faster Maximum and Minimum Mean Cycle Algorithms for System-Performance", TCAD'98.
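A compact sketch of Karp's minimum mean cycle computation is shown below (graph given as an adjacency dict, every node assumed reachable from `src`); it only illustrates the idea and is not the TCAD'98 implementation.

```python
import math

def min_mean_cycle_value(graph, src):
    """Karp: lambda* = min_v max_k (d_n(v) - d_k(v)) / (n - k)."""
    nodes = list(graph)
    n = len(nodes)
    INF = math.inf
    # d[k][v] = minimum weight of a walk with exactly k edges from src to v
    d = [{v: INF for v in nodes} for _ in range(n + 1)]
    d[0][src] = 0.0
    for k in range(1, n + 1):
        for u in nodes:
            if d[k - 1][u] < INF:
                for v, w in graph[u]:
                    d[k][v] = min(d[k][v], d[k - 1][u] + w)
    best = INF
    for v in nodes:
        if d[n][v] < INF:
            best = min(best, max((d[n][v] - d[k][v]) / (n - k)
                                 for k in range(n) if d[k][v] < INF))
    return best  # INF if no cycle is reachable from src

# cycles: a->b->a has mean 0.5, b->c->b has mean -0.5
g = {"a": [("b", 2.0)], "b": [("a", -1.0), ("c", 1.0)], "c": [("b", -2.0)]}
print(min_mean_cycle_value(g, "a"))   # -0.5
```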
Distribute the slack
Distribute the slack evenly along the most timing-critical cycle.
Freeze the clock skews (I)
Replace the critical cycle with super vertex.
Freeze the clock skews (II)
To determine the optimal slacks and skews for the rest of the graph, we replace the critical cycle with super vertex.
Repeat the process (I)
Repeat the process (II)
Final result
-
= 0.75
-
= -0.25
-
= -0.5
-
= 1.75
-
= 1.75
-
= 1
where
Problems with Even
- Assume all variances are the same.
- However, the timing uncertainty of a long combinational path is usually larger than that of a shorter path.
- Therefore, the even slack distribution along timing-critical cycles performed by Even is not optimal for yield if data path delays along the cycles are different.
Prop-Based on Gaussian model (I)
- Assuming there are gates with delay in a path, then this path delay is
- Distribute slack along the most timing-critical cycle, according to the square root of each edge's path delays (???).
- To achieve this, update the weights of s-edges and h-edges: where ensures a minimum timing margin for each timing constraint.
Prop-Based on Gaussian model (II)
- Given a specific clock period , we gradually increase and use the Bellman-Ford algorithm to detect whether it is still feasible.
- After finding the maximum , the edges along the most timing-critical cycle will have slacks equal to the pre-allocated timing margins.
- Many edges in a circuit have sufficiently large slack. Therefore, we can perform proportional slack distribution only for the most timing-critical cycle. Assign the rest of skews using Even.
Problems with Prop
- Assumes all gate delays have the same distribution.
- Does not justify using the square root of the path delay for the timing margin.
FP-Prop (I)
Figure: False path
FP-Prop (II)
- If we do not consider false paths, some non-timing-critical cycles appear timing-critical. More slack is then distributed to these cycles, while the slacks on the genuinely timing-critical cycles become insufficient. As a result, the overall timing yield decreases.
Problems with FP-Prop
- Same problems as Prop
🧪 Experimental Results
📈 Statistical Method
-
Setup time constraint
-
Hold time constraint
where are random variable under process variations.
📈 Statistical TC Graph
After SSTA, edge weight is represented as a pair of value (mean, variance).
Most Critical Cycle
-
Traditional criteria: minimum mean cycle
-
New criteria:
(We show the correctness later)
Slack Maximization (C-PROP)
- Slack Maximization Scheduling
- Equivalent to the minimum cost-to-time ratio problem (MCR), where:
- : critical cycle (first negative cycle)
Probability Observation
- Prob(timing failure) turns out to be an error function that depends solely on this ratio. Therefore, it is justified to use this ratio as the criticality criterion.
Whole flow
-
After determining the clock arrival time at each vertex in the most critical cycle, the cycle is replaced with a super vertex .
-
In-edge from outside vertex to cycle member is replaced by an in-edge with weight mean .
-
Out-edge is replaced by an out-edge with weight mean . However, the variance of the edge weight is not changed, and parallel edges may be retained.
-
Repeat the process iteratively until the graph is reduced to a single super vertex, or the number of edges is zero.
Data structure
Final result:
Advantages of This Method
- Justified by probability observation.
- Fast algorithm exists for minimum cost-to-time ratio problem.
- Reduce to Even when all variances are equal.
- When a variance tends to zero, it makes sense that only minimal slack is assigned to this variable, and hence others can be assigned more.
Results
📑 Main Reference
- Jeng-Liang Tsai, Dong Hyun Baik, Charlie Chung-Ping Chen, and Kewal K. Saluja, "Yield-Driven, False-Path-Aware Clock Skew Scheduling", IEEE Design & Test of Computers, May-June 2005
Lecture 05b - ⏳ Clock Skew Scheduling Under Process Variations (2)
🗺️ Overview
-
A Review of CSS Issues
-
General Formulation
-
Yield-driven Clock Skew Scheduling
-
Numerical Results
Minimum Clock Period Problem
-
Linear programming (LP) formulation
where and are sequentially adjacent to each other.
-
The above constraints are called system of difference constraints (see Introduction to Algorithms, MIT):
- Key: it is easy to check if a feasible solution exists by detecting negative cycles using the Bellman-Ford algorithm.
System of Difference Constraints
- In some cases, you may need to do some transformations, e.g.
Slack Maximization (EVEN)
-
Slack Maximization Scheduling
(👉 Note: )
-
is equivalent to the so-called minimum mean cycle problem (MMC), where:
- ,
- : critical cycle (first negative cycle)
-
Can be efficiently solved by the parametric shortest path methods.
Slack Maximization (C-PROP)
-
Slack Maximization Scheduling
(we show the correctness later)
-
is equivalent to the minimum cost-to-time ratio problem (MCR), where:
- ,
- : critical cycle
General Formulation
- General form: where a linear function that represents various problems defined above.
Problem | (setup) | (hold) | |
---|---|---|---|
Min. CP | |||
EVEN | |||
C-PROP |
General Formulation (cont'd)
-
In fact, and are not necessarily linear functions. Any monotonic decreasing function will do.
-
Theorem: if and are monotonic decreasing functions for all and , then there is a unique solution to the problem. (prove later).
-
Question 1: Does this generalization have any application?
-
Question 2: What if and are convex but not monotone?
🔕 Non-Gaussian Distribution
-
65nm and below, the path delay is likely to have a non-Gaussian distribution:
👉 Note: central limit theorem does not apply because
- random variables are correlated (why?)
- delays are non-negative
Timing Yield Maximization
-
Formulation:
- is not exactly timing yield but reasonable.
-
It is equivalent to:
where
-
Luckily, any CDF must be a monotonic increasing function.
📈 Statistical Interpretations of C-PROP
-
Reduce to C-PROP when is Gaussian, or precisely
-
EVEN: identical distribution up to shifting
Not necessarily worse than C-PROP
⚖️ Comparison
Three Solving Methods in General
- Binary search based
- Local convergence is slow.
- Cycle based
- Idea: if a solution is infeasible, there exists a negative cycle, which can always be "zeroed out" with minimum effort (proof of optimality)
- Path based
- Idea: if a solution is feasible, there exists a (shortest) path from where we can always improve the solution.
Parametric Shortest Path Algorithms
-
Lawler's algorithm (binary search)
-
Howard's algorithm (based on cycle cancellation)
-
Hybrid method
-
Improved Howard's algorithm
-
Input:
- Interval [tmin, tmax] that includes t*
- Tol: tolerance
- G(V, E): timing graph
-
Output:
- Optimal t* and its corresponding critical cycle C
Lawler's Algorithm
@startuml
while ((tmax - tmin) > tol)
: t := (tmin + tmax) / 2;
if (a neg. cycle C under t exists) then
: tmax := t;
else
: tmin := t;
endif
endwhile
: t* := t;
@enduml
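The activity diagram above translates directly into Python; `has_neg_cycle(t)` below is a placeholder for any negative-cycle check on the timing graph with edge weights f_ij(t) (monotonically decreasing in t).

```python
def lawler(has_neg_cycle, tmin, tmax, tol=1e-6):
    t = (tmin + tmax) / 2.0
    while tmax - tmin > tol:
        t = (tmin + tmax) / 2.0
        if has_neg_cycle(t):   # t is above t*: some cycle already went negative
            tmax = t
        else:
            tmin = t
    return t                   # approximation of t*
```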
Howard's Algorithm
@startuml
: t := tmax;
while (a neg. cycle C under t exists)
: find t' such that
sum{(i,j) in C | fij(t')} = 0;
: t := t';
endwhile
: t* := t;
@enduml
Hybrid Method
@startuml
while ((tmax - tmin) > tol)
: t := (tmin + tmax) / 2;
if (a neg. cycle C under t exists) then
: find t' such that
sum{(i,j) in C | fij(t')} = 0;
: t := t';
: tmax := t;
else
: tmin := t;
endif
endwhile
: t* := t;
@enduml
Improved Howard's Algorithm
@startuml
: t := (tmin + tmax) / 2;
while (no neg. cycle under t)
: tmin := t;
: t := (tmin + tmax) / 2;
endwhile
while (a neg. cycle C under t exists)
: find t' such that
sum{(i,j) in C | fij(t')} = 0;
: t := t';
endwhile
: t* := t;
@enduml
⏳ Clock Skew Scheduling for Unimodal Distributed Delay Models
@luk036
2022-10-26
Useful Skew Design: Why and Why not?
Bad 👎:
- Needs more engineer training.
- Balanced clock-trees are harder to build.
- Don't know how to handle process variation, multi-corner multi-mode, ..., etc.
Good 👍:
If you do it right,
- spend less time struggling about timing, or
- get better chip performance or yield.
What can modern STA tools do today?
- Manually assign clock arrival times to registers (all zeros by default)
- Grouping: Non-critical parts can be grouped as a single unit. In other words, there is no need for full-chip optimization.
- Takes care of multi-cycle paths, slew rate, clock-gating, false paths etc. All we need are the reported slacks.
- Provide 3-sigma statistics for slacks/path delays (POCV).
- However, the full probability density function and correlation information are not available.
Unimodality
-
In statistics, a unimodal probability distribution or unimodal distribution is a probability distribution with a single peak.
-
In continuous distributions, unimodality can be defined through the behavior of the cumulative distribution function (cdf). If the cdf is convex for and concave for , then the distribution is unimodal, with being the mode.
-
Examples
- Normal distribution
- Log-normal distribution
- Log-logistic distribution
- Weibull distribution
Quantile function
-
The quantile function of a distribution is the inverse of the cumulative distribution function .
-
Close-form expression for some unimodal distributions:
- Normal:
- Log-normal:
- Log-logistic:
- Weibull:
-
For log-normal distribution:
- mode:
- CDF at mode:
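A quick numerical check of these log-normal facts (with mu, sigma the parameters of the underlying normal) can be done with scipy; the numbers below are only an example.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 0.5

def lognormal_quantile(p):
    return np.exp(mu + sigma * norm.ppf(p))   # Q(p) = exp(mu + sigma * Phi^{-1}(p))

mode = np.exp(mu - sigma**2)                  # mode of the log-normal
cdf_at_mode = norm.cdf(-sigma)                # CDF evaluated at the mode: Phi(-sigma)
print(lognormal_quantile(0.5), mode, cdf_at_mode)
```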
Normal vs. Log-normal Delay Model
Normal/Gaussian:
- Convertible to a linear network optimization problem.
- Supported over the whole real line. Negative delays are possible.
- Symmetric, obviously not adaptable to the 3-sigma results.
Log-normal:
- Non-linear, but still can be solved efficiently with network optimization.
- Supported only on the positive side.
- Non-symmetric, may be able to fit into the 3-sigma results. (???)
Setup- and Hold-time Constraints
- Let , where
- : clock signal delay at the initial register
- : clock signal delay at the final register
- Assume that in zero-skew design, i.e. , the reported setup- and hold-time slacks are and , respectively.
- Then, in useful skew design:
- In principle, represent the minimum- and maximum-path delays, and should always be greater than zero.
- Let
Yield-driven Optimization
- Max-Min Formulation:
- ,
- No need for correlation information between paths.
- Not exactly the timing yield objective but reasonable.
- Equivalent to:
- or:
Yield-driven Optimization (cont'd)
- In general, Lawler's algorithm (binary search) can be used.
- Depending on the distribution, there are several other ways to solve the problem.
Gaussian Delay Model
- Reduce to:
- Linearization. Since is anti-symmetric and monotonic, we have:
-
is equivalent to the minimum cost-to-time ratio (linear).
-
However, actual path delay distributions are non-Gaussian.
Log-normal Delay Model
- Reduce to:
- Since is anti-symmetric and monotonic, we have:
- Bypass evaluating error function. Non-linear and non-convex, but still can be solved efficiently by for example binary search on .
Weibull Delay Model
- Reduce to:
Network Optimization: Quick Start
📝 Abstract
This lecture serves as an introductory guide to the algorithms used to solve network optimization problems. It covers several important concepts and techniques for both beginners and advanced users. The lecture begins by explaining how to explore the locality and associativity of a network, solve discrete optimization problems, and gain insight into critical parts of the network's cut and cycle. It then delves into basic concepts such as nodes, edges, orientation, the node-edge incidence matrix, and the boundary operator. It then explains the flow and potential of a network. It also examines feasibility problems and provides examples such as clock skew scheduling and delay padding. The lecture concludes with guidelines for algorithm developers and average users, suggesting special handling of multi-edges and techniques for finding negative cycles and cuts. Overall, the lecture provides a quick start guide to network optimization, covering important algorithms and concepts needed to tackle such problems.
📖 Introduction
Why and why not
-
👍 Algorithms are available for common network problems (Python: networkx, C++: Boost Graph Library (BGL)):
- Explore the locality of network.
- Explore associativity (things can be added up in any order)
-
👍 Be able to solve discrete problems optimally (e.g. matching/assignment problems)
-
👍 Bonus: gives you insight into the most critical parts of the network (critical cut/cycle)
-
👎 The theory is hard to understand.
-
👎 Algorithms are hard to understand (some algorithms do not allow users to have an input flow in reverse directions, but create edges internally for the reverse flows).
-
👎 There are too many algorithms available. You have to choose them wisely.
Flow and Potential
- Cut
- Current
- Flow
- Sum of around a node = 0
- Cycle/Path
- Voltage
- Tension
- Sum of around a cycle = 0
If you don't know more...
-
For the min-cost linear flow problem, the best guess is to use the "network simplex algorithm".
-
For the min-cost linear potential problem: formulate it as a dual (flow) problem.
-
For the parametric potential problem (single parameter), the best guess is to use Howard's algorithm.
-
All these algorithms are based on the idea of finding "negative cycle".
-
You can apply the same principle to the nonlinear problems.
For dual problems...
-
Dual problems can be solved by applying the same principle.
-
Finding negative cycles is replaced by finding negative "cuts", which is more difficult...
-
...unless your network is a planar graph.
Guidelines for the average users
-
Look for specialized algorithms for specialized problems. For example, for bipartite maximum cardinality matching, use the Hopcroft-Karp matching algorithm.
-
Avoid creating edges with infinite costs. Delete them or reformulate your problem.
Guidelines for algorithm developers
-
Make "negative cycles" as orthogonal to each other as possible.
-
Reuse previous solutions as a new starting point for finding negative cycles.
💡 Essential Concepts
Basic elements of a network
Definition (network)
A network is a collection of finite-dimensional vector spaces, which includes nodes and edges/arcs:
- , where
- where
which satisfies 2 requirements:
- The boundary of each edge is comprised of the union of nodes
- The intersection of any edges is either empty or the boundary node of both edges.
Network
-
By this definition, a network can contain self-loops and multi-edges.
-
A graph structure encodes the neighborhood information of nodes and edges.
-
Note that Python's NetworkX requires special handling of multi-edges.
-
The most efficient graph representation is an adjacency list.
-
The concept of a graph can be generalized to complex: node, edge, face...
Types of graphs
Bipartite graphs, trees, planar graphs, st-graphs, complete graphs.
Orientation
Definition (Orientation)
An orientation of an edge is an ordering of its boundary node , where
- is called a source/initial node
- is called a target/terminal node
👉 Note: orientation != direction
Definition (Coherent)
Two orientations that agree are said to be coherent.
Node-edge Incidence Matrix (connect to algebra!)
Definition (Incidence Matrix)
An matrix is a node-edge incidence matrix with entries:
- Example:
Chain
Definition (Chain )
An edge/node chain is an /-tuple of scalar that assigns a coefficient to each edge/node, where / is the number of distinct edges/nodes in the network.
Remark (II)
A chain may be viewed as an (oriented) indicator vector representing a set of edges/nodes.
Example (II)
,
Discrete Boundary Operator
Definition (Boundary operator)
The boundary operator .
Definition (Cycle)
A chain is said to be a cycle if it is in the null-space of the boundary operator, i.e. .
Definition (Boundary)
A chain is said to be a boundary of if it is in the range of the boundary operator.
Co-boundary Operator
Definition (Co-boundary operator)
The co-boundary (or differential) operator
👉 Note
The dimension of the null-space of equals the number of connected components of the graph.
Discrete Stokes' Theorem
- Let
- Conventional (integration):
- Discrete (pairing):
Fundamental Theorem of Calculus
-
Conventional (integration):
-
Discrete (pairing):
Divergence and Flow
Definition (Divergence)
Definition (Flow)
is called a flow if , where all negative entries of (div ) are called sources and positive entries are called sinks.
Definition (Circulation)
A network is called a circulation if there is no source or sink. In other words,
Tension and Potential
Definition (Tension)
A tension (in co-domain) is a differential of a potential , i.e. .
Theorem (Tellegen's)
Flow and tension are bi-orthogonal (isomorphic).
Proof
Path
A path indicator vector of that
Theorem
[total tension on ] = [total potential on the boundary of ].
Proof
.
Cut
Two node sets and (the complement of , i.e. ). A cut is an edge set, denoted by . A cut indicator vector (oriented) of is defined as where
Theorem (Stokes' theorem!)
[Total divergence of on ] = [total across ].
Proof
.
Examples
Feasibility Problems
Feasible Flow/Potential Problem
Feasible Flow Problem
-
Find a flow such that:
-
Can be solved using:
-
Painted network algorithm
-
If no feasible solution, return a "negative cut".
-
Feasible Potential Problem:
-
Find a potential such that:
-
Can be solved using:
-
Bellman-Ford algorithm
-
If no feasible solution, return a "negative cycle".
-
Examples
Genome-scale reaction network (primal)
-
: Stoichiometric matrix
-
: reactions between metabolites/proteins
-
: constraints on reaction rates
Timing constraints (co-domain)
-
: incidence matrix of timing constraint graph
-
: arrival time of clock
-
: clock skew
-
: setup- and hold-time constraints
Feasibility Flow Problem
Theorem (feasibility flow)
The problem has a feasible solution if and only if for all cuts where = upper capacity [1, p. 56].
Proof (if-part)
Let be a cut vector (oriented) of . Then
Feasibility Potential Problem
Theorem (feasibility potential)
The problem has a feasible solution if and only if for all cycles where = upper span [1, p. ??].
Proof (if-part)
Let be a path indicator vector (oriented) of . Then
Remarks
-
The only-if part of the proof is constructive. It can be done by constructing an algorithm to obtain the feasible solution.
-
could be or zero, etc.
-
could be or zero, etc.
-
could be or zero, etc.
-
could be or zero, etc.
Note: most tools require that must be zero such that the solution flow is always positive.
Convert to the elementary problem
-
By splitting every edge into two, the feasibility flow problem can be reduced to an elementary one:
-
Find a flow such that
where is the incidence matrix of the modified network.
-
Original:
Modified:
Convert to the elementary problem
-
By adding a reverse edge for every edge, the feasibility potential problem can be reduced to an elementary one:
-
Find a potential such that
where is the incidence matrix of the modified network.
-
Original:
Modified:
Basic Bellman-Ford Algorithm
function BellmanFord(list vertices, list edges, vertex source)
// Step 1: initialize graph
for each vertex i in vertices:
if i is source then u[i] := 0
else u[i] := inf
predecessor[i] := null
// Step 2: relax edges repeatedly
for i from 1 to size(vertices)-1:
for each edge (i, j) with weight d in edges:
if u[j] > u[i] + d[i,j]:
u[j] := u[i] + d[i,j]
predecessor[j] := i
// Step 3: check for negative-weight cycles
for each edge (i, j) with weight d in edges:
if u[j] > u[i] + d[i,j]:
error "Graph contains a negative-weight cycle"
return u[], predecessor[]
Example 1 : Clock skew scheduling ⏳
- Goal: intentionally assign an arrival time to each register so that the setup and hold time constraints are satisfied.
- Note: the clock skew is more important than the arrival time itself, because the clock runs periodically.
- In the early stages, fixing the timing violation could be done as soon as a negative cycle is detected. A complete timing analysis is unnecessary at this stage.
Example 2 : Delay padding + clock skew scheduling ⏳
-
Goal: intentionally "insert" a delay so that the setup and hold time constraints are satisfied.
-
Note that a delay can be "inserted" by swapping a fast transistor for a slower one.
-
Traditional problem formulation: Find and such that
-
Note 1: Inserting delays into some local paths may not be allowed.
-
Note 2: The problem can be reduced to the standard form by modifying the network (timing constraint graph)
Four possible ways to insert delay
- No delay:
- :
- Independent:
- :
Remarks (III)
-
If there exists a negative cycle, it means that the timing cannot be fixed by this technique alone.
-
Additional constraints, such as , can be imposed.
Parametric Problems
Parametric Potential Problem (PPP)
-
Consider a parametric potential problem: where is a monotonically decreasing function.
-
If is a linear function where is non-negative, the problem reduces to the well-known minimum cost-to-time ratio problem.
-
If = constant, it further reduces to the minimum mean cycle problem.
Note: Parametric flow problem can be defined similarly.
Examples (III)
-
is linear :
-
Optimal clock period scheduling problem
-
Slack maximization problem
-
Yield-driven clock skew scheduling ⏳ (Gaussian)
-
-
is non-linear:
-
Yield-driven clock skew scheduling ⏳ (non-Gaussian)
-
Multi-domain clock skew scheduling ⏳
-
Examples (IV)
-
Lawler's algorithm (binary search based)
-
Howard's algorithm (cycle cancellation)
-
Young's algorithm (path based)
-
Burns' algorithm (path based)
- for clock period optimization problem (all elements of are either 0 or 1)
-
Several hybrid methods have also been proposed
Remarks (IV)
-
Need to solve feasibility problems many times.
-
Data structures, such as Fibonacci heap or spanning tree/forest, can be used to improve efficiency
-
For multi-parameter problems, the ellipsoid method can be used.
-
Example 1: yield-driven clock skew scheduling ⏳ (c.f. lecture 5)
Example 2: yield-driven delay padding
- The problem can be reduced to the standard form by modifying the underlying constraint graph.
Four possible ways to insert delay
- No delay:
- :
- Independent:
- :
Min-cost Flow/Potential Problem
Elementary Optimal Problems
-
Elementary Flow Problem:
-
Elementary Potential Problem:
Elementary Optimal Problems (Cont'd)
-
The problems are dual to each other if
-
Since = =
-
when equality holds.
Remark (V)
-
We can formulate a linear problem in primal or dual form, depending on which solution method is more appropriate:
-
Incremental improvement of feasible solutions
-
Design variables are in the integral domain:
- The max-flow problem (i.e. ) may be better solved by the dual method.
-
Linear Optimal Problems
-
Optimal Flow Problem:
-
Optimal Potential Problem:
Linear Optimal Problems (II)
By modifying the network:
- The problem can be reduced to the elementary case [pp.275-276]
piece of cake
- Piece-wise linear convex cost can be reduced to this linear problem [p.239,p.260]
The problem has been extensively studied and has numerous applications.
Remark (VI)
-
We can transform the cost function to be non-negative by reversing the orientation of the negative cost edges.
-
Then reduce the problem to the elementary case (or should we???)
Algorithms for Optimal Flow Problems
-
Successive shortest path algorithm
-
Cycle cancellation method
- Iteratively insert additional minimal flows according to a negative cycle of the residual network until no negative cycles are found.
-
Scaling method
For Special Cases
-
Max-flow problem ()
-
Ford-Fulkerson algorithm: iteratively insert additional minimal flows according to an augmented path of the residual network, until no augmented paths of the residual network are found.
-
Pre-flow Push-Relabel algorithm (dual method???)
-
-
Matching problems ()
- Edmond's blossom algorithm
Min-Cost Flow Problem (MCFP)
-
Problem Formulation:
-
Algorithm idea: descent method: given a feasible , find a better solution , where is positive.
General Descent Method
- Input: , initial
- Output: optimal opt
- while not converged,
- Choose descent direction ;
- Choose the step size ;
- ;
Some Common Descent Directions
- Gradient descent:
- Steepest descent:
- = (un-normalized)
- Newton's method:
- For convex problems, must satisfy .
Note: Here, there is a natural way to choose !
Min-Cost Flow Problem (II)
-
Let , then we have:
-
In other words, choose to be a negative cycle!
-
Simple negative cycle, or
-
Minimum mean cycle
-
Primal Method for MCFP
- Input:
- Output: optimal opt
- Initialize a feasible and certain data structure
- while a negative cycle found in ,
- Choose a step size ;
- If is unbounded, return UNBOUNDED;
- If , break;
- ;
- Update corresponding data structures
- return OPTIMAL
Remarks (VI)
-
In Step 4, negative cycle can be found using Bellman-Ford algorithm.
-
In the cycle cancelling algorithm, is:
-
a simple negative cycle, or
-
a minimum mean cycle
-
-
A heap or other data structures are used for finding negative cycles efficiently.
-
Usually is chosen such that one constraint is tight.
Min-Cost Potential Problem (MCPP)
-
Problem Formulation: where is assumed to be non-negative.
-
Algorithm: given an initial feasible , find a better solution , where is positive:
Method for MCPP
- Input:
- Output: optimal opt
- Initialize a feasible and certain data structure
- while a negative cut found in ,
- Choose a step size ;
- If is unbounded, return UNBOUNDED;
- If , break;
- ;
- Update corresponding data structures
- return OPTIMAL
Remarks (VII)
-
Usually is chosen such that one constraint is tight.
-
The min-cost potential problem is the dual of the min-cost flow problem, so algorithms can solve both problems.
-
In the network simplex method, is chosen from a spanning tree data structure (for linear problems only)
When "Convex Optimization" Meets "Network Flow"
📖 Introduction
Overview
-
Network flow problems can be solved efficiently and have a wide range of applications.
-
Unfortunately, some problems may have other additional constraints that make them impossible to solve with current network flow techniques.
-
In addition, in some problems, the objective function is quasi-convex rather than convex.
-
In this lecture, we will investigate some problems that can still be solved by network flow techniques with the help of convex optimization.
Parametric Potential Problems
Parametric potential problems
Consider:
where and are concave.
Note: the parametric flow problems can be defined in a similar way.
Network flow says:
-
For fixed , the problem is feasible precisely when there exists no negative cycle
-
Negative cycle detection can be done efficiently using the Bellman-Ford-like methods
-
If a negative cycle is found, then
Convex Optimization says:
-
If both sub-gradients of and are known, then the bisection method can be used for solving the problem efficiently.
-
Also, for multi-parameter problems, the ellipsoid method can be used.
Quasi-convex Minimization
Consider:
where is quasi-convex and are concave.
Example of Quasi-Convex Functions
-
is quasi-convex on
-
is quasi-linear on
-
is quasi-concave on
-
Linear-fractional function:
-
=
-
dom =
-
-
Distance ratio function:
-
=
-
dom =
-
Convex Optimization says:
If is quasi-convex, there exists a family of functions such that:
-
is convex w.r.t. for fixed
-
is non-increasing w.r.t. for fixed
-
-sublevel set of is -sublevel set of , i.e., iff
For example:
-
with convex, concave , on dom ,
-
can take =
Convex Optimization says:
Consider a convex feasibility problem:
-
If feasible, we conclude that ;
-
If infeasible, .
Binary search on can be used for obtaining .
Quasi-convex Network Problem
- Again, the feasibility problem ([eq:quasi]) can be solved efficiently by the bisection method or the ellipsoid method, together with the negative-cycle detection technique.
- Any EDA applications???
Monotonic Minimization
-
Consider the following problem:
where is non-decreasing.
-
The problem can be recast as:
where is non-decreasing w.r.t. .
E.g. Yield-driven Optimization
-
Consider the following problem:
where is a random variable.
-
Equivalent to the problem:
where is non-decreasing w.r.t. .
E.g. Yield-driven Optimization (II)
-
Let be the cdf of .
-
Then:
-
The problem becomes:
Network flow says
- Monotonic problem can be solved efficiently using cycle-cancelling methods such as Howard's algorithm.
Min-cost flow problems
Min-Cost Flow Problem (linear)
Consider:
- some could be , some could be .
- is the incidence matrix of a network .
Conventional Algorithms
- Augmented-path based:
- Start with an infeasible solution
- Inject minimal flow into the augmented path while maintaining infeasibility in each iteration
- Stop when there is no flow to inject into the path.
- Cycle cancelling based:
- Start with a feasible solution
- find a better sol'n , where is positive and is a negative cycle indicator.
General Descent Method
- Input: a starting dom
- Output:
- repeat
- Determine a descent direction .
- Line search. Choose a step size .
- Update.
- until a stopping criterion is satisfied.
Some Common Descent Directions
- For convex problems, the search direction must satisfy .
- Gradient descent:
- Steepest descent:
- .
- = (un-normalized)
- Newton's method:
Network flow says (II)
- Here, there is a better way to choose !
- Let , then we have:
- In other words, choose to be a negative cycle with cost !
- Simple negative cycle, or
- Minimum mean cycle
Network flow says (III)
- Step size is limited by the capacity constraints:
- , for
- , for
- = min
- If , the problem is unbounded.
Network flow says (IV)
- An initial feasible solution can be obtained by a similar construction of the residual graph and cost vector.
- The LEMON package implements this cycle cancelling algorithm.
Min-Cost Flow Convex Problem
- Problem Formulation:
Common Types of Line Search
- Exact line search:
- Backtracking line search (with parameters )
- starting from , repeat until
- graphical interpretation: backtrack until
Network flow says (V)
- The step size is further limited by the following:
- In each iteration, choose as a negative cycle of , with cost such that
Quasi-convex Minimization (new)
-
Problem Formulation:
-
The problem can be recast as:
Convex Optimization says (II)
- Consider a convex feasibility problem:
- If feasible, we conclude that ;
- If infeasible, .
- Binary search on can be used for obtaining .
Network flow says (VI)
- Choose as a negative cycle of with cost
- If no negative cycle is found, and , we conclude that the problem is infeasible.
- Iterate until becomes feasible, i.e. .
E.g. Linear-Fractional Cost
-
Problem Formulation:
-
The problem can be recast as:
Convex Optimization says (III)
- Consider a convex feasibility problem:
- If feasible, we conclude that ;
- If infeasible, .
- Binary search on can be used for obtaining .
Network flow says (VII)
- Choose to be a negative cycle of with cost , i.e.
- If no negative cycle is found, and , we conclude that the problem is infeasible.
- Iterate until .
E.g. Statistical Optimization
-
Consider the quasi-convex problem:
- is random vector with mean and covariance .
- Hence, is a random variable with mean and variance .
📈 Statistical Optimization
- The problem can be recast as:
👉 Note: (convex quadratic constraint w.r.t )
Recall...
Recall that the gradient of is .
Problem w/ additional Constraints (new)
- Problem Formulation:
E.g. Yield-driven Delay Padding
-
Consider the following problem:
- : delay padding
- : weight (determined by a trade-off curve of yield and buffer cost)
- : Gaussian random variable with mean and variance .
E.g. Yield-driven Delay Padding (II)
-
The problem is equivalent to:
-
or its dual:
Recall ...
-
Yield-driven CSS:
-
Delay padding
Considering Barrier Method
-
Approximation via logarithmic barrier:
- where
- Approximation improves as
- Here,
Barrier Method
- Input: a feasible , , , tolerance
- Output:
- repeat
- Centering step. Compute by minimizing
- Update .
- Increase .
- until .
👉 Note: Centering is usually done by Newton's method in general.
Network flow says (VIII)
In the centering step, instead of using the Newton descent direction, we can replace it with a negative cycle on the residual graph.
Useful Skew Design Flow
Useful Skew Design: Why vs. Why Not {#sec:first}
Why not
Some common challenges when implementing useful skew design include:
- need more engineer training
- difficulty in building a balanced clock-tree
- uncertainty in how to handle process variation and multi-corner multi-mode issues ..., etc.
Why
If these challenges are overcome and useful skew design is implemented correctly,
- it can lead to less time spent on timing issues
- get better chip performance or yield
Clock Arrival Time vs. Clock Skew
-
Clock signal runs periodically.
-
Thus, absolute clock arrival time is not so important.
-
Instead, the skew is more important in this scenario.
Useful Skew Design vs. Zero-Skew Design
- "Critical cycle" instead of "critical path".
- "Negative cycle" instead of "negative slack".
- If there is a negative cycle, it means that there is no positive-slack solution no matter how we schedule.
- Others are pretty much the same.
- Same design principle:
- Always tackle the most critical one first!
Linear Programming vs. Network Flow Formulation
- Linear programming formulation
- can handle more complex constraints
- Network flow formulation
- usually more efficient
- return the most critical cycle as a bonus
- can handle quantized buffer delay (???)
- In any case, timing analysis is much more time-consuming than solving the optimization.
Target Skew vs. Actual Skew
Don't mess up these two concepts:
- Target skew:
- the skew we want to achieve in the scheduling stage.
- Usually deterministic (we schedule a meeting at 10:00, rather than at 10:00 ± 34 minutes, right?)
- Actual skew
- the skew that the clock tree actually generates.
- Can be formulated as a random variable.
A Simple Case
To warm up, let us start with a simple case:
- Assume equal path delay variations.
- Single-corner.
- Before a clock tree is built.
- No adjustable delay buffer (ADB).
Network
Definition (Network)
A network is a collection of finite-dimensional vector spaces of nodes and edges/arcs:
- , where
- , where
which satisfies 2 requirements:
- The boundary of each edge consists of the union of its nodes
- The intersection of any edges is either empty or a boundary node of both edges.
Example
\begin{figure}[hp]
\centering
\input{lec07.files/network.tikz}
\caption{A network}%
\label{fig:network}
\end{figure}
Orientation
Definition (Orientation)
An orientation of an edge is an ordering of its boundary nodes , where
- is called the source/initial node
- is called the target/terminal node
Definition (Coherent)
Two orientations are said to be coherent if they are the same.
Node-edge Incidence Matrix
Definition (Incidence Matrix)
A matrix is a node-edge incidence matrix with entries:
Timing Constraint
- Setup time constraint. When this constraint is violated, a cycle-time violation (zero clocking) occurs.
- Hold time constraint. When this constraint is violated, a race condition (double clocking) occurs.
Timing Constraint Graph
- Create a graph (network) by
- replacing the hold time constraint with an h-edge with cost from to , and
- replacing the setup time constraint with an s-edge with cost from to .
- Two sets of constraints stem from the definition of clock skew:
- The sum of skews along paths having the same starting and ending flip-flop must be the same;
- The sum of clock skews around every cycle must be zero.
Timing Constraint Graph (TCG)
\begin{figure}[h!]
\centering
\input{lec05.files/tcgraph.tikz}
\end{figure}
First Thing First
Meet all timing constraints
- Find in
- How to solve:
- Find a negative cycle, fix it.
- Iterate until no negative cycle is found.
- Bellman-Ford-like algorithms (and their variants) are publicly available (a sketch follows below).
- Strongly suggest "Lazy Evaluation":
- Don't do full timing analysis on the whole timing graph at the beginning!
- Instead, perform timing analysis only when the algorithm needs it.
- Stop immediately whenever a negative cycle is detected.
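A minimal sketch of Bellman-Ford-style negative-cycle detection with early termination (the "lazy" spirit of stopping as soon as nothing changes or a negative cycle appears); the toy timing graph and its weights are made up:

```python
def find_negative_cycle(nodes, edges):
    """Bellman-Ford-like negative-cycle detection with early termination.
    edges: list of (u, v, w) meaning constraint arrival[v] - arrival[u] <= w
    becomes an arc u -> v of weight w.  Returns a negative cycle (as a node
    list) or None if every constraint can be met."""
    dist = {v: 0.0 for v in nodes}       # implicit virtual source
    pred = {v: None for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v], pred[v] = dist[u] + w, u
                changed, last = True, v
        if not changed:                  # early stop: no violation remains
            return None
    v = last                             # relaxed in the final pass
    for _ in range(len(nodes)):          # back up onto the cycle itself
        v = pred[v]
    cycle, u = [v], pred[v]
    while u != v:
        cycle.append(u)
        u = pred[u]
    return cycle[::-1]

# Made-up toy timing graph: cycle a -> b -> c -> a has total weight -1.
nodes = ["a", "b", "c"]
edges = [("a", "b", 2.0), ("b", "c", -1.5), ("c", "a", -1.5)]
print(find_negative_cycle(nodes, edges))   # e.g. ['b', 'c', 'a']
```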
Delay Padding (DP)
- Delay padding is a technique that fixes timing issues by intentionally increasing delays only.
- Usually formulated as:
- Find in
- If the objective is to minimize the sum of , then the problem is the dual of the standard min-cost flow problem, which can be solved efficiently by the network simplex algorithm (publicly available).
- Beautiful right?
Delay Padding (II)
- No, the above formulation is impractical.
- In modern design, "inserting" a delay may mean swapping a faster cell for a slower cell from the cell library. Thus, there is no need to minimize the sum of .
- More importantly, it may not be possible to find a position to insert a delay for some delay paths.
- Some papers allow inserting delays only on the max-delay path; others allow inserting delays only on both the max- and min-delay paths together. Neither approach is perfect.
Delay Padding (III)
- My suggestion: instead of calculating the necessary padding values and then looking for suitable positions to insert them, it is easier (and more flexible) to determine the positions first and then calculate the suitable values.
- This can be achieved by modifying the timing graph and solving a feasibility problem. Easy enough!
- Quantized delays can be handled too (???).
Four possible ways to insert delay
\begin{figure}[htpb]
\centering
\subfigure[No delay can be inserted]{
\input{lec07.files/no_delay.tikz}
}
\subfigure[$p_s$, $p_h$ independently]{
\input{lec07.files/independent.tikz}
}
\subfigure[$p_s = p_h$]{
\input{lec07.files/same_delay.tikz}
}
\subfigure[$p_s \geq p_h$]{
\input{lec07.files/setup_greater.tikz}
}
\caption{}
\end{figure}
Delay Padding (cont'd)
- If there exists a negative cycle in the modified timing graph, it implies that the timing problem cannot be fixed by the delay padding technique alone.
- Then, try decreasing , or increasing .
- Make sure that the min-delay path is still the min-delay path after a certain amount of delay is inserted (how???).
Variation Issue
Yield-driven Clock Skew Scheduling
- Assume all timing issues are fixed.
- Now, how to schedule the arrival times to maximize yield?
- According to the critical-first principle, we seek the most critical cycle first.
- The problem can be formulated as:
- .
- It is equivalent to the minimum mean cycle problem, which can be solved efficiently by, for example, Howard's algorithm (publicly available); a sketch of an alternative algorithm follows below.
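As a concrete illustration, here is a sketch of Karp's minimum mean cycle algorithm, a simpler (though less efficient) alternative to the Howard's algorithm mentioned above; the toy graph and its weights are made up:

```python
import math

def min_mean_cycle(n, edges):
    """Karp's algorithm for the minimum mean cycle weight.
    n: number of vertices 0..n-1; edges: list of (u, v, w).
    Assumes every cycle is reachable from vertex 0 (e.g. strong connectivity)."""
    INF = math.inf
    # D[k][v] = minimum weight of a walk of exactly k edges from vertex 0 to v.
    D = [[INF] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] < INF and D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best = INF
    for v in range(n):
        if D[n][v] < INF:
            best = min(best, max((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] < INF))
    return best

# Made-up graph: cycle 0->1->2->0 has mean (1 + 2 - 6)/3 = -1 (the minimum);
# cycle 0->1->0 has mean (1 + 5)/2 = 3.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, -6.0), (1, 0, 5.0)]
print(min_mean_cycle(3, edges))   # -> -1.0
```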
Minimum Balancing Algorithm
- Then we evenly distribute the slack on this cycle.
- To continue the next most critical cycle, we contract the first one into a "super vertex" and repeat the process.
- The process stops when only a single vertex remains in the timing graph.
- The overall method is known as minimum balancing (MB) algorithm in the literature.
Example: Most timing-critical cycle
The most vulnerable timing constraint
\input{lec05.files/tcgraph2.tikz}
Example: Distribute the slack
- Distribute the slack evenly along the most timing-critical cycle.
\input{lec05.files/tcgraph3.tikz}
Example: Distribute the slack (cont'd)
- To determine the optimal slacks and skews for the rest of the graph, we replace the critical cycle with a super vertex.
\input{lec05.files/tcgraph4.tikz}
\input{lec05.files/tcgraph5.tikz}
Repeat the process iteratively
\input{lec05.files/tcgraph6.tikz}
Repeat the process iteratively (II)
\input{lec05.files/tcgraph7.tikz}
Final result
- Skew = 0.75
- Skew = -0.25
- Skew = -0.5
- Slack = 1.75
- Slack = 1.75
- Slack = 1
where Slack = CP - D - T - Skew
\begin{tikzpicture}
\def \radius {2cm}
\node[draw, circle, fill=cyan!20] at ({30}:\radius) (n1) {0.25};
\node[draw, circle, fill=cyan!20] at ({150}:\radius) (n2) {0.75};
\node[draw, circle, fill=cyan!20] at ({270}:\radius) (n3) {0};
\path[->, >=latex] (n2) edge [bend left=45] node[above]{0.5} (n1);
\path[->, >=latex] (n3) edge [bend left=45] node[left]{2.5} (n2);
\path[->, >=latex] (n1) edge [bend left=45] node[right]{1.5} (n3);
\path[dashed, ->, >=latex] (n1) edge [bend left=15] node[above]{1.5} (n2);
\path[dashed, ->, >=latex] (n2) edge [bend left=15] node[left]{2} (n3);
\path[dashed, ->, >=latex] (n3) edge [bend left=15] node[right]{3} (n1);
\end{tikzpicture}
What does the MB algorithm really give us?
- The MB algorithm not only gives us the scheduling solution, but also a tree topology that represents the order of "criticality"!
\begin{figure}
\centering
\input{lec05.files/hierachy.tikz}
\end{figure}
Clock-tree Synthesis and Placement
- I strongly suggest that the topology of the clock-tree precisely follows the order of "criticality"!
- since the lower branches of the clock tree have smaller skew variation.
- I also suggest that the placer should follow the topology of the clock tree:
- Physically place the registers of the same branch together.
- Locality implies a stronger correlation of variations, and hence even smaller skew variation due to the cancellation effect.
- Note that the current SSTA does not provide the correlation information, so this is the best you can do!
Second Example: Yield-driven Clock Skew Scheduling
- Now assume that SSTA (or STA+OCV, POCV, AOCV) is performed.
- Let (, ) be the (mean, variance) of
- The most critical cycle can be obtained by solving:
- It is equivalent to the minimum cost-to-time ratio cycle problem, which can be solved efficiently by, for example, Howard's algorithm (publicly available); a binary-search sketch follows below.
- Gaussian distribution is assumed. For arbitrary distribution, see my DAC'08 paper.
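A sketch of Lawler's binary-search approach to the minimum cost-to-time ratio cycle problem (an alternative to the Howard's algorithm mentioned above), reusing plain Bellman-Ford negative-cycle detection; the toy arcs and their (cost, time) pairs are made up:

```python
def has_negative_cycle(n, edges):
    """Plain Bellman-Ford test for a negative-weight cycle (vertices 0..n-1)."""
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False
    return True

def min_ratio_cycle(n, arcs, lo, hi, tol=1e-7):
    """Binary search (Lawler) for the minimum cost-to-time ratio cycle.
    arcs: list of (u, v, cost, time) with time > 0.  A cycle whose
    reweighted cost sum(cost - lam*time) is negative exists iff the
    optimal ratio is below lam."""
    while hi - lo > tol:
        lam = (lo + hi) / 2
        reweighted = [(u, v, c - lam * t) for u, v, c, t in arcs]
        if has_negative_cycle(n, reweighted):
            hi = lam        # a cycle with ratio < lam exists
        else:
            lo = lam
    return hi

# Made-up arcs (u, v, cost, time): cycle 0<->1 has ratio 6/3 = 2,
# cycle 1<->2 has ratio 3/6 = 0.5 (the minimum).
arcs = [(0, 1, 2.0, 1.0), (1, 0, 4.0, 2.0), (1, 2, 1.0, 3.0), (2, 1, 2.0, 3.0)]
print(min_ratio_cycle(3, arcs, lo=0.0, hi=10.0))   # -> about 0.5
```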
What About the Correlation?
- In the above formulation, we minimize the maximum probability of timing violation for each individual timing constraint, so only individual delay distributions are needed.
- Yes, the objective function is not the true timing yield. But it is reasonable, easy to solve, and the best you can do so far.
Multi-Corner Issue
Meet all timing constraints in Multi-Corner
- Assume no Adjustable Delay Buffer (ADB)
- Find in
- Equivalent to finding in
- Feasibility problem
- How to solve:
- Find a negative cycle, fix it.
- Iterate until no negative cycle is found.
- It is better to avoid fixing the timing issue corner by corner, as this induces a ping-pong effect.
Delay padding (DP) in Multi-Corner
- The problem CANNOT be formulated as a network flow problem, but you can still solve it with a linear programming formulation.
- Or, decompose the problem into sub-problems, one for each corner.
- Again, use the modified timing graph technique.
- Then, the 's are shared variables among the sub-problems.
- If we solve each sub-problem individually, the solutions will not agree with each other, inducing a ping-pong effect.
- We need something to drive the agreement.
Delay Padding (DP) in Multi-Corner (cont'd)
- Follow the idea of dual decomposition: if a solution is above the average, introduce a penalty cost; if a solution is below the average, introduce a reward cost.
- Then, each sub-problem is a min-cost potential problem, which can be solved efficiently.
- If some sub-problems have no feasible solution, it implies that the problem cannot be fixed by delay padding alone.
- The process repeats until all solutions converge. If they do not, it implies that the problem cannot be fixed by delay padding alone.
Yield-driven Clock Skew Scheduling
- More or less the same as in Single Corner.
Clock-Tree Issue
Clock Tree Synthesis (CTS)
- Construct merging locations
- DME algorithm, Elmore delay, buffer insertion
- There is some research on the bounded-skew DME algorithm, but in my opinion the algorithm is too complicated.
- If the previous stage is over-optimized, the clock tree is hard to implement. If that happens, some budgeting techniques should be invoked (an engineering issue).
- After a clock tree is constructed, more detailed timing (rather than Elmore delay) can be obtained via timing analysis.
Co-optimization Issue
- After a clock tree is built, we have a clearer picture.
- Should I perform the re-scheduling? And how?
- Some papers suggest adding a factor to the timing constraint, say: .
- Then the formulation is no longer a network-flow problem, but it may still be solvable by linear programming.
- This needs deeper investigation.
Adjustable Delay Buffer Issue
Adjustable delay buffers in Multi-Mode
- Assume adjustable delay buffers are added solely to the clock tree
- Hence, each mode can have a different set of arrival times.
- Easier for clock skew scheduling, harder for clock-tree synthesis.
Meet timing constraints in Multi-Mode:
- find in
- Can be done in parallel:
- for every mode in parallel, find a negative cycle and fix it (no need to know everything at the beginning).
Delay Padding (DP) in Multi-mode
- Again use a modified timing graph technique.
- NOT a network flow problem. Use LP, or
- Dual decomposition -> min-cost potential problem for each mode
- Only 's are shared variables.
- Initial feasible solution obtained by the single-mode method
- A negative cycle => the problem cannot be fixed by DP
- No convergence => the problem cannot be fixed by DP
- Try decreasing , or increasing .
Yield-driven Clock Skew Scheduling
- Pretty much the same as Single-Mode.
Difficulty in ADB Multi-Mode Design
- How to design the clock-tree?
- What is the order of criticality?
- How to determine the minimum range of ADB?
Lecture 8: Phase Shifting Mask
@luk036
2022-11-26
🗺️ Overview
- Background
- What is Phase Shifting Mask?
- Phase Conflict Graph
- Phase Assignment Problem
- Greedy Approach
- Planar Graph Approach
class: middle, center
Background
Background
.pull-left[
- In the past, chips have continued to get smaller and smaller, and therefore consume less and less power.
- However, we are rapidly approaching the end of the road, and optical lithography cannot take us to the next place we need to go.
] .pull-right[
]
Process of Lithography
.pull-left[
- Photoresist coating (光阻涂层)
- Illumination (光照)
- Exposure (曝光)
- Etching (蚀刻)
- Impurity doping (杂质掺杂)
- Metal connection
] .pull-right[
]
Sub-wavelength Lithography
.pull-left[
- Feature size is much smaller than the lithography wavelength
- 45nm vs. 193nm
] .pull-right[
- What you see in the mask/layout is not what you get on the chip:
- Features are distorted
- Yields decline
]
DFM Tool (Mentor Graphics)
OPC and PSM
.pull-left[
- Results of OPC on PSM:
- A = original layout
- B = uncorrected layout
- C = after PSM and OPC
] .pull-right[
]
Phase Shifting Mask
Phase Conflict Graph
- Edge between two features with separation of (dark field)
- Similar conflict graph for "bright field".
- Construction method: plane sweeping method + dynamic priority search tree
Phase Assignment Problem
.pull-left[
- Instance: Graph
- Solution: A color assignment (here )
- Goal: Minimize the total weight of the monochromatic edges. (Question: How can we model the weights?)
] .pull-right[
]
Phase Assignment Problem
- In general, the problem is NP-hard.
- It is solvable in polynomial time for planar graphs with , since the problem is equivalent to the T-join problem in the dual graph [Hadlock75].
- For planar graphs with , the problem can be solved approximately within a ratio of two using the primal-dual method.
Overview of Greedy Algorithm
- Create a maximum-weight spanning tree (MST) of the conflict graph (available in the LEDA package; a sketch follows this list).
- Assign colors to the nodes of the MST.
- Reinsert edges that do not conflict.
- Time complexity:
- Can be applied to non-planar graphs.
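A minimal sketch of the greedy procedure, using NetworkX in place of LEDA; the conflict graph and its edge weights are made up:

```python
import networkx as nx

def greedy_phase_assignment(G):
    """Greedy phase (two-color) assignment:
    1) build a maximum-weight spanning tree/forest,
    2) two-color it (always possible, since trees are bipartite),
    3) report which edges of G end up bichromatic (non-conflicting)."""
    T = nx.maximum_spanning_tree(G, weight="weight")
    color = {}
    for comp in nx.connected_components(T):
        root = next(iter(comp))
        color[root] = 0
        for u, v in nx.bfs_edges(T, root):
            color[v] = 1 - color[u]
    kept = [(u, v) for u, v in G.edges() if color[u] != color[v]]
    return color, kept

# Made-up conflict graph: a weighted triangle, which cannot be two-colored,
# so the lightest edge ends up monochromatic and is dropped.
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 3), ("b", "c", 2), ("a", "c", 1)])
print(greedy_phase_assignment(G))
```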
Greedy Algorithm
.pull-left[
- Step 1: Construct a maximum spanning tree of (using e.g. Kruskal's algorithm, which is available in the LEDA package).
] .pull-right[
]
Greedy Algorithm (Cont'd)
.pull-left[
- Step 2: Assign colors to the nodes of .
] .pull-right[
]
Greedy Algorithm (Cont'd)
.pull-left[
- Step 3: Reinsert edges that do not conflict.
] .pull-right[
]
Other Approaches
-
Reformulate the problem as a MAX-CUT problem. Note that the MAX-CUT problem is approximatable within a factor of 1.1383 using the "semi-definite programming" relaxation technique [Goemans and Williamson 93].
-
Planar graph approach: Convert to a planar graph by removing the minimal edges, and then apply the methods to the resulting planar graph.
👉 Note: the optimal "planar sub-graph" problem is NP-hard.
Overview of Planar Graph Approach (Hadlock's algorithm)
- Approximate by a planar graph
- Decompose into its bi-connected components.
- For each bi-connected component in ,
- construct a planar embedding
- construct a dual graph
- construct a complete graph , where
- is a set of odd-degree vertices in
- the weight of each edge is the shortest-path distance between the two vertices
- find a minimum-weight perfect matching 💯👬🏻. The matched edges are the conflict edges that have to be deleted (a matching sketch follows this list).
- Reinsert the non-conflicting edges from .
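A sketch of the minimum perfect matching step of the list above, using NetworkX's maximum-weight matching on negated weights to obtain a minimum-weight perfect matching; the four odd-degree dual vertices and their shortest-path weights are made up:

```python
import networkx as nx

def min_weight_perfect_matching(K):
    """Minimum-weight perfect matching of a complete graph K with an even
    number of vertices, via maximum-weight matching on negated weights."""
    H = nx.Graph()
    for u, v, d in K.edges(data=True):
        H.add_edge(u, v, weight=-d["weight"])
    return nx.max_weight_matching(H, maxcardinality=True, weight="weight")

# Made-up complete graph on four odd-degree dual vertices; the weights stand
# in for shortest-path distances in the dual graph.
K = nx.Graph()
K.add_weighted_edges_from([(1, 2, 1.0), (1, 3, 4.0), (1, 4, 3.0),
                           (2, 3, 2.0), (2, 4, 5.0), (3, 4, 1.5)])
print(min_weight_perfect_matching(K))   # pairs {1,2} and {3,4}, total 2.5
```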
Planar Graph Approach
.pull-left[
- Step 1: Approximate with a planar graph
- It is NP-hard.
- The naive greedy algorithm takes time.
- Any good suggestion?
] .pull-right[
]
Planar Graph Approach
-
Step 2: Decompose into its bi-connected components in linear time (available in the LEDA package).
Planar Graph Approach
-
Step 3: For each bi-connected component in , construct a planar embedding in linear time (available in the LEDA package)
👉 Note: planar embedding may not be unique unless is tri-connected.
Planar Graph Approach
.pull-left[
- Step 4: For each bi-connected component, construct its dual graph in linear time.
] .pull-right[
]
Planar Graph Approach
.pull-left[
-
Step 5: Find the minimum weight perfect matching 💯👬🏻 of .
- Polynomial time solvable.
- Can be formulated as a network flow problem.
👉 Note: complete graph vs. Voronoi graph
] .pull-right[
]
Planar Graph Approach
.pull-left[
- Step 6: reinsert the non-conflicting edges in .
👉 Note: practically we keep track of conflicting edges.
] .pull-right[
]
Lecture 9: Double Patterning 👫
@luk036
2022-11-23
Background
- In the past, chips have continued to get smaller and smaller, and therefore consume less and less power.
- However, we are rapidly approaching the end of the road, and optical lithography cannot take us to the next place we need to go.
Sub-wavelength Lithography
- Feature size << lithography wavelength
- 45 nm vs. 193 nm
- What you see in the mask/layout is not what you get on the chip:
- Features are distorted
- Yields decline
What is Double Patterning?
Unlike conventional optical lithography, which exposes the photoresist once under a single mask, double patterning splits the mask into two, each with half the feature density, and exposes them in two separate exposures.
Key technologies
- Layout fracturing algorithm to reduce the number of rectangles and the total cut length.
- Dynamic priority search tree for plane sweeping.
- Graph-theoretic approach:
- Convert the coloring problem to a T-join problem and then solve it with Hadlock's algorithm.
- Decompose the underlying conflict graph into its tri-connected components using a data structure named SPQR-tree.
Polygon Fracturing Algorithm
- Allow minimal overlap to reduce the number of rectangles, and thus the number of conflicts.
Conflict Detection
- Rule 1: If the distance between two rectangles is , then the two rectangles are not in conflict.
- Rule 2: Two overlapping/contacting rectangles are NOT in conflict.
- Rule 3:
- Definition: A polygon is said to be rectilinearly convex if it is both x-monotone and y-monotone.
- Two rectangles and are in conflict if they are apart and there is a path from to that reconstructs a "concave" polygon.
- Conflicting: , , but not , and .
Conflict Graph
- Blue edge: positive weight (opposite color preferred)
- Green edge: negative weight (same color preferred)
Formulation of the Layout Decomposition Problem
- INSTANCE: Graph and weight function
- SOLUTION: Disjoint subsets of vertices and so that and .
- MINIMIZE: total cost where or
👉 Note: the problem is
- Linear time solvable for bipartite graphs.
- Polynomial time solvable for planar graphs.
- But in general, NP-hard (even for tripartite graphs)
Graph-Theoretic Approach
- Q: How can we produce a high-quality result when the problem is NP-hard?
- A: Observe that the conflict graph is nearly planar: we can use Hadlock's algorithm.
- However, the time complexity of this method is still very high.
- Solution: graph division methods.
SPQR-Tree
Connected Graph
- Recall that a graph is connected if every pair of vertices in it is connected by a path.
- A graph can be divided into its connected components in linear time.
- Clearly, the color assignment problem can be solved independently for each connected component without affecting the QoR.
Bi-connected Graph
- A vertex is called a cut-vertex of a connected graph if removing it disconnects the graph.
- If no cut-vertex is found, then the graph is called a bi-connected graph.
- In the following example, , and are cut-vertices.
Bi-connected Components
- A division of the graph into its bi-connected components can be performed in linear time by using a simple depth-first search to identify cut-vertices (see the sketch below).
- It can be easily shown that the color assignment problem can be solved for each bi-connected component separately without affecting the QoR [@chiang_fast_2005].
- Question: Is it possible to further decompose the graph?
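A minimal sketch of splitting a conflict graph into its bi-connected components with NetworkX; the toy graph (two triangles sharing the cut-vertex "c") is made up:

```python
import networkx as nx

# Made-up conflict graph: two triangles sharing the cut-vertex "c".
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"),
                  ("c", "d"), ("d", "e"), ("e", "c")])

print(list(nx.articulation_points(G)))                     # -> ['c']
print([sorted(comp) for comp in nx.biconnected_components(G)])
# Each bi-connected component can now be colored independently.
```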
Tri-connected Graph
- If removing a pair of vertices disconnects the graph, the pair is called a separation pair.
- If no separation pair can be found, then the graph is called a tri-connected graph.
- In the following example, , , , and are separation pairs.
Tri-connected Graph Division
SPQR-tree
- A division of into its tri-connected components can be performed by identifying the separation pairs in linear time with the help of SPQR-tree [@gutwenger_linear_2001].
Skeleton
- Each tree node of the SPQR-tree is associated with a tri-connected component of the graph, called its skeleton.
- A skeleton represents a contraction of the graph based on a set of virtual edges.
- A skeleton is classified into one of four types:
- Series (S): the skeleton is a cycle graph.
- Parallel (P): the skeleton contains only two vertices connected by $k$ parallel edges, where $k \geq 3$.
- Trivial (Q): the skeleton contains only two vertices and two parallel edges between them, one of which is virtual and the other is real.
- Rigid (R): the skeleton is a tri-connected graph of a type other than the above.
Divide-and-Conquer Method
Divide-and-conquer method
Consists of three basic steps:
- Divide a conflict graph into its tri-connected components.
- Conquer each tri-connected component in a bottom-up manner.
- Combine the solutions into a complete solution in a top-down manner.
We calculate two possible solutions for each component, namely one with the same color and one with the opposite color.
Bottom-up Conquering: S Type
P Type
R Type
Top-down Merging
Node Splitting
- Node splitting (additional rectangle splitting) for resolving conflicts.
- To reduce the number of "cuts", we apply node splitting after one color assignment and then recolor.
.pull-left[
Before:
] .pull-right[
After:
]
class: middle, center
🧪 Experimental Results
45 nm SDFFRS_X2 Layer 11, 9
fft_all, 320K polygons
🧪 Experimental Results
#poly | #nodes/#edges | runtime w/ SPQR | runtime w/o SPQR | runtime reduction | cost |
---|---|---|---|---|---|
3631 | 31371/52060 | 13.29 | 38.25 | 65.3% | 4.58% |
9628 | 83733/138738 | 199.94 | 2706.12 | 92.6% | 2.19% |
18360 | 159691/265370 | 400.43 | 4635.14 | 91.4% | 1.18% |
31261 | 284957/477273 | 1914.54 | 9964.18 | 80.7% | 1.61% |
49833 | 438868/738759 | 3397.26 | 15300.9 | 77.8% | 1.76% |
75620 | 627423/1057794 | 3686.07 | 17643.9 | 79.1% | 2.50% |
: Experimental results of the runtime and cost reduction (while minimizing the number of stitches)