//#import "@preview/clean-acmart:0.0.1": acmart, acmart-ccs, acmart-keywords, acmart-ref, to-string
#import "clean-acmart.typ": acmart
#import "@preview/cetz:0.3.4"
|
||
#import "@preview/lilaq:0.3.0" as lq
|
||
#import "@preview/cetz:0.3.2"
|
||
#import "@preview/cetz-plot:0.1.1": chart as cetz_chart
|
||
|
||
#let title = [Dataflow Analysis for Compiler Optimization]
#let authors = (
(
name: "Matthias Veigel",
email: "matthias.veigel@uni-ulm.de",
department: [Institute of Software Engineering and Programming Languages],
institute: [Ulm University]
),
)

#show: acmart.with(
title: title,
authors: authors,
copyright: none
// Set review to submission ID for the review process or to "none" for the final version.
// review: [\#001],
)

#set figure(supplement: [Fig.])
#show figure.caption: it => [
#set text(size: 8pt)
*#it.supplement #context it.counter.display(it.numbering)*
#it.body
]
#show figure.where(kind: "raw"): set figure(supplement: [Listing])
#show figure.where(kind: "raw"): it => align(left)[
#v(8pt, weak: true)
#it.body
#v(4pt, weak: true)
#it.caption
#v(8pt, weak: true)
]

#set heading(numbering: "1.1.1")

= Abstract
Dataflow analysis is an important part of compiler optimization, since it enables code to be simplified and eliminated through techniques such as constant propagation, dead code elimination and branch elimination. This work examines the advantages and disadvantages of using dataflow analysis, how it is already used in current compilers, on which programming languages and intermediate representations it operates and what limitations still exist. \
For this purpose we conducted a systematic literature review in which we analyzed 15 publications selected from 571 entries. Finally, the following conclusions were drawn: // TODO

= Introduction
Program performance remains a major concern in modern computing, since it has a direct impact on user and developer experience. As software becomes more complex, manual optimization grows increasingly difficult for developers to implement.
Another problem with this increasing complexity is that large codebases are spread across more files, which makes it harder for developers to keep an overview and to implement optimizations. For these reasons, automatic optimization in compilers is needed. \
#figure(
caption: [C code and respective SSA form],
kind: "raw",
grid(
columns: (1fr, 1fr),
```C
int x = 8;
x = x - 2;
if (x < 4)
    x = 10;
else
    x = 12;
int y = x * 2;
```,
/*```
x₁ = 8
x₂ = x₁ - 2
if (x₂ < 4)
    x₃ = 10
else
    x₄ = 12
x₅ = ɸ(x₃, x₄)
y₁ = x₅ * 2
```*/
image("ssa-example.svg", height: 16em)
)
) <ssa_form_example>
Many modern compilers and analysis tools operate on a Static Single-Assignment (SSA) form @cooper_keith_d_engineering_nodate @cytron_efficiently_1991. In SSA form each variable is assigned exactly once, which is achieved by creating multiple subvariables $x_1, x_2, ...$ for each variable $x$. After a branch in the program, a #{sym.Phi}-node is used to select the new value of the variable based on the branch that was executed.
An example of the SSA form can be seen in @ssa_form_example. On the left is a simple piece of C code in a function body and on the right is its corresponding SSA form. \
Dataflow analysis is a technique used to gather information about the state of variables throughout the flow of the program. It plays an important role in many compilers: by analyzing where and how variables are assigned and used, many complex optimizations that require context from the surrounding code can be implemented.
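Concretely, such an analysis propagates sets of facts (for example, which definitions of a variable may reach a given program point) along the control-flow graph until a fixed point is reached. As a brief illustration using the standard textbook formulation @cooper_keith_d_engineering_nodate (not a result of this review), a forward analysis such as reaching definitions relates the facts entering and leaving a basic block $B$ by

$ "out"(B) = "gen"(B) union ("in"(B) without "kill"(B)) $
$ "in"(B) = union.big_(P in "pred"(B)) "out"(P) $

where $"gen"(B)$ and $"kill"(B)$ are the facts created and invalidated inside $B$, and the union over all predecessors models joining control flow. Other analyses, such as live variables or constant propagation, vary the fact domain, the transfer functions and the merge operator, but follow the same scheme.
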
Dataflow analysis is an evolving field in which new techniques are regularly developed and older techniques are improved. Different compilers and analysis frameworks implement different methods and optimizations with dataflow analysis. This work aims to summarize the current state and past achievements of this technology.
This work is structured as follows: in @methodology_c the methodology used to create this work is described. // TODO
// TODO explain constant propagation, ...
// TODO LLVM, GCC as examples

= Methodology <methodology_c>
This work was created following the process described in @process_fig. The review protocol is divided into the objective of the research (see @research_questions_s), the search strategy (see @sas_s), the selection criteria (see @selection_criteria_s) and the data extraction strategy (see @data_extraction_s).
#place(
bottom + center,
scope: "parent",
float: true,
[
#figure(
caption: [Overview of the review process. Adapted from @ciccozzi_execution_2019 and @gotz_claimed_2021.],
image("review_process.png")
) <process_fig>
]
)

== Objective and research questions <research_questions_s>
The goal of this research paper is to find claims about the advantages and disadvantages of using dataflow analysis (DFA) for compiler optimization and where DFA is already implemented in compilers.
This goal has been refined into two research questions:
- RQ1 --- What are the advantages and disadvantages of using dataflow analysis for compiler optimization? \
  This question aims to identify which advantages DFA has over other optimization techniques and which disadvantages it has when used.

- RQ2 --- How is dataflow analysis used in current compilers? \
  This question aims to identify how DFA is already used in current compilers, which optimizations are performed with it and whether it is applied during normal compilation or has to be explicitly enabled.

== Search and selection strategy <sas_s>
#figure(
caption: [Search string used in electronic databases],
kind: "raw",
align(left)[
// ("dataflow analysis" OR "data flow analysis") AND (compiler OR compilers OR compilation) AND (optimization OR optimizations) AND (advantages OR disadvantages OR strengths OR limitations OR trade-offs) AND (implementation OR usage OR used OR applied)
// ("Full Text .AND. Metadata":"dataflow analysis" OR "Full Text .AND. Metadata":"data flow analysis") AND ("Full Text .AND. Metadata":compiler OR "Full Text .AND. Metadata":compilers OR "Full Text .AND. Metadata":compilation) AND ("Full Text .AND. Metadata":optimization OR "Full Text .AND. Metadata":optimizations) AND ("Full Text .AND. Metadata":advantages OR "Full Text .AND. Metadata":disadvantages OR "Full Text .AND. Metadata":strengths OR "Full Text .AND. Metadata":limitations OR "Full Text .AND. Metadata":trade-offs) AND ("Full Text .AND. Metadata":implementation OR "Full Text .AND. Metadata":usage OR "Full Text .AND. Metadata":used OR "Full Text .AND. Metadata":applied)
#set raw(syntaxes: "search-string.sublime-syntax", theme: "search-string.tmTheme")
// AND ("compiler optimization" OR "compilation optimization" OR "compiler optimizations" OR "compilation optimizations" OR "optimizing compiler" OR "optimizing compilers")
```SearchString
("dataflow analysis" OR "data flow analysis")
AND (compiler OR compilers OR compilation)
AND (optimization OR optimizations)
AND (advantages OR disadvantages OR strengths OR limitations OR trade-offs)
AND (implementation OR usage OR used OR applied)
```
]
) <sas_search_string>
Our search strategy consisted of five steps, as seen in @sas_fig. \
The papers in the first step were collected from the electronic databases ACM Digital Library, IEEE Xplore and Springer Link using the search string shown in @sas_search_string.
The search string in @sas_search_string was derived from the research questions in @research_questions_s and was always applied to the full text of the papers. \
In the second step all duplicates which were returned from multiple databases were removed from the results and the number of results was limited to fit the scope of this paper. \
In the third step the selection was filtered by applying all selection criteria from @selection_criteria_s. \
In the fourth step we snowballed the previously acquired results. This was done to find relevant papers which were not included because of either the search string or the selection criteria. \
Afterwards, all papers found via snowballing were filtered again by applying the selection criteria in @selection_criteria_s. \
In the end, all papers from the third step and the papers found through snowballing were evaluated based on the data extraction items described in @data_extraction_s.
#place(
auto,
scope: "parent",
float: true,
[
#set par(leading: 0.3em)
#set text(size: 8pt)
#figure(
caption: [Search and selection process],
cetz.canvas({
import cetz.draw: *
let bs = (2.8, 1)

set-style(stroke: (thickness: 0.5pt))

rect((0, 0), (rel: bs), name: "acm")
rect((0, -(bs.at(1)+0.3)*1), (rel: bs), name: "ieee")
rect((0, -(bs.at(1)+0.3)*2), (rel: bs), name: "springer")
rect((bs.at(0)+1.5, -(bs.at(1)+0.3)), (rel: bs), name: "dup")
rect((bs.at(0)*2+2.25, -(bs.at(1)+0.3)), (rel: bs), name: "sel")
rect((bs.at(0)*3+3, -(bs.at(1)+0.3)), (rel: bs), name: "snow")
rect((bs.at(0)*4+3.75, -(bs.at(1)+0.3)), (rel: bs), name: "reap")
rect((bs.at(0)*5+4.25, -(bs.at(1)+0.3)), (rel: bs), name: "inc")

line("acm.east", (rel: (0.75, 0)), name: "dlu")
line("ieee.east", (rel: (0.75, 0)))
line("springer.east", (rel: (0.75, 0)), name: "dld")
line("dlu.end", "dld.end", name: "dl")

set-style(mark: (end: "straight"))
line("dl.50%", "dup.west")
line("dup.east", "sel.west")
line("sel.east", "snow.west")
line("snow.east", "reap.west")
line("reap.east", "inc.west")

content("acm", align(center)[ACM Digital Library \ n = 3594])
content("ieee", align(center)[IEEE Xplore \ n = 1720])
content("springer", align(center)[Springer Link \ n = 786])
content("dup", align(center)[Duplicate removal and \ preliminary filtering \ n = 471])
content("sel", align(center)[Application of \ selection criteria \ n = 10])
content("snow", align(center)[Snowballing \ n = 110])
content("reap", align(center)[Reapplication \ of selection criteria \ n = 15])
content("inc", align(center)[Publications included \ n = 15])
})
) <sas_fig>
]
)

== Selection criteria <selection_criteria_s>
For a publication to be relevant it has to satisfy at least one inclusion criterion and none of the exclusion criteria. The criteria were chosen to include as many publications as possible while still filtering out irrelevant ones.
#[
#v(10pt)
#set enum(numbering: (.., i) => "IC" + str(i))
+ Publications discussing advantages and disadvantages of DFA compared to other optimization techniques.
+ Publications focusing on one or more compilers (e.g., LLVM, Java JIT, C\# JIT).
+ Publications providing an implementation for a DFA optimization.
#v(10pt)
]
We chose _IC1_ to help answer _RQ1_. \
_IC2_ is to include publications which discuss a compiler and how DFA is implemented in it. \
_IC3_ is to further include publications which directly provide an implementation.
#[
#v(10pt)
#set enum(numbering: (.., i) => "EC" + str(i))
+ Publications which discuss DFA in a non-compiler context.
+ Publications written in a language other than English.
+ Secondary and tertiary publications (e.g., systematic literature reviews, surveys).
+ Publications in the form of tutorial papers, short papers, poster papers or editorials.
+ Publications for which the full text is not available.
#v(10pt)
]
_EC1_ is to exclude publications which discuss DFA in contexts that are not relevant to compiler optimization. \
_EC2-EC5_ are to exclude publications which do not provide enough information to be included in this work.

== Data extraction <data_extraction_s>
Based on the research questions we collected 9 data items to extract from all included publications. @data_extraction_table lists all data items. \
Data items _D1-D3_ are to document the source of each publication. \
_D4_ and _D5_ are to explicitly list the advantages and disadvantages for answering _RQ1_. \
_D6_ and _D7_ show in which compiler DFA was implemented and whether it runs directly on a programming language like C++ or on an intermediate representation like LLVM IR. \
_D8_ lists which optimizations were performed based on the results of the DFA and _D9_ lists the limitations of the performed analysis (e.g., analysis restricted to function scope). \
All data items were extracted from the full text of all included publications.
#place(
auto,
scope: "parent",
float: true,
[
#set par(leading: 0.3em)
#set text(size: 9pt)
#figure(
caption: [Data items],
supplement: "Table",
table(
columns: (1fr, 8fr, 2fr),
stroke: (x, y) => if y == 0 { (bottom: 0.7pt + black) },
align: left,
inset: (x: 6pt, y: 2pt),
[ID], [Data], [Purpose],
..(
([Author(s)], [Documentation]),
([Publication year], [Documentation]),
([Title], [Documentation]),
([Named advantage(s) of DFA for CO], [RQ1]),
([Named disadvantage(s) of DFA for CO], [RQ1]),
([Analyzed compiler(s)], [RQ2]),
([Targeted language(s) of the optimization], [RQ2]),
([What optimizations are implemented with DFA], [RQ2]),
([Limitations of the analysis], [RQ2])
).enumerate(start: 1).map(((i, arr)) => ([D#i], ..arr)).flatten()
)
) <data_extraction_table>
]
)

= Findings
In this section we present our findings from the conducted systematic literature review.

== Demographic
#v(1em, weak: true)
#figure(
caption: "Publication years of the publications",
{
let data = (
(1973, 1),
(1997, 1),
(2010, 2),
(2011, 2),
(2012, 1),
(2013, 2),
(2015, 1),
(2018, 1),
(2019, 1),
(2020, 2),
(2024, 1)
)
lq.diagram(
width: 8.5cm,
xlim: (1972, 2026),
ylim: (0, 2.5),
yaxis: (subticks: none, ticks: range(0, 3)),
xaxis: (ticks: range(1975, 2026, step: 5)),
lq.bar(
data.map(v => v.at(0)),
data.map(v => v.at(1))
)
)
}
) <demographic_pub_year>
#figure(
caption: "Target languages of the publications",
{
let data = (
("None", 1),
("Custom", 1),
("C", 3),
("LLVM IR", 5),
("Java Bytecode", 2),
("Graal IR", 1),
("SSA of Java", 2)
)

cetz.canvas({
//let colors = (red, eastern, green, blue, navy, purple, maroon, orange)
let colors = gradient.linear(..color.map.rainbow.map(v => v.darken(20%).saturate(20%)))

cetz_chart.piechart(
data,
value-key: 1,
label-key: 0,
radius: 3,
slice-style: colors,
inner-radius: 0,
inner-label: (content: (value, _) => [#text(white, str(value))], radius: 150%),
outer-label: (content: (value, _) => [], radius: 0%),
legend: (
position: "east",
anchor: "south",
orientation: ttb,
offset: (1.7cm, -2.5cm)
)
)
})
}
) <demographic_target_lang>
As seen in @demographic_pub_year, most of the analyzed publications are from the last 15 years, which indicates that this field is still being actively researched and explored, although research on it dates back to 1973. \
@demographic_target_lang shows a strong trend towards implementing DFA optimizations either with LLVM directly or by operating on the LLVM IR, while Java is targeted either directly as bytecode or through an SSA representation of Java.

== RQ1: Advantages and disadvantages of using dataflow analysis for compiler optimization
DFA makes many significant compiler optimizations possible, but it also brings trade-offs, and not only with respect to performance.
These optimizations eliminate unused code and simplify expressions, which reduces execution time and memory footprint at runtime.
[*P1*] is one of the first publications discussing DFA and how it allows previously existing optimizations, which could only be applied to code sections without branches, to be used in the presence of branching by tracking how data flows through the branches. Later publications [*P2*, *P5*] describe ways to apply these optimizations interprocedurally and across thread synchronization boundaries. However, [*P5*] also notes that programs must be well synchronized, since otherwise DFA cannot be used because of possible data races. \
=== Analysis performance
While performance is not the biggest concern for DFA, since it runs at compile time and accuracy is more important [*P4*], many publications [*P4*, *P6*, *P14*, *P15*] have investigated how to improve the performance of DFA. This is done with several techniques: in [*P4*, *P6*] different function calls are analyzed on different threads, but this has the problem of creating and queuing a task for each function, which can lead to a large overhead. In [*P6*] independent branches are also analyzed on separate threads. A major problem with both approaches is avoiding that the same function is queued for analysis by more than one thread, which would lead to unnecessary redundancy. \
Another approach [*P14*] is to pipeline the analysis of function calls. First, all variables that do not depend on any function call are analyzed. Once a function call has finished being analyzed, the variables that depend on it are analyzed as well. This allows more work to be done in parallel.
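
To make the work that these approaches distribute more concrete, the following minimal sketch shows a sequential worklist iteration for a forward gen/kill analysis over a single function's control-flow graph. The four-block CFG, the bitmask encoding and all gen/kill values are invented for this illustration and are not taken from any of the analyzed publications.
```C
#include <stdio.h>

#define NBLOCKS 4

// Hypothetical diamond CFG: B0 branches to B1/B2, both join in B3.
// Successor lists are terminated by -1.
static const int succ[NBLOCKS][3] = {
  {1, 2, -1},
  {3, -1, -1},
  {3, -1, -1},
  {-1, -1, -1},
};

// One bit per definition; the gen/kill sets are made up
// purely for this example.
static const unsigned gen[NBLOCKS]  = {0x1, 0x2, 0x4, 0x8};
static const unsigned kill[NBLOCKS] = {0x0, 0x1, 0x1, 0x6};

int main(void) {
  unsigned in[NBLOCKS] = {0}, out[NBLOCKS] = {0};
  int worklist[64], top = 0; // generously sized for this tiny CFG
  for (int b = 0; b < NBLOCKS; b++) worklist[top++] = b;

  while (top > 0) {
    int b = worklist[--top];
    // transfer function of block b
    unsigned new_out = gen[b] | (in[b] & ~kill[b]);
    if (new_out == out[b]) continue; // nothing new to propagate
    out[b] = new_out;
    for (int i = 0; succ[b][i] != -1; i++) {
      in[succ[b][i]] |= out[b];     // merge at control-flow joins
      worklist[top++] = succ[b][i]; // successor must be revisited
    }
  }

  for (int b = 0; b < NBLOCKS; b++)
    printf("B%d: in=%#x out=%#x\n", b, in[b], out[b]);
  return 0;
}
```
The approaches above differ mainly in how the per-function instances of such an iteration are scheduled: [*P4*, *P6*] distribute them across threads, while [*P14*] orders them so that dependent results are analyzed as soon as they become available.
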
=== Implementation complexity
Another problem with DFA is the difficulty of implementing optimizations with it [*P3*, *P11*]. DFA is often deeply entangled with the compiler internals, which makes it difficult to reuse existing optimizations in other compilers or to implement new optimizations quickly, as seen in LLVM: "simple peephole optimizations in the LLVM instcombine pass contain approximately 30000 lines of complex C++ code, despite the transformations being simple" [*P11*]. \
One solution to this problem is described in [*P3*]: a library implemented in Haskell which performs the dataflow analysis and provides an interface, which "is made possible by sophisticated aspects of Haskell’s type system, such as higher-rank polymorphism, GADTs, and type functions" [*P3*], for implementing various optimizations, which can then also be reused in other compilers. The biggest drawback of this library is that it is limited to compilers implemented in Haskell. \
[*P11*] describes a domain-specific language for implementing LLVM optimization passes. A simple language is used to directly express the logic of the optimization, and a custom transpiler then converts it into an LLVM pass written in C++. Since the generated pass is implemented in a more generic way to fit this purpose, it leads to a moderate compile-time increase. Furthermore, no formal verification is performed on the implemented optimization pass. Because of these disadvantages it is a great tool to quickly implement, test and iterate on optimizations, but for more permanent passes hand-written C++ code should be used.

== RQ2: Usage of dataflow analysis in current compilers
The Glasgow Haskell Compiler (GHC), LLVM and GCC are good examples of compilers which already use DFA extensively to implement optimizations.
These optimizations include common sub-expression elimination [*P1*, *P7*, *P13*], copy propagation [*P5*, *P7*], constant propagation [*P1*], conditional branch elimination [*P2*] and dead code elimination [*P13*].
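
As a small constructed illustration (not taken from any of the analyzed publications) of how the facts gathered by DFA enable such rewrites, consider the following function and the version a compiler can reduce it to once constant propagation, conditional branch elimination and dead code elimination have been applied:
```C
// Before: every value is statically known once constants are propagated.
int before(void) {
    int a = 4;
    int b = a * 2;      // constant propagation: b is always 8
    int unused = b + 1; // dead code: never read afterwards
    if (b < 5)          // condition is known to be false
        return 0;
    return b;
}

// After constant propagation, branch elimination and dead code elimination.
int after(void) {
    return 8;
}
```
DFA provides exactly the justification for each step: that `b` always holds 8 at the comparison, that the branch can therefore never be taken, and that `unused` is never read.
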
// TODO

= Conclusion
Our findings show that DFA is already used extensively in current compilers and brings large advantages in runtime speed. The cost of this is a longer compilation time, which can make it less suitable for JIT compilation. Furthermore, DFA allows complex optimizations across branches and function boundaries which would not be possible with traditional straight-line optimizations. \
The high implementation complexity and the deep entanglement with compiler internals also pose a big problem for advancing this field further.
The recent release of new publications on this topic indicates that researchers are continuously searching for better and faster ways to implement DFA and to make better use of the analysis results. \
The adaptability of LLVM and its associated intermediate representation makes it an invaluable platform for testing and research on DFA.

#pagebreak(weak: true)
#set heading(numbering: "A.a.a")
#counter(heading).update(0)

#set page(flipped: true, columns: 1)
= SLR Results
#{
set table(stroke: (x, _) => if x in (1, 4, 6) { (x: 2pt, y: 1pt) } else { 1pt })
table(
columns: (auto, auto, auto, auto, auto, auto, 6em, 4em, auto, auto),
inset: (x: 5pt, y: 3pt),
..csv("pubs.csv").flatten()
)
}

#set page(flipped: false, columns: 2)
#pagebreak(weak: true)
#set heading(numbering: none)
#bibliography("refs.bib", title: "References", style: "association-for-computing-machinery")