diff --git a/cc-by.svg b/cc-by.svg
deleted file mode 100644
index e44c25f..0000000
--- a/cc-by.svg
+++ /dev/null
@@ -1,155 +0,0 @@
diff --git a/main.typ b/main.typ
index 76edf3b..be0902f 100644
--- a/main.typ
+++ b/main.typ
@@ -1,6 +1,9 @@
 //#import "@preview/clean-acmart:0.0.1": acmart, acmart-ccs, acmart-keywords, acmart-ref, to-string
 #import "clean-acmart.typ": acmart
-#import "@preview/cetz:0.3.4"
+#import "@preview/lilaq:0.3.0" as lq
+#import "@preview/cetz:0.3.2"
+#import "@preview/cetz-plot:0.1.1": chart as cetz_chart

 #let title = [Dataflow Analysis for Compiler Optimization]
 #let authors = (
@@ -38,14 +41,50 @@
 #set heading(numbering: "1.1.1")

 = Abstract
-// define DFA and CO here or in introduction
-todo
+Dataflow analysis is an important part of compiler optimization, since it allows the compiler to eliminate parts of the code with techniques such as constant propagation, dead code elimination and branch elimination. This work examines the advantages and disadvantages of using dataflow analysis, how it is already used in current compilers, on which programming languages / intermediate representations it operates and what limitations still exist. \
+For this purpose we conducted a systematic literature review in which we analyzed 15 publications selected from 471 entries. Finally, the following conclusions were drawn: // TODO
+
 = Introduction
-todo
+Program performance remains a major concern in modern computing and programming, since it has a direct impact on user and developer experience. As software becomes more complex, manual optimization becomes increasingly difficult for developers to implement.
+Another problem with this increasing complexity is that large codebases are spread out over more files, which also makes it harder for developers to keep an overview and to implement optimizations. For these reasons, automatic optimization in compilers is needed. \
+#figure(
+  caption: [C code and respective SSA form],
+  kind: "raw",
+  grid(
+    columns: (1fr, 1fr),
+    ```C
+    int x = 8;
+    x = x - 2;
+    if (x < 4)
+        x = 10;
+    else
+        x = 12;
+    int y = x * 2;
+    ```,
+    /*```
+    x₁ = 8
+    x₂ = x₁ - 2
+    if (x₂ < 4)
+        x₃ = 10
+    else
+        x₄ = 12
+    x₅ = ɸ(x₃, x₄)
+    y₁ = x₅ * 2
+    ```*/
+    image("ssa-example.svg", height: 16em)
+  )
+) <ssa_form_example>
+Many modern compilers and analysis tools operate on a Static Single-Assignment (SSA) form @cooper_keith_d_engineering_nodate @cytron_efficiently_1991. The SSA form works by assigning each variable only once. This is done by creating multiple subvariables $x_1, x_2, ...$ for each variable $x$. After a branch in the program, a #{sym.Phi}-node is used to select the new value of the variable based on the branch executed.
+An example of the SSA form can be seen in @ssa_form_example. On the left is simple C code in a function body and on the right is the respective SSA form of that code. \
+Dataflow analysis is a technique used to gather information about the state of variables throughout the flow of the program. It plays an important role in many compilers: by analyzing how, where and which variables are assigned and how these variables are used, many complex optimizations that require context from the surrounding code can be implemented. \
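+One such optimization is constant propagation, usually combined with branch elimination and dead code elimination. @const_prop_example shows a constructed example (ours, not taken from any of the analyzed publications): the analysis proves that `b` always holds the value 8, so the condition `b < 4` is always false, the dead branch is removed and the remaining unused assignments are eliminated, folding the whole function into a single constant return.
+#figure(
+  caption: [Constant propagation, branch elimination and dead code elimination applied to a C function (constructed example)],
+  kind: "raw",
+  grid(
+    columns: (1fr, 1fr),
+    ```C
+    int f(void) {
+        int a = 4;
+        int b = a * 2; // always 8
+        if (b < 4)     // always false
+            return a;
+        return b;
+    }
+    ```,
+    ```C
+    int f(void) {
+        return 8;
+    }
+    ```
+  )
+) <const_prop_example>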
+Dataflow analysis is an evolving field in which new techniques are regularly created and older techniques improved. Different compilers and analysis frameworks implement different methods and optimizations with dataflow analysis. This work aims to summarize the current state and past achievements of this technology.
+This work is divided into the following sections: in @methodology_c, the methodology used to create this work is described. // TODO
+// TODO LLVM, GCC as examples

-= Methodology
-This publication is created following the process described in @process_fig. The protocol for the review is divided up into the object of the research see @research_questions_s, the search strategy see @sas_s, the selection criteria see @selection_criteria_s and the data extraction strategy see @data_extraction_s.
+= Methodology
+This work was created following the process described in @process_fig. The protocol for the review is divided into the object of the research (see @research_questions_s), the search strategy (see @sas_s), the selection criteria (see @selection_criteria_s) and the data extraction strategy (see @data_extraction_s).
 #place(
   bottom + center,
   scope: "parent",
   [
@@ -68,7 +107,6 @@ This goal has been defined in two research questions:
 This question aims to identify how DFA is already used in current compilers, what optimizations are done with it and if it is used during normal compilation or if it has to be explicitly enabled.

 == Search and selection strategy
-Our search strategy consisted of 4 steps as seen in @sas_fig. \
 #figure(
   caption: [Search string used in electronic databases],
   kind: "raw",
   [
     ```
     ```
   ]
 )
+Our search strategy consisted of 5 steps as seen in @sas_fig. \
 The papers from the first step are collected from the electronic databases ACM Digital Library, IEEE Xplore and Springer Link with the search string seen in @sas_search_string. The search string in @sas_search_string was created using the research questions in @research_questions_s and was always applied to the full text of the papers. \
-In the second step all duplicates which where returned from multiple databases where removed from the results. \
+In the second step all duplicates which were returned from multiple databases were removed from the results, and the amount was limited to fit the scope of this paper. \
 In the third step the selection was filtered by applying all selection criteria from @selection_criteria_s. \
-In the fourth step I snowballed the previously acquired results. This was to find relevant papers which where not included because of either the search string or the search criteria. \
-Afterwards all papers of the snowballing where evaluated based on the data extraction items mentioned in @data_extraction_s.
+In the fourth step we snowballed the previously acquired results. This was done to find relevant papers which were not included because of either the search string or the selection criteria. \
+Afterwards all papers found via the snowballing were filtered again by applying the selection criteria in @selection_criteria_s. \
+In the end all papers from the third step and the papers from the snowballing were evaluated based on the data extraction items mentioned in @data_extraction_s.
 #place(
   auto,
   scope: "parent",
   [
@@ -113,7 +153,8 @@ Afterwards all papers of the snowballing where evaluated based on the data extra
     rect((bs.at(0)+1.5, -(bs.at(1)+0.3)), (rel: bs), name: "dup")
     rect((bs.at(0)*2+2.25, -(bs.at(1)+0.3)), (rel: bs), name: "sel")
     rect((bs.at(0)*3+3, -(bs.at(1)+0.3)), (rel: bs), name: "snow")
-    rect((bs.at(0)*4+3.75, -(bs.at(1)+0.3)), (rel: bs), name: "inc")
+    rect((bs.at(0)*4+3.75, -(bs.at(1)+0.3)), (rel: bs), name: "reap")
+    rect((bs.at(0)*5+4.25, -(bs.at(1)+0.3)), (rel: bs), name: "inc")

     line("acm.east", (rel: (0.75, 0)), name: "dlu")
     line("ieee.east", (rel: (0.75, 0)))
@@ -124,15 +165,17 @@
     line("dl.50%", "dup.west")
     line("dup.east", "sel.west")
     line("sel.east", "snow.west")
-    line("snow.east", "inc.west")
+    line("snow.east", "reap.west")
+    line("reap.east", "inc.west")

-    content("acm", align(center)[ACM Digital Library \ n = ])
-    content("ieee", align(center)[IEEE Xplore \ n = ])
-    content("springer", align(center)[Springer Link \ n = ])
-    content("dup", align(center)[Duplicate removal \ n = ])
-    content("sel", align(center)[Application of \ selection criteria \ n = ])
-    content("snow", align(center)[Snowballing \ n = ])
-    content("inc", align(center)[Publications included \ n = ])
+    content("acm", align(center)[ACM Digital Library \ n = 3594])
+    content("ieee", align(center)[IEEE Xplore \ n = 1720])
+    content("springer", align(center)[Springer Link \ n = 786])
+    content("dup", align(center)[Duplicate removal and \ preliminary filtering \ n = 471])
+    content("sel", align(center)[Application of \ selection criteria \ n = 10])
+    content("snow", align(center)[Snowballing \ n = 110])
+    content("reap", align(center)[Reapplication \ of selection criteria \ n = 15])
+    content("inc", align(center)[Publications included \ n = 15])
   })
 )
]
@@ -156,15 +199,13 @@ _IC3_ is to further include publications which directly provide an implementatio
   #set enum(numbering: (.., i) => "EC" + str(i))
   + Publications which discuss DFA in a non-compiler context.
   + Publications written in a language other than English.
-  + Secondary and tertiary publications (e.g., systematic literaturer reviews, surveys).
+  + Secondary and tertiary publications (e.g., systematic literature reviews, surveys).
   + Publications in the form of tutorial papers, short papers, poster papers, editorials.
   + Publications for which the full text is not available.
-  + Publications published before 2010.
   #v(10pt)
 ]
 _EC1_ is to exclude publications which discuss DFA in contexts not relevant to compiler optimization. \
-_EC2-EC5_ are to exclude publications which do not provide enough information to include them in this publication. \
-_EC6_ is to make sure the publications are still relevant.
+_EC2-EC5_ are to exclude publications which do not provide enough information to include them in this publication.

 == Data extraction
 Based on the research questions we collected 9 data items to extract from all included publications. @data_extraction_table lists all data items. \
@@ -205,15 +246,121 @@ All data items were extracted from the full text of all included publications.
   ]
 )

-#colbreak()
+= Findings
+In this chapter we present our findings from the conducted systematic literature review.
+
+== Demographics
+#v(1em, weak: true)
+#figure(
+  caption: "Publication years of the publications",
+  {
+    let data = (
+      (1973, 1),
+      (1997, 1),
+      (2010, 2),
+      (2011, 2),
+      (2012, 1),
+      (2013, 2),
+      (2015, 1),
+      (2018, 1),
+      (2019, 1),
+      (2020, 2),
+      (2024, 1)
+    )
+    lq.diagram(
+      width: 8.5cm,
+      xlim: (1972, 2026),
+      ylim: (0, 2.5),
+      yaxis: (subticks: none, ticks: range(0, 3)),
+      xaxis: (ticks: range(1975, 2026, step: 5)),
+      lq.bar(
+        data.map(v => v.at(0)),
+        data.map(v => v.at(1))
+      )
+    )
+  }
+) <demographic_pub_year>
+#figure(
+  caption: "Target languages of the publications",
+  {
+    let data = (
+      ("None", 1),
+      ("Custom", 1),
+      ("C", 3),
+      ("LLVM IR", 5),
+      ("Java Bytecode", 2),
+      ("Graal IR", 1),
+      ("SSA of Java", 2)
+    )
+
+    cetz.canvas({
+      //let colors = (red, eastern, green, blue, navy, purple, maroon, orange)
+      let colors = gradient.linear(..color.map.rainbow.map(v => v.darken(20%).saturate(20%)))
+
+      cetz_chart.piechart(
+        data,
+        value-key: 1,
+        label-key: 0,
+        radius: 3,
+        slice-style: colors,
+        inner-radius: 0,
+        inner-label: (content: (value, _) => [#text(white, str(value))], radius: 150%),
+        outer-label: (content: (value, _) => [], radius: 0%),
+        legend: (
+          position: "east",
+          anchor: "south",
+          orientation: ttb,
+          offset: (1.7cm, -2.5cm)
+        )
+      )
+    })
+  }
+) <demographic_target_lang>
+As seen in @demographic_pub_year, most of the analyzed publications are from the last 15 years, which indicates that this field is still actively being researched and explored, but research already started back in 1973. \
+@demographic_target_lang shows a strong trend towards implementing DFA optimizations either with LLVM directly or by operating on the LLVM IR, while Java is targeted either directly as bytecode or as an SSA representation of Java.
+
+== RQ1: Advantages and disadvantages of using dataflow analysis for compiler optimization
+DFA makes many powerful compiler optimizations possible, but it also brings trade-offs, and not just in performance.
+These optimizations eliminate unused code and simplify expressions, which reduces execution time and memory footprint during runtime.
+[*P1*] is one of the first publications on DFA and describes how it allows previously existing optimizations, which could only be applied to code sections without branches, to be used in the presence of branching by checking how data flows through the branches. Later publications [*P2*, *P5*]
+describe ways to apply these optimizations interprocedurally and across thread synchronization boundaries. However, [*P5*] also notes that programs must be well synchronized, as otherwise DFA cannot be used because of possible data races. \
+=== Analysis performance
+While performance is not the biggest concern for DFA, since it runs at compile time and accuracy is more important [*P4*], many publications [*P4*, *P6*, *P14*, *P15*] have investigated how to improve the performance of DFA. This is done with several techniques: in [*P4*, *P6*] different function calls are analyzed on different threads, but this has the problem of creating and queuing a task for each function, which can lead to significant overhead. In [*P6*] independent branches are also analyzed on separate threads. A major challenge with both approaches is avoiding that the same function is queued for analysis by more than one thread, which would lead to unnecessary redundant work. \
+Another approach [*P14*] is to pipeline the function calls: all variables which do not depend on any function call are analyzed first; once the called functions have been analyzed, the variables which depend on those calls are analyzed as well. Thereby more of the analysis can run in parallel.
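+The following sketch illustrates one way the duplicate-queuing problem can be avoided; it is a constructed example built around an atomic per-function ownership flag, not code taken from [*P4*] or [*P6*]:
+#figure(
+  caption: [Avoiding duplicate queuing with an atomic claim flag (constructed example)],
+  kind: "raw",
+  ```C
+  // Constructed sketch, not from P4/P6: each function carries an
+  // atomic flag so that at most one worker thread ever analyzes it.
+  #include <stdatomic.h>
+  #include <stdbool.h>
+
+  typedef struct Function {
+      atomic_bool claimed;       // set once a worker owns this function
+      struct Function **callees; // functions called by this one
+      int n_callees;
+  } Function;
+
+  // True if the calling worker won the race to analyze f.
+  static bool try_claim(Function *f) {
+      bool expected = false;
+      return atomic_compare_exchange_strong(&f->claimed, &expected, true);
+  }
+
+  static void analyze(Function *f, void (*enqueue)(Function *)) {
+      if (!try_claim(f))
+          return; // another worker already analyzes this function
+      // ... run the intraprocedural dataflow analysis on f ...
+      for (int i = 0; i < f->n_callees; i++)
+          enqueue(f->callees[i]); // hand callees to the worker pool
+  }
+  ```
+)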
+=== Implementation complexity
+Another problem with DFA is the difficulty of implementing optimizations with it [*P3*, *P11*]. DFA is often deeply entangled with the compiler internals, which makes it difficult to reuse existing optimizations in other compilers or to implement new optimizations quickly, and it is complicated to implement, as seen in LLVM: "simple peephole optimizations in the LLVM instcombine pass contain approximately 30000 lines of complex C++ code, despite the transformations being simple" [*P11*]. \
+One solution to this problem is described in [*P3*]: a library in Haskell which performs the dataflow analysis and provides an interface, which "is made possible by sophisticated aspects of Haskell’s type system, such as higher-rank polymorphism, GADTs, and type functions" [*P3*], to implement various optimizations, which can then be reused in other compilers. The biggest drawback of this library is that it is limited to compilers implemented in Haskell. \
+[*P11*] describes a domain specific language to implement LLVM optimization passes. This is done by having a simple language for directly implementing the logic of the optimization, while a custom transpiler then converts it into an LLVM pass written in C++. Since the LLVM pass is implemented in a more generic way to fit this purpose, it leads to a moderate compile time increase, and no formal verification is done on the implemented optimization pass. Because of these disadvantages it is a great tool to quickly implement, test and iterate on optimizations, but for more permanent passes, hand-written C++ code should be used.
+
+== RQ2: Usage of dataflow analysis in current compilers
+The Glasgow Haskell Compiler (GHC), LLVM and GCC are good examples of compilers which already extensively use DFA to implement optimizations.
+These optimizations include common sub-expression elimination [*P1*, *P7*, *P13*], copy propagation [*P5*, *P7*], constant propagation [*P1*], conditional branch elimination [*P2*] and dead code elimination [*P13*].
+// TODO
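+As an illustration of one of these, @cse_example shows a constructed example of common sub-expression elimination (ours, not taken from the analyzed publications): the analysis proves that both occurrences of `a + b` compute the same value, so the redundant second computation is replaced by a reuse of the first.
+#figure(
+  caption: [Common sub-expression elimination applied to a C function (constructed example)],
+  kind: "raw",
+  grid(
+    columns: (1fr, 1fr),
+    ```C
+    int g(int a, int b) {
+        int x = a + b;
+        int y = a + b; // redundant
+        return x * y;
+    }
+    ```,
+    ```C
+    int g(int a, int b) {
+        int x = a + b;
+        return x * x;
+    }
+    ```
+  )
+) <cse_example>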
+
+= Conclusion
+Our findings show that DFA is already extensively used in current compilers and brings large advantages for runtime speed. The cost of this is a longer compilation time, which limits its usability for JIT compilation. Furthermore, DFA allows complex optimizations across branches and function boundaries which would not be possible with traditional straight-line optimizations. \
+The high implementation complexity and the deep entanglement with the compiler internals also pose a significant problem for advancing this field further.
+The recent release of new publications on this topic indicates that researchers are continuously searching for better and faster ways to implement DFA and to make better use of the analysis results. \
+The adaptability of LLVM and the associated intermediate representation makes it an invaluable platform for testing and research on DFA.
+
+#pagebreak(weak: true)
+#set heading(numbering: "A.a.a")
+#counter(heading).update(0)
+
+#set page(flipped: true, columns: 1)
+= SLR Results
+#{
+  set table(stroke: (x, _) => if x in (1, 4, 6) { (x: 2pt, y: 1pt) } else { 1pt })
+  table(
+    columns: (auto, auto, auto, auto, auto, auto, 6em, 4em, auto, auto),
+    inset: (x: 5pt, y: 3pt),
+    ..csv("pubs.csv").flatten()
+  )
+}
+
+#set page(flipped: false, columns: 2)
+#pagebreak(weak: true)

 #set heading(numbering: none)
 #bibliography("refs.bib", title: "References", style: "association-for-computing-machinery")
-/*
-#colbreak(weak: true)
-#set heading(numbering: "A.a.a")
-
-= Artifact Appendix
-In this section we show how to reproduce our findings.
-*/
diff --git a/pubs.csv b/pubs.csv
new file mode 100644
index 0000000..1df21a8
--- /dev/null
+++ b/pubs.csv
@@ -0,0 +1,16 @@
+ID,D1,D2,D3,D4,D5,D6,D7,D8,D9
+P1,"Kildall, Gary A.",1973,A unified approach to global program optimization,Allows straight-line optimization techniques for branch structure,,General Techniques,None,"Constant Propagation, Common Subexpression Elimination, Register Optimization",
+P2,Rastislav Bodik; Rajiv Gupta; Mary Lou Soffa,1997,Interprocedural conditional branch elimination,Reduction of instruction count,Exponential/Polynomial worst-case time complexity,ICC,C,"Conditional Branch Elimination, Elimination of correlated conditionals and operations",
+P3,"Ramsey, Norman; Dias, João; Peyton Jones, Simon",2010,"Hoopl: a modular, reusable library for dataflow analysis and transformation",Reusable library for DFA,"DFA typically entangled with compiler, Algorithms complicated and hard to understand","Library, used by GHC",Custom,"Interleaved analysis and rewriting, speculative rewriting, computing fixed points, dynamic fault isolation",Only usable from Haskell
+P4,"Edvinsson, Marcus; Löwe, Welf",2010,A multi-threaded approach for data-flow analysis,Accuracy more important than speed since at compile time,"Low usability for JIT because of time consumption, DFA is computation intense, DFA often implemented sequentially",Custom,"SSA, Java","High speed-up for analysis, for benchmarks without benefit max loss of 13% speed",Only speed-up of 1.78 on 8 cores
+P5,"Joisha, Pramod G.; Schreiber, Robert S.; Banerjee, Prithviraj; Boehm, Hans J.; Chakrabarti, Dhruva R.",2011,A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code,,"Sequential transformations can not be applied to parallel code, Need to watch out for data races and data synchronization",GCC,C,"Bidirectional DFA across synchronizations in well-synchronized programs, Can reuse existing optimizations, Value numbering, Copy propagation",
+P6,"Edvinsson, Marcus; Lundberg, Jonas; Löwe, Welf",2011,Parallel points-to analysis for multi-core machines,Points-To Analysis analyses whole program,SSA nodes in Points-to SSA are sequentially dependent,Custom,"SSA, Java",Parallel Points-to Analysis,
+P7,"Tang, Xiaolong; Järvi, Jaakko",2012,Summary-based data-flow analysis that understands regular composite objects and iterators,,Hard to make assumptions about user-defined types,LLVM,LLVM IR,"Common Sub-expression elimination, Copy propagation, Equational reasoning",
+P8,"Urban, Bernhard; Steinlechner, Harald",2013,Implementing a Java JIT compiler in Haskell: case study,,,Custom JIT,Java Bytecode,Liveness Analysis,
+P9,"Duboscq, Gilles; Stadler, Lukas; Würthinger, Thomas; Simon, Doug; Wimmer, Christian; Mössenböck, Hanspeter",2013,Graal IR: An Extensible Declarative Intermediate Representation,Easier optimization implementation with Graph-IR,,"GraalVM, uses P3",Java Bytecode,IR which is simple to run optimizations on,Not implemented: commutative edges on nodes for better congruent detection
+P10,"Zaidi, Ali Mustafa; Greaves, David",2015,Value State Flow Graph: A Dataflow Compiler IR for Accelerating Control-Intensive Code in Spatial Hardware,Performance improvement through execution of dataflow graph,,Custom LLVM Backend,LLVM IR,,"Structs, multidimensional arrays not supported"
+P11,"Ginsbach, Philip; Crawford, Lewis; O'Boyle, Michael F. P.",2018,CAnDL: a domain specific language for compiler analysis,DSL for optimization implementation makes implementation simpler and iterations quicker,"Optimizations are hard to implement in LLVM, Simple peephole optimization is 30000 LOC",DSL to LLVM Pass,LLVM IR,,"Moderate compile time increase, no formal verification"
+P12,"Pathade, Komal; Khedker, Uday P.",2019,Path sensitive MFP solutions in presence of intersecting infeasible control flow path segments,,Path insensitive solutions overapproximate data flow values,TCS Embedded Code Analyzer,C,"Reaching Definition, Def-Use Pairs, Uninitialized Variables, 300% precision increase",100% analysis time increase
+P13,"Reissmann, Nico; Meyer, Jan Christian; Bahmann, Helge; Själander, Magnus",2020,RVSDG: An Intermediate Representation for Optimizing Compilers,,Structures like loops not encoded in SSA,Custom,LLVM IR,"Common Node Elimination, Dead Node Elimination",
+P14,"Shi, Qingkai; Zhang, Charles",2020,Pipelining bottom-up data flow analysis,,Calling dependence limits parallelism of bottom-up DFA,Custom based on LLVM,LLVM IR,2x to 3x speedup by relaxing calling dependence,Inline assembly and C++ STL not modeled
+P15,"Aigner, Christoph; Barany, Gergö; Mössenböck, Hanspeter",2024,Lazy Sparse Conditional Constant Propagation in the Sea of Nodes,,Detecting all compile-time constants is an undecidable problem,GraalVM,Sea of Nodes / Graal IR,Lazy iteration to reduce portion of necessary graph,
diff --git a/refs.bib b/refs.bib
index 5bb4e29..a1679a7 100644
--- a/refs.bib
+++ b/refs.bib
@@ -34,3 +34,28 @@
   year = {2021},
   pages = {469--503},
 }
+
+@article{cytron_efficiently_1991,
+  title = {Efficiently computing static single assignment form and the control dependence graph},
+  volume = {13},
+  issn = {0164-0925, 1558-4593},
+  url = {https://dl.acm.org/doi/10.1145/115372.115320},
+  doi = {10.1145/115372.115320},
+  language = {en},
+  number = {4},
+  urldate = {2025-05-31},
+  journal = {ACM Transactions on Programming Languages and Systems},
+  author = {Cytron, Ron and Ferrante, Jeanne and Rosen, Barry K. and Wegman, Mark N. and Zadeck, F. Kenneth},
+  month = oct,
+  year = {1991},
+  pages = {451--490},
+}
+
+@book{cooper_keith_d_engineering_nodate,
+  edition = {2nd},
+  title = {Engineering a {Compiler}},
+  isbn = {978-0-08-091661-3},
+  language = {english},
+  publisher = {Elsevier Science},
+  address = {Boston, MA},
+  year = {2011},
+  author = {Cooper, Keith D. and Torczon, Linda},
+}
diff --git a/ssa-example.dot b/ssa-example.dot
new file mode 100644
index 0000000..220759a
--- /dev/null
+++ b/ssa-example.dot
@@ -0,0 +1,16 @@
+digraph SSA {
+    ranksep=0.3;
+    node[shape=box];
+
+    start[label="x₁ = 8\lx₂ = x₁ - 2\lbranch x₂ < 4\l"];
+
+    start -> b0 [label="0"];
+    start -> b1 [label="1"];
+
+    b0[label="x₄ = 12"];
+    b1[label="x₃ = 10"];
+
+    b0 -> end;
+    b1 -> end;
+    end[label="x₅ = ɸ(x₃, x₄)\ly₁ = x₅ * 2\l"];
+}
diff --git a/ssa-example.svg b/ssa-example.svg
new file mode 100644
index 0000000..33ad4f2
--- /dev/null
+++ b/ssa-example.svg
@@ -0,0 +1,66 @@
+[66 lines of Graphviz-generated SVG rendering ssa-example.dot: nodes "start" (x₁ = 8, x₂ = x₁ - 2, branch x₂ < 4), "b0" (x₄ = 12), "b1" (x₃ = 10) and "end" (x₅ = ɸ(x₃, x₄), y₁ = x₅ * 2), with edges start->b0 labeled 0, start->b1 labeled 1, b0->end and b1->end]