
Getting closer

Matthias Veigel 2025-07-10 20:02:45 +02:00
parent 4e5593b35d
commit 5ad3b5c0ae
Signed by: root
GPG Key ID: 2437494E09F13876


@@ -81,7 +81,7 @@
#set heading(numbering: "1.1.1")
// cSpell:enable
= Abstract
Dataflow analysis is an important part of compiler optimization, since it makes it possible to eliminate or rewrite parts of the code with techniques such as constant propagation, dead code elimination, and branch elimination. This work looks at the advantages and disadvantages of using dataflow analysis, how it is already used in current compilers, which programming languages or intermediate representations it operates on, and what limitations still exist. \
For this purpose we conducted a systematic literature review in which we analyzed 15 publications selected from 571 entries. The following conclusions were drawn: dataflow analysis is used in many of today's popular compilers and the field is actively being researched. Dataflow analysis can yield large performance gains, but its implementations are complex, and care must be taken that an implementation does not change the behavior of the program in an unwanted way.
@@ -89,13 +89,13 @@ For this purpose we conducted a systematic literature in which we analyze 15 pub
= Introduction
Program performance remains a major concern in modern computing and programming, since it has a direct impact on user and developer experience. As software becomes more complex, manual optimization becomes increasingly difficult for developers to implement.
Another problem with this increasing complexity is that large codebases are spread out over more files, which also makes it harder for developers to keep an overview and to implement optimizations. For these reasons, automatic optimization in compilers is needed. \
Dataflow analysis is a technique used to gather information about the state of variables throughout the flow of a program. It plays an important role in many compilers: by analyzing how, where, and which variables are assigned and how these variables are used, many complex optimizations that require context from the surrounding code can be implemented. \
Dataflow analysis is a well-established field in which new techniques are regularly created and older ones improved. Different compilers and analysis frameworks implement different methods and optimizations with dataflow analysis. This work aims to summarize the current state and past achievements of this technology. \
While this paper discusses dataflow analysis in the context of compiler optimization, these techniques can also be used to produce more detailed, or previously impossible, compile-time warnings and errors. For example, dataflow analysis can detect writes to an invalid memory location or the use of an uninitialized object or variable at compile time, which leads to a better coding experience and fewer crashes at runtime. Examples of this are the Clang Static Analyzer #footnote[https://clang.llvm.org/docs/ClangStaticAnalyzer.html] and the static analysis options of GCC #footnote[https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html]. \
This work is divided into the following sections: @background_c gives the background required to understand this work, @methodology_c describes the methodology used to create it, @findings_c analyzes and evaluates the contents of the papers, and @conclusion_c summarizes the results of this work.
= Background <background_c>
== Control flow graph
#figure( // ssa_form_example
caption: [C code and respective SSA in control flow graph form, adapted from Fig. 1 in the work of Reissmann, Meyer and Soffa @y-reissmann_rvsdg_2020],
kind: "raw",
@@ -113,6 +113,8 @@ This work is divided into the following sections: in @background_c the backgroun
image("ssa-example.svg", height: 16em) image("ssa-example.svg", height: 16em)
) )
) <ssa_form_example> ) <ssa_form_example>
A control flow graph is a directed graph whose nodes are blocks of code and whose edges represent the possible flow of execution between these blocks. The right part of @ssa_form_example shows a small example of a control flow graph. A node always contains sequential code with a statement at the end that changes the control flow. In the example this is either the `branch`, which branches the flow based on a condition, or the hidden `goto` at the end of the blocks for `x₃` and `x₄`, which unconditionally jumps to another block of code. Edges after a branching statement are labeled to indicate which condition they correspond to.
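The same structure can be sketched on a small C function (the block names `B0` to `B3` are illustrative and not part of the figure):
```c
int f(int a, int b, int c) {
    int x;
    // B0: sequential code ending in a branching statement
    if (c) {        // branch on c: true edge to B1, false edge to B2
        x = a;      // B1: ends in an implicit goto to B3
    } else {
        x = b;      // B2: ends in an implicit goto to B3
    }
    return x;       // B3: both paths join here
}
```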
== Static Single Assignment form (SSA / SSA form)
Many modern compilers and analysis tools operate on a Static Single-Assignment (SSA) form @x-cooper_keith_d_engineering_2011 @x-cytron_efficiently_1991. The SSA form works by assigning each variable only once. This is done by creating multiple sub-variables $x_1, x_2, ...$ for each variable $x$. After a branch in the program, a #{sym.Phi}-Node is used to select the new value of the variable based on the branch executed.
An example of the SSA form can be seen in @ssa_form_example. On the left is simple C code in a function body and on the right is the respective SSA form of the C code. The intermediate representation of LLVM #footnote[https://llvm.org/], a widely used compiler framework whose name originally stood for "Low Level Virtual Machine", is closely modeled after the SSA form.
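Applied to the function from the sketch above, the renaming and the #{sym.Phi}-Node look roughly as follows (pseudocode in comments, since the #{sym.Phi}-Node has no C equivalent):
```c
// SSA form of f (pseudocode, not valid C):
//   B0: branch c -> B1, B2
//   B1: x1 = a
//   B2: x2 = b
//   B3: x3 = phi(x1, x2)   // picks x1 if B1 ran, x2 if B2 ran
//       return x3
```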
== Dataflow analysis (DFA)
@@ -137,11 +139,11 @@ The facts which the algorithm knows about variable either must be true or may be
Points-to analysis handles DFA in the presence of pointers and references. Specifically, it shows whether one variable can point to another variable during the execution of the program. Points-to analysis has multiple levels of precision. \
One of the most important aspects for precision is context-sensitivity. Given a function `void* id(void* p)` which just returns the pointer `p` passed to it, a context-insensitive points-to analysis would conclude that every pointer supplied as an argument could be returned as a result, while a context-sensitive analysis would only return the pointer that was actually supplied. As an example, with the code `void *a, *b; id(a); id(b);` the result would be `id(a), id(b) ∈ {a, b}`, because the analysis cannot differentiate between those calls, while with context-sensitive analysis it would be `id(a) == a` and `id(b) == b`. \
There are many design choices, impacting performance and precision, that can be made when implementing points-to analysis; a sketch contrasting them follows below:
Subset-based analysis gives each pointer a set of variables to which it can point. When pointer `a` is assigned to pointer `b` (`b = a;`), the variables `a` points to must be a subset of those of `b` (`b ⊇ a`). These sets can later be merged for faster analysis, but this leads to information loss. \
A more precise variation is equivalence-based points-to analysis. It works by keeping a separate set for each pointer and copying these sets when pointers are assigned to other pointers. Because the analysis needs to keep a set for every pointer, it is much slower and requires more memory during analysis. \
An even more precise method, and the most relevant to this paper, is flow-sensitive analysis. By analyzing the control flow it is possible to determine precisely which variable a pointer points to at a certain point in the code and to make optimizations based on that. The drawback of this is the poor performance of the analysis and the complicated implementation. \
Field-sensitivity treats every field of an object as a separate object instead of just storing the entire object that is pointed at. This allows a more detailed analysis of which fields actually get accessed and modified, but it comes with a large performance overhead. A similar option is array-sensitivity, which models each entry of an array as a separate object instead of using the whole array whenever something in it is referenced. \
While subset-based and equivalence-based analyses are sufficient for simple optimizations and simple compile-time checks, context- and flow-sensitive algorithms are necessary for safety-critical applications and complex optimizations. This choice should also be made based on the size of the analyzed codebase and how long the compile time is allowed to be.
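To make these precision dimensions concrete, here is a small C sketch; the points-to sets in the comments are what the respective analysis variants would conservatively report, as an illustration rather than the output of a specific tool:
```c
void *id(void *p) { return p; }   // identity function from the text above

int main(void) {
    int x, y;
    void *a = &x, *b = &y;

    // Context-sensitivity:
    void *ra = id(a);   // context-sensitive:   ra -> {x}
    void *rb = id(b);   // context-insensitive: ra, rb -> {x, y},
                        // one merged result of id for all call sites

    // Flow-sensitivity:
    void *q = a;        // q -> {x}
    q = b;              // flow-sensitive: q -> {y} from this point on
                        // flow-insensitive: q -> {x, y} for the whole program

    (void)ra; (void)rb; (void)q;  // silence unused-variable warnings
    return 0;
}
```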
=== Constant folding and propagation @x-optimizing_compiler_wikipedia
An example based on @ssa_form_example would be the compiler calculating $x_1$ to be $8$. This is called constant folding and is done by replacing all calculations that are possible at compile time with their result. Constant propagation then replaces the $x_1$ in the calculation of $x_2$ with its value. When constant folding is then applied again, $x_2$ becomes $6$.
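As a minimal sketch in C (the concrete expressions are invented to reproduce the values $8$ and $6$ from the text; the figure's actual code may differ):
```c
int example(void) {
    int x1 = 4 + 4;   // constant folding: computed at compile time -> x1 = 8
    int x2 = x1 - 2;  // constant propagation replaces x1 with 8 -> x2 = 8 - 2
                      // folding the result again yields x2 = 6
    return x2;        // an optimizing compiler can reduce this to `return 6;`
}
```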
=== Conditional branch elimination
@@ -388,6 +390,7 @@ As seen in @demographic_pub_year most of the analyzed publication are from the l
}
) <demographic_target_lang>
@demographic_target_lang shows that 33% of the publications implement DFA optimizations either with LLVM directly or by operating on the LLVM IR, while Java is either used directly as bytecode or as an SSA representation of Java code. This suggests that LLVM is a good platform for implementing optimizations and that it has a lower barrier of entry for developing them.
// TODO mention which pubs are in each category
=== Research focus
#figure( // demographic_research_focus
caption: "Research focus of the publications",
@@ -446,6 +449,13 @@ Since inlining is required to perform rewrites, it can lead to bloating the exec
== RQ2: Usage of dataflow analysis in current compilers
The Glasgow Haskell Compiler (GHC), LLVM, and GCC are good examples of compilers that already use DFA extensively to implement optimizations. The optimizations implemented by the analyzed papers are described in the following sections.
These optimizations include common sub-expression elimination #cgy[@y-kildall_unified_1973 @y-tang_summary-based_2012 @y-reissmann_rvsdg_2020], copy propagation #cgy[@y-joisha_technique_2011 @y-tang_summary-based_2012], constant propagation @y-kildall_unified_1973, conditional branch elimination @y-rastislav_bodik_interprocedural_1997 and dead code elimination @y-reissmann_rvsdg_2020.
=== Summary-based analysis
The work by Tang and Järvi @y-tang_summary-based_2012 describes how to implement summary-based analysis and how to use it for user-defined types and objects. Summary-based analysis can be used to keep the sensitivity and most of the information of the analysis while still saving time when analyzing the code. According to their work, it commonly consists of two steps. \
The first step is to traverse the call graph bottom-up (starting with the procedures that depend on no other procedures, then moving on to procedures that only call already analyzed procedures) and to compute the side effects and points-to relations of each procedure. This first step computes everything without the calling context, so it only yields the side effects and relations that can happen independently of where the procedures are called. \
The second step then performs a top-down analysis (starting with the normal entry points of the program and then going through the procedures as they would be called in the program) in which the actual call arguments are passed to the procedures. The exact side effects and points-to relations are calculated in this step. \
The results of this analysis are then stored as a tuple. The first entry of the tuple holds the points-to relations, depending on the objects and pointers accessible in the procedure. The second entry holds the side effects, specifically reads and modifications of the objects in the procedure. \
While the work by Tang and Järvi @y-tang_summary-based_2012 does not directly implement any optimizations based on their analysis, they show that the approach leads in almost all cases to a more concrete and smaller result for the points-to and side-effect analysis. Because of this, other optimizations are able to run faster with almost the same accuracy.
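A hedged sketch of what such a two-phase summary could look like on a trivial C program (the summary notation in the comments is invented for illustration and is not the notation of @y-tang_summary-based_2012):
```c
// Bottom-up phase: callees are summarized first, without caller context.
void set42(int *p) {
    *p = 42;   // summary of set42 (illustrative notation):
               //   points-to: nothing new; side effects: MOD = { *p }
}

// Top-down phase: actual arguments are bound to the stored summaries.
int caller(void) {
    int a = 0, b = 0;
    set42(&a);   // binding p -> { a } instantiates the summary: MOD = { a }
    // Using the summary, the analysis knows b is untouched here
    // without re-analyzing the body of set42 at every call site.
    return b;
}
```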
// TODO explain 3.1 composite objects
=== Copy propagation
Copy propagation is implemented in the work of Joisha, Schreiber, Banerjee, Boehm and Chakrabarti @y-joisha_technique_2011 with a focus on making it applicable in multi-threaded environments.
It is implemented based on a procedural concurrency graph which is built from the source code. The nodes are all procedures that could run in the program. The edges between the nodes represent an MHP (may-happen-in-parallel) relation (@y-joisha_technique_2011, p. 627), i.e. a possible overlap in the execution of both nodes. The function $I((p_1, p_2))$ lists the variables on which the procedures $p_1$ and $p_2$ interfere. Interference in this context is a read and a write in overlapping (parallel) regions of the procedures. As long as there is no interference between two procedures on a variable, or the corresponding lock for the variable is held, it is possible to do copy propagation for that variable.
@@ -483,23 +493,6 @@ It is implemented based on a procedural concurrency graph which is build from th
This technique can be explained based on @copy_prop_rq2_example. In thread $t_1$ there are two opportunities for applying copy propagation. The first is the variable `a` on line 1, which can be propagated to the `print` in line 4, since no interfering writes to the global variable `X` happen. The second is the variable `b`, since access to the global variable `Y` is locked behind the mutex `my`. In thread $t_2$ copy propagation cannot be performed, since the variable `a` reads from the global variable `Y` without protection by the mutex `my`. `Y` could therefore have a different value on line 3 than on line 5, because it is also written in $t_1$.
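The situation can be reduced to a small pthread-style sketch (this is not the code of @copy_prop_rq2_example; variable and mutex names merely mirror the prose):
```c
#include <pthread.h>
#include <stdio.h>

int X = 1, Y = 2;  // globals shared between the threads
pthread_mutex_t my = PTHREAD_MUTEX_INITIALIZER;

void *t1(void *arg) {
    (void)arg;
    int a = X;               // no thread writes X in parallel:
    printf("%d\n", a);       // `a` may be replaced by `X` (copy propagation)
    pthread_mutex_lock(&my);
    int b = Y;               // Y is only written while holding `my`:
    Y = b + 1;               // propagation of `b` is safe inside this region
    printf("%d\n", b);
    pthread_mutex_unlock(&my);
    return NULL;
}

void *t2(void *arg) {
    (void)arg;
    int a = Y;               // unprotected read of Y, which t1 writes:
    printf("%d\n", a);       // `a` must NOT be replaced by `Y`, since Y may
                             // change between the two uses
    printf("%d\n", a);
    return NULL;
}
```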
// TODO rewrite
= Conclusion <conclusion_c>
Our findings show that DFA is already used extensively in current compilers and brings large runtime speed advantages. The cost of this is a longer compilation time, which makes it unsuitable for JIT compilation. Furthermore, DFA allows complex optimizations across branches and function boundaries which would not be possible with traditional straight-line optimizations. \