upload android base code part3
This commit is contained in:
parent
71b83c22f1
commit
b9e30e05b1
15122 changed files with 2089659 additions and 0 deletions
548
android/build/kati/INTERNALS.md
Normal file
548
android/build/kati/INTERNALS.md
Normal file
|
@ -0,0 +1,548 @@
|
|||
Kati internals
|
||||
==============
|
||||
|
||||
This is an informal document about internals of kati. This document is not meant
|
||||
to be a comprehensive document of kati or GNU make. This explains some random
|
||||
topics which other programmers may be interested in.
|
||||
|
||||
Motivation
|
||||
----------
|
||||
|
||||
The motivation of kati was to speed up Android platform build. Especially, its
|
||||
incremental build time was the main focus. Android platform's build system is a
|
||||
very unique system. It provides a DSL, (ab)using Turing-completeness of GNU
|
||||
make. The DSL allows developers to write build rules in a descriptive way, but
|
||||
the downside is it's complicated and slow.
|
||||
|
||||
When we say a build system is slow, we consider "null build" and "full
|
||||
build". Null build is a build which does nothing, because all output files are
|
||||
already up-to-date. Full build is a build which builds everything, because there
|
||||
were nothing which have been already built. Actual builds in daily development
|
||||
are somewhere between null build and full build. Most benchmarks below were done
|
||||
for null build.
|
||||
|
||||
For Android with my fairly beefy workstation, null build took ~100 secs with GNU
|
||||
make. This means you needed to wait ~100 secs to see if there's a compile error
|
||||
when you changed a single C file. To be fair, things were not that bad. There
|
||||
are tools called mm/mmm. They allow developers to build an individual module. As
|
||||
they ignore dependencies between modules, they are fast. However, you need to be
|
||||
somewhat experienced to use them properly. You should know which modules will be
|
||||
affected by your change. It would be nicer if you can just type "make" whenever
|
||||
you change something.
|
||||
|
||||
This is why we started this project. We decided to create a GNU make clone from
|
||||
scratch, but there were some other options. One option was to replace all
|
||||
Android.mk by files with a better format. There is actually a longer-term
|
||||
project for this. Kati was planned to be a short-term project. Another option
|
||||
was to hack GNU make instead of developing a clone. We didn't take this option
|
||||
because we thought the source code of GNU make is somewhat complicated due to
|
||||
historical reason. It's written in old-style C, has a lot of ifdefs for some
|
||||
unknown architectures, etc.
|
||||
|
||||
Currently, kati's main mode is --ninja mode. Instead of executing build commands
|
||||
by itself, kati generates build.ninja file and
|
||||
[ninja](https://github.com/martine/ninja) actually runs commands. There were
|
||||
some back-and-forths before kati became the current form. Some experiments
|
||||
succeeded and some others failed. We even changed the language for kati. At
|
||||
first, we wrote kati in Go. We naively expected we can get enough performance
|
||||
with Go. I guessed at least one of the following statements are true: 1. GNU
|
||||
make is not very optimized for computation heavy Makefiles, 2. Go is fast for
|
||||
our purpose, or 3. we can come up with some optimization tricks for Android's
|
||||
build system. As for 3, some of such optimization succeeded but it's performance
|
||||
gain didn't cancel the slowness of Go.
|
||||
|
||||
Go's performance would be somewhat interesting topic. I didn't study the
|
||||
performance difference in detail, but it seemed both our use of Go and Go
|
||||
language itself were making the Go version of kati slower. As for our fault, I
|
||||
think Go version has more unnecessary string allocations than C++ version
|
||||
has. As for Go itself, it seemed GC was the main show-stopper. For example,
|
||||
Android's build system defines about one million make variables, and buffers for
|
||||
them will be never freed. IIRC, this kind of allocation pattern isn't good for
|
||||
non-generational GC.
|
||||
|
||||
Go version and test cases were written by ukai and me, and C++ rewrite was done
|
||||
mostly by me. The rest of this document is mostly about the C++ version.
|
||||
|
||||
Overall architecture
|
||||
--------------------
|
||||
|
||||
Kati consists of the following components:
|
||||
|
||||
* Parser
|
||||
* Evaluator
|
||||
* Dependency builder
|
||||
* Executor
|
||||
* Ninja generator
|
||||
|
||||
A Makefile has some statements which consist of zero or more expressions. There
|
||||
are two parsers and two evaluators - one for statements and the other for
|
||||
expressions.
|
||||
|
||||
Most of users of GNU make may not care about the evaluator much. However, GNU
|
||||
make's evaluator is very powerful and is Turing-complete. For Android's null
|
||||
build, most time is spent in this phase. Other tasks, such as building
|
||||
dependency graphs and calling stat function for build targets, are not the
|
||||
bottleneck. This would be a very Android specific characteristics. Android's
|
||||
build system uses a lot of GNU make black magics.
|
||||
|
||||
The evaluator outputs a list of build rules and a variable table. The dependency
|
||||
builder creates a dependency graph from the list of build rules. Note this step
|
||||
doesn't use the variable table.
|
||||
|
||||
Then either executor or ninja generator will be used. Either way, kati runs its
|
||||
evaluator again for command lines. The variable table is used again for this
|
||||
step.
|
||||
|
||||
We'll look at each components closely. GNU make is a somewhat different language
|
||||
from modern languages. Let's see.
|
||||
|
||||
Parser for statements
|
||||
---------------------
|
||||
|
||||
I'm not 100% sure, but I think GNU make parses and evaluates Makefiles
|
||||
simultaneously, but kati has two phases for parsing and evaluation. The reason
|
||||
of this design is for performance. For Android build, kati (or GNU make) needs
|
||||
to read ~3k files ~50k times. The file which is read most often is read ~5k
|
||||
times. It's waste of time to parse such files again and again. Kati can re-use
|
||||
parsed results when it needs to evaluate a Makefile second time. If we stop
|
||||
caching the parsed results, kati will be two times slower for Android's
|
||||
build. Caching parsed statements is done in *file_cache.cc*.
|
||||
|
||||
The statement parser is defined in *parser.cc*. In kati, there are four kinds of
|
||||
statements:
|
||||
|
||||
* Rules
|
||||
* Assignments
|
||||
* Commands
|
||||
* Make directives
|
||||
|
||||
Data structures for them are defined in *stmt.h*. Here are examples of these
|
||||
statements:
|
||||
|
||||
VAR := yay! # An assignment
|
||||
all: # A rule
|
||||
echo $(VAR) # A command
|
||||
include xxx.mk # A make directive (include)
|
||||
|
||||
In addition to include directive, there are ifeq/ifneq/ifdef/ifndef directives
|
||||
and export/unexport directives. Also, kati internally uses "parse error
|
||||
statement". As GNU make doesn't show parse errors in branches which are not
|
||||
taken, we need to delay parse errors to evaluation time.
|
||||
|
||||
### Context dependent parser
|
||||
|
||||
A tricky point of parsing make statements is that the parsing depends on the
|
||||
context of the evaluation. See the following Makefile chunk for example:
|
||||
|
||||
$(VAR)
|
||||
X=hoge echo $${X}
|
||||
|
||||
You cannot tell whether the second line is a command or an assignment until
|
||||
*$(VAR)* is evaluated. If *$(VAR)* is a rule statement, the second line is a
|
||||
command and otherwise it's an assignment. If the previous line is
|
||||
|
||||
VAR := target:
|
||||
|
||||
the second line will turn out to be a command.
|
||||
|
||||
For some reason, GNU make expands expressions before it decides the type of
|
||||
a statement only for rules. Storing assignments or directives in a variable
|
||||
won't work as assignments or directives. For example
|
||||
|
||||
ASSIGN := A=B
|
||||
$(ASSIGN):
|
||||
|
||||
doesn't assign "*B:*" to *A*, but defines a build rule whose target is *A=B*.
|
||||
|
||||
Anyway, as a line starts with a tab character can be either a command statement
|
||||
or other statements depending on the evaluation result of the previous line,
|
||||
sometimes kati's parser cannot tell the statement type of a line. In this case,
|
||||
kati's parser speculatively creates a command statement object, keeping the
|
||||
original line. If it turns out the line is actually not a command statement,
|
||||
the evaluator re-runs the parser.
|
||||
|
||||
### Line concatenations and comments
|
||||
|
||||
In most programming languages, line concatenations by a backslash character and
|
||||
comments are handled at a very early stage of a language
|
||||
implementation. However, GNU make changes the behavior for them depending on
|
||||
parse/eval context. For example, the following Makefile outputs "has space" and
|
||||
"hasnospace":
|
||||
|
||||
VAR := has\
|
||||
space
|
||||
all:
|
||||
echo $(VAR)
|
||||
echo has\
|
||||
nospace
|
||||
|
||||
GNU make usually inserts a whitespace between lines, but for command lines it
|
||||
doesn't. As we've seen in the previous subsection, sometimes kati cannot tell
|
||||
a line is a command statement or not. This means we should handle them after
|
||||
evaluating statements. Similar discussion applies for comments. GNU make usually
|
||||
trims characters after '#', but it does nothing for '#' in command lines.
|
||||
|
||||
We have a bunch of comment/backslash related testcases in the testcase directory
|
||||
of kati's repository.
|
||||
|
||||
Parser for expressions
|
||||
----------------------
|
||||
|
||||
A statement may have one or more expressions. The number of expressions in a
|
||||
statement depends on the statement's type. For example,
|
||||
|
||||
A := $(X)
|
||||
|
||||
This is an assignment statement, which has two expressions - *A* and
|
||||
*$(X)*. Types of expressions and their parser are defined in *expr.cc*. Like
|
||||
other programming languages, an expression is a tree of expressions. The type of
|
||||
a leaf expression is either literal, variable reference,
|
||||
[substitution references](http://www.gnu.org/software/make/manual/make.html#Substitution-Refs),
|
||||
or make functions.
|
||||
|
||||
As written, backslashes and comments change their behavior depending on the
|
||||
context. Kati handles them in this phase. *ParseExprOpt* is the enum for the
|
||||
contexts.
|
||||
|
||||
As a nature of old systems, GNU make is very permissive. For some reason, it
|
||||
allows some kind of unmatched pairs of parentheses. For example, GNU make
|
||||
doesn't think *$($(foo)* is an error - this is a reference to variable
|
||||
*$(foo*. If you have some experiences with parsers, you may wonder how one can
|
||||
implement a parser which allows such expressions. It seems GNU make
|
||||
intentionally allows this:
|
||||
|
||||
http://git.savannah.gnu.org/cgit/make.git/tree/expand.c#n285
|
||||
|
||||
No one won't use this feature intentionally. However, as GNU make allows this,
|
||||
some Makefiles have unmatched parentheses, so kati shouldn't raise an error for
|
||||
them, unfortunately.
|
||||
|
||||
GNU make has a bunch of functions. Most users would use only simple ones such as
|
||||
*$(wildcard ...)* and *$(subst ...)*. There are also more complex functions such
|
||||
as *$(if ...)* and *$(call ...)*, which make GNU make Turing-complete. Make
|
||||
functions are defined in *func.cc*. Though *func.cc* is not short, the
|
||||
implementation is fairly simple. There is only one weirdness I remember around
|
||||
functions. GNU make slightly changes its parsing for *$(if ...)*, *$(and ...)*,
|
||||
and *$(or ...)*. See *trim_space* and *trim_right_space_1st* in *func.h* and how
|
||||
they are used in *expr.cc*.
|
||||
|
||||
Evaluator for statements
|
||||
------------------------
|
||||
|
||||
Evaluator for statements are defined in *eval.cc*. As written, there are four
|
||||
kinds of statements:
|
||||
|
||||
* Rules
|
||||
* Assignments
|
||||
* Commands
|
||||
* Make directives
|
||||
|
||||
There is nothing tricky around commands and make directives. A rule statement
|
||||
have some forms and should be parsed after evaluating expression by the third
|
||||
parser. This will be discussed in the next section.
|
||||
|
||||
Assignments in GNU make is tricky a bit. There are two kinds of variables in GNU
|
||||
make - simple variables and recursive variables. See the following code snippet:
|
||||
|
||||
A = $(info world!) # recursive
|
||||
B := $(info Hello,) # simple
|
||||
$(A)
|
||||
$(B)
|
||||
|
||||
This code outputs "Hello," and "world!", in this order. The evaluation of
|
||||
a recursive variable is delayed until the variable is referenced. So the first
|
||||
line, which is an assignment of a recursive variable, outputs nothing. The
|
||||
content of the variable *$(A)* will be *$(info world!)* after the first
|
||||
line. The assignment in the second line uses *:=* which means this is a simple
|
||||
variable assignment. For simple variables, the right hand side is evaluated
|
||||
immediately. So "Hello," will be output and the value of *$(B)* will be an empty
|
||||
string ($(info ...) returns an empty string). Then, "world!" will be shown when
|
||||
the third line is evaluated as *$(A)* is evaluated, and lastly the forth line
|
||||
does nothing, as *$(B)* is an empty string.
|
||||
|
||||
There are two more kinds of assignments (i.e., *+=* and *?=*). These assignments
|
||||
keep the type of the original variable. Evaluation of them will be done
|
||||
immediately only when the left hand side of the assignment is already defined
|
||||
and is a simple variable.
|
||||
|
||||
Parser for rules
|
||||
----------------
|
||||
|
||||
After evaluating a rule statement, kati needs to parse the evaluated result. A
|
||||
rule statement can actually be the following four things:
|
||||
|
||||
* A rule
|
||||
* A [target specific variable](http://www.gnu.org/software/make/manual/make.html#Target_002dspecific)
|
||||
* An empty line
|
||||
* An error (there're non-whitespace characters without a colon)
|
||||
|
||||
Parsing them is mostly done in *rule.cc*.
|
||||
|
||||
### Rules
|
||||
|
||||
A rule is something like *all: hello.exe*. You should be familiar with it. There
|
||||
are several kinds of rules such as pattern rules, double colon rules, and order
|
||||
only dependencies, but they don't complicate the rule parser.
|
||||
|
||||
A feature which complicates the parser is semicolon. You can write the first
|
||||
build command on the same line as the rule. For example,
|
||||
|
||||
target:
|
||||
echo hi!
|
||||
|
||||
and
|
||||
|
||||
target: ; echo hi!
|
||||
|
||||
have the same meaning. This is tricky because kati shouldn't evaluate expressions
|
||||
in a command until the command is actually invoked. As a semicolon can appear as
|
||||
the result of expression evaluation, there are some corner cases. A tricky
|
||||
example:
|
||||
|
||||
all: $(info foo) ; $(info bar)
|
||||
$(info baz)
|
||||
|
||||
should output *foo*, *baz*, and then *bar*, in this order, but
|
||||
|
||||
VAR := all: $(info foo) ; $(info bar)
|
||||
$(VAR)
|
||||
$(info baz)
|
||||
|
||||
outputs *foo*, *bar*, and then *baz*.
|
||||
|
||||
Again, for the command line after a semicolon, kati should also change how
|
||||
backslashes and comments are handled.
|
||||
|
||||
target: has\
|
||||
space ; echo no\
|
||||
space
|
||||
|
||||
The above example says *target* depends on two targets, *has* and *space*, and
|
||||
to build *target*, *echo nospace* should be executed.
|
||||
|
||||
### Target specific variables
|
||||
|
||||
You may not familiar with target specific variables. This feature allows you to
|
||||
define variable which can be referenced only from commands in a specified
|
||||
target. See the following code:
|
||||
|
||||
VAR := X
|
||||
target1: VAR := Y
|
||||
target1:
|
||||
echo $(VAR)
|
||||
target2:
|
||||
echo $(VAR)
|
||||
|
||||
In this example, *target1* shows *Y* and *target2* shows *X*. I think this
|
||||
feature is somewhat similar to namespaces in other programming languages. If a
|
||||
target specific variable is specified for a non-leaf target, the variable will
|
||||
be used even in build commands of prerequisite targets.
|
||||
|
||||
In general, I like GNU make, but this is the only GNU make's feature I don't
|
||||
like. See the following Makefile:
|
||||
|
||||
hello: CFLAGS := -g
|
||||
hello: hello.o
|
||||
gcc $(CFLAGS) $< -o $@
|
||||
hello.o: hello.c
|
||||
gcc $(CFLAGS) -c $< -o $@
|
||||
|
||||
If you run make for the target *hello*, *CFLAGS* is applied for both commands:
|
||||
|
||||
$ make hello
|
||||
gcc -g -c hello.c -o hello.o
|
||||
gcc -g hello.o -o hello
|
||||
|
||||
However, *CFLAGS* for *hello* won't be used when you build only *hello.o*:
|
||||
|
||||
$ make hello.o
|
||||
gcc -c hello.c -o hello.o
|
||||
|
||||
Things could be even worse when two targets with different target specific
|
||||
variables depend on a same target. The build result will be inconsistent. I
|
||||
think there is no valid usage of this feature for non-leaf targets.
|
||||
|
||||
Let's go back to the parsing. Like for semicolons, we need to delay the
|
||||
evaluation of the right hand side of the assignment for recursive variables. Its
|
||||
implementation is very similar to the one for semicolons, but the combination of
|
||||
the assignment and the semicolon makes parsing a bit trickier. An example:
|
||||
|
||||
target1: ;X=Y echo $(X) # A rule with a command
|
||||
target2: X=;Y echo $(X) # A target specific variable
|
||||
|
||||
Evaluator for expressions
|
||||
-------------------------
|
||||
|
||||
Evaluation of expressions is done in *expr.cc*, *func.cc*, and
|
||||
*command.cc*. The amount of code for this step is fairly large especially
|
||||
because of the number of GNU make functions. However, their implementations are
|
||||
fairly straightforward.
|
||||
|
||||
One tricky function is $(wildcard ...). It seems GNU make is doing some kind of
|
||||
optimization only for this function and $(wildcard ...) in commands seem to be
|
||||
evaluated before the evaluation phase for commands. Both C++ kati and Go kati
|
||||
are different from GNU make's behavior in different ways, but it seems this
|
||||
incompatibility is OK for Android build.
|
||||
|
||||
There is an important optimization done for Android. Android's build system has
|
||||
a lot of $(shell find ...) calls to create a list of all .java/.mk files under a
|
||||
directory, and they are slow. For this, kati has a builtin emulator of GNU
|
||||
find. The find emulator traverses the directory tree and creates an in-memory
|
||||
directory tree. Then the find emulator returns results of find commands using
|
||||
the cached tree. For my environment, the find command emulator makes kati ~1.6x
|
||||
faster for AOSP.
|
||||
|
||||
The implementations of some IO-related functions in commands are tricky in the
|
||||
ninja generation mode. This will be described later.
|
||||
|
||||
Dependency builder
|
||||
------------------
|
||||
|
||||
Now we get a list of rules and a variable table. *dep.cc* builds a dependency
|
||||
graph using the list of rules. I think this step is what GNU make is supposed to
|
||||
do for normal users.
|
||||
|
||||
This step is fairly complex like other components but there's nothing
|
||||
strange. There are three types of rules in GNU make:
|
||||
|
||||
* explicit rule
|
||||
* implicit rule
|
||||
* suffix rule
|
||||
|
||||
The following code shows the three types:
|
||||
|
||||
all: foo.o
|
||||
foo.o:
|
||||
echo explicit
|
||||
%.o:
|
||||
echo implicit
|
||||
.c.o:
|
||||
echo suffix
|
||||
|
||||
In the above example, all of these three rules match the target *foo.o*. GNU
|
||||
make prioritizes explicit rules first. When there's no explicit rule for a
|
||||
target, it uses an implicit rule with longer pattern string. Suffix rules are
|
||||
used only when there are no explicit/implicit rules.
|
||||
|
||||
Android has more than one thousand implicit rules and there are ten thousands of
|
||||
targets. It's too slow to do matching for them with a naive O(NM)
|
||||
algorithm. Kati uses a trie to speed up this step.
|
||||
|
||||
Multiple rules without commands should be merged into the rule with a
|
||||
command. For example:
|
||||
|
||||
foo.o: foo.h
|
||||
%.o: %.c
|
||||
$(CC) -c $< -o $@
|
||||
|
||||
*foo.o* depends not only on *foo.c*, but also on *foo.h*.
|
||||
|
||||
Executor
|
||||
--------
|
||||
|
||||
C++ kati's executor is fairly simple. This is defined in *exec.cc*. This is
|
||||
useful only for testing because this lacks some important features for a build
|
||||
system (e.g., parallel build).
|
||||
|
||||
Expressions in commands are evaluated at this stage. When they are evaluated,
|
||||
target specific variables and some special variables (e.g., $< and $@) should be
|
||||
considered. *command.cc* is handling them. This file is used by both the
|
||||
executor and the ninja generator.
|
||||
|
||||
Evaluation at this stage is tricky when both *+=* and target specific variables
|
||||
are involved. Here is an example code:
|
||||
|
||||
all: test1 test2 test3 test4
|
||||
|
||||
A:=X
|
||||
B=X
|
||||
X:=foo
|
||||
|
||||
test1: A+=$(X)
|
||||
test1:
|
||||
@echo $(A) # X bar
|
||||
|
||||
test2: B+=$(X)
|
||||
test2:
|
||||
@echo $(B) # X bar
|
||||
|
||||
test3: A:=
|
||||
test3: A+=$(X)
|
||||
test3:
|
||||
@echo $(A) # foo
|
||||
|
||||
test4: B=
|
||||
test4: B+=$(X)
|
||||
test4:
|
||||
@echo $(B) # bar
|
||||
|
||||
X:=bar
|
||||
|
||||
*$(A)* in *test3* is a simple variable. Though *$(A)* in the global scope is
|
||||
simple, *$(A)* in *test1* is a recursive variable. This means types of global
|
||||
variables don't affect types of target specific variables. However, The result
|
||||
of *test1* ("X bar") shows the value of a target specific variable is
|
||||
concatenated to the value of a global variable.
|
||||
|
||||
Ninja generator
|
||||
---------------
|
||||
|
||||
*ninja.cc* generates a ninja file using the results of other components. This
|
||||
step is actually fairly complicated because kati needs to map GNU make's
|
||||
features to ninja's.
|
||||
|
||||
A build rule in GNU make may have multiple commands, while ninja's has always a
|
||||
single command. To mitigate this, the ninja generator translates multiple
|
||||
commands into something like *(cmd1) && (cmd2) && ...*. Kati should also escape
|
||||
some special characters for ninja and shell.
|
||||
|
||||
The tougher thing is $(shell ...) in commands. Current kati's implementation
|
||||
translates it into shell's $(...). This works for many cases. But this approach
|
||||
won't work when the result of $(shell ...) is passed to another make
|
||||
function. For example
|
||||
|
||||
all:
|
||||
echo $(if $(shell echo),FAIL,PASS)
|
||||
|
||||
should output PASS, because the result of $(shell echo) is an empty string. GNU
|
||||
make and kati's executor mode output PASS correctly. However, kati's ninja
|
||||
generator emits a ninja file which shows FAIL.
|
||||
|
||||
I wrote a few experimental patches for this issue, but they didn't
|
||||
work well. The current kati's implementation has an Android specific workaround
|
||||
for this. See *HasNoIoInShellScript* in *func.cc* for detail.
|
||||
|
||||
Ninja regeneration
|
||||
------------------
|
||||
|
||||
C++ kati has --regen flag. If this flag is specified, kati checks if anything
|
||||
in your environment was changed after the previous run. If kati thinks it doesn't
|
||||
need to regenerate the ninja file, it finishes quickly. For Android, running
|
||||
kati takes ~30 secs at the first run but the second run takes only ~1 sec.
|
||||
|
||||
Kati thinks it needs to regenerate the ninja file when one of the followings is
|
||||
changed:
|
||||
|
||||
* The command line flags passed to kati
|
||||
* A timestamp of a Makefile used to generate the previous ninja file
|
||||
* An environment variable used while evaluating Makefiles
|
||||
* A result of $(wildcard ...)
|
||||
* A result of $(shell ...)
|
||||
|
||||
Quickly doing the last check is not trivial. It takes ~18 secs to run all
|
||||
$(shell ...) in Android's build system due to the slowness of $(shell find
|
||||
...). So, for find commands executed by kati's find emulator, kati stores the
|
||||
timestamps of traversed directories with the find command itself. For each find
|
||||
commands, kati checks the timestamps of them. If they are not changed, kati
|
||||
skips re-running the find command.
|
||||
|
||||
Kati doesn't run $(shell date ...) and $(shell echo ...) during this check. The
|
||||
former always changes so there's no sense to re-run them. Android uses the
|
||||
latter to create a file and the result of them are empty strings. We don't want
|
||||
to update these files to get empty strings.
|
||||
|
||||
TODO
|
||||
----
|
||||
|
||||
A big TODO is sub-makes invoked by $(MAKE). I wrote some experimental patches
|
||||
but nothing is ready to be used as of writing.
|
Loading…
Add table
Add a link
Reference in a new issue