upload android base code part8

This commit is contained in:
August 2018-08-08 20:10:12 +08:00
parent 841ae54672
commit 5425409085
57075 changed files with 9846578 additions and 0 deletions

View file

@ -0,0 +1,259 @@
<html devsite>
<head>
<title>AddressSanitizer</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>AddressSanitizer (ASan) is a fast compiler-based tool for detecting memory bugs
in native code. It is comparable to Valgrind (Memcheck tool), but, unlike it,
ASan:</p>
<ul>
<li> + detects overflows on stack and global objects
<li> - does not detect uninitialized reads and memory leaks
<li> + is much faster (a two to three times slowdown compared to Valgrind's 20-100x)
<li> + has less memory overhead
</ul>
<p>This document describes how to build and run parts of the Android platform with
AddressSanitizer. If you are looking to build a standalone (i.e. SDK/NDK)
application with AddressSanitizer, see the <a
href="https://github.com/google/sanitizers/wiki/AddressSanitizerOnAndroid">AddressSanitizerOnAndroid</a>
public project site instead.</p>
<p>AddressSanitizer consists of a compiler (<code>external/clang</code>) and a runtime library
(<code>external/compiler-rt/lib/asan</code>).</p>
<p class="note"><strong>Note</strong>: Use the current master
branch to gain access to the <a href="#sanitize_target">SANITIZE_TARGET</a>
feature and the ability to build the entire Android platform with
AddressSanitizer at once. Otherwise, you are limited to using
<code>LOCAL_SANITIZE</code>.</p>
<h2 id=building_with_clang>Building with Clang</h2>
<p>As a first step to building an ASan-instrumented binary, make sure that your
code builds with Clang. This is done by default on the master branch, so there should be nothing
you need to do. If you believe that the module you'd like to test is being built with GCC,
you can switch to Clang by adding <code>LOCAL_CLANG:=true</code>
to the build rules. Clang may find bugs in your code that GCC missed.</p>
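<p>For reference, a minimal <code>Android.mk</code> module with Clang enabled
might look like the following (the module name and source file are
hypothetical; <code>LOCAL_CLANG</code> is the relevant line):</p>
<pre class="devsite-click-to-copy">
include $(CLEAR_VARS)
# Hypothetical module; LOCAL_CLANG is the relevant line.
LOCAL_MODULE := my_native_test
LOCAL_SRC_FILES := my_native_test.cpp
LOCAL_CLANG := true
include $(BUILD_EXECUTABLE)
</pre>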
<h2 id=building_executables_with_addresssanitizer>Building executables with AddressSanitizer</h2>
<p>Add <code>LOCAL_SANITIZE:=address</code> to the build rule of the
executable.</p>
<pre class="devsite-click-to-copy">
LOCAL_SANITIZE:=address
</pre>
<p>When a bug is detected, ASan prints a verbose report both to the standard
output and to <code>logcat</code> and then crashes the process.</p>
<h2 id=building_shared_libraries_with_addresssanitizer>Building shared libraries with AddressSanitizer</h2>
<p>Due to the way ASan works, a library built with ASan cannot be used by an
executable that's built without ASan.</p>
<p class="note"><strong>Note</strong>: In runtime situations where an ASan library is
loaded into an incorrect process, you will see unresolved symbol messages
starting with <code>__asan</code> or <code>__sanitizer</code>.</p>
<p>To sanitize a shared library that is used in multiple executables, not all of
which are built with ASan, you'll need two copies of the library. The
recommended way to do this is to add the following to <code>Android.mk</code>
for the module in question:</p>
<pre class="devsite-click-to-copy">
LOCAL_SANITIZE:=address
LOCAL_MODULE_RELATIVE_PATH := asan
</pre>
<p>This puts the library in <code>/system/lib/asan</code> instead of
<code>/system/lib</code>. Then, run your executable with:
<code>LD_LIBRARY_PATH=/system/lib/asan</code></p>
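<p>For example, to run a (hypothetical) test binary by hand against the ASan
copies of its libraries:</p>
<pre class="devsite-terminal devsite-click-to-copy">
adb shell LD_LIBRARY_PATH=/system/lib/asan /system/bin/my_test
</pre>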
<p>For system daemons, add the following to the appropriate section of
<code>/init.rc</code> or <code>/init.$device$.rc</code>.</p>
<pre class="devsite-click-to-copy">
setenv LD_LIBRARY_PATH /system/lib/asan
</pre>
<p class="warning"><strong>Warning</strong>: The <code>LOCAL_MODULE_RELATIVE_PATH</code>
setting <strong>moves</strong> your library to <code>/system/lib/asan</code>,
meaning that clobbering and rebuilding from scratch will result in the
library missing from <code>/system/lib</code>, and probably an unbootable
image. That's an unfortunate limitation of the
current build system. Don't clobber; do <code>make -j $N</code> and <code>adb
sync</code>.</p>
<p>Verify the process is using libraries from <code>/system/lib/asan</code>
when present by reading <code>/proc/$PID/maps</code>. If it's not, you may need
to disable SELinux, like so:</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb root</code>
<code class="devsite-terminal">adb shell setenforce 0</code>
# Restart the process with adb shell kill $PID if it is a system service,
# or use adb shell stop; adb shell start.
</pre>
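<p>A quick way to check which copies are mapped (<var>PID</var> is the process
of interest):</p>
<pre class="devsite-terminal devsite-click-to-copy">
adb shell grep asan /proc/<var>PID</var>/maps
</pre>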
<h2 id=better_stack_traces>Better stack traces</h2>
<p>AddressSanitizer uses a fast, frame-pointer-based unwinder to record a stack
trace for every memory allocation and deallocation event in the program. Most
of Android is built without frame pointers. As a result, you will often get
only one or two meaningful frames. To fix this, either rebuild the library with
ASan (recommended!) or rebuild it with:</p>
<pre class="devsite-click-to-copy">
LOCAL_CFLAGS:=-fno-omit-frame-pointer
LOCAL_ARM_MODE:=arm
</pre>
<p>Or set <code>ASAN_OPTIONS=fast_unwind_on_malloc=0</code> in the process
environment. The latter can be very CPU-intensive, depending on
the load.</p>
<h2 id=symbolization>Symbolization</h2>
<p>Initially, ASan reports contain references to offsets in binaries and shared
libraries. There are two ways to obtain source file and line information:</p>
<ul>
<li>Ensure the <code>llvm-symbolizer</code> binary is present in
<code>/system/bin</code>. <code>llvm-symbolizer</code> is built from sources in
<code>third_party/llvm/tools/llvm-symbolizer</code>.</li>
<li>Filter the report through the
<code>external/compiler-rt/lib/asan/scripts/symbolize.py</code> script.</li>
</ul>
<p>The second approach can provide more data (i.e. file:line locations) because of
the availability of symbolized libraries on the host.</p>
<h2 id=addresssanitizer_in_the_apps>AddressSanitizer in the apps</h2>
<p>AddressSanitizer cannot see into Java code, but it can detect bugs in the JNI
libraries. For that, you'll need to build the executable with ASan, which in
this case is <code>/system/bin/app_process(<em>32|64</em>)</code>. This will
enable ASan in all apps on the device at the same time, which is a
bit stressful, but nothing that a 2GB RAM device cannot handle.</p>
<p>Add the usual <code>LOCAL_SANITIZE:=address</code> to
the app_process build rule in <code>frameworks/base/cmds/app_process</code>. Ignore
the <code>app_process__asan</code> target in the same file for now (if it is
still there at the time you read
this). Edit the Zygote record in
<code>system/core/rootdir/init.zygote(<em>32|64</em>).rc</code> to add the
following lines:</p>
<pre class="devsite-click-to-copy">
setenv LD_LIBRARY_PATH /system/lib/asan:/system/lib
setenv ASAN_OPTIONS allow_user_segv_handler=true
</pre>
<p>Build, adb sync, fastboot flash boot, reboot.</p>
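<p>In practice, the sequence looks something like this (assuming a lunched
environment, so <code>$OUT</code> points at your product output directory):</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">make -j $N</code>
<code class="devsite-terminal">adb sync</code>
<code class="devsite-terminal">adb reboot bootloader</code>
<code class="devsite-terminal">fastboot flash boot $OUT/boot.img</code>
<code class="devsite-terminal">fastboot reboot</code>
</pre>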
<h2 id=using_the_wrap_property>Using the wrap property</h2>
<p>The approach in the previous section puts AddressSanitizer into every
application in the system (actually, into every descendant of the Zygote
process). It is possible to run only one (or several) applications with ASan,
trading some memory overhead for slower application startup.</p>
<p>This can be done by starting your app with the “wrap.” property, the same one
that's used to run apps under Valgrind. The following example runs the Gmail app
under ASan:</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb root</code>
<code class="devsite-terminal">adb shell setenforce 0 # disable SELinux</code>
<code class="devsite-terminal">adb shell setprop wrap.com.google.android.gm "asanwrapper"</code>
</pre>
<p>In this context, asanwrapper replaces <code>/system/bin/app_process</code>
with <code>/system/bin/asan/app_process</code>, which is built with
AddressSanitizer. It also adds <code>/system/lib/asan</code> at the start of
the dynamic library search path. This way ASan-instrumented
libraries from <code>/system/lib/asan</code> are preferred to normal libraries
in <code>/system/lib</code> when running with asanwrapper.</p>
<p>Again, if a bug is found, the app will crash, and the report will be printed to
the log.</p>
<h2 id=sanitize_target>SANITIZE_TARGET</h2>
<p>The master branch has support for building the entire Android platform with
AddressSanitizer at once.</p>
<p>Run the following commands in the same build tree.</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">make -j42</code>
<code class="devsite-terminal">SANITIZE_TARGET=address make -j42</code>
</pre>
<p>In this mode, <code>userdata.img</code> contains extra libraries and must be
flashed to the device as well. Use the following command line:</p>
<pre class="devsite-terminal devsite-click-to-copy">
fastboot flash userdata && fastboot flashall
</pre>
<p>At the time of this writing, modern Nexus and Pixel devices boot to the UI in this mode.</p>
<p>This works by building two sets of shared libraries: normal in
<code>/system/lib</code> (the first make invocation), ASan-instrumented in
<code>/data/asan/lib</code> (the second make invocation). Executables from the
second build overwrite the ones from the first build. ASan-instrumented
executables get a different library search path that includes
<code>/data/asan/lib</code> before <code>/system/lib</code> through the use of
"/system/bin/linker_asan" in PT_INTERP.</p>
<p>The build system clobbers intermediate object directories when the
<code>$SANITIZE_TARGET</code> value has changed. This forces a rebuild of all
targets while preserving installed binaries under <code>/system/lib</code>.</p>
<p>Some targets cannot be built with ASan:</p>
<ul>
<li>Statically linked executables.
<li><code>LOCAL_CLANG:=false</code> targets
<li><code>LOCAL_SANITIZE:=false</code> will not be ASan'd for <code>SANITIZE_TARGET=address</code>
</ul>
<p>Executables like these are skipped in the SANITIZE_TARGET build, and the
version from the first make invocation is left in <code>/system/bin</code>.</p>
<p>Libraries like these are simply built without ASan. They can still contain some
ASan code from the static libraries they depend upon.</p>
<h2 id=supporting_documentation>Supporting documentation</h2>
<p><a href="https://github.com/google/sanitizers/wiki/AddressSanitizerOnAndroid">AddressSanitizerOnAndroid</a> public project site</p>
<p><a href="https://www.chromium.org/developers/testing/addresssanitizer">AddressSanitizer and Chromium</a></p>
<p><a href="https://github.com/google/sanitizers">Other Google Sanitizers</a></p>
</body>
</html>

View file

@ -0,0 +1,268 @@
<html devsite>
<head>
<title>Evaluating Performance</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>There are two user-visible indicators of performance:</p>
<ul>
<li><strong>Predictable, perceptible performance</strong>. Does the user
interface (UI) drop frames or consistently render at 60FPS? Does audio play
without artifacts or popping? How long is the delay between the user touching
the screen and the effect showing on the display?</li>
<li><strong>Length of time required for longer operations</strong> (such as
opening applications).</li>
</ul>
<p>The first is more noticeable than the second. Users typically notice jank
but they won't be able to tell 500ms vs 600ms application startup time unless
they are looking at two devices side-by-side. Touch latency is immediately
noticeable and significantly contributes to the perception of a device.</p>
<p>As a result, in a fast device, the UI pipeline is the most important thing in
the system other than what is necessary to keep the UI pipeline functional. This
means that the UI pipeline should preempt any other work that is not necessary
for fluid UI. To maintain a fluid UI, background syncing, notification delivery,
and similar work must all be delayed if UI work can be run. It is
acceptable to trade the performance of longer operations (HDR+ runtime,
application startup, etc.) to maintain a fluid UI.</p>
<h2 id="capacity_vs_jitter">Capacity vs jitter</h2>
<p>When considering device performance, <em>capacity</em> and <em>jitter</em>
are two meaningful metrics.</p>
<h3 id="capacity">Capacity</h3>
<p>Capacity is the total amount of some resource that the device possesses over
some amount of time. This can be CPU resources, GPU resources, I/O resources,
network resources, memory bandwidth, or any similar metric. When examining
whole-system performance, it can be useful to abstract the individual components
and assume a single metric that determines performance (especially when tuning a
new device because the workloads run on that device are likely fixed).</p>
<p>The capacity of a system varies based on the computing resources online.
Changing CPU/GPU frequency is the primary means of changing capacity, but there
are others such as changing the number of CPU cores online. Accordingly, the
capacity of a system corresponds with power consumption; <strong>changing
capacity always results in a similar change in power consumption.</strong></p>
<p>The capacity required at a given time is overwhelmingly determined by the
running application. As a result, the platform can do little to adjust the
capacity required for a given workload, and the means to do so are limited to
runtime improvements (Android framework, ART, Bionic, GPU compiler/drivers,
kernel).</p>
<h3 id="jitter">Jitter</h3>
<p>While the required capacity for a workload is easy to see, jitter is a more
nebulous concept. For a good introduction to jitter as an impediment to fast
systems, refer to
<em><a href="http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-03-3116">THE
CASE OF THE MISSING SUPERCOMPUTER PERFORMANCE: ACHIEVING OPTIMAL PERFORMANCE ON
THE 8,192 PROCESSORS OF ASCI Q</a></em>. (It's an investigation of why the ASCI
Q supercomputer did not achieve its expected performance and is a great
introduction to optimizing large systems.)</p>
<p>This page uses the term jitter to describe what the ASCI Q paper calls
<em>noise</em>. Jitter is the random system behavior that prevents perceptible
work from running. It is often work that must be run, but it may not have strict
timing requirements that cause it to run at any particular time. Because it is
random, it is extremely difficult to disprove the existence of jitter for a
given workload. It is also extremely difficult to prove that a known source of
jitter was the cause of a particular performance issue. The tools most commonly
used for diagnosing causes of jitter (such as tracing or logging) can introduce
their own jitter.</p>
<p>Sources of jitter experienced in real-world implementations of Android
include:</p>
<ul>
<li>Scheduler delay</li>
<li>Interrupt handlers</li>
<li>Driver code running for too long with preemption or interrupts disabled</li>
<li>Long-running softirqs</li>
<li>Lock contention (application, framework, kernel driver, binder lock, mmap
lock)</li>
<li>File descriptor contention where a low-priority thread holds the lock on a
file, preventing a high-priority thread from running</li>
<li>Running UI-critical code in workqueues where it could be delayed</li>
<li>CPU idle transitions</li>
<li>Logging</li>
<li>I/O delays</li>
<li>Unnecessary process creation (e.g., CONNECTIVITY_CHANGE broadcasts)</li>
<li>Page cache thrashing caused by insufficient free memory</li>
</ul>
<p>The required amount of time for a given period of jitter may or may not
decrease as capacity increases. For example, if a driver leaves interrupts
disabled while waiting for a read from across an i2c bus, it will take a fixed
amount of time regardless of whether the CPU is at 384MHz or 2GHz. Increasing
capacity is not a feasible solution to improve performance when jitter is
involved. As a result, <strong>faster processors will not usually improve
performance in jitter-constrained situations.</strong></p>
<p>Finally, unlike capacity, jitter is almost entirely within the domain of the
system vendor.</p>
<h3 id="memory_consumption">Memory consumption</h3>
<p>Memory consumption is traditionally blamed for poor performance. While
consumption itself is not a performance issue, it can cause jitter via
lowmemorykiller overhead, service restarts, and page cache thrashing. Reducing
memory consumption can avoid the direct causes of poor performance, but there
may be other targeted improvements that avoid those causes as well (for example,
pinning the framework to prevent it from being paged out when it will be paged
in soon after).</p>
<h2 id="analyze_initial">Analyzing initial device performance</h2>
<p>Starting from a functional but poorly-performing system and attempting to fix
the system's behavior by looking at individual cases of user-visible poor
performance is <strong>not</strong> a sound strategy. Because poor performance
is usually not easily reproducible (i.e., jitter) or an application issue, too
many variables in the full system prevent this strategy from being effective. As
a result, it's very easy to misidentify causes and make minor improvements while
missing systemic opportunities for fixing performance across the system.</p>
<p>Instead, use the following general approach when bringing up a new
device:</p>
<ol>
<li>Get the system booting to UI with all drivers running and some basic
frequency governor settings (if you change the frequency governor settings,
repeat all steps below).</li>
<li>Ensure the kernel supports the <code>sched_blocked_reason</code> tracepoint
as well as other tracepoints in the display pipeline that denote when the frame
is delivered to the display.</li>
<li>Take long traces of the entire UI pipeline (from receiving input via an IRQ
to final scanout) while running a lightweight and consistent workload (e.g.,
<a href="https://android.googlesource.com/platform/frameworks/base.git/+/master/tests/UiBench/">UiBench</a>
or the ball test in <a href="#touchlatency">TouchLatency</a>).</li>
<li>Fix the frame drops detected in the lightweight and consistent
workload.</li>
<li>Repeat steps 3-4 until you can run with zero dropped frames for 20+ seconds
at a time. </li>
<li>Move on to other user-visible sources of jank.</li>
</ol>
<p>Other simple things you can do early on in device bringup include:</p>
<ul>
<li>Ensure your kernel has the
<a href="https://android.googlesource.com/kernel/msm/+/c9f00aa0e25e397533c198a0fcf6246715f99a7b%5E!/">sched_blocked_reason
tracepoint patch</a>. This tracepoint is enabled with the sched trace category
in systrace and provides the function responsible for sleeping when that
thread enters uninterruptible sleep. It is critical for performance analysis
because uninterruptible sleep is a very common indicator of jitter.</li>
<li>Ensure you have sufficient tracing for the GPU and display pipelines. On
recent Qualcomm SOCs, tracepoints are enabled using:</li>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell "echo 1 &gt; /d/tracing/events/kgsl/enable"</code>
<code class="devsite-terminal">adb shell "echo 1 &gt; /d/tracing/events/mdss/enable"</code>
</pre>
<p>These events remain enabled when you run systrace so you can see additional
information in the trace about the display pipeline (MDSS) in the
<code>mdss_fb0</code> section. On Qualcomm SOCs, you won't see any additional
information about the GPU in the standard systrace view, but the results are
present in the trace itself (for details, see
<a href="/devices/tech/debug/systrace.html">Understanding
systrace</a>).</p>
<p>What you want from this kind of display tracing is a single event that
directly indicates a frame has been delivered to the display. From there, you
can determine if you've hit your frame time successfully; if event X<em>n</em>
occurs less than 16.7ms after event X<em>n-1</em> (assuming a 60Hz display),
then you know you did not jank. If your SOC does not provide such signals, work
with your vendor to get them. Debugging jitter is extremely difficult without a
definitive signal of frame completion.</p></ul>
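<p>As a rough illustration of the frame-time check above, assuming you have
extracted one frame-completion timestamp (in seconds) per line from the trace
into a file, flagging over-budget frames at 60Hz is simple arithmetic:</p>
<pre class="devsite-terminal devsite-click-to-copy">
awk 'NR &gt; 1 { d = ($1 - prev) * 1000; if (d &gt; 16.7) printf "frame %d: %.1f ms\n", NR - 1, d } { prev = $1 }' frame_times.txt
</pre>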
<h3 id="synthetic_benchmarks">Using synthetic benchmarks</h3>
<p>Synthetic benchmarks are useful for ensuring a device's basic functionality
is present. However, treating benchmarks as a proxy for perceived device
performance is not useful.</p>
<p>Based on experiences with SOCs, differences in synthetic benchmark
performance between SOCs are not correlated with a similar difference in
perceptible UI performance (number of dropped frames, 99th percentile frame
time, etc.). Synthetic benchmarks are capacity-only benchmarks; jitter impacts
the measured performance of these benchmarks only by stealing time from the bulk
operation of the benchmark. As a result, synthetic benchmark scores are mostly
irrelevant as a metric of user-perceived performance.</p>
<p>Consider two SOCs running Benchmark X that renders 1000 frames of UI and
reports the total rendering time (lower score is better).</p>
<ul>
<li>SOC 1 renders each frame of Benchmark X in 10ms and scores 10,000.</li>
<li>SOC 2 renders 99% of frames in 1ms but 1% of frames in 100ms and scores
1,990, a dramatically better score.</li>
</ul>
<p>If the benchmark is indicative of actual UI performance, SOC 2 would be
unusable. Assuming a 60Hz refresh rate, SOC 2 would have a janky frame every
1.5s of operation. Meanwhile, SOC 1 (the slower SOC according to Benchmark X)
would be perfectly fluid.</p>
<h3 id="bug_reports">Using bug reports</h3>
<p>Bug reports are sometimes useful for performance analysis, but because they
are so heavyweight, they are rarely useful for debugging sporadic jank issues.
They may provide some hints on what the system was doing at a given time,
especially if the jank was around an application transition (which is logged in
a bug report). Bug reports can also indicate when something is more broadly
wrong with the system that could reduce its effective capacity (such as thermal
throttling or memory fragmentation).</p>
<h3 id="touchlatency">Using TouchLatency</h3>
<p>Several examples of bad behavior come from TouchLatency, which is the
preferred periodic workload used for the Pixel and Pixel XL. It's available at
<code>frameworks/base/tests/TouchLatency</code> and has two modes: touch latency
and bouncing ball (to switch modes, click the button in the upper-right
corner).</p>
<p>The bouncing ball test is exactly as simple as it appears: A ball bounces
around the screen forever, regardless of user input. It is usually also
<strong>by far</strong> the hardest test to run perfectly, but the closer it
comes to running without any dropped frames, the better your device will be. The
bouncing ball test is difficult because it is a trivial but perfectly consistent
workload that runs at a very low clock (this assumes the device has a frequency
governor; if the device is instead running with fixed clocks, downclock the
CPU/GPU to near-minimum when running the bouncing ball test for the first time).
As the system quiesces and the clocks drop closer to idle, the required CPU/GPU
time per frame increases. You can watch the ball and see things jank, and you'll
be able to see missed frames in systrace as well.</p>
<p>Because the workload is so consistent, you can identify most sources of
jitter much more easily than in most user-visible workloads by tracking what
exactly is running on the system during each missed frame instead of the UI
pipeline. <strong>The lower clocks amplify the effects of jitter by making it
more likely that any jitter causes a dropped frame.</strong> As a result, the
closer TouchLatency is to 60FPS, the less likely you are to have bad system
behaviors that cause sporadic, hard-to-reproduce jank in larger
applications.</p>
<p>As jitter is often (but not always) clockspeed-invariant, use a test that
runs at very low clocks to diagnose jitter for the following reasons:</p>
<ul>
<li>Not all jitter is clockspeed-invariant; many sources just consume CPU
time.</li>
<li>The governor should get the average frame time close to the deadline by
clocking down, so time spent running non-UI work can push it over the edge to
dropping a frame.</li>
</ul>
</body>
</html>

View file

@ -0,0 +1,360 @@
<html devsite>
<head>
<title>Using ftrace</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>ftrace is a debugging tool for understanding what is going on inside the
Linux kernel. The following sections detail basic ftrace functionality, ftrace
usage with atrace (which captures kernel events), and dynamic ftrace.</p>
<p>For details on advanced ftrace functionality that is not available from
systrace, refer to the ftrace documentation at
<a href="https://www.kernel.org/doc/Documentation/trace/ftrace.txt"><code>&lt;kernel
tree&gt;/Documentation/trace/ftrace.txt</code></a>.</p>
<h2 id="atrace">Capturing kernel events with atrace</h2>
<p>atrace (<code>frameworks/native/cmds/atrace</code>) uses ftrace to capture
kernel events. In turn, systrace.py (or run_systrace.py in later versions of
<a href="https://github.com/catapult-project/catapult">Catapult</a>) uses adb
to run atrace on the device. atrace does the following:</p>
<ul>
<li>Sets up user-mode tracing by setting a property
(<code>debug.atrace.tags.enableflags</code>).</li>
<li>Enables the desired ftrace functionality by writing to the appropriate
ftrace sysfs nodes. However, because ftrace supports more features than atrace
exposes, you might need to set some sysfs nodes yourself and then use atrace.</li>
</ul>
<p>With the exception of boot-time tracing, rely on using atrace to set the
property to the appropriate value. The property is a bitmask and there's no good
way to determine the correct values other than looking at the appropriate header
(which could change between Android releases).</p>
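<p>For reference, atrace can also be run directly over adb. The exact category
list varies by device and release, so the following is only a sketch:</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell atrace --list_categories</code>
<code class="devsite-terminal">adb shell atrace -b 96000 -t 10 sched freq idle -o /data/local/tmp/atrace_out.txt</code>
</pre>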
<h2 id="enabling_events">Enabling ftrace events</h2>
<p>The ftrace sysfs nodes are in <code>/d/tracing</code> and trace events are
divided into categories in <code>/d/tracing/events</code>.</p>
<p>To enable events on a per-category basis, use:</p>
<pre class="devsite-terminal devsite-click-to-copy">
echo 1 &gt; /d/tracing/events/irq/enable
</pre>
<p>To enable events on a per-event basis, use:</p>
<pre class="devsite-terminal devsite-click-to-copy">
echo 1 &gt; /d/tracing/events/sched/sched_wakeup/enable
</pre>
<p>If extra events have been enabled by writing to sysfs nodes, they will
<strong>not</strong> be reset by atrace. A common pattern
for Qualcomm device bringup is to enable <code>kgsl</code> (GPU) and
<code>mdss</code> (display pipeline) tracepoints and then use atrace or
<a href="/devices/tech/debug/systrace.html">systrace</a>:</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell "echo 1 &gt; /d/tracing/events/mdss/enable"</code>
<code class="devsite-terminal">adb shell "echo 1 &gt; /d/tracing/events/kgsl/enable"</code>
<code class="devsite-terminal">./systrace.py sched freq idle am wm gfx view binder_driver irq workq ss sync -t 10 -b 96000 -o full_trace.html</code>
</pre>
<p>You can also use ftrace without atrace or systrace, which is
useful when you want kernel-only traces (or if you've taken the time to write
the user-mode tracing property by hand). To run just ftrace:</p>
<ol>
<li>Set the buffer size to a value large enough for your trace:
<pre class="devsite-terminal devsite-click-to-copy">
echo 96000 &gt; /d/tracing/buffer_size_kb
</pre>
</li>
<li>Enable tracing:
<pre class="devsite-terminal devsite-click-to-copy">
echo 1 &gt; /d/tracing/tracing_on
</pre>
</li>
<li>Run your test, then disable tracing:
<pre class="devsite-terminal devsite-click-to-copy">
echo 0 &gt; /d/tracing/tracing_on
</pre>
</li>
<li>Dump the trace:
<pre class="devsite-terminal devsite-click-to-copy">
cat /d/tracing/trace &gt; /data/local/tmp/trace_output
</pre>
</li>
</ol>
<p>The trace_output gives the trace in text form. To visualize it using
Catapult, get the
<a href="https://github.com/catapult-project/catapult/tree/master/">Catapult
repository</a> from Github and run trace2html:</p>
<pre class="devsite-terminal devsite-click-to-copy">
catapult/tracing/bin/trace2html ~/path/to/trace_file
</pre>
<p>By default, this writes <code>trace_file.html</code> in the same
directory.</p>
<h2 id="correlate">Correlating events</h2>
<p>It is often useful to look at the Catapult visualization and the ftrace
log simultaneously; for example, some ftrace events (especially vendor-specific
ones) are not visualized by Catapult. However, Catapult's timestamps are
relative either to the first event in the trace or to a specific timestamp
dumped by atrace, while the raw ftrace timestamps are based on a particular
absolute clock source in the Linux kernel.</p>
<p>To find a given ftrace event from a Catapult event:</p>
<ol>
<li>Open the raw ftrace log. Traces in recent versions of systrace are
compressed by default:
<ul>
<li>If you captured your systrace with <code>--no-compress</code>, this is in
the html file in the section beginning with BEGIN TRACE.</li>
<li>If not, run html2trace from the
<a href="https://github.com/catapult-project/catapult/tree/master/">Catapult
tree</a> (<code>tracing/bin/html2trace</code>) to uncompress the trace.</li>
</ul>
</li>
<li>Find the relative timestamp in the Catapult visualization.</li>
<li>Find a line at the beginning of the trace containing
<code>trace_event_clock_sync</code>. It should look something like this:
<pre class="devsite-click-to-copy">
&lt;5134&gt;-5134 (-----) [003] ...1 68.104349: tracing_mark_write: trace_event_clock_sync: parent_ts=68.104286
</pre>
<br>If this line does not exist (or if you used ftrace without atrace), then
timings will be relative to the first event in the ftrace log.
<ol style="list-style-type: lower-alpha">
<li>Add the relative timestamp (in milliseconds) to the value in
<code>parent_ts</code> (in seconds).</li>
<li>Search for the new timestamp.</li>
</ol>
</li>
</ol>
<p>These steps should put you at (or at least very close to) the event.</p>
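<p>For example, using the clock-sync line above and a hypothetical Catapult
timestamp of 1,500 ms, the absolute ftrace timestamp is roughly
68.104286 + 1.5 = 69.604286 seconds, so searching the raw log for 69.604 should
land at or very near the event.</p>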
<h2 id="dftrace">Using dynamic ftrace</h2>
<p>When systrace and standard ftrace are insufficient, there is one last
recourse available: <em>dynamic ftrace</em>. Dynamic ftrace involves rewriting
of kernel code after boot, and as a result it is not available in production
kernels for security reasons. However, every single difficult performance bug in
2015 and 2016 was ultimately root-caused using dynamic ftrace. It is especially
powerful for debugging uninterruptible sleeps because you can get a stack trace
in the kernel every time you hit the function triggering uninterruptible sleep.
You can also debug sections with interrupts and preemptions disabled, which can
be very useful for proving issues.</p>
<p>To turn on dynamic ftrace, edit your kernel's defconfig:</p>
<ol>
<li>Remove <code>CONFIG_STRICT_MEMORY_RWX</code> (if it's present). If you're on
kernel 3.18 or newer on arm64, it's not there.</li>
<li>Add the following: <code>CONFIG_DYNAMIC_FTRACE=y</code>,
<code>CONFIG_FUNCTION_TRACER=y</code>, <code>CONFIG_IRQSOFF_TRACER=y</code>,
<code>CONFIG_FUNCTION_PROFILER=y</code>, and <code>CONFIG_PREEMPT_TRACER=y</code>.
</li>
<li>Rebuild and boot the new kernel.</li>
<li>Run the following to check for available tracers:
<pre class="devsite-terminal devsite-click-to-copy">
cat /d/tracing/available_tracers
</pre>
</li>
<li>Confirm the command returns <code>function</code>, <code>irqsoff</code>,
<code>preemptoff</code>, and <code>preemptirqsoff</code>.</li>
<li>Run the following to ensure dynamic ftrace is working:
<pre class="devsite-terminal devsite-click-to-copy">
cat /d/tracing/available_filter_functions | grep &lt;a function you care about&gt;
</pre>
</li>
</ol>
<p>After completing these steps, you have dynamic ftrace, the function profiler,
the irqsoff profiler, and the preemptoff profiler available. We <strong>strongly
recommend</strong> reading ftrace documentation on these topics before using
them as they are powerful but complex. irqsoff and preemptoff are primarily
useful for confirming that drivers may be leaving interrupts or preemption
turned off for too long.</p>
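<p>For reference, a minimal irqsoff session looks something like the following
(the preemptoff and preemptirqsoff tracers work the same way; the workload you
run in the middle is up to you):</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">echo irqsoff &gt; /d/tracing/current_tracer</code>
<code class="devsite-terminal">echo 0 &gt; /d/tracing/tracing_max_latency</code>
<code class="devsite-terminal">echo 1 &gt; /d/tracing/tracing_on</code>
# run your workload, then:
<code class="devsite-terminal">echo 0 &gt; /d/tracing/tracing_on</code>
<code class="devsite-terminal">cat /d/tracing/tracing_max_latency</code>
<code class="devsite-terminal">cat /d/tracing/trace &gt; /data/local/tmp/irqsoff_trace</code>
</pre>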
<p>The function profiler is the best option for performance issues and is often
used to find out where a function is being called.</p>
<section class="expandable">
<h4 class="showalways">Show Issue: HDR photo + rotating viewfinder</h4>
<p>In this issue, using a Pixel XL to take an HDR+ photo then immediately
rotating the viewfinder caused jank every time. We used the function profiler to
debug the issue in less than one hour. To follow along with the example,
<a href="perf_traces.zip">download the zip file</a> of traces (which also
includes other traces referred to in this section), unzip the file, and open the
trace_30898724.html file in your browser.</p>
<p>The trace shows several threads in the cameraserver process blocked in
uninterruptible sleep on <code>ion_client_destroy</code>. That's an expensive
function, but it should be called very infrequently because ion clients should
encompass many allocations. Initially, the blame fell on the Hexagon code in
Halide, which was indeed one of the culprits (it created a new client for every
ion allocation and destroyed that client when the allocation was freed, which
was way too expensive). Moving to a single ion client for all Hexagon
allocations improved the situation, but the jank wasn't fixed.</p>
<p>At this point we need to know who is calling <code>ion_client_destroy</code>,
so it's time to use the function profiler:</p>
<ol>
<li>As functions are sometimes renamed by the compiler, confirm
<code>ion_client_destroy</code> is there by using:
<pre class="devsite-terminal devsite-click-to-copy">
cat /d/tracing/available_filter_functions | grep ion_client_destroy
</pre>
</li>
<li>After confirming it is there, use it as the ftrace filter:
<pre class="devsite-terminal devsite-click-to-copy">
echo ion_client_destroy &gt; /d/tracing/set_ftrace_filter
</pre>
</li>
<li>Turn on the function profiler:
<pre class="devsite-terminal devsite-click-to-copy">
echo function &gt; /d/tracing/current_tracer
</pre>
</li>
<li>Turn on stack traces whenever a filter function is called:
<pre class="devsite-terminal devsite-click-to-copy">
echo func_stack_trace &gt; /d/tracing/trace_options
</pre>
</li>
<li>Increase the buffer size:
<pre class="devsite-terminal devsite-click-to-copy">
echo 64000 &gt; /d/tracing/buffer_size_kb
</pre>
</li>
<li>Turn on tracing:
<pre class="devsite-terminal devsite-click-to-copy">
echo 1 &gt; /d/tracing/tracing_on
</pre>
</li>
<li>Run the test and get the trace:
<pre class="devsite-terminal devsite-click-to-copy">
cat /d/tracing/trace &gt; /data/local/tmp/trace
</pre>
</li>
<li>View the trace to see lots and lots of stack traces:
<pre class="devsite-click-to-copy">
cameraserver-643 [003] ...1 94.192991: ion_client_destroy &lt;-ion_release
cameraserver-643 [003] ...1 94.192997: &lt;stack trace&gt;
=&gt; ftrace_ops_no_ops
=&gt; ftrace_graph_call
=&gt; ion_client_destroy
=&gt; ion_release
=&gt; __fput
=&gt; ____fput
=&gt; task_work_run
=&gt; do_notify_resume
=&gt; work_pending
</pre>
</li>
</ol>
<p>Based on inspection of the ion driver, we can see that
<code>ion_client_destroy</code> is being spammed by a userspace function closing
an fd to <code>/dev/ion</code>, not a random kernel driver. By searching the
Android codebase for <code>"/dev/ion"</code>, we find several vendor drivers
doing the same thing as the Hexagon driver and opening/closing
<code>/dev/ion</code> (creating and destroying a new ion client) every time they
need a new ion allocation. Changing those to
<a href="https://android.googlesource.com/platform/hardware/qcom/camera/+/8f7984018b6643f430c229725a58d3c6bb04acab">use
a single ion client</a> for the lifetime of the process fixed the bug.</p>
</section>
<hr>
<p>If the data from the function profiler isn't specific enough, you can combine
ftrace tracepoints with the function profiler. ftrace events can be enabled in
exactly the same way as usual, and they will be interleaved with your trace.
This is great if there's an occasional long uninterruptible sleep in a specific
function you want to debug: set the ftrace filter to the function you want,
enable tracepoints, take a trace. You can parse the resulting trace with
<code>trace2html</code>, find the event you want, then get nearby stack traces
in the raw trace.</p>
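<p>For example (the filter function and the sched category are placeholders;
enable whichever tracepoints you actually need):</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">echo <var>your_function</var> &gt; /d/tracing/set_ftrace_filter</code>
<code class="devsite-terminal">echo func_stack_trace &gt; /d/tracing/trace_options</code>
<code class="devsite-terminal">echo 1 &gt; /d/tracing/events/sched/enable</code>
<code class="devsite-terminal">echo function &gt; /d/tracing/current_tracer</code>
<code class="devsite-terminal">echo 1 &gt; /d/tracing/tracing_on</code>
# run the workload, then:
<code class="devsite-terminal">echo 0 &gt; /d/tracing/tracing_on</code>
<code class="devsite-terminal">cat /d/tracing/trace &gt; /data/local/tmp/trace</code>
</pre>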
<h2 id="lock_stat">Using lockstat</h2>
<p>Sometimes, ftrace isn't enough and you really need to debug what appears to
be kernel lock contention. There is one more kernel option worth trying:
<code>CONFIG_LOCK_STAT</code>. This is a last resort as it is extremely
difficult to get working on Android devices because it inflates the size of the
kernel beyond what most devices can handle.</p>
<p>However, lockstat uses the debug
locking infrastructure, which is useful for many other applications. Everyone
working on device bringup should figure out some way to get that option working
on every device because there <strong>will</strong> be a time when you think
"If only I could turn on <code>LOCK_STAT</code>, I could confirm or refute this
as the problem in five minutes instead of five days."</p>
<section class="expandable">
<h4 class="showalways">Show Issue: Stall in SCHED_FIFO when cores at max load
with non-SCHED_FIFO</h4>
<p>In this issue, the SCHED_FIFO thread stalled when all cores were at maximum
load with non-SCHED_FIFO threads. We had traces showing significant lock
contention on an fd in VR apps, but we couldn't easily identify the fd in use.
To follow along with the example, <a href="perf_traces.zip">download the zip
file</a> of traces (which also includes other traces referred to in this
section), unzip the file, and open the trace_30905547.html file in your browser.
</p>
<p>We hypothesized that ftrace itself was the source of lock contention, when a
low priority thread would start writing to the ftrace pipe and then get
preempted before it could release the lock. This is a worst-case scenario that
was exacerbated by a mixture of extremely low-priority threads writing to the
ftrace marker along with some higher priority threads spinning on CPUs to
simulate a completely loaded device.</p>
<p>As we couldn't use ftrace to debug, we got <code>LOCK_STAT</code> working
then turned off all other tracing from the app. The results showed the lock
contention was actually from ftrace because none of the contention showed up in
the lock trace when ftrace was not running.</p>
</section>
<hr>
<p>If you can boot a kernel with the config option, lock tracing is similar to
ftrace:</p>
<ol>
<li>Enable tracing:
<pre class="devsite-terminal devsite-click-to-copy">
echo 1 &gt; /proc/sys/kernel/lock_stat
</pre>
</li>
<li>Run your test.</li>
<li>Disable tracing:
<pre class="devsite-terminal devsite-click-to-copy">
echo 0 &gt; /proc/sys/kernel/lock_stat
</pre>
</li>
<li>Dump your trace:
<pre class="devsite-terminal devsite-click-to-copy">
cat /proc/lock_stat &gt; /data/local/tmp/lock_stat
</pre>
</li>
</ol>
<p>For help interpreting the resulting output, refer to lockstat documentation
at <a href="https://www.kernel.org/doc/Documentation/locking/lockstat.txt"><code>&lt;kernel&gt;/Documentation/locking/lockstat.txt</code></a>.</p>
</body>
</html>

View file

@ -0,0 +1,141 @@
<html devsite>
<head>
<title>Using GDB</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>The GNU Project debugger (GDB) is a commonly used Unix debugger. This page
details using <code>gdb</code> to debug Android apps and processes.</p>
<h2 id=running>Debugging running apps or processes</h2>
<p>To connect to an already-running app or native daemon, use
<code>gdbclient</code> with a PID. For example, to debug the process with PID
1234, run:</p>
<pre class="devsite-terminal devsite-click-to-copy">
gdbclient 1234
</pre>
<p>The script sets up port forwarding, starts the appropriate
<code>gdbserver</code> on the device, starts the appropriate <code>gdb</code> on
the host, configures <code>gdb</code> to find symbols, and connects
<code>gdb</code> to the remote <code>gdbserver</code>.</p>
<h2 id=starts>Debugging native process startup</h2>
<p>To debug a process as it starts, use <code>gdbserver</code> or
<code>gdbserver64</code> (for 64-bit processes). For example:</p>
<pre class="devsite-terminal devsite-click-to-copy">
adb shell gdbserver :5039 /system/bin/<var>MY_TEST_APP</var>
</pre>
<p>Example output:</p>
<pre class="devsite-click-to-copy">
Process <var>MY_TEST_APP</var> created; pid = 3460
Listening on port 5039
</pre>
<p>Next, identify the application PID from the <code>gdbserver</code> output and
use it in another terminal window:</p>
<pre class="devsite-terminal devsite-click-to-copy">
gdbclient <var>APP_PID</var>
</pre>
<p>Finally, enter <strong>continue</strong> at the <code>gdb</code> prompt.</p>
<p class="note"><strong>Note:</strong> If you specify the wrong
<code>gdbserver</code>, you'll get an unhelpful error message (such as
"<code>Reply contains invalid hex digit 59</code>").</p>
<h2 id="app-startup">Debugging app startup</h2>
<p>Sometimes you want to debug an app as it starts, such as when there's a crash
and you want to step through code to see what happens <em>before</em> the crash.
<a href="#running">Attaching</a> works in some cases, but in other cases is
impossible because the app crashes before you can attach. The
<code>logwrapper</code> approach (used for <code>strace</code> and
<code>valgrind</code>) does not always work because the app might not have
permissions to open a port, and <code>gdbserver</code> inherits that
restriction.</p>
<p>To debug app startup, use the developer options in Settings to instruct
the app to wait for a Java debugger to attach:</p>
<ol>
<li>Go to <em>Settings > Developer options > Select debug app</em> and choose
your app from the list, then press <strong>Wait for debugger</strong>.</li>
<li>Start the app, either from the launcher or by using the command line to run:
<pre class="devsite-terminal devsite-click-to-copy">
am start -a android.intent.action.MAIN -n <var>APP_NAME</var>/.<var>APP_ACTIVITY</var>
</pre></li>
<li>Wait for the app to load and a dialog to appear telling you the app is
waiting for a debugger.</li>
<li>Attach <code>gdbserver</code>/<code>gdbclient</code> normally, set
breakpoints, then continue the process.</li></ol>
<p>To let the app actually run, attach a Java Debug Wire Protocol (JDWP)
debugger such as Java Debugger (jdb):</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb forward tcp:12345 jdwp:<var>XXX</var> # (Where XXX is the pid of the debugged process.)</code>
<code class="devsite-terminal">jdb -attach localhost:12345</code>
</pre>
<h2 id=crash>Debugging apps or processes that crash</h2>
<p>If you want <code>debuggerd</code> to suspend crashed processes so you can
attach <code>gdb</code>, set the appropriate property:</p>
<pre class="devsite-click-to-copy">
# Android 7.0 Nougat and later.
<code class="devsite-terminal">adb shell setprop debug.debuggerd.wait_for_gdb true</code>
</pre>
<pre class="devsite-click-to-copy">
# Android 6.0 Marshmallow and earlier.
<code class="devsite-terminal">adb shell setprop debug.db.uid 999999</code>
</pre>
<p>At the end of the usual crash output, <code>debuggerd</code> provides
instructions on how to connect <code>gdb</code> using the command:</p>
<pre class="devsite-terminal devsite-click-to-copy">
gdbclient <var>PID</var>
</pre>
<h2 id=symbols>Debugging without symbols</h2>
<p>For 32-bit ARM, if you don't have symbols, <code>gdb</code> can get confused
about the instruction set it is disassembling (ARM or Thumb). To specify the
instruction set chosen as the default when symbol information is missing, use
the following <code>gdb</code> command:</p>
<pre class="devsite-terminal devsite-click-to-copy">
set arm fallback-mode arm # or thumb
</pre>
</body>
</html>

View file

@ -0,0 +1,170 @@
<html devsite>
<head>
<title>Debugging Native Android Platform Code</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>This section summarizes useful tools and related commands for debugging,
tracing, and profiling native Android platform code when developing
platform-level features.</p>
<p>This page covers use of <code>debuggerd</code>, a daemon process for
collecting error information after applications crash. Other pages in this
section explore system services with
<a href="https://developer.android.com/studio/command-line/dumpsys.html">Dumpsys</a>, viewing
<a href="/devices/tech/debug/native-memory.html">native memory</a>,
<a href="https://developer.android.com/studio/command-line/dumpsys.html#network">network</a>, and
<a href="https://developer.android.com/studio/command-line/dumpsys.html#procstats">RAM</a> usage, using
<a href="/devices/tech/debug/asan.html">AddressSanitizer</a> to detect memory
bugs in native code, evaluating
<a href="/devices/tech/debug/eval_perf.html"> performance issues</a> (includes
<a href="/devices/tech/debug/systrace">systrace</a>), and using
<a href="/devices/tech/debug/gdb.html">GNU Project debugger (GDB)</a> and
other debugging tools.</p>
<h2 id=debuggerd>Using debuggerd</h2>
<p>The <code>debuggerd</code> process dumps registers and unwinds the stack.
When a dynamically linked executable starts, several signal handlers are
registered that connect to <code>debuggerd</code> (or <code>debuggerd64</code>)
in the event that signals (such as SIGSEGV or SIGABRT) are sent to the process.</p>
<p>It's possible for <code>debuggerd</code> to attach only if nothing else is
already attached, which means using tools such as <code>strace</code> or
<code>gdb</code> will prevent <code>debuggerd</code> from working. You can also
explicitly prevent <code>debuggerd</code> from attaching by calling
<code>prctl(PR_SET_DUMPABLE, 0)</code>, which can be useful when you need to
opt out of crash reporting.</p>
<p>Example <code>debuggerd</code> output (with timestamps and extraneous
information removed):</p>
<pre class="devsite-click-to-copy">
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'Android/aosp_angler/angler:7.1.1/NYC/enh12211018:eng/test-keys'
Revision: '0'
ABI: 'arm'
pid: 17946, tid: 17949, name: crasher &gt;&gt;&gt; crasher &lt;&lt;&lt;
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
r0 0000000c r1 00000000 r2 00000000 r3 00000000
r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
backtrace:
#00 pc 0004793e /system/lib/libc.so (pthread_mutex_lock+1)
#01 pc 0001aa1b /system/lib/libc.so (readdir+10)
#02 pc 00001b91 /system/xbin/crasher (readdir_null+20)
#03 pc 0000184b /system/xbin/crasher (do_action+978)
#04 pc 00001459 /system/xbin/crasher (thread_callback+24)
#05 pc 00047317 /system/lib/libc.so (_ZL15__pthread_startPv+22)
#06 pc 0001a7e5 /system/lib/libc.so (__start_thread+34)
Tombstone written to: /data/tombstones/tombstone_06
</pre>
<p>As the last line of the output shows, <code>debuggerd</code> dumps a summary
to the log and writes a full <em>tombstone</em> to disk. The tombstone is simply a file with
extra data about the crashed process; it contains information that can be
helpful in debugging a crash, in particular the stack traces for all threads in
the crashing process (not just the thread that caught the signal) and a full
memory map.</p>
<p>Assuming the unstripped binaries can be found, you can get a more detailed
unwind with line number information by pasting the above example into
<code>development/scripts/stack</code>:</p>
<p class="key-point"><strong>Tip:</strong> For convenience, if you've lunched
<code>stack</code> will be on your $PATH already so you don't need to give the
full path.</p>
<pre class="devsite-terminal devsite-click-to-copy">
development/scripts/stack
</pre>
<p>Example output:</p>
<pre class="devsite-click-to-copy">
Reading native crash info from stdin
03-02 23:53:49.477 17951 17951 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-02 23:53:49.477 17951 17951 F DEBUG : Build fingerprint: 'Android/aosp_angler/angler:7.1.1/NYC/enh12211018:eng/test-keys'
03-02 23:53:49.477 17951 17951 F DEBUG : Revision: '0'
03-02 23:53:49.477 17951 17951 F DEBUG : ABI: 'arm'
03-02 23:53:49.478 17951 17951 F DEBUG : pid: 17946, tid: 17949, name: crasher &gt;&gt;&gt; crasher &lt;&lt;&lt;
03-02 23:53:49.478 17951 17951 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
03-02 23:53:49.478 17951 17951 F DEBUG : r0 0000000c r1 00000000 r2 00000000 r3 00000000
03-02 23:53:49.478 17951 17951 F DEBUG : r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
03-02 23:53:49.478 17951 17951 F DEBUG : r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
03-02 23:53:49.478 17951 17951 F DEBUG : ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
03-02 23:53:49.491 17951 17951 F DEBUG :
03-02 23:53:49.491 17951 17951 F DEBUG : backtrace:
03-02 23:53:49.492 17951 17951 F DEBUG : #00 pc 0004793e /system/lib/libc.so (pthread_mutex_lock+1)
03-02 23:53:49.492 17951 17951 F DEBUG : #01 pc 0001aa1b /system/lib/libc.so (readdir+10)
03-02 23:53:49.492 17951 17951 F DEBUG : #02 pc 00001b91 /system/xbin/crasher (readdir_null+20)
03-02 23:53:49.492 17951 17951 F DEBUG : #03 pc 0000184b /system/xbin/crasher (do_action+978)
03-02 23:53:49.492 17951 17951 F DEBUG : #04 pc 00001459 /system/xbin/crasher (thread_callback+24)
03-02 23:53:49.492 17951 17951 F DEBUG : #05 pc 00047317 /system/lib/libc.so (_ZL15__pthread_startPv+22)
03-02 23:53:49.492 17951 17951 F DEBUG : #06 pc 0001a7e5 /system/lib/libc.so (__start_thread+34)
03-02 23:53:49.492 17951 17951 F DEBUG : Tombstone written to: /data/tombstones/tombstone_06
Reading symbols from /huge-ssd/aosp-arm64/out/target/product/angler/symbols
Revision: '0'
pid: 17946, tid: 17949, name: crasher &gt;&gt;&gt; crasher &lt;&lt;&lt;
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
r0 0000000c r1 00000000 r2 00000000 r3 00000000
r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
Using arm toolchain from: /huge-ssd/aosp-arm64/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/bin/
Stack Trace:
RELADDR FUNCTION FILE:LINE
0004793e pthread_mutex_lock+2 bionic/libc/bionic/pthread_mutex.cpp:515
v------> ScopedPthreadMutexLocker bionic/libc/private/ScopedPthreadMutexLocker.h:27
0001aa1b readdir+10 bionic/libc/bionic/dirent.cpp:120
00001b91 readdir_null+20 system/core/debuggerd/crasher.cpp:131
0000184b do_action+978 system/core/debuggerd/crasher.cpp:228
00001459 thread_callback+24 system/core/debuggerd/crasher.cpp:90
00047317 __pthread_start(void*)+22 bionic/libc/bionic/pthread_create.cpp:202 (discriminator 1)
0001a7e5 __start_thread+34 bionic/libc/bionic/clone.cpp:46 (discriminator 1)
</pre>
<p class="note"><strong>Note:</strong> Some system libraries are built with
<code>LOCAL_STRIP_MODULE := keep_symbols</code> to provide usable backtraces
directly from <code>debuggerd</code> without taking up anywhere near as much
space as an unstripped version.</p>
<p>You can also <code>stack</code> an entire tombstone. Example:</p>
<pre class="devsite-terminal devsite-click-to-copy">
stack &lt; FS/data/tombstones/tombstone_05
</pre>
<p>This is useful if you've just unzipped a bugreport in the current directory.
For more information about diagnosing native crashes and tombstones, see
<a href="/devices/tech/debug/native-crash.html">Diagnosing Native Crashes</a>.
</p>
<h2 id="tombstone">Getting a stack trace/tombstone from a running process</h2>
<p>You can also use <code>debuggerd</code> on a running process. From the
command line, invoke <code>debuggerd</code> using a process ID (PID) to dump the
full tombstone to <code>stdout</code>. To get just the stack for every thread in
the process, include the <code>-b</code> or <code>--backtrace</code> flag.</p>
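<p>For example (<var>PID</var> is a placeholder for the process of interest):</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell debuggerd <var>PID</var>      # full tombstone to stdout</code>
<code class="devsite-terminal">adb shell debuggerd -b <var>PID</var>   # just the stack for every thread</code>
</pre>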
</body>
</html>

View file

@ -0,0 +1,127 @@
<html devsite>
<head>
<title>Identifying Capacity-Related Jank</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Capacity is the total amount of some resource (CPU, GPU, etc.) a device
possesses over some amount of time. This page describes how to identify and
address capacity-related jank issues.</p>
<h2 id="governor">Governor slow to react</h2>
<p>To avoid jank, the CPU frequency governor needs to be able to respond quickly
to bursty workloads. Most UI applications follow the same basic pattern:</p>
<ol>
<li>User is reading the screen.</li>
<li>User touches the screen: taps a button, scrolls, etc.</li>
<li>Screen scrolls, changes activity, or animates in some way in response to
input.</li>
<li>System quiesces as new content is displayed.</li>
<li>User goes back to reading the screen.</li>
</ol>
<p>Pixel and Nexus devices implement touch boost to modify CPU frequency
governor (and scheduler) behavior on touch. To avoid a slow ramp to a high clock
frequency (which could cause the device to drop frames on touch), touch boost
usually sets a frequency floor on the CPU to ensure plenty of CPU capacity is
available on touch. A floor lasts for some amount of time after touch (usually
around two seconds).</p>
<p>Pixel also uses the schedtune cgroup provided by Energy Aware Scheduling
(EAS) as an additional touch boost signal: Top applications get additional
weight via schedtune to ensure they get enough CPU capacity to run quickly. The
Nexus 5X and 6P have a much bigger performance gap between the little and big
CPU clusters (A53 and A57, respectively) than the Pixel with the Kryo CPU. We
found that the little CPU cluster was not always adequate for smooth UI
rendering, especially given other sources of jitter on the device.</p>
<p>Accordingly, on the Nexus 5X and 6P, touch boost modifies the scheduler
behavior to make it more likely for foreground applications to move to the big
cores (this is conceptually similar to the floor on CPU frequency). Without the
scheduler change to make foreground applications more likely to move to the big
CPU cluster, foreground applications may have insufficient CPU capacity to
render until the scheduler decided to load balance the thread to a big CPU core.
By changing the scheduler behavior during touch boost, it is more likely for a
UI thread to immediately run on a big core and avoid jank while not forcing it
to always run on a big core, which could have severe impacts on power
consumption.</p>
<h2 id="thermal-throttling">Thermal throttling</h2>
<p>Thermal throttling occurs when the device must reduce its overall thermal
output, usually performed by reducing CPU, GPU, and DRAM clocks. Unsurprisingly,
this often results in jank as the system may no longer be able to provide enough
capacity to render within a given timeslice. The only way to avoid thermal
throttling is to use less power. There are not a lot of ways to do this, but
based on our experiences with past SOCs, we have a few recommendations for
system vendors.</p>
<p>First, when building a new SOC with heterogeneous CPU architectures, ensure
the performance/W curves of the CPU clusters overlap. The overall performance/W
curve for the entire processor should be a continuous line. Discontinuities in
the perf/W curve force the scheduler and frequency governor to guess what a
workload needs; to prevent jank, the scheduler and frequency governor err on
the side of giving the workload more capacity than it requires. This results in
spending too much power, which contributes to thermal throttling.</p>
<p>Imagine a hypothetical SOC with two CPU clusters:</p>
<ul>
<li>Cluster 1, the little cluster, can spend between 100-300mW and scores
100-300 in a throughput benchmark depending on clocks.</li>
<li>Cluster 2, the big cluster, can spend between 1000 and 1600mW and scores
between 800 and 1200 in the same throughput benchmark depending on clocks.</li>
</ul>
<p>In this benchmark, a higher score is faster. Faster is not automatically
more desirable than slower, however, because faster also means greater power
consumption.</p>
<p>If the scheduler believes a UI workload would require the equivalent of a
score of 310 on that throughput benchmark, its best option to avoid jank is to
run the big cluster at the lowest frequency, wasting significant power. (This
depends on cpuidle behavior and race to idle; SOCs with continuous perf/W curves
are easier to optimize for.)</p>
<p>Second, use cpusets. Ensure you have enabled cpusets in your kernel and in
your <code>BoardConfig.mk</code>. You must also set up the actual cpuset
assignments in your device-specific <code>init.rc</code>. Some vendors leave
this disabled in their BSPs in the hopes they can use other hints to influence
scheduler behavior; we feel this doesn't make sense. cpusets are useful for
ensuring load balancing between CPUs is done in a way that reflects what the
user is actually doing on the device.</p>
<p>ActivityManager assigns apps to different cpusets based on the relative
importance of those apps (top, foreground, background), with more important
applications getting more access to CPU cores. This helps ensure quality of
service for foreground and top apps.</p>
<p>cpusets are useful on homogeneous CPU configurations, but you should not ship
a device with a heterogeneous CPU configuration without cpusets enabled. Nexus
6P is a good model for how to use cpusets on heterogeneous CPU configurations;
use that as a basis for your own device's configuration.</p>
<p>cpusets also offer power advantages by ensuring background threads that are
not performance-critical never get load balanced to big CPU cores, where they
could spend significantly more power without any user-perceived benefit. This
can help to avoid thermal throttling as well. While thermal throttling is a
capacity issue, jitter improvements have an outsized impact on UI performance
while the device is throttling: because the system is running closer to the
limit of its ability to render at 60 FPS, it takes less jitter to cause a
dropped frame.</p>
</body>
</html>

View file

@ -0,0 +1,430 @@
<html devsite>
<head>
<title>Identifying Jitter-Related Jank</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>Jitter is the random system behavior that prevents perceptible work from
running. This page describes how to identify and address jitter-related jank
issues.</p>
<h2 id="application">Application thread scheduler delay</h2>
<p>Scheduler delay is the most obvious symptom of jitter: A process that should
be run is made runnable but does not run for some significant amount of time.
The significance of the delay varies according to the context. For example:</p>
<ul>
<li>A random helper thread in an app can probably be delayed for many
milliseconds without an issue.</li>
<li>An application's UI thread may be able to tolerate 1-2ms of jitter.</li>
<li>Driver kthreads running as SCHED_FIFO may cause issues if they are runnable
for 500us before running.</li>
</ul>
<p>Runnable times can be identified in systrace by the blue bar preceding a
running segment of a thread. A runnable time can also be determined by the
length of time between the <code>sched_wakeup</code> event for a thread and the
<code>sched_switch</code> event that signals the start of thread execution.</p>
<h3 id="long_threads">Threads that run too long</h3>
<p>Application UI threads that are runnable for too long can cause problems.
Lower-level threads with long runnable times generally have different causes,
but attempting to push UI thread runnable time toward zero may require fixing
some of the same issues that cause lower level threads to have long runnable
times. To mitigate delays:</p>
<ol>
<li>Use cpusets as described in
<a href="/devices/tech/debug/jank_capacity.html#thermal-throttling">Thermal
throttling</a>.</li>
<li>Increase the CONFIG_HZ value.<ul>
<li>Historically, the value has been set to 100 on arm and arm64 platforms.
However, this is an accident of history and is not a good value to use for
interactive devices. CONFIG_HZ=100 means that a jiffy is 10ms long, which means
that load balancing between CPUs may take 20ms (two jiffies) to happen. This can
significantly contribute to jank on a loaded system.</li>
<li>Recent devices (Nexus 5X, Nexus 6P, Pixel, and Pixel XL) have shipped with
CONFIG_HZ=300. This should have a negligible power cost while significantly
improving runnable times. If you do see significant increases in power
consumption or performance issues after changing CONFIG_HZ, it's likely one of
your drivers is using a timer based on raw jiffies instead of milliseconds
converted to jiffies. This is usually an easy fix (see the
<a href="https://android.googlesource.com/kernel/msm.git/+/daf0cdf29244ce4098cff186d16df23cfa782993%5E!/">patch</a>
that fixed kgsl timer issues on Nexus 5X and 6P when converting to
CONFIG_HZ=300).</li>
<li>Finally, we've experimented with CONFIG_HZ=1000 on Nexus/Pixel devices and
found it offers a noticeable performance improvement and power reduction due to
decreased RCU overhead.</li>
</ul>
</li>
</ol>
<p>With those two changes alone, a device should look much better for UI thread
runnable time under load.</p>
<h3 id="sys.use_fifo_ui">Using sys.use_fifo_ui</h3>
<p>You can try to drive UI thread runnable time to zero by setting the
<code>sys.use_fifo_ui</code> property to 1.</p>
<p class="warning"><strong>Warning</strong>: Do not use this option on
heterogeneous CPU configurations unless you have a capacity-aware RT scheduler.
And, at this moment, <strong>NO CURRENTLY SHIPPING RT SCHEDULER IS
CAPACITY-AWARE</strong>. We're working on one for EAS, but it's not yet
available. The default RT scheduler is based purely on RT priorities and whether
a CPU already has a RT thread of equal or higher priority.
<br><br>As a result, the default RT scheduler will happily move your
relatively-long-running UI thread from a high-frequency big core to a little
core at minimum frequency if a higher priority FIFO kthread happens to wake up
on the same big core. <strong>This will introduce significant performance
regressions</strong>. As this option has not yet been used on a shipping Android
device, if you want to use it, get in touch with the Android performance team to
help you validate it.</p>
<p>When <code>sys.use_fifo_ui</code> is enabled, ActivityManager tracks the UI
thread and RenderThread (the two most UI-critical threads) of the top
application and makes those threads SCHED_FIFO instead of SCHED_OTHER. This
effectively eliminates jitter from UI and RenderThreads; the traces we've
gathered with this option enabled show runnable times on the order of
microseconds instead of milliseconds.</p>
<p>However, because the RT load balancer was not capacity-aware, there was a
30% reduction in application startup performance because the UI thread
responsible for starting up the app would be moved from a 2.1GHz gold Kryo core
to a 1.5GHz silver Kryo core. With a capacity-aware RT load balancer, we see
equivalent performance in bulk operations and a 10-15% reduction in 95th and
99th percentile frame times in many of our UI benchmarks.</p>
<h2 id="interrupt">Interrupt traffic</h2>
<p>Because ARM platforms deliver interrupts to CPU 0 only by default, we
recommend the use of an IRQ balancer (irqbalance or msm_irqbalance on Qualcomm
platforms).</p>
<p>During Pixel development, we saw jank that could be directly attributed to
overwhelming CPU 0 with interrupts. For example, if the <code>mdss_fb0</code>
thread was scheduled on CPU 0, there was a much greater likelihood of jank
because of an interrupt that is triggered by the display almost immediately
before scanout. <code>mdss_fb0</code> would be in the middle of its own work
with a very tight deadline, and then it would lose some time to the MDSS
interrupt handler. Initially, we attempted to fix this by setting the CPU
affinity of the mdss_fb0 thread to CPUs 1-3 to avoid contention with the
interrupt, but then we realized that we had not yet enabled msm_irqbalance. With
msm_irqbalance enabled, jank was noticeably improved even when both mdss_fb0 and
the MDSS interrupt were on the same CPU due to reduced contention from other
interrupts.</p>
<p>This can be identified in systrace by looking at the sched section as well as
the irq section. The sched section shows what has been scheduled, but an
overlapping region in the irq section means an interrupt is running during that
time instead of the normally scheduled process. If you see significant chunks of
time taken during an interrupt, your options include:</p>
<ul>
<li>Make the interrupt handler faster.</li>
<li>Prevent the interrupt from happening in the first place.</li>
<li>Change the frequency of the interrupt to be out of phase with other regular
work that it may be interfering with (if it is a regular interrupt).</li>
<li>Set CPU affinity of the interrupt directly and prevent it from being
balanced.</li>
<li>Set CPU affinity of the thread the interrupt is interfering with to avoid
the interrupt.</li>
<li>Rely on the interrupt balancer to move the interrupt to a less loaded
CPU.</li>
</ul>
<p>Setting CPU affinity is generally not recommended but can be useful for
specific cases. In general, it's too hard to predict the state of the system for
most common interrupts, but if you have a very specific set of conditions that
triggers certain interrupts where the system is more constrained than normal
(such as VR), explicit CPU affinity may be a good solution.</p>
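<p>For the rare case where pinning is justified, thread affinity can be set from
native code with <code>sched_setaffinity(2)</code>. The sketch below is a
minimal illustration only; the CPU numbers are placeholders and should come from
measurements on your own device:</p>
<pre class="prettyprint">
#define _GNU_SOURCE 1
#include &lt;sched.h&gt;   /* cpu_set_t, CPU_ZERO, CPU_SET, sched_setaffinity */
#include &lt;stdio.h&gt;

/* Restrict the calling thread to CPUs 1-3, leaving CPU 0 free for
 * interrupt handling. */
static int pin_away_from_cpu0(void) {
    cpu_set_t set;
    CPU_ZERO(&amp;set);
    CPU_SET(1, &amp;set);
    CPU_SET(2, &amp;set);
    CPU_SET(3, &amp;set);
    if (sched_setaffinity(0 /* calling thread */, sizeof(set), &amp;set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
</pre>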
<h2>Long softirqs</h2>
<p>While a softirq is running, it disables preemption. softirqs can also be
triggered at many places within the kernel and can run inside of a user process.
If there's enough softirq activity, the kernel stops running softirqs in the
context of user processes and instead wakes ksoftirqd to run them, where they
can be scheduled and load balanced. Usually, this is fine.
However, a single very long softirq can wreak havoc on the system.</p>
<section class="expandable">
<h4 class="showalways">Show Issue: Janky headtracking while streaming data over
Wi-Fi</h4>
<p>In this issue, we saw that VR performance was inconsistent under specific
network conditions (Wi-Fi). What we found in the trace was that a single NET_RX
softirq could run for 30+ milliseconds. This was eventually tracked down to
Receive Packet Steering, a Qualcomm Wi-Fi feature that coalesces many NET_RX
softirqs into a single softirq. The resulting softirq could, under the right
conditions, have a very large (potentially unbounded) runtime.</p>
<p>While this feature may have reduced total CPU cycles spent on networking, it
prevented the system from running the right things at the right time. Disabling
the feature didn't impact network throughput or battery life, but it did fix the
VR headtracking jank by allowing ksoftirqd to load balance the softirqs instead
of circumventing it.</p>
</section>
<hr>
<p>softirqs are visible within the irq section of a trace, so they are easy to
spot if the issue can be reproduced while tracing. Because a softirq can run
within a user process, a bad softirq can also manifest as extra runtime inside
of a user process for no obvious reason. If you see that, check the irq section
to see if softirqs are to blame.</p>
<h2 id="drivers">Drivers leaving preemption or IRQs disabled too long</h2>
<p>Disabling preemption or interrupts for too long (tens of milliseconds)
results in jank. Typically, the jank manifests as a thread becoming runnable but
not running on a particular CPU, even if the runnable thread is significantly
higher priority (or SCHED_FIFO) than the thread currently running on that CPU.</p>
<p>Some guidelines:</p>
<ul>
<li>If the runnable thread is SCHED_FIFO and the running thread is SCHED_OTHER,
the running thread has preemption or interrupts disabled.</li>
<li>If the runnable thread is at a significantly higher priority (100) than the
running thread (120), the running thread likely has preemption or interrupts
disabled if the runnable thread doesn't run within two jiffies.</li>
<li>If the runnable thread and the running thread have the same priority, the
running thread likely has preemption or interrupts disabled if the runnable
thread doesn't run within 20ms.</li>
</ul>
<p>Keep in mind that running an interrupt handler prevents you from servicing
other interrupts, which also disables preemption.</p>
<section class="expandable">
<h4 class="showalways">Show Issue: CONFIG_BUS_AUTO_SUSPEND causes significant
jank</h4>
<p>In this issue, we identified a major source of jank in Pixel during bringup.
To follow along with the example, <a href="perf_traces.zip">download the zip
file</a> of traces (which also includes other traces referred to in this
section), unzip the file, and open the trace_30293222.html file in your
browser.</p>
<p>In the trace, locate the SurfaceFlinger EventThread beginning
at 2235.195 ms. When performing bouncing ball tuning, we often saw one dropped
frame when either SurfaceFlinger or a UI-critical thread ran after being
runnable for 6.6ms (two jiffies at CONFIG_HZ=300). Critical SurfaceFlinger
threads and the app's UI thread were SCHED_FIFO at the time.</p>
<p>According to the trace, the thread would wake on a particular CPU, remain
runnable for two jiffies, and get load balanced to a different CPU, where it
would then run. The thread that was running while the UI thread and
SurfaceFlinger were runnable was always a priority 120 kworker in
pm_runtime_work.</p>
<p>While digging through the kernel to see what pm_runtime_work was actually
doing, we discovered the Wi-Fi driver's power management was handled through
pm_runtime_work. We took additional traces with Wi-Fi disabled and the jank
disappeared. To double-check, we also disabled the Wi-Fi driver's power
management in the kernel and took more traces with Wi-Fi connected, and the jank
was still gone. Qualcomm was then able to find the offending region with
preemption disabled and fix it, and we were able to reenable that option for
launch.</p>
</section>
<hr>
<p>Another option for identifying offending regions is with the preemptirqsoff
tracer (see <a href="/devices/tech/debug/ftrace.html#dftrace">Using
dynamic ftrace</a>). This tracer can give a much greater insight into the root
cause of an uninterruptible region (such as function names), but requires more
invasive work to enable. While it may have more of a performance impact, it is
definitely worth trying.</p>
<h2 id="workqueues">Incorrect use of workqueues</h2>
<p>Interrupt handlers often need to do work that can run outside of an interrupt
context, enabling work to be farmed out to different threads in the kernel. A
driver developer may notice the kernel has a very convenient system-wide
asynchronous task functionality called <em>workqueues</em> and might use that
for interrupt-related work.</p>
<p>However, workqueues are almost always the wrong answer for this problem
because they are always SCHED_OTHER. Many hardware interrupts are in the
critical path of performance and must be run immediately. Workqueues have no
guarantees about when they will be run. Every time we've seen a workqueue in the
critical path of performance, it's been a source of sporadic jank, regardless of
the device. On Pixel, with a flagship processor, we saw that a single workqueue
could be delayed by up to 7ms if the device was under load, depending on
scheduler behavior and other things running on the system.</p>
<p>Instead of a workqueue, drivers that need to handle interrupt-like work
inside a separate thread should create their own SCHED_FIFO kthread. For help
doing this with kthread_work functions, refer to this
<a href="https://android.googlesource.com/kernel/msm/+/1a7a93bd33f48a369de29f6f2b56251127bf6ab4%5E!/">patch</a>.</p>
<h2 id="framework-lock">Framework lock contention</h2>
<p>Framework lock contention can be a source of jank or other performance
issues. It's usually caused by the ActivityManagerService lock but can be seen
in other locks as well. For example, the PowerManagerService lock can impact
screen on performance. If you're seeing this on your device, there's no good fix
because it can only be improved via architectural improvements to the framework.
However, if you are modifying code that runs inside of system_server, it's
critical to avoid holding locks for a long time, especially the
ActivityManagerService lock.</p>
<h2 id="binder-lock">Binder lock contention</h2>
<p>Historically, binder has had a single global lock. If the thread running a
binder transaction was preempted while holding the lock, no other thread could
perform a binder transaction until the original thread released the lock.
This is bad; binder contention can block everything in the system, including
sending UI updates to the display (UI threads communicate with SurfaceFlinger
via binder).</p>
<p>Android 6.0 included several patches to improve this behavior by disabling
preemption while holding the binder lock. This was safe only because the binder
lock should be held for a few microseconds of actual runtime. This dramatically
improved performance in uncontended situations and prevented contention by
preventing most scheduler switches while the binder lock was held. However,
preemption could not be disabled for the entire runtime of holding the binder
lock, meaning that preemption was enabled for functions that could sleep (such
as copy_from_user), which could cause the same preemption as the original case.
When we sent the patches upstream, they promptly told us that this was the worst
idea in history. (We agreed with them, but we also couldn't argue with the
efficacy of the patches toward preventing jank.)</p>
<h2 id="fd-contention">fd contention within a process</h2>
<p>This is rare. Your jank is probably not caused by this.</p>
<p>That said, if you have multiple threads within a process writing to the same
fd, it is possible to see contention on that fd. However, the only time we saw
this during Pixel bringup was during a test where low-priority threads attempted
to occupy all CPU time while a single high-priority thread was running within
the same process. All threads were writing to the trace marker fd, and the
high-priority thread could get blocked on it if a low-priority thread was
holding the fd lock and was then preempted. When tracing was disabled for the
low-priority threads, there was no performance issue.</p>
<p>We weren't able to reproduce this in any other situation, but it's worth
pointing out as a potential cause of performance issues while tracing.</p>
<h2 id="cpu-idle">Unnecessary CPU idle transitions</h2>
<p>When dealing with IPC, especially multi-process pipelines, it's common to see
variations on the following runtime behavior:</p>
<ol>
<li>Thread A runs on CPU 1.</li>
<li>Thread A wakes up thread B.</li>
<li>Thread B starts running on CPU 2.</li>
<li>Thread A immediately goes to sleep, to be awakened by thread B when thread B
has finished its current work.</li>
</ol>
<p>A common source of overhead is between steps 2 and 3. If CPU 2 is idle, it
must be brought back to an active state before thread B can run. Depending on
the SOC and how deep the idle is, this could be tens of microseconds before
thread B begins running. If the actual runtime of each side of the IPC is close
enough to the overhead, the overall performance of that pipeline can be
significantly reduced by CPU idle transitions. The most common place for Android
to hit this is around binder transactions, and many services that use binder end
up looking like the situation described above.</p>
<p>First, use the <code>wake_up_interruptible_sync()</code> function in your
kernel drivers and support this from any custom scheduler. Treat this as a
requirement, not a hint. Binder uses this today, and it helps a lot with
synchronous binder transactions avoiding unnecessary CPU idle transitions.</p>
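<p>In a driver, the sync wakeup looks roughly like this (a bare-bones sketch
with hypothetical names; locking and module scaffolding omitted):</p>
<pre class="prettyprint">
#include &lt;linux/sched.h&gt;
#include &lt;linux/wait.h&gt;

static DECLARE_WAIT_QUEUE_HEAD(mydrv_wq);
static bool mydrv_data_ready;

/* Consumer side: sleep until the producer signals completion. */
static int mydrv_wait_for_data(void)
{
        return wait_event_interruptible(mydrv_wq, mydrv_data_ready);
}

/* Producer side: the _sync variant tells the scheduler the caller is
 * about to sleep, so the woken thread can run on this CPU rather than
 * waking another (possibly idle) CPU. */
static void mydrv_signal_data_ready(void)
{
        mydrv_data_ready = true;
        wake_up_interruptible_sync(&amp;mydrv_wq);
}
</pre>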
<p>Second, ensure your cpuidle transition times are realistic and the cpuidle
governor is taking these into account correctly. If your SOC is thrashing in and
out of its deepest idle state, you won't save power by going to the deepest
idle state.</p>
<h2 id="logging">Logging</h2>
<p>Logging is not free in terms of CPU cycles or memory, so don't spam the log buffer.
Logging costs cycles in your application (directly) and in the log daemon.
Remove any debugging logs before shipping your device.</p>
<h2 id="io-issues">I/O issues</h2>
<p>I/O operations are common sources of jitter. If a thread accesses a
memory-mapped file and the page is not in the page cache, it faults and reads
the page from disk. This blocks the thread (usually for 10+ ms) and if it
happens in the critical path of UI rendering, can result in jank. There are too
many causes of I/O operations to discuss here, but check the following locations
when trying to improve I/O behavior:</p>
<ul>
<li><strong>PinnerService</strong>. Added in Android 7.0, PinnerService enables
the framework to lock some files in the page cache. This makes that memory
unavailable to other processes, but if certain files are known a priori to be
used regularly, it can be effective to mlock them (see the mmap/mlock sketch
after this list).<br><br>
On Pixel and Nexus 6P devices running Android 7.0, we mlocked four files:
<ul>
<li>/system/framework/arm64/boot-framework.oat</li>
<li>/system/framework/oat/arm64/services.odex</li>
<li>/system/framework/arm64/boot.oat</li>
<li>/system/framework/arm64/boot-core-libart.oat</li>
</ul>
These files are constantly in use by most applications and system_server, so
they should not be paged out. In particular, we've found that if any of those
are paged out, they will be paged back in and cause jank when switching from a
heavyweight application.
</li>
<li><strong>Encryption</strong>. Another possible cause of I/O problems. We find
inline encryption offers the best performance when compared to CPU-based
encryption or using a hardware block accessible via DMA. Most importantly,
inline encryption reduces the jitter associated with I/O, especially when
compared to CPU-based encryption. Because fetches to the page cache are often in
the critical path of UI rendering, CPU-based encryption introduces additional
CPU load in the critical path, which adds more jitter than just the I/O fetch.
<br><br>
DMA-based hardware encryption engines have a similar problem, since the kernel
has to spend cycles managing that work even if other critical work is available
to run. We strongly recommend any SOC vendor building new hardware to include
support for inline encryption.</li>
</ul>
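<p>As context for the PinnerService item above, pinning a file ultimately comes
down to mapping it and calling <code>mlock(2)</code>. A minimal userspace sketch
(the path argument is illustrative only):</p>
<pre class="prettyprint">
#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;unistd.h&gt;

/* Map a file read-only and lock its pages so they cannot be evicted
 * from the page cache. */
static int pin_file(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd &lt; 0) { perror("open"); return -1; }

    struct stat st;
    if (fstat(fd, &amp;st) != 0) { perror("fstat"); close(fd); return -1; }

    void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* The mapping remains valid after close. */
    if (addr == MAP_FAILED) { perror("mmap"); return -1; }

    if (mlock(addr, st.st_size) != 0) { perror("mlock"); return -1; }
    return 0;  /* Pages stay resident until munlock()/munmap() or exit. */
}
</pre>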
<h2 id="small-task">Aggressive small task packing</h2>
<p>Some schedulers offer support for packing small tasks onto single CPU cores
to try to reduce power consumption by keeping more CPUs idle for longer. While
this works well for throughput and power consumption, it can be
<em>catastrophic</em> to latency. There are several short-running threads in the
critical path of UI rendering that can be considered small; if these threads are
delayed as they are slowly migrated to other CPUs, it <strong>will</strong>
cause jank. We recommend using small task packing very conservatively.</p>
<h2 id="page-cache">Page cache thrashing</h2>
<p>A device without enough free memory may suddenly become extremely sluggish
while performing a long-running operation, such as opening a new application. A
trace of the application may reveal it is consistently blocked in I/O during a
particular run even when it often is not blocked in I/O. This is usually a sign
of page cache thrashing, especially on devices with less memory.</p>
<p>One way to identify this is to take a systrace using the pagecache tag and
feed that trace to the script at
<code>system/extras/pagecache/pagecache.py</code>. pagecache.py takes the
individual requests to bring file pages into the page cache and aggregates them
into per-file statistics. If you find that more bytes of a file have been read
than the total
size of that file on disk, you are definitely hitting page cache thrashing.</p>
<p>What this means is that the working set required by your workload (typically
a single application plus system_server) is greater than the amount of memory
available to the page cache on your device. As a result, as one part of the
workload gets the data it needs in the page cache, another part that will be
used in the near future will be evicted and will have to be fetched again,
causing the problem to occur again until the load has completed. This is the
fundamental cause of performance problems when not enough memory is available
on a device.</p>
<p>There's no foolproof way to fix page cache thrashing, but there are a few
ways to try to improve this on a given device.</p>
<ul>
<li>Use less memory in persistent processes. The less memory used by persistent
processes, the more memory available to applications and the page cache.</li>
<li>Audit the carveouts you have for your device to ensure you are not
unnecessarily removing memory from the OS. We've seen situations where carveouts
used for debugging were accidentally left in shipping kernel configurations,
consuming tens of megabytes of memory. This can make the difference between
hitting page cache thrashing and not, especially on devices with less
memory.</li>
<li>If you're seeing page cache thrashing in system_server on critical files,
consider pinning those files. This will increase memory pressure elsewhere, but
it may modify behavior enough to avoid thrashing.</li>
<li>Retune lowmemorykiller to try to keep more memory free. lowmemorykiller's
thresholds are based on both absolute free memory and the page cache, so
increasing the threshold at which processes at a given oom_adj level are killed
may result in better behavior at the expense of increased background app
death.</li>
<li>Try using ZRAM. We use ZRAM on Pixel, even though Pixel has 4GB of RAM,
because it
could help with rarely used dirty pages.</li>
</ul>
</body>
</html>

View file

@ -0,0 +1,567 @@
<html devsite>
<head>
<title>Diagnosing Native Crashes</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>
If you've never seen a native crash before, start with
<a href="/devices/tech/debug/index.html">Debugging Native Android
Platform Code</a>.
</p>
<h2 id=crashtypes>Types of native crash</h2>
<p>
The sections below detail the most common kinds of native crash. Each includes
an example chunk of <code>debuggerd</code> output, with the key evidence that helps you
distinguish that specific kind of crash highlighted in orange italic text.
</p>
<h3 id=abort>Abort</h3>
<p>
Aborts are interesting because they're deliberate. There are many different ways
to abort (including calling <code><a
href="http://man7.org/linux/man-pages/man3/abort.3.html">abort(3)</a></code>,
failing an <code><a
href="http://man7.org/linux/man-pages/man3/assert.3.html">assert(3)</a></code>,
using one of the Android-specific fatal logging types), but they all involve
calling <code>abort</code>. A call to <code>abort</code> basically signals the
calling thread with SIGABRT, so a frame showing "abort" in <code>libc.so</code>
plus SIGABRT are the things to look for in the <code>debuggerd</code> output to
recognize this case.</p>
<p>
As mentioned above, there may be an explicit "abort message" line. But you
should also look in the <code>logcat</code> output to see what this thread logged before
deliberately killing itself, because the basic abort primitive doesn't accept a
message.
</p>
<p>
Older versions of Android (especially on 32-bit ARM) followed a convoluted path
between the original abort call (frame 4 here) and the actual sending of the
signal (frame 0 here):
</p>
<pre class="devsite-click-to-copy">
pid: 1656, tid: 1656, name: crasher >>> crasher <<<
signal 6 (<i style="color:Orange">SIGABRT</i>), code -6 (SI_TKILL), fault addr --------
<i style="color:Orange">Abort message</i>: 'some_file.c:123: some_function: assertion "false" failed'
r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8
r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c
r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c
ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010
backtrace:
#00 pc 00042c98 /system/lib/libc.so (tgkill+12)
#01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32)
#02 pc 0001bb87 /system/lib/libc.so (raise+10)
#03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34)
#04 pc 000168e8 /system/lib/<i style="color:Orange">libc.so</i> (<i style="color:Orange">abort</i>+4)
#05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16)
#06 pc 00018d35 /system/lib/libc.so (__assert2+20)
#07 pc 00000f21 /system/xbin/crasher
#08 pc 00016795 /system/lib/libc.so (__libc_init+44)
#09 pc 00000abc /system/xbin/crasher
</pre>
<p>
More recent versions call <code><a
href="http://man7.org/linux/man-pages/man2/tgkill.2.html">tgkill(2)</a></code>
directly from <code>abort</code>, so there are fewer stack frames for you to
skip over before you get to the interesting frames:</p>
<pre class="devsite-click-to-copy">
pid: 25301, tid: 25301, name: crasher >>> crasher <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
r0 00000000 r1 000062d5 r2 00000006 r3 00000008
r4 ffa09dd8 r5 000062d5 r6 000062d5 r7 0000010c
r8 00000000 r9 00000000 sl 00000000 fp ffa09f0c
ip 00000000 sp ffa09dc8 lr eac63ce3 pc eac93f0c cpsr 000d0010
backtrace:
#00 pc 00049f0c /system/lib/libc.so (tgkill+12)
#01 pc 00019cdf /system/lib/libc.so (abort+50)
#02 pc 000012db /system/xbin/crasher (maybe_abort+26)
#03 pc 000015b7 /system/xbin/crasher (do_action+414)
#04 pc 000020d5 /system/xbin/crasher (main+100)
#05 pc 000177a1 /system/lib/libc.so (__libc_init+48)
#06 pc 000010e4 /system/xbin/crasher (_start+96)
</pre>
<p>
You can reproduce an instance of this type of crash using: <code>crasher
abort</code>
</p>
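<p>As a hypothetical illustration of the two most common paths (on the platform,
the fatal logging macros such as <code>LOG_ALWAYS_FATAL</code> also end up
here):</p>
<pre class="prettyprint">
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;

/* Both of these deliver SIGABRT to the calling thread. A failed assert()
 * also logs a line that debuggerd picks up as the "Abort message"; a bare
 * abort() leaves no message of its own, which is why checking logcat
 * matters. */
void fail_with_assert(void) {
    assert(!"something impossible happened");
}

void fail_with_abort(void) {
    abort();
}
</pre>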
<h3 id=nullpointer>Pure null pointer dereference</h3>
<p>
This is the classic native crash, and although it's just a special case of the
next crash type, it's worth mentioning separately because it usually requires
the least thought.
</p>
<p>
In the example below, even though the crashing function is in
<code>libc.so</code>, because the string functions just operate on the pointers
they're given, you can infer that <code><a
href="http://man7.org/linux/man-pages/man3/strlen.3.html">strlen(3)</a></code>
was called with a null pointer, and this crash should go straight to the author
of the calling code. In this case, frame #01 is the bad caller.
</p>
<pre class="devsite-click-to-copy">
pid: 25326, tid: 25326, name: crasher >>> crasher <<<
signal 11 (<i style="color:Orange">SIGSEGV</i>), code 1 (SEGV_MAPERR), <i style="color:Orange">fault addr 0x0</i>
r0 00000000 r1 00000000 r2 00004c00 r3 00000000
r4 ab088071 r5 fff92b34 r6 00000002 r7 fff92b40
r8 00000000 r9 00000000 sl 00000000 fp fff92b2c
ip ab08cfc4 sp fff92a08 lr ab087a93 pc efb78988 cpsr 600d0030
backtrace:
#00 pc 00019988 /system/lib/libc.so (strlen+71)
#01 pc 00001a8f /system/xbin/crasher (strlen_null+22)
#02 pc 000017cd /system/xbin/crasher (do_action+948)
#03 pc 000020d5 /system/xbin/crasher (main+100)
#04 pc 000177a1 /system/lib/libc.so (__libc_init+48)
#05 pc 000010e4 /system/xbin/crasher (_start+96)
</pre>
<p>
You can reproduce an instance of this type of crash using: <code>crasher
strlen-NULL</code>
</p>
<h3 id=lowaddress>Low-address null pointer dereference</h3>
<p>
In many cases the fault address won't be 0, but some other low number. Two- or
three-digit addresses in particular are very common, whereas a six-digit address
is almost certainly not a null pointer dereference; that would require a 1MiB
offset. This usually occurs when you have code that dereferences a null pointer
as if it was a valid struct. Common functions are <code><a
href="http://man7.org/linux/man-pages/man3/fprintf.3.html">fprintf(3)</a></code>
(or any other function taking a FILE*) and <code><a
href="http://man7.org/linux/man-pages/man3/readdir.3.html">readdir(3)</a></code>,
because code often fails to check that the <code><a
href="http://man7.org/linux/man-pages/man3/fopen.3.html">fopen(3)</a></code> or
<code><a
href="http://man7.org/linux/man-pages/man3/opendir.3.html">opendir(3)</a></code>
call actually succeeded first.
</p>
<p>
Here's an example of <code>readdir</code>:
</p>
<pre class="devsite-click-to-copy">
pid: 25405, tid: 25405, name: crasher >>> crasher <<<
signal 11 (<i style="color:Orange">SIGSEGV</i>), code 1 (SEGV_MAPERR), <i style="color:Orange">fault addr 0xc</i>
r0 0000000c r1 00000000 r2 00000000 r3 3d5f0000
r4 00000000 r5 0000000c r6 00000002 r7 ff8618f0
r8 00000000 r9 00000000 sl 00000000 fp ff8618dc
ip edaa6834 sp ff8617a8 lr eda34a1f pc eda618f6 cpsr 600d0030
backtrace:
#00 pc 000478f6 /system/lib/libc.so (pthread_mutex_lock+1)
#01 pc 0001aa1b /system/lib/libc.so (readdir+10)
#02 pc 00001b35 /system/xbin/crasher (readdir_null+20)
#03 pc 00001815 /system/xbin/crasher (do_action+976)
#04 pc 000021e5 /system/xbin/crasher (main+100)
#05 pc 000177a1 /system/lib/libc.so (__libc_init+48)
#06 pc 00001110 /system/xbin/crasher (_start+96)
</pre>
<p>
Here the direct cause of the crash is that <code><a
href="http://man7.org/linux/man-pages/man3/pthread_mutex_lock.3p.html">pthread_mutex_lock(3)</a></code>
has tried to access address 0xc (frame 0). But the first thing
<code>pthread_mutex_lock</code> does is dereference the <code>state</code>
element of the <code>pthread_mutex_t*</code> it was given. If you look at the
source, you can see that element is at offset 0 in the struct, which tells you
that <code>pthread_mutex_lock</code> was given the invalid pointer 0xc. From the
frame 1 you can see that it was given that pointer by <code>readdir</code>,
which extracts the <code>mutex_</code> field from the <code>DIR*</code> it's
given. Looking at that structure, you can see that <code>mutex_</code> is at
offset <code>sizeof(int) + sizeof(size_t) + sizeof(dirent*)</code> into
<code>struct DIR</code>, which on a 32-bit device is 4 + 4 + 4 = 12 = 0xc, so
you found the bug: <code>readdir</code> was passed a null pointer by the caller.
At this point you can paste the stack into the stack tool to find out
<em>where</em> in logcat this happened.</p>
<pre class="prettyprint">
struct DIR {
int fd_;
size_t available_bytes_;
dirent* next_;
pthread_mutex_t mutex_;
dirent buff_[15];
long current_pos_;
};
</pre>
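<p>You can sanity-check that offset arithmetic with <code>offsetof</code>. The
sketch below mirrors the first few fields of the layout above (bionic's real
<code>struct DIR</code> is private to libc, so this is a stand-in):</p>
<pre class="prettyprint">
#include &lt;dirent.h&gt;
#include &lt;pthread.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;

/* Stand-in with the same leading fields as the struct shown above.
 * On a 32-bit device this prints 12 (0xc), matching the fault address. */
struct fake_dir {
    int fd_;
    size_t available_bytes_;
    struct dirent* next_;
    pthread_mutex_t mutex_;
};

int main(void) {
    printf("mutex_ offset: %zu\n", offsetof(struct fake_dir, mutex_));
    return 0;
}
</pre>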
<p>
In most cases you can actually skip this analysis. A sufficiently low fault
address usually means you can just skip any <code>libc.so</code> frames in the
stack and directly accuse the calling code. But not always, and this is how you
would present a compelling case.
</p>
<p>
You can reproduce instances of this kind of crash using: <code>crasher
fprintf-NULL</code> or <code>crasher readdir-NULL</code>
</p>
<h3 id=fortify>FORTIFY failure</h3>
<p>
A FORTIFY failure is a special case of an abort that occurs when the C library
detects a problem that might lead to a security vulnerability. Many C library
functions are <em>fortified</em>; they take an extra argument that tells them how large
a buffer actually is and check at run time whether the operation you're trying
to perform actually fits. Here's an example where the code tries to
<code>read(fd, buf, 32)</code> into a buffer that's actually only 10 bytes
long...
</p>
<pre class="devsite-click-to-copy">
pid: 25579, tid: 25579, name: crasher >>> crasher <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: '<i style="color:Orange">FORTIFY: read: prevented 32-byte write into 10-byte buffer'</i>
r0 00000000 r1 000063eb r2 00000006 r3 00000008
r4 ff96f350 r5 000063eb r6 000063eb r7 0000010c
r8 00000000 r9 00000000 sl 00000000 fp ff96f49c
ip 00000000 sp ff96f340 lr ee83ece3 pc ee86ef0c cpsr 000d0010
backtrace:
#00 pc 00049f0c /system/lib/libc.so (tgkill+12)
#01 pc 00019cdf /system/lib/libc.so (abort+50)
#02 pc 0001e197 /system/lib/libc.so (<i style="color:Orange">__fortify_fatal</i>+30)
#03 pc 0001baf9 /system/lib/libc.so (__read_chk+48)
#04 pc 0000165b /system/xbin/crasher (do_action+534)
#05 pc 000021e5 /system/xbin/crasher (main+100)
#06 pc 000177a1 /system/lib/libc.so (__libc_init+48)
#07 pc 00001110 /system/xbin/crasher (_start+96)
</pre>
<p>
You can reproduce an instance of this type of crash using: <code>crasher
fortify</code>
</p>
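<p>The kind of code that trips the check looks roughly like this (hypothetical;
the compiler only inserts the check when it can see the buffer's real size and
<code>_FORTIFY_SOURCE</code> is enabled, as it is for platform builds):</p>
<pre class="prettyprint">
#include &lt;unistd.h&gt;

/* With FORTIFY enabled, this read() is compiled as a checked variant
 * (__read_chk) that also receives sizeof(buf). Calling it with n == 32
 * fails the run-time check and aborts with the message shown above
 * instead of silently overflowing the stack. */
ssize_t fortified_read(int fd, size_t n) {
    char buf[10];
    return read(fd, buf, n);
}
</pre>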
<h3 id=stackcorruption>Stack corruption detected by -fstack-protector</h3>
<p>
The compiler's <code>-fstack-protector</code> option inserts checks into
functions with on-stack buffers to guard against buffer overruns. This option is
on by default for platform code but not for apps. When this option is enabled,
the compiler adds instructions to the <a
href="https://en.wikipedia.org/wiki/Function_prologue">function prologue</a> to
write a random value just past the last local on the stack and to the function
epilogue to read it back and check that it's not changed. If that value changed,
it was overwritten by a buffer overrun, so the epilogue calls
<code>__stack_chk_fail</code> to log a message and abort.
</p>
<pre class="devsite-click-to-copy">
pid: 26717, tid: 26717, name: crasher >>> crasher <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
<i style="color:Orange">Abort message: 'stack corruption detected'</i>
r0 00000000 r1 0000685d r2 00000006 r3 00000008
r4 ffd516d8 r5 0000685d r6 0000685d r7 0000010c
r8 00000000 r9 00000000 sl 00000000 fp ffd518bc
ip 00000000 sp ffd516c8 lr ee63ece3 pc ee66ef0c cpsr 000e0010
backtrace:
#00 pc 00049f0c /system/lib/libc.so (tgkill+12)
#01 pc 00019cdf /system/lib/libc.so (abort+50)
#02 pc 0001e07d /system/lib/libc.so (__libc_fatal+24)
#03 pc 0004863f /system/lib/libc.so (<i style="color:Orange">__stack_chk_fail</i>+6)
#04 pc 000013ed /system/xbin/crasher (smash_stack+76)
#05 pc 00001591 /system/xbin/crasher (do_action+280)
#06 pc 00002219 /system/xbin/crasher (main+100)
#07 pc 000177a1 /system/lib/libc.so (__libc_init+48)
#08 pc 00001144 /system/xbin/crasher (_start+96)
</pre>
<p>
You can distinguish this from other kinds of abort by the presence of
<code>__stack_chk_fail</code> in the backtrace and the specific abort message.
</p>
<p>
You can reproduce an instance of this type of crash using: <code>crasher
smash-stack</code>
</p>
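<p>The kind of bug this option catches is an ordinary fixed-size stack buffer
overrun, for example (hypothetical code, compiled with
<code>-fstack-protector</code>):</p>
<pre class="prettyprint">
#include &lt;string.h&gt;

/* The prologue writes a canary just past the locals and the epilogue
 * checks it. If input is longer than 15 characters, strcpy() overruns
 * buf, clobbers the canary, and __stack_chk_fail aborts the process with
 * "stack corruption detected". */
void copy_unchecked(const char *input) {
    char buf[16];
    strcpy(buf, input);  /* no bounds check: the bug */
}
</pre>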
<h2 id=crashdump>Crash dumps</h2>
<p>If you don't have a specific crash that you're investigating right now,
the platform source includes a tool for testing <code>debuggerd</code> called crasher. If
you <code>mm</code> in <code>system/core/debuggerd/</code> you'll get both a <code>crasher</code>
and a <code>crasher64</code> on your path (the latter allowing you to test
64-bit crashes). Crasher can crash in a large number of interesting ways based
on the command line arguments you provide. Use <code>crasher --help</code>
to see the currently supported selection.</p>
<p>To introduce the different pieces in a crash dump, let's work through this example crash dump:</p>
<pre class="devsite-click-to-copy">
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'Android/aosp_flounder/flounder:5.1.51/AOSP/enh08201009:eng/test-keys'
Revision: '0'
ABI: 'arm'
pid: 1656, tid: 1656, name: crasher &gt;&gt;&gt; crasher &lt;&lt;&lt;
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'some_file.c:123: some_function: assertion "false" failed'
r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8
r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c
r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c
ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010
backtrace:
#00 pc 00042c98 /system/lib/libc.so (tgkill+12)
#01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32)
#02 pc 0001bb87 /system/lib/libc.so (raise+10)
#03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34)
#04 pc 000168e8 /system/lib/libc.so (abort+4)
#05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16)
#06 pc 00018d35 /system/lib/libc.so (__assert2+20)
#07 pc 00000f21 /system/xbin/crasher
#08 pc 00016795 /system/lib/libc.so (__libc_init+44)
#09 pc 00000abc /system/xbin/crasher
Tombstone written to: /data/tombstones/tombstone_06
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
</pre>
<p>The line of asterisks with spaces is helpful if you're searching a log
for native crashes. The string "*** ***" rarely shows up in logs other than
at the beginning of a native crash.</p>
<pre class="devsite-click-to-copy">
Build fingerprint:
'Android/aosp_flounder/flounder:5.1.51/AOSP/enh08201009:eng/test-keys'
</pre>
<p>The fingerprint lets you identify exactly which build the crash occurred
on. This is exactly the same as the <code>ro.build.fingerprint</code> system property.</p>
<pre class="devsite-click-to-copy">
Revision: '0'
</pre>
<p>The revision refers to the hardware rather than the software. This is
usually unused but can be useful to help you automatically ignore bugs known
to be caused by bad hardware. This is exactly the same as the <code>ro.revision</code>
system property.</p>
<pre class="devsite-click-to-copy">
ABI: 'arm'
</pre>
<p>The ABI is one of arm, arm64, mips, mips64, x86, or x86-64. This is
mostly useful for the <code>stack</code> script mentioned above, so that it knows
what toolchain to use.</p>
<pre class="devsite-click-to-copy">
pid: 1656, tid: 1656, name: crasher &gt;&gt;&gt; crasher &lt;&lt;&lt;
</pre>
<p>This line identifies the specific thread in the process that crashed. In
this case, it was the process' main thread, so the process ID and thread
ID match. The first name is the thread name, and the name surrounded by
&gt;&gt;&gt; and &lt;&lt;&lt; is the process name. For an app, the process name
is typically the fully-qualified package name (such as com.facebook.katana),
which is useful when filing bugs or trying to find the app in Google Play. The
pid and tid can also be useful in finding the relevant log lines preceding
the crash.</p>
<pre class="devsite-click-to-copy">
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
</pre>
<p>This line tells you which signal (SIGABRT) was received, and more about
how it was received (SI_TKILL). The signals reported by <code>debuggerd</code> are SIGABRT,
SIGBUS, SIGFPE, SIGILL, SIGSEGV, and SIGTRAP. The signal-specific codes vary
based on the specific signal.</p>
<pre class="devsite-click-to-copy">
Abort message: 'some_file.c:123: some_function: assertion "false" failed'
</pre>
<p>Not all crashes will have an abort message line, but aborts will. This
is automatically gathered from the last line of fatal logcat output for
this pid/tid, and in the case of a deliberate abort is likely to give an
explanation of why the program killed itself.</p>
<pre class="devsite-click-to-copy">
r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8
r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c
r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c
ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010
</pre>
<p>The register dump shows the content of the CPU registers at the time the
signal was received. (This section varies wildly between ABIs.) How useful
these are will depend on the exact crash.</p>
<pre class="devsite-click-to-copy">
backtrace:
#00 pc 00042c98 /system/lib/libc.so (tgkill+12)
#01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32)
#02 pc 0001bb87 /system/lib/libc.so (raise+10)
#03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34)
#04 pc 000168e8 /system/lib/libc.so (abort+4)
#05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16)
#06 pc 00018d35 /system/lib/libc.so (__assert2+20)
#07 pc 00000f21 /system/xbin/crasher
#08 pc 00016795 /system/lib/libc.so (__libc_init+44)
#09 pc 00000abc /system/xbin/crasher
</pre>
<p>The backtrace shows you where in the code we were at the time of
crash. The first column is the frame number (matching gdb's style where
the deepest frame is 0). The PC values are relative to the location of the
shared library rather than absolute addresses. The next column is the name
of the mapped region (which is usually a shared library or executable, but
might not be for, say, JIT-compiled code). Finally, if symbols are available,
the symbol that the PC value corresponds to is shown, along with the offset
into that symbol in bytes. You can use this in conjunction with <code>objdump(1)</code>
to find the corresponding assembler instruction.</p>
<h2 id=tombstones>Tombstones</h2>
<pre class="devsite-click-to-copy">
Tombstone written to: /data/tombstones/tombstone_06
</pre>
<p>This tells you where <code>debuggerd</code> wrote extra information.
<code>debuggerd</code> will keep up to 10 tombstones, cycling through
the numbers 00 to 09 and overwriting existing tombstones as necessary.</p>
<p>The tombstone contains the same information as the crash dump, plus a
few extras. For example, it includes backtraces for <i>all</i> threads (not
just the crashing thread), the floating point registers, raw stack dumps,
and memory dumps around the addresses in registers. Most usefully it also
includes a full memory map (similar to <code>/proc/<i>pid</i>/maps</code>). Here's an
annotated example from a 32-bit ARM process crash:</p>
<pre class="devsite-click-to-copy">
memory map: (fault address prefixed with ---&gt;)
---&gt;ab15f000-ab162fff r-x 0 4000 /system/xbin/crasher (BuildId:
b9527db01b5cf8f5402f899f64b9b121)
</pre>
<p>There are two things to note here. The first is that this line is prefixed
with "---&gt;". The maps are most useful when your crash isn't just a null
pointer dereference. If the fault address is small, it's probably some variant
of a null pointer dereference. Otherwise looking at the maps around the fault
address can often give you a clue as to what happened. Some possible issues
that can be recognized by looking at the maps include:</p>
<ul>
<li>Reads/writes past the end of a block of memory.</li>
<li>Reads/writes before the beginning of a block of memory.</li>
<li>Attempts to execute non-code.</li>
<li>Running off the end of a stack.</li>
<li>Attempts to write to code (as in the example above).</li>
</ul>
<p>The second thing to note is that executable and shared library files
will show the BuildId (if present) in Android M and later, so you can see
exactly which version of your code crashed. (Platform binaries include a
BuildId by default since Android M. NDK r12 and later automatically pass
<code>-Wl,--build-id</code> to the linker too.)</p>
<pre class="devsite-click-to-copy">
ab163000-ab163fff r-- 3000 1000 /system/xbin/crasher
ab164000-ab164fff rw- 0 1000
f6c80000-f6d7ffff rw- 0 100000 [anon:libc_malloc]
</pre>
<p>On Android the heap isn't necessarily a single region. Heap regions will
be labeled <code>[anon:libc_malloc]</code>.</p>
<pre class="devsite-click-to-copy">
f6d82000-f6da1fff r-- 0 20000 /dev/__properties__/u:object_r:logd_prop:s0
f6da2000-f6dc1fff r-- 0 20000 /dev/__properties__/u:object_r:default_prop:s0
f6dc2000-f6de1fff r-- 0 20000 /dev/__properties__/u:object_r:logd_prop:s0
f6de2000-f6de5fff r-x 0 4000 /system/lib/libnetd_client.so (BuildId: 08020aa06ed48cf9f6971861abf06c9d)
f6de6000-f6de6fff r-- 3000 1000 /system/lib/libnetd_client.so
f6de7000-f6de7fff rw- 4000 1000 /system/lib/libnetd_client.so
f6dec000-f6e74fff r-x 0 89000 /system/lib/libc++.so (BuildId: 8f1f2be4b37d7067d366543fafececa2) (load base 0x2000)
f6e75000-f6e75fff --- 0 1000
f6e76000-f6e79fff r-- 89000 4000 /system/lib/libc++.so
f6e7a000-f6e7afff rw- 8d000 1000 /system/lib/libc++.so
f6e7b000-f6e7bfff rw- 0 1000 [anon:.bss]
f6e7c000-f6efdfff r-x 0 82000 /system/lib/libc.so (BuildId: d189b369d1aafe11feb7014d411bb9c3)
f6efe000-f6f01fff r-- 81000 4000 /system/lib/libc.so
f6f02000-f6f03fff rw- 85000 2000 /system/lib/libc.so
f6f04000-f6f04fff rw- 0 1000 [anon:.bss]
f6f05000-f6f05fff r-- 0 1000 [anon:.bss]
f6f06000-f6f0bfff rw- 0 6000 [anon:.bss]
f6f0c000-f6f21fff r-x 0 16000 /system/lib/libcutils.so (BuildId: d6d68a419dadd645ca852cd339f89741)
f6f22000-f6f22fff r-- 15000 1000 /system/lib/libcutils.so
f6f23000-f6f23fff rw- 16000 1000 /system/lib/libcutils.so
f6f24000-f6f31fff r-x 0 e000 /system/lib/liblog.so (BuildId: e4d30918d1b1028a1ba23d2ab72536fc)
f6f32000-f6f32fff r-- d000 1000 /system/lib/liblog.so
f6f33000-f6f33fff rw- e000 1000 /system/lib/liblog.so
</pre>
<p>Typically a shared library will have three adjacent entries. One will be
readable and executable (code), one will be read-only (read-only
data), and one will be read-write (mutable data). The first column
shows the address ranges for the mapping, the second column the permissions
(in the usual Unix <code>ls(1)</code> style), the third column the offset into the file
(in hex), the fourth column the size of the region (in hex), and the fifth
column the file (or other region name).</p>
<pre class="devsite-click-to-copy">
f6f34000-f6f53fff r-x 0 20000 /system/lib/libm.so (BuildId: 76ba45dcd9247e60227200976a02c69b)
f6f54000-f6f54fff --- 0 1000
f6f55000-f6f55fff r-- 20000 1000 /system/lib/libm.so
f6f56000-f6f56fff rw- 21000 1000 /system/lib/libm.so
f6f58000-f6f58fff rw- 0 1000
f6f59000-f6f78fff r-- 0 20000 /dev/__properties__/u:object_r:default_prop:s0
f6f79000-f6f98fff r-- 0 20000 /dev/__properties__/properties_serial
f6f99000-f6f99fff rw- 0 1000 [anon:linker_alloc_vector]
f6f9a000-f6f9afff r-- 0 1000 [anon:atexit handlers]
f6f9b000-f6fbafff r-- 0 20000 /dev/__properties__/properties_serial
f6fbb000-f6fbbfff rw- 0 1000 [anon:linker_alloc_vector]
f6fbc000-f6fbcfff rw- 0 1000 [anon:linker_alloc_small_objects]
f6fbd000-f6fbdfff rw- 0 1000 [anon:linker_alloc_vector]
f6fbe000-f6fbffff rw- 0 2000 [anon:linker_alloc]
f6fc0000-f6fc0fff r-- 0 1000 [anon:linker_alloc]
f6fc1000-f6fc1fff rw- 0 1000 [anon:linker_alloc_lob]
f6fc2000-f6fc2fff r-- 0 1000 [anon:linker_alloc]
f6fc3000-f6fc3fff rw- 0 1000 [anon:linker_alloc_vector]
f6fc4000-f6fc4fff rw- 0 1000 [anon:linker_alloc_small_objects]
f6fc5000-f6fc5fff rw- 0 1000 [anon:linker_alloc_vector]
f6fc6000-f6fc6fff rw- 0 1000 [anon:linker_alloc_small_objects]
f6fc7000-f6fc7fff rw- 0 1000 [anon:arc4random _rsx structure]
f6fc8000-f6fc8fff rw- 0 1000 [anon:arc4random _rs structure]
f6fc9000-f6fc9fff r-- 0 1000 [anon:atexit handlers]
f6fca000-f6fcafff --- 0 1000 [anon:thread signal stack guard page]
</pre>
<p>
Note that since Android 5.0 (Lollipop), the C library names most of its anonymous mapped
regions so there are fewer mystery regions.
</p>
<pre class="devsite-click-to-copy">
f6fcb000-f6fccfff rw- 0 2000 [stack:5081]
</pre>
<p>
Regions named <code>[stack:<i>tid</i>]</code> are the stacks for the given threads.
</p>
<pre class="devsite-click-to-copy">
f6fcd000-f702afff r-x 0 5e000 /system/bin/linker (BuildId: 84f1316198deee0591c8ac7f158f28b7)
f702b000-f702cfff r-- 5d000 2000 /system/bin/linker
f702d000-f702dfff rw- 5f000 1000 /system/bin/linker
f702e000-f702ffff rw- 0 2000
f7030000-f7030fff r-- 0 1000
f7031000-f7032fff rw- 0 2000
ffcd7000-ffcf7fff rw- 0 21000
ffff0000-ffff0fff r-x 0 1000 [vectors]
</pre>
<p>Whether you see <code>[vectors]</code> or <code>[vdso]</code> depends on the architecture. ARM uses [vectors], while all other architectures use <a href="http://man7.org/linux/man-pages/man7/vdso.7.html">[vdso]</a>.</p>
</body>
</html>

View file

@ -0,0 +1,64 @@
<html devsite>
<head>
<title>Debugging Native Memory Use</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<h2 id="malloc_debug">Malloc debug</h2>
<p>See <a
href="https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md">Malloc
Debug</a> and <a
href="https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README_api.md">Native
Memory Tracking using libc Callbacks</a> for a thorough description of the
debugging options available for native memory issues.</p>
<h2 id="malloc-stats">Malloc statistics</h2>
<p>
Android supports the <a href="http://man7.org/linux/man-pages/man3/mallinfo.3.html"
><code>mallinfo(3)</code></a> and <a href="http://man7.org/linux/man-pages/man3/malloc_info.3.html"
><code>malloc_info(3)</code></a> extensions to <code>&lt;malloc.h&gt;</code>.
The <code>malloc_info</code> function is available in Android 6.0 and later and
its XML schema is documented in Bionic's
<a href="https://android.googlesource.com/platform/bionic/+/master/libc/include/malloc.h"
><code>&lt;malloc.h&gt;</code></a>.
</p>
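<p>For illustration, here is a minimal standalone C sketch (assuming it runs as
a native executable on Android 6.0 or later, where <code>malloc_info</code> is
available) that prints a <code>mallinfo</code> summary and the
<code>malloc_info</code> XML report:</p>
<pre class="devsite-click-to-copy">
#include &lt;malloc.h&gt;
#include &lt;stdio.h&gt;

int main(void) {
  /* mallinfo(3): summary counters for the native heap. */
  struct mallinfo mi = mallinfo();
  printf("allocated (uordblks): %zu bytes\n", (size_t)mi.uordblks);
  printf("free      (fordblks): %zu bytes\n", (size_t)mi.fordblks);

  /* malloc_info(3): per-arena statistics as XML, written to the given stream. */
  if (malloc_info(0, stdout) != 0) {
    perror("malloc_info");
  }
  return 0;
}
</pre>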
<h2 id="ddms">Dalvik Debug Monitor Server</h2>
<p>You can also use the <a
href="https://developer.android.com/studio/profile/ddms.html">Dalvik Debug
Monitor Server</a> (DDMS) to obtain a graphical view of Malloc Debug
output.</p>
<p>To use DDMS, first turn on its native memory UI:</p>
<ol>
<li>Open <code>~/.android/ddms.cfg</code></li>
<li>Add the line: <code>native=true</code></li>
</ol>
<p>Upon relaunching DDMS and selecting a process, you can switch to the new
native allocation tab and populate it with a list of allocations. This is
especially useful for debugging memory leaks.</p>
</body>
</html>

@ -0,0 +1,121 @@
<html devsite>
<head>
<title>Using Strace</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p><a href="https://strace.io">Strace</a> enables you to see the system calls a
process makes and what those system calls return. A typical process makes a lot
of system calls, so you'll want to review the
<a href="http://man7.org/linux/man-pages/man1/strace.1.html">strace man page</a>
to learn how to collect only data you're actually interested in.</p>
<h2 id=build-strace>Building strace</h2>
<p>To build strace, run the following:</p>
<pre class="devsite-terminal devsite-click-to-copy">
mmma -j6 external/strace
</pre>
<h2 id=attach-strace>Attaching to a running process</h2>
<p>The simplest and most common use case for strace is to attach it to a running
process, which you can do with:</p>
<pre class="devsite-terminal devsite-click-to-copy">
adb shell strace -f -p PID
</pre>
<p>The <code>-f</code> flag tells strace to attach to all the threads in the
process, plus any new threads spawned later.</p>
<h2 id=app-strace>Using on an application</h2>
<p>To use strace on an application:</p>
<ol>
<li>Set up a directory for strace logs:
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell setenforce 0</code>
<code class="devsite-terminal">adb shell mkdir /data/local/tmp/strace</code>
<code class="devsite-terminal">adb shell chmod 777 /data/local/tmp/strace</code>
</pre>
</li>
<li>Choose the process to trace before launching it:
<pre class="devsite-terminal devsite-click-to-copy">
adb shell setprop wrap.com.google.android.browser "logwrapper strace -f -o /data/local/tmp/strace/strace.com.google.android.browser.txt"
</pre>
</li>
<li>Launch the process normally.</li>
</ol>
<h2 id=zygote-systrace>Using on the zygote</h2>
<p>To use strace on the zygote, modify the relevant <code>init.rc</code> zygote
line (requires <code>adb shell setenforce 0</code>):
</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">cd system/core/</code>
<code class="devsite-terminal">patch -p1 &lt;&lt;EOF
--- a/rootdir/init.zygote32.rc
+++ b/rootdir/init.zygote32.rc
@@ -1,4 +1,4 @@
-service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
+service zygote /system/xbin/strace -o /data/local/tmp/zygote.strace /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
class main
socket zygote stream 660 root system
onrestart write /sys/android_power/request_state wake
EOF</code>
</pre>
<h2 id=get-logs-boot>Getting strace logs during Android boot</h2>
<p>To get strace logs during Android boot, make the following changes:</p>
<ul>
<li>Since the process name changes from <code>zygote</code> to
<code>strace</code>, the given service may fail to start due to the missing
SELinux <code>file_context</code> for <code>strace</code>. The solution is to
add a new line for strace in <code>system/sepolicy/private/file_contexts</code>
and copy the original file context over. Example:
<pre class="devsite-click-to-copy">
/dev/socket/zygote u:object_r:zygote_socket:s0
+ /system/xbin/strace u:object_r:zygote_socket:s0
</pre>
</li>
<li>Add a kernel command-line option to boot the device in SELinux permissive
mode. You can do this by adding <code>androidboot.selinux=permissive</code> to
<code>BOARD_KERNEL_CMDLINE</code>. (This variable becomes read-only in
<code>build/core/Makefile</code> but is always available under
<code>/device/*/BoardConfig</code>.)
<br>
<br>Example for the Pixel (sailfish) device in
<code>/device/google/marlin/sailfish/BoardConfig.mk</code>:
<pre class="devsite-click-to-copy">
-BOARD_KERNEL_CMDLINE := .... androidboot.hardware=sailfish ...
+BOARD_KERNEL_CMDLINE := .... androidboot.hardware=sailfish ... androidboot.selinux=permissive
</pre>
After making the change, build and flash the boot image and the device will boot
in permissive mode.
</li>
</ul>
</body>
</html>

@ -0,0 +1,335 @@
<html devsite>
<head>
<title>Understanding Systrace</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p class="caution"><strong>Caution:</strong> If you've never used systrace
before, we strongly recommend reading the
<a href="https://developer.android.com/studio/profile/systrace.html">systrace
overview</a> before continuing.</p>
<p>systrace is the primary tool for analyzing Android device performance.
However, it's really a wrapper around other tools: it is the host-side wrapper
around <em>atrace</em>, the device-side executable that controls userspace
tracing and sets up <em>ftrace</em>, the primary tracing mechanism in the Linux
kernel. systrace uses atrace to enable tracing, then reads the ftrace buffer and
wraps it all in a self-contained HTML viewer. (While newer kernels support
Linux Enhanced Berkeley Packet Filter (eBPF), the following documentation
pertains to the 3.18 kernel (no eBPF), as that's what was used on the
Pixel/Pixel XL.)</p>
<p>systrace is owned by the Google Android and Google Chrome teams and is
developed in the open as part of the
<a href="https://github.com/catapult-project/catapult">Catapult project</a>. In
addition to systrace, Catapult includes other useful utilities. For example,
ftrace has more features than can be directly enabled by systrace or atrace and
contains some advanced functionality that is critical to debugging performance
problems. (These features require root access and often a new kernel.)</p>
<h2 id="running_systrace">Running systrace</h2>
<p>When debugging jitter on Pixel/Pixel XL, start with the following
command:</p>
<pre class="devsite-terminal devsite-click-to-copy">
./systrace.py sched freq idle am wm gfx view sync binder_driver irq workq input -b 96000
</pre>
<p>When combined with the additional tracepoints required for GPU and display
pipeline activity, this gives you the ability to trace from user input to frame
displayed on screen. Set the buffer size to something large to avoid
losing events (which usually manifests as some CPUs containing no events after
some point in the trace).</p>
<p>When going through systrace, keep in mind that <strong>every event is
triggered by something on the CPU</strong>.</p>
<p class="note"><strong>Note:</strong> Hardware interrupts are not controlled by
the CPU and do trigger things in ftrace, but the actual commit to the trace log
is done by the interrupt handler, which could have been delayed if your
interrupt arrived while (for example) some other bad driver had interrupts
disabled. The critical element is the CPU.</p>
<p>Because systrace is built on top of ftrace and ftrace runs on the CPU,
something on the CPU must write the ftrace buffer that logs hardware changes.
This means that if you're curious about why a display fence changed state, you
can see what was running on the CPU at the exact point of its transition
(something that is running on the CPU triggered that change in the log). This
concept is the foundation of analyzing performance using systrace.</p>
<h2 id="example_1">Example: Working frame</h2>
<p>This example describes a systrace for a normal UI pipeline. To follow along
with the example, <a href="perf_traces.zip">download the zip file</a> of traces
(which also includes other traces referred to in this section), unzip the file,
and open the systrace_tutorial.html file in your browser. Be warned: this
systrace is a large file; unless you use systrace in your day-to-day work, it
likely contains far more information than any single trace you've seen before.</p>
<p>For a consistent, periodic workload such as TouchLatency, the UI pipeline
contains the following:</p>
<ol>
<li>EventThread in SurfaceFlinger wakes the app UI thread, signaling it's time
to render a new frame.</li>
<li>App renders frame in UI thread, RenderThread, and hwuiTasks, using CPU and
GPU resources. This accounts for the bulk of the capacity spent on UI.</li>
<li>App sends rendered frame to SurfaceFlinger via binder and goes to
sleep.</li>
<li>A second EventThread in SurfaceFlinger wakes SurfaceFlinger to trigger
composition and display output. If SurfaceFlinger determines there is no work to
be done, it goes back to sleep.</li>
<li>SurfaceFlinger handles composition via HWC/HWC2 or GL. HWC/HWC2 composition
is faster and lower power but has limitations depending on the SOC. This usually
takes ~4-6ms, but can overlap with step 2 because Android applications are
always triple-buffered. (While applications are always triple-buffered, there
may only be one pending frame waiting in SurfaceFlinger, which makes it appear
identical to double-buffering.)</li>
<li>SurfaceFlinger dispatches final output to display via vendor driver and goes
back to sleep, waiting for EventThread wakeup.</li>
</ol>
<p>Let's walk through the frame beginning at 15409ms:</p>
<p><img src="/devices/tech/debug/images/perf_trace_normal_pipeline.png"></p>
<p class="img-caption"><strong>Figure 1.</strong> Normal UI pipeline,
EventThread running.</p>
<p>Figure 1 is a normal frame surrounded by normal frames, so it's a good
starting point for understanding how the UI pipeline works. The UI thread row
for TouchLatency includes different colors at different times. Bars denote
different states for the thread:</p>
<ul>
<li><strong>Gray</strong>. Sleeping.</li>
<li><strong>Blue</strong>. Runnable (it could run, but the scheduler hasn't
picked it to run yet).</li>
<li><strong>Green</strong>. Actively running (the scheduler thinks it's
running).
<p class="note"><strong>Note</strong>: Interrupt handlers aren't shown in the
normal per-CPU timeline, so it's possible that you're actually running interrupt
handlers or softirqs during some portion of a thread's runtime. Check the irq
section of the trace (under process 0) to confirm whether an interrupt is
running instead of a standard thread.</p>
</li>
<li><strong>Red</strong>. Uninterruptible sleep (generally sleeping on a lock
in the kernel). Can be indicative of I/O load. Extremely useful for debugging
performance issues.</li>
<li><strong>Orange</strong>. Uninterruptible sleep due to I/O load.</li>
</ul>
<p>To view the reason for uninterruptible sleep (available from the
<code>sched_blocked_reason</code> tracepoint), select the red uninterruptible
sleep slice.</p>
<p>While EventThread is running, the UI thread for TouchLatency becomes
runnable. To see what woke it, click the blue section:</p>
<p><img src="/devices/tech/debug/images/perf_trace_tl.png"></p>
<p class="img-caption"><strong>Figure 2.</strong> UI thread for
TouchLatency.</p>
<p>Figure 2 shows the TouchLatency UI thread was woken by tid 6843, which
corresponds to EventThread. The UI thread wakes:</p>
<p><img src="/devices/tech/debug/images/perf_trace_wake_render_enqueue.png"></p>
<p class="img-caption"><strong>Figure 3.</strong> UI thread wakes, renders a
frame, and enqueues it for SurfaceFlinger to consume.</p>
<p>If the <code>binder_driver</code> tag is enabled in a trace, you can select a
binder transaction to view information on all of the processes involved in that
transaction:</p>
<p><img src="/devices/tech/debug/images/perf_trace_binder_trans.png"></p>
<p class="img-caption"><strong>Figure 4.</strong> Binder transaction.</p>
<p>Figure 4 shows that, at 15,423.65ms, Binder:6832_1 in SurfaceFlinger becomes
runnable because of tid 9579, which is TouchLatency's RenderThread. You can also
see queueBuffer on both sides of the binder transaction.</p>
<p>During the queueBuffer on the SurfaceFlinger side, the number of pending
frames from TouchLatency goes from 1 to 2:</p>
<p><img src="/devices/tech/debug/images/perf_trace_pending_frames.png"></p>
<p class="img-caption"><strong>Figure 5.</strong> Pending frames goes from 1 to
2.</p>
<p>Figure 5 shows triple-buffering, where there are two completed frames and the
app will soon start rendering a third. This is because we've already dropped
some frames, so the app keeps two pending frames instead of one to try to avoid
further dropped frames.</p>
<p>Soon after, SurfaceFlinger's main thread is woken by a second EventThread so
it can output the older pending frame to the display:</p>
<p><img src="/devices/tech/debug/images/perf_trace_sf_woken_et.png"></p>
<p class="img-caption"><strong>Figure 6.</strong> SurfaceFlinger's main thread
is woken by a second EventThread.</p>
<p>SurfaceFlinger first latches the older pending buffer, which causes the
pending buffer count to decrease from 2 to 1:</p>
<p><img src="/devices/tech/debug/images/perf_trace_sf_latches_pend.png"></p>
<p class="img-caption"><strong>Figure 7.</strong> SurfaceFlinger first latches
the older pending buffer.</p>
<p>After latching the buffer, SurfaceFlinger sets up composition and submits the
final frame to the display. (Some of these sections are enabled as part of the
<code>mdss</code> tracepoint, so they may not be there on your SOC.)</p>
<p><img src="/devices/tech/debug/images/perf_trace_sf_comp_submit.png"></p>
<p class="img-caption"><strong>Figure 8.</strong> SurfaceFlinger sets up
composition and submits the final frame.</p>
<p>Next, <code>mdss_fb0</code> wakes on CPU 0. <code>mdss_fb0</code> is the
display pipeline's kernel thread for outputting a rendered frame to the display.
We can see <code>mdss_fb0</code> as its own row in the trace (scroll down to
view).</p>
<p><img src="/devices/tech/debug/images/perf_trace_wake_cpu0.png"></p>
<p class="img-caption"><strong>Figure 9.</strong> <code>mdss_fb0</code> wakes
on CPU 0.</p>
<p><code>mdss_fb0</code> wakes up, runs for a bit, enters uninterruptible sleep,
then wakes again.</p>
<p class="note"><strong>Note</strong>: From this point forward, the trace is
more complicated as the final work is split between <code>mdss_fb0</code>,
interrupts, and workqueue functions. If you need that level of detail, refer to
the exact characteristics of the driver stack for your SOC (as what happens on
the Pixel XL might not be useful to you).</p>
<h2 id="example_2">Example: Non-working frame</h2>
<p>This example describes a systrace used to debug Pixel/Pixel XL jitter. To
follow along with the example, <a href="/devices/tech/debug/perf_traces.zip">download the zip
file</a> of traces (which also includes other traces referred to in this
section), unzip the file, and open the systrace_tutorial.html file in your
browser.</p>
<p>When you first open the systrace, you'll see something like this:</p>
<p><img src="/devices/tech/debug/images/perf_trace_tl_pxl.png"></p>
<p class="img-caption"><strong>Figure 10</strong>. TouchLatency running on Pixel
XL (most options enabled, including mdss and kgsl tracepoints).</p>
<p>When looking for jank, check the FrameMissed row under SurfaceFlinger.
FrameMissed is a quality-of-life improvement provided by Hardware Composer 2
(HWC2). As of December 2016, HWC2 is used only on Pixel/Pixel XL; when viewing
systrace for other devices, the FrameMissed row may not be present. In either
case, FrameMissed is correlated with SurfaceFlinger missing one of its
extremely regular runtimes and an unchanged pending-buffer count for the app
(<code>com.prefabulated.touchlatency</code>) at a vsync:</p>
<p><img src="/devices/tech/debug/images/perf_trace_fm_sf.png"></p>
<p class="img-caption"><strong>Figure 11</strong>. FrameMissed correlation with
SurfaceFlinger.</p>
<p>Figure 11 shows a missed frame at 15598.29ms. SurfaceFlinger woke briefly at
the vsync interval and went back to sleep without doing any work, which means
SurfaceFlinger determined it was not worth trying to send a frame to the display
again. Why?</p>
<p>To understand how the pipeline broke down for this frame, first review the
working frame example above to see how a normal UI pipeline appears in systrace.
When ready, return to the missed frame and work backwards. Notice that
SurfaceFlinger wakes and immediately goes to sleep. When viewing the number of
pending frames from TouchLatency, there are two frames (a good clue to help
figure out what's going on).</p>
<p><img src="/devices/tech/debug/images/perf_trace_sf_wake_sleep.png"></p>
<p class="img-caption"><strong>Figure 12.</strong> SurfaceFlinger wakes and
immediately goes to sleep.</p>
<p>Because we have frames in SurfaceFlinger, it's not an app issue. In addition,
SurfaceFlinger is waking at the correct time, so it's not a SurfaceFlinger
issue. If SurfaceFlinger and the app are both looking normal, it's probably a
driver issue.</p>
<p>Because the <code>mdss</code> and <code>sync</code> tracepoints are enabled,
we can get information about the fences (shared between the display driver and
SurfaceFlinger) that control when frames are actually submitted to the display.
The fences we care about are listed under <code>mdss_fb0_retire</code>, which
denotes when a frame is actually on the display. These fences are provided as
part of the <code>sync</code> trace category. Which fences correspond to
particular events in SurfaceFlinger depends on your SOC and driver stack, so
work with your SOC vendor to understand the meaning of fence categories in your
traces.</p>
<p><img src="/devices/tech/debug/images/perf_traces_fences.png"></p>
<p class="img-caption"><strong>Figure 13.</strong> <code>mdss_fb0_retire</code>
fences.</p>
<p>Figure 13 shows a frame that was displayed for 33ms, not 16.7ms as expected.
Halfway through that slice, that frame should have been replaced by a new one
but wasn't. View the previous frame and look for anything interesting:</p>
<p><img src="/devices/tech/debug/images/perf_trace_frame_previous.png"></p>
<p class="img-caption"><strong>Figure 14.</strong> Frame previous to busted
frame.</p>
<p>Figure 14 shows a frame that took 14.482ms. The busted two-frame segment was 33.6ms,
which is roughly what we would expect for two frames (we render at 60Hz, 16.7ms
a frame, which is close). But 14.482ms is not anywhere close to 16.7ms, which
suggests that something is very wrong with the display pipe.</p>
<p>Investigate exactly where that fence ends to determine what controls it:</p>
<p><img src="/devices/tech/debug/images/perf_trace_fence_end.png"></p>
<p class="img-caption"><strong>Figure 15.</strong> Investigate fence end.</p>
<p>A workqueue contains a <code>__vsync_retire_work_handler</code> that is
running when the fence changes. Looking through the kernel source, you can see
it's part of the display driver. It definitely appears to be on the critical
path for the display pipeline, so it must run as quickly as possible. It's
runnable for 70us or so (not a long scheduling delay), but it's a workqueue and
might not get scheduled accurately.</p>
<p>Check the previous frame to determine if that contributed; sometimes jitter
can add up over time and eventually cause us to miss a deadline.</p>
<p><img src="/devices/tech/debug/images/perf_trace_previous_frame.png"></p>
<p class="img-caption"><strong>Figure 16.</strong> Previous frame.</p>
<p>The runnable line on the kworker isn't visible because the viewer turns it
white when it's selected, but the statistics tell the story: 2.3ms of scheduler
delay for part of the display pipeline critical path is <strong>bad</strong>.
Before we do anything else, we should fix that by moving this part of the
display pipeline critical path from a workqueue (which runs as a
<code>SCHED_OTHER</code> CFS thread) to a dedicated <code>SCHED_FIFO</code>
kthread. This function needs timing guarantees that workqueues can't (and aren't
intended to) provide.</p>
<p>Is this the reason for the jank? It's hard to say conclusively. Outside of
easy-to-diagnose cases such as kernel lock contention causing display-critical
threads to sleep, traces usually won't tell you directly what the problem is.
Could this jitter have been the cause of the dropped frame? Absolutely. The
fence times should be 16.7ms, but they aren't close to that at all in the frames
leading up to the dropped frame. (There's a 19ms fence followed by a 14ms
fence.) Given how tightly coupled the display pipeline is, it's entirely
possible the jitter around fence timings resulted in an eventual dropped
frame.</p>
<p>In this example, the solution involved converting
<code>__vsync_retire_work_handler</code> from a workqueue to a dedicated
kthread. This resulted in noticeable jitter improvements and reduced jank in the
bouncing ball test. Subsequent traces show fence timings that hover very close
to 16.7ms.</p>
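<p>As a rough illustration of that design choice, the following kernel-C sketch
(using 3.18-era APIs) shows the general pattern of moving work from a workqueue
to a dedicated <code>SCHED_FIFO</code> kthread. It is <strong>not</strong> the
actual MDSS driver change; the function names, thread name, and priority are
hypothetical placeholders:</p>
<pre class="devsite-click-to-copy">
#include &lt;linux/err.h&gt;
#include &lt;linux/kthread.h&gt;
#include &lt;linux/sched.h&gt;

static struct task_struct *retire_thread;

/* Hypothetical thread function: performs the work previously queued from the
 * vsync retire handler, sleeping until the driver wakes it for the next frame. */
static int vsync_retire_threadfn(void *data)
{
    while (!kthread_should_stop()) {
        /* ... wait for the retire fence / vsync event, then retire the frame ... */
        set_current_state(TASK_INTERRUPTIBLE);
        schedule();
    }
    return 0;
}

static int start_retire_thread(void)
{
    struct sched_param param = { .sched_priority = 16 }; /* arbitrary RT priority */

    retire_thread = kthread_run(vsync_retire_threadfn, NULL, "vsync_retire");
    if (IS_ERR(retire_thread))
        return PTR_ERR(retire_thread);

    /* SCHED_FIFO keeps this work from being delayed behind SCHED_OTHER (CFS) threads. */
    return sched_setscheduler(retire_thread, SCHED_FIFO, &param);
}
</pre>
<p>The point of the pattern is that a real-time kthread is scheduled ahead of
CFS threads, which is the timing guarantee the display pipeline's critical path
needs and a workqueue cannot provide.</p>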
</body>
</html>

@ -0,0 +1,75 @@
<html devsite>
<head>
<title>Using Valgrind</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>The <a href="http://valgrind.org/">Valgrind</a> tool suite includes Memcheck
(for detecting memory-related errors in C and C++), Cachegrind (a cache
profiler), Massif (a heap profiler), and several other tools.</p>
<p class="note"><strong>Note:</strong> While you can use Valgrind for Android
debugging, most Android platform developers use
<a href="/devices/tech/debug/asan.html">AddressSanitizer</a> instead.</p>
<h2 id=build-valgrind>Building Valgrind</h2>
<p>To build Valgrind:</p>
<pre class="devsite-terminal devsite-click-to-copy">
mmma -j6 external/valgrind
</pre>
<h2 id=app-valgrind>Using on an application</h2>
<p>To use Valgrind on an application:</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell setprop wrap.<em>app_name</em> "logwrapper valgrind"</code>
<code class="devsite-terminal">adb shell am start -a android.intent.action.MAIN -n <em>app_name</em>/.MainActivity</code>
</pre>
<p><code><em>app_name</em></code> must be a fully-qualified name such as
<strong>com.google.example_app</strong>.</p>
<h2 id=server-valgrind>Using on the system server</h2>
<p>To run the system server with Valgrind:</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell setprop wrap.system_server "logwrapper valgrind"</code>
<code class="devsite-terminal">adb shell stop && adb shell start</code>
</pre>
<h2 id=symbols-valgrind>Getting debug symbols</h2>
<p>For debug symbols, push unstripped libraries to <code>/data/local/symbols</code>:</p>
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell mkdir /data/local/symbols</code>
<code class="devsite-terminal">adb push $OUT/symbols /data/local/symbols</code>
</pre>
<h2 id=boot-valgrind>Using during boot up</h2>
<p>To use Valgrind during boot up, edit
<code>out/target/product/XXXX/root/init.rc</code> by changing</p>
<pre class="devsite-click-to-copy">service example /system/bin/foo --arg1 --arg2</pre>
<p>to:</p>
<pre class="devsite-click-to-copy">>service example /system/bin/logwrapper /system/bin/valgrind /system/bin/foo --arg1 --arg2</pre>
<p>To see the effects, create a <code>boot.img</code> and reflash the
device.</p>
</body>
</html>