upload android base code part8
259
android/docs/source.android.com/en/devices/tech/debug/asan.html
Normal file
|
@ -0,0 +1,259 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>AddressSanitizer</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
|
||||
<p>AddressSanitizer (ASan) is a fast compiler-based tool for detecting memory bugs
|
||||
in native code. It is comparable to Valgrind (Memcheck tool), but, unlike it,
|
||||
ASan:</p>
|
||||
|
||||
<ul>
|
||||
<li> + detects overflows on stack and global objects
|
||||
<li> - does not detect uninitialized reads and memory leaks
|
||||
<li> + is much faster (two-three times slowdown compared to Valgrind’s 20-100x)
|
||||
<li> + has less memory overhead
|
||||
</ul>
|
||||
|
||||
<p>This document describes how to build and run parts of the Android platform with
|
||||
AddressSanitizer. If you are looking to build a standalone (i.e. SDK/NDK)
|
||||
application with AddressSanitizer, see the <a
|
||||
href="https://github.com/google/sanitizers/wiki/AddressSanitizerOnAndroid">AddressSanitizerOnAndroid</a>
|
||||
public project site instead.</p>
|
||||
|
||||
<p>AddressSanitizer consists of a compiler (<code>external/clang</code>) and a runtime library
|
||||
(<code>external/compiler-rt/lib/asan</code>).</p>
|
||||
|
||||
<p class="note"><strong>Note</strong>: Use the current master
|
||||
branch to gain access to the <a href="#sanitize_target">SANITIZE_TARGET</a>
|
||||
feature and the ability to build the entire Android platform with
|
||||
AddressSanitizer at once. Otherwise, you are limited to using
|
||||
<code>LOCAL_SANITIZE</code>.</p>
|
||||
|
||||
<h2 id=building_with_clang>Building with Clang</h2>
|
||||
|
||||
<p>As a first step to building an ASan-instrumented binary, make sure that your
|
||||
code builds with Clang. This is done by default on the master branch, so there should be nothing
|
||||
you need to do. If you believe that the module you'd like to test is being built with GCC,
|
||||
you can switch to Clang by adding <code>LOCAL_CLANG:=true</code>
|
||||
to the build rules. Clang may find bugs in your code that GCC missed.</p>
|
||||
|
||||
<h2 id=building_executables_with_addresssanitizer>Building executables with AddressSanitizer</h2>
|
||||
|
||||
<p>Add <code>LOCAL_SANITIZE:=address</code> to the build rule of the
|
||||
executable.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
LOCAL_SANITIZE:=address
|
||||
</pre>
|
||||
|
||||
<p>When a bug is detected, ASan prints a verbose report both to the standard
|
||||
output and to <code>logcat</code> and then crashes the process.</p>
|
||||
|
||||
<h2 id=building_shared_libraries_with_addresssanitizer>Building shared libraries with AddressSanitizer</h2>
|
||||
|
||||
<p>Due to the way ASan works, a library built with ASan cannot be used by an
|
||||
executable that's built without ASan.</p>
|
||||
|
||||
<p class="note"><strong>Note</strong>: In runtime situations where an ASan library is
|
||||
loaded into an incorrect process, you will see unresolved symbol messages
|
||||
starting with <code>_asan</code> or <code>_sanitizer</code>.</p>
|
||||
|
||||
<p>To sanitize a shared library that is used in multiple executables, not all of
|
||||
which are built with ASan, you'll need two copies of the library. The
|
||||
recommended way to do this is to add the following to <code>Android.mk</code>
|
||||
for the module in question:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
LOCAL_SANITIZE:=address
|
||||
LOCAL_MODULE_RELATIVE_PATH := asan
|
||||
</pre>
|
||||
|
||||
<p>This puts the library in <code>/system/lib/asan</code> instead of
|
||||
<code>/system/lib</code>. Then, run your executable with:
|
||||
<code>LD_LIBRARY_PATH=/system/lib/asan</code></p>
|
||||
|
||||
<p>For system daemons, add the following to the appropriate section of
|
||||
<code>/init.rc</code> or <code>/init.$device$.rc</code>.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
setenv LD_LIBRARY_PATH /system/lib/asan
|
||||
</pre>
|
||||
|
||||
<p class="warning"><strong>Warning</strong>: The <code>LOCAL_MODULE_RELATIVE_PATH</code>
|
||||
setting <strong>moves</strong> your library to <code>/system/lib/asan</code>,
|
||||
meaning that clobbering and rebuilding from scratch will result in the
|
||||
library missing from <code>/system/lib</code>, and probably an unbootable
|
||||
image. That's an unfortunate limitation of the
|
||||
current build system. Don't clobber; do <code>make -j $N</code> and <code>adb
|
||||
sync</code>.</p>
|
||||
|
||||
<p>Verify the process is using libraries from <code>/system/lib/asan</code>
|
||||
when present by reading <code>/proc/$PID/maps</code>. If it's not, you may need
|
||||
to disable SELinux, like so:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb root</code>
|
||||
<code class="devsite-terminal">adb shell setenforce 0</code>
|
||||
# restart the process with adb shell kill $PID
|
||||
# if it is a system service, or may be adb shell stop; adb shell start.
|
||||
</pre>
|
||||
|
||||
<h2 id=better_stack_traces>Better stack traces</h2>
|
||||
|
||||
<p>AddressSanitizer uses a fast, frame-pointer-based unwinder to record a stack
|
||||
trace for every memory allocation and deallocation event in the program. Most
|
||||
of Android is built without frame pointers. As a result, you will often get
|
||||
only one or two meaningful frames. To fix this, either rebuild the library with
|
||||
ASan (recommended!), or with:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
LOCAL_CFLAGS:=-fno-omit-frame-pointer
|
||||
LOCAL_ARM_MODE:=arm
|
||||
</pre>
|
||||
|
||||
<p>Or set <code>ASAN_OPTIONS=fast_unwind_on_malloc=0</code> in the process
|
||||
environment. The latter can be very CPU-intensive, depending on
|
||||
the load.</p>
|
||||
|
||||
<h2 id=symbolization>Symbolization</h2>
|
||||
|
||||
<p>Initially, ASan reports contain references to offsets in binaries and shared
|
||||
libraries. There are two ways to obtain source file and line information:</p>
|
||||
|
||||
<ul>
|
||||
<li>Ensure llvm-symbolizer binary is present in <code>/system/bin</code>.
|
||||
Llvm-symbolizer is built from sources in:
|
||||
<code>third_party/llvm/tools/llvm-symbolizer</code> <li>Filter the report
|
||||
through the <code>external/compiler-rt/lib/asan/scripts/symbolize.py</code>
|
||||
script.
|
||||
</ul>
|
||||
|
||||
<p>The second approach can provide more data (i.e. file:line locations) because of
|
||||
the availability of symbolized libraries on the host.</p>
|
||||
|
||||
<h2 id=addresssanitizer_in_the_apps>AddressSanitizer in the apps</h2>
|
||||
|
||||
<p>AddressSanitizer cannot see into Java code, but it can detect bugs in the JNI
|
||||
libraries. For that, you'll need to build the executable with ASan, which in
|
||||
this case is <code>/system/bin/app_process(<em>32|64</em>)</code>. This will
|
||||
enable ASan in all apps on the device at the same time, which is a
|
||||
bit stressful, but nothing that a 2GB RAM device cannot handle.</p>
|
||||
|
||||
<p>Add the usual <code>LOCAL_SANITIZE:=address</code> to
|
||||
the app_process build rule in <code>frameworks/base/cmds/app_process</code>. Ignore
|
||||
the <code>app_process__asan</code> target in the same file for now (if it is
|
||||
still there at the time you read
|
||||
this). Edit the Zygote record in
|
||||
<code>system/core/rootdir/init.zygote(<em>32|64</em>).rc</code> to add the
|
||||
following lines:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
setenv LD_LIBRARY_PATH /system/lib/asan:/system/lib
|
||||
setenv ASAN_OPTIONS
|
||||
allow_user_segv_handler=true
|
||||
</pre>
|
||||
|
||||
<p>Build, adb sync, fastboot flash boot, reboot.</p>
|
||||
|
||||
<h2 id=using_the_wrap_property>Using the wrap property</h2>
|
||||
|
||||
<p>The approach in the previous section puts AddressSanitizer into every
|
||||
application in the system (actually, into every descendant of the Zygote
|
||||
process). It is possible to run only one (or several) applications with ASan,
|
||||
trading some memory overhead for slower application startup.</p>
|
||||
|
||||
<p>This can be done by starting your app with the “wrap.” property, the same one
|
||||
that’s used to run apps under Valgrind. The following example runs the Gmail app
|
||||
under ASan:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb root</code>
|
||||
<code class="devsite-terminal">adb shell setenforce 0 # disable SELinux</code>
|
||||
<code class="devsite-terminal">adb shell setprop wrap.com.google.android.gm "asanwrapper"</code>
|
||||
</pre>
|
||||
|
||||
<p>In this context, asanwrapper rewrites <code>/system/bin/app_process</code>
|
||||
to <code>/system/bin/asan/app_process</code>, which is built with
|
||||
AddressSanitizer. It also adds <code>/system/lib/asan</code> at the start of
|
||||
the dynamic library search path. This way ASan-instrumented
|
||||
libraries from <code>/system/lib/asan</code> are preferred to normal libraries
|
||||
in <code>/system/lib</code> when running with asanwrapper.</p>
|
||||
|
||||
<p>Again, if a bug is found, the app will crash, and the report will be printed to
|
||||
the log.</p>
|
||||
|
||||
<h2 id=sanitize_target>SANITIZE_TARGET</h2>
|
||||
|
||||
<p>The master branch has support for building the entire Android platform with
|
||||
AddressSanitizer at once.</p>
|
||||
|
||||
<p>Run the following commands in the same build tree.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">make -j42</code>
|
||||
<code class="devsite-terminal">SANITIZE_TARGET=address make -j42</code>
|
||||
</pre>
|
||||
|
||||
<p>In this mode, <code>userdata.img</code> contains extra libraries and must be
|
||||
flashed to the device as well. Use the following command line:</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
fastboot flash userdata && fastboot flashall
|
||||
</pre>
|
||||
|
||||
<p>At the moment of this writing, modern Nexus and Pixel devices boot to the UI in this mode.</p>
|
||||
|
||||
<p>This works by building two sets of shared libraries: normal in
|
||||
<code>/system/lib</code> (the first make invocation), ASan-instrumented in
|
||||
<code>/data/asan/lib</code> (the second make invocation). Executables from the
|
||||
second build overwrite the ones from the first build. ASan-instrumented
|
||||
executables get a different library search path that includes
|
||||
<code>/data/asan/lib</code> before <code>/system/lib</code> through the use of
|
||||
"/system/bin/linker_asan" in PT_INTERP.</p>
|
||||
|
||||
<p>The build system clobbers intermediate object directories when the
|
||||
<code>$SANITIZE_TARGET</code> value has changed. This forces a rebuild of all
|
||||
targets while preserving installed binaries under <code>/system/lib</code>.</p>
|
||||
|
||||
<p>Some targets cannot be built with ASan:</p>
|
||||
|
||||
<ul>
|
||||
<li>Statically linked executables.
|
||||
<li><code>LOCAL_CLANG:=false</code> targets
|
||||
<li><code>LOCAL_SANITIZE:=false</code> will not be ASan'd for <code>SANITIZE_TARGET=address</code>
|
||||
</ul>
|
||||
|
||||
<p>Executables like these are skipped in the SANITIZE_TARGET build, and the
|
||||
version from the first make invocation is left in <code>/system/bin</code>.</p>
|
||||
|
||||
<p>Libraries like this are simply built without ASan. They can contain some ASan
|
||||
code anyway from the static libraries they depend upon.</p>
|
||||
|
||||
<h2 id=supporting_documentation>Supporting documentation</h2>
|
||||
|
||||
<p><a href="https://github.com/google/sanitizers/wiki/AddressSanitizerOnAndroid">AddressSanitizerOnAndroid</a> public project site</p>
|
||||
<p><a href="https://www.chromium.org/developers/testing/addresssanitizer">AddressSanitizer and Chromium</a></p>
|
||||
<p><a href="https://github.com/google/sanitizers">Other Google Sanitizers</a></p>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,268 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Evaluating Performance</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
<p>There are two user-visible indicators of performance:</p>
|
||||
|
||||
<ul>
|
||||
<li><strong>Predictable, perceptible performance</strong>. Does the user
|
||||
interface (UI) drop frames or consistently render at 60FPS? Does audio play
|
||||
without artifacts or popping? How long is the delay between the user touching
|
||||
the screen and the effect showing on the display?</li>
|
||||
<li><strong>Length of time required for longer operations</strong> (such as
|
||||
opening applications).</li>
|
||||
</ul>
|
||||
|
||||
<p>The first is more noticeable than the second. Users typically notice jank
|
||||
but they won't be able to tell 500ms vs 600ms application startup time unless
|
||||
they are looking at two devices side-by-side. Touch latency is immediately
|
||||
noticeable and significantly contributes to the perception of a device.</p>
|
||||
|
||||
<p>As a result, in a fast device, the UI pipeline is the most important thing in
|
||||
the system other than what is necessary to keep the UI pipeline functional. This
|
||||
means that the UI pipeline should preempt any other work that is not necessary
|
||||
for fluid UI. To maintain a fluid UI, background syncing, notification delivery,
|
||||
and similar work must all be delayed if UI work can be run. It is
|
||||
acceptable to trade the performance of longer operations (HDR+ runtime,
|
||||
application startup, etc.) to maintain a fluid UI.</p>
|
||||
|
||||
<h2 id="capacity_vs_jitter">Capacity vs jitter</h2>
|
||||
<p>When considering device performance, <em>capacity</em> and <em>jitter</em>
|
||||
are two meaningful metrics.</p>
|
||||
|
||||
<h3 id="capacity">Capacity</h3>
|
||||
<p>Capacity is the total amount of some resource that the device possesses over
|
||||
some amount of time. This can be CPU resources, GPU resources, I/O resources,
|
||||
network resources, memory bandwidth, or any similar metric. When examining
|
||||
whole-system performance, it can be useful to abstract the individual components
|
||||
and assume a single metric that determines performance (especially when tuning a
|
||||
new device because the workloads run on that device are likely fixed).</p>
|
||||
|
||||
<p>The capacity of a system varies based on the computing resources online.
|
||||
Changing CPU/GPU frequency is the primary means of changing capacity, but there
|
||||
are others such as changing the number of CPU cores online. Accordingly, the
|
||||
capacity of a system corresponds with power consumption; <strong>changing
|
||||
capacity always results in a similar change in power consumption.</strong></p>
|
||||
|
||||
<p>The capacity required at a given time is overwhelmingly determined by the
|
||||
running application. As a result, the platform can do little to adjust the
|
||||
capacity required for a given workload, and the means to do so are limited to
|
||||
runtime improvements (Android framework, ART, Bionic, GPU compiler/drivers,
|
||||
kernel).</p>
|
||||
|
||||
<h3 id="jitter">Jitter</h3>
|
||||
<p>While the required capacity for a workload is easy to see, jitter is a more
|
||||
nebulous concept. For a good introduction to jitter as an impediment to fast
|
||||
systems, refer to
|
||||
<em><a href="http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-03-3116">THE
|
||||
CASE OF THE MISSING SUPERCOMPUTER PERFORMANCE: ACHIEVING OPTIMAL PERFORMANCE ON
|
||||
THE 8,192 PROCESSORS OF ASCl Q</em></a>. (It's an investigation of why the ASCI
|
||||
Q supercomputer did not achieve its expected performance and is a great
|
||||
introduction to optimizing large systems.)</p>
|
||||
|
||||
<p>This page uses the term jitter to describe what the ASCI Q paper calls
|
||||
<em>noise</em>. Jitter is the random system behavior that prevents perceptible
|
||||
work from running. It is often work that must be run, but it may not have strict
|
||||
timing requirements that cause it to run at any particular time. Because it is
|
||||
random, it is extremely difficult to disprove the existence of jitter for a
|
||||
given workload. It is also extremely difficult to prove that a known source of
|
||||
jitter was the cause of a particular performance issue. The tools most commonly
|
||||
used for diagnosing causes of jitter (such as tracing or logging) can introduce
|
||||
their own jitter.</p>
|
||||
|
||||
<p>Sources of jitter experienced in real-world implementations of Android
|
||||
include:</p>
|
||||
<ul>
|
||||
<li>Scheduler delay</li>
|
||||
<li>Interrupt handlers</li>
|
||||
<li>Driver code running for too long with preemption or interrupts disabled</li>
|
||||
<li>Long-running softirqs</li>
|
||||
<li>Lock contention (application, framework, kernel driver, binder lock, mmap
|
||||
lock)</li>
|
||||
<li>File descriptor contention where a low-priority thread holds the lock on a
|
||||
file, preventing a high-priority thread from running</li>
|
||||
<li>Running UI-critical code in workqueues where it could be delayed</li>
|
||||
<li>CPU idle transitions</li>
|
||||
<li>Logging</li>
|
||||
<li>I/O delays</li>
|
||||
<li>Unnecessary process creation (e.g., CONNECTIVITY_CHANGE broadcasts)</li>
|
||||
<li>Page cache thrashing caused by insufficient free memory</li>
|
||||
</ul>
|
||||
|
||||
<p>The required amount of time for a given period of jitter may or may not
|
||||
decrease as capacity increases. For example, if a driver leaves interrupts
|
||||
disabled while waiting for a read from across an i2c bus, it will take a fixed
|
||||
amount of time regardless of whether the CPU is at 384MHz or 2GHz. Increasing
|
||||
capacity is not a feasible solution to improve performance when jitter is
|
||||
involved. As a result, <strong>faster processors will not usually improve
|
||||
performance in jitter-constrained situations.</strong></p>
|
||||
|
||||
<p>Finally, unlike capacity, jitter is almost entirely within the domain of the
|
||||
system vendor.</p>
|
||||
|
||||
<h3 id="memory_consumption">Memory consumption</h3>
|
||||
<p>Memory consumption is traditionally blamed for poor performance. While
|
||||
consumption itself is not a performance issue, it can cause jitter via
|
||||
lowmemorykiller overhead, service restarts, and page cache thrashing. Reducing
|
||||
memory consumption can avoid the direct causes of poor performance, but there
|
||||
may be other targeted improvements that avoid those causes as well (for example,
|
||||
pinning the framework to prevent it from being paged out when it will be paged
|
||||
in soon after).</p>
|
||||
|
||||
<h2 id="analyze_initial">Analyzing initial device performance</h2>
|
||||
<p>Starting from a functional but poorly-performing system and attempting to fix
|
||||
the system's behavior by looking at individual cases of user-visible poor
|
||||
performance is <strong>not</strong> a sound strategy. Because poor performance
|
||||
is usually not easily reproducible (i.e., jitter) or an application issue, too
|
||||
many variables in the full system prevent this strategy from being effective. As
|
||||
a result, it's very easy to misidentify causes and make minor improvements while
|
||||
missing systemic opportunities for fixing performance across the system.</p>
|
||||
|
||||
<p>Instead, use the following general approach when bringing up a new
|
||||
device:</p>
|
||||
<ol>
|
||||
<li>Get the system booting to UI with all drivers running and some basic
|
||||
frequency governor settings (if you change the frequency governor settings,
|
||||
repeat all steps below).</li>
|
||||
<li>Ensure the kernel supports the <code>sched_blocked_reason</code> tracepoint
|
||||
as well as other tracepoints in the display pipeline that denote when the frame
|
||||
is delivered to the display.</li>
|
||||
<li>Take long traces of the entire UI pipeline (from receiving input via an IRQ
|
||||
to final scanout) while running a lightweight and consistent workload (e.g.,
|
||||
<a href="https://android.googlesource.com/platform/frameworks/base.git/+/master/tests/UiBench/">UiBench</a>
|
||||
or the ball test in <a href="#touchlatency">TouchLatency)</a>.</li>
|
||||
<li>Fix the frame drops detected in the lightweight and consistent
|
||||
workload.</li>
|
||||
<li>Repeat steps 3-4 until you can run with zero dropped frames for 20+ seconds
|
||||
at a time. </li>
|
||||
<li>Move on to other user-visible sources of jank.</li>
|
||||
</ol>
|
||||
|
||||
<p>Other simple things you can do early on in device bringup include:</p>
|
||||
|
||||
<ul>
|
||||
<li>Ensure your kernel has the
|
||||
<a href="https://android.googlesource.com/kernel/msm/+/c9f00aa0e25e397533c198a0fcf6246715f99a7b%5E!/">sched_blocked_reason
|
||||
tracepoint patch</a>. This tracepoint is enabled with the sched trace category
|
||||
in systrace and provides the function responsible for sleeping when that
|
||||
thread enters uninterruptible sleep. It is critical for performance analysis
|
||||
because uninterruptible sleep is a very common indicator of jitter.</li>
|
||||
<li>Ensure you have sufficient tracing for the GPU and display pipelines. On
|
||||
recent Qualcomm SOCs, tracepoints are enabled using:</li>
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb shell "echo 1 > /d/tracing/events/kgsl/enable"</code>
|
||||
<code class="devsite-terminal">adb shell "echo 1 > /d/tracing/events/mdss/enable"</code>
|
||||
</pre>
|
||||
|
||||
<p>These events remain enabled when you run systrace so you can see additional
|
||||
information in the trace about the display pipeline (MDSS) in the
|
||||
<code>mdss_fb0</code> section. On Qualcomm SOCs, you won't see any additional
|
||||
information about the GPU in the standard systrace view, but the results are
|
||||
present in the trace itself (for details, see
|
||||
<a href="/devices/tech/debug/systrace.html">Understanding
|
||||
systrace</a>).</p>
|
||||
|
||||
<p>What you want from this kind of display tracing is a single event that
|
||||
directly indicates a frame has been delivered to the display. From there, you
|
||||
can determine if you've hit your frame time successfully; if event X<em>n</em>
|
||||
occurs less than 16.7ms after event X<em>n-1</em> (assuming a 60Hz display),
|
||||
then you know you did not jank. If your SOC does not provide such signals, work
|
||||
with your vendor to get them. Debugging jitter is extremely difficult without a
|
||||
definitive signal of frame completion.</p></ul>
|
||||
|
||||
<h3 id="synthetic_benchmarks">Using synthetic benchmarks</h3>
|
||||
<p>Synthetic benchmarks are useful for ensuring a device's basic functionality
|
||||
is present. However, treating benchmarks as a proxy for perceived device
|
||||
performance is not useful.</p>
|
||||
|
||||
<p>Based on experiences with SOCs, differences in synthetic benchmark
|
||||
performance between SOCs is not correlated with a similar difference in
|
||||
perceptible UI performance (number of dropped frames, 99th percentile frame
|
||||
time, etc.). Synthetic benchmarks are capacity-only benchmarks; jitter impacts
|
||||
the measured performance of these benchmarks only by stealing time from the bulk
|
||||
operation of the benchmark. As a result, synthetic benchmark scores are mostly
|
||||
irrelevant as a metric of user-perceived performance.</p>
|
||||
|
||||
<p>Consider two SOCs running Benchmark X that renders 1000 frames of UI and
|
||||
reports the total rendering time (lower score is better).</p>
|
||||
|
||||
<ul>
|
||||
<li>SOC 1 renders each frame of Benchmark X in 10ms and scores 10,000.</li>
|
||||
<li>SOC 2 renders 99% of frames in 1ms but 1% of frames in 100ms and scores
|
||||
19,900, a dramatically better score.</li>
|
||||
</ul>
|
||||
|
||||
<p>If the benchmark is indicative of actual UI performance, SOC 2 would be
|
||||
unusable. Assuming a 60Hz refresh rate, SOC 2 would have a janky frame every
|
||||
1.5s of operation. Meanwhile, SOC 1 (the slower SOC according to Benchmark X)
|
||||
would be perfectly fluid.</p>
|
||||
|
||||
<h3 id="bug_reports">Using bug reports</h3>
|
||||
<p>Bug reports are sometimes useful for performance analysis, but because they
|
||||
are so heavyweight, they are rarely useful for debugging sporadic jank issues.
|
||||
They may provide some hints on what the system was doing at a given time,
|
||||
especially if the jank was around an application transition (which is logged in
|
||||
a bug report). Bug reports can also indicate when something is more broadly
|
||||
wrong with the system that could reduce its effective capacity (such as thermal
|
||||
throttling or memory fragmentation).</p>
|
||||
|
||||
<h3 id="touchlatency">Using TouchLatency</h3>
|
||||
<p>Several examples of bad behavior come from TouchLatency, which is the
|
||||
preferred periodic workload used for the Pixel and Pixel XL. It's available at
|
||||
<code>frameworks/base/tests/TouchLatency</code> and has two modes: touch latency
|
||||
and bouncing ball (to switch modes, click the button in the upper-right
|
||||
corner).</p>
|
||||
|
||||
<p>The bouncing ball test is exactly as simple as it appears: A ball bounces
|
||||
around the screen forever, regardless of user input. It is usually also
|
||||
<strong>by far</strong> the hardest test to run perfectly, but the closer it
|
||||
comes to running without any dropped frames, the better your device will be. The
|
||||
bouncing ball test is difficult because it is a trivial but perfectly consistent
|
||||
workload that runs at a very low clock (this assumes device has a frequency
|
||||
governor; if the device is instead running with fixed clocks, downclock the
|
||||
CPU/GPU to near-minimum when running the bouncing ball test for the first time).
|
||||
As the system quiesces and the clocks drop closer to idle, the required CPU/GPU
|
||||
time per frame increases. You can watch the ball and see things jank, and you'll
|
||||
be able to see missed frames in systrace as well.</p>
|
||||
|
||||
<p>Because the workload is so consistent, you can identify most sources of
|
||||
jitter much more easily than in most user-visible workloads by tracking what
|
||||
exactly is running on the system during each missed frame instead of the UI
|
||||
pipeline. <strong>The lower clocks amplify the effects of jitter by making it
|
||||
more likely that any jitter causes a dropped frame.</strong> As a result, the
|
||||
closer TouchLatency is to 60FPS, the less likely you are to have bad system
|
||||
behaviors that cause sporadic, hard-to-reproduce jank in larger
|
||||
applications.</p>
|
||||
|
||||
<p>As jitter is often (but not always) clockspeed-invariant, use a test that
|
||||
runs at very low clocks to diagnose jitter for the following reasons:</p>
|
||||
<ul>
|
||||
<li>Not all jitter is clockspeed-invariant; many sources just consume CPU
|
||||
time.</li>
|
||||
<li>The governor should get the average frame time close to the deadline by
|
||||
clocking down, so time spent running non-UI work can push it over the edge to
|
||||
dropping a frame.</li>
|
||||
</ul>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,360 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Using ftrace</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
<p>ftrace is a debugging tool for understanding what is going on inside the
|
||||
Linux kernel. The following sections detail basic ftrace functionality, ftrace
|
||||
usage with atrace (which captures kernel events), and dynamic ftrace.</p>
|
||||
|
||||
<p>For details on advanced ftrace functionality that is not available from
|
||||
systrace, refer to the ftrace documentation at
|
||||
<a href="https://www.kernel.org/doc/Documentation/trace/ftrace.txt"><code><kernel
|
||||
tree>/Documentation/trace/ftrace.txt</code></a>.</p>
|
||||
|
||||
<h2 id="atrace">Capturing kernel events with atrace</h2>
|
||||
<p>atrace (<code>frameworks/native/cmds/atrace</code>) uses ftrace to capture
|
||||
kernel events. In turn, systrace.py (or run_systrace.py in later versions of
|
||||
<a href="https://github.com/catapult-project/catapult">Catapult</a>) uses adb
|
||||
to run atrace on the device. atrace does the following:</p>
|
||||
<ul>
|
||||
<li>Sets up user-mode tracing by setting a property
|
||||
(<code>debug.atrace.tags.enableflags</code>).</li>
|
||||
<li>Enables the desired ftrace functionality by writing to the appropriate
|
||||
ftrace sysfs nodes. However, as ftrace supports more features, you might set
|
||||
some sysfs nodes yourself then use atrace. </li>
|
||||
</ul>
|
||||
|
||||
<p>With the exception of boot-time tracing, rely on using atrace to set the
|
||||
property to the appropriate value. The property is a bitmask and there's no good
|
||||
way to determine the correct values other than looking at the appropriate header
|
||||
(which could change between Android releases).</p>
|
||||
|
||||
<h2 id="enabling_events">Enabling ftrace events</h2>
|
||||
|
||||
<p>The ftrace sysfs nodes are in <code>/d/tracing</code> and trace events are
|
||||
divided into categories in <code>/d/tracing/events</code>.
|
||||
|
||||
<p>To enable events on a per-category basis, use:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 1 > /d/tracing/events/irq/enable
|
||||
</pre>
|
||||
|
||||
<p>To enable events on per-event basis, use:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 1 > /d/tracing/events/sched/sched_wakeup/enable
|
||||
</pre>
|
||||
|
||||
<p>If extra events have been enabled by writing to sysfs nodes, they will
|
||||
<strong>not</strong> be reset by atrace. A common pattern
|
||||
for Qualcomm device bringup is to enable <code>kgsl</code> (GPU) and
|
||||
<code>mdss</code> (display pipeline) tracepoints and then use atrace or
|
||||
<a href="/devices/tech/debug/systrace.html">systrace</a>:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb shell "echo 1 > /d/tracing/events/mdss/enable"</code>
|
||||
<code class="devsite-terminal">adb shell "echo 1 > /d/tracing/events/kgsl/enable"</code>
|
||||
<code class="devsite-terminal">./systrace.py sched freq idle am wm gfx view binder_driver irq workq ss sync -t 10 -b 96000 -o full_trace.html</code>
|
||||
</pre>
|
||||
|
||||
<p>You can also use ftrace without atrace or systrace, which is
|
||||
useful when you want kernel-only traces (or if you've taken the time to write
|
||||
the user-mode tracing property by hand). To run just ftrace:</p>
|
||||
|
||||
<ol>
|
||||
<li>Set the buffer size to a value large enough for your trace:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 96000 > /d/tracing/buffer_size_kb
|
||||
</pre>
|
||||
</li>
|
||||
<li>Enable tracing:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 1 > /d/tracing/tracing_on
|
||||
</pre>
|
||||
</li>
|
||||
<li>Run your test, then disable tracing:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 0 > /d/tracing/tracing_on
|
||||
</pre>
|
||||
</li>
|
||||
<li>Dump the trace:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
cat /d/tracing/trace > /data/local/tmp/trace_output
|
||||
</pre>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>The trace_output gives the trace in text form. To visualize it using
|
||||
Catapult, get the
|
||||
<a href="https://github.com/catapult-project/catapult/tree/master/">Catapult
|
||||
repository</a> from Github and run trace2html:</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
catapult/tracing/bin/trace2html ~/path/to/trace_file
|
||||
</pre>
|
||||
|
||||
<p>By default, this writes <code>trace_file.html</code> in the same
|
||||
directory.</p>
|
||||
|
||||
<h2 id="correlate">Correlating events</h2>
|
||||
<p>It is often useful to look at the Catapult visualization and the ftrace
|
||||
log simultaneously; for example, some ftrace events (especially vendor-specific
|
||||
ones) are not visualized by Catapult. However, Catapult's timestamps are
|
||||
relative either to the first event in the trace or to a specific timestamp
|
||||
dumped by atrace, while the raw ftrace timestamps are based on a particular
|
||||
absolute clock source in the Linux kernel.</p>
|
||||
|
||||
<p>To find a given ftrace event from a Catapult event:</p>
|
||||
|
||||
<ol>
|
||||
<li>Open the raw ftrace log. Traces in recent versions of systrace are
|
||||
compressed by default:
|
||||
<ul>
|
||||
<li>If you captured your systrace with <code>--no-compress</code>, this is in
|
||||
the html file in the section beginning with BEGIN TRACE.</li>
|
||||
<li>If not, run html2trace from the
|
||||
<a href="https://github.com/catapult-project/catapult/tree/master/">Catapult
|
||||
tree</a> (<code>tracing/bin/html2trace</code>) to uncompress the trace.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Find the relative timestamp in the Catapult visualization.</li>
|
||||
|
||||
<li>Find a line at the beginning of the trace containing
|
||||
<code>tracing_mark_sync</code>. It should look something like this:
|
||||
<pre class="devsite-click-to-copy">
|
||||
<5134>-5134 (-----) [003] ...1 68.104349: tracing_mark_write: trace_event_clock_sync: parent_ts=68.104286
|
||||
</pre>
|
||||
|
||||
<br>If this line does not exist (or if you used ftrace without atrace), then
|
||||
timings will be relative from the first event in the ftrace log.
|
||||
<ol style="list-style-type: lower-alpha">
|
||||
<li>Add the relative timestamp (in milliseconds) to the value in
|
||||
<code>parent_ts</code> (in seconds).</li>
|
||||
<li>Search for the new timestamp.</li>
|
||||
</ol>
|
||||
</li>
|
||||
</ol>
|
||||
<p>These steps should put you at (or at least very close to) the event.</p>
|
||||
|
||||
<h2 id="dftrace">Using dynamic ftrace</h2>
|
||||
<p>When systrace and standard ftrace are insufficient, there is one last
|
||||
recourse available: <em>dynamic ftrace</em>. Dynamic ftrace involves rewriting
|
||||
of kernel code after boot, and as a result it is not available in production
|
||||
kernels for security reasons. However, every single difficult performance bug in
|
||||
2015 and 2016 was ultimately root-caused using dynamic ftrace. It is especially
|
||||
powerful for debugging uninterruptible sleeps because you can get a stack trace
|
||||
in the kernel every time you hit the function triggering uninterruptible sleep.
|
||||
You can also debug sections with interrupts and preemptions disabled, which can
|
||||
be very useful for proving issues.</p>
|
||||
|
||||
<p>To turn on dynamic ftrace, edit your kernel's defconfig:</p>
|
||||
|
||||
<ol>
|
||||
<li>Remove CONFIG_STRICT_MEMORY_RWX (if it's present). If you're on 3.18 or
|
||||
newer and arm64, it's not there.</li>
|
||||
<li>Add the following: CONFIG_DYNAMIC_FTRACE=y, CONFIG_FUNCTION_TRACER=y,
|
||||
CONFIG_IRQSOFF_TRACER=y, CONFIG_FUNCTION_PROFILER=y, and CONFIG_PREEMPT_TRACER=y
|
||||
</li>
|
||||
<li>Rebuild and boot the new kernel.</li>
|
||||
<li>Run the following to check for available tracers:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
cat /d/tracing/available_tracers
|
||||
</pre>
|
||||
</li>
|
||||
<li>Confirm the command returns <code>function</code>, <code>irqsoff</code>,
|
||||
<code>preemptoff</code>, and <code>preemptirqsoff</code>.</li>
|
||||
<li>Run the following to ensure dynamic ftrace is working:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
cat /d/tracing/available_filter_functions | grep <a function you care about>
|
||||
</pre>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>After completing these steps, you have dynamic ftrace, the function profiler,
|
||||
the irqsoff profiler, and the preemptoff profiler available. We <strong>strongly
|
||||
recommend</strong> reading ftrace documentation on these topics before using
|
||||
them as they are powerful but complex. irqsoff and preemptoff are primarily
|
||||
useful for confirming that drivers may be leaving interrupts or preemption
|
||||
turned off for too long.</p>
|
||||
<p>The function profiler is the best option for performance issues and is often
|
||||
used to find out where a function is being called.</p>
|
||||
|
||||
<section class="expandable">
|
||||
<h4 class="showalways">Show Issue: HDR photo + rotating viewfinder</h4>
|
||||
|
||||
<p>In this issue, using a Pixel XL to take an HDR+ photo then immediately
|
||||
rotating the viewfinder caused jank every time. We used the function profiler to
|
||||
debug the issue in less than one hour. To follow along with the example,
|
||||
<a href="perf_traces.zip">download the zip file</a> of traces (which also
|
||||
includes other traces referred to in this section), unzip the file, and open the
|
||||
trace_30898724.html file in your browser.</p>
|
||||
|
||||
<p>The trace shows several threads in the cameraserver process blocked in
|
||||
uninterruptible sleep on <code>ion_client_destroy</code>. That's an expensive
|
||||
function, but it should be called very infrequently because ion clients should
|
||||
encompass many allocations. Initially, the blame fell on the Hexagon code in
|
||||
Halide, which was indeed one of the culprits (it created a new client for every
|
||||
ion allocation and destroyed that client when the allocation was freed, which
|
||||
was way too expensive). Moving to a single ion client for all Hexagon
|
||||
allocations improved the situation, but the jank wasn't fixed.</p>
|
||||
<p>At this point we need to know who is calling <code>ion_client_destroy</code>,
|
||||
so it's time to use the function profiler:</p>
|
||||
<p></p>
|
||||
<ol>
|
||||
<li>As functions are sometimes renamed by the compiler, confirm
|
||||
<code>ion_client_destroy</code> is there by using:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
cat /d/tracing/available_filter_functions | grep ion_client_destroy
|
||||
</pre>
|
||||
</li>
|
||||
<li>After confirming it is there, use it as the ftrace filter:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo ion_client_destroy > /d/tracing/set_ftrace_filter
|
||||
</pre>
|
||||
</li>
|
||||
<li>Turn on the function profiler:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo function > /d/tracing/current_tracer
|
||||
</pre>
|
||||
</li>
|
||||
<li>Turn on stack traces whenever a filter function is called:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo func_stack_trace > /d/tracing/trace_options
|
||||
</pre>
|
||||
</li>
|
||||
<li>Increase the buffer size:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 64000 > /d/tracing/buffer_size_kb
|
||||
</pre>
|
||||
</li>
|
||||
<li>Turn on tracing:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 1 > /d/tracing/trace_on
|
||||
</pre>
|
||||
</li>
|
||||
<li>Run the test and get the trace:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
cat /d/tracing/trace > /data/local/tmp/trace
|
||||
</pre>
|
||||
</li>
|
||||
<li>View the trace to see lots and lots of stack traces:
|
||||
<pre class="devsite-click-to-copy">
|
||||
cameraserver-643 [003] ...1 94.192991: ion_client_destroy <-ion_release
|
||||
cameraserver-643 [003] ...1 94.192997: <stack trace>
|
||||
=> ftrace_ops_no_ops
|
||||
=> ftrace_graph_call
|
||||
=> ion_client_destroy
|
||||
=> ion_release
|
||||
=> __fput
|
||||
=> ____fput
|
||||
=> task_work_run
|
||||
=> do_notify_resume
|
||||
=> work_pending
|
||||
</pre>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>Based on inspection of the ion driver, we can see that
|
||||
<code>ion_client_destroy</code> is being spammed by a userspace function closing
|
||||
an fd to <code>/dev/ion</code>, not a random kernel driver. By searching the
|
||||
Android codebase for <code>\"/dev/ion\"</code>, we find several vendor drivers
|
||||
doing the same thing as the Hexagon driver and opening/closing
|
||||
<code>/dev/ion</code> (creating and destroying a new ion client) every time they
|
||||
need a new ion allocation. Changing those to
|
||||
<a href="https://android.googlesource.com/platform/hardware/qcom/camera/+/8f7984018b6643f430c229725a58d3c6bb04acab">use
|
||||
a single ion client</a> for the lifetime of the process fixed the bug.</p>
|
||||
</section>
|
||||
<hr>
|
||||
|
||||
<p>If the data from function profiler isn't specific enough, you can combine
|
||||
ftrace tracepoints with the function profiler. ftrace events can be enabled in
|
||||
exactly the same way as usual, and they will be interleaved with your trace.
|
||||
This is great if there's an occasional long uninterruptible sleep in a specific
|
||||
function you want to debug: set the ftrace filter to the function you want,
|
||||
enable tracepoints, take a trace. You can parse the resulting trace with
|
||||
<code>trace2html</code>, find the event you want, then get nearby stack traces
|
||||
in the raw trace.</p>
|
||||
|
||||
<h2 id="lock_stat">Using lockstat</h2>
|
||||
<p>Sometimes, ftrace isn't enough and you really need to debug what appears to
|
||||
be kernel lock contention. There is one more kernel option worth trying:
|
||||
<code>CONFIG_LOCK_STAT</code>. This is a last resort as it is extremely
|
||||
difficult to get working on Android devices because it inflates the size of the
|
||||
kernel beyond what most devices can handle.</p>
|
||||
<p>However, lockstat uses the debug
|
||||
locking infrastructure, which is useful for many other applications. Everyone
|
||||
working on device bringup should figure out some way to get that option working
|
||||
on every device because there <strong>will</strong> be a time when you think
|
||||
"If only I could turn on <code>LOCK_STAT</code>, I could confirm or refute this
|
||||
as the problem in five minutes instead of five days."</p>
|
||||
|
||||
<section class="expandable">
|
||||
<h4 class="showalways">Show Issue: Stall in SCHED_FIFO when cores at max load
|
||||
with non-SCHED_FIFO</h4>
|
||||
|
||||
<p>In this issue, the SCHED_FIFO thread stalled when all cores were at maximum
|
||||
load with non-SCHED_FIFO threads. We had traces showing significant lock
|
||||
contention on an fd in VR apps, but we couldn't easily identify the fd in use.
|
||||
To follow along with the example, <a href="perf_traces.zip">download the zip
|
||||
file</a> of traces (which also includes other traces referred to in this
|
||||
section), unzip the file, and open the trace_30905547.html file in your browser.
|
||||
</p>
|
||||
|
||||
<p>We hypothesized that ftrace itself was the source of lock contention, when a
|
||||
low priority thread would start writing to the ftrace pipe and then get
|
||||
preempted before it could release the lock. This is a worst-case scenario that
|
||||
was exacerbated by a mixture of extremely low-priority threads writing to the
|
||||
ftrace marker along with some higher priority threads spinning on CPUs to
|
||||
simulate a completely loaded device.</p>
|
||||
<p>As we couldn't use ftrace to debug, we got <code>LOCK_STAT</code> working
|
||||
then turned off all other tracing from the app. The results showed the lock
|
||||
contention was actually from ftrace because none of the contention showed up in
|
||||
the lock trace when ftrace was not running.</p>
|
||||
</section>
|
||||
<hr>
|
||||
|
||||
<p>If you can boot a kernel with the config option, lock tracing is similar to
|
||||
ftrace:</p>
|
||||
<ol>
|
||||
<li>Enable tracing:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 1 > /proc/sys/kernel/lock_stat
|
||||
</pre>
|
||||
</li>
|
||||
<li>Run your test.</li>
|
||||
<li>Disable tracing:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
echo 0 > /proc/sys/kernel/lock_stat
|
||||
</pre>
|
||||
</li>
|
||||
<li>Dump your trace:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
cat /proc/lock_stat > /data/local/tmp/lock_stat
|
||||
</pre>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>For help interpreting the resulting output, refer to lockstat documentation
|
||||
at <a href="https://www.kernel.org/doc/Documentation/locking/lockstat.txt"><code><kernel>/Documentation/locking/lockstat.txt</code></a>.</p>
|
||||
|
||||
</body>
|
||||
</html>
|
141
android/docs/source.android.com/en/devices/tech/debug/gdb.html
Normal file
|
@ -0,0 +1,141 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Using GDB</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
<p>The GNU Project debugger (GDB) is a commonly used Unix debugger. This page
|
||||
details using <code>gdb</code> to debug Android apps and processes.</p>
|
||||
|
||||
<h2 id=running>Debugging running apps or processes</h2>
|
||||
|
||||
<p>To connect to an already-running app or native daemon, use
|
||||
<code>gdbclient</code> with a PID. For example, to debug the process with PID
|
||||
1234, run:</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
gdbclient 1234
|
||||
</pre>
|
||||
|
||||
<p>The script sets up port forwarding, starts the appropriate
|
||||
<code>gdbserver</code> on the device, starts the appropriate <code>gdb</code> on
|
||||
the host, configures <code>gdb</code> to find symbols, and connects
|
||||
<code>gdb</code> to the remote <code>gdbserver</code>.</p>
|
||||
|
||||
<h2 id=starts>Debugging native process startup</h2>
|
||||
|
||||
<p>To debug a process as it starts, use <code>gdbserver</code> or
|
||||
<code>gdbserver64</code> (for 64-bit processes). For example:</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
adb shell gdbserver :5039 /system/bin/<var>MY_TEST_APP</var>
|
||||
</pre>
|
||||
|
||||
<p>Example output:</p>
|
||||
<pre class="devsite-click-to-copy">
|
||||
Process <var>MY_TEST_APP</var> created; pid = 3460
|
||||
Listening on port 5039
|
||||
</pre>
|
||||
|
||||
<p>Next, identify the application PID from the <code>gdbserver</code> output and
|
||||
use it in another terminal window:</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
gdbclient <var>APP_PID</var>
|
||||
</pre>
|
||||
|
||||
<p>Finally, enter <strong>continue</strong> at the <code>gdb</code> prompt.</p>
|
||||
|
||||
<p class="note"><strong>Note:</strong> If you specify the wrong
|
||||
<code>gdbserver</code>, you'll get an unhelpful error message (such as
|
||||
"<code>Reply contains invalid hex digit 59</code>").</p>
|
||||
|
||||
<h2 id="app-startup">Debugging app startup</h2>
|
||||
|
||||
<p>Sometimes you want to debug an app as it starts, such as when there's a crash
|
||||
and you want to step through code to see what happens <em>before</em> the crash.
|
||||
<a href="#running">Attaching</a> works in some cases, but in other cases is
|
||||
impossible because the app crashes before you can attach. The
|
||||
<code>logwrapper</code> approach (used for <code>strace</code> and
|
||||
<code>valgrind</code>) does not always work because the app might not have
|
||||
permissions to open a port, and <code>gdbserver</code> inherits that
|
||||
restriction.</p>
|
||||
|
||||
<p>To debug app startup, use the developer options in Settings to instruct
|
||||
the app to wait for a Java debugger to attach:</p>
|
||||
|
||||
<ol>
|
||||
<li>Go to <em>Settings > Developer options > Select debug app</em> and choose
|
||||
your app from the list, then press <strong>Wait for debugger</strong>.</li>
|
||||
|
||||
<li>Start the app, either from the launcher or by using the command line to run:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
am start -a android.intent.action.MAIN -n <var>APP_NAME</var>/.<var>APP_ACTIVITY</var>
|
||||
</pre></li>
|
||||
|
||||
<li>Wait for the app to load and a dialog to appear telling you the app is
|
||||
waiting for a debugger.</li>
|
||||
|
||||
<li>Attach <code>gdbserver</code>/<code>gdbclient</code> normally, set
|
||||
breakpoints, then continue the process.</li></ol>
|
||||
|
||||
<p>To let the app actually run, attach a Java Debug Wire Protocol (JDWP)
|
||||
debugger such as Java Debugger (jdb):</p>
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb forward tcp:12345 jdwp:<var>XXX</var> # (Where XXX is the pid of the debugged process.)</code>
|
||||
<code class="devsite-terminal">jdb -attach localhost:12345</code>
|
||||
</pre>
|
||||
|
||||
<h2 id=crash>Debugging apps or processes that crash</h2>
|
||||
|
||||
<p>If you want <code>debuggerd</code> to suspend crashed processes so you can
|
||||
attach <code>gdb</code>, set the appropriate property:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
# Android 7.0 Nougat and later.
|
||||
<code class="devsite-terminal">adb shell setprop debug.debuggerd.wait_for_gdb true</code>
|
||||
</pre>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
# Android 6.0 Marshmallow and earlier.
|
||||
<code class="devsite-terminal">adb shell setprop debug.db.uid 999999</code>
|
||||
</pre>
|
||||
|
||||
<p>At the end of the usual crash output, <code>debuggerd</code> provides
|
||||
instructions on how to connect <code>gdb</code> using the command:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
gdbclient <var>PID</var>
|
||||
</pre>
|
||||
|
||||
|
||||
<h2 id=symbols>Debugging without symbols</h2>
|
||||
|
||||
<p>For 32-bit ARM, if you don’t have symbols, <code>gdb</code> can get confused
|
||||
about the instruction set it is disassembling (ARM or Thumb). To specify the
|
||||
instruction set chosen as the default when symbol information is missing, set
|
||||
the following property:</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
set arm fallback-mode arm # or thumb
|
||||
</pre>
|
||||
|
||||
</body>
|
||||
</html>
|
After Width: | Height: | Size: 64 KiB |
After Width: | Height: | Size: 37 KiB |
After Width: | Height: | Size: 57 KiB |
After Width: | Height: | Size: 35 KiB |
After Width: | Height: | Size: 54 KiB |
After Width: | Height: | Size: 29 KiB |
After Width: | Height: | Size: 43 KiB |
After Width: | Height: | Size: 53 KiB |
After Width: | Height: | Size: 52 KiB |
After Width: | Height: | Size: 46 KiB |
After Width: | Height: | Size: 55 KiB |
After Width: | Height: | Size: 54 KiB |
After Width: | Height: | Size: 79 KiB |
After Width: | Height: | Size: 36 KiB |
After Width: | Height: | Size: 59 KiB |
After Width: | Height: | Size: 32 KiB |
170
android/docs/source.android.com/en/devices/tech/debug/index.html
Normal file
|
@ -0,0 +1,170 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Debugging Native Android Platform Code</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
|
||||
<p>This section summarizes useful tools and related commands for debugging,
|
||||
tracing, and profiling native Android platform code when developing
|
||||
platform-level features.</p>
|
||||
|
||||
<p>This page covers use of <code>debuggerd</code>, a daemon process for
|
||||
collecting error information after applications crash. Other pages in this
|
||||
section explore system services with
|
||||
<a href="https://developer.android.com/studio/command-line/dumpsys.html">Dumpsys</a>, viewing
|
||||
<a href="/devices/tech/debug/native-memory.html">native memory</a>,
|
||||
<a href="https://developer.android.com/studio/command-line/dumpsys.html#network">network</a>, and
|
||||
<a href="https://developer.android.com/studio/command-line/dumpsys.html#procstats">RAM</a> usage, using
|
||||
<a href="/devices/tech/debug/asan.html">AddressSanitizer</a> to detect memory
|
||||
bugs in native code, evaluating
|
||||
<a href="/devices/tech/debug/eval_perf.html"> performance issues</a> (includes
|
||||
<a href="/devices/tech/debug/systrace">systrace</a>), and using
|
||||
<a href="/devices/tech/debug/gdb.html">GNU Project debugger (GDB)</a> and
|
||||
other debugging tools.</p>
|
||||
|
||||
<h2 id=debuggerd>Using debuggerd</h2>
|
||||
|
||||
<p>The <code>debuggerd</code> process dumps registers and unwinds the stack.
|
||||
When a dynamically linked executable starts, several signal handlers are
|
||||
registered that connect to <code>debuggerd</code> (or <code>debuggerd64)</code>
|
||||
in the event that signals (such as SIGSEGV or SIGABRT) are sent to the process.</p>
|
||||
|
||||
<p>It's possible for <code>debuggerd</code> to attach only if nothing else is
|
||||
already attached, which means using tools such as <code>strace</code> or
|
||||
<code>gdb</code> will prevent <code>debuggerd</code> from working. You can also
|
||||
explicitly prevent <code>debuggerd</code> from attaching by calling
|
||||
<code>prctl(PR_SET_DUMPABLE, 0)</code>, which can be useful when you need to
|
||||
opt out of crash reporting.</p>
|
||||
|
||||
<p>Example <code>debuggerd</code> output (with timestamps and extraneous
|
||||
information removed):</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
|
||||
Build fingerprint: 'Android/aosp_angler/angler:7.1.1/NYC/enh12211018:eng/test-keys'
|
||||
Revision: '0'
|
||||
ABI: 'arm'
|
||||
pid: 17946, tid: 17949, name: crasher >>> crasher <<<
|
||||
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
|
||||
r0 0000000c r1 00000000 r2 00000000 r3 00000000
|
||||
r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
|
||||
r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
|
||||
ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
|
||||
|
||||
backtrace:
|
||||
#00 pc 0004793e /system/lib/libc.so (pthread_mutex_lock+1)
|
||||
#01 pc 0001aa1b /system/lib/libc.so (readdir+10)
|
||||
#02 pc 00001b91 /system/xbin/crasher (readdir_null+20)
|
||||
#03 pc 0000184b /system/xbin/crasher (do_action+978)
|
||||
#04 pc 00001459 /system/xbin/crasher (thread_callback+24)
|
||||
#05 pc 00047317 /system/lib/libc.so (_ZL15__pthread_startPv+22)
|
||||
#06 pc 0001a7e5 /system/lib/libc.so (__start_thread+34)
|
||||
Tombstone written to: /data/tombstones/tombstone_06
|
||||
</pre>
|
||||
|
||||
<p>The last line of <code>debuggerd</code> output dumps a summary to the log and
|
||||
writes a full <em>tombstone</em> to disk. The tombstone is simply a file with
|
||||
extra data about the crashed process; it contains information that can be
|
||||
helpful in debugging a crash, in particular the stack traces for all threads in
|
||||
the crashing process (not just the thread that caught the signal) and a full
|
||||
memory map.</p>
|
||||
|
||||
<p>Assuming the unstripped binaries can be found, you can get a more detailed
|
||||
unwind with line number information by pasting the above example into
|
||||
<code>development/scripts/stack</code>:</p>
|
||||
|
||||
<p class="key-point"><strong>Tip:</strong> For convenience, if you've lunched
|
||||
<code>stack</code> will be on your $PATH already so you don't need to give the
|
||||
full path.</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
development/tools/stack
|
||||
</pre>
|
||||
|
||||
<p>Example output:</p>
|
||||
<pre class="devsite-click-to-copy">
|
||||
Reading native crash info from stdin
|
||||
03-02 23:53:49.477 17951 17951 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
|
||||
03-02 23:53:49.477 17951 17951 F DEBUG : Build fingerprint: 'Android/aosp_angler/angler:7.1.1/NYC/enh12211018:eng/test-keys'
|
||||
03-02 23:53:49.477 17951 17951 F DEBUG : Revision: '0'
|
||||
03-02 23:53:49.477 17951 17951 F DEBUG : ABI: 'arm'
|
||||
03-02 23:53:49.478 17951 17951 F DEBUG : pid: 17946, tid: 17949, name: crasher >>> crasher <<<
|
||||
03-02 23:53:49.478 17951 17951 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
|
||||
03-02 23:53:49.478 17951 17951 F DEBUG : r0 0000000c r1 00000000 r2 00000000 r3 00000000
|
||||
03-02 23:53:49.478 17951 17951 F DEBUG : r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
|
||||
03-02 23:53:49.478 17951 17951 F DEBUG : r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
|
||||
03-02 23:53:49.478 17951 17951 F DEBUG : ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
|
||||
03-02 23:53:49.491 17951 17951 F DEBUG :
|
||||
03-02 23:53:49.491 17951 17951 F DEBUG : backtrace:
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : #00 pc 0004793e /system/lib/libc.so (pthread_mutex_lock+1)
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : #01 pc 0001aa1b /system/lib/libc.so (readdir+10)
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : #02 pc 00001b91 /system/xbin/crasher (readdir_null+20)
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : #03 pc 0000184b /system/xbin/crasher (do_action+978)
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : #04 pc 00001459 /system/xbin/crasher (thread_callback+24)
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : #05 pc 00047317 /system/lib/libc.so (_ZL15__pthread_startPv+22)
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : #06 pc 0001a7e5 /system/lib/libc.so (__start_thread+34)
|
||||
03-02 23:53:49.492 17951 17951 F DEBUG : Tombstone written to: /data/tombstones/tombstone_06
|
||||
Reading symbols from /huge-ssd/aosp-arm64/out/target/product/angler/symbols
|
||||
Revision: '0'
|
||||
pid: 17946, tid: 17949, name: crasher >>> crasher <<<
|
||||
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xc
|
||||
r0 0000000c r1 00000000 r2 00000000 r3 00000000
|
||||
r4 00000000 r5 0000000c r6 eccdd920 r7 00000078
|
||||
r8 0000461a r9 ffc78c19 sl ab209441 fp fffff924
|
||||
ip ed01b834 sp eccdd800 lr ecfa9a1f pc ecfd693e cpsr 600e0030
|
||||
Using arm toolchain from: /huge-ssd/aosp-arm64/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/bin/
|
||||
|
||||
Stack Trace:
|
||||
RELADDR FUNCTION FILE:LINE
|
||||
0004793e pthread_mutex_lock+2 bionic/libc/bionic/pthread_mutex.cpp:515
|
||||
v------> ScopedPthreadMutexLocker bionic/libc/private/ScopedPthreadMutexLocker.h:27
|
||||
0001aa1b readdir+10 bionic/libc/bionic/dirent.cpp:120
|
||||
00001b91 readdir_null+20 system/core/debuggerd/crasher.cpp:131
|
||||
0000184b do_action+978 system/core/debuggerd/crasher.cpp:228
|
||||
00001459 thread_callback+24 system/core/debuggerd/crasher.cpp:90
|
||||
00047317 __pthread_start(void*)+22 bionic/libc/bionic/pthread_create.cpp:202 (discriminator 1)
|
||||
0001a7e5 __start_thread+34 bionic/libc/bionic/clone.cpp:46 (discriminator 1)
|
||||
</pre>
|
||||
|
||||
<p class="note"><strong>Note:</strong> Some system libraries are built with
|
||||
<code>LOCAL_STRIP_MODULE := keep_symbols</code> to provide usable backtraces
|
||||
directly from <code>debuggerd</code> without taking up anywhere near as much
|
||||
space as an unstripped version.</p>
|
||||
|
||||
<p>You can also <code>stack</code> an entire tombstone. Example:</p>
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
stack < FS/data/tombstones/tombstone_05</code>
|
||||
</pre>
|
||||
<p>This is useful if you've just unzipped a bugreport in the current directory.
|
||||
For more information about diagnosing native crashes and tombstones, see
|
||||
<a href="/devices/tech/debug/native-crash.html">Diagnosing Native Crashes</a>.
|
||||
</p>
|
||||
|
||||
<h2 id="tombstone">Getting a stack trace/tombstone from a running process</h2>
|
||||
|
||||
<p>You can also use <code>debuggerd</code> on a running process. From the
|
||||
command line, invoke <code>debuggerd</code> using a process ID (PID) to dump the
|
||||
full tombstone to <code>stdout</code>. To get just the stack for every thread in
|
||||
the process, include the <code>-b</code> or <code>--backtrace</code> flag.</p>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,127 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Identifying Capacity-Related Jank</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
<p>Capacity is the total amount of some resource (CPU, GPU, etc.) a device
|
||||
possesses over some amount of time. This page describes how to identify and
|
||||
address capacity-related jank issues.</p>
|
||||
|
||||
<h2 id="governor">Governor slow to react</h2>
|
||||
<p>To avoid jank, the CPU frequency governor needs to be able to respond quickly
|
||||
to bursty workloads. Most UI applications follow the same basic pattern:</p>
|
||||
<ol>
|
||||
<li>User is reading the screen.</li>
|
||||
<li>User touches the screen: taps a button, scrolls, etc.</li>
|
||||
<li>Screen scrolls, changes activity, or animates in some way in response to
|
||||
input.</li>
|
||||
<li>System quiesces as new content is displayed.</li>
|
||||
<li>User goes back to reading the screen.</li>
|
||||
</ol>
|
||||
|
||||
<p>Pixel and Nexus devices implement touch boost to modify CPU frequency
|
||||
governor (and scheduler) behavior on touch. To avoid a slow ramp to a high clock
|
||||
frequency (which could cause the device to drop frames on touch), touch boost
|
||||
usually sets a frequency floor on the CPU to ensure plenty of CPU capacity is
|
||||
available on touch. A floor lasts for some amount of time after touch (usually
|
||||
around two seconds).</p>
|
||||
|
||||
<p>Pixel also uses the schedtune cgroup provided by Energy Aware Scheduling
|
||||
(EAS) as an additional touch boost signal: Top applications get additional
|
||||
weight via schedtune to ensure they get enough CPU capacity to run quickly. The
|
||||
Nexus 5X and 6P have a much bigger performance gap between the little and big
|
||||
CPU clusters (A53 and A57, respectively) than the Pixel with the Kryo CPU. We
|
||||
found that the little CPU cluster was not always adequate for smooth UI
|
||||
rendering, especially given other sources of jitter on the device.</p>
|
||||
|
||||
<p>Accordingly, on the Nexus 5X and 6P, touch boost modifies the scheduler
|
||||
behavior to make it more likely for foreground applications to move to the big
|
||||
cores (this is conceptually similar to the floor on CPU frequency). Without the
|
||||
scheduler change to make foreground applications more likely to move to the big
|
||||
CPU cluster, foreground applications may have insufficient CPU capacity to
|
||||
render until the scheduler decided to load balance the thread to a big CPU core.
|
||||
By changing the scheduler behavior during touch boost, it is more likely for a
|
||||
UI thread to immediately run on a big core and avoid jank while not forcing it
|
||||
to always run on a big core, which could have severe impacts on power
|
||||
consumption.</p>
|
||||
|
||||
<h2 id="thermal-throttling">Thermal throttling</h2>
|
||||
<p>Thermal throttling occurs when the device must reduce its overall thermal
|
||||
output, usually performed by reducing CPU, GPU, and DRAM clocks. Unsurprisingly,
|
||||
this often results in jank as the system may no longer be able to provide enough
|
||||
capacity to render within a given timeslice. The only way to avoid thermal
|
||||
throttling is to use less power. There are not a lot of ways to do this, but
|
||||
based on our experiences with past SOCs, we have a few recommendations for
|
||||
system vendors.</p>
|
||||
|
||||
<p>First, when building a new SOC with heterogeneous CPU architectures, ensure
|
||||
the performance/W curves of the CPU clusters overlap. The overall performance/W
|
||||
curve for the entire processor should be a continuous line. Discontinuities in
|
||||
the perf/W curve force the scheduler and frequency governor to guess what a
|
||||
workload needs; to prevent jank, the scheduler and frequency governor err on
|
||||
the side of giving the workload more capacity than it requires. This results in
|
||||
spending too much power, which contributes to thermal throttling.</p>
|
||||
|
||||
<p>Imagine a hypothetical SOC with two CPU clusters:</p>
|
||||
<ul>
|
||||
<li>Cluster 1, the little cluster, can spend between 100-300mW and scores
|
||||
100-300 in a throughput benchmark depending on clocks.</li>
|
||||
<li>Cluster 2, the big cluster, can spend between 1000 and 1600mW and scores
|
||||
between 800 and 1200 in the same throughput benchmark depending on clocks.</li>
|
||||
</ul>
|
||||
<p>In this benchmark, a higher score is faster. While not more desirable than
|
||||
slower, faster == greater power consumption.</p>
|
||||
|
||||
<p>If the scheduler believes a UI workload would require the equivalent of a
|
||||
score of 310 on that throughput benchmark, its best option to avoid jank is to
|
||||
run the big cluster at the lowest frequency, wasting significant power. (This
|
||||
depends on cpuidle behavior and race to idle; SOCs with continuous perf/W curves
|
||||
are easier to optimize for.)</p>
|
||||
|
||||
<p>Second, use cpusets. Ensure you have enabled cpusets in your kernel and in
|
||||
your <code>BoardConfig.mk</code>. You must also set up the actual cpuset
|
||||
assignments in your device-specific <code>init.rc</code>. Some vendors leave
|
||||
this disabled in their BSPs in the hopes they can use other hints to influence
|
||||
scheduler behavior; we feel this doesn't make sense. cpusets are useful for
|
||||
ensuring load balancing between CPUs is done in a way that reflects what the
|
||||
user is actually doing on the device.</p>
|
||||
|
||||
<p>ActivityManager assigns apps to different cpusets based on the relative
|
||||
importance of those apps (top, foreground, background), with more important
|
||||
applications getting more access to CPU cores. This helps ensure quality of
|
||||
service for foreground and top apps.</p>
|
||||
|
||||
<p>cpusets are useful on homogeneous CPU configurations, but you should not ship
|
||||
a device with a heterogeneous CPU configuration without cpusets enabled. Nexus
|
||||
6P is a good model for how to use cpusets on heterogeneous CPU configurations;
|
||||
use that as a basis for your own device's configuration.</p>
|
||||
|
||||
<p>cpusets also offer power advantages by ensuring background threads that are
|
||||
not performance-critical never get load balanced to big CPU cores, where they
|
||||
could spend significantly more power without any user-perceived benefit. This
|
||||
can help to avoid thermal throttling as well. While thermal throttling is a
|
||||
capacity issue, jitter improvements have an outsize impact on UI performance
|
||||
when thermal throttling. Because the system will be running closer to its
|
||||
ability to render 60FPS, it takes less jitter to cause a dropped frame.</p>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,430 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Identifying Jitter-Related Jank</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<p>Jitter is the random system behavior that prevents perceptible work from
|
||||
running. This page describes how to identify and address jitter-related jank
|
||||
issues.</p>
|
||||
|
||||
<h2 id="application">Application thread scheduler delay</h2>
|
||||
<p>Scheduler delay is the most obvious symptom of jitter: A process that should
|
||||
be run is made runnable but does not run for some significant amount of time.
|
||||
The significance of the delay varies according to the context. For example:</p>
|
||||
<ul>
|
||||
<li>A random helper thread in an app can probably be delayed for many
|
||||
milliseconds without an issue.</li>
|
||||
<li>An application's UI thread may be able to tolerate 1-2ms of jitter.</li>
|
||||
<li>Driver kthreads running as SCHED_FIFO may cause issues if they are runnable
|
||||
for 500us before running.</li>
|
||||
</ul>
|
||||
<p>Runnable times can be identified in systrace by the blue bar preceding a
|
||||
running segment of a thread. A runnable time can also be determined by the
|
||||
length of time between the <code>sched_wakeup</code> event for a thread and the
|
||||
<code>sched_switch</code> event that signals the start of thread execution.</p>
|
||||
|
||||
<h3 id="long_threads">Threads that run too long</h3>
|
||||
<p>Application UI threads that are runnable for too long can cause problems.
|
||||
Lower-level threads with long runnable times generally have different causes,
|
||||
but attempting to push UI thread runnable time toward zero may require fixing
|
||||
some of the same issues that cause lower level threads to have long runnable
|
||||
times. To mitigate delays:</p>
|
||||
<ol>
|
||||
<li>Use cpusets as described in
|
||||
<a href="/devices/tech/debug/jank_capacity.html#thermal-throttling">Thermal
|
||||
throttling</a>.</li>
|
||||
<li>Increase the CONFIG_HZ value.<ul>
|
||||
<li>Historically, the value has been set to 100 on arm and arm64 platforms.
|
||||
However, this is an accident of history and is not a good value to use for
|
||||
interactive devices. CONFIG_HZ=100 means that a jiffy is 10ms long, which means
|
||||
that load balancing between CPUs may take 20ms (two jiffies) to happen. This can
|
||||
significantly contribute to jank on a loaded system.</li>
|
||||
<li>Recent devices (Nexus 5X, Nexus 6P, Pixel, and Pixel XL) have shipped with
|
||||
CONFIG_HZ=300. This should have a negligible power cost while significantly
|
||||
improving runnable times. If you do see significant increases in power
|
||||
consumption or performance issues after changing CONFIG_HZ, it's likely one of
|
||||
your drivers is using a timer based on raw jiffies instead of milliseconds and
|
||||
converting to jiffies. This is usually an easy fix (see the
|
||||
<a href="https://android.googlesource.com/kernel/msm.git/+/daf0cdf29244ce4098cff186d16df23cfa782993%5E!/">patch</a>
|
||||
that fixed kgsl timer issues on Nexus 5X and 6P when converting to
|
||||
CONFIG_HZ=300).</li>
|
||||
<li>Finally, we've experimented with CONFIG_HZ=1000 on Nexus/Pixel and found it
|
||||
offers a noticeable performance and power reduction due to decreased RCU
|
||||
overhead.</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ol>
|
||||
<p>With those two changes alone, a device should look much better for UI thread
|
||||
runnable time under load.</p>
|
||||
|
||||
<h3 id="sys.use_fifo_ui">Using sys.use_fifo_ui</h3>
|
||||
<p>You can try to drive UI thread runnable time to zero by setting the
|
||||
<code>sys.use_fifo_ui</code> property to 1.</p>
|
||||
|
||||
<p class="warning"><strong>Warning</strong>: Do not use this option on
|
||||
heterogeneous CPU configurations unless you have a capacity-aware RT scheduler.
|
||||
And, at this moment, <strong>NO CURRENTLY SHIPPING RT SCHEDULER IS
|
||||
CAPACITY-AWARE</strong>. We're working on one for EAS, but it's not yet
|
||||
available. The default RT scheduler is based purely on RT priorities and whether
|
||||
a CPU already has a RT thread of equal or higher priority.
|
||||
|
||||
<br><br>As a result, the default RT scheduler will happily move your
|
||||
relatively-long-running UI thread from a high-frequency big core to a little
|
||||
core at minimum frequency if a higher priority FIFO kthread happens to wake up
|
||||
on the same big core. <strong>This will introduce significant performance
|
||||
regressions</strong>. As this option has not yet been used on a shipping Android
|
||||
device, if you want to use it get in touch with the Android performance team to
|
||||
help you validate it.</p>
|
||||
|
||||
<p>When <code>sys.use_fifo_ui</code> is enabled, ActivityManager tracks the UI
|
||||
thread and RenderThread (the two most UI-critical threads) of the top
|
||||
application and makes those threads SCHED_FIFO instead of SCHED_OTHER. This
|
||||
effectively eliminates jitter from UI and RenderThreads; the traces we've
|
||||
gathered with this option enabled show runnable times on the order of
|
||||
microseconds instead of milliseconds.</p>
|
||||
|
||||
<p>However, because the RT load balancer was not capacity-aware, there was a
|
||||
30% reduction in application startup performance because the UI thread
|
||||
responsible for starting up the app would be moved from a 2.1Ghz gold Kryo core
|
||||
to a 1.5GHz silver Kryo core. With a capacity-aware RT load balancer, we see
|
||||
equivalent performance in bulk operations and a 10-15% reduction in 95th and
|
||||
99th percentile frame times in many of our UI benchmarks.</p>
|
||||
|
||||
<h2 id="interrupt">Interrupt traffic</h2>
|
||||
<p>Because ARM platforms deliver interrupts to CPU 0 only by default, we
|
||||
recommend the use of an IRQ balancer (irqbalance or msm_irqbalance on Qualcomm
|
||||
platforms).</p>
|
||||
<p>During Pixel development, we saw jank that could be directly attributed to
|
||||
overwhelming CPU 0 with interrupts. For example, if the <code>mdss_fb0</code>
|
||||
thread was scheduled on CPU 0, there was a much greater likelihood to jank
|
||||
because of an interrupt that is triggered by the display almost immediately
|
||||
before scanout. <code>mdss_fb0</code> would be in the middle of its own work
|
||||
with a very tight deadline, and then it would lose some time to the MDSS
|
||||
interrupt handler. Initially, we attempted to fix this by setting the CPU
|
||||
affinity of the mdss_fb0 thread to CPUs 1-3 to avoid contention with the
|
||||
interrupt, but then we realized that we had not yet enabled msm_irqbalance. With
|
||||
msm_irqbalance enabled, jank was noticeably improved even when both mdss_fb0 and
|
||||
the MDSS interrupt were on the same CPU due to reduced contention from other
|
||||
interrupts.</p>
|
||||
<p>This can be identified in systrace by looking at the sched section as well as
|
||||
the irq section. The sched section shows what has been scheduled, but an
|
||||
overlapping region in the irq section means an interrupt is running during that
|
||||
time instead of the normally scheduled process. If you see significant chunks of
|
||||
time taken during an interrupt, your options include:</p>
|
||||
<ul>
|
||||
<li>Make the interrupt handler faster.</li>
|
||||
<li>Prevent the interrupt from happening in the first place.</li>
|
||||
<li>Change the frequency of the interrupt to be out of phase with other regular
|
||||
work that it may be interfering with (if it is a regular interrupt).</li>
|
||||
<li>Set CPU affinity of the interrupt directly and prevent it from being
|
||||
balanced.</li>
|
||||
<li>Set CPU affinity of the thread the interrupt is interfering with to avoid
|
||||
the interrupt.</li>
|
||||
<li>Rely on the interrupt balancer to move the interrupt to a less loaded
|
||||
CPU.</li>
|
||||
</ul>
|
||||
<p>Setting CPU affinity is generally not recommended but can be useful for
|
||||
specific cases. In general, it's too hard to predict the state of the system for
|
||||
most common interrupts, but if you have a very specific set of conditions that
|
||||
triggers certain interrupts where the system is more constrained than normal
|
||||
(such as VR), explicit CPU affinity may be a good solution.</p>
|
||||
|
||||
<h2>Long softirqs</h2>
|
||||
<p>While a softirq is running, it disables preemption. softirqs can also be
|
||||
triggered at many places within the kernel and can run inside of a user process.
|
||||
If there's enough softirq activity, user processes will stop running softirqs,
|
||||
and ksoftirqd wakes to run softirqs and be load balanced. Usually, this is fine.
|
||||
However, a single very long softirq can wreak havoc on the system.</p>
|
||||
|
||||
<section class="expandable">
|
||||
<h4 class="showalways">Show Issue: Janky headtracking while streaming data over
|
||||
Wi-Fi</h4>
|
||||
|
||||
<p>In this issue, we saw that VR performance was inconsistent under specific
|
||||
network conditions (Wi-Fi). What we found in the trace was that a single NET_RX
|
||||
softirq could run for 30+ milliseconds. This was eventually tracked down to
|
||||
Receive Packet Steering, a Qualcomm Wi-Fi feature that coalesces many NET_RX
|
||||
softirqs into a single softirq. The resulting softirq could, under the right
|
||||
conditions, have a very large (potentially unbounded) runtime.</p>
|
||||
<p>While this feature may have reduced total CPU cycles spent on networking, it
|
||||
prevented the system from running the right things at the right time. Disabling
|
||||
the feature didn't impact network throughput or battery life, but it did fix the
|
||||
VR headtracking jank by allowing ksoftirqd to load balance the softirqs instead
|
||||
of circumventing it.</p>
|
||||
</section>
|
||||
<hr>
|
||||
|
||||
<p>softirqs are visible within the irq section of a trace, so they are easy to
|
||||
spot if the issue can be reproduced while tracing. Because a softirq can run
|
||||
within a user process, a bad softirq can also manifest as extra runtime inside
|
||||
of a user process for no obvious reason. If you see that, check the irq section
|
||||
to see if softirqs are to blame.</p>
|
||||
|
||||
<h2 id="drivers">Drivers leaving preemption or IRQs disabled too long</h2>
|
||||
<p>Disabling preemption or interrupts for too long (tens of milliseconds)
|
||||
results in jank. Typically, the jank manifests as a thread becoming runnable but
|
||||
not running on a particular CPU, even if the runnable thread is significantly
|
||||
higher priority (or SCHED_FIFO) than the other thread.</p>
|
||||
|
||||
<p>Some guidelines:</p>
|
||||
<ul>
|
||||
<li>If runnable thread is SCHED_FIFO and running thread is SCHED_OTHER, the
|
||||
running thread has preemption or interrupts disabled.</li>
|
||||
<li>If runnable thread is significantly higher priority (100) than the running
|
||||
thread (120), the running thread likely has preemption or interrupts disabled if
|
||||
the runnable thread doesn't run within two jiffies.</li>
|
||||
<li>If the runnable thread and the running thread have the same priority, the
|
||||
running thread likely has preemption or interrupts disabled if the runnable
|
||||
thread doesn't run within 20ms.</li>
|
||||
</ul>
|
||||
<p>Keep in mind that running an interrupt handler prevents you from servicing
|
||||
other interrupts, which also disables preemption.</p>
|
||||
|
||||
<section class="expandable">
|
||||
<h4 class="showalways">Show Issue: CONFIG_BUS_AUTO_SUSPEND causes significant
|
||||
jank</h4>
|
||||
|
||||
<p>In this issue, we identified a major source of jank in Pixel during bringup.
|
||||
To follow along with the example, <a href="perf_traces.zip">download the zip
|
||||
file</a> of traces (which also includes other traces referred to in this
|
||||
section), upzip the file, and open the trace_30293222.html file in your
|
||||
browser.</p>
|
||||
|
||||
<p>In the trace, locate the SurfaceFlinger EventThread beginning
|
||||
at 2235.195 ms. When performing bouncing ball tuning, we often saw one dropped
|
||||
frame when either SurfaceFlinger or a UI-critical thread ran after being
|
||||
runnable for 6.6ms (two jiffies at CONFIG_HZ=300). Critical SurfaceFlinger
|
||||
threads and the app's UI thread were SCHED_FIFO at the time.</p>
|
||||
<p>According to the trace, the thread would wake on a particular CPU, remain
|
||||
runnable for two jiffies, and get load balanced to a different CPU, where it
|
||||
would then run. The thread that was running while the UI thread and
|
||||
SurfaceFlinger were runnable was always a priority 120 kworker in
|
||||
pm_runtime_work.</p>
|
||||
<p>While digging through the kernel to see what pm_runtime_work was actually
|
||||
doing, we discovered the Wi-Fi driver's power management was handled through
|
||||
pm_runtime_work. We took additional traces with Wi-Fi disabled and the jank
|
||||
disappeared. To double-check, we also disabled the Wi-Fi driver's power
|
||||
management in the kernel and took more traces with Wi-Fi connected, and the jank
|
||||
was still gone. Qualcomm was then able to find the offending region with
|
||||
preemption disabled and fix it, and we were able to reenable that option for
|
||||
launch.</p>
|
||||
</section>
|
||||
<hr>
|
||||
|
||||
<p>Another option for identifying offending regions is with the preemptirqsoff
|
||||
tracer (see <a href="/devices/tech/debug/ftrace.html#dftrace">Using
|
||||
dynamic ftrace</a>). This tracer can give a much greater insight into the root
|
||||
cause of an uninterruptible region (such as function names), but requires more
|
||||
invasive work to enable. While it may have more of a performance impact, it is
|
||||
definitely worth trying.</p>
|
||||
|
||||
<h2 id="workqueues">Incorrect use of workqueues</h2>
|
||||
<p>Interrupt handlers often need to do work that can run outside of an interrupt
|
||||
context, enabling work to be farmed out to different threads in the kernel. A
|
||||
driver developer may notice the kernel has a very convenient system-wide
|
||||
asynchronous task functionality called <em>workqueues</em> and might use that
|
||||
for interrupt-related work.</p>
|
||||
|
||||
<p>However, workqueues are almost always the wrong answer for this problem
|
||||
because they are always SCHED_OTHER. Many hardware interrupts are in the
|
||||
critical path of performance and must be run immediately. Workqueues have no
|
||||
guarantees about when they will be run. Every time we've seen a workqueue in the
|
||||
critical path of performance, it's been a source of sporadic jank, regardless of
|
||||
the device. On Pixel, with a flagship processor, we saw that a single workqueue
|
||||
could be delayed up to 7ms if the device was under load depending on scheduler
|
||||
behavior and other things running on the system.</p>
|
||||
|
||||
<p>Instead of a workqueue, drivers that need to handle interrupt-like work
|
||||
inside a separate thread should create their own SCHED_FIFO kthread. For help
|
||||
doing this with kthread_work functions, refer to this
|
||||
<a href="https://android.googlesource.com/kernel/msm/+/1a7a93bd33f48a369de29f6f2b56251127bf6ab4%5E!/">patch</a>.</p>
|
||||
|
||||
<h2 id="framework-lock">Framework lock contention</h2>
|
||||
<p>Framework lock contention can be a source of jank or other performance
|
||||
issues. It's usually caused by the ActivityManagerService lock but can be seen
|
||||
in other locks as well. For example, the PowerManagerService lock can impact
|
||||
screen on performance. If you're seeing this on your device, there's no good fix
|
||||
because it can only be improved via architectural improvements to the framework.
|
||||
However, if you are modifying code that runs inside of system_server, it's
|
||||
critical to avoid holding locks for a long time, especially the
|
||||
ActivityManagerService lock.</p>
|
||||
|
||||
<h2 id="binder-lock">Binder lock contention</h2>
|
||||
<p>Historically, binder has had a single global lock. If the thread running a
|
||||
binder transaction was preempted while holding the lock, no other thread can
|
||||
perform a binder transaction until the original thread has released the lock.
|
||||
This is bad; binder contention can block everything in the system, including
|
||||
sending UI updates to the display (UI threads communicate with SurfaceFlinger
|
||||
via binder).</p>
|
||||
<p>Android 6.0 included several patches to improve this behavior by disabling
|
||||
preemption while holding the binder lock. This was safe only because the binder
|
||||
lock should be held for a few microseconds of actual runtime. This dramatically
|
||||
improved performance in uncontended situations and prevented contention by
|
||||
preventing most scheduler switches while the binder lock was held. However,
|
||||
preemption could not be disabled for the entire runtime of holding the binder
|
||||
lock, meaning that preemption was enabled for functions that could sleep (such
|
||||
as copy_from_user), which could cause the same preemption as the original case.
|
||||
When we sent the patches upstream, they promptly told us that this was the worst
|
||||
idea in history. (We agreed with them, but we also couldn't argue with the
|
||||
efficacy of the patches toward preventing jank.)</p>
|
||||
|
||||
<h2 id="fd-contention">fd contention within a process</h2>
|
||||
<p>This is rare. Your jank is probably not caused by this.</p>
|
||||
<p>That said, if you have multiple threads within a process writing the same fd,
|
||||
it is possible to see contention on this fd, however the only time we saw this
|
||||
during Pixel bringup is during a test where low-priority threads attempted to
|
||||
occupy all CPU time while a single high-priority thread was running within the
|
||||
same process. All threads were writing to the trace marker fd and the
|
||||
high-priority thread could get blocked on the trace marker fd if a low-priority
|
||||
thread was holding the fd lock and was then preempted. When tracing was disabled
|
||||
from the low-priority threads, there was no performance issue.</p>
|
||||
<p>We weren't able to reproduce this in any other situation, but it's worth
|
||||
pointing out as a potential cause of performance issues while tracing.</p>
|
||||
|
||||
<h2 id="cpu-idle">Unnecessary CPU idle transitions</h2>
|
||||
<p>When dealing with IPC, especially multi-process pipelines, it's common to see
|
||||
variations on the following runtime behavior:</p>
|
||||
<ol>
|
||||
<li>Thread A runs on CPU 1.</li>
|
||||
<li>Thread A wakes up thread B.</li>
|
||||
<li>Thread B starts running on CPU 2.</li>
|
||||
<li>Thread A immediately goes to sleep, to be awakened by thread B when thread B
|
||||
has finished its current work.</li>
|
||||
</ol>
|
||||
|
||||
<p>A common source of overhead is between steps 2 and 3. If CPU 2 is idle, it
|
||||
must be brought back to an active state before thread B can run. Depending on
|
||||
the SOC and how deep the idle is, this could be tens of microseconds before
|
||||
thread B begins running. If the actual runtime of each side of the IPC is close
|
||||
enough to the overhead, the overall performance of that pipeline can be
|
||||
significantly reduced by CPU idle transitions. The most common place for Android
|
||||
to hit this is around binder transactions, and many services that use binder end
|
||||
up looking like the situation described above.</p>
|
||||
|
||||
<p>First, use the <code>wake_up_interruptible_sync()</code> function in your
|
||||
kernel drivers and support this from any custom scheduler. Treat this as a
|
||||
requirement, not a hint. Binder uses this today, and it helps a lot with
|
||||
synchronous binder transactions avoiding unnecessary CPU idle transitions.</p>
|
||||
<p>Second, ensure your cpuidle transition times are realistic and the cpuidle
|
||||
governor is taking these into account correctly. If your SOC is thrashing in and
|
||||
out of your deepest idle state, you won't save power by going to deepest
|
||||
idle.</p>
|
||||
|
||||
<h2 id="logging">Logging</h2>
|
||||
<p>Logging is not free for CPU cycles or memory, so don't spam the log buffer.
|
||||
Logging costs cycles in your application (directly) and in the log daemon.
|
||||
Remove any debugging logs before shipping your device.</p>
|
||||
|
||||
<h2 id="io-issues">I/O issues</h2>
|
||||
<p>I/O operations are common sources of jitter. If a thread accesses a
|
||||
memory-mapped file and the page is not in the page cache, it faults and reads
|
||||
the page from disk. This blocks the thread (usually for 10+ ms) and if it
|
||||
happens in the critical path of UI rendering, can result in jank. There are too
|
||||
many causes of I/O operations to discuss here, but check the following locations
|
||||
when trying to improve I/O behavior:</p>
|
||||
<ul>
|
||||
<li><strong>PinnerService</strong>. Added in Android 7.0, PinnerService enables
|
||||
the framework to lock some files in the page cache. This removes the memory for
|
||||
use by any other process, but if there are some files that are known a priori to
|
||||
be used regularly, it can be effective to mlock those files.<br><br>
|
||||
On Pixel and Nexus 6P devices running Android 7.0, we mlocked four files:
|
||||
<ul>
|
||||
<li>/system/framework/arm64/boot-framework.oat</li>
|
||||
<li>/system/framework/oat/arm64/services.odex</li>
|
||||
<li>/system/framework/arm64/boot.oat</li>
|
||||
<li>/system/framework/arm64/boot-core-libart.oat</li>
|
||||
</ul>
|
||||
These files are constantly in use by most applications and system_server, so
|
||||
they should not be paged out. In particular, we've found that if any of those
|
||||
are paged out, they will be paged back in and cause jank when switching from a
|
||||
heavyweight application.
|
||||
</li>
|
||||
<li><strong>Encryption</strong>. Another possible cause of I/O problems. We find
|
||||
inline encryption offers the best performance when compared to CPU-based
|
||||
encryption or using a hardware block accessible via DMA. Most importantly,
|
||||
inline encryption reduces the jitter associated with I/O, especially when
|
||||
compared to CPU-based encryption. Because fetches to the page cache are often in
|
||||
the critical path of UI rendering, CPU-based encryption introduces additional
|
||||
CPU load in the critical path, which adds more jitter than just the I/O fetch.
|
||||
<br><br>
|
||||
DMA-based hardware encryption engines have a similar problem, since the kernel
|
||||
has to spend cycles managing that work even if other critical work is available
|
||||
to run. We strongly recommend any SOC vendor building new hardware to include
|
||||
support for inline encryption.</li>
|
||||
</ul>
|
||||
|
||||
<h2 id="small-task">Aggressive small task packing</h2>
|
||||
<p>Some schedulers offer support for packing small tasks onto single CPU cores
|
||||
to try to reduce power consumption by keeping more CPUs idle for longer. While
|
||||
this works well for throughput and power consumption, it can be
|
||||
<em>catastrophic</em> to latency. There are several short-running threads in the
|
||||
critical path of UI rendering that can be considered small; if these threads are
|
||||
delayed as they are slowly migrated to other CPUs, it <strong>will</strong>
|
||||
cause jank. We recommend using small task packing very conservatively.</p>
|
||||
|
||||
<h2 id="page-cache">Page cache thrashing</h2>
|
||||
<p>A device without enough free memory may suddenly become extremely sluggish
|
||||
while performing a long-running operation, such as opening a new application. A
|
||||
trace of the application may reveal it is consistently blocked in I/O during a
|
||||
particular run even when it often is not blocked in I/O. This is usually a sign
|
||||
of page cache thrashing, especially on devices with less memory.</p>
|
||||
|
||||
<p>One way to identify this is to take a systrace using the pagecache tag and
|
||||
feed that trace to the script at
|
||||
<code>system/extras/pagecache/pagecache.py</code>. pagecache.py translates
|
||||
individual requests to map files into the page cache into aggregate per-file
|
||||
statistics. If you find that more bytes of a file have been read than the total
|
||||
size of that file on disk, you are definitely hitting page cache thrashing.</p>
|
||||
|
||||
<p>What this means is that the working set required by your workload (typically
|
||||
a single application plus system_server) is greater than the amount of memory
|
||||
available to the page cache on your device. As a result, as one part of the
|
||||
workload gets the data it needs in the page cache, another part that will be
|
||||
used in the near future will be evicted and will have to be fetched again,
|
||||
causing the problem to occur again until the load has completed. This is the
|
||||
fundamental cause of performance problems when not enough memory is available
|
||||
on a device.</p>
|
||||
|
||||
<p>There's no foolproof way to fix page cache thrashing, but there are a few
|
||||
ways to try to improve this on a given device.</p>
|
||||
<ul>
|
||||
<li>Use less memory in persistent processes. The less memory used by persistent
|
||||
processes, the more memory available to applications and the page cache.</li>
|
||||
<li>Audit the carveouts you have for your device to ensure you are not
|
||||
unnecessarily removing memory from the OS. We've seen situations where carveouts
|
||||
used for debugging were accidentally left in shipping kernel configurations,
|
||||
consuming tens of megabytes of memory. This can make the difference between
|
||||
hitting page cache thrashing and not, especially on devices with less
|
||||
memory.</li>
|
||||
<li>If you're seeing page cache thrashing in system_server on critical files,
|
||||
consider pinning those files. This will increase memory pressure elsewhere, but
|
||||
it may modify behavior enough to avoid thrashing.</li>
|
||||
<li>Retune lowmemorykiller to try to keep more memory free. lowmemorykiller's
|
||||
thresholds are based on both absolute free memory and the page cache, so
|
||||
increasing the threshold at which processes at a given oom_adj level are killed
|
||||
may result in better behavior at the expense of increased background app
|
||||
death.</li>
|
||||
<li>Try using ZRAM. We use ZRAM on Pixel, even though Pixel has 4GB, because it
|
||||
could help with rarely used dirty pages.</li>
|
||||
</ul>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,567 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Diagnosing Native Crashes</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
|
||||
<p>
|
||||
If you've never seen a native crash before, start with
|
||||
<a href="/devices/tech/debug/index.html">Debugging Native Android
|
||||
Platform Code</a>.
|
||||
</p>
|
||||
|
||||
<h2 id=crashtypes>Types of native crash</h2>
|
||||
<p>
|
||||
The sections below detail the most common kinds of native crash. Each includes
|
||||
an example chunk of <code>debuggerd</code> output, with the key evidence that helps you
|
||||
distinguish that specific kind of crash highlighted in orange italic text.
|
||||
</p>
|
||||
<h3 id=abort>Abort</h3>
|
||||
<p>
|
||||
Aborts are interesting because they're deliberate. There are many different ways
|
||||
to abort (including calling <code><a
|
||||
href="http://man7.org/linux/man-pages/man3/abort.3.html">abort(3)</a></code>,
|
||||
failing an <code><a
|
||||
href="http://man7.org/linux/man-pages/man3/assert.3.html">assert(3)</a></code>,
|
||||
using one of the Android-specific fatal logging types), but they all involve
|
||||
calling <code>abort</code>. A call to <code>abort</code> basically signals the
|
||||
calling thread with SIGABRT, so a frame showing "abort" in <code>libc.so</code>
|
||||
plus SIGABRT are the things to look for in the <code>debuggerd</code> output to
|
||||
recognize this case.</p>
|
||||
|
||||
<p>
|
||||
As mentioned above, there may be an explicit "abort message" line. But you
|
||||
should also look in the <code>logcat</code> output to see what this thread logged before
|
||||
deliberately killing itself, because the basic abort primitive doesn't accept a
|
||||
message.
|
||||
</p>
|
||||
<p>
|
||||
Older versions of Android (especially on 32-bit ARM) followed a convoluted path
|
||||
between the original abort call (frame 4 here) and the actual sending of the
|
||||
signal (frame 0 here):
|
||||
</p>
|
||||
<pre class="devsite-click-to-copy">
|
||||
pid: 1656, tid: 1656, name: crasher >>> crasher <<<
|
||||
signal 6 (<i style="color:Orange">SIGABRT</i>), code -6 (SI_TKILL), fault addr --------
|
||||
<i style="color:Orange">Abort message</i>: 'some_file.c:123: some_function: assertion "false" failed'
|
||||
r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8
|
||||
r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c
|
||||
r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c
|
||||
ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010
|
||||
backtrace:
|
||||
#00 pc 00042c98 /system/lib/libc.so (tgkill+12)
|
||||
#01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32)
|
||||
#02 pc 0001bb87 /system/lib/libc.so (raise+10)
|
||||
#03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34)
|
||||
#04 pc 000168e8 /system/lib/<i style="color:Orange">libc.so</i> (<i style="color:Orange">abort</i>+4)
|
||||
#05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16)
|
||||
#06 pc 00018d35 /system/lib/libc.so (__assert2+20)
|
||||
#07 pc 00000f21 /system/xbin/crasher
|
||||
#08 pc 00016795 /system/lib/libc.so (__libc_init+44)
|
||||
#09 pc 00000abc /system/xbin/crasher
|
||||
</pre>
|
||||
<p>
|
||||
More recent versions call <code><a
|
||||
href="http://man7.org/linux/man-pages/man2/tgkill.2.html">tgkill(2)</a></code>
|
||||
directly from <code>abort</code>, so there are fewer stack frames for you to
|
||||
skip over before you get to the interesting frames:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
pid: 25301, tid: 25301, name: crasher >>> crasher <<<
|
||||
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
|
||||
r0 00000000 r1 000062d5 r2 00000006 r3 00000008
|
||||
r4 ffa09dd8 r5 000062d5 r6 000062d5 r7 0000010c
|
||||
r8 00000000 r9 00000000 sl 00000000 fp ffa09f0c
|
||||
ip 00000000 sp ffa09dc8 lr eac63ce3 pc eac93f0c cpsr 000d0010
|
||||
backtrace:
|
||||
#00 pc 00049f0c /system/lib/libc.so (tgkill+12)
|
||||
#01 pc 00019cdf /system/lib/libc.so (abort+50)
|
||||
#02 pc 000012db /system/xbin/crasher (maybe_abort+26)
|
||||
#03 pc 000015b7 /system/xbin/crasher (do_action+414)
|
||||
#04 pc 000020d5 /system/xbin/crasher (main+100)
|
||||
#05 pc 000177a1 /system/lib/libc.so (__libc_init+48)
|
||||
#06 pc 000010e4 /system/xbin/crasher (_start+96)
|
||||
</pre>
|
||||
<p>
|
||||
You can reproduce an instance of this type of crash using: <code>crasher
|
||||
abort</code>
|
||||
</p>
|
||||
<h3 id=nullpointer>Pure null pointer dereference</h3>
|
||||
<p>
|
||||
This is the classic native crash, and although it's just a special case of the
|
||||
next crash type, it's worth mentioning separately because it usually requires
|
||||
the least thought.
|
||||
</p>
|
||||
<p>
|
||||
In the example below, even though the crashing function is in
|
||||
<code>libc.so</code>, because the string functions just operate on the pointers
|
||||
they're given, you can infer that <code><a
|
||||
href="http://man7.org/linux/man-pages/man3/strlen.3.html">strlen(3)</a></code>
|
||||
was called with a null pointer; and this crash should go straight to the author
|
||||
of the calling code. In this case, frame #01 is the bad caller.
|
||||
</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
pid: 25326, tid: 25326, name: crasher >>> crasher <<<
|
||||
signal 11 (<i style="color:Orange">SIGSEGV</i>), code 1 (SEGV_MAPERR), <i style="color:Orange">fault addr 0x0</i>
|
||||
r0 00000000 r1 00000000 r2 00004c00 r3 00000000
|
||||
r4 ab088071 r5 fff92b34 r6 00000002 r7 fff92b40
|
||||
r8 00000000 r9 00000000 sl 00000000 fp fff92b2c
|
||||
ip ab08cfc4 sp fff92a08 lr ab087a93 pc efb78988 cpsr 600d0030
|
||||
|
||||
backtrace:
|
||||
#00 pc 00019988 /system/lib/libc.so (strlen+71)
|
||||
#01 pc 00001a8f /system/xbin/crasher (strlen_null+22)
|
||||
#02 pc 000017cd /system/xbin/crasher (do_action+948)
|
||||
#03 pc 000020d5 /system/xbin/crasher (main+100)
|
||||
#04 pc 000177a1 /system/lib/libc.so (__libc_init+48)
|
||||
#05 pc 000010e4 /system/xbin/crasher (_start+96)
|
||||
</pre>
|
||||
<p>
|
||||
You can reproduce an instance of this type of crash using: <code>crasher
|
||||
strlen-NULL</code>
|
||||
</p>
|
||||
<h3 id=lowaddress>Low-address null pointer dereference</h3>
|
||||
<p>
|
||||
In many cases the fault address won't be 0, but some other low number. Two- or
|
||||
three-digit addresses in particular are very common, whereas a six-digit address
|
||||
is almost certainly not a null pointer dereference—that would require a 1MiB
|
||||
offset. This usually occurs when you have code that dereferences a null pointer
|
||||
as if it was a valid struct. Common functions are <code><a
|
||||
href="http://man7.org/linux/man-pages/man3/fprintf.3.html">fprintf(3)</a></code>
|
||||
(or any other function taking a FILE*) and <code><a
|
||||
href="http://man7.org/linux/man-pages/man3/readdir.3.html">readdir(3)</a></code>,
|
||||
because code often fails to check that the <code><a
|
||||
href="http://man7.org/linux/man-pages/man3/fopen.3.html">fopen(3)</a></code> or
|
||||
<code><a
|
||||
href="http://man7.org/linux/man-pages/man3/opendir.3.html">opendir(3)</a></code>
|
||||
call actually succeeded first.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Here's an example of <code>readdir</code>:
|
||||
</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
pid: 25405, tid: 25405, name: crasher >>> crasher <<<
|
||||
signal 11 (<i style="color:Orange">SIGSEGV</i>), code 1 (SEGV_MAPERR), <i style="color:Orange">fault addr 0xc</i>
|
||||
r0 0000000c r1 00000000 r2 00000000 r3 3d5f0000
|
||||
r4 00000000 r5 0000000c r6 00000002 r7 ff8618f0
|
||||
r8 00000000 r9 00000000 sl 00000000 fp ff8618dc
|
||||
ip edaa6834 sp ff8617a8 lr eda34a1f pc eda618f6 cpsr 600d0030
|
||||
|
||||
backtrace:
|
||||
#00 pc 000478f6 /system/lib/libc.so (pthread_mutex_lock+1)
|
||||
#01 pc 0001aa1b /system/lib/libc.so (readdir+10)
|
||||
#02 pc 00001b35 /system/xbin/crasher (readdir_null+20)
|
||||
#03 pc 00001815 /system/xbin/crasher (do_action+976)
|
||||
#04 pc 000021e5 /system/xbin/crasher (main+100)
|
||||
#05 pc 000177a1 /system/lib/libc.so (__libc_init+48)
|
||||
#06 pc 00001110 /system/xbin/crasher (_start+96)
|
||||
</pre>
|
||||
<p>
|
||||
Here the direct cause of the crash is that <code><a
|
||||
href="http://man7.org/linux/man-pages/man3/pthread_mutex_lock.3p.html">pthread_mutex_lock(3)</a></code>
|
||||
has tried to access address 0xc (frame 0). But the first thing
|
||||
<code>pthread_mutex_lock</code> does is dereference the <code>state</code>
|
||||
element of the <code>pthread_mutex_t*</code> it was given. If you look at the
|
||||
source, you can see that element is at offset 0 in the struct, which tells you
|
||||
that <code>pthread_mutex_lock</code> was given the invalid pointer 0xc. From the
|
||||
frame 1 you can see that it was given that pointer by <code>readdir</code>,
|
||||
which extracts the <code>mutex_</code> field from the <code>DIR*</code> it's
|
||||
given. Looking at that structure, you can see that <code>mutex_</code> is at
|
||||
offset <code>sizeof(int) + sizeof(size_t) + sizeof(dirent*)</code> into
|
||||
<code>struct DIR</code>, which on a 32-bit device is 4 + 4 + 4 = 12 = 0xc, so
|
||||
you found the bug: <code>readdir</code> was passed a null pointer by the caller.
|
||||
At this point you can paste the stack into the stack tool to find out
|
||||
<em>where</em> in logcat this happened.</p>
|
||||
|
||||
<pre class="prettyprint">
|
||||
struct DIR {
|
||||
int fd_;
|
||||
size_t available_bytes_;
|
||||
dirent* next_;
|
||||
pthread_mutex_t mutex_;
|
||||
dirent buff_[15];
|
||||
long current_pos_;
|
||||
};
|
||||
</pre>
|
||||
<p>
|
||||
In most cases you can actually skip this analysis. A sufficiently low fault
|
||||
address usually means you can just skip any <code>libc.so</code> frames in the
|
||||
stack and directly accuse the calling code. But not always, and this is how you
|
||||
would present a compelling case.
|
||||
</p>
|
||||
<p>
|
||||
You can reproduce instances of this kind of crash using: <code>crasher
|
||||
fprintf-NULL</code> or <code>crasher readdir-NULL</code>
|
||||
</p>
|
||||
<h3 id=fortify>FORTIFY failure</h3>
|
||||
<p>
|
||||
A FORTIFY failure is a special case of an abort that occurs when the C library
|
||||
detects a problem that might lead to a security vulnerability. Many C library
|
||||
functions are <em>fortified</em>; they take an extra argument that tells them how large
|
||||
a buffer actually is and check at run time whether the operation you're trying
|
||||
to perform actually fits. Here's an example where the code tries to
|
||||
<code>read(fd, buf, 32)</code> into a buffer that's actually only 10 bytes
|
||||
long...
|
||||
</p>
|
||||
<pre class="devsite-click-to-copy">
|
||||
pid: 25579, tid: 25579, name: crasher >>> crasher <<<
|
||||
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
|
||||
Abort message: '<i style="color:Orange">FORTIFY: read: prevented 32-byte write into 10-byte buffer'</i>
|
||||
r0 00000000 r1 000063eb r2 00000006 r3 00000008
|
||||
r4 ff96f350 r5 000063eb r6 000063eb r7 0000010c
|
||||
r8 00000000 r9 00000000 sl 00000000 fp ff96f49c
|
||||
ip 00000000 sp ff96f340 lr ee83ece3 pc ee86ef0c cpsr 000d0010
|
||||
|
||||
backtrace:
|
||||
#00 pc 00049f0c /system/lib/libc.so (tgkill+12)
|
||||
#01 pc 00019cdf /system/lib/libc.so (abort+50)
|
||||
#02 pc 0001e197 /system/lib/libc.so (<i style="color:Orange">__fortify_fatal</i>+30)
|
||||
#03 pc 0001baf9 /system/lib/libc.so (__read_chk+48)
|
||||
#04 pc 0000165b /system/xbin/crasher (do_action+534)
|
||||
#05 pc 000021e5 /system/xbin/crasher (main+100)
|
||||
#06 pc 000177a1 /system/lib/libc.so (__libc_init+48)
|
||||
#07 pc 00001110 /system/xbin/crasher (_start+96)
|
||||
</pre>
|
||||
<p>
|
||||
You can reproduce an instance of this type of crash using: <code>crasher
|
||||
fortify</code>
|
||||
</p>
|
||||
<h3 id=stackcorruption>Stack corruption detected by -fstack-protector</h3>
|
||||
<p>
|
||||
The compiler's <code>-fstack-protector</code> option inserts checks into
|
||||
functions with on-stack buffers to guard against buffer overruns. This option is
|
||||
on by default for platform code but not for apps. When this option is enabled,
|
||||
the compiler adds instructions to the <a
|
||||
href="https://en.wikipedia.org/wiki/Function_prologue">function prologue</a> to
|
||||
write a random value just past the last local on the stack and to the function
|
||||
epilogue to read it back and check that it's not changed. If that value changed,
|
||||
it was overwritten by a buffer overrun, so the epilogue calls
|
||||
<code>__stack_chk_fail</code> to log a message and abort.
|
||||
</p>
|
||||
<pre class="devsite-click-to-copy">
|
||||
pid: 26717, tid: 26717, name: crasher >>> crasher <<<
|
||||
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
|
||||
<i style="color:Orange">Abort message: 'stack corruption detected'</i>
|
||||
r0 00000000 r1 0000685d r2 00000006 r3 00000008
|
||||
r4 ffd516d8 r5 0000685d r6 0000685d r7 0000010c
|
||||
r8 00000000 r9 00000000 sl 00000000 fp ffd518bc
|
||||
ip 00000000 sp ffd516c8 lr ee63ece3 pc ee66ef0c cpsr 000e0010
|
||||
|
||||
backtrace:
|
||||
#00 pc 00049f0c /system/lib/libc.so (tgkill+12)
|
||||
#01 pc 00019cdf /system/lib/libc.so (abort+50)
|
||||
#02 pc 0001e07d /system/lib/libc.so (__libc_fatal+24)
|
||||
#03 pc 0004863f /system/lib/libc.so (<i style="color:Orange">__stack_chk_fail</i>+6)
|
||||
#04 pc 000013ed /system/xbin/crasher (smash_stack+76)
|
||||
#05 pc 00001591 /system/xbin/crasher (do_action+280)
|
||||
#06 pc 00002219 /system/xbin/crasher (main+100)
|
||||
#07 pc 000177a1 /system/lib/libc.so (__libc_init+48)
|
||||
#08 pc 00001144 /system/xbin/crasher (_start+96)
|
||||
</pre>
|
||||
<p>
|
||||
You can distinguish this from other kinds of abort by the presence of
|
||||
<code>__stack_chk_fail</code> in the backtrace and the specific abort message.
|
||||
</p>
|
||||
<p>
|
||||
You can reproduce an instance of this type of crash using: <code>crasher
|
||||
smash-stack</code>
|
||||
</p>
|
||||
|
||||
<h2 id=crashdump>Crash dumps</h2>
|
||||
|
||||
<p>If you don't have a specific crash that you're investigating right now,
|
||||
the platform source includes a tool for testing <code>debuggerd</code> called crasher. If
|
||||
you <code>mm</code> in <code>system/core/debuggerd/</code> you'll get both a <code>crasher</code>
|
||||
and a <code>crasher64</code> on your path (the latter allowing you to test
|
||||
64-bit crashes). Crasher can crash in a large number of interesting ways based
|
||||
on the command line arguments you provide. Use <code>crasher --help</code>
|
||||
to see the currently supported selection.</p>
|
||||
|
||||
<p>To introduce the different pieces in a crash dump, let's work through this example crash dump:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
|
||||
Build fingerprint: 'Android/aosp_flounder/flounder:5.1.51/AOSP/enh08201009:eng/test-keys'
|
||||
Revision: '0'
|
||||
ABI: 'arm'
|
||||
pid: 1656, tid: 1656, name: crasher >>> crasher <<<
|
||||
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
|
||||
Abort message: 'some_file.c:123: some_function: assertion "false" failed'
|
||||
r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8
|
||||
r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c
|
||||
r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c
|
||||
ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010
|
||||
backtrace:
|
||||
#00 pc 00042c98 /system/lib/libc.so (tgkill+12)
|
||||
#01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32)
|
||||
#02 pc 0001bb87 /system/lib/libc.so (raise+10)
|
||||
#03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34)
|
||||
#04 pc 000168e8 /system/lib/libc.so (abort+4)
|
||||
#05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16)
|
||||
#06 pc 00018d35 /system/lib/libc.so (__assert2+20)
|
||||
#07 pc 00000f21 /system/xbin/crasher
|
||||
#08 pc 00016795 /system/lib/libc.so (__libc_init+44)
|
||||
#09 pc 00000abc /system/xbin/crasher
|
||||
Tombstone written to: /data/tombstones/tombstone_06
|
||||
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
|
||||
</pre>
|
||||
|
||||
<p>The line of asterisks with spaces is helpful if you're searching a log
|
||||
for native crashes. The string "*** ***" rarely shows up in logs other than
|
||||
at the beginning of a native crash.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
Build fingerprint:
|
||||
'Android/aosp_flounder/flounder:5.1.51/AOSP/enh08201009:eng/test-keys'
|
||||
</pre>
|
||||
|
||||
<p>The fingerprint lets you identify exactly which build the crash occurred
|
||||
on. This is exactly the same as the <code>ro.build.fingerprint</code> system property.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
Revision: '0'
|
||||
</pre>
|
||||
|
||||
<p>The revision refers to the hardware rather than the software. This is
|
||||
usually unused but can be useful to help you automatically ignore bugs known
|
||||
to be caused by bad hardware. This is exactly the same as the <code>ro.revision</code>
|
||||
system property.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
ABI: 'arm'
|
||||
</pre>
|
||||
|
||||
<p>The ABI is one of arm, arm64, mips, mips64, x86, or x86-64. This is
|
||||
mostly useful for the <code>stack</code> script mentioned above, so that it knows
|
||||
what toolchain to use.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
pid: 1656, tid: 1656, name: crasher >>> crasher <<<
|
||||
</pre>
|
||||
|
||||
<p>This line identifies the specific thread in the process that crashed. In
|
||||
this case, it was the process' main thread, so the process ID and thread
|
||||
ID match. The first name is the thread name, and the name surrounded by
|
||||
>>> and <<< is the process name. For an app, the process name
|
||||
is typically the fully-qualified package name (such as com.facebook.katana),
|
||||
which is useful when filing bugs or trying to find the app in Google Play. The
|
||||
pid and tid can also be useful in finding the relevant log lines preceding
|
||||
the crash.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
|
||||
</pre>
|
||||
|
||||
<p>This line tells you which signal (SIGABRT) was received, and more about
|
||||
how it was received (SI_TKILL). The signals reported by <code>debuggerd</code> are SIGABRT,
|
||||
SIGBUS, SIGFPE, SIGILL, SIGSEGV, and SIGTRAP. The signal-specific codes vary
|
||||
based on the specific signal.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
Abort message: 'some_file.c:123: some_function: assertion "false" failed'
|
||||
</pre>
|
||||
|
||||
<p>Not all crashes will have an abort message line, but aborts will. This
|
||||
is automatically gathered from the last line of fatal logcat output for
|
||||
this pid/tid, and in the case of a deliberate abort is likely to give an
|
||||
explanation of why the program killed itself.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
r0 00000000 r1 00000678 r2 00000006 r3 f70b6dc8
|
||||
r4 f70b6dd0 r5 f70b6d80 r6 00000002 r7 0000010c
|
||||
r8 ffffffed r9 00000000 sl 00000000 fp ff96ae1c
|
||||
ip 00000006 sp ff96ad18 lr f700ced5 pc f700dc98 cpsr 400b0010
|
||||
</pre>
|
||||
|
||||
<p>The register dump shows the content of the CPU registers at the time the
|
||||
signal was received. (This section varies wildly between ABIs.) How useful
|
||||
these are will depend on the exact crash.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
backtrace:
|
||||
#00 pc 00042c98 /system/lib/libc.so (tgkill+12)
|
||||
#01 pc 00041ed1 /system/lib/libc.so (pthread_kill+32)
|
||||
#02 pc 0001bb87 /system/lib/libc.so (raise+10)
|
||||
#03 pc 00018cad /system/lib/libc.so (__libc_android_abort+34)
|
||||
#04 pc 000168e8 /system/lib/libc.so (abort+4)
|
||||
#05 pc 0001a78f /system/lib/libc.so (__libc_fatal+16)
|
||||
#06 pc 00018d35 /system/lib/libc.so (__assert2+20)
|
||||
#07 pc 00000f21 /system/xbin/crasher
|
||||
#08 pc 00016795 /system/lib/libc.so (__libc_init+44)
|
||||
#09 pc 00000abc /system/xbin/crasher
|
||||
</pre>
|
||||
|
||||
<p>The backtrace shows you where in the code we were at the time of
|
||||
crash. The first column is the frame number (matching gdb's style where
|
||||
the deepest frame is 0). The PC values are relative to the location of the
|
||||
shared library rather than absolute addresses. The next column is the name
|
||||
of the mapped region (which is usually a shared library or executable, but
|
||||
might not be for, say, JIT-compiled code). Finally, if symbols are available,
|
||||
the symbol that the PC value corresponds to is shown, along with the offset
|
||||
into that symbol in bytes. You can use this in conjunction with <code>objdump(1)</code>
|
||||
to find the corresponding assembler instruction.</p>
|
||||
|
||||
<h2 id=tombstones>Tombstones</h2>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
Tombstone written to: /data/tombstones/tombstone_06
|
||||
</pre>
|
||||
|
||||
<p>This tells you where <code>debuggerd</code> wrote extra information.
|
||||
<code>debuggerd</code> will keep up to 10 tombstones, cycling through
|
||||
the numbers 00 to 09 and overwriting existing tombstones as necessary.</p>
|
||||
|
||||
<p>The tombstone contains the same information as the crash dump, plus a
|
||||
few extras. For example, it includes backtraces for <i>all</i> threads (not
|
||||
just the crashing thread), the floating point registers, raw stack dumps,
|
||||
and memory dumps around the addresses in registers. Most usefully it also
|
||||
includes a full memory map (similar to <code>/proc/<i>pid</i>/maps</code>). Here's an
|
||||
annotated example from a 32-bit ARM process crash:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
memory map: (fault address prefixed with --->)
|
||||
--->ab15f000-ab162fff r-x 0 4000 /system/xbin/crasher (BuildId:
|
||||
b9527db01b5cf8f5402f899f64b9b121)
|
||||
</pre>
|
||||
|
||||
<p>There are two things to note here. The first is that this line is prefixed
|
||||
with "--->". The maps are most useful when your crash isn't just a null
|
||||
pointer dereference. If the fault address is small, it's probably some variant
|
||||
of a null pointer dereference. Otherwise looking at the maps around the fault
|
||||
address can often give you a clue as to what happened. Some possible issues
|
||||
that can be recognized by looking at the maps include:</p>
|
||||
|
||||
<ul>
|
||||
<li>Reads/writes past the end of a block of memory.</li>
|
||||
<li>Reads/writes before the beginning of a block of memory.</li>
|
||||
<li>Attempts to execute non-code.</li>
|
||||
<li>Running off the end of a stack.</li>
|
||||
<li>Attempts to write to code (as in the example above).</li>
|
||||
</ul>
|
||||
|
||||
<p>The second thing to note is that executables and shared libraries files
|
||||
will show the BuildId (if present) in Android M and later, so you can see
|
||||
exactly which version of your code crashed. (Platform binaries include a
|
||||
BuildId by default since Android M. NDK r12 and later automatically pass
|
||||
<code>-Wl,--build-id</code> to the linker too.)</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
ab163000-ab163fff r-- 3000 1000 /system/xbin/crasher
|
||||
ab164000-ab164fff rw- 0 1000
|
||||
f6c80000-f6d7ffff rw- 0 100000 [anon:libc_malloc]
|
||||
</pre>
|
||||
|
||||
<p>On Android the heap isn't necessarily a single region. Heap regions will
|
||||
be labeled <code>[anon:libc_malloc]</code>.</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
f6d82000-f6da1fff r-- 0 20000 /dev/__properties__/u:object_r:logd_prop:s0
|
||||
f6da2000-f6dc1fff r-- 0 20000 /dev/__properties__/u:object_r:default_prop:s0
|
||||
f6dc2000-f6de1fff r-- 0 20000 /dev/__properties__/u:object_r:logd_prop:s0
|
||||
f6de2000-f6de5fff r-x 0 4000 /system/lib/libnetd_client.so (BuildId: 08020aa06ed48cf9f6971861abf06c9d)
|
||||
f6de6000-f6de6fff r-- 3000 1000 /system/lib/libnetd_client.so
|
||||
f6de7000-f6de7fff rw- 4000 1000 /system/lib/libnetd_client.so
|
||||
f6dec000-f6e74fff r-x 0 89000 /system/lib/libc++.so (BuildId: 8f1f2be4b37d7067d366543fafececa2) (load base 0x2000)
|
||||
f6e75000-f6e75fff --- 0 1000
|
||||
f6e76000-f6e79fff r-- 89000 4000 /system/lib/libc++.so
|
||||
f6e7a000-f6e7afff rw- 8d000 1000 /system/lib/libc++.so
|
||||
f6e7b000-f6e7bfff rw- 0 1000 [anon:.bss]
|
||||
f6e7c000-f6efdfff r-x 0 82000 /system/lib/libc.so (BuildId: d189b369d1aafe11feb7014d411bb9c3)
|
||||
f6efe000-f6f01fff r-- 81000 4000 /system/lib/libc.so
|
||||
f6f02000-f6f03fff rw- 85000 2000 /system/lib/libc.so
|
||||
f6f04000-f6f04fff rw- 0 1000 [anon:.bss]
|
||||
f6f05000-f6f05fff r-- 0 1000 [anon:.bss]
|
||||
f6f06000-f6f0bfff rw- 0 6000 [anon:.bss]
|
||||
f6f0c000-f6f21fff r-x 0 16000 /system/lib/libcutils.so (BuildId: d6d68a419dadd645ca852cd339f89741)
|
||||
f6f22000-f6f22fff r-- 15000 1000 /system/lib/libcutils.so
|
||||
f6f23000-f6f23fff rw- 16000 1000 /system/lib/libcutils.so
|
||||
f6f24000-f6f31fff r-x 0 e000 /system/lib/liblog.so (BuildId: e4d30918d1b1028a1ba23d2ab72536fc)
|
||||
f6f32000-f6f32fff r-- d000 1000 /system/lib/liblog.so
|
||||
f6f33000-f6f33fff rw- e000 1000 /system/lib/liblog.so
|
||||
</pre>
|
||||
|
||||
<p>Typically a shared library will have three adjacent entries. One will be
|
||||
readable and executable (code), one will be read-only (read-only
|
||||
data), and one will be read-write (mutable data). The first column
|
||||
shows the address ranges for the mapping, the second column the permissions
|
||||
(in the usual Unix <code>ls(1)</code> style), the third column the offset into the file
|
||||
(in hex), the fourth column the size of the region (in hex), and the fifth
|
||||
column the file (or other region name).</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
f6f34000-f6f53fff r-x 0 20000 /system/lib/libm.so (BuildId: 76ba45dcd9247e60227200976a02c69b)
|
||||
f6f54000-f6f54fff --- 0 1000
|
||||
f6f55000-f6f55fff r-- 20000 1000 /system/lib/libm.so
|
||||
f6f56000-f6f56fff rw- 21000 1000 /system/lib/libm.so
|
||||
f6f58000-f6f58fff rw- 0 1000
|
||||
f6f59000-f6f78fff r-- 0 20000 /dev/__properties__/u:object_r:default_prop:s0
|
||||
f6f79000-f6f98fff r-- 0 20000 /dev/__properties__/properties_serial
|
||||
f6f99000-f6f99fff rw- 0 1000 [anon:linker_alloc_vector]
|
||||
f6f9a000-f6f9afff r-- 0 1000 [anon:atexit handlers]
|
||||
f6f9b000-f6fbafff r-- 0 20000 /dev/__properties__/properties_serial
|
||||
f6fbb000-f6fbbfff rw- 0 1000 [anon:linker_alloc_vector]
|
||||
f6fbc000-f6fbcfff rw- 0 1000 [anon:linker_alloc_small_objects]
|
||||
f6fbd000-f6fbdfff rw- 0 1000 [anon:linker_alloc_vector]
|
||||
f6fbe000-f6fbffff rw- 0 2000 [anon:linker_alloc]
|
||||
f6fc0000-f6fc0fff r-- 0 1000 [anon:linker_alloc]
|
||||
f6fc1000-f6fc1fff rw- 0 1000 [anon:linker_alloc_lob]
|
||||
f6fc2000-f6fc2fff r-- 0 1000 [anon:linker_alloc]
|
||||
f6fc3000-f6fc3fff rw- 0 1000 [anon:linker_alloc_vector]
|
||||
f6fc4000-f6fc4fff rw- 0 1000 [anon:linker_alloc_small_objects]
|
||||
f6fc5000-f6fc5fff rw- 0 1000 [anon:linker_alloc_vector]
|
||||
f6fc6000-f6fc6fff rw- 0 1000 [anon:linker_alloc_small_objects]
|
||||
f6fc7000-f6fc7fff rw- 0 1000 [anon:arc4random _rsx structure]
|
||||
f6fc8000-f6fc8fff rw- 0 1000 [anon:arc4random _rs structure]
|
||||
f6fc9000-f6fc9fff r-- 0 1000 [anon:atexit handlers]
|
||||
f6fca000-f6fcafff --- 0 1000 [anon:thread signal stack guard page]
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Note that since Android 5.0 (Lollipop), the C library names most of its anonymous mapped
|
||||
regions so there are fewer mystery regions.
|
||||
</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
f6fcb000-f6fccfff rw- 0 2000 [stack:5081]
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Regions named <code>[stack:<i>tid</i>]</code> are the stacks for the given threads.
|
||||
</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
f6fcd000-f702afff r-x 0 5e000 /system/bin/linker (BuildId: 84f1316198deee0591c8ac7f158f28b7)
|
||||
f702b000-f702cfff r-- 5d000 2000 /system/bin/linker
|
||||
f702d000-f702dfff rw- 5f000 1000 /system/bin/linker
|
||||
f702e000-f702ffff rw- 0 2000
|
||||
f7030000-f7030fff r-- 0 1000
|
||||
f7031000-f7032fff rw- 0 2000
|
||||
ffcd7000-ffcf7fff rw- 0 21000
|
||||
ffff0000-ffff0fff r-x 0 1000 [vectors]
|
||||
</pre>
|
||||
|
||||
<p>Whether you see <code>[vector]</code> or <code>[vdso]</code> depends on the architecture. ARM uses [vector], while all other architectures use <a href="http://man7.org/linux/man-pages/man7/vdso.7.html">[vdso].</a></p>
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,64 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Debugging Native Memory Use</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
<h2 id="malloc_debug">Malloc debug</h2>
|
||||
|
||||
<p>See <a
|
||||
href="https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md">Malloc
|
||||
Debug</a> and <a
|
||||
href="https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README_api.md">Native
|
||||
Memory Tracking using libc Callbacks</a> for a thorough description of the
|
||||
debugging options available for native memory issues.</p>
|
||||
|
||||
<h2 id="malloc-stats">Malloc statistics</h2>
|
||||
|
||||
<p>
|
||||
Android supports the <a href="http://man7.org/linux/man-pages/man3/mallinfo.3.html"
|
||||
><code>mallinfo(3)</code></a>and <a href="http://man7.org/linux/man-pages/man3/malloc_info.3.html"
|
||||
><code>malloc_info(3)</code></a> extensions to <code><malloc.h></code></a>.
|
||||
The <code>malloc_info</code> function is available in Android 6.0 and later and
|
||||
its XML schema is documented in Bionic's
|
||||
<a href="https://android.googlesource.com/platform/bionic/+/master/libc/include/malloc.h"
|
||||
><code><malloc.h></code></a>.
|
||||
</p>
|
||||
|
||||
<h2 id="ddms">Dalvik Debug Monitor Server</h2>
|
||||
|
||||
<p>You can also use the <a
|
||||
href="https://developer.android.com/studio/profile/ddms.html">Dalvik Debug
|
||||
Monitor Server</a> (DDMS) to obtain a graphical view of Malloc Debug
|
||||
output.</p>
|
||||
|
||||
<p>To use DDMS, first turn on its native memory UI:</p>
|
||||
<ol>
|
||||
<li>Open <code>~/.android/ddms.cfg</code></li>
|
||||
<li>Add the line: <code>native=true</code></li>
|
||||
</ol>
|
||||
|
||||
<p>Upon relaunching DDMS and selecting a process, you can switch to the new
|
||||
native allocation tab and populate it with a list of allocations. This is
|
||||
especially useful for debugging memory leaks.</p>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,121 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Using Strace</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<p><a href="https://strace.io">Strace</a> enables you to see the system calls a
|
||||
process makes and what those system calls return. A typical process makes a lot
|
||||
of system calls, so you'll want to review the
|
||||
<a href="http://man7.org/linux/man-pages/man1/strace.1.html">strace man page</a>
|
||||
to learn how to collect only data you're actually interested in.</p>
|
||||
|
||||
<h2 id=build-strace>Building strace</h2>
|
||||
|
||||
<p>To build strace, run the following:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
mmma -j6 external/strace
|
||||
</pre>
|
||||
|
||||
<h2 id=attach-strace>Attaching to a running process</h2>
|
||||
|
||||
<p>The simplest and most common use case for strace is to attach it to a running
|
||||
process, which you can do with:</p>
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
adb shell strace -f -p PID
|
||||
</pre>
|
||||
<p>The <code>-f</code> flag tells strace to attach to all the threads in the
|
||||
process, plus any new threads spawned later.</p>
|
||||
|
||||
<h2 id=app-strace>Using on an application</h2>
|
||||
<p>To use strace on an application:</p>
|
||||
|
||||
<ol>
|
||||
<li>Set up a directory for strace logs:
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb shell setenforce 0</code>
|
||||
<code class="devsite-terminal">adb shell mkdir /data/local/tmp/strace</code>
|
||||
<code class="devsite-terminal">adb shell chmod 777 /data/local/tmp/strace</code>
|
||||
</pre>
|
||||
</li>
|
||||
|
||||
<li>Choose the process to trace before launching it:
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
adb shell setprop wrap.com.google.android.browser "logwrapper strace -f -o /data/local/tmp/strace/strace.com.google.android.browser.txt"
|
||||
</pre>
|
||||
</li>
|
||||
<li>Launch the process normally.</li>
|
||||
</ol>
|
||||
|
||||
<h2 id=zygote-systrace>Using on the zygote</h2>
|
||||
<p>To use strace on the zygote, fix the relevant <code>init.rc</code> zygote
|
||||
line (requires <code>adb shell setenforce 0</code>):
|
||||
</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">cd system/core/</code>
|
||||
<code class="devsite-terminal">patch -p1 <<EOF
|
||||
--- a/rootdir/init.zygote32.rc
|
||||
+++ b/rootdir/init.zygote32.rc
|
||||
@@ -1,4 +1,4 @@
|
||||
-service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
|
||||
+service zygote /system/xbin/strace -o /data/local/tmp/zygote.strace /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server
|
||||
class main
|
||||
socket zygote stream 660 root system
|
||||
onrestart write /sys/android_power/request_state wake
|
||||
EOF</code>
|
||||
</pre>
|
||||
|
||||
<h2 id=get-logs-boot>Getting strace logs during Android boot</h2>
|
||||
|
||||
<p>To get strace logs during Android boot, make the following changes:</p>
|
||||
|
||||
<ul>
|
||||
<li>Since the process name changes from <code>zygote</code> to
|
||||
<code>strace</code>, the given service may fail to start due to the missing
|
||||
SELinux <code>file_context</code> for <code>strace</code>. The solution is to
|
||||
add a new line for strace in <code>system/sepolicy/private/file_contexts</code>
|
||||
and copy the original file context over. Example:
|
||||
<pre class="devsite-click-to-copy">
|
||||
/dev/socket/zygote u:object_r:zygote_socket:s0
|
||||
+ /system/xbin/strace u:object_r:zygote_socket:s0
|
||||
</pre>
|
||||
</li>
|
||||
|
||||
<li>Add kernel command, then boot the device in SELinux permissive mode. You can
|
||||
do this by adding <code>androidboot.selinux=permissive</code>to
|
||||
<code>BOARD_KERNEL_CMDLINE</code>. (This variable becomes read-only in
|
||||
<code>build/core/Makefile</code> but is always available under
|
||||
<code>/device/*/BoardConfig</code>.)
|
||||
|
||||
<br>
|
||||
<br>Example for the Pixel (sailfish) device in
|
||||
<code>/device/google/marlin/sailfish/BoardConfig.mk</code>:
|
||||
<pre class="devsite-click-to-copy">
|
||||
- BOARD_KERNEL_CMDLINE := .... androidboot.hardware=sailfish ...
|
||||
+BOARD_KERNEL_CMDLINE := .... androidboot.hardware=sailfish ... androidboot.selinux=permissive
|
||||
</pre>
|
||||
After making the change, build and flash the boot image and the device will boot
|
||||
in permissive mode.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,335 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Understanding Systrace</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
|
||||
<p class="caution"><strong>Caution:</strong> If you've never used systrace
|
||||
before, we strongly recommend reading the
|
||||
<a href="https://developer.android.com/studio/profile/systrace.html">systrace
|
||||
overview</a> before continuing.</p>
|
||||
|
||||
<p>systrace is the primary tool for analyzing Android device performance.
|
||||
However, it's really a wrapper around other tools: It is the host-side wrapper
|
||||
around <em>atrace</em>, the device-side executable that controls userspace
|
||||
tracing and sets up <em>ftrace</em>, the primary tracing mechanism in the Linux
|
||||
kernel. systrace uses atrace to enable tracing, then reads the ftrace buffer and
|
||||
wraps it all in a self-contained HTML viewer. (While newer kernels have support
|
||||
for Linux Enhanced Berkeley Packet Filter (eBPF), the following documentation
|
||||
pertains to the 3.18 kernel (no eFPF) as that's what was used on the Pixel/Pixel
|
||||
XL.)</p>
|
||||
|
||||
<p>systrace is owned by the Google Android and Google Chrome teams and is
|
||||
developed in the open as part of the
|
||||
<a href="https://github.com/catapult-project/catapult">Catapult project</a>. In
|
||||
addition to systrace, Catapult includes other useful utilities. For example,
|
||||
ftrace has more features than can be directly enabled by systrace or atrace and
|
||||
contains some advanced functionality that is critical to debugging performance
|
||||
problems. (These features require root access and often a new kernel.)</p>
|
||||
|
||||
<h2 id="running_systrace">Running systrace</h2>
|
||||
<p>When debugging jitter on Pixel/Pixel XL, start with the following
|
||||
command:</p>
|
||||
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
./systrace.py sched freq idle am wm gfx view sync binder_driver irq workq input -b 96000
|
||||
</pre>
|
||||
|
||||
<p>When combined with the additional tracepoints required for GPU and display
|
||||
pipeline activity, this gives you the ability to trace from user input to frame
|
||||
displayed on screen. Set the buffer size to something large to avoid
|
||||
losing events (which usually manifests as some CPUs containing no events after
|
||||
some point in the trace).</p>
|
||||
|
||||
<p>When going through systrace, keep in mind that <strong>every event is
|
||||
triggered by something on the CPU</strong>.</p>
|
||||
|
||||
<p class="note"><strong>Note:</strong> Hardware interrupts are not controlled by
|
||||
the CPU and do trigger things in ftrace, but the actual commit to the trace log
|
||||
is done by the interrupt handler, which could have been delayed if your
|
||||
interrupt arrived while (for example) some other bad driver had interrupts
|
||||
disabled. The critical element is the CPU.</p>
|
||||
|
||||
<p>Because systrace is built on top of ftrace and ftrace runs on the CPU,
|
||||
something on the CPU must write the ftrace buffer that logs hardware changes.
|
||||
This means that if you're curious about why a display fence changed state, you
|
||||
can see what was running on the CPU at the exact point of its transition
|
||||
(something that is running on the CPU triggered that change in the log). This
|
||||
concept is the foundation of analyzing performance using systrace.</p>
|
||||
|
||||
<h2 id="example_1">Example: Working frame</h2>
|
||||
<p>This example describes a systrace for a normal UI pipeline. To follow along
|
||||
with the example, <a href="perf_traces.zip">download the zip file</a> of traces
|
||||
(which also includes other traces referred to in this section), upzip the file,
|
||||
and open the systrace_tutorial.html file in your browser. Be warned this
|
||||
systrace is a large file; unless you use systrace in your day-to-day work, this
|
||||
is likely a much bigger trace with much more information than you've ever seen
|
||||
in a single trace before.</p>
|
||||
<p>For a consistent, periodic workload such as TouchLatency, the UI pipeline
|
||||
contains the following:</p>
|
||||
<p></p>
|
||||
<ol>
|
||||
<li>EventThread in SurfaceFlinger wakes the app UI thread, signaling it's time
|
||||
to render a new frame.</li>
|
||||
<li>App renders frame in UI thread, RenderThread, and hwuiTasks, using CPU and
|
||||
GPU resources. This is the bulk of the capacity spent for UI.</li>
|
||||
<li>App sends rendered frame to SurfaceFlinger via binder and goes to
|
||||
sleep.</li>
|
||||
<li>A second EventThread in SurfaceFlinger wakes SurfaceFlinger to trigger
|
||||
composition and display output. If SurfaceFlinger determines there is no work to
|
||||
be done, it goes back to sleep.</li>
|
||||
<li>SurfaceFlinger handles composition via HWC/HWC2 or GL. HWC/HWC2 composition
|
||||
is faster and lower power but has limitations depending on the SOC. This usually
|
||||
takes ~4-6ms, but can overlap with step 2 because Android applications are
|
||||
always triple-buffered. (While applications are always triple-buffered, there
|
||||
may only be one pending frame waiting in SurfaceFlinger, which makes it appear
|
||||
identical to double-buffering.)</li>
|
||||
<li>SurfaceFlinger dispatches final output to display via vendor driver and goes
|
||||
back to sleep, waiting for EventThread wakeup.</li>
|
||||
</ol>
|
||||
|
||||
<p>Let's walk through the frame beginning at 15409ms:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_normal_pipeline.png"></p>
|
||||
<p class="img-caption"><strong>Figure 1.</strong> Normal UI pipeline,
|
||||
EventThread running.</p>
|
||||
|
||||
<p>Figure 1 is a normal frame surrounded by normal frames, so it's a good
|
||||
starting point for understanding how the UI pipeline works. The UI thread row
|
||||
for TouchLatency includes different colors at different times. Bars denote
|
||||
different states for the thread:</p>
|
||||
|
||||
<ul>
|
||||
<li><strong>Gray</strong>. Sleeping.</li>
|
||||
<li><strong>Blue</strong>. Runnable (it could run, but the scheduler hasn't
|
||||
picked it to run yet).</li>
|
||||
<li><strong>Green</strong>. Actively running (the scheduler thinks it's
|
||||
running).
|
||||
<p class="note"><strong>Note</strong>: Interrupt handlers aren't shown in the
|
||||
normal per-CPU timeline, so it's possible that you're actually running interrupt
|
||||
handlers or softirqs during some portion of a thread's runtime. Check the irq
|
||||
section of the trace (under process 0) to confirm whether an interrupt is
|
||||
running instead of a standard thread.</p>
|
||||
</li>
|
||||
<li><strong>Red</strong>. Uninterruptible sleep (generally sleeping on a lock
|
||||
in the kernel). Can be indicative of I/O load. Extremely useful for debugging
|
||||
performance issues.</li>
|
||||
<li><strong>Orange</strong>. Uninterruptible sleep due to I/O load.</li>
|
||||
</ul>
|
||||
|
||||
<p>To view the reason for uninterruptible sleep (available from the
|
||||
<code>sched_blocked_reason</code> tracepoint), select the red uninterruptible
|
||||
sleep slice.</p>
|
||||
<p>While EventThread is running, the UI thread for TouchLatency becomes
|
||||
runnable. To see what woke it, click the blue section:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_tl.png"></p>
|
||||
<p class="img-caption"><strong>Figure 2.</strong> UI thread for
|
||||
TouchLatency.</p>
|
||||
|
||||
<p>Figure 2 shows the TouchLatency UI thread was woken by tid 6843, which
|
||||
corresponds to EventThread. The UI thread wakes:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_wake_render_enqueue.png"></p>
|
||||
<p class="img-caption"><strong>Figure 3.</strong> UI thread wakes, renders a
|
||||
frame, and enqueues it for SurfaceFlinger to consume.</p>
|
||||
|
||||
<p>If the <code>binder_driver</code> tag is enabled in a trace, you can select a
|
||||
binder transaction to view information on all of the processes involved in that
|
||||
transaction:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_binder_trans.png"></p>
|
||||
<p class="img-caption"><strong>Figure 4.</strong> Binder transaction.</p>
|
||||
|
||||
<p>Figure 4 shows that, at 15,423.65ms, Binder:6832_1 in SurfaceFlinger becomes
|
||||
runnable because of tid 9579, which is TouchLatency's RenderThread. You can also
|
||||
see queueBuffer on both sides of the binder transaction.</p>
|
||||
|
||||
<p>During the queueBuffer on the SurfaceFlinger side, the number of pending
|
||||
frames from TouchLatency goes from 1 to 2:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_pending_frames.png"></p>
|
||||
<p class="img-caption"><strong>Figure 5.</strong> Pending frames goes from 1 to
|
||||
2.</p>
|
||||
|
||||
<p>Figure 5 shows triple-buffering, where there are two completed frames and the
|
||||
app will soon start rendering a third. This is because we've already dropped
|
||||
some frames, so the app keeps two pending frames instead of one to try to avoid
|
||||
further dropped frames.</p>
|
||||
|
||||
<p>Soon after, SurfaceFlinger's main thread is woken by a second EventThread so
|
||||
it can output the older pending frame to the display:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_sf_woken_et.png"></p>
|
||||
<p class="img-caption"><strong>Figure 6.</strong> SurfaceFlinger's main thread
|
||||
is woken by a second EventThread.</p>
|
||||
|
||||
<p>SurfaceFlinger first latches the older pending buffer, which causes the
|
||||
pending buffer count to decrease from 2 to 1:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_sf_latches_pend.png"></p>
|
||||
<p class="img-caption"><strong>Figure 7.</strong> SurfaceFlinger first latches
|
||||
the older pending buffer.</p>
|
||||
|
||||
<p>After latching the buffer, SurfaceFlinger sets up composition and submits the
|
||||
final frame to the display. (Some of these sections are enabled as part of the
|
||||
<code>mdss</code> tracepoint, so they may not be there on your SOC.)</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_sf_comp_submit.png"></p>
|
||||
<p class="img-caption"><strong>Figure 8.</strong> SurfaceFlinger sets up
|
||||
composition and submits the final frame.</p>
|
||||
|
||||
<p>Next, <code>mdss_fb0</code> wakes on CPU 0. <code>mdss_fb0</code> is the
|
||||
display pipeline's kernel thread for outputting a rendered frame to the display.
|
||||
We can see <code>mdss_fb0</code> as its own row in the trace (scroll down to
|
||||
view).</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_wake_cpu0.png"></p>
|
||||
<p class="img-caption"><strong>Figure 9.</strong> <code>mdss_fb0</code> wakes
|
||||
on CPU 0.</p>
|
||||
|
||||
<p><code>mdss_fb0</code> wakes up, runs for a bit, enters uninterruptible sleep,
|
||||
then wakes again.</p>
|
||||
|
||||
<p class="note"><strong>Note</strong>: From this point forward, the trace is
|
||||
more complicated as the final work is split between <code>mdss_fb0</code>,
|
||||
interrupts, and workqueue functions. If you need that level of detail, refer to
|
||||
the exact characteristics of the driver stack for your SOC (as what happens on
|
||||
the Pixel XL might not be useful to you).</p>
|
||||
|
||||
<h2 id="example_2">Example: Non-working frame</h2>
|
||||
<p>This example describes a systrace used to debug Pixel/Pixel XL jitter. To
|
||||
follow along with the example, <a href="/devices/tech/debug/perf_traces.zip">download the zip
|
||||
file</a> of traces (which also includes other traces referred to in this
|
||||
section), upzip the file, and open the systrace_tutorial.html file in your
|
||||
browser.</p>
|
||||
|
||||
<p>When you first open the systrace, you'll see something like this:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_tl_pxl.png"></p>
|
||||
<p class="img-caption"><strong>Figure 10</strong>. TouchLatency running on Pixel
|
||||
XL (most options enabled, including mdss and kgsl tracepoints).</p>
|
||||
|
||||
<p>When looking for jank, check the FrameMissed row under SurfaceFlinger.
|
||||
FrameMissed is a quality-of-life improvement provided by Hardware Composer 2
|
||||
(HWC2). As of December 2016, HWC2 is used only on Pixel/Pixel XL; when viewing
|
||||
systrace for other devices, the FrameMissed row may not be present. In either
|
||||
case, FrameMissed is correlated with SurfaceFlinger missing one of its
|
||||
extremely-regular runtimes and an unchanged pending-buffer count for the app
|
||||
(<code>com.prefabulated.touchlatency</code>) at a vsync:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_fm_sf.png"></p>
|
||||
<p class="img-caption"><strong>Figure 11</strong>. FrameMissed correlation with
|
||||
SurfaceFlinger.</p>
|
||||
|
||||
<p>Figure 11 shows a missed frame at 15598.29ms. SurfaceFlinger woke briefly at
|
||||
the vsync interval and went back to sleep without doing any work, which means
|
||||
SurfaceFlinger determined it was not worth trying to send a frame to the display
|
||||
again. Why?</p>
|
||||
|
||||
<p>To understand how the pipeline broke down for this frame, first review the
|
||||
working frame example above to see how a normal UI pipeline appears in systrace.
|
||||
When ready, return to the missed frame and work backwards. Notice that
|
||||
SurfaceFlinger wakes and immediately goes to sleep. When viewing the number of
|
||||
pending frames from TouchLatency, there are two frames (a good clue to help
|
||||
figure out what's going on).</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_sf_wake_sleep.png"></p>
|
||||
<p class="img-caption"><strong>Figure 12.</strong> SurfaceFlinger wakes and
|
||||
immediately goes to sleep.</p>
|
||||
|
||||
<p>Because we have frames in SurfaceFlinger, it's not an app issue. In addition,
|
||||
SurfaceFlinger is waking at the correct time, so it's not a SurfaceFlinger
|
||||
issue. If SurfaceFlinger and the app are both looking normal, it's probably a
|
||||
driver issue.</p>
|
||||
|
||||
<p>Because the <code>mdss</code> and <code>sync</code> tracepoints are enabled,
|
||||
we can get information about the fences (shared between the display driver and
|
||||
SurfaceFlinger) that control when frames are actually submitted to the display.
|
||||
The fences we care about are listed under <code>mdss_fb0_retire</code>, which
|
||||
denotes when a frame is actually on the display. These fences are provided as
|
||||
part of the <code>sync</code> trace category. Which fences correspond to
|
||||
particular events in SurfaceFlinger depends on your SOC and driver stack, so
|
||||
work with your SOC vendor to understand the meaning of fence categories in your
|
||||
traces.</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_traces_fences.png"></p>
|
||||
<p class="img-caption"><strong>Figure 13.</strong> <code>mdss_fb0_retire</code>
|
||||
fences.</p>
|
||||
|
||||
<p>Figure 13 shows a frame that was displayed for 33ms, not 16.7ms as expected.
|
||||
Halfway through that slice, that frame should have been replaced by a new one
|
||||
but wasn't. View the previous frame and look for anything interesting:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_frame_previous.png"></p>
|
||||
<p class="img-caption"><strong>Figure 14.</strong> Frame previous to busted
|
||||
frame.</p>
|
||||
|
||||
<p>Figure 14 shows 14.482ms a frame. The busted two-frame segment was 33.6ms,
|
||||
which is roughly what we would expect for two frames (we render at 60Hz, 16.7ms
|
||||
a frame, which is close). But 14.482ms is not anywhere close to 16.7ms, which
|
||||
suggests that something is very wrong with the display pipe.</p>
|
||||
|
||||
<p>Investigate exactly where that fence ends to determine what controls it:</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_fence_end.png"></p>
|
||||
<p class="img-caption"><strong>Figure 15.</strong> Investigate fence end.</p>
|
||||
|
||||
<p>A workqueue contains a <code>__vsync_retire_work_handler</code> that is
|
||||
running when the fence changes. Looking through the kernel source, you can see
|
||||
it's part of the display driver. It definitely appears to be on the critical
|
||||
path for the display pipeline, so it must run as quickly as possible. It's
|
||||
runnable for 70us or so (not a long scheduling delay), but it's a workqueue and
|
||||
might not get scheduled accurately.</p>
|
||||
|
||||
<p>Check the previous frame to determine if that contributed; sometimes jitter
|
||||
can add up over time and eventually cause us to miss a deadline.</p>
|
||||
|
||||
<p><img src="/devices/tech/debug/images/perf_trace_previous_frame.png"></p>
|
||||
<p class="img-caption"><strong>Figure 16.</strong> Previous frame.</p>
|
||||
|
||||
<p>The runnable line on the kworker isn't visible because the viewer turns it
|
||||
white when it's selected, but the statistics tell the story: 2.3ms of scheduler
|
||||
delay for part of the display pipeline critical path is <strong>bad</strong>.
|
||||
Before we do anything else, we should fix that by moving this part of the
|
||||
display pipeline critical path from a workqueue (which runs as a
|
||||
<code>SCHED_OTHER</code> CFS thread) to a dedicated <code>SCHED_FIFO</code>
|
||||
kthread. This function needs timing guarantees that workqueues can't (and aren't
|
||||
intended to) provide.</p>
|
||||
|
||||
<p>Is this the reason for the jank? It's hard to say conclusively. Outside of
|
||||
easy-to-diagnose cases such as kernel lock contention causing display-critical
|
||||
threads to sleep, traces usually won't tell you directly what the problem is.
|
||||
Could this jitter have been the cause of the dropped frame? Absolutely. The
|
||||
fence times should be 16.7ms, but they aren't close to that at all in the frames
|
||||
leading up to the dropped frame. (There's a 19ms fence followed by a 14ms
|
||||
fence.) Given how tightly coupled the display pipeline is, it's entirely
|
||||
possible the jitter around fence timings resulted in an eventual dropped
|
||||
frame.</p>
|
||||
|
||||
<p>In this example, the solution involved converting
|
||||
<code>__vsync_retire_work_handler</code> from a workqueue to a dedicated
|
||||
kthread. This resulted in noticeable jitter improvements and reduced jank in the
|
||||
bouncing ball test. Subsequent traces show fence timings that hover very close
|
||||
to 16.7ms.</p>
|
||||
|
||||
</body>
|
||||
</html>
|
|
@ -0,0 +1,75 @@
|
|||
<html devsite>
|
||||
<head>
|
||||
<title>Using Valgrind</title>
|
||||
<meta name="project_path" value="/_project.yaml" />
|
||||
<meta name="book_path" value="/_book.yaml" />
|
||||
</head>
|
||||
<body>
|
||||
<!--
|
||||
Copyright 2017 The Android Open Source Project
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
<p>The <a href="http://valgrind.org/">Valgrind</a> tool suite includes Memcheck
|
||||
(for detecting memory-related errors in C and C++), Cachegrind (a cache
|
||||
profiler), Massif (a heap profiler), and several other tools.</p>
|
||||
|
||||
<p class="note"><strong>Note:</strong> While you can use Valgrind for Android
|
||||
debugging, most Android platform developers use
|
||||
<a href="/devices/tech/debug/asan.html">AddressSanitizer</a> instead.</p>
|
||||
|
||||
<h2 id=build-valgrind>Building Valgrind</h2>
|
||||
<p>To build Valgrind:</p>
|
||||
<pre class="devsite-terminal devsite-click-to-copy">
|
||||
mmma -j6 external/valgrind
|
||||
</pre>
|
||||
|
||||
<h2 id=app-valgrind>Using on an application</h2>
|
||||
<p>To use Valgrind on an application:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb shell setprop wrap.<em>app_name</em> "logwrapper valgrind"</code>
|
||||
<code class="devsite-terminal">adb shell am start -a android.intent.action.MAIN -n <em>app_name</em>/.MainActivity</code>
|
||||
</pre>
|
||||
|
||||
<code><em>app_name</em></code> must be a fully-qualified name such as
|
||||
<strong>com.google.example_app</strong>.
|
||||
|
||||
<h2 id=server-valgrind>Using on the system server</h2>
|
||||
<p>To run the system server with Valgrind:</p>
|
||||
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb shell setprop wrap.system_server "logwrapper valgrind"</code>
|
||||
<code class="devsite-terminal">adb shell stop && adb shell start</code>
|
||||
</pre>
|
||||
|
||||
|
||||
<h2 id=symbols-valgrind>Getting debug symbols</h2>
|
||||
<p>For debug symbols, push unstripped libraries to <code>/data/local/symbols</code>:</p>
|
||||
<pre class="devsite-click-to-copy">
|
||||
<code class="devsite-terminal">adb shell mkdir /data/local/symbols</code>
|
||||
<code class="devsite-terminal">adb push $OUT/symbols /data/local/symbols</code>
|
||||
</pre>
|
||||
|
||||
<h2 id=boot-valgrind>Using during boot up</h2>
|
||||
<p>To use Valgrind during boot up, edit
|
||||
<code>out/target/product/XXXX/root/init.rc</code> by changing</p>
|
||||
<pre class="devsite-click-to-copy">service example /system/bin/foo --arg1 --arg2</pre>
|
||||
<p>to:</p>
|
||||
<pre class="devsite-click-to-copy">>service example /system/bin/logwrapper /system/bin/valgrind /system/bin/foo --arg1 --arg2</pre>
|
||||
<p>To see the effects, create a <code>boot.img</code> and reflash the
|
||||
device.</p>
|
||||
|
||||
</body>
|
||||
</html>
|