upload android base code part4
parent
b9e30e05b1
commit
78ea2404cd
23455 changed files with 5250148 additions and 0 deletions
360
android/cts/hostsidetests/sustainedperf/dhrystone/Rationale
Normal file
@ -0,0 +1,360 @@
Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

Reinhold P. Weicker
Siemens AG, E STE 35
Postfach 3240
D-8520 Erlangen
Germany (West)

The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark; it gives a first
performance indication which is more meaningful than MIPS numbers
which, in their literal meaning (million instructions per second),
cannot be used across different instruction sets (e.g. RISC vs. CISC).
With the increasing use of the benchmark, it seems necessary to
reconsider the benchmark and to check whether it can still fulfill this
function. Version 2 of Dhrystone is the result of such a
re-evaluation; it has been made for two reasons:

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
  and C have been distributed by Reinhold Weicker via floppy disk.
  However, the version that was used most often for benchmarking has
  been the version made by Rick Richardson by another translation from
  the Ada version into the C programming language; this has been the
  version distributed via the UNIX network Usenet [2].

  There is an obvious need for a common C version of Dhrystone, since C
  is at present the most popular system programming language for the
  class of systems (microcomputers, minicomputers, workstations) where
  Dhrystone is used most. There should be, as far as possible, only
  one C version of Dhrystone such that results can be compared without
  restrictions. In the past, the C versions distributed by Rick
  Richardson (Version 1.1) and by Reinhold Weicker had small (though
  not significant) differences.

  Together with the new C version, the Ada and Pascal versions have
  been updated as well.

o As far as it is possible without changes to the Dhrystone statistics,
  optimizing compilers should be prevented from removing significant
  statements. It has turned out in the past that optimizing compilers
  suppressed code generation for too many statements (by "dead code
  removal" or "dead variable elimination"). This has led to the
  danger that benchmarking results obtained by a naive application of
  Dhrystone - without inspection of the code that was generated - could
  become meaningless.

The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should
remain unchanged as much as possible. (Very few changes were
necessary; their impact should be negligible.) Also, the order of
statements should remain unchanged. Although I am aware of some
critical remarks on the benchmark - I agree with several of them - and
know some suggestions for improvement, I didn't want to change the
benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone" since this denotes the program
published in [1]. However, I do recognize the need for a larger number
of representative programs that can be used as benchmarks; users should
always be encouraged to use more than just one benchmark.

The new versions (version 2.1 for C, Pascal and Ada) will be
distributed as widely as possible. (Version 2.1 differs from version
2.0 distributed via the UNIX network Usenet in March 1988 only in a few
corrections for minor deficiencies found by users of version 2.0.)
Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.

In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the
published (Ada) version and the C versions previously distributed.
Where the versions distributed by Rick Richardson [2] and Reinhold
Weicker have been different, it follows the version distributed by
Reinhold Weicker. (However, the differences have been so small that
their impact on execution time in all likelihood has been negligible.)
The initialization and UNIX instrumentation part - which had been
omitted in [1] - follows mostly the ideas of Rick Richardson [2].
However, any changes in the initialization part and in the printing of
the result have no impact on performance measurement since they are
outside the measurement loop. As a concession to older compilers,
names have been made unique within the first 8 characters for the C
version.

The original publication of Dhrystone did not contain any statements
for time measurement since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If
the variables that are computed are not used somehow, there is the
danger that the compiler considers them as "dead variables" and
suppresses code generation for a part of the statements. Therefore in
version 2 all variables of "main" are printed at the end of the
program. This also permits some plausibility control for correct
execution of the benchmark.

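The effect of using the computed values after the timed loop can be
sketched as follows. This is an illustration under stated assumptions,
not the benchmark's source: work_step is a hypothetical stand-in for the
Dhrystone procedures, and the names run_loop and report are likewise
invented for this sketch. The timing uses the standard C function
clock().

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for one pass through the benchmark body. */
int work_step(int value, long run_index)
{
    return (value * 3 + (int)(run_index % 7)) % 1000;
}

/* Runs the timed loop "runs" times, stores the elapsed processor time
   in *seconds and returns the last computed value. */
int run_loop(long runs, double *seconds)
{
    int int_loc = 1;
    long run_index;
    clock_t begin;

    assert(runs >= 1 && seconds != NULL);
    begin = clock();
    for (run_index = 1; run_index <= runs; ++run_index)
        int_loc = work_step(int_loc, run_index);
    *seconds = (double)(clock() - begin) / CLOCKS_PER_SEC;
    return int_loc;
}

/* Printing the result after the loop means the value is used, so the
   compiler cannot treat it as a dead variable; the printed value also
   serves as a plausibility check. */
void report(int result, double seconds)
{
    printf("final value: %d\n", result);
    printf("elapsed:     %.3f s\n", seconds);
}
```

A driver would call run_loop with the desired number of runs and pass
the returned value and the elapsed time to report. If the call to report
were omitted and the result ignored, an optimizing compiler would be
free to drop much of the loop body.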
At several places in the benchmark, code has been added, but only in
branches that are not executed. The intention is that optimizing
compilers should be prevented from moving code out of the measurement
loop, or from removing code altogether. Statements that are executed
have been changed in very few places only. In these cases, only the
role of some operands has been changed, and it was made sure that the
numbers defining the "Dhrystone distribution" (distribution of
statements, operand types and locality) still hold as much as possible.
Except for sophisticated optimizing compilers, execution times for
version 2.1 should be the same as for previous versions.

Because of the self-imposed limitation that the order and distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for some statements.
To a certain degree, this is unavoidable for small synthetic
benchmarks. Users of the benchmark are advised to check code listings
to verify that code is generated for all statements of Dhrystone.

Contrary to the suggestion in the published paper and its realization
in the versions previously distributed, no attempt has been made to
subtract the time for the measurement loop overhead. (This calculation
has proven difficult to implement in a correct way, and its omission
makes the program simpler.) However, since the loop check is now part
of the benchmark, this does have an impact - though a very minor one -
on the distribution statistics which have been updated for this
version.

In this section, all changes are described that affect the measurement
loop and that are not just renamings of variables. All remarks refer to
the C version; the other language versions have been updated similarly.

In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:

o In procedure "main", three statements have been added in the
  non-executed "then" part of the statement
      if (Enum_Loc == Func_1 (Ch_Index, 'C'))
  They are
      strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
      Int_2_Loc = Run_Index;
      Int_Glob = Run_Index;
  The string assignment prevents movement of the preceding assignment
  to Str_2_Loc (5'th statement of "main") out of the measurement loop.
  (This probably will not happen for the C version, but it did happen
  with another language and compiler.) The assignment to Int_2_Loc
  prevents value propagation for Int_2_Loc, and the assignment to
  Int_Glob makes the value of Int_Glob possibly dependent on the
  value of Run_Index.

o In the three arithmetic computations at the end of the measurement
  loop in "main", the role of some variables has been exchanged, to
  prevent the division from just cancelling out the multiplication as
  it did in [1]. A very smart compiler might have recognized this and
  suppressed code generation for the division.

o For Proc_2, no code has been changed, but the values of the actual
  parameter have changed due to changes in "main".

o In Proc_4, the second assignment has been changed from
      Bool_Loc = Bool_Loc | Bool_Glob;
  to
      Bool_Glob = Bool_Loc | Bool_Glob;
  It now assigns a value to a global variable instead of a local
  variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not
  used afterwards.

o In Func_1, the statement
      Ch_1_Glob = Ch_1_Loc;
  was added in the non-executed "else" part of the "if" statement, to
  prevent the suppression of code generation for the assignment to
  Ch_1_Loc.

o In Func_2, the second character comparison statement has been changed
  to
      if (Ch_Loc == 'R')
  ('R' instead of 'X') because a comparison with 'X' is implied in the
  preceding "if" statement.

  Also in Func_2, the statement
      Int_Glob = Int_Loc;
  has been added in the non-executed part of the last "if" statement,
  in order to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has been added to the "if"
  statement. While the program would not be incorrect without this
  "else" part, it is considered bad programming practice if a function
  can be left without a return value.

  To compensate for this change, the (non-executed) "else" part in the
  "if" statement of Proc_3 was removed.

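The exchange of operand roles in the arithmetic computations of "main"
can be illustrated with a small sketch. It is not a quotation of either
version of the benchmark; the function and parameter names below are
hypothetical. In the first form the division exactly undoes the
preceding multiplication, so a compiler may replace it by a simple copy.
In the second form the divisor is a different variable, and the
quotient has to be computed.

```c
#include <assert.h>

/* Hypothetical form in which the division cancels the multiplication:
   (b * a) / a is simply b whenever a is non-zero and b * a does not
   overflow, so a compiler need not emit a divide instruction. */
int cancelling_form(int a, int b)
{
    int product;

    assert(a != 0);
    product = b * a;
    return product / a;
}

/* Hypothetical form with exchanged operand roles: the divisor c is
   unrelated to the two factors, so the division must be carried out. */
int non_cancelling_form(int a, int b, int c)
{
    int product;

    assert(c != 0);
    product = b * a;
    return product / c;
}
```

For example, cancelling_form(5, 3) yields 3 without any need for an
actual division, whereas non_cancelling_form(5, 3, 7) has to compute
15 / 7.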
The distribution statistics have been changed only by the addition of
the measurement loop iteration (1 additional statement, 4 additional
local integer operands) and by the change in Proc_4 (one operand
changed from local to global). The distribution statistics in the
comment headers have been updated accordingly.

The string operations (string assignment and string comparison) have
not been changed, to keep the program consistent with the original
version.

There has been some concern that the string operations are
over-represented in the program, and that execution time is dominated
by these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone
was first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is, at least in all implementations
known to me, considerably smaller. In Ada and Pascal, assignment and
comparison of strings are operators defined in the language, and the
upper bounds of the strings occurring in Dhrystone are part of the type
information known at compilation time. The compilers can therefore
generate efficient inline code. In C, string assignment and comparison
are not part of the language, so the string operations must be
expressed in terms of the C library functions "strcpy" and "strcmp".
(ANSI C allows an implementation to use inline code for these
functions.) In addition to the overhead caused by additional function
calls, these functions are defined for null-terminated strings where
the length of the strings is not known at compilation time; the
function has to check every byte for the termination condition (the
null byte).

Obviously, a C library which includes efficiently coded "strcpy" and
"strcmp" functions helps to obtain good Dhrystone results. However, I
don't think that this is unfair since string functions do occur quite
frequently in real programs (editors, command interpreters, etc.). If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs.
For consistency with the original benchmark, I didn't change the
program despite this weakness.

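To make the scanning cost concrete, the sketch below counts how many
characters a byte-wise comparison of two null-terminated strings has to
inspect before it can stop. The function name chars_scanned is
hypothetical, and the sample strings mentioned afterwards are
assumptions patterned after the "3'RD STRING" literal quoted earlier in
this text; they are not taken from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the character positions a byte-wise comparison must inspect.
   It stops at the first position where the strings differ or where
   both contain the terminating null byte; that last inspected
   position is included in the count. */
size_t chars_scanned(const char *s1, const char *s2)
{
    size_t i = 0;

    assert(s1 != NULL && s2 != NULL);
    while (s1[i] == s2[i] && s1[i] != '\0')
        ++i;
    return i + 1;
}
```

Called with "DHRYSTONE PROGRAM, 1'ST STRING" and "DHRYSTONE PROGRAM,
2'ND STRING", the function returns 20: the first 19 characters agree,
and the difference is found at the 20th. A length-aware comparison, as
available with Ada or Pascal string types, would know the bounds in
advance and would not need the per-byte test for the null terminator.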
When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written to reflect actual
  programming practice in systems programming. The division into
  several compilation units (5 in the Ada version, 2 in the C version)
  is intended, as is the distribution of inter-module and intra-module
  subprogram calls. Although on many systems there will be no
  difference in execution time to a Dhrystone version where all
  compilation units are merged into one file, the rule is that separate
  compilation should be used. The intention is that real programming
  practice, where programs consist of several independently compiled
  units, should be reflected. This also implies that the compiler,
  while compiling one unit, has no information about the use of
  variables, register allocation etc. occurring in other compilation
  units. Although in real life compilation units will probably be
  larger, the intention is that these effects of separate compilation
  are modeled in Dhrystone.

  A few language systems have post-linkage optimization available
  (e.g., final register allocation is performed after linkage). This
  is a borderline case: Post-linkage optimization involves additional
  program preparation time (although not as much as compilation in one
  unit) which may prevent its general use in practical programming. I
  think that since it defeats the intentions given above, it should not
  be used for Dhrystone.

  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate compilation. Although most commercial Pascal compilers
  provide separate compilation in some way, we cannot use it for
  Dhrystone since such a version would not be portable. Therefore, no
  attempt has been made to provide a Pascal version with several
  compilation units.

o No procedure merging

  Although Dhrystone contains some very short procedures where
  execution would benefit from procedure merging (inlining, macro
  expansion of procedures), procedure merging is not to be used. The
  reason is that the percentage of procedure and function calls is part
  of the "Dhrystone distribution" of statements contained in [1]. This
  restriction does not hold for the string functions of the C version
  since ANSI C allows an implementation to use inline code for these
  functions.

o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code
  generation" and "optimization" in compilers: Some compilers perform
  operations by default that are invoked in other compilers only when
  optimization is explicitly requested. Also, it cannot be avoided that
  people doing benchmarking try to achieve results that look as good as
  possible. Therefore, optimizations performed by compilers - other
  than those listed above - are not forbidden when Dhrystone execution
  times are measured. Dhrystone is not intended to be non-optimizable
  but is intended to be optimizable to a similar degree as normal
  programs. For example, there are several places in Dhrystone where
  performance benefits from optimizations like common subexpression
  elimination, value propagation etc., but normal programs usually also
  benefit from these optimizations. Therefore, no effort was made to
  artificially prevent such optimizations. However, measurement reports
  should indicate which compiler optimization levels have been used,
  and reporting results with different levels of compiler optimization
  for the same hardware is encouraged.

o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without additional qualification,
  they should be understood as results obtained without use of the
  "register" attribute. Good compilers should be able to make good use
  of registers even without explicit register declarations ([3], p.
  193).

Of course, for experimental purposes, post-linkage optimization,
procedure merging and/or compilation in one unit can be done to
determine their effects. However, Dhrystone numbers obtained under
these conditions should be explicitly marked as such; "normal"
Dhrystone results should be understood as results obtained following
the ground rules listed above.

In any case, for serious performance evaluation, users are advised to
ask for code listings and to check them carefully. In this way, when
results for different systems are compared, the reader can get a
feeling for how much performance difference is due to compiler
optimization and how much is due to hardware speed.

The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
"Version 1.1" distributed previously by him over the UNIX network
Usenet. Through his activity with Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of the benchmark. I
also thank Chaim Benedelac (National Semiconductor), David Ditzel
(SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
Saavedra-Barrera (UC at Berkeley) for their help with comments on
earlier versions of the benchmark.

[1]
   Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
   Benchmark.
   Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2]
   Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
   Informal Distribution via "Usenet", Last Version Known to me: Sept.
   21, 1987

[3]
   Brian W. Kernighan and Dennis M. Ritchie: The C Programming
   Language.
   Prentice-Hall, Englewood Cliffs (NJ) 1978