dcpiprofileme(1)
NAME
dcpiprofileme - Uses HP DCPI to collect and view ProfileMe data
COLLECTING PROFILEME SAMPLES
On an EV67 (or later) system, tell DCPI to gather ProfileMe data via the command:
dcpid -event pm <profile db dir>
This causes DCPI to collect ProfileMe samples. The data for
each sample is decomposed into named bit and counter values.
BIT NAMES AND THEIR MEANINGS
- retired
- The instruction retired, that is, it was not in the shadow of any trap.
However, it may have caused a mispredict trap.
- taken
- The conditional branch was taken. This bit is undefined for samples for
instructions other than conditional branches or for a conditional branch
when it mispredicts.
- cbrmispredict
- The conditional branch was mispredicted. This bit is clear for
instructions other than conditional branches.
- valid
- The instruction retired and did not cause a trap.
- nyp
- Stands for Not Yet Prefetched. Indicates that when the fetcher asked
for the fetch block containing the instruction, the instruction was not in
the icache and the prefetcher had not yet initiated an off-chip request for
the instruction.
If nyp is set, the instruction's fetch block definitely caused an icache
miss stall.
If nyp is clear, the instruction's fetch block may have still caused an
icache miss stall: the prefetcher may have made an off-chip request for the
instruction, but the instruction may not have arrived at the time the
fetcher needed it.
- ldstorder
- Indicates that a replay trap was caused by one of the
following:
- load store order
a younger load issuing before an older store to the same physical
address
- troll order
a younger load issuing before an older store where the dcache indexes
for the physical addresses match but the higher order address bits are
different
- simultaneous load and store
a load and a store to the same physical address issuing simultaneously
In all three cases, the younger instruction causes a replay trap.
Untested.
- map_stall
- The instruction stalled after it was fetched and before it was mapped.
Such stalls are caused by a shortage of physical registers, integer issue
queue space, floating-point issue queue space, or inums. There are 80 inums
used to track instructions that are in flight.
- early_kill
- The instruction was killed early in the pipeline (before it entered an
issue queue).
- late_kill
- The instruction was killed late in the pipeline.
COUNTER NAMES AND THEIR MEANINGS
- retdelay
- A lower bound on the number of cycles that the instruction's inum
delayed the advance of the retire pointer. Large values indicate a probable
performance problem. For example, the retdelay of the first instruction that uses
the result of a load that misses out to memory might have a value of 100.
- inflight
- For instructions that retired without trapping (retired^notrap), this is
-3 plus the number of cycles elapsed from when the instruction exited the
fetch stage until the instruction retired (that is, approximately the number of
cycles that the instruction was inflight).
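As a small illustration of the inflight definition above, the following Python sketch recovers the approximate fetch-exit-to-retire latency from a sample. The dictionary representation of a sample and the function name are assumptions made for this example; they are not part of any DCPI tool or file format.

```python
def cycles_fetch_to_retire(sample):
    """Approximate cycles from leaving the fetch stage until retirement.

    `sample` is a hypothetical dict of ProfileMe bit and counter values.
    The counter is described only for retired^notrap samples, so other
    samples yield None.
    """
    if not (sample.get("retired") and sample.get("notrap")):
        return None
    # inflight is "-3 plus the number of cycles", so add the 3 back.
    return sample["inflight"] + 3


if __name__ == "__main__":
    print(cycles_fetch_to_retire({"retired": 1, "notrap": 1, "inflight": 17}))
```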
TRAP BIT NAMES AND THEIR MEANINGS
Exactly one trap bit is set in any given ProfileMe sample.
- notrap
- None of the traps listed below occurred.
- mispredict
- The instruction caused a JSR/RET/JMP/JMP_COROUTINE or conditional branch
mispredict.
- replays
- The instruction caused a replay trap.
- unaligntrap
- The instruction caused an unaligned load or store.
- dtbmiss
- The instruction caused a DTB single miss.
- dtb2miss3
- The instruction caused a DTB double miss. (3-level page tables)
- dtb2miss4
- The instruction caused a DTB double miss. (4-level page tables)
- itbmiss
- The instruction caused an Instruction TLB miss. Most other bit and
counter values will be those for the first instruction in the ITB miss handler.
- arithtrap
- The instruction caused an arithmetic trap.
- fpdisabledtrap
- The instruction caused a floating point disabled trap.
- MT_FPCRtrap
-
- dfaulttrap
- The instruction caused a Dstream fault because the virtual page is
inaccessible or because the virtual address is malformed, that is,
the virtual address is not properly
sign-extended.
- iacvtrap
- The instruction caused an istream access violation. Most other bit and
counter values will be those for the first instruction in the IACV fault
handler.
- OPCDECtrap
- The instruction caused an opcdec trap.
- interrupt
- The instruction was pre-empted by an interrupt. Most other bit and
counter values will be those for the first instruction in the PAL code that
handles interrupts.
- mchktrap
Note: trap can be used as a synonym for \!notrap.
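The rule that exactly one trap bit is set in each sample can be expressed as a simple consistency check. The Python sketch below assumes a hypothetical representation of a sample as a dict mapping bit names to 0 or 1; the helper names are illustrative only and do not correspond to any DCPI interface.

```python
# Trap bit names as listed in this section.
TRAP_BITS = (
    "notrap", "mispredict", "replays", "unaligntrap", "dtbmiss",
    "dtb2miss3", "dtb2miss4", "itbmiss", "arithtrap", "fpdisabledtrap",
    "MT_FPCRtrap", "dfaulttrap", "iacvtrap", "OPCDECtrap", "interrupt",
    "mchktrap",
)


def trap_kind(sample):
    """Return the name of the single trap bit set in a sample (a dict)."""
    set_bits = [bit for bit in TRAP_BITS if sample.get(bit, 0)]
    if len(set_bits) != 1:
        raise ValueError(f"expected exactly one trap bit, found {set_bits}")
    return set_bits[0]


def trapped(sample):
    """True when the sample belongs to the 'trap' set, that is, !notrap."""
    return trap_kind(sample) != "notrap"


if __name__ == "__main__":
    print(trap_kind({"retired": 1, "dtbmiss": 1}))
    print(trapped({"retired": 1, "notrap": 1}))
```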
VIEWING PROFILEME DATA
Use dcpiprof(1)
to find out how many samples with particular bit values landed in each image
or procedure of a program. Use dcpilist(1)
to find out how many landed on a particular instruction.
The dcpi tools use the following syntax to name sets of samples:
sample_set ::= bit_value
| sample_set ^ bit_value
| any
bit_value ::= <Bit Name>
| ! <Bit Name>
| <Trap Bit Name>
| ! <Trap Bit Name>
/ may be used instead of ! to indicate negation (because
! must usually be escaped on the command line).
Example sample sets:
- retired^notrap
- names all samples where the retired bit and the notrap bit are both set,
that is, samples where the instruction retired and didn't cause a trap.
- taken^!mispredict
- names all samples where the taken bit is set and the mispredict bit is
clear.
Each bit_value is a constraint on the set of samples included in the set:
if the bit_value contains `!', the set includes only samples whose value for
the bit is 0. If the bit_value has no `!', the set includes only samples whose
value for the bit is 1. The sample set contains all samples that satisfy the
constraints. The special sample set any includes all samples.
A sample set may be used as an event-type to determine how many samples in
the set come from a particular image, procedure, or instruction.
To view the counter data, one appends ":CounterName" to the end of
a sample_set. This denotes the total of the counter's values over each sample
in the set.
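To make the sample-set semantics concrete, here is a minimal Python sketch that evaluates a single sample_set expression against an in-memory list of samples. It is not how the dcpi tools are implemented: the dict-based sample representation and the function names are assumptions for illustration, and the sketch handles only one sample set at a time (no `+` lists).

```python
def parse_sample_set(expr):
    """Split e.g. 'retired^!notrap:retdelay' into (constraints, counter).

    Each constraint is a (bit_name, wanted_value) pair.
    """
    expr = expr.replace("\\", "")          # drop shell escapes such as \!
    sample_part, _, counter = expr.partition(":")
    constraints = []
    if sample_part != "any":               # 'any' imposes no constraints
        for term in sample_part.split("^"):
            negated = term[0] in "!/"      # '/' is accepted in place of '!'
            name = term[1:] if negated else term
            if name == "trap":             # 'trap' is a synonym for !notrap
                name, negated = "notrap", not negated
            constraints.append((name, 0 if negated else 1))
    return constraints, (counter or None)


def evaluate(expr, samples):
    """Count matching samples, or total a counter over them if ':Name' is given."""
    constraints, counter = parse_sample_set(expr)
    selected = [s for s in samples
                if all(s.get(name, 0) == want for name, want in constraints)]
    if counter is None:
        return len(selected)
    return sum(s[counter] for s in selected)


if __name__ == "__main__":
    # Hypothetical samples; bits that are absent are treated as 0.
    samples = [
        {"retired": 1, "notrap": 1, "taken": 1, "retdelay": 2},
        {"retired": 1, "notrap": 0, "mispredict": 1, "retdelay": 40},
        {"retired": 0, "notrap": 0, "dtbmiss": 1, "retdelay": 0},
    ]
    for expr in ("any", "retired^notrap", "/notrap", "retired:retdelay"):
        print(expr, evaluate(expr, samples))
```

Every constraint joined by `^` must hold for a sample to be selected, which is why adding terms can only shrink the set.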
EXAMPLE USAGE
- dcpiprof -sp \!notrap -pm \!notrap a.out
- Lists the procedures in a.out, in descending order of the number of
samples where an instruction in the procedure caused some kind of
trap. (Note the use of `\' to prevent the shell from munging
`!'. Note also that `/' can be used on the command line
instead of `\!' to simplify typing.)
- dcpiprof -sp retired:retdelay -pm retired+trap^\!dtbmiss
- Lists, in descending order for all images, the total of the retire delay
count for samples of instructions that retired, along with the number of
samples for retired instructions and the number of samples in which the
instruction trapped and the trap was not a dtbmiss.
- dcpilist -pm retired main a.out
- Lists, for each instruction in procedure main of
a.out, the number of samples where the instruction retired.
- dcpilist -pm \!notrap main a.out
- Lists, for each instruction in procedure main of a.out, the number of
samples where the instruction caused some kind of trap.
- dcpilist -pm \!notrap+retired main a.out
- Gets the data for the previous two examples with a single command.
dcpiprof also supports the use of + to display two or more
sample sets with one command.
- dcpiprof -pm retired:retdelay a.out
- Lists, by procedure, the total of the retire delay count for each sample
of an instruction that retired.
- dcpiprof -pm default+retired:retdelay::retired a.out
- Lists, by procedure, the default information plus a column showing the
average retire-delay per retired instruction in the procedure.
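The last example reports an average: the total retdelay over retired samples divided by the number of retired samples, per procedure. The Python sketch below reproduces that arithmetic on hypothetical in-memory samples. The dict fields (`procedure`, `retired`, `retdelay`) and the function name are assumptions for illustration and say nothing about how dcpiprof computes or stores its data.

```python
from collections import defaultdict


def average_retdelay_by_procedure(samples):
    """Return {procedure: total retdelay / number of retired samples}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for s in samples:
        if s["retired"]:                       # the 'retired' sample set
            totals[s["procedure"]] += s["retdelay"]
            counts[s["procedure"]] += 1
    # Procedures with no retired samples are omitted to avoid dividing by 0.
    return {proc: totals[proc] / counts[proc] for proc in counts}


if __name__ == "__main__":
    samples = [
        {"procedure": "main", "retired": 1, "retdelay": 2},
        {"procedure": "main", "retired": 1, "retdelay": 10},
        {"procedure": "main", "retired": 0, "retdelay": 0},
        {"procedure": "foo", "retired": 1, "retdelay": 4},
    ]
    print(average_retdelay_by_procedure(samples))
```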
PROFILEME LIMITATIONS
Because retdelay is merely a lower bound, there is no way to account
for all cycles using only ProfileMe data. The retire delay always excludes
stall cycles prior to when the profiled instruction was fetched. This makes it
impossible to measure the length of icache miss stalls.
When a profiled instruction is killed early in the pipeline
(early_kill is set), the PC reported by the hardware may be wrong and
all counter values and bits other than valid, early_kill,
notrap, and map_stall may be wrong.
Note that the unreliable data is restricted to instructions that were
killed, and this data can be excluded by requiring \!early_kill.
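The effect of requiring \!early_kill can be sketched as a filter applied before samples are attributed to a PC. The following Python example assumes a hypothetical dict representation of samples with a `pc` field; it illustrates the filtering idea only and is not part of DCPI.

```python
from collections import Counter


def samples_per_pc(samples, drop_early_kill=True):
    """Count samples per reported PC, optionally skipping early-killed ones.

    Early-killed samples may carry a wrong PC, so counting them can
    attribute samples to the wrong instruction.
    """
    counts = Counter()
    for s in samples:
        if drop_early_kill and s.get("early_kill", 0):
            continue
        counts[s["pc"]] += 1
    return counts


if __name__ == "__main__":
    samples = [
        {"pc": 0x1000, "early_kill": 0},
        {"pc": 0x1000, "early_kill": 1},   # PC may be unreliable
        {"pc": 0x1004, "early_kill": 0},
    ]
    print(dict(samples_per_pc(samples)))
```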
The taken bit is UNDEFINED for instructions other than conditional
branches or for conditional branches that mispredict.
SEE ALSO
dcpi(1),
dcpi2ps(1),
dcpicat(1),
dcpictl(1),
dcpid(1), dcpidiff(1), dcpiformat(4), dcpilist(1),
dcpiprof(1),
dcpitopstalls(1),
dcpiwhatcg(1)
For more information, see the HP Digital Continuous Profiling Infrastructure
project home page
(http://h30097.www3.hp.com/dcpi).
Last modified: April 8, 2004