This gem presents a small yet quite effective method of measuring the cycles needed to execute a piece of code. It uses the RDTSC instruction.
Beginning with the Pentium processor, it is possible to access the time-stamp counter, a 64-bit MSR (model-specific register) that is incremented every clock cycle and therefore keeps an accurate count of every cycle executed. On reset, the time-stamp counter is set to zero.
The counter is read with the RDTSC instruction (read time-stamp counter). The instruction returns the low 32 bits of the cycle count in EAX and the high 32 bits in EDX.
RDTSC returns the number of cycles executed, not the time taken to execute them. To convert cycles to time, use this formula (frequency given in Hz):
time = cycles / frequency
Since the low 32 bits of the counter may overflow, especially on faster processors, the package uses the full 64-bit count.
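As a rough sketch of the formula above: dividing the cycle count by the clock frequency in MHz gives the time in microseconds. The 200 MHz clock and the variable elapsed_us are assumptions made for this illustration and are not part of the package. The sketch also assumes that the high dword of the count is smaller than the divisor, because DIV raises a divide error when the quotient does not fit in 32 bits.

```asm
        ; on entry: eax = low cycle count, edx = high cycle count
        mov     ebx,200             ; assumed clock frequency in MHz
        div     ebx                 ; eax = edx:eax / 200 = microseconds
        mov     [elapsed_us],eax    ; edx now holds the leftover cycles

; in the data segment (hypothetical variable)
elapsed_us dd ?
```

With these assumptions, a count of 1,000,000 cycles gives 1,000,000 / 200 = 5,000 microseconds, that is 5 ms.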
On the Pentium Pro and Pentium II, which can execute instructions out of order, RDTSC has to be serialized. This is done by executing the serializing CPUID instruction before the RDTSC instruction. There is, however, a problem with this: the CPUID instruction itself takes some time to execute. The solution is to measure the execution time of CPUID and subtract it from the cycle count returned by RDTSC.
Another issue with CPUID is that it may take longer to execute the first couple of times it is called. The best thing to do is to call the instruction three times and measure the third call. This is what the code below does.
How to use the package
The package must be included in your code segment. Note that the data at the end of the package must be placed in a data segment.
Then, if your CPU is a Pentium Pro or a Pentium II, PProPII must be defined (the package must then serialize to prevent out-of-order execution of RDTSC).
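A minimal sketch of defining the symbol, assuming TASM in Ideal mode (which the MACRO syntax of the package suggests) and assuming the package has been saved as monitor.asm, a hypothetical file name. The definition has to come before the package is included, so that every IFDEF PProPII in it sees the symbol:

```asm
PProPII = 1                 ; any definition makes IFDEF PProPII true
INCLUDE "monitor.asm"       ; hypothetical file name for the package
```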
The package must be initialized by:
    call monitor_init
Then you call the macros like this:
    ;
    ; some other piece of code
    time_start
    ; the code you may want to measure
    ; .
    ; .
    ; .
    time_stop
    mov [mycountlow],eax
    mov [mycounthigh],edx
This was a simple example of the package. The measurement above includes cache effects (code or data not yet being in the cache). If cache effects are not wanted, you must "pretouch" the data, simply by reading it. Then run the measurement several times to take care of the code cache:
    ;
    ; some other piece of code
    mov ecx,4                       ; execute test code 4 times
measureloop:
    push ecx                        ; the macros destroy ecx
    time_start
    ; the code you may want to measure
    ; .
    ; .
    ; .
    time_stop
    pop ecx
    mov [mycount_low+ecx*4-4],eax   ; ecx runs from 4 down to 1, so
    mov [mycount_high+ecx*4-4],edx  ; subtract 4 to use elements 0..3
    dec ecx
    jnz measureloop
Note: The mycount variables must (in this example) be arrays of 4 doublewords. The first pass still includes code cache misses; the later passes show the cached timing.
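The pretouching mentioned above can be sketched as a loop that reads one byte from every cache line of the data. This is only an illustration under several assumptions: a 16-bit DOS code segment with ds pointing at the data, a hypothetical buffer mydata of MYDATA_SIZE bytes (a multiple of 32), and the 32-byte cache line of the Pentium, Pentium Pro and Pentium II.

```asm
MYDATA_SIZE = 1024                  ; hypothetical buffer size in bytes

        mov     bx,offset mydata    ; hypothetical buffer in the data segment
        mov     cx,MYDATA_SIZE/32   ; one read per 32-byte cache line
pretouch:
        mov     al,[bx]             ; reading one byte loads the whole line
        add     bx,32               ; advance to the next cache line
        dec     cx
        jnz     pretouch

; in the data segment
mydata  db MYDATA_SIZE dup (?)
```

The loop changes al, bx, cx and the flags, so it belongs before time_start, outside the measured region.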
Performance monitoring package
The package is meant to run under plain DOS (no EMM or similar), since memory managers may interrupt the process. RDTSC can also be made a privileged instruction (by setting the TSD flag in CR4), in which case it does not run at CPL 3. This is not really a problem, since a real monitoring session should be performed in an environment where the program isn't interrupted; an interruption would distort the cycle count. It is also wise to insert a CLI right before the time_start macro to block maskable interrupts during the measurement.
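A minimal sketch of the CLI suggestion, assuming that interrupts were enabled before the measurement and may simply be re-enabled with STI afterwards. CLI does not block non-maskable interrupts, so an undisturbed count is still not guaranteed.

```asm
        cli                         ; block maskable interrupts
        time_start
        ; the code you may want to measure
        time_stop
        sti                         ; allow interrupts again
        mov     [mycountlow],eax    ; sti leaves eax and edx untouched
        mov     [mycounthigh],edx
```

If the interrupt flag may already be clear on entry, saving it with PUSHF before the CLI and restoring it with POPF instead of the STI preserves the previous state.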
Here is the actual monitoring package:
;
; Performance monitoring package
;
; define PProPII if your CPU is a Pentium Pro or a Pentium II
;
; implements:
;
; monitor_init
; initializes the package
;
; time_start
; start cycle count here
;
; time_stop
; stop counting here
;
; note:
; the package cannot do nested measurements, since the macros
; keep the start count in a single variable
;
;
; define cpuid and rdtsc instructions via macros
; this is not necessary if your assembler supports them
;
MACRO cpuid
    db 0fh,0a2h
ENDM
MACRO rdtsc
    db 0fh,031h
ENDM
;
; monitor_init:
;
; input:
; nothing
;
; output:
; cpuid_cycle = initialized to execution time of cpuid
;
; destroys:
; nothing
;
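monitor_init:
IFDEF PProPII
    pushfd
    pushad
    mov esi,3               ; loop counter in esi, since cpuid destroys ecx
getcpuidtime:
    cpuid
    rdtsc
    mov [cycle],eax
    cpuid
    rdtsc
    sub eax,[cycle]
    mov [cpuid_cycle],eax   ; the third pass leaves the final value
    dec esi
    jnz getcpuidtime
    popad
    popfd                   ; matches the pushfd above
ENDIF
    ret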
;
; time_start - start timing point here
;
; input:
; none
;
; output:
; time_cycles initialized
;
; destroys:
; eax, ebx, ecx, edx
; eflags
;
MACRO time_start
IFDEF PProPII
    cpuid                   ; serialize before reading the counter
ENDIF
    rdtsc
    mov [time_cycles],eax
    mov [time_cycles+4],edx
ENDM
;
; time_stop - stop timing point here
;
; input:
; none
;
; output:
; eax = low cycle count
; edx = high cycle count
;
; destroys:
; eax, ebx, ecx, edx
; eflags
;
MACRO time_stop
IFDEF PProPII
    cpuid                   ; serialize before reading the counter
ENDIF
    rdtsc
    sub eax,[time_cycles]
    sbb edx,[time_cycles+4]
IFDEF PProPII
    sub eax,[cpuid_cycle]   ; subtract the cpuid overhead
    sbb edx,0
ENDIF
ENDM
;
; place the following data in your data segment
;
time_cycles dq ?
cycle dd ?
cpuid_cycle dd ?