Random thoughts about native code generation, which will be compatible
with the already existing (non-host-specific) dyntrans core.


How to keep track of the number of times a basic block is executed?
(This is perhaps needed, since unnecessary native code generation may
slow things down. Only the blocks that are executed often need to be
natively translated.)

Perhaps a small additional array per page is a solution:
	unsigned char count[NR_OF_IC_ENTRIES_PER_PAGE];
For a typical MIPS cpu, that would be 1024 bytes extra per page (a 4 KB
page holds 1024 four-byte instructions, with one counter byte each).
The main loop could be changed to increment count, and if count goes
beyond a certain threshold, the block is natively translated. Since an
unsigned char wraps around after 255, the threshold has to be below 256,
or the counter has to saturate. Hm.
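A minimal C sketch of the per-page counter, assuming the main loop calls a hook each time it enters a basic block. Only NR_OF_IC_ENTRIES_PER_PAGE comes from the notes; the threshold value, struct page_counters, page_counters_init() and block_entered() are hypothetical names. The counter saturates at the threshold, which both avoids the unsigned char wrap-around and makes the hook report a block as hot exactly once.

```c
#include <string.h>

/* From the notes: one counter per instruction-call entry on a page.
   1024 entries = a 4 KB MIPS page of 4-byte instructions. */
#define NR_OF_IC_ENTRIES_PER_PAGE 1024

/* Hypothetical threshold; must be <= 255 to fit in an unsigned char. */
#define NATIVE_TRANSLATION_THRESHOLD 100

struct page_counters {		/* hypothetical: one per physical page */
	unsigned char count[NR_OF_IC_ENTRIES_PER_PAGE];
};

static void page_counters_init(struct page_counters *pc)
{
	memset(pc->count, 0, sizeof(pc->count));
}

/*
 * Called by the main loop when the basic block starting at ic entry
 * 'index' is entered. Returns 1 exactly once, when the block becomes
 * hot enough to translate; afterwards the counter stays saturated at
 * the threshold, so it never wraps back to 0.
 */
static int block_entered(struct page_counters *pc, unsigned int index)
{
	unsigned char *c = &pc->count[index];

	if (*c >= NATIVE_TRANSLATION_THRESHOLD)
		return 0;
	(*c)++;
	return *c == NATIVE_TRANSLATION_THRESHOLD;
}
```

The check costs a load, a compare and a store per block entry, which is exactly the overhead the next paragraph questions.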

Or perhaps the overhead of this counter check costs more than it
saves? After all, most of the time will be spent executing (some of)
the translated loops.

-------------------------------------

At most one [basic] block is ever translated at any given time.
A small array can therefore hold the INR (intermediate native
representation) entries, and a small memory area can hold a doubly
linked list of native instruction entries.
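One possible layout for this per-translation state, sketched in C under the assumption that native ops are taken from a fixed pool which is reset before each block is translated. The sizes and all names (struct translation_state, native_op_append(), native_op_remove()) are hypothetical. Since only one block is translated at a time, no general-purpose allocator is needed, and the doubly linked list makes it cheap for an optimizer to unlink ops.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INR_ENTRIES  64	/* hypothetical sizes */
#define MAX_NATIVE_OPS  256

struct inr_entry {
	int op;			/* e.g. AND_REG32PTR_REG32PTR_IMM16 */
	size_t dst_offset;	/* offset of destination register in struct cpu */
	size_t src_offset;	/* offset of source register in struct cpu */
	uint16_t imm;
};

struct native_op {
	int opcode;		/* host-specific pseudo-opcode */
	long arg1, arg2;
	struct native_op *prev, *next;
};

struct translation_state {
	struct inr_entry inr[MAX_INR_ENTRIES];	/* the small INR array */
	int n_inr;
	struct native_op pool[MAX_NATIVE_OPS];	/* the small memory area */
	int n_used;
	struct native_op *head, *tail;
};

/* Start translating a new block: everything from the last one is discarded. */
static void translation_reset(struct translation_state *ts)
{
	ts->n_inr = 0;
	ts->n_used = 0;
	ts->head = ts->tail = NULL;
}

/* Append a native op at the end of the list; NULL if the pool is full. */
static struct native_op *native_op_append(struct translation_state *ts,
    int opcode, long arg1, long arg2)
{
	struct native_op *op;

	if (ts->n_used >= MAX_NATIVE_OPS)
		return NULL;	/* block too large: caller gives up */
	op = &ts->pool[ts->n_used++];
	op->opcode = opcode;
	op->arg1 = arg1;
	op->arg2 = arg2;
	op->prev = ts->tail;
	op->next = NULL;
	if (ts->tail != NULL)
		ts->tail->next = op;
	else
		ts->head = op;
	ts->tail = op;
	return op;
}

/* Unlink an op, e.g. one that an optimization pass found redundant. */
static void native_op_remove(struct translation_state *ts, struct native_op *op)
{
	if (op->prev != NULL)
		op->prev->next = op->next;
	else
		ts->head = op->next;
	if (op->next != NULL)
		op->next->prev = op->prev;
	else
		ts->tail = op->prev;
}
```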

Simple instructions:

32-bit MIPS:
	andi $5,$5,0xff00
	ori $5,$5,0x0011

Intermediate native representation (register operands are byte
offsets into struct cpu):
	AND_REG32PTR_REG32PTR_IMM16 (offset to reg 5, offset to reg 5, 0xff00)
	OR_REG32PTR_REG32PTR_IMM16 (offset to reg 5, offset to reg 5, 0x0011)
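To pin down what the two INR entries mean, here is a hypothetical reference interpreter for them in C. struct cpu is a simplified stand-in (32 general-purpose registers, reg 5 at REG_OFFSET(5)), not the emulator's real structure. The 16-bit immediate is zero-extended, as MIPS andi and ori do. Generated native code for the same entries would have to leave struct cpu in the same state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the emulator's struct cpu. */
struct cpu {
	uint64_t pc;
	uint32_t gpr[32];
};

#define REG_OFFSET(n) (offsetof(struct cpu, gpr) + (n) * sizeof(uint32_t))

enum inr_opcode { AND_REG32PTR_REG32PTR_IMM16, OR_REG32PTR_REG32PTR_IMM16 };

struct inr_entry {
	enum inr_opcode op;
	size_t dst_offset, src_offset;	/* byte offsets into struct cpu */
	uint16_t imm;
};

static uint32_t load32(const struct cpu *cpu, size_t off)
{
	uint32_t v;
	memcpy(&v, (const char *)cpu + off, sizeof v);
	return v;
}

static void store32(struct cpu *cpu, size_t off, uint32_t v)
{
	memcpy((char *)cpu + off, &v, sizeof v);
}

/* Reference semantics: what the generated native code must compute.
   The 16-bit immediate is zero-extended, as for MIPS andi/ori. */
static void inr_execute(struct cpu *cpu, const struct inr_entry *e, int n)
{
	for (int i = 0; i < n; i++) {
		uint32_t v = load32(cpu, e[i].src_offset);

		switch (e[i].op) {
		case AND_REG32PTR_REG32PTR_IMM16: v &= (uint32_t)e[i].imm; break;
		case OR_REG32PTR_REG32PTR_IMM16:  v |= (uint32_t)e[i].imm; break;
		}
		store32(cpu, e[i].dst_offset, v);
	}
}
```

For example, with $5 = 0x12345678 the two entries leave 0x00005611 in $5.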

Non-peephole-optimized x86[_64] code (esi = struct cpu *; on x86_64 the
pointer would be in rsi. Here source and destination are both $5, so
both offsets are the same):
	mov eax, [esi + offset_to_source_reg]
	and eax, 0xff00
	mov [esi + offset_to_destination_reg], eax	(#1)
	mov eax, [esi + offset_to_source_reg]		(#2)
	or eax, 0x0011
	mov [esi + offset_to_destination_reg], eax

Peephole-optimized x86[_64] code:
(On the first pass, #2 is removed, since it loads back a value which
was just written. The value is already in eax!)
(On the second pass, the store at #1 is removed, since a later store
overwrites the same register before anything reads it.)
	mov eax, [esi + offset_to_source_reg]
	and eax, 0xff00
	or eax, 0x0011
	mov [esi + offset_to_destination_reg], eax
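The two passes could look roughly like this in C. The op kinds, struct nop and the single temporary register (eax) are assumptions for illustration, and a deleted flag stands in for unlinking ops from the native-op list. The dead-store pass treats a load of the same offset, or a block exit (N_RET), as a use of the stored value, and it assumes all accesses are 32-bit at exact offsets into struct cpu, so two different offsets never overlap.

```c
#include <stddef.h>

/* Hypothetical native pseudo-ops; one temporary register (eax). */
enum nop_kind { N_LOAD, N_STORE, N_AND_IMM, N_OR_IMM, N_RET };

struct nop {
	enum nop_kind kind;
	long arg;	/* struct cpu offset for LOAD/STORE, immediate otherwise */
	int deleted;
};

/* Index of the next live op after i, or n if there is none. */
static int next_live(const struct nop *ops, int n, int i)
{
	for (i++; i < n && ops[i].deleted; i++)
		;
	return i;
}

/* Pass 1: "store eax -> X" directly followed by "load X -> eax":
   the load is redundant, because the value is already in eax. */
static void remove_reloads(struct nop *ops, int n)
{
	for (int i = 0; i < n; i++) {
		int j;

		if (ops[i].deleted || ops[i].kind != N_STORE)
			continue;
		j = next_live(ops, n, i);
		if (j < n && ops[j].kind == N_LOAD && ops[j].arg == ops[i].arg)
			ops[j].deleted = 1;
	}
}

/* Pass 2: a store is dead if a later store hits the same offset before
   anything reads that offset or the block is exited. */
static void remove_dead_stores(struct nop *ops, int n)
{
	for (int i = 0; i < n; i++) {
		if (ops[i].deleted || ops[i].kind != N_STORE)
			continue;
		for (int j = next_live(ops, n, i); j < n; j = next_live(ops, n, j)) {
			if (ops[j].kind == N_RET ||
			    (ops[j].kind == N_LOAD && ops[j].arg == ops[i].arg))
				break;	/* value may be observed: keep the store */
			if (ops[j].kind == N_STORE && ops[j].arg == ops[i].arg) {
				ops[i].deleted = 1;
				break;
			}
		}
	}
}
```

Running pass 1 and then pass 2 on the unoptimized sequence (with a final ret) leaves load, and, or, store, ret, matching the optimized listing.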

Native code entry:
	(none on x86_64)

Native code exit:
	ret[q]

---------------------------

Update of nr-of-executed-instructions and the IC pointer:

All possible return paths need to update the following:

  x) The nr-of-executed-instructions count (one less than the
     number of instructions executed along that path, i.e. the
     whole translated block for the fall-through path, since an
     implicit count of 1 is already included).
  x) The next_ic pointer, and also the cur_page if we have
     switched page.
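A C sketch of the bookkeeping every exit path has to perform, written as a function for clarity; generated code would presumably inline the equivalent stores on each path. struct cpu_state, struct ic_page, struct ic and the field names are hypothetical simplifications, not the emulator's real structures.

```c
#include <stddef.h>

#define IC_ENTRIES_PER_PAGE 1024	/* hypothetical */

struct ic { int placeholder; };		/* one instruction call entry */
struct ic_page { struct ic ics[IC_ENTRIES_PER_PAGE]; };

struct cpu_state {			/* hypothetical, simplified */
	long n_translated_instrs;
	struct ic *next_ic;
	struct ic_page *cur_ic_page;
};

/*
 * What an exit path of a native chunk must do before returning.
 * n_instrs: emulated instructions executed on this path (at least 1).
 * The main loop already counts 1 per ic call, so only n_instrs - 1 is added.
 */
static void native_exit_update(struct cpu_state *cpu, int n_instrs,
    struct ic_page *exit_page, struct ic *exit_ic)
{
	cpu->n_translated_instrs += n_instrs - 1;
	/* Only paths that leave the current page need this store. */
	if (exit_page != cpu->cur_ic_page)
		cpu->cur_ic_page = exit_page;
	cpu->next_ic = exit_ic;
}
```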

-----------------------------

Stages during translation:

Stage 1:
	Emulated ISA (e.g. MIPS) to INR instructions.
	Each emulated instruction may be turned into zero or
	more INR instructions.
	This is done in e.g. src/cpus/cpu_mips_instr.c
	using semi-magic macros.
	The INR array is a small fixed-size array, pointed
	to by the cpu struct.
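As an illustration of stage 1, here is a hypothetical decoder for the two MIPS instructions used above, written as a plain C function rather than with the macros the notes mention. It also shows the "zero" case: an andi/ori whose destination is $zero produces no INR entry. Registers are kept as register numbers instead of struct cpu offsets, MAX_INR and struct inr_buf are assumptions, and -1 signals an instruction this sketch cannot translate.

```c
#include <stddef.h>
#include <stdint.h>

enum inr_opcode { AND_REG32PTR_REG32PTR_IMM16, OR_REG32PTR_REG32PTR_IMM16 };

struct inr_entry {
	enum inr_opcode op;
	int dst_reg, src_reg;	/* register numbers in this sketch */
	uint16_t imm;
};

#define MAX_INR 64		/* hypothetical size of the INR array */

struct inr_buf {
	struct inr_entry e[MAX_INR];
	int n;
};

/*
 * Translate one MIPS I-type instruction word (andi/ori only).
 * Returns the number of INR entries emitted (0 or 1 here), or -1 if
 * the instruction is not handled and the block should end before it.
 */
static int mips_to_inr(uint32_t iword, struct inr_buf *buf)
{
	unsigned int opcode = iword >> 26;
	int rs = (iword >> 21) & 31;
	int rt = (iword >> 16) & 31;
	uint16_t imm = iword & 0xffff;
	enum inr_opcode op;

	switch (opcode) {
	case 0x0c: op = AND_REG32PTR_REG32PTR_IMM16; break;	/* andi */
	case 0x0d: op = OR_REG32PTR_REG32PTR_IMM16; break;	/* ori */
	default: return -1;
	}
	if (rt == 0)
		return 0;	/* writes to $zero have no effect */
	if (buf->n >= MAX_INR)
		return -1;
	buf->e[buf->n].op = op;
	buf->e[buf->n].dst_reg = rt;
	buf->e[buf->n].src_reg = rs;
	buf->e[buf->n].imm = imm;
	buf->n++;
	return 1;
}
```

The encodings of andi $5,$5,0xff00 and ori $5,$5,0x0011 are 0x30a5ff00 and 0x34a50011.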

Stage 2:
	INR -> native operations (e.g. x86).
	This is done in src/native/native_x86.c.
	One thing to think about is round-robin use of
	temporary registers.
	native_inr_to_native_ops() takes a cpu as input and
	translates the current INR entries into native
	pseudo-opcodes.
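A hypothetical sketch of stage 2 with round-robin temporaries. It does not reproduce native_inr_to_native_ops(); the function inr_to_native_ops_sketch(), the types, and the choice of eax/ecx/edx as temporaries are all assumptions. Each INR entry becomes a load, an ALU op and a store, using the next temporary register in turn.

```c
#include <stddef.h>
#include <stdint.h>

enum inr_opcode { INR_AND_IMM16, INR_OR_IMM16 };

struct inr_entry {
	enum inr_opcode op;
	long dst_off, src_off;	/* byte offsets into struct cpu */
	uint16_t imm;
};

enum x86_reg { EAX, ECX, EDX };	/* hypothetical pool of temporaries */
#define N_TEMPS 3

enum nop_kind { N_LOAD, N_STORE, N_AND_IMM, N_OR_IMM };

struct nop {
	enum nop_kind kind;
	enum x86_reg reg;
	long arg;		/* offset for LOAD/STORE, immediate otherwise */
};

struct native_ctx {
	int next_temp;		/* round-robin position */
};

static enum x86_reg alloc_temp(struct native_ctx *ctx)
{
	enum x86_reg r = (enum x86_reg)ctx->next_temp;

	ctx->next_temp = (ctx->next_temp + 1) % N_TEMPS;
	return r;
}

/* Returns the number of native ops written, or -1 if 'out' is too small. */
static int inr_to_native_ops_sketch(struct native_ctx *ctx,
    const struct inr_entry *in, int n_in, struct nop *out, int max_out)
{
	int n = 0;

	for (int i = 0; i < n_in; i++) {
		enum x86_reg r;

		if (n + 3 > max_out)
			return -1;
		r = alloc_temp(ctx);
		out[n++] = (struct nop){ N_LOAD, r, in[i].src_off };
		out[n++] = (struct nop){ in[i].op == INR_AND_IMM16 ?
		    N_AND_IMM : N_OR_IMM, r, (long)in[i].imm };
		out[n++] = (struct nop){ N_STORE, r, in[i].dst_off };
	}
	return n;
}
```

Rotating the temporaries means a value loaded for one entry may still be live in its register when a later entry needs it, which gives stage 3 more to work with. Whether that pays off in practice is an open question.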

Stage 3:
	Optimization, native ops -> native ops.
	This is done in src/native/native_x86_optim.c,
	and is an optional step. It should be possible
	to turn this step off, for debugging.
	If e.g. a value is in a register, and it is stored
	to memory, then the same memory position does not
	have to be read back; the value is already in a
	register.

Stage 4:
	Code generation, native ops -> native machine code.
	Done in src/native/native_x86_gen.c.
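Stage 4 for the optimized example could be as simple as appending instruction bytes to a buffer. The encodings are standard x86: 8B /r and 89 /r with ModRM 0x46 for [esi+disp8] (which means [rsi+disp8] in 64-bit mode), 25 id for and eax,imm32, 0D id for or eax,imm32, and C3 for ret. struct codebuf and the gen_* helpers are hypothetical. The disp8 form only reaches offsets -128..127, so registers further into struct cpu would need the disp32 form (ModRM 0x86).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct codebuf {		/* hypothetical output buffer */
	uint8_t *p;
	size_t len, cap;
};

static int emit(struct codebuf *cb, const uint8_t *bytes, size_t n)
{
	if (cb->cap - cb->len < n)
		return -1;	/* out of space: nothing is written */
	memcpy(cb->p + cb->len, bytes, n);
	cb->len += n;
	return 0;
}

/* mov eax, [esi + disp8]   (8B /r, ModRM mod=01 reg=eax rm=esi) */
static int gen_load_eax(struct codebuf *cb, int8_t disp)
{
	uint8_t b[3] = { 0x8b, 0x46, (uint8_t)disp };
	return emit(cb, b, 3);
}

/* mov [esi + disp8], eax   (89 /r, same ModRM) */
static int gen_store_eax(struct codebuf *cb, int8_t disp)
{
	uint8_t b[3] = { 0x89, 0x46, (uint8_t)disp };
	return emit(cb, b, 3);
}

#define OPC_AND_EAX_IMM32 0x25
#define OPC_OR_EAX_IMM32  0x0d

/* and/or eax, imm32 (short eax-only forms, little-endian immediate) */
static int gen_alu_eax_imm32(struct codebuf *cb, uint8_t opc, uint32_t imm)
{
	uint8_t b[5] = { opc, (uint8_t)imm, (uint8_t)(imm >> 8),
	    (uint8_t)(imm >> 16), (uint8_t)(imm >> 24) };
	return emit(cb, b, 5);
}

static int gen_ret(struct codebuf *cb)
{
	uint8_t b = 0xc3;
	return emit(cb, &b, 1);
}
```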

Stage 5:
	Patch _older_ code chunks so that they can branch
	directly to the new chunk, if possible.
	An optional step.
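A stage 5 sketch, assuming each exit of an older chunk ends in a 5-byte slot reserved for patching; that convention and patch_jmp_rel32() are hypothetical. The slot is overwritten with a jmp rel32 (opcode E9), whose displacement is relative to the end of the 5-byte instruction. On x86_64 the new chunk may be more than +/- 2 GB away, in which case the patch is skipped, which is one way a patch can turn out not to be possible.

```c
#include <stdint.h>
#include <string.h>

/*
 * Overwrite the 5-byte slot at 'site' in an older chunk with
 * "jmp rel32" to 'target'. Returns 0 on success, or -1 (slot untouched)
 * if the displacement does not fit in a signed 32-bit value.
 */
static int patch_jmp_rel32(uint8_t *site, const uint8_t *target)
{
	int64_t rel = (int64_t)((intptr_t)target - (intptr_t)(site + 5));
	int32_t rel32;

	if (rel < INT32_MIN || rel > INT32_MAX)
		return -1;
	rel32 = (int32_t)rel;
	site[0] = 0xe9;
	memcpy(site + 1, &rel32, 4);	/* x86 hosts are little-endian */
	return 0;
}
```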

Stage 6:
	Enter the newly generated native code chunk into
	the physpage's ic->f.
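A hypothetical C sketch of stage 6: the chunk is installed by replacing the ic entry's function pointer, so the dispatch loop itself needs no change. struct ic is simplified here, and keeping the previous f in saved_f, so that the entry can be reverted if the translation is later invalidated, is an assumption rather than something these notes describe. In a real implementation the chunk's address would be converted to a function pointer after its buffer has been made executable; that step is omitted. The two small functions at the end only stand in for an interpreted instruction and a generated chunk.

```c
#include <stddef.h>

struct cpu;			/* opaque in this sketch */
struct ic;
typedef void (*ic_func)(struct cpu *, struct ic *);

struct ic {			/* hypothetical, simplified ic entry */
	ic_func f;		/* what the dispatch loop calls */
	ic_func saved_f;	/* previous f, kept in case the chunk is dropped */
};

/* Stage 6: from now on, executing this ic entry runs the native chunk. */
static void install_native_chunk(struct ic *ic, ic_func chunk)
{
	ic->saved_f = ic->f;
	ic->f = chunk;
}

/* Hypothetical: revert to the interpreted version, e.g. on invalidation. */
static void remove_native_chunk(struct ic *ic)
{
	if (ic->saved_f != NULL) {
		ic->f = ic->saved_f;
		ic->saved_f = NULL;
	}
}

/* Stand-ins for an interpreted instruction and a generated chunk (demo only). */
static int last_called;

static void interpret_instr(struct cpu *cpu, struct ic *ic)
{
	(void)cpu;
	(void)ic;
	last_called = 1;
}

static void native_chunk_stub(struct cpu *cpu, struct ic *ic)
{
	(void)cpu;
	(void)ic;
	last_called = 2;
}
```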