Peephole optimization
=====================
Optimization opportunities:

Combining:
    OPXXX a -> b<OPV_DISCARD>
    OPYYY b -> c
  to
    OPZZZ a -> c
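
A minimal sketch of the combining rule, not the actual implementation. It assumes that <OPV_DISCARD> marks the intermediate value b as not needed after the following op consumes it. The `Op` record, the `FUSIONS` table and the `combine` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str                  # e.g. "OPXXX"
    src: str                   # input operand
    dst: str                   # output operand
    dst_discard: bool = False  # models the <OPV_DISCARD> flag on the output


# Hypothetical table: (first op, second op) -> combined op.
FUSIONS = {("OPXXX", "OPYYY"): "OPZZZ"}


def combine(ops):
    """Fuse adjacent pairs 'X a -> b<discard>; Y b -> c' into 'Z a -> c'."""
    out = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops):
            first, second = ops[i], ops[i + 1]
            fused = FUSIONS.get((first.name, second.name))
            # Only fuse when b is discardable and feeds the second op directly.
            if fused and first.dst_discard and second.src == first.dst:
                out.append(Op(fused, first.src, second.dst, second.dst_discard))
                i += 2
                continue
        out.append(ops[i])
        i += 1
    return out
```

Without the discard flag the pair is left alone, since b may still be read later.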

Loop to SIMD/vector-reg:
(needs a "loop-start-hint" flag on the EBB -or- a loop-start-hint op)
           OPXXX ... -> address
           OPXXX ... -> counter (divisible by 2^K)
           OPXXX ... -> b
    start: LOOP_START_HINT
           OPXXX b, [address]
           OPADD address, 1 -> address
           OPSUB counter, 1 -> counter
           OPJNZ counter, start
  to
           OPXXX ... -> address
           OPXXX ... -> counter/(2^K)
           OPXXX widen(...) -> b
    start: LOOP_START_HINT
           OPXXX b, [address]
           OPADD address, 2^K -> address
           OPSUB counter, 1 -> counter
           OPJNZ counter, start
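
A small simulation of one reading of this rewrite, offered as a sketch under assumptions rather than a definition of the ops. It assumes the op in the loop body is a store of b to [address], that widen(...) replicates b into 2^K lanes, and that the widened loop runs counter/(2^K) iterations while advancing the address by 2^K each time. The function names are hypothetical.

```python
def scalar_fill(mem, address, counter, b):
    """Original loop: one element per iteration (assumes counter > 0)."""
    while True:
        mem[address] = b          # OPXXX b, [address]  (assumed to be a store)
        address += 1              # OPADD address, 1
        counter -= 1              # OPSUB counter, 1 -> counter
        if counter == 0:          # OPJNZ counter, start
            break


def vector_fill(mem, address, counter, b, k):
    """Widened loop: 2^K elements per iteration (assumes counter > 0)."""
    lanes = 1 << k
    if counter % lanes != 0:
        raise ValueError("counter must be divisible by 2^K")
    wide = [b] * lanes            # widen(...) -> b
    counter //= lanes             # counter/(2^K)
    while True:
        mem[address:address + lanes] = wide   # one vector-wide store
        address += lanes          # OPADD address, 2^K
        counter -= 1              # one iteration done
        if counter == 0:
            break
```

Both loops leave memory in the same state when the divisibility precondition holds, which is the property the optimization relies on.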

This will of course clobber (some) SIMD/vector regs:
  - OK if no SIMD/vector regs are used.
  - If SIMD/vector regs are used outside of the loop EBB, then they need to be
    saved and restored around the loop.
  - If SIMD/vector regs are used inside the loop EBB, then those cannot be used
    in the optimization.
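
The three cases above can be sketched as a simple classification of the available vector registers. This is an illustrative sketch only: the function name and the idea of passing register usage as plain sets are assumptions, not part of the design.

```python
def classify_vector_regs(all_vregs, used_in_loop, used_outside):
    """Split vector regs into (free, need_save) for the loop rewrite.

    - regs used inside the loop EBB are excluded entirely
    - regs used only outside the loop EBB may be used, but must be saved
    - regs used nowhere are free to clobber
    """
    free, need_save = [], []
    for reg in all_vregs:
        if reg in used_in_loop:
            continue                  # cannot be used by the optimization
        if reg in used_outside:
            need_save.append(reg)     # usable, at the cost of a save/restore
        else:
            free.append(reg)          # usable at no extra cost
    return free, need_save
```

A caller would presumably pick from `free` first and fall back to `need_save` only when the save/restore cost is worth it.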
Architecture selection / auto-detection
---------------------------------------
Things like SIMD and combined operations can require knowledge of the CPU ISA
version.
This could be specified in the multi-arch triple:
    i586-linux-gnu      x86, Pentium or later
    i686-linux-gnu      x86, Pentium Pro or later
    x86_64-linux-gnu    x86_64 v1
    ...what is the name of x86_64 v2 and v3 ...?
Auto-detection would be cool, but it would require runtime code.
And it should definitely be off by default, to avoid unintentional
dependencies on a newer CPU ISA version than expected.
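
A minimal sketch of how the selection could look, assuming the ISA baseline is derived from the first component of the triple and that auto-detection is an explicit opt-in. The names `BASELINES`, `select_isa` and `detect_host` are hypothetical; the detector is passed in as a callback because how detection would actually work is left open here.

```python
# Baselines copied from the triples listed above. Unknown CPU components are
# rejected rather than guessed.
BASELINES = {
    "i586": "x86, Pentium or later",
    "i686": "x86, Pentium Pro or later",
    "x86_64": "x86_64 v1",
}


def select_isa(triple, autodetect=False, detect_host=None):
    """Return the ISA baseline for a triple; auto-detection is off by default."""
    if autodetect:
        if detect_host is None:
            raise ValueError("auto-detection requested but no detector supplied")
        return detect_host()          # hypothetical runtime detection hook
    cpu = triple.split("-", 1)[0]     # e.g. "i686" from "i686-linux-gnu"
    if cpu not in BASELINES:
        raise ValueError(f"unknown CPU component in triple: {cpu!r}")
    return BASELINES[cpu]
```

Keeping `autodetect=False` as the default means the result depends only on the triple, so a build cannot silently pick up a newer ISA from the machine it happens to run on.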