Peephole optimization
=====================

Optimization opportunities:

Combining:

    OPXXX   a -> b<OPV_DISCARD>
    OPYYY   b -> c

    to

    OPZZZ   a -> c
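
A minimal sketch of the combining step, in Python. The `Op` tuple layout, the `discard` field standing in for the OPV_DISCARD flag, and the `FUSIBLE` table are assumptions made for illustration; they are not the actual IR.

```python
from typing import NamedTuple

class Op(NamedTuple):
    opcode: str
    src: str
    dst: str
    discard: bool = False   # stands in for OPV_DISCARD on dst (assumption)

# Hypothetical table: (first opcode, second opcode) -> combined opcode.
FUSIBLE = {("OPXXX", "OPYYY"): "OPZZZ"}

def combine(ops):
    """Fuse adjacent pairs 'a -> b<discard>; b -> c' into 'a -> c'."""
    out = []
    i = 0
    while i < len(ops):
        cur = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        fused = FUSIBLE.get((cur.opcode, nxt.opcode)) if nxt else None
        # Only fuse when b is discardable and really feeds the second op.
        if fused and cur.discard and nxt.src == cur.dst:
            out.append(Op(fused, cur.src, nxt.dst, nxt.discard))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

A pair whose intermediate value is not flagged as discardable is left untouched, since a later op might still read b.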

Loop to SIMD/vector-reg:
(needs either a "loop-start-hint" flag on the EBB (extended basic block) or a dedicated loop-start-hint op)

    OPXXX   ... -> address
    OPXXX   ... -> counter  (divisible by 2^K)
    OPXXX   ... -> b
  start:    LOOP_START_HINT
    OPXXX   b, [address]
    OPADD   address, 1 -> address
    OPSUB   counter, 1 -> counter
    OPJNZ   counter, start

    to

    OPXXX   ... -> address
    OPXXX   ... -> counter/(2^K)
    OPXXX   widen(...) -> b
  start:    LOOP_START_HINT
    OPXXX   b, [address]
    OPADD   address, 2^K -> address
    OPSUB   counter, 1 -> counter
    OPJNZ   counter, start
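
To check that the two loop shapes are equivalent, here is a small Python simulation. It assumes that the loop body stores b to [address] (a fill loop), that memory is a flat list of elements, and that the widened loop divides the counter once and then counts vector iterations. All function names are hypothetical.

```python
def scalar_loop(mem, address, counter, b):
    """Original loop: one element per iteration, test at the bottom."""
    assert counter > 0              # the body runs before the OPJNZ test
    while True:
        mem[address] = b            # OPXXX b, [address]
        address += 1                # OPADD address, 1
        counter -= 1                # OPSUB counter, 1
        if counter == 0:            # OPJNZ counter, start
            break

def widened_loop(mem, address, counter, b, k):
    """Transformed loop: 2^K elements per iteration."""
    lanes = 1 << k
    assert counter > 0 and counter % lanes == 0   # precondition from the notes
    wide_b = [b] * lanes            # widen(...) -> b
    counter //= lanes               # counter/(2^K)
    while True:
        mem[address:address + lanes] = wide_b     # one vector store
        address += lanes            # OPADD address, 2^K
        counter -= 1
        if counter == 0:
            break
```

Running both on the same buffer with a counter that is a multiple of 2^K should leave identical memory contents; the assertion in `widened_loop` rejects counters that do not satisfy the divisibility requirement.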

    The widened loop clobbers one or more SIMD/vector regs:
    - OK if no SIMD/vector regs are used elsewhere.
    - SIMD/vector regs that are used outside of the loop EBB must be saved
      before the loop and restored after it.
    - SIMD/vector regs that are used inside the loop EBB cannot be used for
      the optimization.
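
The three register rules above could be applied roughly as follows. This is only a sketch: the function name, the register names, and the idea of passing register usage as plain sets are assumptions, not part of the design.

```python
def pick_vector_reg(all_regs, used_inside, used_outside):
    """Choose a vector reg for the widened loop.

    Returns (reg, needs_save), or None if the optimization must be skipped.
    """
    # Regs already used inside the loop EBB are off limits.
    candidates = [r for r in all_regs if r not in used_inside]
    if not candidates:
        return None
    # Prefer a reg that nobody else uses: no save/restore needed.
    free = [r for r in candidates if r not in used_outside]
    if free:
        return free[0], False
    # Otherwise take one that is live outside the loop and save it.
    return candidates[0], True
```

For example, with regs `v0`, `v1`, `v2`, where `v0` is used inside the loop and `v1` outside it, the sketch picks `v2` without a save.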

Architecture selection / auto-detection
---------------------------------------
SIMD and combined operations may only exist from a certain CPU ISA version
onwards, so the optimizer needs to know which ISA version it targets.

This could be specified in the multi-arch triple:

    i586-linux-gnu      x86, Pentium or later
    i686-linux-gnu      x86, Pentium Pro or later
    x86_64-linux-gnu    x86_64 v1
    (open question: what are the triple names for x86_64 v2 and v3?)

Auto-detection would be nice, but it requires code that runs at runtime on
the target CPU. It should definitely be off by default, to avoid
unintentionally depending on a newer CPU ISA version than expected.
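
A rough Python sketch of the selection policy: the baseline comes from the architecture part of the triple, and detection is used only when explicitly requested. The level labels in `BASELINE`, the `detect` callback, and the `auto_detect` parameter are all hypothetical names for this sketch.

```python
# Baseline ISA level implied by the architecture part of the triple.
# The labels on the right are placeholders, not established names.
BASELINE = {
    "i586": "pentium",
    "i686": "pentiumpro",
    "x86_64": "x86_64-v1",
}

def baseline_from_triple(triple):
    """Map e.g. 'i686-linux-gnu' to its baseline ISA level."""
    arch = triple.split("-", 1)[0]
    return BASELINE[arch]

def select_isa(triple, detect=None, auto_detect=False):
    """Pick the ISA level to generate code for.

    detect: optional callback returning the level of the running CPU
            (the 'runtime code' that auto-detection needs).
    auto_detect: off by default, so the output never silently depends
                 on a newer ISA version than the triple promises.
    """
    if auto_detect and detect is not None:
        return detect()
    return baseline_from_triple(triple)
```

With the default settings, a supplied `detect` callback is ignored and `select_isa("i686-linux-gnu")` yields the Pentium Pro baseline.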