[feature][riscv] handle target address calculation in llvm-objdump disassembly for riscv #108469

@PiJoules

(Copied from fxbug.dev/42083016)

llvm-objdump for RISC-V has some deficiencies compared to binutils, mostly concerning the multi-instruction sequences RISC-V uses to calculate an address or a constant.

Here is a simple test assembly file that generates what seem to be all of the obvious test cases, followed by the results:

.text
test:
  la a0, gdata
  lla a0, gdata
  lla a0, gdata
  lw a0, gdata
  lla a0, ldata

  call func
  tail func

  li a0, 0x12345678
  li a0, 0x1234567890abcdef
  li a0, 0x10000
  li a0, 0xfffff

  .skip 0x100000
func:
  ret

ldata:
  .int 0

.data
gdata:
  .int 0

Compiled with

clang -target fuchsia-elf-riscv64 test.S -nostdlib -o test

disassembled with

llvm-objdump -d test

and

riscv64-elf-objdump -d test

The llvm-objdump output is:

0000000000001000 <test>:
    1000: 17 15 10 00   auipc   a0, 257
    1004: 03 35 85 00   ld      a0, 8(a0)
    1008: 17 25 10 00   auipc   a0, 258
    100c: 13 05 85 ff   addi    a0, a0, -8
    1010: 17 25 10 00   auipc   a0, 258
    1014: 13 05 05 ff   addi    a0, a0, -16
    1018: 17 25 10 00   auipc   a0, 258
    101c: 03 25 85 fe   lw      a0, -24(a0)
    1020: 17 05 10 00   auipc   a0, 256
    1024: 13 05 45 04   addi    a0, a0, 68
    1028: 97 00 10 00   auipc   ra, 256
    102c: e7 80 a0 03   jalr    58(ra)
    1030: 17 03 10 00   auipc   t1, 256
    1034: 67 00 23 03   jr      50(t1)
    1038: 37 55 34 12   lui     a0, 74565
    103c: 1b 05 85 67   addiw   a0, a0, 1656
    1040: 37 75 24 00   lui     a0, 583
    1044: 1b 05 d5 8a   addiw   a0, a0, -1875
    1048: 3e 05         slli    a0, a0, 15
    104a: 13 05 15 89   addi    a0, a0, -1903
    104e: 32 05         slli    a0, a0, 12
    1050: 13 05 d5 ab   addi    a0, a0, -1347
    1054: 32 05         slli    a0, a0, 12
    1056: 13 05 f5 de   addi    a0, a0, -529
    105a: 41 65         lui     a0, 16
    105c: 37 05 10 00   lui     a0, 256
    1060: 7d 35         addiw   a0, a0, -1
                ...

0000000000101062 <func>:
  101062: 82 80         ret

0000000000101064 <ldata>:
  101064: 00 00         unimp
  101066: 00 00         unimp

whereas the binutils output is:

0000000000001000 <test>:
    1000:       00101517                auipc   a0,0x101
    1004:       00853503                ld      a0,8(a0) # 102008 <ldata+0xfa4>
    1008:       00102517                auipc   a0,0x102
    100c:       ff850513                add     a0,a0,-8 # 103000 <gdata>
    1010:       00102517                auipc   a0,0x102
    1014:       ff050513                add     a0,a0,-16 # 103000 <gdata>
    1018:       00102517                auipc   a0,0x102
    101c:       fe852503                lw      a0,-24(a0) # 103000 <gdata>
    1020:       00100517                auipc   a0,0x100
    1024:       04450513                add     a0,a0,68 # 101064 <ldata>
    1028:       00100097                auipc   ra,0x100
    102c:       03a080e7                jalr    58(ra) # 101062 <func>
    1030:       00100317                auipc   t1,0x100
    1034:       03230067                jr      50(t1) # 101062 <func>
    1038:       12345537                lui     a0,0x12345
    103c:       6785051b                addw    a0,a0,1656 # 12345678 <gdata+0x12242678>
    1040:       00247537                lui     a0,0x247
    1044:       8ad5051b                addw    a0,a0,-1875 # 2468ad <gdata+0x1438ad>
    1048:       053e                    sll     a0,a0,0xf
    104a:       89150513                add     a0,a0,-1903
    104e:       0532                    sll     a0,a0,0xc
    1050:       abd50513                add     a0,a0,-1347
    1054:       0532                    sll     a0,a0,0xc
    1056:       def50513                add     a0,a0,-529
    105a:       6541                    lui     a0,0x10
    105c:       00100537                lui     a0,0x100
    1060:       357d                    addw    a0,a0,-1 # fffff <test+0xfefff>
        ...

0000000000101062 <func>:
  101062:       8082                    ret

0000000000101064 <ldata>:
  101064:       0000                    unimp
        ...

The obvious deltas are in the calculation of the target address for:
a) auipc followed by an instruction that supplies the signed 12-bit low part of the address (ld, lw, addi (printed as add by binutils), jalr, jr).
b) lui followed by instructions that supply additional immediate bits. Note that for large constants a variety of instruction sequences may be emitted, and binutils only seems to handle lui + addiw (printed as addw). Dealing with more elaborate sequences may be nice.
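As a worked check of the arithmetic behind these annotations, here is a minimal Python sketch (Python is used only for illustration; the function names are hypothetical and are not LLVM or binutils APIs). It assumes RV64 semantics: auipc and lui produce a 32-bit value that is sign-extended to 64 bits, and addiw adds in 32 bits and sign-extends the result. The inputs are taken from the listings above; `eval_chain` additionally folds the whole `li a0, 0x1234567890abcdef` sequence, which binutils leaves unannotated.

```python
MASK64 = (1 << 64) - 1  # RV64 registers are 64 bits wide


def sext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def auipc_target(pc: int, hi20: int, lo12: int) -> int:
    """Address formed by `auipc rd, hi20` plus a following `lo12(rd)` use."""
    # auipc: rd = pc + sign-extended (hi20 << 12); the next instruction adds lo12.
    return (pc + sext(hi20 << 12, 32) + lo12) & MASK64


def lui_addiw_value(hi20: int, lo12: int) -> int:
    """Constant formed by `lui rd, hi20` followed by `addiw rd, rd, lo12`."""
    # addiw adds in 32 bits and sign-extends the 32-bit result to 64 bits.
    return sext((hi20 << 12) + lo12, 32) & MASK64


def eval_chain(hi20: int, steps: list[tuple[str, int]]) -> int:
    """Fold `lui rd, hi20` plus a run of addiw/addi/slli steps on the same rd."""
    value = sext(hi20 << 12, 32)  # lui result, sign-extended on RV64
    for op, imm in steps:
        if op == "addiw":
            value = sext(value + imm, 32)
        elif op == "addi":
            value = sext(value + imm, 64)
        elif op == "slli":
            value = sext(value << imm, 64)
        else:
            raise ValueError(f"unhandled instruction in chain: {op}")
    return value & MASK64


# Steps after `lui a0, 583` (0x1040) in the listings above.
LI_STEPS = [("addiw", -1875), ("slli", 15), ("addi", -1903),
            ("slli", 12), ("addi", -1347), ("slli", 12), ("addi", -529)]

print(hex(auipc_target(0x1008, 0x102, -8)))  # 0x103000 (gdata)
print(hex(lui_addiw_value(0x247, -1875)))    # 0x2468ad
print(hex(eval_chain(0x247, LI_STEPS)))      # 0x1234567890abcdef
```

Only the first pair of the long chain (0x1040/0x1044) matches what binutils annotates; folding the entire run is the "more elaborate sequences" extension mentioned above.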

Looking through the disassembly of a large C++ binary generated by clang, the auipc/lui sequence seems to always be back to back and never interleaved with other instructions. So detecting back-to-back instruction sequences is probably all that's needed, and it seems to be all that binutils does.
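A back-to-back pass could look roughly like the following Python sketch. It is not taken from llvm-objdump or binutils: it works on a hypothetical `Insn` record of already-decoded instructions (not LLVM's `MCInst`), pairs only adjacent instructions where the second reads the register the first wrote, and formats the nearest preceding symbol the way the binutils comments above appear to. `jr off(r)` is modeled as `jalr zero, off(r)`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MASK64 = (1 << 64) - 1


def sext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


@dataclass
class Insn:
    """Hypothetical pre-decoded instruction used only for this sketch."""
    pc: int
    op: str                    # "auipc", "lui", "addi", "addiw", "ld", "lw", "jalr"
    rd: str                    # destination register ("zero" for jr)
    rs1: Optional[str] = None  # source/base register
    imm: int = 0               # hi20 for auipc/lui, signed 12-bit immediate otherwise


AUIPC_USERS = {"addi", "ld", "lw", "jalr"}  # consume rs1 + imm12 after auipc
LUI_USERS = {"addi", "addiw"}               # finish a lui constant


def pair_targets(insns: List[Insn]) -> Dict[int, int]:
    """Map the pc of each pair's second instruction to the computed value."""
    notes: Dict[int, int] = {}
    for first, second in zip(insns, insns[1:]):
        if second.rs1 != first.rd:  # must read the register just written
            continue
        if first.op == "auipc" and second.op in AUIPC_USERS:
            notes[second.pc] = (first.pc + sext(first.imm << 12, 32) + second.imm) & MASK64
        elif first.op == "lui" and second.op in LUI_USERS:
            hi = sext(first.imm << 12, 32)
            value = sext(hi + second.imm, 32) if second.op == "addiw" else hi + second.imm
            notes[second.pc] = value & MASK64
    return notes


def symbolize(addr: int, symbols: Dict[str, int]) -> str:
    """Nearest preceding symbol, formatted as '<name>' or '<name+0xoff>'."""
    below = [(base, name) for name, base in symbols.items() if base <= addr]
    if not below:
        return ""
    base, name = max(below)
    return f"<{name}>" if addr == base else f"<{name}+{addr - base:#x}>"


symbols = {"test": 0x1000, "func": 0x101062, "ldata": 0x101064, "gdata": 0x103000}
# Contiguous excerpt of the test listing, 0x1020..0x1044.
listing = [
    Insn(0x1020, "auipc", "a0", imm=0x100), Insn(0x1024, "addi", "a0", "a0", 68),
    Insn(0x1028, "auipc", "ra", imm=0x100), Insn(0x102c, "jalr", "ra", "ra", 58),
    Insn(0x1030, "auipc", "t1", imm=0x100), Insn(0x1034, "jalr", "zero", "t1", 50),
    Insn(0x1038, "lui", "a0", imm=0x12345), Insn(0x103c, "addiw", "a0", "a0", 1656),
    Insn(0x1040, "lui", "a0", imm=0x247), Insn(0x1044, "addiw", "a0", "a0", -1875),
]

for pc, value in sorted(pair_targets(listing).items()):
    print(f"{pc:x}: # {value:x} {symbolize(value, symbols)}")
# 1024: # 101064 <ldata>
# 102c: # 101062 <func>
# 1034: # 101062 <func>
# 103c: # 12345678 <gdata+0x12242678>
# 1044: # 2468ad <gdata+0x1438ad>
```

These lines reproduce the comments binutils prints for the same addresses. A lone `lui` (such as the compressed `lui a0, 16` at 0x105a) gets no annotation, which also matches the binutils output.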
