MIT OpenCourseWare

Lecture 2: PC Hardware and x86 Programming

Outline
  • PC architecture
  • x86 instruction set
  • gcc calling conventions

PC architecture

  • CPU runs instructions:

    for(;;){
    run next instruction
    }
  • Needs work space: registers
    • AX, CX, DX, BX (16- or 32-bit depending on mode)
    • very fast, very few
  • More work space: memory
    • CPU sends out address on address lines (wires, one bit per wire)
    • Data comes back on data lines after a fashion
    • or data is written to data lines
  • Add more registers: pointers into memory
    • SP - stack pointer
    • BP - frame base pointer
    • SI - source index
    • DI - destination index
  • Only a 16-bit machine, but >64kB memory: segment registers
    • CS - code segment
    • DS - data segment
    • SS - stack segment
    • ES, FS, GS - extra segments
    • seg:off means physical address seg*16+off
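The seg:off rule above can be written out as a small helper. This is only a sketch of the arithmetic; the function name `seg_off_to_phys` is made up for illustration and is not part of any real API.

```c
#include <stdint.h>

/* Real-mode address translation: physical = seg*16 + off.
 * Both inputs are 16-bit, so the largest result is
 * 0xFFFF*16 + 0xFFFF = 0x10FFEF, which needs 21 bits. */
static uint32_t seg_off_to_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

Note that many seg:off pairs name the same physical byte: 0x07C0:0x0000 and 0x0000:0x7C00 both translate to 0x7C00.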
  • Instructions are in memory too!
    • IP - instruction pointer (called PC on the PDP-11 and everything else)
    • incremented after running each instruction
    • can be modified by CALL, RET, JMP, conditional jumps
  • Want conditional jumps
    • FLAGS - various condition codes
      • whether last arithmetic operation overflowed
      • ... was zero
      • ... was positive
      • ... was negative
      • ... etc.
      • whether interrupts are enabled
      • direction of data copy instructions
    • JG, JZ, JNZ, ...
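To make the link between FLAGS and conditional jumps concrete, here is a rough C model of the flags that a 32-bit compare (CMP, which computes a - b and keeps only the flags) would leave behind, and of the conditions JZ, JNZ and JG test. The struct and function names are invented for this sketch; only the flag rules themselves come from the x86 architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition codes used below. */
struct flags {
    bool zf;  /* result was zero */
    bool sf;  /* result was negative (top bit set) */
    bool of;  /* signed overflow */
    bool cf;  /* unsigned borrow */
};

/* Flags after a 32-bit "cmp": compute a - b and discard the result. */
static struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;

    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;
    f.cf = (a < b);
    /* Subtraction overflows when a and b differ in sign and the
     * result's sign differs from a's sign. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}

/* Conditions tested by the jump instructions named above. */
static bool jz_taken(struct flags f)  { return f.zf; }
static bool jnz_taken(struct flags f) { return !f.zf; }
static bool jg_taken(struct flags f)  { return !f.zf && f.sf == f.of; }
```

JG compares SF with OF rather than looking at SF alone, so that a signed comparison still comes out right when the subtraction overflows (for example, comparing 0x7FFFFFFF with -1).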
  • Still not interesting - need I/O to interact with outside world
    • same as memory but set I/O signal
    • only 1024 I/O addresses
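    • example: sending a byte to the parallel-port printer

          enum {
              Data    = 0x378+0,
              Status  = 0x378+1,
                  Notbusy = 0x80,
              Ctl     = 0x378+2,
                  Strobe  = 0x01,
          };

          void
          lptputc(int c)
          {
              /* wait for the printer to consume the previous byte */
              while((inb(Status)&Notbusy) == 0)
                  ;

              /* put the byte on the data lines */
              outb(Data, c);

              /* pulse strobe to tell the printer to read the data */
              outb(Ctl, Strobe);
              outb(Ctl, 0);
          }
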
    • Only 1024 I/O addresses is too few: memory-mapped I/O (MMIO)
      • use normal memory addresses
      • no need for special instructions
      • "magic" memory
      • system controller routes to appropriate device
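For contrast with the inb/outb example, a memory-mapped device is driven with ordinary loads and stores through a volatile pointer. The sketch below assumes a made-up device with a one-byte data register and a one-byte status register; `struct fake_dev`, `DevReady` and `mmio_putc` are hypothetical names, and on real hardware the pointer would be set to whatever physical address the system controller routes to the device.

```c
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
struct fake_dev {
    volatile uint8_t data;    /* write a byte here to send it */
    volatile uint8_t status;  /* DevReady set when the device can accept a byte */
};

enum { DevReady = 0x80 };

/* Same polling pattern as lptputc, but using plain memory accesses.
 * volatile keeps the compiler from caching or eliding the accesses. */
static void mmio_putc(struct fake_dev *dev, uint8_t c)
{
    while ((dev->status & DevReady) == 0)
        ;
    dev->data = c;
}
```

Because the registers look like memory, the routine can be exercised against an ordinary struct in RAM, which is what makes this kind of code easy to test off-target.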


x86 Instruction Set


  • Two-operand instruction set
    • Intel: op dst, src
    • AT&T (gcc/gas): op src, dst
      • uses b, w, l suffix on instructions to specify size of operands
    • Operands are registers, constant, memory via register, memory via constant
      AT&T syntax            "C"
      movl %eax, %edx        edx = eax;
      movl $0x123, %edx      edx = 0x123;
      movl (%ebx), %edx      edx = mem[ebx];
      movl 4(%ebx), %edx     edx = mem[ebx+4];
      movl 0x123, %edx       edx = mem[0x123];
  • Instruction classes
    • data movement: MOV, PUSH, POP, ...
    • arithmetic: TEST, SHL, ADD, AND, ...
    • i/o: IN, OUT, ...
    • control: JMP, JZ, JNZ, CALL, RET
    • string: REP MOVSB, ...
    • system: IRET, INT
  • Intel architecture manual Volume 2 is the reference
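The b, w and l suffixes mentioned above select 8-, 16- and 32-bit operands. One consequence on 32-bit x86 is that a byte or word move into a register replaces only the low bits and leaves the rest alone. The helpers below are a rough model of that behavior for movb into %al, movw into %ax and movl into %eax; the function names are invented for the sketch.

```c
#include <stdint.h>

/* movb $v, %al : replace the low 8 bits of eax, keep the upper 24. */
static uint32_t write_b(uint32_t eax, uint8_t v)
{
    return (eax & 0xFFFFFF00u) | v;
}

/* movw $v, %ax : replace the low 16 bits of eax, keep the upper 16. */
static uint32_t write_w(uint32_t eax, uint16_t v)
{
    return (eax & 0xFFFF0000u) | v;
}

/* movl $v, %eax : replace all 32 bits. */
static uint32_t write_l(uint32_t eax, uint32_t v)
{
    (void)eax;  /* old value is irrelevant for a full-width move */
    return v;
}
```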

gcc x86 calling conventions

  • x86 dictates that stack grows down:
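    • pushl %eax is equivalent to

          subl $4, %esp
          movl %eax, (%esp)

    • popl %eax is equivalent to

          movl (%esp), %eax
          addl $4, %esp

    • call $0x12345 behaves like (not real instructions, since %eip cannot be named as an operand)

          pushl %eip
          movl $0x12345, %eip

    • ret behaves like (again, not a real instruction)

          popl %eip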
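The four expansions above can be replayed on a toy machine to watch %esp and %eip move. This is a simulation sketch, not an emulator: `struct machine`, the 64-byte memory and all function names are assumptions made for illustration, and `eip` is taken to already hold the address of the instruction following the call, which is what the hardware pushes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MemSize = 64 };  /* tiny simulated memory; stack starts at the top */

struct machine {
    uint32_t eip, esp, eax;
    uint8_t mem[MemSize];
};

static void store32(struct machine *m, uint32_t addr, uint32_t v)
{
    assert(addr <= MemSize - 4);  /* also catches %esp wrapping below 0 */
    memcpy(&m->mem[addr], &v, 4);
}

static uint32_t load32(const struct machine *m, uint32_t addr)
{
    uint32_t v;
    assert(addr <= MemSize - 4);
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

/* pushl: subl $4, %esp; movl v, (%esp) */
static void push32(struct machine *m, uint32_t v)
{
    m->esp -= 4;
    store32(m, m->esp, v);
}

/* popl: movl (%esp), v; addl $4, %esp */
static uint32_t pop32(struct machine *m)
{
    uint32_t v = load32(m, m->esp);
    m->esp += 4;
    return v;
}

/* call: push the return address, then jump to the target */
static void sim_call(struct machine *m, uint32_t target)
{
    push32(m, m->eip);
    m->eip = target;
}

/* ret: pop the return address into %eip */
static void sim_ret(struct machine *m)
{
    m->eip = pop32(m);
}
```

Starting with `esp = MemSize`, a push followed by a call leaves the return address at the lowest used address, which is why a callee finds it at (%esp) on entry.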
  • Gcc dictates the rest. Contract between caller and callee on x86:
    • at entry to a function (just after the call instruction):
      • %eip points at first instruction of function
      • %esp+4 points at first argument
      • %esp points at return address
    • after ret instruction:
      • %eip contains return address
      • %esp points at arguments pushed by caller
      • callee may have trashed arguments
      • %eax contains return value
      • %ecx, %edx may be trashed
      • %ebp, %ebx, %esi, %edi must contain contents from time of call
    • terminology:
      • %eax, %ecx, %edx are "caller save"
      • %ebp, %ebx, %esi, %edi are "callee save"
  • Can do anything that doesn't violate contract. By convention, gcc does more:
    • each function has a stack frame marked by %ebp, %esp

                 +------------+
                 | arg 2      |
                 +------------+
                 | arg 1      |
                 +------------+
                 | ret %eip   |
                 +============+
          %ebp-> | saved %ebp |
                 +------------+
                 |            |
                 |            |
                 |            |
                 |            |
                 |            |
          %esp-> |            |
                 +------------+
    • %esp can move to make stack frame bigger, smaller
    • %ebp points at saved %ebp from previous function, chain to walk stack
    • function prologue:

          pushl %ebp
          movl %esp, %ebp

    • function epilogue:

          movl %ebp, %esp
          popl %ebp

      or

          leave
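Because every prologue stores the caller's %ebp at the address the new %ebp points to, with the return address one word above it, the frames form a linked list that can be walked to produce a backtrace. The sketch below walks such a chain in a simulated stack instead of reading real registers. The function name, the word-array representation, and the convention that the outermost frame stores a saved %ebp of 0 are all assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk the saved-%ebp chain in a simulated stack.
 * stack[] holds 32-bit words; a byte address A lives at stack[A/4].
 * Per frame: word at ebp   = caller's saved %ebp
 *            word at ebp+4 = return address
 * Stops at a saved %ebp of 0 (assumed end marker), after max entries,
 * or if the chain is malformed. Returns the number of entries stored. */
static size_t backtrace32(const uint32_t *stack, size_t nwords, uint32_t ebp,
                          uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;

    while (ebp != 0 && n < max) {
        size_t i = ebp / 4;

        if (ebp % 4 != 0 || i + 1 >= nwords)
            break;  /* misaligned or out of range: stop, don't read garbage */
        ret_addrs[n++] = stack[i + 1];  /* return address into the caller */
        ebp = stack[i];                 /* follow link to the caller's frame */
    }
    return n;
}
```

The first entry is the return address of the innermost function, and each later entry belongs to the next frame out.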
  • Big example:
    • C code

          int f(int x);
          int g(int x);

          int main(void) { return f(8)+1; }
          int f(int x) { return g(x); }
          int g(int x) { return x+3; }
    • assembler

          _main:
                  # prologue
                  pushl %ebp
                  movl %esp, %ebp
                  # body
                  pushl $8
                  call _f
                  addl $1, %eax
                  # epilogue
                  movl %ebp, %esp
                  popl %ebp
                  ret
          _f:
                  # prologue
                  pushl %ebp
                  movl %esp, %ebp
                  # body
                  pushl 8(%esp)
                  call _g
                  # epilogue
                  movl %ebp, %esp
                  popl %ebp
                  ret

          _g:
                  # prologue
                  pushl %ebp
                  movl %esp, %ebp
                  # save %ebx
                  pushl %ebx
                  # body
                  movl 8(%ebp), %ebx
                  addl $3, %ebx
                  movl %ebx, %eax
                  # restore %ebx
                  popl %ebx
                  # epilogue
                  movl %ebp, %esp
                  popl %ebp
                  ret

  • Super-small _g:

          _g:
                  movl 4(%esp), %eax
                  addl $3, %eax
                  ret