Pervasive Debugging 
===================

Alex Ho (alex.ho at cl.cam.ac.uk)

Introduction
------------

The pervasive debugging project is leveraging Xen to 
debug distributed systems.  We have added a gdb stub
to Xen to allow for remote debugging of both Xen and
guest operating systems.  More information about the
pervasive debugger is available at: http://www.cl.cam.ac.uk/netos/pdb


Implementation
--------------

The gdb stub communicates with gdb over a serial line.
The main entry point is pdb_handle_exception() which is invoked
from:    pdb_key_pressed()    ('D' on the console)
         do_int3_exception()  (interrupt 3: breakpoint exception)
         do_debug()           (interrupt 1: debug exception)

pdb_handle_exception() accepts characters from the serial port and
passes gdb commands to pdb_process_command(), which implements the
gdb stub interface.  The stub draws heavily from the kgdb project
and the sample gdbstub provided with gdb.
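
As a rough illustration of the framing such a stub has to handle, the
sketch below pulls one gdb remote serial protocol packet
($<payload>#<two hex checksum digits>) out of a byte buffer and checks
its checksum.  It is written in Python for brevity and is not taken from
the pdb source; parse_packet() is a hypothetical name, and escaping and
run-length encoding are ignored.

```python
def parse_packet(buf: bytes):
    """Return (payload, rest) for the first complete packet in buf.

    Returns None if buf does not yet hold a complete packet.
    Raises ValueError on a checksum mismatch (a real stub would
    reply '-' to ask gdb to resend, and '+' to acknowledge).
    """
    start = buf.find(b"$")
    if start < 0:
        return None
    end = buf.find(b"#", start)
    if end < 0 or len(buf) < end + 3:
        return None  # checksum digits have not arrived yet

    payload = buf[start + 1:end]
    expected = int(buf[end + 1:end + 3].decode("ascii"), 16)

    # The checksum is the modulo-256 sum of the payload bytes.
    if sum(payload) % 256 != expected:
        raise ValueError("bad checksum")
    return payload, buf[end + 3:]
```

Bytes before the '$' (such as a stray acknowledgement) are skipped, and
anything after the checksum is handed back for the next call.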

The stub can examine registers, single step and continue, and
read and write memory (in Xen, a domain, or a Linux process'
address space).  The debugger does not currently trace the
current process, so all bets are off if a context switch occurs
in the domain.


Setup
-----

 +-------+ telnet +-----------+ serial +-------+ 
 |  GDB  |--------|  nsplitd  |--------|  Xen  |
 +-------+        +-----------+        +-------+ 

To run pdb, Xen must be appropriately configured and 
a suitable serial interface attached to the target machine.
GDB and nsplitd can run on the same machine.

Xen Configuration

  Add the "pdb=xxx" option to your Xen boot command line
  where xxx is one of the following values:
     com1    gdb stub should communicate on com1
     com1H   gdb stub should communicate on com1 (with high bit set)
     com2    gdb stub should communicate on com2
     com2H   gdb stub should communicate on com2 (with high bit set)

  Symbolic debugging information is quite helpful too:
  xeno.bk/xen/arch/x86/Rules.mk
    add -g to CFLAGS to compile Xen with symbols
  xeno.bk/linux-2.4.27-xen-sparse/arch/xen/Makefile
    add -g to CFLAGS to compile Linux with symbols

  You may also want to consider dedicating a register to the
  frame pointer (disable the -fomit-frame-pointer compile flag).

  When booting Xen and domain 0, look for the console text 
  "pdb: pervasive debugger" just before DOM0 starts up.

Serial Port Configuration

  pdb expects to communicate with gdb using the serial port.  Since 
  this port is often shared with the machine's console output, pdb can
  discriminate its communication by setting the high bit of each byte.

  A new tool has been added to the source tree which splits 
  the serial output from a remote machine into two streams: 
  one stream (without the high bit) is the console and 
  one stream (with the high bit stripped) is the pdb communication.

  See:  xeno.bk/tools/misc/nsplitd
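
  The splitting rule described above can be sketched in a few lines.
  This is only an illustration of the high-bit convention, written in
  Python; it is not the nsplitd implementation, and the function names
  are hypothetical.

```python
def split_serial(data: bytes):
    """Split a mixed serial stream into (console, pdb) byte strings."""
    console = bytearray()
    pdb = bytearray()
    for b in data:
        if b & 0x80:
            pdb.append(b & 0x7F)  # pdb traffic: strip the marker bit
        else:
            console.append(b)     # ordinary console output
    return bytes(console), bytes(pdb)


def tag_for_pdb(data: bytes) -> bytes:
    """Set the high bit on every byte, marking it as pdb traffic."""
    return bytes(b | 0x80 for b in data)
```

  For example, a stream that interleaves console text with a tagged gdb
  packet separates cleanly into the two original streams.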

  nsplitd configuration
  ---------------------
  hostname$ more /etc/xinetd.d/nsplit
  service nsplit1
  {
        socket_type             = stream
        protocol                = tcp
        wait                    = no
        user                    = wanda
        server                  = /usr/sbin/in.nsplitd
        server_args             = serial.cl.cam.ac.uk:wcons00
        disable                 = no
        only_from               = 128.232.0.0/17 127.0.0.1
  }

  hostname$ egrep 'wcons00|nsplit1' /etc/services
  wcons00         9600/tcp        # Wanda remote console
  nsplit1         12010/tcp       # Nemesis console splitter ports.

  Note: nsplitd was originally written for the Nemesis project
  at Cambridge.

  After nsplitd accepts a connection on <port> (12010 in the above
  example), it starts listening on port <port + 1>.  Characters sent
  to <port + 1> have the high bit set before they are passed on, and
  the high bit is stripped from characters received on that port.

  You can connect to nsplitd using
  'tools/misc/xencons <host> <port>'

GDB 6.0
  pdb has been tested with gdb 6.0.  It should also work with
  earlier versions.


Usage
-----

1. Boot Xen and Linux
2. Interrupt Xen by pressing 'D' at the console
   You should see the console message: 
   (XEN) pdb_handle_exception [0x88][0x101000:0xfc5e72ac]
   At this point Xen is frozen and the pdb stub is waiting for gdb commands 
   on the serial line.
3. Attach with gdb
   (gdb) file xeno.bk/xen/xen
   Reading symbols from xeno.bk/xen/xen...done.
   (gdb) target remote <hostname>:<port + 1>              /* contact nsplitd */
   Remote debugging using serial.srg:12131
   continue_cpu_idle_loop () at current.h:10
   warning: shared library handler failed to enable breakpoint
   (gdb) break __enter_scheduler
   Breakpoint 1 at 0xfc510a94: file schedule.c, line 330.
   (gdb) cont
   Continuing.

   Program received signal SIGTRAP, Trace/breakpoint trap.
   __enter_scheduler () at schedule.c:330
   (gdb) step
   (gdb) step
   (gdb) print next            /* the variable prev has been optimized away! */
   $1 = (struct task_struct *) 0x0
   (gdb) delete
   Delete all breakpoints? (y or n) y
4. You can add additional symbols to gdb
   (gdb) add-sym xeno.bk/linux-2.4.27-xen0/vmlinux
   add symbol table from file "xeno.bk/linux-2.4.27-xen0/vmlinux" at
   (y or n) y
   Reading symbols from xeno.bk/linux-2.4.27-xen0/vmlinux...done.
   (gdb) x/s cpu_vendor_names[0]
   0xc01530d2 <cpdext+62898>:	 "Intel"
   (gdb) break free_uid
   Breakpoint 2 at 0xc0012250
   (gdb) cont
   Continuing.                                  /* run a command in domain 0 */

   Program received signal SIGTRAP, Trace/breakpoint trap.
   free_uid (up=0xbffff738) at user.c:77

   (gdb) print *up
   $2 = {__count = {counter = 0}, processes = {counter = 135190120}, files = {
       counter = 0}, next = 0x395, pprev = 0xbffff878, uid = 134701041}
   (gdb) finish
   Run till exit from #0  free_uid (up=0xbffff738) at user.c:77

   Program received signal SIGTRAP, Trace/breakpoint trap.
   release_task (p=0xc2da0000) at exit.c:51
   (gdb) print *p
   $3 = {state = 4, flags = 4, sigpending = 0, addr_limit = {seg = 3221225472},
     exec_domain = 0xc016a040, need_resched = 0, ptrace = 0, lock_depth = -1, 
     counter = 1, nice = 0, policy = 0, mm = 0x0, processor = 0, 
     cpus_runnable = 1, cpus_allowed = 4294967295, run_list = {next = 0x0, 
       prev = 0x0}, sleep_time = 18995, next_task = 0xc017c000, 
     prev_task = 0xc2f94000, active_mm = 0x0, local_pages = {next = 0xc2da0054,
       prev = 0xc2da0054}, allocation_order = 0, nr_local_pages = 0, 
     ...
5. To resume Xen, enter the "continue" command in gdb.
   This sends the packet $c#63 along the serial channel.

   (gdb) cont
   Continuing.
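
   The packet named in step 5 can be reproduced by hand.  The sketch
   below (Python, with a hypothetical make_packet() helper) frames a
   command the way the gdb remote serial protocol does: '$', the
   payload, '#', then the modulo-256 sum of the payload bytes as two
   hex digits.  Escaping of special characters is left out.

```python
def make_packet(payload: str) -> str:
    """Frame a gdb remote protocol command as $<payload>#<checksum>."""
    checksum = sum(payload.encode("ascii")) % 256
    return "$%s#%02x" % (payload, checksum)


if __name__ == "__main__":
    # 'c' is ASCII 0x63, so "continue" is framed as $c#63.
    print(make_packet("c"))
```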

Debugging Multiple Domains & Processes
--------------------------------------

pdb supports debugging multiple domains & processes.  You can switch
between different domains and processes within domains and examine
variables in each.

The pdb context identifies the current debug target.  It is stored
in the Xen variable pdb_ctx and defaults to Xen.

   target    pdb_ctx.domain    pdb_ctx.process
   ------    --------------    ---------------
    xen           -1                 -1
  guest os      0,1,2,...            -1
   process      0,1,2,...          0,1,2,...
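
The table can be read as a small decision rule.  The sketch below
restates it in Python; PdbContext and target_of() are hypothetical names
used only for illustration, and combinations that the table does not
list are rejected rather than guessed at.

```python
from typing import NamedTuple


class PdbContext(NamedTuple):
    """Illustrative stand-in for the domain/process pair in pdb_ctx."""
    domain: int = -1
    process: int = -1


def target_of(ctx: PdbContext) -> str:
    """Map a (domain, process) pair to the debug target it selects."""
    if ctx.domain == -1 and ctx.process == -1:
        return "xen"
    if ctx.domain >= 0 and ctx.process == -1:
        return "guest os"
    if ctx.domain >= 0 and ctx.process >= 0:
        return "process"
    raise ValueError("combination not listed in the table")
```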

Unfortunately, gdb doesn't understand debugging multiple processes
simultaneously (we're working on it), so at present you are limited
to just one set of symbols for symbolic debugging.  When debugging
processes, pdb currently supports just Linux 2.4.

   define setup
      file xeno-clone/xeno.bk/xen/xen
      add-sym xeno-clone/xeno.bk/linux-2.4.27-xen0/vmlinux
      add-sym ~ach61/a.out
   end


1. Connect with gdb as before.  A couple of Linux-specific 
   symbols need to be defined.

   (gdb) target remote <hostname>:<port + 1>              /* contact nsplitd */
   Remote debugging using serial.srg:12131
   continue_cpu_idle_loop () at current.h:10
   warning: shared library handler failed to enable breakpoint
   (gdb) set pdb_pidhash_addr = &pidhash
   (gdb) set pdb_init_task_union_addr = &init_task_union

2. The pdb context defaults to Xen and we can read Xen's memory.
   An attempt to access domain 0 memory fails.
  
   (gdb) print pdb_ctx
   $1 = {valid = 0, domain = -1, process = -1, ptbr = 1052672}
   (gdb) print hexchars
   $2 = "0123456789abcdef"
   (gdb) print cpu_vendor_names
   Cannot access memory at address 0xc0191f80

3. Now we change to domain 0.  In addition to changing pdb_ctx.domain,
   we need to set pdb_ctx.valid to notify pdb of the change.
   It is now possible to examine Xen and Linux memory.

   (gdb) set pdb_ctx.domain=0
   (gdb) set pdb_ctx.valid=1
   (gdb) print hexchars
   $3 = "0123456789abcdef"
   (gdb) print cpu_vendor_names
   $4 = {0xc0158b46 "Intel", 0xc0158c37 "Cyrix", 0xc0158b55 "AMD", 
     0xc0158c3d "UMC", 0xc0158c41 "NexGen", 0xc0158c48 "Centaur", 
     0xc0158c50 "Rise", 0xc0158c55 "Transmeta"}

4. Now change to a process within domain 0.  Again, we need to
   change pdb_ctx.valid in addition to pdb_ctx.process.

   (gdb) set pdb_ctx.process=962
   (gdb) set pdb_ctx.valid=1
   (gdb) print pdb_ctx
   $1 = {valid = 0, domain = 0, process = 962, ptbr = 52998144}
   (gdb) print aho_a
   $2 = 20

5. Now we can read the same variable from another process running
   the same executable in another domain.

   (gdb) set pdb_ctx.domain=1
   (gdb) set pdb_ctx.process=1210
   (gdb) set pdb_ctx.valid=1
   (gdb) print pdb_ctx
   $3 = {valid = 0, domain = 1, process = 1210, ptbr = 70574080}
   (gdb) print aho_a
   $4 = 27


Some Helpful .gdbinit Commands
------------------------------

define setup
  file    .../install/boot/xen-syms
  add-sym .../install/boot/vmlinux-syms-2.4.27-xen0
  add-sym /homes/aho/a.out
end
document setup
  load symbols for xen, xenolinux (dom 0), and "a.out"
end

define setup-linux
  set pdb_pidhash_addr = &pidhash
  set pdb_init_task_union_addr = &init_task_union

  set task_struct_mm_offset           = (void *)&(init_task_union.task.mm) - (void *)&(init_task_union.task)
  set task_struct_next_task_offset    = (void *)&(init_task_union.task.next_task) - (void *)&(init_task_union.task)
  set task_struct_pid_offset          = (void *)&(init_task_union.task.pid) - (void *)&(init_task_union.task)
  set task_struct_pidhash_next_offset = (void *)&(init_task_union.task.pidhash_next) - (void *)&(init_task_union.task)
  set task_struct_comm_offset         = (void *)&(init_task_union.task.comm) - (void *)&(init_task_union.task)
  set task_struct_comm_length         = sizeof (init_task_union.task.comm)

  set mm_struct_pgd_offset            = sizeof (struct vm_area_struct *) * 2 + sizeof (rb_root_t)
end
document setup-linux
  define various xenolinux specific offsets and sizes in pdb
end
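
The mm_struct_pgd_offset formula above adds up the sizes of the fields
assumed to precede pgd.  The sketch below checks that arithmetic in
Python with ctypes.  The field layout (mmap, mm_rb, mmap_cache, pgd) is
an assumption inferred from the formula, not copied from the kernel
headers, and the offsets reflect the machine running the script rather
than the target.

```python
import ctypes


class RbRoot(ctypes.Structure):
    """Assumed shape of rb_root_t: a single node pointer."""
    _fields_ = [("rb_node", ctypes.c_void_p)]


class MmStruct(ctypes.Structure):
    """Assumed leading fields of mm_struct, up to and including pgd."""
    _fields_ = [
        ("mmap", ctypes.c_void_p),        # struct vm_area_struct *
        ("mm_rb", RbRoot),                # rb_root_t
        ("mmap_cache", ctypes.c_void_p),  # struct vm_area_struct *
        ("pgd", ctypes.c_void_p),         # pgd_t *
    ]


def pgd_offset_by_formula() -> int:
    """sizeof (struct vm_area_struct *) * 2 + sizeof (rb_root_t)"""
    return ctypes.sizeof(ctypes.c_void_p) * 2 + ctypes.sizeof(RbRoot)


if __name__ == "__main__":
    # Both values should agree for the assumed layout.
    print(MmStruct.pgd.offset, pgd_offset_by_formula())
```

If the real structure gains or loses a field ahead of pgd, the formula
has to be adjusted to match, which is why the other offsets in
setup-linux are computed from the symbols instead.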




Changes
-------

04.02.05 aho creation
04.03.31 aho add description on debugging multiple domains
04.07.15 aho .gdbinit
