refactor: per-CPU current_process via GS segment (SMP Phase 1)

Replace the global current_process variable with per-CPU access
through the GS-based percpu_data structure on x86:
- process.h: on x86, #define current_process as percpu_current();
non-x86 builds keep the extern fallback
- scheduler.c: write sites use percpu_set_current()
- interrupts.S: ISR entry now reloads the kernel percpu GS by reading the
LAPIC ID from MMIO (0xC0400020) and looking up the matching GS selector
in _percpu_gs_lut[256]. This solves the chicken-and-egg problem of
needing GS to identify the CPU while GS may still hold the user TLS
selector
- percpu.c: _percpu_gs_lut lookup table populated during percpu_init()
- hal_cpu_set_tls: no longer loads GS immediately, since doing so would
clobber the kernel percpu GS; the user TLS GS is instead restored on
ISR exit via pop
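A minimal, portable model of the process.h and scheduler.c changes, for illustration only. It assumes a percpu_data layout with a `current` field and replaces the GS base with a plain array plus a hypothetical `sketch_this_cpu` index; in the kernel the same accessors would presumably be %gs-relative loads and stores. `struct process`, `SKETCH_MAX_CPUS` and `sketch_switch_cpu()` are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical stand-in: the real struct process is not shown in this commit. */
struct process {
    int pid;
};

/* Assumed layout: only the field this commit touches is modeled. */
struct percpu_data {
    struct process *current;
};

#define SKETCH_MAX_CPUS 4 /* hypothetical CPU count for the model */

/* Portable model: in the kernel each CPU's GS base points at its own
 * percpu_data. Here an array index (sketch_this_cpu) stands in for
 * "which CPU's GS base is currently loaded". */
static struct percpu_data sketch_percpu[SKETCH_MAX_CPUS];
static int sketch_this_cpu;

/* Hypothetical helper: models loading a different CPU's GS selector. */
static void sketch_switch_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_MAX_CPUS);
    sketch_this_cpu = cpu;
}

/* Read this CPU's current process. */
static inline struct process *percpu_current(void)
{
    return sketch_percpu[sketch_this_cpu].current;
}

/* Write this CPU's current process. */
static inline void percpu_set_current(struct process *p)
{
    sketch_percpu[sketch_this_cpu].current = p;
}

/* Reads keep the old spelling; the name now expands to a per-CPU load. */
#define current_process percpu_current()
```

Because current_process now expands to a function call, an assignment such as `current_process = p;` becomes `percpu_current() = p;` and no longer compiles, so the scheduler.c write sites have to go through percpu_set_current().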
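The ISR-entry lookup and the percpu_init() table fill can be modeled in C as below. This is a sketch under stated assumptions, not the interrupts.S code: it assumes an xAPIC, whose ID register (LAPIC base + 0x20) holds the APIC ID in bits 31:24, and it assumes _percpu_gs_lut stores 16-bit selectors with 0 (the null selector) meaning "no CPU". `SKETCH_PERCPU_GDT_FIRST`, `sketch_percpu_register()` and `sketch_gs_for_lapic_id_reg()` are hypothetical; the real GDT layout is not described in this commit. In the assembly path the MMIO read presumably goes through the flat kernel data segment, since GS cannot be trusted until the lookup finishes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: GDT index of the first per-CPU descriptor. */
#define SKETCH_PERCPU_GDT_FIRST 8

/* One slot per possible 8-bit xAPIC ID; 0 (the null selector) means
 * "no CPU registered with this ID". Element type is an assumption. */
static uint16_t _percpu_gs_lut[256];

/* GDT selector: index in bits 15:3, TI = 0 (GDT), RPL = 0. */
static uint16_t sketch_gdt_selector(unsigned index)
{
    return (uint16_t)(index << 3);
}

/* Model of the percpu_init() step: record the GS selector whose descriptor
 * base points at this CPU's percpu_data, keyed by the CPU's APIC ID. */
static void sketch_percpu_register(uint8_t apic_id, unsigned cpu_index)
{
    _percpu_gs_lut[apic_id] = sketch_gdt_selector(SKETCH_PERCPU_GDT_FIRST + cpu_index);
}

/* Model of the ISR-entry lookup: take the raw 32-bit value read from the
 * LAPIC ID register and return the selector to load into GS. */
static uint16_t sketch_gs_for_lapic_id_reg(uint32_t lapic_id_reg)
{
    uint8_t apic_id = (uint8_t)(lapic_id_reg >> 24); /* xAPIC: ID in bits 31:24 */
    uint16_t sel = _percpu_gs_lut[apic_id];
    assert(sel != 0); /* unregistered ID: GS would be left unusable */
    return sel;
}
```

Keying the table by APIC ID rather than by logical CPU number is what breaks the circular dependency: each core can learn its own ID from its LAPIC without touching GS.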
This is the foundation for running the scheduler on AP cores.