--- /dev/null
+# AdrOS — Supplementary Material Analysis & POSIX Gap Report
+
+This document compares the **supplementary-material** reference code and suggestions
+(from the AI monolog in `readme.txt` plus the `.c.txt`/`.S.txt` example files) with
+the **current AdrOS implementation**, and assesses how close AdrOS is to being a
+Unix-like, POSIX-compatible operating system.
+
+---
+
+## Part 1 — Subsystem-by-Subsystem Comparison
+
+### 1.1 Physical Memory Manager (PMM)
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Bitmap allocator | ✅ Bitmap-based | ✅ Bitmap-based (`src/mm/pmm.c`) | None |
+| Multiboot memory map parsing | ✅ Parse MMAP entries | ✅ Full Multiboot2 MMAP parsing, clamping, fallback | None |
+| Kernel/module protection | ✅ Reserve kernel + initrd | ✅ Protects kernel (`_start`–`_end`), modules, low 1MB | None |
+| Frame reference counting | ✅ `uint16_t frame_ref_count[]` for CoW | ❌ Not implemented | **Critical for CoW fork** |
+| Contiguous block allocation | ✅ `pmm_alloc_blocks(count)` for DMA | ❌ Only single-frame `pmm_alloc_page()` | Needed for DMA drivers |
+| Atomic ref operations | ✅ `__sync_fetch_and_add` | ❌ N/A (no refcount) | Future |
+| Spinlock protection | ✅ `spinlock_acquire(&pmm_lock)` | ❌ PMM has no lock (single-core safe only) | Needed for SMP |
+
+**Summary:** AdrOS PMM is solid for single-core use. Missing ref-counting (blocks CoW) and contiguous allocation (blocks DMA).
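The ref-counting gap is small in code terms. A minimal user-space sketch of the suggested `frame_ref_count[]` scheme, using the function names proposed later in this patch (the array size and 4 KiB frame size are illustrative assumptions; a kernel version would size the array from the Multiboot memory map):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE  4096u
#define MAX_FRAMES 1024u   /* illustrative; real count comes from the MMAP */

static uint16_t frame_ref_count[MAX_FRAMES];

/* frame index from a physical address */
static inline size_t frame_index(uintptr_t paddr) { return paddr / PAGE_SIZE; }

void pmm_incref(uintptr_t paddr) {
    /* __sync builtins keep this safe once SMP arrives */
    __sync_fetch_and_add(&frame_ref_count[frame_index(paddr)], 1);
}

uint16_t pmm_decref(uintptr_t paddr) {
    /* returns the new count; 0 means the frame can return to the bitmap */
    return __sync_sub_and_fetch(&frame_ref_count[frame_index(paddr)], 1);
}

uint16_t pmm_get_refcount(uintptr_t paddr) {
    return frame_ref_count[frame_index(paddr)];
}
```

`pmm_free_page()` would then call `pmm_decref()` and only clear the bitmap bit when the count reaches zero.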
+
+---
+
+### 1.2 Virtual Memory Manager (VMM)
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Higher-half kernel | ✅ 0xC0000000 | ✅ Identical | None |
+| Recursive page directory | Mentioned but not detailed | ✅ PDE[1023] self-map, `x86_pd_recursive()` | AdrOS is ahead |
+| Per-process address spaces | ✅ Clone kernel PD | ✅ `vmm_as_create_kernel_clone()`, `vmm_as_clone_user()` | None |
+| W^X logical policy | ✅ `vmm_apply_wx_policy()` rejects RWX | ✅ ELF loader maps `.text` as RO after load via `vmm_protect_range()` | Partial — no policy function, but effect achieved |
+| W^X hardware (NX bit) | ✅ PAE + NX via EFER MSR | ❌ 32-bit paging, no PAE, no NX | Long-term |
+| CPUID feature detection | ✅ `cpu_get_features()` for PAE/NX | ❌ Not implemented | Long-term |
+| `vmm_find_free_area()` | ✅ Scan user VA space for holes | ❌ Not implemented | Needed for `mmap` |
+| `vmm_map_dma_buffer()` | ✅ Map phys into user VA | ❌ Not implemented | Needed for zero-copy I/O |
+| TLB flush | ✅ `invlpg` + full flush | ✅ `invlpg()` per page | None |
+| Spinlock on VMM ops | ✅ `vmm_kernel_lock` | ❌ No lock | Needed for SMP |
+
+**Summary:** AdrOS VMM is functional and well-designed (recursive mapping is elegant). Missing hardware NX (requires PAE migration) and free-area search for `mmap`.
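The missing `vmm_find_free_area()` is essentially a first-fit scan for a run of unmapped pages. A testable sketch, with an occupancy array standing in for real PTE-present checks (the array and its indexing are illustrative assumptions, not the recursive-mapping walk the kernel would use):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* First-fit search for `pages_needed` consecutive free pages.
 * `pte_present[i]` stands in for "page i of the user VA window is mapped".
 * Returns the byte offset of the hole, or (uintptr_t)-1 if none exists. */
uintptr_t vmm_find_free_area(const uint8_t* pte_present, size_t npages_total,
                             size_t pages_needed) {
    size_t run = 0;
    for (size_t i = 0; i < npages_total; i++) {
        run = pte_present[i] ? 0 : run + 1;   /* reset run on a mapped page */
        if (run == pages_needed)
            return (uintptr_t)((i + 1 - pages_needed) * PAGE_SIZE);
    }
    return (uintptr_t)-1;
}
```

In the kernel the same loop would walk `x86_pt_recursive(pdi)` entries instead of an array, but the run-length logic is identical.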
+
+---
+
+### 1.3 Kernel Heap
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Doubly-linked free list | Mentioned | ✅ `heap.c` with `HEAP_MAGIC` validation | None |
+| Coalescing | Mentioned | ✅ Forward + backward coalesce (fixed in previous session) | None |
+| Spinlock | ✅ Required | ✅ `heap_lock` spinlock present | None |
+| Slab allocator | ✅ `slab_cache_t` for fixed-size objects | ❌ Not implemented | Medium priority |
+
+**Summary:** Heap works correctly. Slab allocator would improve performance for frequent small allocations (process structs, file descriptors).
+
+---
+
+### 1.4 Process Scheduler
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Process states | ✅ READY/RUNNING/SLEEPING/ZOMBIE | ✅ READY/RUNNING/ZOMBIE/BLOCKED/SLEEPING | AdrOS has more states |
+| Round-robin scheduling | Baseline | ✅ Implemented in `scheduler.c` | None |
+| O(1) scheduler (bitmap + active/expired) | ✅ Full implementation | ❌ Simple linked-list traversal | Enhancement |
+| Priority queues (MLFQ) | ✅ 32 priority levels | ❌ No priority levels | Enhancement |
+| Unix decay-based priority | ✅ `p_cpu` decay + `nice` | ❌ Not implemented | Enhancement |
+| Per-CPU runqueues | ✅ `cpu_runqueue_t` per CPU | ❌ Single global queue | Needed for SMP |
+| Sleep/wakeup (wait queues) | ✅ `sleep(chan, lock)` / `wakeup(chan)` | ✅ Process blocking via `PROCESS_BLOCKED` state + manual wake | Partial — no generic wait queue abstraction |
+| Context switch (assembly) | ✅ Save/restore callee-saved + CR3 | ✅ `context_switch.S` saves/restores regs + CR3 | None |
+| `fork()` | ✅ Slab + CoW + enqueue | ✅ `process_fork_create()` — full copy (no CoW) | CoW missing |
+| `execve()` | ✅ Load ELF, reset stack | ✅ `syscall_execve_impl()` — loads ELF, handles argv/envp | None |
+| Spinlock protection | ✅ `sched_lock` | ✅ `sched_lock` present | None |
+
+**Summary:** AdrOS scheduler is functional with all essential operations. The supplementary material suggests O(1)/MLFQ upgrades, which are performance enhancements, not correctness issues.
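The O(1) pick itself is tiny: one ready bit per priority level plus a count-trailing-zeros instruction. A sketch assuming 32 levels with bit 0 as the highest priority (names are illustrative, not existing AdrOS symbols):

```c
#include <stdint.h>

/* One bit per priority level; bit 0 = highest priority. */
static uint32_t ready_bitmap;

void rq_mark_ready(unsigned prio) { ready_bitmap |=  (1u << prio); }
void rq_mark_empty(unsigned prio) { ready_bitmap &= ~(1u << prio); }

/* O(1) pick: __builtin_ctz finds the lowest set bit in constant time.
 * Returns the best runnable priority, or -1 when nothing is ready. */
int rq_pick_prio(void) {
    return ready_bitmap ? __builtin_ctz(ready_bitmap) : -1;
}
```

Each bit would guard a per-priority run list (the `rq_next`/`rq_prev` links added to `struct process` in this patch), with the active/expired array swap layered on top.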
+
+---
+
+### 1.5 Signals
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Signal bitmask (pending/blocked) | ✅ `uint32_t pending_signals` | ✅ `sig_pending_mask` + `sig_blocked_mask` | None |
+| `sigaction` | ✅ Handler array | ✅ `sigactions[PROCESS_MAX_SIG]` | None |
+| Signal trampoline | ✅ Build stack frame, redirect EIP | ✅ Full trampoline in `deliver_signals_to_usermode()` | None |
+| `sigreturn` | ✅ Restore saved context | ✅ `syscall_sigreturn_impl()` with `SIGFRAME_MAGIC` | None |
+| `SA_SIGINFO` | Mentioned | ✅ Supported (siginfo_t + ucontext_t on stack) | None |
+| Signal restorer (userspace) | ✅ `sigrestorer.S` | ✅ Kernel injects trampoline code bytes on user stack | AdrOS approach is self-contained |
+
+**Summary:** AdrOS signal implementation is **complete and robust**. This is one of the strongest subsystems — ahead of what the supplementary material suggests.
+
+---
+
+### 1.6 Virtual File System (VFS)
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Mount table | ✅ Linked list of mount points | ✅ Up to 8 mounts, longest-prefix matching | None |
+| `vfs_lookup` path resolution | ✅ Find mount + delegate to driver | ✅ Full path resolution with mount traversal | None |
+| `fs_node_t` with ops | ✅ `vfs_ops_t` function pointers | ✅ `read`/`write`/`open`/`close`/`finddir`/`readdir` | None |
+| File descriptor table | ✅ Per-process `fd_table[16]` | ✅ Per-process `files[PROCESS_MAX_FILES]` with refcount | None |
+| File cursor (offset) | ✅ `cursor` field | ✅ `offset` in `struct file` | None |
+| USTAR InitRD parser | ✅ Full implementation | ❌ Custom binary format (`mkinitrd`) | Different approach, both work |
+| LZ4 decompression | ✅ Decompress initrd.tar.lz4 | ❌ Not implemented | Enhancement |
+| `pivot_root` | ✅ `sys_pivot_root()` | ❌ Not implemented | Needed for real init flow |
+| Multiple FS types | ✅ USTAR + FAT | ✅ tmpfs + devfs + overlayfs + diskfs + persistfs | **AdrOS is ahead** |
+| `readdir` generic | Mentioned | ✅ All FS types implement `readdir` callback | None |
+
+**Summary:** AdrOS VFS is **more advanced** than the supplementary material suggests. It has 5 filesystem types, overlayfs, and generic readdir. The supplementary material's USTAR/LZ4 approach is an alternative InitRD strategy.
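The longest-prefix mount matching noted above can be sketched as follows; the mount table here is a plain string array for illustration rather than the real VFS structures:

```c
#include <string.h>
#include <stddef.h>

/* Longest-prefix mount matching: "/dev/tty" resolves to the "/dev" mount,
 * not "/". Returns the index of the best mount, or -1 if none matches. */
int vfs_pick_mount(const char* mounts[], size_t n, const char* path) {
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(mounts[i]);
        if (strncmp(path, mounts[i], len) != 0) continue;
        /* the match must end on a path boundary ("/" root always matches) */
        if (len > 1 && path[len] != '\0' && path[len] != '/') continue;
        if (len >= best_len) { best = (int)i; best_len = len; }
    }
    return best;
}
```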
+
+---
+
+### 1.7 TTY / PTY
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Circular buffer for keyboard | ✅ Ring buffer + wait queue | ✅ Ring buffer in `tty.c` with blocking reads | None |
+| `tty_push_char` from IRQ | ✅ IRQ1 handler → buffer | ✅ Keyboard IRQ → `tty_input_char()` | None |
+| Canonical mode (line editing) | ✅ Buffer until Enter | ✅ Line-buffered with echo + backspace | None |
+| PTY master/slave | Not discussed | ✅ Full PTY implementation with `/dev/ptmx` + `/dev/pts/0` | **AdrOS is ahead** |
+| Job control (SIGTTIN/SIGTTOU) | Not discussed | ✅ `pty_jobctl_read_check()` / `pty_jobctl_write_check()` | **AdrOS is ahead** |
+| `poll()` support | ✅ `tty_poll()` | ✅ `pty_master_can_read()` etc. integrated with `poll` | None |
+| Raw mode | Not discussed | ❌ Not implemented | Needed for editors/games |
+
+**Summary:** AdrOS TTY/PTY is **significantly ahead** of the supplementary material. Full PTY with job control is a major achievement.
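Raw mode and signal characters both hinge on `c_lflag` bits. A sketch of the `ISIG` check that `tty_input_char()` would perform before buffering a character (`tty_sigchar` is a hypothetical helper; the constants follow the Linux-style values this patch adopts):

```c
#include <stdint.h>

#define SIGINT   2
#define SIGTSTP  20
#define TTY_ISIG 0x0001   /* matches the c_lflag value added in this patch */

/* Returns the signal to deliver to the foreground process group,
 * or 0 when the character is ordinary input. */
int tty_sigchar(uint32_t lflag, unsigned char c) {
    if (!(lflag & TTY_ISIG)) return 0;   /* raw-ish mode: pass through */
    if (c == 0x03) return SIGINT;        /* Ctrl+C */
    if (c == 0x1A) return SIGTSTP;       /* Ctrl+Z */
    return 0;
}
```

A nonzero return would be routed to `process_kill_pgrp()`; Ctrl+D is not a signal but an EOF marker handled by the canonical-mode line buffer.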
+
+---
+
+### 1.8 Spinlocks & Synchronization
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| `xchg`-based spinlock | ✅ Inline asm `xchgl` | ✅ `__sync_lock_test_and_set` (generates `xchg`) | Equivalent |
+| `pause` in spin loop | ✅ `__asm__ volatile("pause")` | ✅ Present in `spin_lock()` | None |
+| IRQ save/restore | ✅ `pushcli`/`popcli` with nesting | ✅ `irq_save()`/`irq_restore()` via `pushf`/`popf` | None |
+| `spin_lock_irqsave` | ✅ Combined lock + IRQ disable | ✅ `spin_lock_irqsave()` / `spin_unlock_irqrestore()` | None |
+| Debug name field | ✅ `char *name` for panic messages | ❌ No name field | Minor |
+| CPU ID tracking | ✅ `lock->cpu_id` for deadlock detection | ❌ Not tracked | Needed for SMP |
+| Nesting counter (`ncli`) | ✅ Per-CPU nesting | ❌ Not implemented (flat save/restore) | Needed for SMP |
+
+**Summary:** AdrOS spinlocks are correct for single-core. The supplementary material's SMP-aware features (CPU tracking, nesting) are needed only when AdrOS targets multi-core.
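The equivalence claimed in the first row can be made concrete. A minimal sketch of the `xchg`-style lock with `pause`, using the portable builtins (single-core semantics; IRQ save/restore omitted):

```c
#include <stdint.h>

typedef struct { volatile uint32_t locked; } spinlock_t;

/* __sync_lock_test_and_set compiles to xchg on x86, which is atomic without
 * a lock prefix; `pause` calms the pipeline while spinning. */
void spin_lock(spinlock_t* l) {
    while (__sync_lock_test_and_set(&l->locked, 1)) {
#if defined(__i386__) || defined(__x86_64__)
        __asm__ volatile("pause");
#endif
    }
}

void spin_unlock(spinlock_t* l) {
    __sync_lock_release(&l->locked);   /* release barrier + store of 0 */
}
```

The SMP upgrade would add a `cpu_id` field set under the lock and a per-CPU `ncli` nesting counter around the IRQ save.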
+
+---
+
+### 1.9 ELF Loader
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Parse ELF headers | ✅ `Elf32_Ehdr` + `Elf32_Phdr` | ✅ Full validation + PT_LOAD processing | None |
+| Map segments with correct flags | ✅ PF_W → WRITABLE, PF_X → EXECUTABLE | ✅ Maps with `VMM_FLAG_RW`, then `vmm_protect_range()` for .text | None |
+| W^X enforcement | ✅ Policy in `vmm_map` | ✅ `.text` marked read-only after copy | Achieved differently |
+| Reject kernel-range vaddrs | Not discussed | ✅ Rejects `p_vaddr >= 0xC0000000` | **AdrOS is ahead** |
+| User stack allocation | ✅ Mentioned | ✅ Maps user stack at `0x00800000` | None |
+
+**Summary:** AdrOS ELF loader is **complete and secure** with proper validation and W^X enforcement.
+
+---
+
+### 1.10 User-Space / libc
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| `crt0.S` (entry point) | ✅ `_start` → `main` → `exit` | ✅ `user/crt0.S` with argc/argv setup | None |
+| Syscall stub (int 0x80) | ✅ `_syscall_invoke` via registers | ✅ `_syscall` in `user/syscall.S` | None |
+| `SYSENTER` fast path | ✅ vDSO + MSR setup | ❌ Only `int 0x80` | Enhancement |
+| libc wrappers | ✅ `syscalls.c` with errno | ❌ Raw syscall wrappers only, no errno | **Key gap** |
+| `init.c` (early userspace) | ✅ mount + pivot_root + execve | ✅ `user/init.c` — comprehensive smoke tests | Different purpose |
+| User linker script | ✅ `user.ld` at 0x08048000 | ✅ `user/user.ld` at 0x00400000 | Both valid |
+
+**Summary:** AdrOS has a working userspace with syscall stubs and a comprehensive test binary. Missing a proper libc and `SYSENTER` optimization.
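The errno gap is mostly a thin translation layer: the kernel returns `-errno`, and a libc wrapper folds that into POSIX's `-1` plus `errno`. A sketch, assuming the Linux-style convention that values in `(-4096, 0)` encode errors (`errno_var` stands in for the real userspace `errno`):

```c
/* Stand-in for the thread/process-local errno a real libc would provide. */
int errno_var;

/* Fold a raw kernel return value into the POSIX convention:
 * negative values in (-4096, 0) become -1 with errno set. */
long syscall_ret(long kernel_ret) {
    if (kernel_ret < 0 && kernel_ret > -4096) {
        errno_var = (int)-kernel_ret;
        return -1;
    }
    return kernel_ret;
}
```

Every wrapper in a future `syscalls.c` would funnel through this one helper, e.g. `return (int)syscall_ret(_syscall(SYSCALL_OPEN, ...));`.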
+
+---
+
+### 1.11 Drivers (Not Yet in AdrOS)
+
+| Driver | Supplementary Suggestion | AdrOS Current State |
+|--------|--------------------------|---------------------|
+| PCI enumeration | ✅ Full scan (bus/dev/func) | ❌ Not implemented |
+| Intel E1000 NIC | ✅ RX/TX descriptor rings + DMA | ❌ Not implemented |
+| VBE/Framebuffer | ✅ Map LFB + MTRR write-combining | ❌ VGA text mode only |
+| Intel HDA Audio | ✅ DMA ring buffers | ❌ Not implemented |
+| lwIP TCP/IP stack | ✅ `sys_arch.c` bridge | ❌ Not implemented |
+
+---
+
+### 1.12 Advanced Features (Not Yet in AdrOS)
+
+| Feature | Supplementary Suggestion | AdrOS Current State |
+|---------|--------------------------|---------------------|
+| Copy-on-Write (CoW) fork | ✅ Full implementation with ref-counting | ❌ Full address space copy |
+| Slab allocator | ✅ `slab_cache_t` with free-list-in-place | ❌ Only `kmalloc`/`kfree` |
+| Shared memory (shmem/mmap) | ✅ `sys_shmget` / `sys_shmat` | ❌ Not implemented |
+| Zero-copy DMA I/O | ✅ Map DMA buffer into user VA | ❌ Not implemented |
+| vDSO | ✅ Kernel-mapped page with syscall code | ❌ Not implemented |
+
+---
+
+## Part 2 — POSIX Compatibility Assessment
+
+### Overall Score: **~45% toward a practical Unix-like POSIX system**
+
+This score reflects that AdrOS has the **core architectural skeleton** of a Unix system
+fully in place, but lacks several key POSIX interfaces and userland components needed
+for real-world use.
+
+### What AdrOS Already Has (Strengths)
+
+1. **Process model** — `fork`, `execve`, `waitpid`, `exit`, `getpid`, `getppid`, `setsid`, `setpgid`, `getpgrp` — all working
+2. **File I/O** — `open`, `read`, `write`, `close`, `lseek`, `stat`, `fstat`, `dup`, `dup2`, `dup3`, `pipe`, `pipe2`, `fcntl`, `getdents` — comprehensive
+3. **Signals** — `sigaction`, `sigprocmask`, `kill`, `sigreturn` with full trampoline — robust
+4. **VFS** — 5 filesystem types, mount table, path resolution, per-process cwd — excellent
+5. **TTY/PTY** — Line discipline, job control, blocking I/O, `ioctl` — very good
+6. **Select/Poll** — Working for pipes and TTY devices
+7. **Memory isolation** — Per-process address spaces, user/kernel separation, `uaccess` validation
+8. **ELF loading** — Secure loader with W^X enforcement
+9. **Spinlocks** — Correct `xchg`-based implementation with IRQ save/restore
+
+### What's Missing for Practical POSIX (Gaps by Priority)
+
+#### Tier 1 — Blocks basic usability
+| Gap | Impact | Effort |
+|-----|--------|--------|
+| **Minimal libc** (`printf`, `malloc`, `string.h`, `stdio.h`) | Can't build real userland programs | Medium |
+| **Shell** (`sh`-compatible) | No interactive use without it | Medium |
+| **Signal characters** (Ctrl+C → SIGINT, Ctrl+D → EOF) | Can't interrupt/control processes | Small |
+| **`brk`/`sbrk`** (user heap) | No `malloc` in userspace | Small-Medium |
+| **Core utilities** (`ls`, `cat`, `echo`, `mkdir`, `rm`) | No file management | Medium |
+
+#### Tier 2 — Required for POSIX compliance
+| Gap | Impact | Effort |
+|-----|--------|--------|
+| **`mmap`/`munmap`** | No memory-mapped files, no shared memory | Medium-Large |
+| **`O_CLOEXEC`** | FD leaks across `execve` | Small |
+| **Permissions** (`uid`/`gid`/mode/`chmod`/`chown`) | No multi-user security | Medium |
+| **Hard/symbolic links** | Incomplete filesystem semantics | Medium |
+| **`/proc` filesystem** | No process introspection | Medium |
+| **`nanosleep`/`clock_gettime`** | No time management | Small |
+| **Raw TTY mode** | Can't run editors or games | Small |
+
+#### Tier 3 — Full Unix experience
+| Gap | Impact | Effort |
+|-----|--------|--------|
+| **CoW fork** | Memory waste on fork-heavy workloads | Large |
+| **PAE + NX bit** | No hardware W^X enforcement | Large |
+| **Slab allocator** | Performance for frequent small allocs | Medium |
+| **Networking** (socket API + TCP/IP) | No network connectivity | Very Large |
+| **Threads** (`clone`/`pthread`) | No multi-threaded programs | Large |
+| **Dynamic linking** (`ld.so`) | Can't use shared libraries | Very Large |
+| **VBE framebuffer** | No graphical output | Medium |
+| **PCI + device drivers** | No hardware discovery | Large |
+
+---
+
+## Part 3 — Architectural Comparison Summary
+
+| Dimension | Supplementary Material | AdrOS Current | Verdict |
+|-----------|----------------------|---------------|---------|
+| **Boot flow** | GRUB → Stub (LZ4) → Kernel → USTAR InitRD | GRUB → Kernel → Custom InitRD → OverlayFS | Both valid; AdrOS is simpler |
+| **Memory architecture** | PMM + Slab + CoW + Zero-Copy DMA | PMM + Heap (linked list) | Supplementary is more advanced |
+| **Scheduler** | O(1) with bitmap + active/expired arrays | Round-robin with linked list | Supplementary is more advanced |
+| **VFS** | USTAR + FAT (planned) | tmpfs + devfs + overlayfs + diskfs + persistfs | **AdrOS is more advanced** |
+| **Syscall interface** | int 0x80 + SYSENTER + vDSO | int 0x80 only | Supplementary has more optimization |
+| **Signal handling** | Basic trampoline concept | Full SA_SIGINFO + sigreturn + sigframe | **AdrOS is more advanced** |
+| **TTY/PTY** | Basic circular buffer | Full PTY with job control | **AdrOS is more advanced** |
+| **Synchronization** | SMP-aware spinlocks with CPU tracking | Single-core spinlocks with IRQ save | Supplementary targets SMP |
+| **Userland** | libc stubs + init + shell concept | Raw syscall wrappers + test binary | Both early-stage |
+| **Drivers** | PCI + E1000 + VBE + HDA (conceptual) | UART + VGA text + PS/2 + ATA PIO | Supplementary has more scope |
+
+---
+
+## Part 4 — Recommendations
+
+### Immediate Actions (use supplementary material as inspiration)
+
+1. **Add signal characters to TTY** — Ctrl+C/Ctrl+Z/Ctrl+D handling in `tty_input_char()`. Small change, huge usability gain.
+
+2. **Implement `brk`/`sbrk` syscall** — Track a per-process heap break pointer. Essential for userland `malloc`.
+
+3. **Build minimal libc** — Start with `write`-based `printf`, `brk`-based `malloc`, `string.h`. The supplementary `syscalls.c.txt` and `unistd.c.txt` show the pattern.
+
+4. **Build a shell** — All required syscalls (`fork`+`execve`+`waitpid`+`pipe`+`dup2`+`chdir`) are already implemented.
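For item 2, the break bookkeeping reduces to page-rounding arithmetic. A sketch that computes how many fresh pages a growth request needs (struct and helper names are illustrative; `heap_start`/`heap_break` mirror the fields this patch adds to `struct process`):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* heap_start/heap_break mirror the per-process fields added in this patch. */
typedef struct {
    uintptr_t heap_start;
    uintptr_t heap_break;
} proc_heap_t;

static uintptr_t page_up(uintptr_t a) {
    return (a + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Apply a brk() request and return how many fresh pages the VMM must map
 * (0 when the request stays inside already-mapped pages). Requests below
 * heap_start are refused and leave the break unchanged. */
size_t brk_adjust(proc_heap_t* p, uintptr_t new_break) {
    if (new_break < p->heap_start) return 0;
    uintptr_t old_top = page_up(p->heap_break);
    uintptr_t new_top = page_up(new_break);
    p->heap_break = new_break;
    return (new_top > old_top) ? (new_top - old_top) / PAGE_SIZE : 0;
}
```

The syscall handler would map that many user pages at the old top; `sbrk(n)` is then just `brk(heap_break + n)` returning the old break.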
+
+### Medium-Term (architectural improvements from supplementary material)
+
+5. **PMM ref-counting** — Add `uint16_t` ref-count array alongside bitmap. Prerequisite for CoW.
+
+6. **CoW fork** — Use PTE bit 9 as CoW marker, handle in page fault. The supplementary material's `vmm_copy_for_fork()` pattern is clean.
+
+7. **W^X policy function** — Add `vmm_apply_wx_policy()` as a centralized check. Currently AdrOS achieves W^X ad-hoc in the ELF loader.
+
+8. **`mmap`/`munmap`** — Requires `vmm_find_free_area()` from supplementary material. Critical for POSIX.
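Item 7's policy function is a one-liner worth centralizing. A sketch, assuming a logical `VMM_FLAG_EXEC` bit (AdrOS's 32-bit non-PAE paging has no hardware NX, so this would be a software-side check only; `VMM_FLAG_RW` matches the existing flag value):

```c
#include <stdint.h>

#define VMM_FLAG_RW   (1 << 1)   /* existing AdrOS flag */
#define VMM_FLAG_EXEC (1 << 3)   /* assumed logical flag; no hardware NX yet */

/* Centralized W^X policy: a mapping may be writable or executable,
 * never both. Returns 0 to allow, -1 to reject. */
int vmm_apply_wx_policy(uint32_t flags) {
    if ((flags & VMM_FLAG_RW) && (flags & VMM_FLAG_EXEC))
        return -1;   /* RWX refused */
    return 0;
}
```

Calling this from `vmm_map_page()` would replace the ad-hoc enforcement the ELF loader does today, and the same logical flag later maps onto the NX bit once PAE lands.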
+
+### Long-Term (from supplementary material roadmap)
+
+9. **CPUID + PAE + NX** — Follow the `cpu_get_features()` / `cpu_enable_nx()` pattern for hardware W^X.
+
+10. **O(1) scheduler** — The active/expired bitmap swap pattern is elegant and well-suited for AdrOS.
+
+11. **Slab allocator** — The supplementary material's free-list-in-place design is simple and effective.
+
+12. **PCI + networking** — Follow the PCI scan → BAR mapping → E1000 DMA ring → lwIP bridge pattern.
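Item 11's free-list-in-place design stores the next-pointer inside each free object, so the cache needs no side metadata. A user-space sketch with `malloc` standing in for `pmm_alloc_page()` (lock omitted; the real `slab_cache_t` from this patch adds one):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>   /* malloc stands in for pmm_alloc_page() here */

typedef struct slab_cache {
    size_t obj_size;
    void*  free_list;   /* head of the in-place free list */
} slab_cache_t;

void slab_cache_init(slab_cache_t* c, size_t obj_size) {
    /* each free object must hold at least one pointer */
    c->obj_size = obj_size < sizeof(void*) ? sizeof(void*) : obj_size;
    c->free_list = NULL;
}

/* Carve one 4 KiB "slab" into objects and thread them onto the free list. */
static void slab_grow(slab_cache_t* c) {
    uint8_t* page = malloc(4096);
    if (!page) return;
    for (size_t off = 0; off + c->obj_size <= 4096; off += c->obj_size) {
        *(void**)(page + off) = c->free_list;
        c->free_list = page + off;
    }
}

void* slab_alloc(slab_cache_t* c) {
    if (!c->free_list) slab_grow(c);
    void* obj = c->free_list;
    if (obj) c->free_list = *(void**)obj;   /* pop head */
    return obj;
}

void slab_free(slab_cache_t* c, void* obj) {
    *(void**)obj = c->free_list;            /* push head */
    c->free_list = obj;
}
```

LIFO reuse keeps recently freed objects hot in cache, which is exactly the win for churn-heavy structures like `struct process` and `struct file`.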
+
+---
+
+## Conclusion
+
+AdrOS is a **well-architected hobby OS** that has already implemented many of the hardest
+parts of a Unix-like system: process management with signals, a multi-filesystem VFS,
+PTY with job control, and a secure ELF loader. It is approximately **45% of the way**
+to a practical POSIX-compatible system.
+
+The supplementary material provides excellent **architectural blueprints** for the next
+evolution: CoW memory, O(1) scheduling, hardware NX, and networking. However, AdrOS is
+already **ahead** of the supplementary material in several areas (VFS diversity, signal
+handling, PTY/job control).
+
+The most impactful next steps are **not** the advanced features from the supplementary
+material, but rather the **userland enablers**: a minimal libc, a shell, and `brk`/`sbrk`.
+These would transform AdrOS from a kernel with smoke tests into an interactive Unix system.
char string[0];
};
+struct multiboot_tag_framebuffer {
+ uint32_t type;
+ uint32_t size;
+ uint64_t framebuffer_addr;
+ uint32_t framebuffer_pitch;
+ uint32_t framebuffer_width;
+ uint32_t framebuffer_height;
+ uint8_t framebuffer_bpp;
+ uint8_t framebuffer_type;
+ uint8_t reserved;
+};
+
#define MULTIBOOT_MEMORY_AVAILABLE 1
#define MULTIBOOT_MEMORY_RESERVED 2
#define MULTIBOOT_MEMORY_ACPI_RECLAIMABLE 3
#define PF_W 0x2
#define PF_R 0x4
-int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out);
+int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out, uintptr_t* heap_break_out);
#endif
uintptr_t initrd_end;
const char* cmdline;
+
+ uintptr_t fb_addr;
+ uint32_t fb_pitch;
+ uint32_t fb_width;
+ uint32_t fb_height;
+ uint8_t fb_bpp;
};
#endif
--- /dev/null
+#ifndef PCI_H
+#define PCI_H
+
+#include <stdint.h>
+
+struct pci_device {
+ uint8_t bus;
+ uint8_t slot;
+ uint8_t func;
+ uint16_t vendor_id;
+ uint16_t device_id;
+ uint8_t class_code;
+ uint8_t subclass;
+ uint8_t prog_if;
+ uint8_t header_type;
+ uint32_t bar[6];
+ uint8_t irq_line;
+};
+
+#define PCI_MAX_DEVICES 32
+
+uint32_t pci_config_read(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset);
+void pci_config_write(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset, uint32_t value);
+
+void pci_init(void);
+int pci_get_device_count(void);
+const struct pci_device* pci_get_device(int index);
+const struct pci_device* pci_find_device(uint16_t vendor, uint16_t device);
+const struct pci_device* pci_find_class(uint8_t class_code, uint8_t subclass);
+
+#endif
// Allocate a single physical page
void* pmm_alloc_page(void);
-// Free a physical page
+// Free a physical page (decrements refcount, frees at 0)
void pmm_free_page(void* ptr);
+// Reference counting for Copy-on-Write
+void pmm_incref(uintptr_t paddr);
+uint16_t pmm_decref(uintptr_t paddr);
+uint16_t pmm_get_refcount(uintptr_t paddr);
+
// Helper to print memory stats
void pmm_print_stats(void);
uintptr_t sp;
uintptr_t addr_space;
uint32_t* kernel_stack;
+#define SCHED_NUM_PRIOS 32
+#define SCHED_DEFAULT_PRIO 16
+
+ uint8_t priority; // 0 = highest, 31 = lowest
+ int8_t nice; // -20 to +19 (maps to priority)
process_state_t state;
- uint32_t wake_at_tick; // New: When to wake up (global tick count)
+ uint32_t wake_at_tick;
int exit_status;
int has_user_regs;
// For SIGSEGV: last page fault address (CR2) captured in ring3.
uintptr_t last_fault_addr;
+#define PROCESS_MAX_MMAPS 32
+ struct {
+ uintptr_t base;
+ uint32_t length;
+ } mmaps[PROCESS_MAX_MMAPS];
+
+ uintptr_t heap_start;
+ uintptr_t heap_break;
+
char cwd[128];
int waiting;
int wait_result_pid;
int wait_result_status;
struct file* files[PROCESS_MAX_FILES];
+ uint8_t fd_flags[PROCESS_MAX_FILES];
struct process* next;
- struct process* prev; // Doubly linked list helps here too! (Optional but good)
+ struct process* prev;
+
+ struct process* rq_next; // O(1) runqueue per-priority list
+ struct process* rq_prev;
};
// Global pointer to the currently running process
// Kill a process (minimal signals). Returns 0 on success or -errno.
int process_kill(uint32_t pid, int sig);
+// Send a signal to all processes in a process group.
+int process_kill_pgrp(uint32_t pgrp, int sig);
+
// Create a child process that will resume in usermode from a saved register frame.
struct process* process_fork_create(uintptr_t child_as, const struct registers* child_regs);
--- /dev/null
+#ifndef PROCFS_H
+#define PROCFS_H
+
+#include "fs.h"
+
+fs_node_t* procfs_create_root(void);
+
+#endif
--- /dev/null
+#ifndef SLAB_H
+#define SLAB_H
+
+#include <stdint.h>
+#include <stddef.h>
+#include "spinlock.h"
+
+typedef struct slab_cache {
+ const char* name;
+ uint32_t obj_size;
+ uint32_t objs_per_slab;
+ void* free_list;
+ uint32_t total_allocs;
+ uint32_t total_frees;
+ spinlock_t lock;
+} slab_cache_t;
+
+void slab_cache_init(slab_cache_t* cache, const char* name, uint32_t obj_size);
+void* slab_alloc(slab_cache_t* cache);
+void slab_free(slab_cache_t* cache, void* obj);
+
+#endif
SYSCALL_RENAME = 39,
SYSCALL_RMDIR = 40,
+
+ SYSCALL_BRK = 41,
+ SYSCALL_NANOSLEEP = 42,
+ SYSCALL_CLOCK_GETTIME = 43,
+ SYSCALL_MMAP = 44,
+ SYSCALL_MUNMAP = 45,
};
#endif
#include <stdint.h>
struct termios {
+ uint32_t c_iflag;
+ uint32_t c_oflag;
+ uint32_t c_cflag;
uint32_t c_lflag;
};
+struct winsize {
+ uint16_t ws_row;
+ uint16_t ws_col;
+ uint16_t ws_xpixel;
+ uint16_t ws_ypixel;
+};
+
enum {
- TTY_ICANON = 0x0001,
- TTY_ECHO = 0x0002,
+    TTY_ISIG   = 0x0001,  /* values follow the Linux c_lflag bit layout */
+    TTY_ICANON = 0x0002,
+    TTY_ECHO   = 0x0008,
};
void tty_init(void);
--- /dev/null
+#ifndef VBE_H
+#define VBE_H
+
+#include <stdint.h>
+#include "kernel/boot_info.h"
+
+struct vbe_info {
+ uintptr_t phys_addr;
+ volatile uint8_t* virt_addr;
+ uint32_t pitch;
+ uint32_t width;
+ uint32_t height;
+ uint8_t bpp;
+ uint32_t size;
+};
+
+int vbe_init(const struct boot_info* bi);
+int vbe_available(void);
+const struct vbe_info* vbe_get_info(void);
+
+void vbe_put_pixel(uint32_t x, uint32_t y, uint32_t color);
+void vbe_fill_rect(uint32_t x, uint32_t y, uint32_t w, uint32_t h, uint32_t color);
+void vbe_clear(uint32_t color);
+
+#endif
#define VMM_FLAG_PRESENT (1 << 0)
#define VMM_FLAG_RW (1 << 1)
#define VMM_FLAG_USER (1 << 2)
+#define VMM_FLAG_COW (1 << 9) /* OS-available bit: Copy-on-Write marker */
/*
* Initialize Virtual Memory Manager
uintptr_t vmm_as_clone_user(uintptr_t src_as);
+/*
+ * Clone user address space using Copy-on-Write.
+ * Shared pages are marked read-only + COW bit; physical frames get incref'd.
+ */
+uintptr_t vmm_as_clone_user_cow(uintptr_t src_as);
+
+/*
+ * Handle a Copy-on-Write page fault.
+ * Returns 1 if the fault was a CoW fault and was resolved, 0 otherwise.
+ */
+int vmm_handle_cow_fault(uintptr_t fault_addr);
+
/*
* Update flags for an already-mapped virtual page.
* Keeps the physical frame, only changes PRESENT/RW/USER bits.
bi.initrd_start = 0;
bi.initrd_end = 0;
bi.cmdline = NULL;
+ bi.fb_addr = 0;
+ bi.fb_pitch = 0;
+ bi.fb_width = 0;
+ bi.fb_height = 0;
+ bi.fb_bpp = 0;
if (mbi_phys) {
uint32_t total_size = *(volatile uint32_t*)mbi_phys;
const struct multiboot_tag_string* s = (const struct multiboot_tag_string*)tag;
bi.cmdline = s->string;
}
+ if (tag->type == MULTIBOOT_TAG_TYPE_FRAMEBUFFER) {
+ const struct multiboot_tag_framebuffer* fb = (const struct multiboot_tag_framebuffer*)tag;
+ bi.fb_addr = (uintptr_t)fb->framebuffer_addr;
+ bi.fb_pitch = fb->framebuffer_pitch;
+ bi.fb_width = fb->framebuffer_width;
+ bi.fb_height = fb->framebuffer_height;
+ bi.fb_bpp = fb->framebuffer_bpp;
+ }
}
}
uintptr_t entry = 0;
uintptr_t user_sp = 0;
uintptr_t user_as = 0;
- if (elf32_load_user_from_initrd("/bin/init.elf", &entry, &user_sp, &user_as) != 0) {
+ uintptr_t heap_brk = 0;
+ if (elf32_load_user_from_initrd("/bin/init.elf", &entry, &user_sp, &user_as, &heap_brk) != 0) {
process_exit_notify(1);
schedule();
for (;;) hal_cpu_idle();
}
current_process->addr_space = user_as;
+ current_process->heap_start = heap_brk;
+ current_process->heap_break = heap_brk;
vmm_as_activate(user_as);
uart_print("[ELF] starting /bin/init.elf\n");
#include "process.h"
#include "spinlock.h"
#include "uaccess.h"
+#include "vmm.h"
#include "syscall.h"
#include "signal.h"
#include <stddef.h>
__asm__ volatile("mov %%cr2, %0" : "=r"(cr2));
if ((regs->cs & 3U) == 3U) {
+ // Check for Copy-on-Write fault (write to read-only CoW page).
+ // Error code bit 1 = caused by a write.
+ if ((regs->err_code & 0x2) && vmm_handle_cow_fault((uintptr_t)cr2)) {
+ return; // CoW resolved, resume user process.
+ }
+
const int SIG_SEGV = 11;
if (current_process) {
current_process->last_fault_addr = (uintptr_t)cr2;
#define X86_PTE_PRESENT 0x1
#define X86_PTE_RW 0x2
#define X86_PTE_USER 0x4
+#define X86_PTE_COW 0x200 /* Bit 9: OS-available, marks Copy-on-Write */
/* Defined in boot.S (Physical address loaded in CR3, but accessed via virt alias) */
/* Wait, boot_pd is in BSS. Linker put it at 0xC0xxxxxx.
if (flags & VMM_FLAG_PRESENT) x86_flags |= X86_PTE_PRESENT;
if (flags & VMM_FLAG_RW) x86_flags |= X86_PTE_RW;
if (flags & VMM_FLAG_USER) x86_flags |= X86_PTE_USER;
+ if (flags & VMM_FLAG_COW) x86_flags |= X86_PTE_COW;
return x86_flags;
}
invlpg((uintptr_t)virt);
}
+uintptr_t vmm_as_clone_user_cow(uintptr_t src_as) {
+ if (!src_as) return 0;
+
+ uintptr_t new_as = vmm_as_create_kernel_clone();
+ if (!new_as) return 0;
+
+ uintptr_t old_as = hal_cpu_get_address_space();
+ vmm_as_activate(src_as);
+ volatile uint32_t* src_pd = x86_pd_recursive();
+
+ for (uint32_t pdi = 0; pdi < 768; pdi++) {
+ uint32_t pde = (uint32_t)src_pd[pdi];
+ if ((pde & X86_PTE_PRESENT) == 0) continue;
+
+ volatile uint32_t* src_pt = x86_pt_recursive(pdi);
+
+ for (uint32_t pti = 0; pti < 1024; pti++) {
+ uint32_t pte = (uint32_t)src_pt[pti];
+ if (!(pte & X86_PTE_PRESENT)) continue;
+ if ((pte & X86_PTE_USER) == 0) continue;
+
+ uint32_t frame_phys = pte & 0xFFFFF000;
+ uintptr_t va = ((uintptr_t)pdi << 22) | ((uintptr_t)pti << 12);
+
+ // Mark source page as read-only + CoW if it was writable.
+ uint32_t new_pte = frame_phys | X86_PTE_PRESENT | X86_PTE_USER;
+ if (pte & X86_PTE_RW) {
+ new_pte |= X86_PTE_COW; // Was writable -> CoW
+ // Remove RW from source
+ src_pt[pti] = new_pte;
+ invlpg(va);
+ } else {
+ new_pte = pte; // Keep as-is (read-only text, etc.)
+ }
+
+ // Increment physical frame refcount
+ pmm_incref((uintptr_t)frame_phys);
+
+ // Map same frame into child with same flags
+ vmm_as_map_page(new_as, (uint64_t)frame_phys, (uint64_t)va,
+ VMM_FLAG_PRESENT | VMM_FLAG_USER |
+ ((new_pte & X86_PTE_COW) ? VMM_FLAG_COW : 0));
+ }
+ }
+
+ vmm_as_activate(old_as);
+ return new_as;
+}
+
+int vmm_handle_cow_fault(uintptr_t fault_addr) {
+ uintptr_t va = fault_addr & ~(uintptr_t)0xFFF;
+ uint32_t pdi = va >> 22;
+ uint32_t pti = (va >> 12) & 0x3FF;
+
+ if (pdi >= 768) return 0; // Kernel space, not CoW
+
+ volatile uint32_t* pd = x86_pd_recursive();
+ if ((pd[pdi] & X86_PTE_PRESENT) == 0) return 0;
+
+ volatile uint32_t* pt = x86_pt_recursive(pdi);
+ uint32_t pte = pt[pti];
+
+ if (!(pte & X86_PTE_PRESENT)) return 0;
+ if (!(pte & X86_PTE_COW)) return 0;
+
+ uint32_t old_frame = pte & 0xFFFFF000;
+ uint16_t rc = pmm_get_refcount((uintptr_t)old_frame);
+
+ if (rc <= 1) {
+ // We're the sole owner — just make it writable and clear CoW.
+ pt[pti] = old_frame | X86_PTE_PRESENT | X86_PTE_RW | X86_PTE_USER;
+ invlpg(va);
+ return 1;
+ }
+
+ // Allocate a new frame and copy the page contents.
+ void* new_frame = pmm_alloc_page();
+ if (!new_frame) return 0; // OOM — caller will SIGSEGV
+
+    // Copy via a scratch VA just below the kernel split (0xC0000000).
+    const uintptr_t TMP_COW_VA = 0xBFFFD000U;
+ vmm_map_page((uint64_t)(uintptr_t)new_frame, (uint64_t)TMP_COW_VA,
+ VMM_FLAG_PRESENT | VMM_FLAG_RW);
+ memcpy((void*)TMP_COW_VA, (const void*)va, 4096);
+ vmm_unmap_page((uint64_t)TMP_COW_VA);
+
+ // Decrement old frame refcount.
+ pmm_decref((uintptr_t)old_frame);
+
+ // Map new frame as writable (no CoW).
+ pt[pti] = (uint32_t)(uintptr_t)new_frame | X86_PTE_PRESENT | X86_PTE_RW | X86_PTE_USER;
+ invlpg(va);
+
+ return 1;
+}
+
void vmm_init(void) {
uart_print("[VMM] Higher Half Kernel Active.\n");
--- /dev/null
+#include "pci.h"
+#include "io.h"
+#include "uart_console.h"
+#include "utils.h"
+
+#define PCI_CONFIG_ADDR 0xCF8
+#define PCI_CONFIG_DATA 0xCFC
+
+static struct pci_device pci_devices[PCI_MAX_DEVICES];
+static int pci_device_count = 0;
+
+uint32_t pci_config_read(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset) {
+ uint32_t address = (1U << 31)
+ | ((uint32_t)bus << 16)
+ | ((uint32_t)(slot & 0x1F) << 11)
+ | ((uint32_t)(func & 0x07) << 8)
+ | ((uint32_t)offset & 0xFC);
+ outl(PCI_CONFIG_ADDR, address);
+ return inl(PCI_CONFIG_DATA);
+}
+
+void pci_config_write(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset, uint32_t value) {
+ uint32_t address = (1U << 31)
+ | ((uint32_t)bus << 16)
+ | ((uint32_t)(slot & 0x1F) << 11)
+ | ((uint32_t)(func & 0x07) << 8)
+ | ((uint32_t)offset & 0xFC);
+ outl(PCI_CONFIG_ADDR, address);
+ outl(PCI_CONFIG_DATA, value);
+}
+
+static void pci_scan_func(uint8_t bus, uint8_t slot, uint8_t func) {
+ uint32_t reg0 = pci_config_read(bus, slot, func, 0x00);
+ uint16_t vendor = (uint16_t)(reg0 & 0xFFFF);
+ uint16_t device = (uint16_t)(reg0 >> 16);
+
+ if (vendor == 0xFFFF) return;
+ if (pci_device_count >= PCI_MAX_DEVICES) return;
+
+ struct pci_device* d = &pci_devices[pci_device_count];
+ d->bus = bus;
+ d->slot = slot;
+ d->func = func;
+ d->vendor_id = vendor;
+ d->device_id = device;
+
+ uint32_t reg2 = pci_config_read(bus, slot, func, 0x08);
+ d->class_code = (uint8_t)(reg2 >> 24);
+ d->subclass = (uint8_t)(reg2 >> 16);
+ d->prog_if = (uint8_t)(reg2 >> 8);
+
+ uint32_t reg3 = pci_config_read(bus, slot, func, 0x0C);
+ d->header_type = (uint8_t)(reg3 >> 16);
+
+ for (int i = 0; i < 6; i++) {
+ d->bar[i] = pci_config_read(bus, slot, func, (uint8_t)(0x10 + i * 4));
+ }
+
+ uint32_t reg_irq = pci_config_read(bus, slot, func, 0x3C);
+ d->irq_line = (uint8_t)(reg_irq & 0xFF);
+
+ pci_device_count++;
+}
+
+static void pci_scan_slot(uint8_t bus, uint8_t slot) {
+ uint32_t reg0 = pci_config_read(bus, slot, 0, 0x00);
+ if ((reg0 & 0xFFFF) == 0xFFFF) return;
+
+ pci_scan_func(bus, slot, 0);
+
+ uint32_t reg3 = pci_config_read(bus, slot, 0, 0x0C);
+ uint8_t header_type = (uint8_t)(reg3 >> 16);
+ if (header_type & 0x80) {
+ for (uint8_t func = 1; func < 8; func++) {
+ pci_scan_func(bus, slot, func);
+ }
+ }
+}
+
+static void pci_scan_bus(uint8_t bus) {
+ for (uint8_t slot = 0; slot < 32; slot++) {
+ pci_scan_slot(bus, slot);
+ }
+}
+
+void pci_init(void) {
+ pci_device_count = 0;
+
+ uint32_t reg3 = pci_config_read(0, 0, 0, 0x0C);
+ uint8_t header_type = (uint8_t)(reg3 >> 16);
+
+ if (header_type & 0x80) {
+ for (uint8_t func = 0; func < 8; func++) {
+ uint32_t r = pci_config_read(0, 0, func, 0x00);
+ if ((r & 0xFFFF) == 0xFFFF) continue;
+ pci_scan_bus(func);
+ }
+ } else {
+ pci_scan_bus(0);
+ }
+
+ uart_print("[PCI] Enumerated ");
+ char buf[8];
+ itoa(pci_device_count, buf, 10);
+ uart_print(buf);
+ uart_print(" device(s)\n");
+
+ for (int i = 0; i < pci_device_count; i++) {
+ struct pci_device* d = &pci_devices[i];
+ uart_print(" ");
+ char hex[12];
+ itoa_hex(d->vendor_id, hex); uart_print(hex);
+ uart_print(":");
+ itoa_hex(d->device_id, hex); uart_print(hex);
+ uart_print(" class=");
+ itoa_hex(d->class_code, hex); uart_print(hex);
+ uart_print(":");
+ itoa_hex(d->subclass, hex); uart_print(hex);
+ uart_print("\n");
+ }
+}
+
+int pci_get_device_count(void) {
+ return pci_device_count;
+}
+
+const struct pci_device* pci_get_device(int index) {
+ if (index < 0 || index >= pci_device_count) return 0;
+ return &pci_devices[index];
+}
+
+const struct pci_device* pci_find_device(uint16_t vendor, uint16_t device) {
+ for (int i = 0; i < pci_device_count; i++) {
+ if (pci_devices[i].vendor_id == vendor && pci_devices[i].device_id == device)
+ return &pci_devices[i];
+ }
+ return 0;
+}
+
+const struct pci_device* pci_find_class(uint8_t class_code, uint8_t subclass) {
+ for (int i = 0; i < pci_device_count; i++) {
+ if (pci_devices[i].class_code == class_code && pci_devices[i].subclass == subclass)
+ return &pci_devices[i];
+ }
+ return 0;
+}
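An aside, not part of the patch: a driver consuming these raw `bar[]` words still has to decode them before use. A minimal sketch, assuming the standard PCI BAR layout (bit 0 selects I/O vs memory space); `bar_is_io` and `bar_base` are hypothetical helper names, not functions from the AdrOS tree:

```c
#include <stdint.h>

/* Decode a raw 32-bit BAR as stored by pci_scan_func.
 * Per the PCI spec, bit 0 = 1 means I/O space, 0 means memory space. */
static inline int bar_is_io(uint32_t bar) { return (int)(bar & 0x1U); }

/* Base address: I/O BARs reserve the low 2 bits as flags,
 * memory BARs the low 4 bits (type + prefetchable). */
static inline uint32_t bar_base(uint32_t bar) {
    return bar_is_io(bar) ? (bar & ~0x3U) : (bar & ~0xFU);
}
```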
--- /dev/null
+#include "vbe.h"
+#include "vmm.h"
+#include "uart_console.h"
+#include "utils.h"
+
+#include <stddef.h>
+
+static struct vbe_info g_vbe;
+static int g_vbe_ready = 0;
+
+int vbe_init(const struct boot_info* bi) {
+ if (!bi || bi->fb_addr == 0 || bi->fb_width == 0 || bi->fb_height == 0 || bi->fb_bpp == 0) {
+ uart_print("[VBE] No framebuffer provided by bootloader.\n");
+ return -1;
+ }
+
+ g_vbe.phys_addr = bi->fb_addr;
+ g_vbe.pitch = bi->fb_pitch;
+ g_vbe.width = bi->fb_width;
+ g_vbe.height = bi->fb_height;
+ g_vbe.bpp = bi->fb_bpp;
+ g_vbe.size = g_vbe.pitch * g_vbe.height;
+
+ uint32_t pages = (g_vbe.size + 0xFFF) >> 12;
+ uintptr_t virt_base = 0xD0000000U;
+
+ for (uint32_t i = 0; i < pages; i++) {
+ vmm_map_page((uint64_t)(g_vbe.phys_addr + i * 0x1000),
+ (uint64_t)(virt_base + i * 0x1000),
+ VMM_FLAG_PRESENT | VMM_FLAG_RW);
+ }
+
+ g_vbe.virt_addr = (volatile uint8_t*)virt_base;
+ g_vbe_ready = 1;
+
+ uart_print("[VBE] Framebuffer ");
+ char buf[16];
+ itoa(g_vbe.width, buf, 10); uart_print(buf);
+ uart_print("x");
+ itoa(g_vbe.height, buf, 10); uart_print(buf);
+ uart_print("x");
+ itoa(g_vbe.bpp, buf, 10); uart_print(buf);
+ uart_print(" @ 0x");
+ itoa_hex(g_vbe.phys_addr, buf); uart_print(buf);
+ uart_print(" mapped to 0x");
+ itoa_hex(virt_base, buf); uart_print(buf);
+ uart_print("\n");
+
+ return 0;
+}
+
+int vbe_available(void) {
+ return g_vbe_ready;
+}
+
+const struct vbe_info* vbe_get_info(void) {
+ if (!g_vbe_ready) return NULL;
+ return &g_vbe;
+}
+
+void vbe_put_pixel(uint32_t x, uint32_t y, uint32_t color) {
+ if (!g_vbe_ready) return;
+ if (x >= g_vbe.width || y >= g_vbe.height) return;
+
+ uint32_t offset = y * g_vbe.pitch + x * (g_vbe.bpp / 8);
+ volatile uint8_t* pixel = g_vbe.virt_addr + offset;
+
+ if (g_vbe.bpp == 32) {
+ *(volatile uint32_t*)pixel = color;
+ } else if (g_vbe.bpp == 24) {
+ pixel[0] = (uint8_t)(color & 0xFF);
+ pixel[1] = (uint8_t)((color >> 8) & 0xFF);
+ pixel[2] = (uint8_t)((color >> 16) & 0xFF);
+ } else if (g_vbe.bpp == 16) {
+ *(volatile uint16_t*)pixel = (uint16_t)color;
+ }
+}
+
+void vbe_fill_rect(uint32_t x, uint32_t y, uint32_t w, uint32_t h, uint32_t color) {
+ if (!g_vbe_ready) return;
+
+ uint32_t x_end = x + w;
+ uint32_t y_end = y + h;
+ if (x_end > g_vbe.width) x_end = g_vbe.width;
+ if (y_end > g_vbe.height) y_end = g_vbe.height;
+
+ uint32_t bytes_pp = g_vbe.bpp / 8;
+
+ for (uint32_t row = y; row < y_end; row++) {
+ volatile uint8_t* row_ptr = g_vbe.virt_addr + row * g_vbe.pitch + x * bytes_pp;
+ if (g_vbe.bpp == 32) {
+ volatile uint32_t* p = (volatile uint32_t*)row_ptr;
+ for (uint32_t col = x; col < x_end; col++) {
+ *p++ = color;
+ }
+ } else {
+ for (uint32_t col = x; col < x_end; col++) {
+ uint32_t off = (col - x) * bytes_pp;
+ if (bytes_pp == 3) {
+ row_ptr[off] = (uint8_t)(color & 0xFF);
+ row_ptr[off + 1] = (uint8_t)((color >> 8) & 0xFF);
+ row_ptr[off + 2] = (uint8_t)((color >> 16) & 0xFF);
+ } else if (bytes_pp == 2) {
+ *(volatile uint16_t*)(row_ptr + off) = (uint16_t)color;
+ }
+ }
+ }
+ }
+}
+
+void vbe_clear(uint32_t color) {
+ if (!g_vbe_ready) return;
+ vbe_fill_rect(0, 0, g_vbe.width, g_vbe.height, color);
+}
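Aside: the addressing math `vbe_put_pixel` relies on is worth pinning down, since `pitch` is bytes per scanline and is often wider than `width * bpp / 8`. A small sketch of the offset computation; `fb_offset` is a hypothetical name, not an AdrOS function:

```c
#include <stdint.h>

/* Byte offset of pixel (x, y) in a linear framebuffer, matching the
 * computation in vbe_put_pixel: pitch is bytes per scanline, bpp is
 * bits per pixel. */
static inline uint32_t fb_offset(uint32_t x, uint32_t y,
                                 uint32_t pitch, uint32_t bpp) {
    return y * pitch + x * (bpp / 8U);
}
```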
return 0;
}
-int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out) {
+int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out, uintptr_t* heap_break_out) {
if (!filename || !entry_out || !user_stack_top_out || !addr_space_out) return -EFAULT;
if (!fs_root) return -EINVAL;
}
const elf32_phdr_t* ph = (const elf32_phdr_t*)(file + eh->e_phoff);
+ uintptr_t highest_seg_end = 0;
for (uint16_t i = 0; i < eh->e_phnum; i++) {
if (ph[i].p_type != PT_LOAD) continue;
if (ph[i].p_memsz > ph[i].p_filesz) {
memset((void*)(uintptr_t)(ph[i].p_vaddr + ph[i].p_filesz), 0, ph[i].p_memsz - ph[i].p_filesz);
}
+
+ if (seg_end > highest_seg_end) {
+ highest_seg_end = seg_end;
+ }
}
const uintptr_t user_stack_base = 0x00800000U;
*entry_out = (uintptr_t)eh->e_entry;
*user_stack_top_out = user_stack_base + user_stack_size;
*addr_space_out = new_as;
+ if (heap_break_out) {
+ *heap_break_out = (highest_seg_end + 0xFFFU) & ~(uintptr_t)0xFFFU;
+ }
kfree(file);
vmm_as_activate(old_as);
return 0;
}
#else
-int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out) {
+int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out, uintptr_t* heap_break_out) {
(void)filename;
(void)entry_out;
(void)user_stack_top_out;
(void)addr_space_out;
+ (void)heap_break_out;
return -1;
}
#endif
#include "pty.h"
#include "persistfs.h"
#include "diskfs.h"
+#include "procfs.h"
+#include "pci.h"
+#include "vbe.h"
#include "uart_console.h"
#include "hal/mm.h"
(void)vfs_mount("/tmp", tmp);
}
+ pci_init();
+ vbe_init(bi);
+
tty_init();
pty_init();
(void)vfs_mount("/disk", disk);
}
+ fs_node_t* proc = procfs_create_root();
+ if (proc) {
+ (void)vfs_mount("/proc", proc);
+ }
+
int user_ret = arch_platform_start_userspace(bi);
if (bi && cmdline_has_token(bi->cmdline, "ring3")) {
--- /dev/null
+#include "procfs.h"
+
+#include "process.h"
+#include "utils.h"
+#include "heap.h"
+#include "pmm.h"
+#include "timer.h"
+
+#include <stddef.h>
+
+static fs_node_t g_proc_root;
+static fs_node_t g_proc_self;
+static fs_node_t g_proc_self_status;
+static fs_node_t g_proc_uptime;
+static fs_node_t g_proc_meminfo;
+
+extern struct process* ready_queue_head;
+
+static int proc_snprintf(char* buf, uint32_t sz, const char* key, uint32_t val) {
+ if (sz < 2) return 0;
+ uint32_t w = 0;
+ const char* p = key;
+ while (*p && w + 1 < sz) buf[w++] = *p++;
+ char num[16];
+ itoa(val, num, 10);
+ p = num;
+ while (*p && w + 1 < sz) buf[w++] = *p++;
+ if (w + 1 < sz) buf[w++] = '\n';
+ buf[w] = 0;
+ return (int)w;
+}
+
+static uint32_t proc_self_status_read(fs_node_t* node, uint32_t offset, uint32_t size, uint8_t* buffer) {
+ (void)node;
+ if (!current_process) return 0;
+
+ char tmp[512];
+ uint32_t len = 0;
+
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Pid:\t", current_process->pid);
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "PPid:\t", current_process->parent_pid);
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Pgrp:\t", current_process->pgrp_id);
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Session:\t", current_process->session_id);
+
+ const char* state_str = "unknown\n";
+ switch (current_process->state) {
+ case PROCESS_READY: state_str = "R (ready)\n"; break;
+ case PROCESS_RUNNING: state_str = "R (running)\n"; break;
+ case PROCESS_BLOCKED: state_str = "S (blocked)\n"; break;
+ case PROCESS_SLEEPING: state_str = "S (sleeping)\n"; break;
+ case PROCESS_ZOMBIE: state_str = "Z (zombie)\n"; break;
+ }
+ const char* s = "State:\t";
+ while (*s && len + 1 < sizeof(tmp)) tmp[len++] = *s++;
+ s = state_str;
+ while (*s && len + 1 < sizeof(tmp)) tmp[len++] = *s++;
+
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "SigPnd:\t", current_process->sig_pending_mask);
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "SigBlk:\t", current_process->sig_blocked_mask);
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "HeapStart:\t", (uint32_t)current_process->heap_start);
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "HeapBreak:\t", (uint32_t)current_process->heap_break);
+
+ if (offset >= len) return 0;
+ uint32_t avail = len - offset;
+ if (size > avail) size = avail;
+ memcpy(buffer, tmp + offset, size);
+ return size;
+}
+
+static uint32_t proc_uptime_read(fs_node_t* node, uint32_t offset, uint32_t size, uint8_t* buffer) {
+ (void)node;
+ uint32_t ticks = get_tick_count();
+ uint32_t secs = (ticks * 20) / 1000; // 20 ms/tick; 32-bit math wraps after ~49.7 days
+ uint32_t frac = ((ticks * 20) % 1000) / 10;
+
+ char tmp[64];
+ uint32_t len = 0;
+ char num[16];
+ itoa(secs, num, 10);
+ const char* p = num;
+ while (*p && len + 2 < sizeof(tmp)) tmp[len++] = *p++;
+ if (len + 2 < sizeof(tmp)) tmp[len++] = '.';
+ if (frac < 10 && len + 2 < sizeof(tmp)) tmp[len++] = '0';
+ itoa(frac, num, 10);
+ p = num;
+ while (*p && len + 2 < sizeof(tmp)) tmp[len++] = *p++;
+ if (len + 1 < sizeof(tmp)) tmp[len++] = '\n';
+ if (len < sizeof(tmp)) tmp[len] = 0;
+ else tmp[sizeof(tmp) - 1] = 0;
+
+ if (offset >= len) return 0;
+ uint32_t avail = len - offset;
+ if (size > avail) size = avail;
+ memcpy(buffer, tmp + offset, size);
+ return size;
+}
+
+static uint32_t proc_meminfo_read(fs_node_t* node, uint32_t offset, uint32_t size, uint8_t* buffer) {
+ (void)node;
+
+ char tmp[256];
+ uint32_t len = 0;
+
+ /* Count processes */
+ uint32_t nprocs = 0;
+ if (ready_queue_head) {
+ struct process* it = ready_queue_head;
+ const struct process* start = it;
+ do {
+ nprocs++;
+ it = it->next;
+ } while (it && it != start);
+ }
+
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Processes:\t", nprocs);
+ len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "TickCount:\t", get_tick_count());
+
+ if (offset >= len) return 0;
+ uint32_t avail = len - offset;
+ if (size > avail) size = avail;
+ memcpy(buffer, tmp + offset, size);
+ return size;
+}
+
+static fs_node_t* proc_self_finddir(fs_node_t* node, const char* name) {
+ (void)node;
+ if (strcmp(name, "status") == 0) return &g_proc_self_status;
+ return NULL;
+}
+
+static int proc_self_readdir(fs_node_t* node, uint32_t* inout_index, void* buf, uint32_t buf_len) {
+ (void)node;
+ if (!inout_index || !buf) return -1;
+ if (buf_len < sizeof(struct vfs_dirent)) return -1;
+
+ static const char* entries[] = { "status" };
+ uint32_t idx = *inout_index;
+ if (idx >= 1) return 0;
+
+ struct vfs_dirent* d = (struct vfs_dirent*)buf;
+ d->d_ino = 100 + idx;
+ d->d_type = 0;
+ d->d_reclen = sizeof(struct vfs_dirent);
+ {
+ const char* s = entries[idx];
+ uint32_t j = 0;
+ while (s[j] && j + 1 < sizeof(d->d_name)) { d->d_name[j] = s[j]; j++; }
+ d->d_name[j] = 0;
+ }
+ *inout_index = idx + 1;
+ return (int)sizeof(struct vfs_dirent);
+}
+
+static fs_node_t* proc_root_finddir(fs_node_t* node, const char* name) {
+ (void)node;
+ if (strcmp(name, "self") == 0) return &g_proc_self;
+ if (strcmp(name, "uptime") == 0) return &g_proc_uptime;
+ if (strcmp(name, "meminfo") == 0) return &g_proc_meminfo;
+ return NULL;
+}
+
+static int proc_root_readdir(fs_node_t* node, uint32_t* inout_index, void* buf, uint32_t buf_len) {
+ (void)node;
+ if (!inout_index || !buf) return -1;
+ if (buf_len < sizeof(struct vfs_dirent)) return -1;
+
+ static const char* entries[] = { "self", "uptime", "meminfo" };
+ uint32_t idx = *inout_index;
+ if (idx >= 3) return 0;
+
+ struct vfs_dirent* d = (struct vfs_dirent*)buf;
+ d->d_ino = 200 + idx;
+ d->d_type = (idx == 0) ? 2 : 0;
+ d->d_reclen = sizeof(struct vfs_dirent);
+ {
+ const char* s = entries[idx];
+ uint32_t j = 0;
+ while (s[j] && j + 1 < sizeof(d->d_name)) { d->d_name[j] = s[j]; j++; }
+ d->d_name[j] = 0;
+ }
+ *inout_index = idx + 1;
+ return (int)sizeof(struct vfs_dirent);
+}
+
+fs_node_t* procfs_create_root(void) {
+ memset(&g_proc_root, 0, sizeof(g_proc_root));
+ strcpy(g_proc_root.name, "proc");
+ g_proc_root.flags = FS_DIRECTORY;
+ g_proc_root.finddir = proc_root_finddir;
+ g_proc_root.readdir = proc_root_readdir;
+
+ memset(&g_proc_self, 0, sizeof(g_proc_self));
+ strcpy(g_proc_self.name, "self");
+ g_proc_self.flags = FS_DIRECTORY;
+ g_proc_self.finddir = proc_self_finddir;
+ g_proc_self.readdir = proc_self_readdir;
+
+ memset(&g_proc_self_status, 0, sizeof(g_proc_self_status));
+ strcpy(g_proc_self_status.name, "status");
+ g_proc_self_status.flags = FS_FILE;
+ g_proc_self_status.read = proc_self_status_read;
+
+ memset(&g_proc_uptime, 0, sizeof(g_proc_uptime));
+ strcpy(g_proc_uptime.name, "uptime");
+ g_proc_uptime.flags = FS_FILE;
+ g_proc_uptime.read = proc_uptime_read;
+
+ memset(&g_proc_meminfo, 0, sizeof(g_proc_meminfo));
+ strcpy(g_proc_meminfo.name, "meminfo");
+ g_proc_meminfo.flags = FS_FILE;
+ g_proc_meminfo.read = proc_meminfo_read;
+
+ return &g_proc_root;
+}
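Aside: every procfs `read` callback above ends with the same offset/size clamp so that partial reads and EOF behave like a regular file. Factored into a testable form; `proc_read_window` is a hypothetical name, not part of the patch:

```c
#include <stdint.h>
#include <string.h>

/* The clamp-and-copy pattern shared by the procfs read callbacks:
 * serve at most `size` bytes starting at `offset` from an in-memory
 * snapshot of length `len`. Returns the number of bytes copied. */
static uint32_t proc_read_window(const char* snap, uint32_t len,
                                 uint32_t offset, uint32_t size,
                                 uint8_t* out) {
    if (offset >= len) return 0;          /* past EOF */
    uint32_t avail = len - offset;
    if (size > avail) size = avail;       /* short read at EOF */
    memcpy(out, snap + offset, size);
    return size;
}
```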
int jc = pty_jobctl_write_check();
if (jc < 0) return jc;
+ enum { SIGINT_NUM = 2, SIGQUIT_NUM = 3, SIGTSTP_NUM = 20 };
+
+ const uint8_t* bytes = (const uint8_t*)kbuf;
+ for (uint32_t i = 0; i < len; i++) {
+ uint8_t ch = bytes[i];
+ int sig = 0;
+ if (ch == 0x03) sig = SIGINT_NUM; // ^C
+ else if (ch == 0x1C) sig = SIGQUIT_NUM; // ^backslash
+ else if (ch == 0x1A) sig = SIGTSTP_NUM; // ^Z
+ if (sig && pty_fg_pgrp != 0) {
+ process_kill_pgrp(pty_fg_pgrp, sig);
+ }
+ }
+ // NOTE: minimal ISIG: the signal character itself is still written to
+ // the slave input below; POSIX would discard it from the stream.
+
uintptr_t flags = spin_lock_irqsave(&pty_lock);
uint32_t free = rb_free(m2s_head, m2s_tail);
uint32_t to_write = len;
static spinlock_t sched_lock = {0};
static uintptr_t kernel_as = 0;
+/* ---------- O(1) runqueue ---------- */
+struct prio_queue {
+ struct process* head;
+ struct process* tail;
+};
+
+struct runqueue {
+ uint32_t bitmap; // bit i set => queue[i] non-empty
+ struct prio_queue queue[SCHED_NUM_PRIOS];
+};
+
+static struct runqueue rq_active_store;
+static struct runqueue rq_expired_store;
+static struct runqueue* rq_active = &rq_active_store;
+static struct runqueue* rq_expired = &rq_expired_store;
+
+static inline uint32_t bsf32(uint32_t v) {
+ uint32_t r;
+ __asm__ volatile("bsf %1, %0" : "=r"(r) : "rm"(v) : "cc");
+ return r;
+}
+
+static void rq_enqueue(struct runqueue* rq, struct process* p) {
+ uint8_t prio = p->priority;
+ struct prio_queue* pq = &rq->queue[prio];
+ p->rq_next = NULL;
+ p->rq_prev = pq->tail;
+ if (pq->tail) pq->tail->rq_next = p;
+ else pq->head = p;
+ pq->tail = p;
+ rq->bitmap |= (1U << prio);
+}
+
+static void rq_dequeue(struct runqueue* rq, struct process* p) {
+ uint8_t prio = p->priority;
+ struct prio_queue* pq = &rq->queue[prio];
+ if (p->rq_prev) p->rq_prev->rq_next = p->rq_next;
+ else pq->head = p->rq_next;
+ if (p->rq_next) p->rq_next->rq_prev = p->rq_prev;
+ else pq->tail = p->rq_prev;
+ p->rq_next = NULL;
+ p->rq_prev = NULL;
+ if (!pq->head) rq->bitmap &= ~(1U << prio);
+}
+
+static struct process* rq_pick_next(void) {
+ if (rq_active->bitmap) {
+ uint32_t prio = bsf32(rq_active->bitmap);
+ return rq_active->queue[prio].head;
+ }
+ // Swap active <-> expired
+ struct runqueue* tmp = rq_active;
+ rq_active = rq_expired;
+ rq_expired = tmp;
+ if (rq_active->bitmap) {
+ uint32_t prio = bsf32(rq_active->bitmap);
+ return rq_active->queue[prio].head;
+ }
+ return NULL; // only idle task left
+}
+
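Aside: `rq_pick_next` depends on `bsf32` returning the index of the lowest set bit, which is the highest priority since a lower index means a higher priority. A portable equivalent using GCC/Clang's `__builtin_ctz`, undefined for zero input just like the `BSF` instruction, so callers must check the bitmap first:

```c
#include <stdint.h>

/* Portable equivalent of the inline bsf32: index of the lowest set bit.
 * Undefined for v == 0 (as is BSF), so the caller must test the bitmap
 * before calling, exactly as rq_pick_next does. */
static inline uint32_t lowest_set_bit(uint32_t v) {
    return (uint32_t)__builtin_ctz(v);
}
```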
void thread_wrapper(void (*fn)(void));
static struct process* process_find_locked(uint32_t pid) {
parent->wait_result_pid = (int)p->pid;
parent->wait_result_status = p->exit_status;
parent->state = PROCESS_READY;
+ rq_enqueue(rq_active, parent);
}
}
}
p->sig_pending_mask |= (1U << (uint32_t)sig);
if (p->state == PROCESS_BLOCKED || p->state == PROCESS_SLEEPING) {
p->state = PROCESS_READY;
+ rq_enqueue(rq_active, p);
}
}
return 0;
}
+int process_kill_pgrp(uint32_t pgrp, int sig) {
+ if (pgrp == 0) return -EINVAL;
+ if (sig <= 0 || sig >= PROCESS_MAX_SIG) return -EINVAL;
+
+ uintptr_t flags = spin_lock_irqsave(&sched_lock);
+ int found = 0;
+
+ struct process* it = ready_queue_head;
+ if (it) {
+ const struct process* const start = it;
+ do {
+ if (it->pgrp_id == pgrp && it->pid != 0 && it->state != PROCESS_ZOMBIE) {
+ it->sig_pending_mask |= (1U << (uint32_t)sig);
+ if (it->state == PROCESS_BLOCKED || it->state == PROCESS_SLEEPING) {
+ it->state = PROCESS_READY;
+ rq_enqueue(rq_active, it);
+ }
+ found = 1;
+ }
+ it = it->next;
+ } while (it && it != start);
+ }
+
+ spin_unlock_irqrestore(&sched_lock, flags);
+ return found ? 0 : -ESRCH;
+}
+
int process_waitpid(int pid, int* status_out, uint32_t options) {
if (!current_process) return -ECHILD;
parent->wait_result_pid = (int)current_process->pid;
parent->wait_result_status = status;
parent->state = PROCESS_READY;
+ rq_enqueue(rq_active, parent);
}
}
}
proc->parent_pid = current_process ? current_process->pid : 0;
proc->session_id = current_process ? current_process->session_id : proc->pid;
proc->pgrp_id = current_process ? current_process->pgrp_id : proc->pid;
+ proc->priority = current_process ? current_process->priority : SCHED_DEFAULT_PRIO;
+ proc->nice = current_process ? current_process->nice : 0;
proc->state = PROCESS_READY;
proc->addr_space = child_as;
proc->wake_at_tick = 0;
ready_queue_head->prev = proc;
ready_queue_tail = proc;
+ rq_enqueue(rq_active, proc);
+
spin_unlock_irqrestore(&sched_lock, flags);
return proc;
}
}
memset(kernel_proc, 0, sizeof(*kernel_proc));
-
+
+ memset(&rq_active_store, 0, sizeof(rq_active_store));
+ memset(&rq_expired_store, 0, sizeof(rq_expired_store));
+
kernel_proc->pid = 0;
kernel_proc->parent_pid = 0;
kernel_proc->session_id = 0;
kernel_proc->pgrp_id = 0;
+ kernel_proc->priority = SCHED_NUM_PRIOS - 1; // idle = lowest priority
+ kernel_proc->nice = 19;
kernel_proc->state = PROCESS_RUNNING;
kernel_proc->wake_at_tick = 0;
kernel_proc->addr_space = hal_cpu_get_address_space();
proc->parent_pid = current_process ? current_process->pid : 0;
proc->session_id = current_process ? current_process->session_id : proc->pid;
proc->pgrp_id = current_process ? current_process->pgrp_id : proc->pid;
+ proc->priority = SCHED_DEFAULT_PRIO;
+ proc->nice = 0;
proc->state = PROCESS_READY;
proc->addr_space = kernel_as ? kernel_as : (current_process ? current_process->addr_space : 0);
proc->wake_at_tick = 0;
ready_queue_head->prev = proc;
ready_queue_tail = proc;
+ rq_enqueue(rq_active, proc);
+
spin_unlock_irqrestore(&sched_lock, flags);
return proc;
}
-// Find next READY process
+// Find next READY process — O(1) via bitmap
struct process* get_next_ready_process(void) {
- if (!current_process) return NULL;
- if (!current_process->next) return current_process;
-
- struct process* iterator = current_process->next;
+ struct process* next = rq_pick_next();
+ if (next) return next;
- // Scan the full circular list for a READY process.
- while (iterator && iterator != current_process) {
- if (iterator->state == PROCESS_READY) {
- return iterator;
- }
- iterator = iterator->next;
- }
-
- // If current is ready/running, return it.
- if (current_process->state == PROCESS_RUNNING || current_process->state == PROCESS_READY)
- return current_process;
-
- // If EVERYONE is sleeping, we must return the IDLE task (PID 0)
- // Assuming PID 0 is always in the list.
- // Search specifically for PID 0
- iterator = current_process->next;
- while (iterator && iterator->pid != 0) {
- iterator = iterator->next;
- if (iterator == current_process) break; // Should not happen
- }
- return iterator ? iterator : current_process;
+ // Fallback: idle task (PID 0)
+ if (current_process && current_process->pid == 0) return current_process;
+ struct process* it = ready_queue_head;
+ if (!it) return current_process;
+ const struct process* start = it;
+ do {
+ if (it->pid == 0) return it;
+ it = it->next;
+ } while (it && it != start);
+ return current_process;
}
void schedule(void) {
}
struct process* prev = current_process;
- struct process* next = get_next_ready_process();
-
- if (prev == next) {
- spin_unlock_irqrestore(&sched_lock, irq_flags);
- return;
- }
- // Only change state to READY if it was RUNNING.
- // If it was SLEEPING/BLOCKED, leave it as is.
+ // Put prev back into expired runqueue if it's still runnable.
if (prev->state == PROCESS_RUNNING) {
prev->state = PROCESS_READY;
+ rq_enqueue(rq_expired, prev);
+ }
+
+ // Pick highest-priority READY process (may swap active/expired).
+ struct process* next = get_next_ready_process();
+
+ if (next) {
+ // If next came from a runqueue it must be removed. If it is the idle
+ // fallback, both runqueues are empty and the dequeue is a harmless no-op.
+ rq_dequeue(rq_active, next);
+ }
+
+ if (!next) {
+ // Nothing in runqueues. If prev is still runnable, keep it.
+ if (prev->state == PROCESS_READY) {
+ rq_dequeue(rq_expired, prev);
+ next = prev;
+ } else {
+ // Fall back to idle (PID 0).
+ struct process* it = ready_queue_head;
+ next = it;
+ if (it) {
+ const struct process* start = it;
+ do {
+ if (it->pid == 0) { next = it; break; }
+ it = it->next;
+ } while (it && it != start);
+ }
+ }
+ }
+
+ if (prev == next) {
+ prev->state = PROCESS_RUNNING;
+ spin_unlock_irqrestore(&sched_lock, irq_flags);
+ return;
}
current_process = next;
hal_cpu_set_address_space(current_process->addr_space);
}
- // For ring3->ring0 transitions, esp0 must point to the top of the kernel stack.
if (current_process->kernel_stack) {
hal_cpu_set_kernel_stack((uintptr_t)current_process->kernel_stack + 4096);
}
context_switch(&prev->sp, current_process->sp);
- // Do not restore the old IF state after switching stacks.
- // The previous context may have entered schedule() with IF=0 (e.g. syscall/ISR),
- // and propagating that would prevent timer/keyboard IRQs from firing.
hal_cpu_enable_interrupts();
}
if (iter->state == PROCESS_SLEEPING) {
if (current_tick >= iter->wake_at_tick) {
iter->state = PROCESS_READY;
- // uart_print("Woke up PID ");
+ rq_enqueue(rq_active, iter);
}
}
iter = iter->next;
#include "elf.h"
#include "stat.h"
#include "vmm.h"
+#include "pmm.h"
+#include "timer.h"
#include "hal/cpu.h"
enum {
O_NONBLOCK = 0x800,
+ O_CLOEXEC = 0x80000,
};
enum {
+ FD_CLOEXEC = 1,
+};
+
+enum {
+ FCNTL_F_DUPFD = 0,
+ FCNTL_F_GETFD = 1,
+ FCNTL_F_SETFD = 2,
FCNTL_F_GETFL = 3,
FCNTL_F_SETFL = 4,
};
current_process->addr_space = src_as;
}
- uintptr_t child_as = vmm_as_clone_user(src_as);
+ uintptr_t child_as = vmm_as_clone_user_cow(src_as);
if (!child_as) return -ENOMEM;
struct registers child_regs = *regs;
return -ENOMEM;
}
+ child->heap_start = current_process->heap_start;
+ child->heap_break = current_process->heap_break;
+
for (int fd = 0; fd < PROCESS_MAX_FILES; fd++) {
struct file* f = current_process->files[fd];
if (!f) continue;
f->refcount++;
child->files[fd] = f;
+ child->fd_flags[fd] = current_process->fd_flags[fd];
}
return (int)child->pid;
if (!current_process) return -ECHILD;
if (kfds[0] >= 0 && kfds[0] < PROCESS_MAX_FILES && current_process->files[kfds[0]]) {
- current_process->files[kfds[0]]->flags = flags;
+ current_process->files[kfds[0]]->flags = flags & ~O_CLOEXEC;
}
if (kfds[1] >= 0 && kfds[1] < PROCESS_MAX_FILES && current_process->files[kfds[1]]) {
- current_process->files[kfds[1]]->flags = flags;
+ current_process->files[kfds[1]]->flags = flags & ~O_CLOEXEC;
+ }
+ if (flags & O_CLOEXEC) {
+ if (kfds[0] >= 0 && kfds[0] < PROCESS_MAX_FILES) current_process->fd_flags[kfds[0]] = FD_CLOEXEC;
+ if (kfds[1] >= 0 && kfds[1] < PROCESS_MAX_FILES) current_process->fd_flags[kfds[1]] = FD_CLOEXEC;
}
if (copy_to_user(user_fds, kfds, sizeof(kfds)) < 0) {
uintptr_t entry = 0;
uintptr_t user_sp = 0;
uintptr_t new_as = 0;
- if (elf32_load_user_from_initrd(path, &entry, &user_sp, &new_as) != 0) {
+ uintptr_t heap_brk = 0;
+ if (elf32_load_user_from_initrd(path, &entry, &user_sp, &new_as, &heap_brk) != 0) {
ret = -EINVAL;
goto out;
}
}
current_process->addr_space = new_as;
+ current_process->heap_start = heap_brk;
+ current_process->heap_break = heap_brk;
vmm_as_activate(new_as);
// Build a minimal initial user stack: argc, argv pointers, envp pointers, strings.
(void)argv_va;
(void)envp_va;
+ for (int i = 0; i < PROCESS_MAX_FILES; i++) {
+ if (current_process->fd_flags[i] & FD_CLOEXEC) {
+ (void)fd_close(i);
+ current_process->fd_flags[i] = 0;
+ }
+ }
+
if (old_as && old_as != new_as) {
vmm_as_destroy(old_as);
}
kfree(f);
return -EMFILE;
}
+ if ((flags & O_CLOEXEC) && current_process) {
+ current_process->fd_flags[fd] = FD_CLOEXEC;
+ }
return fd;
}
struct file* f = fd_get(fd);
if (!f) return -EBADF;
+ if (cmd == FCNTL_F_GETFD) {
+ if (!current_process) return 0;
+ return (int)current_process->fd_flags[fd];
+ }
+ if (cmd == FCNTL_F_SETFD) {
+ if (!current_process) return -EINVAL;
+ current_process->fd_flags[fd] = (uint8_t)(arg & FD_CLOEXEC);
+ return 0;
+ }
if (cmd == FCNTL_F_GETFL) {
return (int)f->flags;
}
if (cmd == FCNTL_F_SETFL) {
- // Minimal: allow toggling O_NONBLOCK only.
uint32_t keep = f->flags & ~O_NONBLOCK;
uint32_t set = arg & O_NONBLOCK;
f->flags = keep | set;
return 0;
}
+struct timespec {
+ uint32_t tv_sec;
+ uint32_t tv_nsec;
+};
+
+enum {
+ CLOCK_REALTIME = 0,
+ CLOCK_MONOTONIC = 1,
+};
+
+static int syscall_nanosleep_impl(const struct timespec* user_req, struct timespec* user_rem) {
+ if (!user_req) return -EFAULT;
+ if (user_range_ok(user_req, sizeof(struct timespec)) == 0) return -EFAULT;
+
+ struct timespec req;
+ if (copy_from_user(&req, user_req, sizeof(req)) < 0) return -EFAULT;
+
+ if (req.tv_nsec >= 1000000000U) return -EINVAL;
+
+ const uint32_t TICK_MS = 20;
+ uint32_t ms = req.tv_sec * 1000U + req.tv_nsec / 1000000U;
+ uint32_t ticks = (ms + TICK_MS - 1) / TICK_MS;
+ if (ticks == 0 && (req.tv_sec > 0 || req.tv_nsec > 0)) ticks = 1;
+
+ if (ticks > 0) {
+ process_sleep(ticks);
+ }
+
+ if (user_rem) {
+ if (user_range_ok(user_rem, sizeof(struct timespec)) != 0) {
+ struct timespec rem = {0, 0};
+ (void)copy_to_user(user_rem, &rem, sizeof(rem));
+ }
+ }
+
+ return 0;
+}
+
+static int syscall_clock_gettime_impl(uint32_t clk_id, struct timespec* user_tp) {
+ if (!user_tp) return -EFAULT;
+ if (user_range_ok(user_tp, sizeof(struct timespec)) == 0) return -EFAULT;
+
+ if (clk_id != CLOCK_REALTIME && clk_id != CLOCK_MONOTONIC) return -EINVAL;
+
+ uint32_t ticks = get_tick_count();
+ const uint32_t TICK_MS = 20;
+ uint32_t total_ms = ticks * TICK_MS;
+
+ struct timespec tp;
+ tp.tv_sec = total_ms / 1000U;
+ tp.tv_nsec = (total_ms % 1000U) * 1000000U;
+
+ if (copy_to_user(user_tp, &tp, sizeof(tp)) < 0) return -EFAULT;
+ return 0;
+}
+
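Aside: the tick-to-timespec conversion inside `syscall_clock_gettime_impl`, extracted so the arithmetic can be checked in isolation. It inherits the patch's hard-coded 20 ms tick and the 32-bit wrap after roughly 49.7 days; `timespec32` and `ticks_to_timespec` are hypothetical names:

```c
#include <stdint.h>

struct timespec32 { uint32_t tv_sec; uint32_t tv_nsec; };

/* Mirror of the conversion in syscall_clock_gettime_impl:
 * 20 ms per tick, split into whole seconds and nanoseconds. */
static struct timespec32 ticks_to_timespec(uint32_t ticks) {
    uint32_t total_ms = ticks * 20U;   /* wraps after ~49.7 days of ticks */
    struct timespec32 tp;
    tp.tv_sec  = total_ms / 1000U;
    tp.tv_nsec = (total_ms % 1000U) * 1000000U;
    return tp;
}
```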
+enum {
+ PROT_NONE = 0x0,
+ PROT_READ = 0x1,
+ PROT_WRITE = 0x2,
+ PROT_EXEC = 0x4,
+};
+
+enum {
+ MAP_SHARED = 0x01,
+ MAP_PRIVATE = 0x02,
+ MAP_FIXED = 0x10,
+ MAP_ANONYMOUS = 0x20,
+};
+
+static uintptr_t mmap_find_free(uint32_t length) {
+ if (!current_process) return 0;
+ const uintptr_t MMAP_BASE = 0x40000000U;
+ const uintptr_t MMAP_END = 0x7FF00000U;
+
+ for (uintptr_t candidate = MMAP_BASE; candidate + length <= MMAP_END; candidate += 0x1000U) {
+ int overlap = 0;
+ for (int i = 0; i < PROCESS_MAX_MMAPS; i++) {
+ if (current_process->mmaps[i].length == 0) continue;
+ uintptr_t mb = current_process->mmaps[i].base;
+ uint32_t ml = current_process->mmaps[i].length;
+ if (candidate < mb + ml && candidate + length > mb) {
+ overlap = 1;
+ candidate = ((mb + ml + 0xFFFU) & ~(uintptr_t)0xFFFU) - 0x1000U;
+ break;
+ }
+ }
+ if (!overlap) return candidate;
+ }
+ return 0;
+}
+
+static uintptr_t syscall_mmap_impl(uintptr_t addr, uint32_t length, uint32_t prot,
+ uint32_t flags, int fd, uint32_t offset) {
+ (void)offset;
+ if (!current_process) return (uintptr_t)-EINVAL;
+ if (length == 0) return (uintptr_t)-EINVAL;
+
+ if (!(flags & MAP_ANONYMOUS)) return (uintptr_t)-ENOSYS;
+ if (fd != -1) return (uintptr_t)-EINVAL;
+
+ uint32_t aligned_len = (length + 0xFFFU) & ~(uint32_t)0xFFFU;
+
+ uintptr_t base;
+ if (flags & MAP_FIXED) {
+ if (addr == 0 || (addr & 0xFFF)) return (uintptr_t)-EINVAL;
+ if (addr >= 0xC0000000U || aligned_len > 0xC0000000U - addr) return (uintptr_t)-EINVAL; // whole range must stay below the kernel
+ base = addr;
+ } else {
+ base = mmap_find_free(aligned_len);
+ if (!base) return (uintptr_t)-ENOMEM;
+ }
+
+ int slot = -1;
+ for (int i = 0; i < PROCESS_MAX_MMAPS; i++) {
+ if (current_process->mmaps[i].length == 0) { slot = i; break; }
+ }
+ if (slot < 0) return (uintptr_t)-ENOMEM;
+
+ uint32_t vmm_flags = VMM_FLAG_PRESENT | VMM_FLAG_USER;
+ if (prot & PROT_WRITE) vmm_flags |= VMM_FLAG_RW;
+
+ for (uintptr_t va = base; va < base + aligned_len; va += 0x1000U) {
+ void* frame = pmm_alloc_page();
+ if (!frame) {
+ // OOM mid-mapping: unmap what was already mapped so it is not leaked.
+ for (uintptr_t undo = base; undo < va; undo += 0x1000U) vmm_unmap_page((uint64_t)undo);
+ return (uintptr_t)-ENOMEM;
+ }
+ vmm_map_page((uint64_t)(uintptr_t)frame, (uint64_t)va, vmm_flags);
+ memset((void*)va, 0, 0x1000U);
+ }
+
+ current_process->mmaps[slot].base = base;
+ current_process->mmaps[slot].length = aligned_len;
+
+ return base;
+}
+
+static int syscall_munmap_impl(uintptr_t addr, uint32_t length) {
+ if (!current_process) return -EINVAL;
+ if (addr == 0 || (addr & 0xFFF)) return -EINVAL;
+ if (length == 0) return -EINVAL;
+
+ uint32_t aligned_len = (length + 0xFFFU) & ~(uint32_t)0xFFFU;
+
+ int found = -1;
+ for (int i = 0; i < PROCESS_MAX_MMAPS; i++) {
+ if (current_process->mmaps[i].base == addr &&
+ current_process->mmaps[i].length == aligned_len) {
+ found = i;
+ break;
+ }
+ }
+ if (found < 0) return -EINVAL;
+
+ for (uintptr_t va = addr; va < addr + aligned_len; va += 0x1000U) {
+ vmm_unmap_page((uint64_t)va); // NOTE: if vmm_unmap_page does not free the backing frame, it leaks here
+ }
+
+ current_process->mmaps[found].base = 0;
+ current_process->mmaps[found].length = 0;
+ return 0;
+}
+
+static uintptr_t syscall_brk_impl(uintptr_t addr) {
+ if (!current_process) return 0;
+
+ if (addr == 0) {
+ return current_process->heap_break;
+ }
+
+ const uintptr_t X86_KERN_BASE = 0xC0000000U;
+ const uintptr_t USER_STACK_BASE = 0x00800000U;
+
+ if (addr < current_process->heap_start) return current_process->heap_break;
+ if (addr >= USER_STACK_BASE) return current_process->heap_break;
+ if (addr >= X86_KERN_BASE) return current_process->heap_break;
+
+ uintptr_t old_brk = current_process->heap_break;
+ uintptr_t new_brk = (addr + 0xFFFU) & ~(uintptr_t)0xFFFU;
+ uintptr_t old_brk_page = (old_brk + 0xFFFU) & ~(uintptr_t)0xFFFU;
+
+ if (new_brk > old_brk_page) {
+ for (uintptr_t va = old_brk_page; va < new_brk; va += 0x1000U) {
+ void* frame = pmm_alloc_page();
+ if (!frame) {
+ /* Roll back this call's mappings: leaving them in place would
+ * make a retry remap the same range and leak the old frames. */
+ for (uintptr_t undo = old_brk_page; undo < va; undo += 0x1000U) {
+ vmm_unmap_page((uint64_t)undo);
+ }
+ return current_process->heap_break;
+ }
+ vmm_as_map_page(current_process->addr_space,
+ (uint64_t)(uintptr_t)frame, (uint64_t)va,
+ VMM_FLAG_PRESENT | VMM_FLAG_RW | VMM_FLAG_USER);
+ memset((void*)va, 0, 0x1000U);
+ }
+ } else if (new_brk < old_brk_page) {
+ for (uintptr_t va = new_brk; va < old_brk_page; va += 0x1000U) {
+ vmm_unmap_page((uint64_t)va);
+ }
+ }
+
+ current_process->heap_break = addr;
+ return addr;
+}
+
static void syscall_handler(struct registers* regs) {
uint32_t syscall_no = regs->eax;
return;
}
+ if (syscall_no == SYSCALL_BRK) {
+ uintptr_t addr = (uintptr_t)regs->ebx;
+ regs->eax = (uint32_t)syscall_brk_impl(addr);
+ return;
+ }
+
+ if (syscall_no == SYSCALL_NANOSLEEP) {
+ const struct timespec* req = (const struct timespec*)regs->ebx;
+ struct timespec* rem = (struct timespec*)regs->ecx;
+ regs->eax = (uint32_t)syscall_nanosleep_impl(req, rem);
+ return;
+ }
+
+ if (syscall_no == SYSCALL_CLOCK_GETTIME) {
+ uint32_t clk_id = regs->ebx;
+ struct timespec* tp = (struct timespec*)regs->ecx;
+ regs->eax = (uint32_t)syscall_clock_gettime_impl(clk_id, tp);
+ return;
+ }
+
+ if (syscall_no == SYSCALL_MMAP) {
+ uintptr_t addr = (uintptr_t)regs->ebx;
+ uint32_t length = regs->ecx;
+ uint32_t prot = regs->edx;
+ uint32_t mflags = regs->esi;
+ int fd = (int)regs->edi;
+ /* The file offset does not fit the five-register ABI yet; pass 0. */
+ regs->eax = (uint32_t)syscall_mmap_impl(addr, length, prot, mflags, fd, 0);
+ return;
+ }
+
+ if (syscall_no == SYSCALL_MUNMAP) {
+ uintptr_t addr = (uintptr_t)regs->ebx;
+ uint32_t length = regs->ecx;
+ regs->eax = (uint32_t)syscall_munmap_impl(addr, length);
+ return;
+ }
+
regs->eax = (uint32_t)-ENOSYS;
}
static uint32_t waitq_head = 0;
static uint32_t waitq_tail = 0;
-static uint32_t tty_lflag = TTY_ICANON | TTY_ECHO;
+static uint32_t tty_lflag = TTY_ICANON | TTY_ECHO | TTY_ISIG;
+
+static struct winsize tty_winsize = { 24, 80, 0, 0 }; /* ws_row, ws_col, ws_xpixel, ws_ypixel */
static uint32_t tty_session_id = 0;
static uint32_t tty_fg_pgrp = 0;
TTY_TCSETS = 0x5402,
TTY_TIOCGPGRP = 0x540F,
TTY_TIOCSPGRP = 0x5410,
+ TTY_TIOCGWINSZ = 0x5413,
+ TTY_TIOCSWINSZ = 0x5414,
};
int tty_ioctl(uint32_t cmd, void* user_arg) {
if (cmd == TTY_TCGETS) {
struct termios t;
+ memset(&t, 0, sizeof(t));
uintptr_t flags = spin_lock_irqsave(&tty_lock);
t.c_lflag = tty_lflag;
spin_unlock_irqrestore(&tty_lock, flags);
struct termios t;
if (copy_from_user(&t, user_arg, sizeof(t)) < 0) return -EFAULT;
uintptr_t flags = spin_lock_irqsave(&tty_lock);
- tty_lflag = t.c_lflag & (TTY_ICANON | TTY_ECHO);
+ tty_lflag = t.c_lflag & (TTY_ICANON | TTY_ECHO | TTY_ISIG);
spin_unlock_irqrestore(&tty_lock, flags);
return 0;
}
+ if (cmd == TTY_TIOCGWINSZ) {
+ if (user_range_ok(user_arg, sizeof(struct winsize)) == 0) return -EFAULT;
+ if (copy_to_user(user_arg, &tty_winsize, sizeof(tty_winsize)) < 0) return -EFAULT;
+ return 0;
+ }
+
+ if (cmd == TTY_TIOCSWINSZ) {
+ if (user_range_ok(user_arg, sizeof(struct winsize)) == 0) return -EFAULT;
+ if (copy_from_user(&tty_winsize, user_arg, sizeof(tty_winsize)) < 0) return -EFAULT;
+ return 0;
+ }
+
return -EINVAL;
}
uintptr_t flags = spin_lock_irqsave(&tty_lock);
uint32_t lflag = tty_lflag;
+ enum { SIGINT_NUM = 2, SIGQUIT_NUM = 3, SIGTSTP_NUM = 20 };
+
+ if (lflag & TTY_ISIG) {
+ if (c == 0x03) {
+ spin_unlock_irqrestore(&tty_lock, flags);
+ if (lflag & TTY_ECHO) {
+ uart_print("^C\n");
+ }
+ if (tty_fg_pgrp != 0) {
+ process_kill_pgrp(tty_fg_pgrp, SIGINT_NUM);
+ }
+ return;
+ }
+
+ if (c == 0x1C) {
+ spin_unlock_irqrestore(&tty_lock, flags);
+ if (lflag & TTY_ECHO) {
+ uart_print("^\\\n");
+ }
+ if (tty_fg_pgrp != 0) {
+ process_kill_pgrp(tty_fg_pgrp, SIGQUIT_NUM);
+ }
+ return;
+ }
+
+ if (c == 0x1A) {
+ spin_unlock_irqrestore(&tty_lock, flags);
+ if (lflag & TTY_ECHO) {
+ uart_print("^Z\n");
+ }
+ if (tty_fg_pgrp != 0) {
+ process_kill_pgrp(tty_fg_pgrp, SIGTSTP_NUM);
+ }
+ return;
+ }
+ }
+
+ if (c == 0x04 && (lflag & TTY_ICANON)) {
+ if (lflag & TTY_ECHO) {
+ uart_print("^D");
+ }
+ for (uint32_t i = 0; i < line_len; i++) {
+ canon_push(line_buf[i]);
+ }
+ line_len = 0;
+ tty_wake_one();
+ spin_unlock_irqrestore(&tty_lock, flags);
+ return;
+ }
+
if ((lflag & TTY_ICANON) == 0) {
if (c == '\r') c = '\n';
canon_push(c);
#define BITMAP_SIZE (MAX_RAM_SIZE / PAGE_SIZE / 8)
static uint8_t memory_bitmap[BITMAP_SIZE];
+static uint16_t frame_refcount[MAX_RAM_SIZE / PAGE_SIZE];
static uint64_t total_memory = 0;
static uint64_t used_memory = 0;
static uint64_t max_frames = 0;
if (!bitmap_test(i)) {
bitmap_set(i);
+ frame_refcount[i] = 1;
used_memory += PAGE_SIZE;
last_alloc_frame = i + 1;
if (last_alloc_frame >= max_frames) last_alloc_frame = 1;
void pmm_free_page(void* ptr) {
uintptr_t addr = (uintptr_t)ptr;
uint64_t frame = addr / PAGE_SIZE;
+ if (frame == 0 || frame >= max_frames) return;
+
+ uint16_t rc = frame_refcount[frame];
+ if (rc > 1) {
+ /* Frame is shared (e.g. via CoW): drop one reference, keep the frame. */
+ __sync_sub_and_fetch(&frame_refcount[frame], 1);
+ return;
+ }
+
+ frame_refcount[frame] = 0;
bitmap_unset(frame);
used_memory -= PAGE_SIZE;
}
+
+void pmm_incref(uintptr_t paddr) {
+ uint64_t frame = paddr / PAGE_SIZE;
+ if (frame == 0 || frame >= max_frames) return;
+ __sync_fetch_and_add(&frame_refcount[frame], 1);
+}
+
+uint16_t pmm_decref(uintptr_t paddr) {
+ uint64_t frame = paddr / PAGE_SIZE;
+ if (frame == 0 || frame >= max_frames) return 0;
+ uint16_t new_val = __sync_sub_and_fetch(&frame_refcount[frame], 1);
+ if (new_val == 0) {
+ bitmap_unset(frame);
+ used_memory -= PAGE_SIZE;
+ }
+ return new_val;
+}
+
+uint16_t pmm_get_refcount(uintptr_t paddr) {
+ uint64_t frame = paddr / PAGE_SIZE;
+ if (frame == 0 || frame >= max_frames) return 0;
+ return frame_refcount[frame];
+}
--- /dev/null
+#include "slab.h"
+#include "pmm.h"
+#include "uart_console.h"
+
+#include <stddef.h>
+
+struct slab_free_node {
+ struct slab_free_node* next;
+};
+
+void slab_cache_init(slab_cache_t* cache, const char* name, uint32_t obj_size) {
+ if (!cache) return;
+ cache->name = name;
+ if (obj_size < sizeof(struct slab_free_node)) {
+ obj_size = sizeof(struct slab_free_node);
+ }
+ cache->obj_size = (obj_size + 7U) & ~7U;
+ cache->objs_per_slab = PAGE_SIZE / cache->obj_size;
+ cache->free_list = NULL;
+ cache->total_allocs = 0;
+ cache->total_frees = 0;
+ spinlock_init(&cache->lock);
+}
+
+static int slab_grow(slab_cache_t* cache) {
+ void* page = pmm_alloc_page();
+ if (!page) return -1;
+
+ uint8_t* base = (uint8_t*)(uintptr_t)page;
+
+ /* pmm_alloc_page() returns a physical address. Under the higher-half
+ * layout, low physical memory is visible at phys + 0xC0000000, so
+ * convert before threading the free-list nodes through the page.
+ * TODO: pages outside that window need an explicit kernel mapping. */
+ uint8_t* vbase = base + 0xC0000000U;
+
+ for (uint32_t i = 0; i < cache->objs_per_slab; i++) {
+ struct slab_free_node* node = (struct slab_free_node*)(vbase + i * cache->obj_size);
+ node->next = (struct slab_free_node*)cache->free_list;
+ cache->free_list = node;
+ }
+
+ return 0;
+}
+
+void* slab_alloc(slab_cache_t* cache) {
+ if (!cache) return NULL;
+
+ uintptr_t flags = spin_lock_irqsave(&cache->lock);
+
+ if (!cache->free_list) {
+ /* Re-check the list after growing: an object size larger than a
+ * page yields objs_per_slab == 0 and an empty list even on success. */
+ if (slab_grow(cache) < 0 || !cache->free_list) {
+ spin_unlock_irqrestore(&cache->lock, flags);
+ return NULL;
+ }
+ }
+
+ struct slab_free_node* node = (struct slab_free_node*)cache->free_list;
+ cache->free_list = node->next;
+ cache->total_allocs++;
+
+ spin_unlock_irqrestore(&cache->lock, flags);
+
+ return (void*)node;
+}
+
+void slab_free(slab_cache_t* cache, void* obj) {
+ if (!cache || !obj) return;
+
+ uintptr_t flags = spin_lock_irqsave(&cache->lock);
+
+ struct slab_free_node* node = (struct slab_free_node*)obj;
+ node->next = (struct slab_free_node*)cache->free_list;
+ cache->free_list = node;
+ cache->total_frees++;
+
+ spin_unlock_irqrestore(&cache->lock, flags);
+}