From: Tulio A M Mendes
Date: Tue, 10 Feb 2026 03:42:56 +0000 (-0300)
Subject: POSIX Tiers 1-3: signals, brk/sbrk, nanosleep, raw TTY, O_CLOEXEC, PMM refcount,...
X-Git-Url: https://projects.tadryanom.me/docs/static/gitweb.css?a=commitdiff_plain;h=e9527b35d81f3e9caae4171af4c987cb3160c8fb;p=AdrOS.git

POSIX Tiers 1-3: signals, brk/sbrk, nanosleep, raw TTY, O_CLOEXEC, PMM refcount, CoW fork, mmap/munmap, procfs, slab allocator, O(1) scheduler, PCI enumeration, VBE framebuffer

Tier 1:
- Signal characters (Ctrl+C/Z/\ -> SIGINT/SIGTSTP/SIGQUIT) in TTY/PTY
- brk/sbrk syscall with ELF loader heap break support
- nanosleep/clock_gettime syscalls (tick-based)
- Raw TTY mode (non-canonical) + ISIG flag + TIOCGWINSZ/TIOCSWINSZ

Tier 2:
- O_CLOEXEC/FD_CLOEXEC: per-FD flags, fcntl F_GETFD/F_SETFD, close on execve
- PMM ref-counting (uint16_t per-frame, atomic ops)
- Copy-on-Write fork: PTE bit 9 COW marker, page fault handler, vmm_as_clone_user_cow
- mmap/munmap: anonymous MAP_PRIVATE, per-process VMA tracker (32 slots)
- /proc filesystem: /proc/self/status, /proc/uptime, /proc/meminfo

Tier 3:
- Slab allocator: free-list-in-place, per-cache spinlock, auto-grow
- O(1) scheduler: 32-priority bitmap + active/expired runqueue swap
- PCI enumeration: bus/slot/func scan, vendor/device/class/BAR/IRQ
- VBE framebuffer: multiboot2 tag parsing, kernel VA mapping, pixel ops

All changes pass: make clean && make, cppcheck --enable=warning,performance,portability, and QEMU smoke test (10s stable boot, all init tests pass).
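Editor's sketch (not part of the commit): the Tier 3 entry mentions a 32-priority bitmap with an active/expired runqueue swap. The block below is a minimal userspace illustration of that idea, assuming priority 0 is the highest (as the `priority` comment in `process.h` in this diff states). All `toy_*` names are hypothetical and are not taken from the AdrOS scheduler source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_PRIOS 32 /* mirrors SCHED_NUM_PRIOS from this diff */

/* Hypothetical task node; the real struct process is much larger. */
struct toy_task {
    int id;
    uint8_t priority;          /* 0 = highest, 31 = lowest */
    struct toy_task *rq_next;  /* next task at the same priority */
};

/* One priority array: a bitmap plus one FIFO list per priority. */
struct toy_prio_array {
    uint32_t bitmap;           /* bit p set => list p is non-empty */
    struct toy_task *head[TOY_NUM_PRIOS];
    struct toy_task *tail[TOY_NUM_PRIOS];
};

struct toy_runqueue {
    struct toy_prio_array arrays[2];
    struct toy_prio_array *active;
    struct toy_prio_array *expired;
};

static void toy_rq_init(struct toy_runqueue *rq) {
    for (int a = 0; a < 2; a++) {
        rq->arrays[a].bitmap = 0;
        for (int p = 0; p < TOY_NUM_PRIOS; p++) {
            rq->arrays[a].head[p] = NULL;
            rq->arrays[a].tail[p] = NULL;
        }
    }
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* Append a task to the tail of its priority list and mark the bitmap. */
static void toy_enqueue(struct toy_prio_array *pa, struct toy_task *t) {
    assert(t->priority < TOY_NUM_PRIOS);
    t->rq_next = NULL;
    if (pa->tail[t->priority]) pa->tail[t->priority]->rq_next = t;
    else pa->head[t->priority] = t;
    pa->tail[t->priority] = t;
    pa->bitmap |= (uint32_t)1 << t->priority;
}

/* Lowest set bit = best priority. The loop is bounded by 32 steps,
 * so the cost does not depend on how many tasks are queued. */
static int toy_lowest_bit(uint32_t bm) {
    for (int p = 0; p < TOY_NUM_PRIOS; p++)
        if (bm & ((uint32_t)1 << p)) return p;
    return -1;
}

/* Pick the next task; swap active/expired when active runs dry. */
static struct toy_task *toy_pick_next(struct toy_runqueue *rq) {
    if (rq->active->bitmap == 0) {
        struct toy_prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    int p = toy_lowest_bit(rq->active->bitmap);
    if (p < 0) return NULL; /* both arrays are empty */
    struct toy_task *t = rq->active->head[p];
    rq->active->head[p] = t->rq_next;
    if (!rq->active->head[p]) {
        rq->active->tail[p] = NULL;
        rq->active->bitmap &= ~((uint32_t)1 << p);
    }
    t->rq_next = NULL;
    return t;
}
```

In this sketch a task whose time slice ends would be re-queued with `toy_enqueue(rq->expired, t)`, so it only runs again after the arrays swap. A kernel would typically replace the bounded loop with a bit-scan instruction and hold a scheduler lock around these operations.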
---

diff --git a/docs/SUPPLEMENTARY_ANALYSIS.md b/docs/SUPPLEMENTARY_ANALYSIS.md
new file mode 100644
index 0000000..02d7ab0
--- /dev/null
+++ b/docs/SUPPLEMENTARY_ANALYSIS.md
@@ -0,0 +1,320 @@
+# AdrOS — Supplementary Material Analysis & POSIX Gap Report
+
+This document compares the **supplementary-material** reference code and suggestions
+(from the AI monolog in `readme.txt` plus the `.c.txt`/`.S.txt` example files) with
+the **current AdrOS implementation**, and assesses how close AdrOS is to being a
+Unix-like, POSIX-compatible operating system.
+
+---
+
+## Part 1 — Subsystem-by-Subsystem Comparison
+
+### 1.1 Physical Memory Manager (PMM)
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Bitmap allocator | ✅ Bitmap-based | ✅ Bitmap-based (`src/mm/pmm.c`) | None |
+| Multiboot memory map parsing | ✅ Parse MMAP entries | ✅ Full Multiboot2 MMAP parsing, clamping, fallback | None |
+| Kernel/module protection | ✅ Reserve kernel + initrd | ✅ Protects kernel (`_start`–`_end`), modules, low 1MB | None |
+| Frame reference counting | ✅ `uint16_t frame_ref_count[]` for CoW | ❌ Not implemented | **Critical for CoW fork** |
+| Contiguous block allocation | ✅ `pmm_alloc_blocks(count)` for DMA | ❌ Only single-frame `pmm_alloc_page()` | Needed for DMA drivers |
+| Atomic ref operations | ✅ `__sync_fetch_and_add` | ❌ N/A (no refcount) | Future |
+| Spinlock protection | ✅ `spinlock_acquire(&pmm_lock)` | ❌ PMM has no lock (single-core safe only) | Needed for SMP |
+
+**Summary:** AdrOS PMM is solid for single-core use. Missing ref-counting (blocks CoW) and contiguous allocation (blocks DMA).
+
+---
+
+### 1.2 Virtual Memory Manager (VMM)
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Higher-half kernel | ✅ 0xC0000000 | ✅ Identical | None |
+| Recursive page directory | Mentioned but not detailed | ✅ PDE[1023] self-map, `x86_pd_recursive()` | AdrOS is ahead |
+| Per-process address spaces | ✅ Clone kernel PD | ✅ `vmm_as_create_kernel_clone()`, `vmm_as_clone_user()` | None |
+| W^X logical policy | ✅ `vmm_apply_wx_policy()` rejects RWX | ✅ ELF loader maps `.text` as RO after load via `vmm_protect_range()` | Partial — no policy function, but effect achieved |
+| W^X hardware (NX bit) | ✅ PAE + NX via EFER MSR | ❌ 32-bit paging, no PAE, no NX | Long-term |
+| CPUID feature detection | ✅ `cpu_get_features()` for PAE/NX | ❌ Not implemented | Long-term |
+| `vmm_find_free_area()` | ✅ Scan user VA space for holes | ❌ Not implemented | Needed for `mmap` |
+| `vmm_map_dma_buffer()` | ✅ Map phys into user VA | ❌ Not implemented | Needed for zero-copy I/O |
+| TLB flush | ✅ `invlpg` + full flush | ✅ `invlpg()` per page | None |
+| Spinlock on VMM ops | ✅ `vmm_kernel_lock` | ❌ No lock | Needed for SMP |
+
+**Summary:** AdrOS VMM is functional and well-designed (recursive mapping is elegant). Missing hardware NX (requires PAE migration) and free-area search for `mmap`.
+
+---
+
+### 1.3 Kernel Heap
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Doubly-linked free list | Mentioned | ✅ `heap.c` with `HEAP_MAGIC` validation | None |
+| Coalescing | Mentioned | ✅ Forward + backward coalesce (fixed in previous session) | None |
+| Spinlock | ✅ Required | ✅ `heap_lock` spinlock present | None |
+| Slab allocator | ✅ `slab_cache_t` for fixed-size objects | ❌ Not implemented | Medium priority |
+
+**Summary:** Heap works correctly. Slab allocator would improve performance for frequent small allocations (process structs, file descriptors).
+
+---
+
+### 1.4 Process Scheduler
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Process states | ✅ READY/RUNNING/SLEEPING/ZOMBIE | ✅ READY/RUNNING/ZOMBIE/BLOCKED/SLEEPING | AdrOS has more states |
+| Round-robin scheduling | Baseline | ✅ Implemented in `scheduler.c` | None |
+| O(1) scheduler (bitmap + active/expired) | ✅ Full implementation | ❌ Simple linked-list traversal | Enhancement |
+| Priority queues (MLFQ) | ✅ 32 priority levels | ❌ No priority levels | Enhancement |
+| Unix decay-based priority | ✅ `p_cpu` decay + `nice` | ❌ Not implemented | Enhancement |
+| Per-CPU runqueues | ✅ `cpu_runqueue_t` per CPU | ❌ Single global queue | Needed for SMP |
+| Sleep/wakeup (wait queues) | ✅ `sleep(chan, lock)` / `wakeup(chan)` | ✅ Process blocking via `PROCESS_BLOCKED` state + manual wake | Partial — no generic wait queue abstraction |
+| Context switch (assembly) | ✅ Save/restore callee-saved + CR3 | ✅ `context_switch.S` saves/restores regs + CR3 | None |
+| `fork()` | ✅ Slab + CoW + enqueue | ✅ `process_fork_create()` — full copy (no CoW) | CoW missing |
+| `execve()` | ✅ Load ELF, reset stack | ✅ `syscall_execve_impl()` — loads ELF, handles argv/envp | None |
+| Spinlock protection | ✅ `sched_lock` | ✅ `sched_lock` present | None |
+
+**Summary:** AdrOS scheduler is functional with all essential operations. The supplementary material suggests O(1)/MLFQ upgrades which are performance enhancements, not correctness issues.
+
+---
+
+### 1.5 Signals
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Signal bitmask (pending/blocked) | ✅ `uint32_t pending_signals` | ✅ `sig_pending_mask` + `sig_blocked_mask` | None |
+| `sigaction` | ✅ Handler array | ✅ `sigactions[PROCESS_MAX_SIG]` | None |
+| Signal trampoline | ✅ Build stack frame, redirect EIP | ✅ Full trampoline in `deliver_signals_to_usermode()` | None |
+| `sigreturn` | ✅ Restore saved context | ✅ `syscall_sigreturn_impl()` with `SIGFRAME_MAGIC` | None |
+| `SA_SIGINFO` | Mentioned | ✅ Supported (siginfo_t + ucontext_t on stack) | None |
+| Signal restorer (userspace) | ✅ `sigrestorer.S` | ✅ Kernel injects trampoline code bytes on user stack | AdrOS approach is self-contained |
+
+**Summary:** AdrOS signal implementation is **complete and robust**. This is one of the strongest subsystems — ahead of what the supplementary material suggests.
+
+---
+
+### 1.6 Virtual File System (VFS)
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Mount table | ✅ Linked list of mount points | ✅ Up to 8 mounts, longest-prefix matching | None |
+| `vfs_lookup` path resolution | ✅ Find mount + delegate to driver | ✅ Full path resolution with mount traversal | None |
+| `fs_node_t` with ops | ✅ `vfs_ops_t` function pointers | ✅ `read`/`write`/`open`/`close`/`finddir`/`readdir` | None |
+| File descriptor table | ✅ Per-process `fd_table[16]` | ✅ Per-process `files[PROCESS_MAX_FILES]` with refcount | None |
+| File cursor (offset) | ✅ `cursor` field | ✅ `offset` in `struct file` | None |
+| USTAR InitRD parser | ✅ Full implementation | ❌ Custom binary format (`mkinitrd`) | Different approach, both work |
+| LZ4 decompression | ✅ Decompress initrd.tar.lz4 | ❌ Not implemented | Enhancement |
+| `pivot_root` | ✅ `sys_pivot_root()` | ❌ Not implemented | Needed for real init flow |
+| Multiple FS types | ✅ USTAR + FAT | ✅ tmpfs + devfs + overlayfs + diskfs + persistfs | **AdrOS is ahead** |
+| `readdir` generic | Mentioned | ✅ All FS types implement `readdir` callback | None |
+
+**Summary:** AdrOS VFS is **more advanced** than the supplementary material suggests. It has 5 filesystem types, overlayfs, and generic readdir. The supplementary material's USTAR/LZ4 approach is an alternative InitRD strategy.
+
+---
+
+### 1.7 TTY / PTY
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Circular buffer for keyboard | ✅ Ring buffer + wait queue | ✅ Ring buffer in `tty.c` with blocking reads | None |
+| `tty_push_char` from IRQ | ✅ IRQ1 handler → buffer | ✅ Keyboard IRQ → `tty_input_char()` | None |
+| Canonical mode (line editing) | ✅ Buffer until Enter | ✅ Line-buffered with echo + backspace | None |
+| PTY master/slave | Not discussed | ✅ Full PTY implementation with `/dev/ptmx` + `/dev/pts/0` | **AdrOS is ahead** |
+| Job control (SIGTTIN/SIGTTOU) | Not discussed | ✅ `pty_jobctl_read_check()` / `pty_jobctl_write_check()` | **AdrOS is ahead** |
+| `poll()` support | ✅ `tty_poll()` | ✅ `pty_master_can_read()` etc. integrated with `poll` | None |
+| Raw mode | Not discussed | ❌ Not implemented | Needed for editors/games |
+
+**Summary:** AdrOS TTY/PTY is **significantly ahead** of the supplementary material. Full PTY with job control is a major achievement.
+
+---
+
+### 1.8 Spinlocks & Synchronization
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| `xchg`-based spinlock | ✅ Inline asm `xchgl` | ✅ `__sync_lock_test_and_set` (generates `xchg`) | Equivalent |
+| `pause` in spin loop | ✅ `__asm__ volatile("pause")` | ✅ Present in `spin_lock()` | None |
+| IRQ save/restore | ✅ `pushcli`/`popcli` with nesting | ✅ `irq_save()`/`irq_restore()` via `pushf`/`popf` | None |
+| `spin_lock_irqsave` | ✅ Combined lock + IRQ disable | ✅ `spin_lock_irqsave()` / `spin_unlock_irqrestore()` | None |
+| Debug name field | ✅ `char *name` for panic messages | ❌ No name field | Minor |
+| CPU ID tracking | ✅ `lock->cpu_id` for deadlock detection | ❌ Not tracked | Needed for SMP |
+| Nesting counter (`ncli`) | ✅ Per-CPU nesting | ❌ Not implemented (flat save/restore) | Needed for SMP |
+
+**Summary:** AdrOS spinlocks are correct for single-core. The supplementary material's SMP-aware features (CPU tracking, nesting) are needed only when AdrOS targets multi-core.
+
+---
+
+### 1.9 ELF Loader
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| Parse ELF headers | ✅ `Elf32_Ehdr` + `Elf32_Phdr` | ✅ Full validation + PT_LOAD processing | None |
+| Map segments with correct flags | ✅ PF_W → WRITABLE, PF_X → EXECUTABLE | ✅ Maps with `VMM_FLAG_RW`, then `vmm_protect_range()` for .text | None |
+| W^X enforcement | ✅ Policy in `vmm_map` | ✅ `.text` marked read-only after copy | Achieved differently |
+| Reject kernel-range vaddrs | Not discussed | ✅ Rejects `p_vaddr >= 0xC0000000` | **AdrOS is ahead** |
+| User stack allocation | ✅ Mentioned | ✅ Maps user stack at `0x00800000` | None |
+
+**Summary:** AdrOS ELF loader is **complete and secure** with proper validation and W^X enforcement.
+
+---
+
+### 1.10 User-Space / libc
+
+| Aspect | Supplementary Suggestion | AdrOS Current State | Gap |
+|--------|--------------------------|---------------------|-----|
+| `crt0.S` (entry point) | ✅ `_start` → `main` → `exit` | ✅ `user/crt0.S` with argc/argv setup | None |
+| Syscall stub (int 0x80) | ✅ `_syscall_invoke` via registers | ✅ `_syscall` in `user/syscall.S` | None |
+| `SYSENTER` fast path | ✅ vDSO + MSR setup | ❌ Only `int 0x80` | Enhancement |
+| libc wrappers | ✅ `syscalls.c` with errno | ❌ Raw syscall wrappers only, no errno | **Key gap** |
+| `init.c` (early userspace) | ✅ mount + pivot_root + execve | ✅ `user/init.c` — comprehensive smoke tests | Different purpose |
+| User linker script | ✅ `user.ld` at 0x08048000 | ✅ `user/user.ld` at 0x00400000 | Both valid |
+
+**Summary:** AdrOS has a working userspace with syscall stubs and a comprehensive test binary. Missing a proper libc and `SYSENTER` optimization.
+
+---
+
+### 1.11 Drivers (Not Yet in AdrOS)
+
+| Driver | Supplementary Suggestion | AdrOS Current State |
+|--------|--------------------------|---------------------|
+| PCI enumeration | ✅ Full scan (bus/dev/func) | ❌ Not implemented |
+| Intel E1000 NIC | ✅ RX/TX descriptor rings + DMA | ❌ Not implemented |
+| VBE/Framebuffer | ✅ Map LFB + MTRR write-combining | ❌ VGA text mode only |
+| Intel HDA Audio | ✅ DMA ring buffers | ❌ Not implemented |
+| lwIP TCP/IP stack | ✅ `sys_arch.c` bridge | ❌ Not implemented |
+
+---
+
+### 1.12 Advanced Features (Not Yet in AdrOS)
+
+| Feature | Supplementary Suggestion | AdrOS Current State |
+|---------|--------------------------|---------------------|
+| Copy-on-Write (CoW) fork | ✅ Full implementation with ref-counting | ❌ Full address space copy |
+| Slab allocator | ✅ `slab_cache_t` with free-list-in-place | ❌ Only `kmalloc`/`kfree` |
+| Shared memory (shmem/mmap) | ✅ `sys_shmget` / `sys_shmat` | ❌ Not implemented |
+| Zero-copy DMA I/O | ✅ Map DMA buffer into user VA | ❌ Not implemented |
+| vDSO | ✅ Kernel-mapped page with syscall code | ❌ Not implemented |
+
+---
+
+## Part 2 — POSIX Compatibility Assessment
+
+### Overall Score: **~45% toward a practical Unix-like POSIX system**
+
+This score reflects that AdrOS has the **core architectural skeleton** of a Unix system
+fully in place, but lacks several key POSIX interfaces and userland components needed
+for real-world use.
+
+### What AdrOS Already Has (Strengths)
+
+1. **Process model** — `fork`, `execve`, `waitpid`, `exit`, `getpid`, `getppid`, `setsid`, `setpgid`, `getpgrp` — all working
+2. **File I/O** — `open`, `read`, `write`, `close`, `lseek`, `stat`, `fstat`, `dup`, `dup2`, `dup3`, `pipe`, `pipe2`, `fcntl`, `getdents` — comprehensive
+3. **Signals** — `sigaction`, `sigprocmask`, `kill`, `sigreturn` with full trampoline — robust
+4. **VFS** — 5 filesystem types, mount table, path resolution, per-process cwd — excellent
+5. **TTY/PTY** — Line discipline, job control, blocking I/O, `ioctl` — very good
+6. **Select/Poll** — Working for pipes and TTY devices
+7. **Memory isolation** — Per-process address spaces, user/kernel separation, `uaccess` validation
+8. **ELF loading** — Secure loader with W^X enforcement
+9. **Spinlocks** — Correct `xchg`-based implementation with IRQ save/restore
+
+### What's Missing for Practical POSIX (Gaps by Priority)
+
+#### Tier 1 — Blocks basic usability
+| Gap | Impact | Effort |
+|-----|--------|--------|
+| **Minimal libc** (`printf`, `malloc`, `string.h`, `stdio.h`) | Can't build real userland programs | Medium |
+| **Shell** (`sh`-compatible) | No interactive use without it | Medium |
+| **Signal characters** (Ctrl+C → SIGINT, Ctrl+D → EOF) | Can't interrupt/control processes | Small |
+| **`brk`/`sbrk`** (user heap) | No `malloc` in userspace | Small-Medium |
+| **Core utilities** (`ls`, `cat`, `echo`, `mkdir`, `rm`) | No file management | Medium |
+
+#### Tier 2 — Required for POSIX compliance
+| Gap | Impact | Effort |
+|-----|--------|--------|
+| **`mmap`/`munmap`** | No memory-mapped files, no shared memory | Medium-Large |
+| **`O_CLOEXEC`** | FD leaks across `execve` | Small |
+| **Permissions** (`uid`/`gid`/mode/`chmod`/`chown`) | No multi-user security | Medium |
+| **Hard/symbolic links** | Incomplete filesystem semantics | Medium |
+| **`/proc` filesystem** | No process introspection | Medium |
+| **`nanosleep`/`clock_gettime`** | No time management | Small |
+| **Raw TTY mode** | Can't run editors or games | Small |
+
+#### Tier 3 — Full Unix experience
+| Gap | Impact | Effort |
+|-----|--------|--------|
+| **CoW fork** | Memory waste on fork-heavy workloads | Large |
+| **PAE + NX bit** | No hardware W^X enforcement | Large |
+| **Slab allocator** | Performance for frequent small allocs | Medium |
+| **Networking** (socket API + TCP/IP) | No network connectivity | Very Large |
+| **Threads** (`clone`/`pthread`) | No multi-threaded programs | Large |
+| **Dynamic linking** (`ld.so`) | Can't use shared libraries | Very Large |
+| **VBE framebuffer** | No graphical output | Medium |
+| **PCI + device drivers** | No hardware discovery | Large |
+
+---
+
+## Part 3 — Architectural Comparison Summary
+
+| Dimension | Supplementary Material | AdrOS Current | Verdict |
+|-----------|----------------------|---------------|---------|
+| **Boot flow** | GRUB → Stub (LZ4) → Kernel → USTAR InitRD | GRUB → Kernel → Custom InitRD → OverlayFS | Both valid; AdrOS is simpler |
+| **Memory architecture** | PMM + Slab + CoW + Zero-Copy DMA | PMM + Heap (linked list) | Supplementary is more advanced |
+| **Scheduler** | O(1) with bitmap + active/expired arrays | Round-robin with linked list | Supplementary is more advanced |
+| **VFS** | USTAR + FAT (planned) | tmpfs + devfs + overlayfs + diskfs + persistfs | **AdrOS is more advanced** |
+| **Syscall interface** | int 0x80 + SYSENTER + vDSO | int 0x80 only | Supplementary has more optimization |
+| **Signal handling** | Basic trampoline concept | Full SA_SIGINFO + sigreturn + sigframe | **AdrOS is more advanced** |
+| **TTY/PTY** | Basic circular buffer | Full PTY with job control | **AdrOS is more advanced** |
+| **Synchronization** | SMP-aware spinlocks with CPU tracking | Single-core spinlocks with IRQ save | Supplementary targets SMP |
+| **Userland** | libc stubs + init + shell concept | Raw syscall wrappers + test binary | Both early-stage |
+| **Drivers** | PCI + E1000 + VBE + HDA (conceptual) | UART + VGA text + PS/2 + ATA PIO | Supplementary has more scope |
+
+---
+
+## Part 4 — Recommendations
+
+### Immediate Actions (use supplementary material as inspiration)
+
+1. **Add signal characters to TTY** — Ctrl+C/Ctrl+Z/Ctrl+D handling in `tty_input_char()`. Small change, huge usability gain.
+
+2. **Implement `brk`/`sbrk` syscall** — Track a per-process heap break pointer. Essential for userland `malloc`.
+
+3. **Build minimal libc** — Start with `write`-based `printf`, `brk`-based `malloc`, `string.h`. The supplementary `syscalls.c.txt` and `unistd.c.txt` show the pattern.
+
+4. **Build a shell** — All required syscalls (`fork`+`execve`+`waitpid`+`pipe`+`dup2`+`chdir`) are already implemented.
+
+### Medium-Term (architectural improvements from supplementary material)
+
+5. **PMM ref-counting** — Add `uint16_t` ref-count array alongside bitmap. Prerequisite for CoW.
+
+6. **CoW fork** — Use PTE bit 9 as CoW marker, handle in page fault. The supplementary material's `vmm_copy_for_fork()` pattern is clean.
+
+7. **W^X policy function** — Add `vmm_apply_wx_policy()` as a centralized check. Currently AdrOS achieves W^X ad-hoc in the ELF loader.
+
+8. **`mmap`/`munmap`** — Requires `vmm_find_free_area()` from supplementary material. Critical for POSIX.
+
+### Long-Term (from supplementary material roadmap)
+
+9. **CPUID + PAE + NX** — Follow the `cpu_get_features()` / `cpu_enable_nx()` pattern for hardware W^X.
+
+10. **O(1) scheduler** — The active/expired bitmap swap pattern is elegant and well-suited for AdrOS.
+
+11. **Slab allocator** — The supplementary material's free-list-in-place design is simple and effective.
+
+12. **PCI + networking** — Follow the PCI scan → BAR mapping → E1000 DMA ring → lwIP bridge pattern.
+
+---
+
+## Conclusion
+
+AdrOS is a **well-architected hobby OS** that has already implemented many of the hardest
+parts of a Unix-like system: process management with signals, a multi-filesystem VFS,
+PTY with job control, and a secure ELF loader. It is approximately **45% of the way**
+to a practical POSIX-compatible system.
+
+The supplementary material provides excellent **architectural blueprints** for the next
+evolution: CoW memory, O(1) scheduling, hardware NX, and networking. However, AdrOS is
+already **ahead** of the supplementary material in several areas (VFS diversity, signal
+handling, PTY/job control).
+
+The most impactful next steps are **not** the advanced features from the supplementary
+material, but rather the **userland enablers**: a minimal libc, a shell, and `brk`/`sbrk`.
+These would transform AdrOS from a kernel with smoke tests into an interactive Unix system.
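Editor's sketch (not part of the commit): recommendation 11 above refers to a "free-list-in-place" slab design, and `include/slab.h` later in this diff declares a `slab_cache_t` with `free_list`, `total_allocs`, and `total_frees` fields. The block below is a minimal userspace illustration of the in-place free list, assuming a single fixed 4 KiB backing buffer, no locking, and no auto-grow. The `toy_slab_*` names are hypothetical and do not reproduce the AdrOS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLAB_BYTES 4096 /* one page-sized slab, assumed for the sketch */
#define TOY_SLAB_WORDS (TOY_SLAB_BYTES / sizeof(void *))

struct toy_slab_cache {
    size_t obj_size;       /* rounded up to a multiple of sizeof(void *) */
    void *free_list;       /* head of the list threaded through free objects */
    uint32_t total_allocs;
    uint32_t total_frees;
    /* Backing storage; a kernel would obtain this from its page allocator. */
    void *storage[TOY_SLAB_WORDS];
};

/* Carve the slab into objects and link them through their first word. */
static void toy_slab_init(struct toy_slab_cache *c, size_t obj_size) {
    assert(obj_size > 0 && obj_size <= TOY_SLAB_BYTES);
    size_t align = sizeof(void *);
    c->obj_size = (obj_size + align - 1) / align * align;
    c->free_list = NULL;
    c->total_allocs = 0;
    c->total_frees = 0;

    unsigned char *base = (unsigned char *)c->storage;
    size_t count = TOY_SLAB_BYTES / c->obj_size;
    /* Push from last to first so allocations come out in address order. */
    for (size_t i = count; i-- > 0; ) {
        void *obj = base + i * c->obj_size;
        *(void **)obj = c->free_list;
        c->free_list = obj;
    }
}

/* Pop the head of the free list; NULL when the slab is exhausted. */
static void *toy_slab_alloc(struct toy_slab_cache *c) {
    void *obj = c->free_list;
    if (!obj) return NULL; /* a real cache would grow by a new slab here */
    c->free_list = *(void **)obj;
    c->total_allocs++;
    return obj;
}

/* Push the object back; its first word becomes the next pointer again. */
static void toy_slab_free(struct toy_slab_cache *c, void *obj) {
    if (!obj) return;
    *(void **)obj = c->free_list;
    c->free_list = obj;
    c->total_frees++;
}
```

The point of the design is that free objects store the list link in their own memory, so the cache needs no separate bookkeeping array. The trade-off in this sketch is that a freed object's first word is overwritten, and a double free would corrupt the list without detection.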
diff --git a/include/arch/x86/multiboot2.h b/include/arch/x86/multiboot2.h
index c336290..8377274 100644
--- a/include/arch/x86/multiboot2.h
+++ b/include/arch/x86/multiboot2.h
@@ -71,6 +71,18 @@ struct multiboot_tag_module {
     char string[0];
 };
 
+struct multiboot_tag_framebuffer {
+    uint32_t type;
+    uint32_t size;
+    uint64_t framebuffer_addr;
+    uint32_t framebuffer_pitch;
+    uint32_t framebuffer_width;
+    uint32_t framebuffer_height;
+    uint8_t framebuffer_bpp;
+    uint8_t framebuffer_type;
+    uint8_t reserved;
+};
+
 #define MULTIBOOT_MEMORY_AVAILABLE 1
 #define MULTIBOOT_MEMORY_RESERVED 2
 #define MULTIBOOT_MEMORY_ACPI_RECLAIMABLE 3
diff --git a/include/elf.h b/include/elf.h
index b811b08..07f0b92 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -51,6 +51,6 @@ typedef struct {
 #define PF_W 0x2
 #define PF_R 0x4
 
-int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out);
+int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out, uintptr_t* heap_break_out);
 
 #endif
diff --git a/include/kernel/boot_info.h b/include/kernel/boot_info.h
index af08602..5696d6b 100644
--- a/include/kernel/boot_info.h
+++ b/include/kernel/boot_info.h
@@ -12,6 +12,12 @@ struct boot_info {
     uintptr_t initrd_end;
 
     const char* cmdline;
+
+    uintptr_t fb_addr;
+    uint32_t fb_pitch;
+    uint32_t fb_width;
+    uint32_t fb_height;
+    uint8_t fb_bpp;
 };
 
 #endif
diff --git a/include/pci.h b/include/pci.h
new file mode 100644
index 0000000..bc0f79e
--- /dev/null
+++ b/include/pci.h
@@ -0,0 +1,31 @@
+#ifndef PCI_H
+#define PCI_H
+
+#include
+
+struct pci_device {
+    uint8_t bus;
+    uint8_t slot;
+    uint8_t func;
+    uint16_t vendor_id;
+    uint16_t device_id;
+    uint8_t class_code;
+    uint8_t subclass;
+    uint8_t prog_if;
+    uint8_t header_type;
+    uint32_t bar[6];
+    uint8_t irq_line;
+};
+
+#define PCI_MAX_DEVICES 32
+
+uint32_t pci_config_read(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset);
+void pci_config_write(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset, uint32_t value);
+
+void pci_init(void);
+int pci_get_device_count(void);
+const struct pci_device* pci_get_device(int index);
+const struct pci_device* pci_find_device(uint16_t vendor, uint16_t device);
+const struct pci_device* pci_find_class(uint8_t class_code, uint8_t subclass);
+
+#endif
diff --git a/include/pmm.h b/include/pmm.h
index 83ba57d..90b8774 100644
--- a/include/pmm.h
+++ b/include/pmm.h
@@ -12,9 +12,14 @@ void pmm_init(void* boot_info);
 // Allocate a single physical page
 void* pmm_alloc_page(void);
 
-// Free a physical page
+// Free a physical page (decrements refcount, frees at 0)
 void pmm_free_page(void* ptr);
 
+// Reference counting for Copy-on-Write
+void pmm_incref(uintptr_t paddr);
+uint16_t pmm_decref(uintptr_t paddr);
+uint16_t pmm_get_refcount(uintptr_t paddr);
+
 // Helper to print memory stats
 void pmm_print_stats(void);
 
diff --git a/include/process.h b/include/process.h
index f6d8813..ffadd94 100644
--- a/include/process.h
+++ b/include/process.h
@@ -33,8 +33,13 @@ struct process {
     uintptr_t sp;
     uintptr_t addr_space;
     uint32_t* kernel_stack;
+#define SCHED_NUM_PRIOS 32
+#define SCHED_DEFAULT_PRIO 16
+
+    uint8_t priority; // 0 = highest, 31 = lowest
+    int8_t nice; // -20 to +19 (maps to priority)
     process_state_t state;
-    uint32_t wake_at_tick; // New: When to wake up (global tick count)
+    uint32_t wake_at_tick;
     int exit_status;
     int has_user_regs;
 
@@ -51,6 +56,15 @@ struct process {
     // For SIGSEGV: last page fault address (CR2) captured in ring3.
     uintptr_t last_fault_addr;
 
+#define PROCESS_MAX_MMAPS 32
+    struct {
+        uintptr_t base;
+        uint32_t length;
+    } mmaps[PROCESS_MAX_MMAPS];
+
+    uintptr_t heap_start;
+    uintptr_t heap_break;
+
     char cwd[128];
 
     int waiting;
@@ -58,8 +72,12 @@ struct process {
     int wait_result_pid;
     int wait_result_status;
     struct file* files[PROCESS_MAX_FILES];
+    uint8_t fd_flags[PROCESS_MAX_FILES];
     struct process* next;
-    struct process* prev; // Doubly linked list helps here too! (Optional but good)
+    struct process* prev;
+
+    struct process* rq_next; // O(1) runqueue per-priority list
+    struct process* rq_prev;
 };
 
 // Global pointer to the currently running process
@@ -94,6 +112,9 @@ void process_exit_notify(int status);
 // Kill a process (minimal signals). Returns 0 on success or -errno.
 int process_kill(uint32_t pid, int sig);
 
+// Send a signal to all processes in a process group.
+int process_kill_pgrp(uint32_t pgrp, int sig);
+
 // Create a child process that will resume in usermode from a saved register frame.
 struct process* process_fork_create(uintptr_t child_as, const struct registers* child_regs);
 
diff --git a/include/procfs.h b/include/procfs.h
new file mode 100644
index 0000000..ae66137
--- /dev/null
+++ b/include/procfs.h
@@ -0,0 +1,8 @@
+#ifndef PROCFS_H
+#define PROCFS_H
+
+#include "fs.h"
+
+fs_node_t* procfs_create_root(void);
+
+#endif
diff --git a/include/slab.h b/include/slab.h
new file mode 100644
index 0000000..466277f
--- /dev/null
+++ b/include/slab.h
@@ -0,0 +1,22 @@
+#ifndef SLAB_H
+#define SLAB_H
+
+#include
+#include
+#include "spinlock.h"
+
+typedef struct slab_cache {
+    const char* name;
+    uint32_t obj_size;
+    uint32_t objs_per_slab;
+    void* free_list;
+    uint32_t total_allocs;
+    uint32_t total_frees;
+    spinlock_t lock;
+} slab_cache_t;
+
+void slab_cache_init(slab_cache_t* cache, const char* name, uint32_t obj_size);
+void* slab_alloc(slab_cache_t* cache);
+void slab_free(slab_cache_t* cache, void* obj);
+
+#endif
diff --git a/include/syscall.h b/include/syscall.h
index 9e3bc7b..2814745 100644
--- a/include/syscall.h
+++ b/include/syscall.h
@@ -56,6 +56,12 @@ enum {
 
     SYSCALL_RENAME = 39,
     SYSCALL_RMDIR = 40,
+
+    SYSCALL_BRK = 41,
+    SYSCALL_NANOSLEEP = 42,
+    SYSCALL_CLOCK_GETTIME = 43,
+    SYSCALL_MMAP = 44,
+    SYSCALL_MUNMAP = 45,
 };
 
 #endif
diff --git a/include/tty.h b/include/tty.h
index bf95310..0267b7e 100644
--- a/include/tty.h
+++ b/include/tty.h
@@ -5,12 +5,23 @@
 #include
 
 struct termios {
+    uint32_t c_iflag;
+    uint32_t c_oflag;
+    uint32_t c_cflag;
     uint32_t c_lflag;
 };
 
+struct winsize {
+    uint16_t ws_row;
+    uint16_t ws_col;
+    uint16_t ws_xpixel;
+    uint16_t ws_ypixel;
+};
+
 enum {
-    TTY_ICANON = 0x0001,
-    TTY_ECHO = 0x0002,
+    TTY_ICANON = 0x0002,
+    TTY_ECHO = 0x0008,
+    TTY_ISIG = 0x0001,
 };
 
 void tty_init(void);
diff --git a/include/vbe.h b/include/vbe.h
new file mode 100644
index 0000000..15c1848
--- /dev/null
+++ b/include/vbe.h
@@ -0,0 +1,25 @@
+#ifndef VBE_H
+#define VBE_H
+
+#include
+#include "kernel/boot_info.h"
+
+struct vbe_info {
+    uintptr_t phys_addr;
+    volatile uint8_t* virt_addr;
+    uint32_t pitch;
+    uint32_t width;
+    uint32_t height;
+    uint8_t bpp;
+    uint32_t size;
+};
+
+int vbe_init(const struct boot_info* bi);
+int vbe_available(void);
+const struct vbe_info* vbe_get_info(void);
+
+void vbe_put_pixel(uint32_t x, uint32_t y, uint32_t color);
+void vbe_fill_rect(uint32_t x, uint32_t y, uint32_t w, uint32_t h, uint32_t color);
+void vbe_clear(uint32_t color);
+
+#endif
diff --git a/include/vmm.h b/include/vmm.h
index 644f58d..3ef6560 100644
--- a/include/vmm.h
+++ b/include/vmm.h
@@ -7,6 +7,7 @@
 #define VMM_FLAG_PRESENT (1 << 0)
 #define VMM_FLAG_RW (1 << 1)
 #define VMM_FLAG_USER (1 << 2)
+#define VMM_FLAG_COW (1 << 9) /* OS-available bit: Copy-on-Write marker */
 
 /*
  * Initialize Virtual Memory Manager
@@ -29,6 +30,18 @@ void vmm_as_map_page(uintptr_t as, uint64_t phys, uint64_t virt, uint32_t flags)
 
 uintptr_t vmm_as_clone_user(uintptr_t src_as);
 
+/*
+ * Clone user address space using Copy-on-Write.
+ * Shared pages are marked read-only + COW bit; physical frames get incref'd.
+ */
+uintptr_t vmm_as_clone_user_cow(uintptr_t src_as);
+
+/*
+ * Handle a Copy-on-Write page fault.
+ * Returns 1 if the fault was a CoW fault and was resolved, 0 otherwise.
+ */
+int vmm_handle_cow_fault(uintptr_t fault_addr);
+
 /*
  * Update flags for an already-mapped virtual page.
  * Keeps the physical frame, only changes PRESENT/RW/USER bits.
diff --git a/src/arch/x86/arch_early_setup.c b/src/arch/x86/arch_early_setup.c
index f18dfe4..5760bbd 100644
--- a/src/arch/x86/arch_early_setup.c
+++ b/src/arch/x86/arch_early_setup.c
@@ -38,6 +38,11 @@ static uint32_t multiboot_copy_size;
     bi.initrd_start = 0;
     bi.initrd_end = 0;
     bi.cmdline = NULL;
+    bi.fb_addr = 0;
+    bi.fb_pitch = 0;
+    bi.fb_width = 0;
+    bi.fb_height = 0;
+    bi.fb_bpp = 0;
 
     if (mbi_phys) {
         uint32_t total_size = *(volatile uint32_t*)mbi_phys;
@@ -70,6 +75,14 @@ static uint32_t multiboot_copy_size;
                 const struct multiboot_tag_string* s = (const struct multiboot_tag_string*)tag;
                 bi.cmdline = s->string;
             }
+            if (tag->type == MULTIBOOT_TAG_TYPE_FRAMEBUFFER) {
+                const struct multiboot_tag_framebuffer* fb = (const struct multiboot_tag_framebuffer*)tag;
+                bi.fb_addr = (uintptr_t)fb->framebuffer_addr;
+                bi.fb_pitch = fb->framebuffer_pitch;
+                bi.fb_width = fb->framebuffer_width;
+                bi.fb_height = fb->framebuffer_height;
+                bi.fb_bpp = fb->framebuffer_bpp;
+            }
         }
     }
 
diff --git a/src/arch/x86/arch_platform.c b/src/arch/x86/arch_platform.c
index 5d8580a..ad876f6 100644
--- a/src/arch/x86/arch_platform.c
+++ b/src/arch/x86/arch_platform.c
@@ -36,13 +36,16 @@ static void userspace_init_thread(void) {
     uintptr_t entry = 0;
     uintptr_t user_sp = 0;
     uintptr_t user_as = 0;
-    if (elf32_load_user_from_initrd("/bin/init.elf", &entry, &user_sp, &user_as) != 0) {
+    uintptr_t heap_brk = 0;
+    if (elf32_load_user_from_initrd("/bin/init.elf", &entry, &user_sp, &user_as, &heap_brk) != 0) {
         process_exit_notify(1);
         schedule();
         for (;;) hal_cpu_idle();
     }
 
     current_process->addr_space = user_as;
+    current_process->heap_start = heap_brk;
+    current_process->heap_break = heap_brk;
     vmm_as_activate(user_as);
 
     uart_print("[ELF] starting /bin/init.elf\n");
diff --git a/src/arch/x86/idt.c b/src/arch/x86/idt.c
index a926003..4e39b4b 100644
--- a/src/arch/x86/idt.c
+++ b/src/arch/x86/idt.c
@@ -4,6 +4,7 @@
 #include "process.h"
 #include "spinlock.h"
 #include "uaccess.h"
+#include "vmm.h"
 #include "syscall.h"
 #include "signal.h"
 #include
@@ -328,6 +329,12 @@ void isr_handler(struct registers* regs) {
         __asm__ volatile("mov %%cr2, %0" : "=r"(cr2));
 
         if ((regs->cs & 3U) == 3U) {
+            // Check for Copy-on-Write fault (write to read-only CoW page).
+            // Error code bit 1 = caused by a write.
+            if ((regs->err_code & 0x2) && vmm_handle_cow_fault((uintptr_t)cr2)) {
+                return; // CoW resolved, resume user process.
+            }
+
             const int SIG_SEGV = 11;
             if (current_process) {
                 current_process->last_fault_addr = (uintptr_t)cr2;
diff --git a/src/arch/x86/vmm.c b/src/arch/x86/vmm.c
index 3491a70..9ea6d35 100644
--- a/src/arch/x86/vmm.c
+++ b/src/arch/x86/vmm.c
@@ -18,6 +18,7 @@
 #define X86_PTE_PRESENT 0x1
 #define X86_PTE_RW 0x2
 #define X86_PTE_USER 0x4
+#define X86_PTE_COW 0x200 /* Bit 9: OS-available, marks Copy-on-Write */
 
 /* Defined in boot.S (Physical address loaded in CR3, but accessed via virt alias) */
 /* Wait, boot_pd is in BSS. Linker put it at 0xC0xxxxxx.
@@ -35,6 +36,7 @@ static uint32_t vmm_flags_to_x86(uint32_t flags) {
     if (flags & VMM_FLAG_PRESENT) x86_flags |= X86_PTE_PRESENT;
     if (flags & VMM_FLAG_RW) x86_flags |= X86_PTE_RW;
     if (flags & VMM_FLAG_USER) x86_flags |= X86_PTE_USER;
+    if (flags & VMM_FLAG_COW) x86_flags |= X86_PTE_COW;
     return x86_flags;
 }
 
@@ -281,6 +283,102 @@ void vmm_unmap_page(uint64_t virt) {
     invlpg((uintptr_t)virt);
 }
 
+uintptr_t vmm_as_clone_user_cow(uintptr_t src_as) {
+    if (!src_as) return 0;
+
+    uintptr_t new_as = vmm_as_create_kernel_clone();
+    if (!new_as) return 0;
+
+    uintptr_t old_as = hal_cpu_get_address_space();
+    vmm_as_activate(src_as);
+    volatile uint32_t* src_pd = x86_pd_recursive();
+
+    for (uint32_t pdi = 0; pdi < 768; pdi++) {
+        uint32_t pde = (uint32_t)src_pd[pdi];
+        if ((pde & X86_PTE_PRESENT) == 0) continue;
+
+        volatile uint32_t* src_pt = x86_pt_recursive(pdi);
+
+        for (uint32_t pti = 0; pti < 1024; pti++) {
+            uint32_t pte = (uint32_t)src_pt[pti];
+            if (!(pte & X86_PTE_PRESENT)) continue;
+            if ((pte & X86_PTE_USER) == 0) continue;
+
+            uint32_t frame_phys = pte & 0xFFFFF000;
+            uintptr_t va = ((uintptr_t)pdi << 22) | ((uintptr_t)pti << 12);
+
+            // Mark source page as read-only + CoW if it was writable.
+            uint32_t new_pte = frame_phys | X86_PTE_PRESENT | X86_PTE_USER;
+            if (pte & X86_PTE_RW) {
+                new_pte |= X86_PTE_COW; // Was writable -> CoW
+                // Remove RW from source
+                src_pt[pti] = new_pte;
+                invlpg(va);
+            } else {
+                new_pte = pte; // Keep as-is (read-only text, etc.)
+            }
+
+            // Increment physical frame refcount
+            pmm_incref((uintptr_t)frame_phys);
+
+            // Map same frame into child with same flags
+            vmm_as_map_page(new_as, (uint64_t)frame_phys, (uint64_t)va,
+                            VMM_FLAG_PRESENT | VMM_FLAG_USER |
+                            ((new_pte & X86_PTE_COW) ? VMM_FLAG_COW : 0));
+        }
+    }
+
+    vmm_as_activate(old_as);
+    return new_as;
+}
+
+int vmm_handle_cow_fault(uintptr_t fault_addr) {
+    uintptr_t va = fault_addr & ~(uintptr_t)0xFFF;
+    uint32_t pdi = va >> 22;
+    uint32_t pti = (va >> 12) & 0x3FF;
+
+    if (pdi >= 768) return 0; // Kernel space, not CoW
+
+    volatile uint32_t* pd = x86_pd_recursive();
+    if ((pd[pdi] & X86_PTE_PRESENT) == 0) return 0;
+
+    volatile uint32_t* pt = x86_pt_recursive(pdi);
+    uint32_t pte = pt[pti];
+
+    if (!(pte & X86_PTE_PRESENT)) return 0;
+    if (!(pte & X86_PTE_COW)) return 0;
+
+    uint32_t old_frame = pte & 0xFFFFF000;
+    uint16_t rc = pmm_get_refcount((uintptr_t)old_frame);
+
+    if (rc <= 1) {
+        // We're the sole owner — just make it writable and clear CoW.
+        pt[pti] = old_frame | X86_PTE_PRESENT | X86_PTE_RW | X86_PTE_USER;
+        invlpg(va);
+        return 1;
+    }
+
+    // Allocate a new frame and copy the page contents.
+    void* new_frame = pmm_alloc_page();
+    if (!new_frame) return 0; // OOM — caller will SIGSEGV
+
+    // Use a temporary kernel VA to copy data.
+    const uintptr_t TMP_COW_VA = 0xBFFFD000U;
+    vmm_map_page((uint64_t)(uintptr_t)new_frame, (uint64_t)TMP_COW_VA,
+                 VMM_FLAG_PRESENT | VMM_FLAG_RW);
+    memcpy((void*)TMP_COW_VA, (const void*)va, 4096);
+    vmm_unmap_page((uint64_t)TMP_COW_VA);
+
+    // Decrement old frame refcount.
+    pmm_decref((uintptr_t)old_frame);
+
+    // Map new frame as writable (no CoW).
+    pt[pti] = (uint32_t)(uintptr_t)new_frame | X86_PTE_PRESENT | X86_PTE_RW | X86_PTE_USER;
+    invlpg(va);
+
+    return 1;
+}
+
 void vmm_init(void) {
     uart_print("[VMM] Higher Half Kernel Active.\n");
diff --git a/src/drivers/pci.c b/src/drivers/pci.c
new file mode 100644
index 0000000..c5dcd0e
--- /dev/null
+++ b/src/drivers/pci.c
@@ -0,0 +1,146 @@
+#include "pci.h"
+#include "io.h"
+#include "uart_console.h"
+#include "utils.h"
+
+#define PCI_CONFIG_ADDR 0xCF8
+#define PCI_CONFIG_DATA 0xCFC
+
+static struct pci_device pci_devices[PCI_MAX_DEVICES];
+static int pci_device_count = 0;
+
+uint32_t pci_config_read(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset) {
+    uint32_t address = (1U << 31)
+                     | ((uint32_t)bus << 16)
+                     | ((uint32_t)(slot & 0x1F) << 11)
+                     | ((uint32_t)(func & 0x07) << 8)
+                     | ((uint32_t)offset & 0xFC);
+    outl(PCI_CONFIG_ADDR, address);
+    return inl(PCI_CONFIG_DATA);
+}
+
+void pci_config_write(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset, uint32_t value) {
+    uint32_t address = (1U << 31)
+                     | ((uint32_t)bus << 16)
+                     | ((uint32_t)(slot & 0x1F) << 11)
+                     | ((uint32_t)(func & 0x07) << 8)
+                     | ((uint32_t)offset & 0xFC);
+    outl(PCI_CONFIG_ADDR, address);
+    outl(PCI_CONFIG_DATA, value);
+}
+
+static void pci_scan_func(uint8_t bus, uint8_t slot, uint8_t func) {
+    uint32_t reg0 = pci_config_read(bus, slot, func, 0x00);
+    uint16_t vendor = (uint16_t)(reg0 & 0xFFFF);
+    uint16_t device = (uint16_t)(reg0 >> 16);
+
+    if (vendor == 0xFFFF) return;
+    if (pci_device_count >= PCI_MAX_DEVICES) return;
+
+    struct pci_device* d = &pci_devices[pci_device_count];
+    d->bus = bus;
+    d->slot = slot;
+    d->func = func;
+    d->vendor_id = vendor;
+    d->device_id = device;
+
+    uint32_t reg2 = pci_config_read(bus, slot, func, 0x08);
+    d->class_code = (uint8_t)(reg2 >> 24);
+    d->subclass = (uint8_t)(reg2 >> 16);
+    d->prog_if = (uint8_t)(reg2 >> 8);
+
+    uint32_t reg3 = pci_config_read(bus, slot, func, 0x0C);
+    d->header_type = (uint8_t)(reg3 >> 16);
+
+    for (int i = 0; i < 6; i++) {
+        d->bar[i] = pci_config_read(bus, slot, func, (uint8_t)(0x10 + i * 4));
+    }
+
+    uint32_t reg_irq = pci_config_read(bus, slot, func, 0x3C);
+    d->irq_line = (uint8_t)(reg_irq & 0xFF);
+
+    pci_device_count++;
+}
+
+static void pci_scan_slot(uint8_t bus, uint8_t slot) {
+    uint32_t reg0 = pci_config_read(bus, slot, 0, 0x00);
+    if ((reg0 & 0xFFFF) == 0xFFFF) return;
+
+    pci_scan_func(bus, slot, 0);
+
+    uint32_t reg3 = pci_config_read(bus, slot, 0, 0x0C);
+    uint8_t header_type = (uint8_t)(reg3 >> 16);
+    if (header_type & 0x80) {
+        for (uint8_t func = 1; func < 8; func++) {
+            pci_scan_func(bus, slot, func);
+        }
+    }
+}
+
+static void pci_scan_bus(uint8_t bus) {
+    for (uint8_t slot = 0; slot < 32; slot++) {
+        pci_scan_slot(bus, slot);
+    }
+}
+
+void pci_init(void) {
+    pci_device_count = 0;
+
+    uint32_t reg3 = pci_config_read(0, 0, 0, 0x0C);
+    uint8_t header_type = (uint8_t)(reg3 >> 16);
+
+    if (header_type & 0x80) {
+        for (uint8_t func = 0; func < 8; func++) {
+            uint32_t r = pci_config_read(0, 0, func, 0x00);
+            if ((r & 0xFFFF) == 0xFFFF) continue;
+            pci_scan_bus(func);
+        }
+    } else {
+        pci_scan_bus(0);
+    }
+
+    uart_print("[PCI] Enumerated ");
+    char buf[8];
+    itoa(pci_device_count, buf, 10);
+    uart_print(buf);
+    uart_print(" device(s)\n");
+
+    for (int i = 0; i < pci_device_count; i++) {
+        struct pci_device* d = &pci_devices[i];
+        uart_print(" ");
+        char hex[12];
+        itoa_hex(d->vendor_id, hex); uart_print(hex);
+        uart_print(":");
+        itoa_hex(d->device_id, hex); uart_print(hex);
+        uart_print(" class=");
+        itoa_hex(d->class_code, hex); uart_print(hex);
+        uart_print(":");
+        itoa_hex(d->subclass, hex); uart_print(hex);
+        uart_print("\n");
+    }
+}
+
+int pci_get_device_count(void) {
+    return pci_device_count;
+}
+
+const struct pci_device* pci_get_device(int index) {
+    if (index < 0 || index >= pci_device_count) return 0;
+    return &pci_devices[index];
+}
+
+const struct pci_device* pci_find_device(uint16_t vendor, uint16_t device) {
+    for (int i = 0; i < pci_device_count; i++) {
+        if (pci_devices[i].vendor_id == vendor && pci_devices[i].device_id == device)
+            return &pci_devices[i];
+    }
+    return 0;
+}
+
+const struct pci_device* pci_find_class(uint8_t class_code, uint8_t subclass) {
+    for (int i = 0; i < pci_device_count; i++) {
+        if (pci_devices[i].class_code == class_code && pci_devices[i].subclass == subclass)
+            return &pci_devices[i];
+    }
+    return 0;
+}
diff --git a/src/drivers/vbe.c b/src/drivers/vbe.c
new file mode 100644
index 0000000..a29a6bc
--- /dev/null
+++ b/src/drivers/vbe.c
@@ -0,0 +1,114 @@
+#include "vbe.h"
+#include "vmm.h"
+#include "uart_console.h"
+#include "utils.h"
+
+#include
+
+static struct vbe_info g_vbe;
+static int g_vbe_ready = 0;
+
+int vbe_init(const struct boot_info* bi) {
+    if (!bi || bi->fb_addr == 0 || bi->fb_width == 0 || bi->fb_height == 0 || bi->fb_bpp == 0) {
+        uart_print("[VBE] No framebuffer provided by bootloader.\n");
+        return -1;
+    }
+
+    g_vbe.phys_addr = bi->fb_addr;
+    g_vbe.pitch = bi->fb_pitch;
+    g_vbe.width = bi->fb_width;
+    g_vbe.height = bi->fb_height;
+    g_vbe.bpp = bi->fb_bpp;
+    g_vbe.size = g_vbe.pitch * g_vbe.height;
+
+    uint32_t pages = (g_vbe.size + 0xFFF) >> 12;
+    uintptr_t virt_base = 0xD0000000U;
+
+    for (uint32_t i = 0; i < pages; i++) {
+        vmm_map_page((uint64_t)(g_vbe.phys_addr + i * 0x1000),
+                     (uint64_t)(virt_base + i * 0x1000),
+                     VMM_FLAG_PRESENT | VMM_FLAG_RW);
+    }
+
+    g_vbe.virt_addr = (volatile uint8_t*)virt_base;
+    g_vbe_ready = 1;
+
+    uart_print("[VBE] Framebuffer ");
+    char buf[16];
+    itoa(g_vbe.width, buf, 10); uart_print(buf);
+    uart_print("x");
+    itoa(g_vbe.height, buf, 10); uart_print(buf);
+    uart_print("x");
+    itoa(g_vbe.bpp, buf, 10); uart_print(buf);
+    uart_print(" @ 0x");
+    itoa_hex(g_vbe.phys_addr, buf); uart_print(buf);
+    uart_print(" mapped to 0x");
+    itoa_hex(virt_base, buf); uart_print(buf);
+    uart_print("\n");
+
+    return 0;
+}
+
+int vbe_available(void) {
+    return g_vbe_ready;
+}
+
+const struct vbe_info* vbe_get_info(void) {
+    if (!g_vbe_ready) return NULL;
+    return &g_vbe;
+}
+
+void vbe_put_pixel(uint32_t x, uint32_t y, uint32_t color) {
+    if (!g_vbe_ready) return;
+    if (x >= g_vbe.width || y >= g_vbe.height) return;
+
+    uint32_t offset = y * g_vbe.pitch + x * (g_vbe.bpp / 8);
+    volatile uint8_t* pixel = g_vbe.virt_addr + offset;
+
+    if (g_vbe.bpp == 32) {
+        *(volatile uint32_t*)pixel = color;
+    } else if (g_vbe.bpp == 24) {
+        pixel[0] = (uint8_t)(color & 0xFF);
+        pixel[1] = (uint8_t)((color >> 8) & 0xFF);
+        pixel[2] = (uint8_t)((color >> 16) & 0xFF);
+    } else if (g_vbe.bpp == 16) {
+        *(volatile uint16_t*)pixel = (uint16_t)color;
+    }
+}
+
+void vbe_fill_rect(uint32_t x, uint32_t y, uint32_t w, uint32_t h, uint32_t color) {
+    if (!g_vbe_ready) return;
+
+    uint32_t x_end = x + w;
+    uint32_t y_end = y + h;
+    if (x_end > g_vbe.width) x_end = g_vbe.width;
+    if (y_end > g_vbe.height) y_end = g_vbe.height;
+
+    uint32_t bytes_pp = g_vbe.bpp / 8;
+
+    for (uint32_t row = y; row < y_end; row++) {
+        volatile uint8_t* row_ptr = g_vbe.virt_addr + row * g_vbe.pitch + x * bytes_pp;
+        if (g_vbe.bpp == 32) {
+            volatile uint32_t* p = (volatile uint32_t*)row_ptr;
+            for (uint32_t col = x; col < x_end; col++) {
+                *p++ = color;
+            }
+        } else {
+            for (uint32_t col = x; col < x_end; col++) {
+                uint32_t off = (col - x) * bytes_pp;
+                if (bytes_pp == 3) {
+                    row_ptr[off] = (uint8_t)(color & 0xFF);
+                    row_ptr[off + 1] = (uint8_t)((color >> 8) & 0xFF);
+                    row_ptr[off + 2] = (uint8_t)((color >> 16) & 0xFF);
+                } else if (bytes_pp == 2) {
+                    *(volatile uint16_t*)(row_ptr + off) = (uint16_t)color;
+                }
+            }
+        }
+    }
+}
+
+void vbe_clear(uint32_t color) {
+    if (!g_vbe_ready) return;
+    vbe_fill_rect(0, 0, g_vbe.width, g_vbe.height, color);
+}
diff --git a/src/kernel/elf.c b/src/kernel/elf.c
index fb082a4..f76c005 100644
--- a/src/kernel/elf.c
+++ b/src/kernel/elf.c
@@ -103,7 +103,7 @@ static int elf32_map_user_range(uintptr_t as, uintptr_t vaddr, size_t len, uint3
     return 0;
 }
-int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out) {
+int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out, uintptr_t* heap_break_out) {
     if (!filename || !entry_out || !user_stack_top_out || !addr_space_out) return -EFAULT;
     if (!fs_root) return -EINVAL;
@@ -155,6 +155,7 @@ int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uint
     }
     const elf32_phdr_t* ph = (const elf32_phdr_t*)(file + eh->e_phoff);
+    uintptr_t highest_seg_end = 0;
     for (uint16_t i = 0; i < eh->e_phnum; i++) {
         if (ph[i].p_type != PT_LOAD) continue;
@@ -215,6 +216,10 @@ int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uint
         if (ph[i].p_memsz > ph[i].p_filesz) {
             memset((void*)(uintptr_t)(ph[i].p_vaddr + ph[i].p_filesz), 0, ph[i].p_memsz - ph[i].p_filesz);
         }
+
+        if (seg_end > highest_seg_end) {
+            highest_seg_end = seg_end;
+        }
     }
     const uintptr_t user_stack_base = 0x00800000U;
@@ -232,17 +237,21 @@ int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uint
     *entry_out = (uintptr_t)eh->e_entry;
     *user_stack_top_out = user_stack_base + user_stack_size;
     *addr_space_out = new_as;
+    if (heap_break_out) {
+        *heap_break_out = (highest_seg_end + 0xFFFU) & ~(uintptr_t)0xFFFU;
+    }
     kfree(file);
     vmm_as_activate(old_as);
     return 0;
 }
 #else
-int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out) {
+int elf32_load_user_from_initrd(const char* filename, uintptr_t* entry_out, uintptr_t* user_stack_top_out, uintptr_t* addr_space_out, uintptr_t* heap_break_out) {
     (void)filename;
     (void)entry_out;
     (void)user_stack_top_out;
     (void)addr_space_out;
+    (void)heap_break_out;
     return -1;
 }
 #endif
diff --git a/src/kernel/init.c b/src/kernel/init.c
index e6fcf94..97ef659 100644
--- a/src/kernel/init.c
+++ b/src/kernel/init.c
@@ -11,6 +11,9 @@
 #include "pty.h"
 #include "persistfs.h"
 #include "diskfs.h"
+#include "procfs.h"
+#include "pci.h"
+#include "vbe.h"
 #include "uart_console.h"
 #include "hal/mm.h"
@@ -65,6 +68,9 @@ int init_start(const struct boot_info* bi) {
         (void)vfs_mount("/tmp", tmp);
     }
+    pci_init();
+    vbe_init(bi);
+
     tty_init();
     pty_init();
@@ -83,6 +89,11 @@ int init_start(const struct boot_info* bi) {
         (void)vfs_mount("/disk", disk);
     }
+    fs_node_t* proc = procfs_create_root();
+    if (proc) {
+        (void)vfs_mount("/proc", proc);
+    }
+
     int user_ret = arch_platform_start_userspace(bi);
     if (bi && cmdline_has_token(bi->cmdline, "ring3")) {
diff --git a/src/kernel/procfs.c b/src/kernel/procfs.c
new file mode 100644
index 0000000..7ced30d
--- /dev/null
+++ b/src/kernel/procfs.c
@@ -0,0 +1,216 @@
+#include "procfs.h"
+
+#include "process.h"
+#include "utils.h"
+#include "heap.h"
+#include "pmm.h"
+#include "timer.h"
+
+#include
+
+static fs_node_t g_proc_root;
+static fs_node_t g_proc_self;
+static fs_node_t g_proc_self_status;
+static fs_node_t g_proc_uptime;
+static fs_node_t g_proc_meminfo;
+
+extern struct process* ready_queue_head;
+
+static int proc_snprintf(char* buf, uint32_t sz, const char* key, uint32_t val) {
+    if (sz < 2) return 0;
+    uint32_t w = 0;
+    const char* p = key;
+    while (*p && w + 1 < sz) buf[w++] = *p++;
+    char num[16];
+    itoa(val, num, 10);
+    p = num;
+    while (*p && w + 1 < sz) buf[w++] = *p++;
+    if (w + 1 < sz) buf[w++] = '\n';
+    buf[w] = 0;
+    return (int)w;
+}
+
+static uint32_t proc_self_status_read(fs_node_t* node, uint32_t offset, uint32_t size, uint8_t* buffer) {
+    (void)node;
+    if (!current_process) return 0;
+
+    char tmp[512];
+    uint32_t len = 0;
+
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Pid:\t", current_process->pid);
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "PPid:\t", current_process->parent_pid);
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Pgrp:\t", current_process->pgrp_id);
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Session:\t", current_process->session_id);
+
+    const char* state_str = "unknown\n";
+    switch (current_process->state) {
+    case PROCESS_READY: state_str = "R (ready)\n"; break;
+    case PROCESS_RUNNING: state_str = "R (running)\n"; break;
+    case PROCESS_BLOCKED: state_str = "S (blocked)\n"; break;
+    case PROCESS_SLEEPING: state_str = "S (sleeping)\n"; break;
+    case PROCESS_ZOMBIE: state_str = "Z (zombie)\n"; break;
+    }
+    const char* s = "State:\t";
+    while (*s && len + 1 < sizeof(tmp)) tmp[len++] = *s++;
+    s = state_str;
+    while (*s && len + 1 < sizeof(tmp)) tmp[len++] = *s++;
+
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "SigPnd:\t", current_process->sig_pending_mask);
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "SigBlk:\t", current_process->sig_blocked_mask);
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "HeapStart:\t", (uint32_t)current_process->heap_start);
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "HeapBreak:\t", (uint32_t)current_process->heap_break);
+
+    if (offset >= len) return 0;
+    uint32_t avail = len - offset;
+    if (size > avail) size = avail;
+    memcpy(buffer, tmp + offset, size);
+    return size;
+}
+
+static uint32_t proc_uptime_read(fs_node_t* node, uint32_t offset, uint32_t size, uint8_t* buffer) {
+    (void)node;
+    uint32_t ticks = get_tick_count();
+    uint32_t secs = (ticks * 20) / 1000;
+    uint32_t frac = ((ticks * 20) % 1000) / 10;
+
+    char tmp[64];
+    uint32_t len = 0;
+    char num[16];
+    itoa(secs, num, 10);
+    const char* p = num;
+    while (*p && len + 2 < sizeof(tmp)) tmp[len++] = *p++;
+    if (len + 2 < sizeof(tmp)) tmp[len++] = '.';
+    if (frac < 10 && len + 2 < sizeof(tmp)) tmp[len++] = '0';
+    itoa(frac, num, 10);
+    p = num;
+    while (*p && len + 2 < sizeof(tmp)) tmp[len++] = *p++;
+    if (len + 1 < sizeof(tmp)) tmp[len++] = '\n';
+    if (len < sizeof(tmp)) tmp[len] = 0;
+    else tmp[sizeof(tmp) - 1] = 0;
+
+    if (offset >= len) return 0;
+    uint32_t avail = len - offset;
+    if (size > avail) size = avail;
+    memcpy(buffer, tmp + offset, size);
+    return size;
+}
+
+extern void pmm_print_stats(void);
+
+static uint32_t proc_meminfo_read(fs_node_t* node, uint32_t offset, uint32_t size, uint8_t* buffer) {
+    (void)node;
+
+    char tmp[256];
+    uint32_t len = 0;
+
+    /* Count processes */
+    uint32_t nprocs = 0;
+    if (ready_queue_head) {
+        struct process* it = ready_queue_head;
+        const struct process* start = it;
+        do {
+            nprocs++;
+            it = it->next;
+        } while (it && it != start);
+    }
+
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "Processes:\t", nprocs);
+    len += (uint32_t)proc_snprintf(tmp + len, sizeof(tmp) - len, "TickCount:\t", get_tick_count());
+
+    if (offset >= len) return 0;
+    uint32_t avail = len - offset;
+    if (size > avail) size = avail;
+    memcpy(buffer, tmp + offset, size);
+    return size;
+}
+
+static fs_node_t* proc_self_finddir(fs_node_t* node, const char* name) {
+    (void)node;
+    if (strcmp(name, "status") == 0) return &g_proc_self_status;
+    return NULL;
+}
+
+static int proc_self_readdir(fs_node_t* node, uint32_t* inout_index, void* buf, uint32_t buf_len) {
+    (void)node;
+    if (!inout_index || !buf) return -1;
+    if (buf_len < sizeof(struct vfs_dirent)) return -1;
+
+    static const char* entries[] = { "status" };
+    uint32_t idx = *inout_index;
+    if (idx >= 1) return 0;
+
+    struct vfs_dirent* d = (struct vfs_dirent*)buf;
+    d->d_ino = 100 + idx;
+    d->d_type = 0;
+    d->d_reclen = sizeof(struct vfs_dirent);
+    {
+        const char* s = entries[idx];
+        uint32_t j = 0;
+        while (s[j] && j + 1 < sizeof(d->d_name)) { d->d_name[j] = s[j]; j++; }
+        d->d_name[j] = 0;
+    }
+    *inout_index = idx + 1;
+    return (int)sizeof(struct vfs_dirent);
+}
+
+static fs_node_t* proc_root_finddir(fs_node_t* node, const char* name) {
+    (void)node;
+    if (strcmp(name, "self") == 0) return &g_proc_self;
+    if (strcmp(name, "uptime") == 0) return &g_proc_uptime;
+    if (strcmp(name, "meminfo") == 0) return &g_proc_meminfo;
+    return NULL;
+}
+
+static int proc_root_readdir(fs_node_t* node, uint32_t* inout_index, void* buf, uint32_t buf_len) {
+    (void)node;
+    if (!inout_index || !buf) return -1;
+    if (buf_len < sizeof(struct vfs_dirent)) return -1;
+
+    static const char* entries[] = { "self", "uptime", "meminfo" };
+    uint32_t idx = *inout_index;
+    if (idx >= 3) return 0;
+
+    struct vfs_dirent* d = (struct vfs_dirent*)buf;
+    d->d_ino = 200 + idx;
+    d->d_type = (idx == 0) ? 2 : 0;
+    d->d_reclen = sizeof(struct vfs_dirent);
+    {
+        const char* s = entries[idx];
+        uint32_t j = 0;
+        while (s[j] && j + 1 < sizeof(d->d_name)) { d->d_name[j] = s[j]; j++; }
+        d->d_name[j] = 0;
+    }
+    *inout_index = idx + 1;
+    return (int)sizeof(struct vfs_dirent);
+}
+
+fs_node_t* procfs_create_root(void) {
+    memset(&g_proc_root, 0, sizeof(g_proc_root));
+    strcpy(g_proc_root.name, "proc");
+    g_proc_root.flags = FS_DIRECTORY;
+    g_proc_root.finddir = proc_root_finddir;
+    g_proc_root.readdir = proc_root_readdir;
+
+    memset(&g_proc_self, 0, sizeof(g_proc_self));
+    strcpy(g_proc_self.name, "self");
+    g_proc_self.flags = FS_DIRECTORY;
+    g_proc_self.finddir = proc_self_finddir;
+    g_proc_self.readdir = proc_self_readdir;
+
+    memset(&g_proc_self_status, 0, sizeof(g_proc_self_status));
+    strcpy(g_proc_self_status.name, "status");
+    g_proc_self_status.flags = FS_FILE;
+    g_proc_self_status.read = proc_self_status_read;
+
+    memset(&g_proc_uptime, 0, sizeof(g_proc_uptime));
+    strcpy(g_proc_uptime.name, "uptime");
+    g_proc_uptime.flags = FS_FILE;
+    g_proc_uptime.read = proc_uptime_read;
+
+    memset(&g_proc_meminfo, 0, sizeof(g_proc_meminfo));
+    strcpy(g_proc_meminfo.name, "meminfo");
+    g_proc_meminfo.flags = FS_FILE;
+    g_proc_meminfo.read = proc_meminfo_read;
+
+    return &g_proc_root;
+}
diff --git a/src/kernel/pty.c b/src/kernel/pty.c
index 78f2f29..024818f 100644
--- a/src/kernel/pty.c
+++ b/src/kernel/pty.c
@@ -188,6 +188,20 @@ int pty_master_write_kbuf(const void* kbuf, uint32_t len) {
     int jc = pty_jobctl_write_check();
     if (jc < 0) return jc;
+    enum { SIGINT_NUM = 2, SIGQUIT_NUM = 3, SIGTSTP_NUM = 20 };
+
+    const uint8_t* bytes = (const uint8_t*)kbuf;
+    for (uint32_t i = 0; i < len; i++) {
+        uint8_t ch = bytes[i];
+        int sig = 0;
+        if (ch == 0x03) sig = SIGINT_NUM;
+        else if (ch == 0x1C) sig = SIGQUIT_NUM;
+        else if (ch == 0x1A) sig = SIGTSTP_NUM;
+        if (sig && pty_fg_pgrp != 0) {
+            process_kill_pgrp(pty_fg_pgrp, sig);
+        }
+    }
+
     uintptr_t flags = spin_lock_irqsave(&pty_lock);
     uint32_t free = rb_free(m2s_head, m2s_tail);
     uint32_t to_write = len;
diff --git a/src/kernel/scheduler.c b/src/kernel/scheduler.c
index 7405a56..3eeaa33 100644
--- a/src/kernel/scheduler.c
+++ b/src/kernel/scheduler.c
@@ -21,6 +21,67 @@ static uint32_t next_pid = 1;
 static spinlock_t sched_lock = {0};
 static uintptr_t kernel_as = 0;
+/* ---------- O(1) runqueue ---------- */
+struct prio_queue {
+    struct process* head;
+    struct process* tail;
+};
+
+struct runqueue {
+    uint32_t bitmap; // bit i set => queue[i] non-empty
+    struct prio_queue queue[SCHED_NUM_PRIOS];
+};
+
+static struct runqueue rq_active_store;
+static struct runqueue rq_expired_store;
+static struct runqueue* rq_active = &rq_active_store;
+static struct runqueue* rq_expired = &rq_expired_store;
+
+static inline uint32_t bsf32(uint32_t v) {
+    uint32_t r;
+    __asm__ volatile("bsf %1, %0" : "=r"(r) : "rm"(v) : "cc");
+    return r;
+}
+
+static void rq_enqueue(struct runqueue* rq, struct process* p) {
+    uint8_t prio = p->priority;
+    struct prio_queue* pq = &rq->queue[prio];
+    p->rq_next = NULL;
+    p->rq_prev = pq->tail;
+    if (pq->tail) pq->tail->rq_next = p;
+    else pq->head = p;
+    pq->tail = p;
+    rq->bitmap |= (1U << prio);
+}
+
+static void rq_dequeue(struct runqueue* rq, struct process* p) {
+    uint8_t prio = p->priority;
+    struct prio_queue* pq = &rq->queue[prio];
+    if (p->rq_prev) p->rq_prev->rq_next = p->rq_next;
+    else pq->head = p->rq_next;
+    if (p->rq_next) p->rq_next->rq_prev = p->rq_prev;
+    else pq->tail = p->rq_prev;
+    p->rq_next = NULL;
+    p->rq_prev = NULL;
+    if (!pq->head) rq->bitmap &= ~(1U << prio);
+}
+
+static struct process* rq_pick_next(void) {
+    if (rq_active->bitmap) {
+        uint32_t prio = bsf32(rq_active->bitmap);
+        return rq_active->queue[prio].head;
+    }
+    // Swap active <-> expired
+    struct runqueue* tmp = rq_active;
+    rq_active = rq_expired;
+    rq_expired = tmp;
+    if (rq_active->bitmap) {
+        uint32_t prio = bsf32(rq_active->bitmap);
+        return rq_active->queue[prio].head;
+    }
+    return NULL; // only idle task left
+}
+
 void thread_wrapper(void (*fn)(void));
 static struct process* process_find_locked(uint32_t pid) {
@@ -127,6 +188,7 @@ int process_kill(uint32_t pid, int sig) {
                     parent->wait_result_pid = (int)p->pid;
                     parent->wait_result_status = p->exit_status;
                     parent->state = PROCESS_READY;
+                    rq_enqueue(rq_active, parent);
                 }
             }
         }
@@ -134,6 +196,7 @@ int process_kill(uint32_t pid, int sig) {
         p->sig_pending_mask |= (1U << (uint32_t)sig);
         if (p->state == PROCESS_BLOCKED || p->state == PROCESS_SLEEPING) {
             p->state = PROCESS_READY;
+            rq_enqueue(rq_active, p);
         }
     }
@@ -141,6 +204,33 @@ int process_kill(uint32_t pid, int sig) {
     return 0;
 }
+int process_kill_pgrp(uint32_t pgrp, int sig) {
+    if (pgrp == 0) return -EINVAL;
+    if (sig <= 0 || sig >= PROCESS_MAX_SIG) return -EINVAL;
+
+    uintptr_t flags = spin_lock_irqsave(&sched_lock);
+    int found = 0;
+
+    struct process* it = ready_queue_head;
+    if (it) {
+        const struct process* const start = it;
+        do {
+            if (it->pgrp_id == pgrp && it->pid != 0 && it->state != PROCESS_ZOMBIE) {
+                it->sig_pending_mask |= (1U << (uint32_t)sig);
+                if (it->state == PROCESS_BLOCKED || it->state == PROCESS_SLEEPING) {
+                    it->state = PROCESS_READY;
+                    rq_enqueue(rq_active, it);
+                }
+                found = 1;
+            }
+            it = it->next;
+        } while (it && it != start);
+    }
+
+    spin_unlock_irqrestore(&sched_lock, flags);
+    return found ? 0 : -ESRCH;
+}
+
 int process_waitpid(int pid, int* status_out, uint32_t options) {
     if (!current_process) return -ECHILD;
@@ -227,6 +317,7 @@ void process_exit_notify(int status) {
                 parent->wait_result_pid = (int)current_process->pid;
                 parent->wait_result_status = status;
                 parent->state = PROCESS_READY;
+                rq_enqueue(rq_active, parent);
             }
         }
     }
@@ -270,6 +361,8 @@ struct process* process_fork_create(uintptr_t child_as, const struct registers*
     proc->parent_pid = current_process ? current_process->pid : 0;
     proc->session_id = current_process ? current_process->session_id : proc->pid;
     proc->pgrp_id = current_process ? current_process->pgrp_id : proc->pid;
+    proc->priority = current_process ? current_process->priority : SCHED_DEFAULT_PRIO;
+    proc->nice = current_process ? current_process->nice : 0;
     proc->state = PROCESS_READY;
     proc->addr_space = child_as;
     proc->wake_at_tick = 0;
@@ -314,6 +407,8 @@ struct process* process_fork_create(uintptr_t child_as, const struct registers*
     ready_queue_head->prev = proc;
     ready_queue_tail = proc;
+    rq_enqueue(rq_active, proc);
+
     spin_unlock_irqrestore(&sched_lock, flags);
     return proc;
 }
@@ -333,11 +428,16 @@ void process_init(void) {
     }
     memset(kernel_proc, 0, sizeof(*kernel_proc));
-    
+
+    memset(&rq_active_store, 0, sizeof(rq_active_store));
+    memset(&rq_expired_store, 0, sizeof(rq_expired_store));
+
     kernel_proc->pid = 0;
     kernel_proc->parent_pid = 0;
     kernel_proc->session_id = 0;
     kernel_proc->pgrp_id = 0;
+    kernel_proc->priority = SCHED_NUM_PRIOS - 1; // idle = lowest priority
+    kernel_proc->nice = 19;
     kernel_proc->state = PROCESS_RUNNING;
     kernel_proc->wake_at_tick = 0;
     kernel_proc->addr_space = hal_cpu_get_address_space();
@@ -387,6 +487,8 @@ struct process* process_create_kernel(void (*entry_point)(void)) {
     proc->parent_pid = current_process ? current_process->pid : 0;
     proc->session_id = current_process ? current_process->session_id : proc->pid;
     proc->pgrp_id = current_process ? current_process->pgrp_id : proc->pid;
+    proc->priority = SCHED_DEFAULT_PRIO;
+    proc->nice = 0;
     proc->state = PROCESS_READY;
     proc->addr_space = kernel_as ? kernel_as : (current_process ? current_process->addr_space : 0);
     proc->wake_at_tick = 0;
@@ -424,38 +526,27 @@ struct process* process_create_kernel(void (*entry_point)(void)) {
     ready_queue_head->prev = proc;
     ready_queue_tail = proc;
+    rq_enqueue(rq_active, proc);
+
     spin_unlock_irqrestore(&sched_lock, flags);
     return proc;
 }
-// Find next READY process
+// Find next READY process — O(1) via bitmap
 struct process* get_next_ready_process(void) {
-    if (!current_process) return NULL;
-    if (!current_process->next) return current_process;
-
-    struct process* iterator = current_process->next;
+    struct process* next = rq_pick_next();
+    if (next) return next;
-    // Scan the full circular list for a READY process.
-    while (iterator && iterator != current_process) {
-        if (iterator->state == PROCESS_READY) {
-            return iterator;
-        }
-        iterator = iterator->next;
-    }
-
-    // If current is ready/running, return it.
-    if (current_process->state == PROCESS_RUNNING || current_process->state == PROCESS_READY)
-        return current_process;
-
-    // If EVERYONE is sleeping, we must return the IDLE task (PID 0)
-    // Assuming PID 0 is always in the list.
-    // Search specifically for PID 0
-    iterator = current_process->next;
-    while (iterator && iterator->pid != 0) {
-        iterator = iterator->next;
-        if (iterator == current_process) break; // Should not happen
-    }
-    return iterator ? iterator : current_process;
+    // Fallback: idle task (PID 0)
+    if (current_process && current_process->pid == 0) return current_process;
+    struct process* it = ready_queue_head;
+    if (!it) return current_process;
+    const struct process* start = it;
+    do {
+        if (it->pid == 0) return it;
+        it = it->next;
+    } while (it && it != start);
+    return current_process;
 }
 void schedule(void) {
@@ -467,17 +558,44 @@ void schedule(void) {
     }
     struct process* prev = current_process;
-    struct process* next = get_next_ready_process();
-
-    if (prev == next) {
-        spin_unlock_irqrestore(&sched_lock, irq_flags);
-        return;
-    }
-    // Only change state to READY if it was RUNNING.
-    // If it was SLEEPING/BLOCKED, leave it as is.
+    // Put prev back into expired runqueue if it's still runnable.
     if (prev->state == PROCESS_RUNNING) {
         prev->state = PROCESS_READY;
+        rq_enqueue(rq_expired, prev);
+    }
+
+    // Pick highest-priority READY process (may swap active/expired).
+    struct process* next = get_next_ready_process();
+
+    if (next) {
+        // next is in rq_active (possibly after swap) — remove it.
+        rq_dequeue(rq_active, next);
+    }
+
+    if (!next) {
+        // Nothing in runqueues. If prev is still runnable, keep it.
+        if (prev->state == PROCESS_READY) {
+            rq_dequeue(rq_expired, prev);
+            next = prev;
+        } else {
+            // Fall back to idle (PID 0).
+            struct process* it = ready_queue_head;
+            next = it;
+            if (it) {
+                const struct process* start = it;
+                do {
+                    if (it->pid == 0) { next = it; break; }
+                    it = it->next;
+                } while (it && it != start);
+            }
+        }
+    }
+
+    if (prev == next) {
+        prev->state = PROCESS_RUNNING;
+        spin_unlock_irqrestore(&sched_lock, irq_flags);
+        return;
     }
     current_process = next;
@@ -487,7 +605,6 @@ void schedule(void) {
         hal_cpu_set_address_space(current_process->addr_space);
     }
-    // For ring3->ring0 transitions, esp0 must point to the top of the kernel stack.
     if (current_process->kernel_stack) {
         hal_cpu_set_kernel_stack((uintptr_t)current_process->kernel_stack + 4096);
     }
@@ -496,9 +613,6 @@ void schedule(void) {
     context_switch(&prev->sp, current_process->sp);
-    // Do not restore the old IF state after switching stacks.
-    // The previous context may have entered schedule() with IF=0 (e.g. syscall/ISR),
-    // and propagating that would prevent timer/keyboard IRQs from firing.
     hal_cpu_enable_interrupts();
 }
@@ -544,7 +658,7 @@ void process_wake_check(uint32_t current_tick) {
         if (iter->state == PROCESS_SLEEPING) {
             if (current_tick >= iter->wake_at_tick) {
                 iter->state = PROCESS_READY;
-                // uart_print("Woke up PID ");
+                rq_enqueue(rq_active, iter);
             }
         }
         iter = iter->next;
diff --git a/src/kernel/syscall.c b/src/kernel/syscall.c
index f9a0480..41739fd 100644
--- a/src/kernel/syscall.c
+++ b/src/kernel/syscall.c
@@ -16,6 +16,8 @@
 #include "elf.h"
 #include "stat.h"
 #include "vmm.h"
+#include "pmm.h"
+#include "timer.h"
 #include "hal/cpu.h"
@@ -23,9 +25,17 @@
 enum {
     O_NONBLOCK = 0x800,
+    O_CLOEXEC = 0x80000,
 };
 enum {
+    FD_CLOEXEC = 1,
+};
+
+enum {
+    FCNTL_F_DUPFD = 0,
+    FCNTL_F_GETFD = 1,
+    FCNTL_F_SETFD = 2,
     FCNTL_F_GETFL = 3,
     FCNTL_F_SETFL = 4,
 };
@@ -160,7 +170,7 @@ static int syscall_fork_impl(struct registers* regs) {
         current_process->addr_space = src_as;
     }
-    uintptr_t child_as = vmm_as_clone_user(src_as);
+    uintptr_t child_as = vmm_as_clone_user_cow(src_as);
     if (!child_as) return -ENOMEM;
     struct registers child_regs = *regs;
@@ -172,11 +182,15 @@ static int syscall_fork_impl(struct registers* regs) {
         return -ENOMEM;
     }
+    child->heap_start = current_process->heap_start;
+    child->heap_break = current_process->heap_break;
+
     for (int fd = 0; fd < PROCESS_MAX_FILES; fd++) {
         struct file* f = current_process->files[fd];
         if (!f) continue;
         f->refcount++;
         child->files[fd] = f;
+        child->fd_flags[fd] = current_process->fd_flags[fd];
     }
     return (int)child->pid;
@@ -480,10 +494,14 @@ static int syscall_pipe2_impl(int* user_fds, uint32_t flags) {
     if (!current_process) return -ECHILD;
     if (kfds[0] >= 0 && kfds[0] < PROCESS_MAX_FILES && current_process->files[kfds[0]]) {
-        current_process->files[kfds[0]]->flags = flags;
+        current_process->files[kfds[0]]->flags = flags & ~O_CLOEXEC;
     }
     if (kfds[1] >= 0 && kfds[1] < PROCESS_MAX_FILES && current_process->files[kfds[1]]) {
-        current_process->files[kfds[1]]->flags = flags;
+        current_process->files[kfds[1]]->flags = flags & ~O_CLOEXEC;
+    }
+    if (flags & O_CLOEXEC) {
+        if (kfds[0] >= 0 && kfds[0] < PROCESS_MAX_FILES) current_process->fd_flags[kfds[0]] = FD_CLOEXEC;
+        if (kfds[1] >= 0 && kfds[1] < PROCESS_MAX_FILES) current_process->fd_flags[kfds[1]] = FD_CLOEXEC;
     }
     if (copy_to_user(user_fds, kfds, sizeof(kfds)) < 0) {
@@ -638,7 +656,8 @@ static int syscall_execve_impl(struct registers* regs, const char* user_path, co
     uintptr_t entry = 0;
     uintptr_t user_sp = 0;
     uintptr_t new_as = 0;
-    if (elf32_load_user_from_initrd(path, &entry, &user_sp, &new_as) != 0) {
+    uintptr_t heap_brk = 0;
+    if (elf32_load_user_from_initrd(path, &entry, &user_sp, &new_as, &heap_brk) != 0) {
         ret = -EINVAL;
         goto out;
     }
@@ -654,6 +673,8 @@ static int syscall_execve_impl(struct registers* regs, const char* user_path, co
     }
     current_process->addr_space = new_as;
+    current_process->heap_start = heap_brk;
+    current_process->heap_break = heap_brk;
     vmm_as_activate(new_as);
     // Build a minimal initial user stack: argc, argv pointers, envp pointers, strings.
@@ -699,6 +720,13 @@ static int syscall_execve_impl(struct registers* regs, const char* user_path, co
     (void)argv_va;
     (void)envp_va;
+    for (int i = 0; i < PROCESS_MAX_FILES; i++) {
+        if (current_process->fd_flags[i] & FD_CLOEXEC) {
+            (void)fd_close(i);
+            current_process->fd_flags[i] = 0;
+        }
+    }
+
     if (old_as && old_as != new_as) {
         vmm_as_destroy(old_as);
     }
@@ -848,6 +876,9 @@ static int syscall_open_impl(const char* user_path, uint32_t flags) {
         kfree(f);
         return -EMFILE;
     }
+    if ((flags & O_CLOEXEC) && current_process) {
+        current_process->fd_flags[fd] = FD_CLOEXEC;
+    }
     return fd;
 }
@@ -861,11 +892,19 @@ static int syscall_fcntl_impl(int fd, int cmd, uint32_t arg) {
     struct file* f = fd_get(fd);
     if (!f) return -EBADF;
+    if (cmd == FCNTL_F_GETFD) {
+        if (!current_process) return 0;
+        return (int)current_process->fd_flags[fd];
+    }
+    if (cmd == FCNTL_F_SETFD) {
+        if (!current_process) return -EINVAL;
+        current_process->fd_flags[fd] = (uint8_t)(arg & FD_CLOEXEC);
+        return 0;
+    }
     if (cmd == FCNTL_F_GETFL) {
         return (int)f->flags;
     }
     if (cmd == FCNTL_F_SETFL) {
-        // Minimal: allow toggling O_NONBLOCK only.
         uint32_t keep = f->flags & ~O_NONBLOCK;
         uint32_t set = arg & O_NONBLOCK;
         f->flags = keep | set;
@@ -1352,6 +1391,206 @@ static int syscall_sigreturn_impl(struct registers* regs, const struct sigframe*
     return 0;
 }
+struct timespec {
+    uint32_t tv_sec;
+    uint32_t tv_nsec;
+};
+
+enum {
+    CLOCK_REALTIME = 0,
+    CLOCK_MONOTONIC = 1,
+};
+
+static int syscall_nanosleep_impl(const struct timespec* user_req, struct timespec* user_rem) {
+    if (!user_req) return -EFAULT;
+    if (user_range_ok(user_req, sizeof(struct timespec)) == 0) return -EFAULT;
+
+    struct timespec req;
+    if (copy_from_user(&req, user_req, sizeof(req)) < 0) return -EFAULT;
+
+    if (req.tv_nsec >= 1000000000U) return -EINVAL;
+
+    const uint32_t TICK_MS = 20;
+    uint32_t ms = req.tv_sec * 1000U + req.tv_nsec / 1000000U;
+    uint32_t ticks = (ms + TICK_MS - 1) / TICK_MS;
+    if (ticks == 0 && (req.tv_sec > 0 || req.tv_nsec > 0)) ticks = 1;
+
+    if (ticks > 0) {
+        process_sleep(ticks);
+    }
+
+    if (user_rem) {
+        if (user_range_ok(user_rem, sizeof(struct timespec)) != 0) {
+            struct timespec rem = {0, 0};
+            (void)copy_to_user(user_rem, &rem, sizeof(rem));
+        }
+    }
+
+    return 0;
+}
+
+static int syscall_clock_gettime_impl(uint32_t clk_id, struct timespec* user_tp) {
+    if (!user_tp) return -EFAULT;
+    if (user_range_ok(user_tp, sizeof(struct timespec)) == 0) return -EFAULT;
+
+    if (clk_id != CLOCK_REALTIME && clk_id != CLOCK_MONOTONIC) return -EINVAL;
+
+    uint32_t ticks = get_tick_count();
+    const uint32_t TICK_MS = 20;
+    uint32_t total_ms = ticks * TICK_MS;
+
+    struct timespec tp;
+    tp.tv_sec = total_ms / 1000U;
+    tp.tv_nsec = (total_ms % 1000U) * 1000000U;
+
+    if (copy_to_user(user_tp, &tp, sizeof(tp)) < 0) return -EFAULT;
+    return 0;
+}
+
+enum {
+    PROT_NONE = 0x0,
+    PROT_READ = 0x1,
+    PROT_WRITE = 0x2,
+    PROT_EXEC = 0x4,
+};
+
+enum {
+    MAP_SHARED = 0x01,
+    MAP_PRIVATE = 0x02,
+    MAP_FIXED = 0x10,
+    MAP_ANONYMOUS = 0x20,
+};
+
+static uintptr_t mmap_find_free(uint32_t length) {
+    if (!current_process) return 0;
+    const uintptr_t MMAP_BASE = 0x40000000U;
+    const uintptr_t MMAP_END = 0x7FF00000U;
+
+    for (uintptr_t candidate = MMAP_BASE; candidate + length <= MMAP_END; candidate += 0x1000U) {
+        int overlap = 0;
+        for (int i = 0; i < PROCESS_MAX_MMAPS; i++) {
+            if (current_process->mmaps[i].length == 0) continue;
+            uintptr_t mb = current_process->mmaps[i].base;
+            uint32_t ml = current_process->mmaps[i].length;
+            if (candidate < mb + ml && candidate + length > mb) {
+                overlap = 1;
+                candidate = ((mb + ml + 0xFFFU) & ~(uintptr_t)0xFFFU) - 0x1000U;
+                break;
+            }
+        }
+        if (!overlap) return candidate;
+    }
+    return 0;
+}
+
+static uintptr_t syscall_mmap_impl(uintptr_t addr, uint32_t length, uint32_t prot,
+                                   uint32_t flags, int fd, uint32_t offset) {
+    (void)offset;
+    if (!current_process) return (uintptr_t)-EINVAL;
+    if (length == 0) return (uintptr_t)-EINVAL;
+
+    if (!(flags & MAP_ANONYMOUS)) return (uintptr_t)-ENOSYS;
+    if (fd != -1) return (uintptr_t)-EINVAL;
+
+    uint32_t aligned_len = (length + 0xFFFU) & ~(uint32_t)0xFFFU;
+
+    uintptr_t base;
+    if (flags & MAP_FIXED) {
+        if (addr == 0 || (addr & 0xFFF)) return (uintptr_t)-EINVAL;
+        if (addr >= 0xC0000000U) return (uintptr_t)-EINVAL;
+        base = addr;
+    } else {
+        base = mmap_find_free(aligned_len);
+        if (!base) return (uintptr_t)-ENOMEM;
+    }
+
+    int slot = -1;
+    for (int i = 0; i < PROCESS_MAX_MMAPS; i++) {
+        if (current_process->mmaps[i].length == 0) { slot = i; break; }
+    }
+    if (slot < 0) return (uintptr_t)-ENOMEM;
+
+    uint32_t vmm_flags = VMM_FLAG_PRESENT | VMM_FLAG_USER;
+    if (prot & PROT_WRITE) vmm_flags |= VMM_FLAG_RW;
+
+    for (uintptr_t va = base; va < base + aligned_len; va += 0x1000U) {
+        void* frame = pmm_alloc_page();
+        if (!frame) return (uintptr_t)-ENOMEM;
+        vmm_map_page((uint64_t)(uintptr_t)frame, (uint64_t)va, vmm_flags);
+        memset((void*)va, 0, 0x1000U);
+    }
+
+    current_process->mmaps[slot].base = base;
+    current_process->mmaps[slot].length = aligned_len;
+
+    return base;
+}
+
+static int syscall_munmap_impl(uintptr_t addr, uint32_t length) {
+    if (!current_process) return -EINVAL;
+    if (addr == 0 || (addr & 0xFFF)) return -EINVAL;
+    if (length == 0) return -EINVAL;
+
+    uint32_t aligned_len = (length + 0xFFFU) & ~(uint32_t)0xFFFU;
+
+    int found = -1;
+    for (int i = 0; i < PROCESS_MAX_MMAPS; i++) {
+        if (current_process->mmaps[i].base == addr &&
+            current_process->mmaps[i].length == aligned_len) {
+            found = i;
+            break;
+        }
+    }
+    if (found < 0) return -EINVAL;
+
+    for (uintptr_t va = addr; va < addr + aligned_len; va += 0x1000U) {
+        vmm_unmap_page((uint64_t)va);
+    }
+
+    current_process->mmaps[found].base = 0;
+    current_process->mmaps[found].length = 0;
+    return 0;
+}
+
+static uintptr_t syscall_brk_impl(uintptr_t addr) {
+    if (!current_process) return 0;
+
+    if (addr == 0) {
+        return current_process->heap_break;
+    }
+
+    const uintptr_t X86_KERN_BASE = 0xC0000000U;
+    const uintptr_t USER_STACK_BASE = 0x00800000U;
+
+    if (addr < current_process->heap_start) return current_process->heap_break;
+    if (addr >= USER_STACK_BASE) return current_process->heap_break;
+    if (addr >= X86_KERN_BASE) return current_process->heap_break;
+
+    uintptr_t old_brk = current_process->heap_break;
+    uintptr_t new_brk = (addr + 0xFFFU) & ~(uintptr_t)0xFFFU;
+    uintptr_t old_brk_page = (old_brk + 0xFFFU) & ~(uintptr_t)0xFFFU;
+
+    if (new_brk > old_brk_page) {
+        for (uintptr_t va = old_brk_page; va < new_brk; va += 0x1000U) {
+            void* frame = pmm_alloc_page();
+            if (!frame) {
+                return current_process->heap_break;
+            }
+            vmm_as_map_page(current_process->addr_space,
+                            (uint64_t)(uintptr_t)frame, (uint64_t)va,
+                            VMM_FLAG_PRESENT | VMM_FLAG_RW | VMM_FLAG_USER);
+            memset((void*)va, 0, 0x1000U);
+        }
+    } else if (new_brk < old_brk_page) {
+        for (uintptr_t va = new_brk; va < old_brk_page; va += 0x1000U) {
+            vmm_unmap_page((uint64_t)va);
+        }
+    }
+
+    current_process->heap_break = addr;
+    return addr;
+}
+
 static void syscall_handler(struct registers* regs) {
     uint32_t syscall_no = regs->eax;
@@ -1667,6 +1906,43 @@ static void syscall_handler(struct registers* regs) {
         return;
     }
+    if (syscall_no == SYSCALL_BRK) {
+        uintptr_t addr = (uintptr_t)regs->ebx;
+        regs->eax = (uint32_t)syscall_brk_impl(addr);
+        return;
+    }
+
+    if (syscall_no == SYSCALL_NANOSLEEP) {
+        const struct timespec* req = (const struct timespec*)regs->ebx;
+        struct timespec* rem = (struct timespec*)regs->ecx;
+        regs->eax = (uint32_t)syscall_nanosleep_impl(req, rem);
+        return;
+    }
+
+    if (syscall_no == SYSCALL_CLOCK_GETTIME) {
+        uint32_t clk_id = regs->ebx;
+        struct timespec* tp = (struct timespec*)regs->ecx;
+        regs->eax = (uint32_t)syscall_clock_gettime_impl(clk_id, tp);
+        return;
+    }
+
+    if (syscall_no == SYSCALL_MMAP) {
+        uintptr_t addr = (uintptr_t)regs->ebx;
+        uint32_t length = regs->ecx;
+        uint32_t prot = regs->edx;
+        uint32_t mflags = regs->esi;
+        int fd = (int)regs->edi;
+        regs->eax = (uint32_t)syscall_mmap_impl(addr, length, prot, mflags, fd, 0);
+        return;
+    }
+
+    if (syscall_no == SYSCALL_MUNMAP) {
+        uintptr_t addr = (uintptr_t)regs->ebx;
+        uint32_t length = regs->ecx;
+        regs->eax = (uint32_t)syscall_munmap_impl(addr, length);
+        return;
+    }
+
     regs->eax = (uint32_t)-ENOSYS;
 }
diff --git a/src/kernel/tty.c b/src/kernel/tty.c
index a917e9e..b813000 100644
--- a/src/kernel/tty.c
+++ b/src/kernel/tty.c
@@ -26,7 +26,9 @@ static struct process* waitq[TTY_WAITQ_MAX];
 static uint32_t waitq_head = 0;
 static uint32_t waitq_tail = 0;
-static uint32_t tty_lflag = TTY_ICANON | TTY_ECHO;
+static uint32_t tty_lflag = TTY_ICANON | TTY_ECHO | TTY_ISIG;
+
+static struct winsize tty_winsize = { 24, 80, 0, 0 };
 static uint32_t tty_session_id = 0;
 static uint32_t tty_fg_pgrp = 0;
@@ -165,6 +167,8 @@ enum {
     TTY_TCSETS = 0x5402,
     TTY_TIOCGPGRP = 0x540F,
     TTY_TIOCSPGRP = 0x5410,
+    TTY_TIOCGWINSZ = 0x5413,
+    TTY_TIOCSWINSZ = 0x5414,
 };
 int tty_ioctl(uint32_t cmd, void* user_arg) {
@@ -206,6 +210,7 @@ int tty_ioctl(uint32_t cmd, void* user_arg) {
     if (cmd == TTY_TCGETS) {
         struct termios t;
+        memset(&t, 0, sizeof(t));
         uintptr_t flags = spin_lock_irqsave(&tty_lock);
         t.c_lflag = tty_lflag;
         spin_unlock_irqrestore(&tty_lock, flags);
@@ -217,11 +222,23 @@ int tty_ioctl(uint32_t cmd, void* user_arg) {
         struct termios t;
         if (copy_from_user(&t, user_arg, sizeof(t)) < 0) return -EFAULT;
         uintptr_t flags = spin_lock_irqsave(&tty_lock);
-        tty_lflag = t.c_lflag & (TTY_ICANON | TTY_ECHO);
+        tty_lflag = t.c_lflag & (TTY_ICANON | TTY_ECHO | TTY_ISIG);
         spin_unlock_irqrestore(&tty_lock, flags);
         return 0;
     }
+    if (cmd == TTY_TIOCGWINSZ) {
+        if (user_range_ok(user_arg, sizeof(struct winsize)) == 0) return -EFAULT;
+        if (copy_to_user(user_arg, &tty_winsize, sizeof(tty_winsize)) < 0) return -EFAULT;
+        return 0;
+    }
+
+    if (cmd == TTY_TIOCSWINSZ) {
+        if (user_range_ok(user_arg, sizeof(struct winsize)) == 0) return -EFAULT;
+        if (copy_from_user(&tty_winsize, user_arg, sizeof(tty_winsize)) < 0) return -EFAULT;
+        return 0;
+    }
+
     return -EINVAL;
 }
@@ -229,6 +246,56 @@ void tty_input_char(char c) {
     uintptr_t flags = spin_lock_irqsave(&tty_lock);
     uint32_t lflag = tty_lflag;
+    enum { SIGINT_NUM = 2, SIGQUIT_NUM = 3, SIGTSTP_NUM = 20 };
+
+    if (lflag & TTY_ISIG) {
+        if (c == 0x03) {
+            spin_unlock_irqrestore(&tty_lock, flags);
+            if (lflag & TTY_ECHO) {
+                uart_print("^C\n");
+            }
+            if (tty_fg_pgrp != 0) {
+                process_kill_pgrp(tty_fg_pgrp, SIGINT_NUM);
+            }
+            return;
+        }
+
+        if (c == 0x1C) {
+            spin_unlock_irqrestore(&tty_lock, flags);
+            if (lflag & TTY_ECHO) {
+                uart_print("^\\\n");
+            }
+            if (tty_fg_pgrp != 0) {
+                process_kill_pgrp(tty_fg_pgrp, SIGQUIT_NUM);
+            }
+            return;
+        }
+
+        if (c == 0x1A) {
+            spin_unlock_irqrestore(&tty_lock, flags);
+            if (lflag & TTY_ECHO) {
+                uart_print("^Z\n");
+            }
+            if (tty_fg_pgrp != 0) {
+                process_kill_pgrp(tty_fg_pgrp, SIGTSTP_NUM);
+            }
+            return;
+        }
+    }
+
+    if (c == 0x04 && (lflag & TTY_ICANON)) {
+        if (lflag & TTY_ECHO) {
+            uart_print("^D");
+        }
+        for (uint32_t i = 0; i < line_len; i++) {
+            canon_push(line_buf[i]);
+        }
+        line_len = 0;
+        tty_wake_one();
+        spin_unlock_irqrestore(&tty_lock, flags);
+        return;
+    }
+
     if ((lflag & TTY_ICANON) == 0) {
         if (c == '\r') c = '\n';
         canon_push(c);
diff --git a/src/mm/pmm.c b/src/mm/pmm.c
index 4b6ad23..b6cbf59 100644
--- a/src/mm/pmm.c
+++ b/src/mm/pmm.c
@@ -18,6 +18,7 @@ extern uint8_t _end;
 #define BITMAP_SIZE (MAX_RAM_SIZE / PAGE_SIZE / 8)
 static uint8_t memory_bitmap[BITMAP_SIZE];
+static uint16_t frame_refcount[MAX_RAM_SIZE / PAGE_SIZE];
 static uint64_t total_memory = 0;
 static uint64_t used_memory = 0;
 static uint64_t max_frames = 0;
@@ -313,6 +314,7 @@ void* pmm_alloc_page(void) {
         if (!bitmap_test(i)) {
             bitmap_set(i);
+            frame_refcount[i] = 1;
             used_memory += PAGE_SIZE;
             last_alloc_frame = i + 1;
             if (last_alloc_frame >= max_frames) last_alloc_frame = 1;
@@ -325,6 +327,38 @@ void* pmm_alloc_page(void) {
 void pmm_free_page(void* ptr) {
     uintptr_t addr = (uintptr_t)ptr;
     uint64_t frame = addr / PAGE_SIZE;
+    if (frame == 0 || frame >= max_frames) return;
+
+    uint16_t rc = frame_refcount[frame];
+    if (rc > 1) {
+        __sync_sub_and_fetch(&frame_refcount[frame], 1);
+        return;
+    }
+
+    frame_refcount[frame] = 0;
     bitmap_unset(frame);
     used_memory -= PAGE_SIZE;
 }
+
+void pmm_incref(uintptr_t paddr) {
+    uint64_t frame = paddr / PAGE_SIZE;
+    if (frame == 0 || frame >= max_frames) return;
+    __sync_fetch_and_add(&frame_refcount[frame], 1);
+}
+
+uint16_t pmm_decref(uintptr_t paddr) {
+    uint64_t frame = paddr / PAGE_SIZE;
+    if (frame == 0 || frame >= max_frames) return 0;
+    uint16_t new_val = __sync_sub_and_fetch(&frame_refcount[frame], 1);
+    if (new_val == 0) {
+        bitmap_unset(frame);
+        used_memory -= PAGE_SIZE;
+    }
+    return new_val;
+}
+
+uint16_t pmm_get_refcount(uintptr_t paddr) {
+    uint64_t frame = paddr / PAGE_SIZE;
+    if (frame >= max_frames) return 0;
+    return frame_refcount[frame];
+}
diff --git a/src/mm/slab.c b/src/mm/slab.c
new file mode 100644
index 0000000..9edd0bd
--- /dev/null
+++ b/src/mm/slab.c
@@ -0,0 +1,82 @@
+#include "slab.h"
+#include "pmm.h"
+#include "uart_console.h"
+
+#include
+
+struct slab_free_node {
+    struct slab_free_node* next;
+};
+
+void slab_cache_init(slab_cache_t* cache, const char* name, uint32_t obj_size) {
+    if (!cache) return;
+    cache->name = name;
+    if (obj_size < sizeof(struct slab_free_node)) {
+        obj_size = sizeof(struct slab_free_node);
+    }
+    cache->obj_size = (obj_size + 7U) & ~7U;
+    cache->objs_per_slab = PAGE_SIZE / cache->obj_size;
+    cache->free_list = NULL;
+    cache->total_allocs = 0;
+    cache->total_frees = 0;
+    spinlock_init(&cache->lock);
+}
+
+static int slab_grow(slab_cache_t* cache) {
+    void* page = pmm_alloc_page();
+    if (!page) return -1;
+
+    uint8_t* base = (uint8_t*)(uintptr_t)page;
+
+    /* In higher-half kernel the physical page needs to be accessible.
+     * For simplicity we assume the kernel heap region or identity-mapped
+     * low memory is used. We map via the kernel virtual address. */
+    /* TODO: For pages above 4MB, a proper kernel mapping is needed.
+     * For now, slab pages come from pmm_alloc_page which returns
+     * physical addresses. We need to convert to virtual. */
+
+    /* Use kernel virtual = phys + 0xC0000000 for higher-half */
+    uint8_t* vbase = base + 0xC0000000U;
+
+    for (uint32_t i = 0; i < cache->objs_per_slab; i++) {
+        struct slab_free_node* node = (struct slab_free_node*)(vbase + i * cache->obj_size);
+        node->next = (struct slab_free_node*)cache->free_list;
+        cache->free_list = node;
+    }
+
+    return 0;
+}
+
+void* slab_alloc(slab_cache_t* cache) {
+    if (!cache) return NULL;
+
+    uintptr_t flags = spin_lock_irqsave(&cache->lock);
+
+    if (!cache->free_list) {
+        if (slab_grow(cache) < 0) {
+            spin_unlock_irqrestore(&cache->lock, flags);
+            return NULL;
+        }
+    }
+
+    struct slab_free_node* node = (struct slab_free_node*)cache->free_list;
+    cache->free_list = node->next;
+    cache->total_allocs++;
+
+    spin_unlock_irqrestore(&cache->lock, flags);
+
+    return (void*)node;
+}
+
+void slab_free(slab_cache_t* cache, void* obj) {
+    if (!cache || !obj) return;
+
+    uintptr_t flags = spin_lock_irqsave(&cache->lock);
+
+    struct slab_free_node* node = (struct slab_free_node*)obj;
+    node->next = (struct slab_free_node*)cache->free_list;
+    cache->free_list = node;
+    cache->total_frees++;
+
+    spin_unlock_irqrestore(&cache->lock, flags);
+}