Dugu School Operating Systems (獨孤派作業系統): Main Memory
  Operating Systems Laboratory, National Chung Cheng University
  Advisor: Prof. 羅習五
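Teaching assistant in charge:
  Assignment goals:
    Understand the Linux paging mechanism
    Understand the x86 paging mechanism
  Quick-start packages (same as in the previous assignment):
    https://goo.gl/cSD93g
    https://goo.gl/Fy3cXR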
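Configuring the kernel
  $ make menuconfig
  Enter "Kernel hacking" and select the following two options:
    <*> Export kernel pagetable layout to userspace via debugfs
    [*] Dump the EFI pagetable
  Both options are off by default, because enabling them can open security holes in the system.
  Double-check the result:
    $ cat .config | grep CONFIG_X86_PTDUMP
    CONFIG_X86_PTDUMP_CORE=y
    CONFIG_X86_PTDUMP=y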
Preparing the kernel build environment
  $ sudo apt install git flex bison bc libssl-dev gawk libudev-dev \
      ocl-icd-opencl-dev libpci-dev libelf-dev python2.7 libncurses-dev \
      fakeroot kernel-wedge binfmt-support ksh lsscsi libpcre16-3 \
      libpcre3-dev libpcre32-3 libpcrecpp0v5 libsepol1-dev libattr1-dev \
      libblkid-dev libselinux1-dev uuid-dev debugedit libarchive13 libdw1 \
      liblua5.2-0 liblzo2-2 libnspr4 libnss3 librpm8 librpmbuild8 librpmio8 \
      librpmsign8 rpm rpm-common rpm2cpio spl-dkms kernel-package
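Building the kernel
  $ sudo update-alternatives --config gcc   /* when building kernel 4.0, select gcc 4.8 */
  $ make -j8                                /* -j8 starts 8 tasks so the kernel is built in parallel */
  /* As a rule of thumb, on a 4-core machine -j8 or -j16 gives the fastest overall build. */
  /* If you use QEMU, skip the three steps below. */
  $ make modules
  $ sudo make modules_install
  $ sudo make install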
Kernel memory (kernel 4.0)
  On kernel 4.0, use the command below.
  One pgd entry maps 512 GB, one pud entry maps 1 GB, one pmd entry maps 2 MB, and one pte entry maps 4 KB.
  $ cat /sys/kernel/debug/kernel_page_tables
  ---[ User Space ]---
  0x0000000000000000-0xffff800000000000 16777088T pgd
  ---[ Kernel Space ]---
  0xffff800000000000-0xffff880000000000 8T pgd
  ---[ Low Kernel Mapping ]---
  0xffff880000000000-0xffff880000099000 612K RW GLB NX pte
  0xffff880000099000-0xffff88000009a000 4K ro GLB NX pte
  0xffff88000009a000-0xffff88000009b000 4K ro GLB x pte
  0xffff88000009b000-0xffff880000200000 1428K RW GLB NX pte
  0xffff880000200000-0xffff880001000000 14M RW PSE GLB NX pmd
  0xffff880001000000-0xffff880001c00000 12M ro PSE GLB NX pmd
  0xffff880001c00000-0xffff880001d12000 1096K ro GLB NX pte
  0xffff880001d12000-0xffff880001e00000 952K RW GLB NX pte
  0xffff880001e00000-0xffff880002000000 2M ro PSE GLB NX pmd
  0xffff880002000000-0xffff8800021d2000 1864K ro GLB NX pte
  0xffff8800021d2000-0xffff880002600000 4280K RW GLB NX pte
  0xffff880002600000-0xffff88000fc00000 214M RW PSE GLB NX pmd
  0xffff88000fc00000-0xffff88000ffe0000 3968K RW GLB NX pte
  0xffff88000ffe0000-0xffff880010000000 128K pte
  0xffff880010000000-0xffff880040000000 768M pmd
  0xffff880040000000-0xffff888000000000 511G pud
  0xffff888000000000-0xffffc90000000000 66048G pgd
  /* the rest of the output is truncated */
Kernel memory (kernel 4.19)
  On kernel 4.19, use the command below.
  One pud entry maps 1 GB, one pmd entry maps 2 MB, and one pte entry maps 4 KB.
  $ cat /sys/kernel/debug/page_tables/kernel
  ---[ User Space ]---
  0x0000000000000000-0xffff800000000000 16777088T pgd
  ---[ Kernel Space ]---
  0xffff800000000000-0xffff880000000000 8T pgd
  ---[ Low Kernel Mapping ]---
  0xffff880000000000-0xffff880000099000 612K RW GLB NX pte
  0xffff880000099000-0xffff88000009a000 4K ro GLB NX pte
  0xffff88000009a000-0xffff88000009b000 4K ro GLB x pte
  0xffff88000009b000-0xffff880000200000 1428K RW GLB NX pte
  0xffff880000200000-0xffff880001000000 14M RW PSE GLB NX pmd
  0xffff880001000000-0xffff880001c00000 12M ro PSE GLB NX pmd
  0xffff880001c00000-0xffff880001d12000 1096K ro GLB NX pte
  0xffff880001d12000-0xffff880001e00000 952K RW GLB NX pte
  0xffff880001e00000-0xffff880002000000 2M ro PSE GLB NX pmd
  0xffff880002000000-0xffff8800021d2000 1864K ro GLB NX pte
  0xffff8800021d2000-0xffff880002600000 4280K RW GLB NX pte
  0xffff880002600000-0xffff88000fc00000 214M RW PSE GLB NX pmd
  0xffff88000fc00000-0xffff88000ffe0000 3968K RW GLB NX pte
  0xffff88000ffe0000-0xffff880010000000 128K pte
  0xffff880010000000-0xffff880040000000 768M pmd
  0xffff880040000000-0xffff888000000000 511G pud
  0xffff888000000000-0xffffc90000000000 66048G pgd
  /* the rest of the output is truncated */
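The size column in the dump can be checked by hand: each line covers (end - start) bytes, and dividing by the per-entry size of the level named in the last column gives the number of entries behind that line. The sketch below does that arithmetic in user space. The helper names level_size and entries_in_range are made up for this illustration and are not kernel functions.

```c
#include <stdint.h>
#include <string.h>

/* Bytes mapped by one entry at each level (x86-64, 4-level paging). */
#define PTE_SIZE (1ULL << 12) /* 4 KB   */
#define PMD_SIZE (1ULL << 21) /* 2 MB   */
#define PUD_SIZE (1ULL << 30) /* 1 GB   */
#define PGD_SIZE (1ULL << 39) /* 512 GB */

/* Hypothetical helper: bytes mapped by one entry of the named level,
 * or 0 if the name is not one of pte/pmd/pud/pgd. */
static uint64_t level_size(const char *level)
{
    if (strcmp(level, "pte") == 0) return PTE_SIZE;
    if (strcmp(level, "pmd") == 0) return PMD_SIZE;
    if (strcmp(level, "pud") == 0) return PUD_SIZE;
    if (strcmp(level, "pgd") == 0) return PGD_SIZE;
    return 0;
}

/* Hypothetical helper: how many entries of `level` cover [start, end). */
static uint64_t entries_in_range(uint64_t start, uint64_t end,
                                 const char *level)
{
    uint64_t size = level_size(level);

    if (size == 0 || end < start)
        return 0;
    return (end - start) / size;
}
```

For the line 0xffff880000200000-0xffff880001000000 (14M, pmd), entries_in_range returns 7, that is, seven 2 MB pages. For the 612K pte line at the start of the Low Kernel Mapping it returns 153, and 153 x 4 KB = 612 KB.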
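Paging with multiple page sizes on x86 Linux: pte page
  Virtual address fields: pgd_offset (9 bits) | pud_offset (9 bits) | pmd_offset (9 bits) | pte_offset (9 bits) | offset (12 bits)
  The walk starts at cr3 (the physical address of task_struct->mm->pgd) and ends at a 4 KB data page.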
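Paging with multiple page sizes on x86 Linux: pmd page
  Virtual address fields: pgd_offset (9 bits) | pud_offset (9 bits) | pmd_offset (9 bits) | offset (12 + 9 = 21 bits)
  The walk starts at cr3 (the physical address of task_struct->mm->pgd) and ends at a 2 MB data page.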
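Paging with multiple page sizes on x86 Linux: pud page
  Virtual address fields: pgd_offset (9 bits) | pud_offset (9 bits) | offset (12 + 9 + 9 = 30 bits)
  The walk starts at cr3 (the physical address of task_struct->mm->pgd) and ends at a 1 GB data page.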
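Paging with multiple page sizes on x86 Linux: pgd page (exists in software, but not in hardware)
  Virtual address fields: pgd_offset (9 bits) | offset (12 + 9 + 9 + 9 = 39 bits)
  The walk starts at cr3 (the physical address of task_struct->mm->pgd) and ends at a 512 GB data page.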
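The bit fields on the four slides above can be extracted with shifts and masks. The following user-space sketch splits a 48-bit virtual address into the four 9-bit table indices and also computes the in-page offset for 4 KB, 2 MB, and 1 GB pages. The struct va_parts and the function split_va are names invented for this example. The shift values match the 9/9/9/9/12 layout shown on the slides.

```c
#include <stdint.h>

#define PAGE_SHIFT  12      /* 4 KB page offset           */
#define PMD_SHIFT   21      /* 12 + 9                     */
#define PUD_SHIFT   30      /* 12 + 9 + 9                 */
#define PGDIR_SHIFT 39      /* 12 + 9 + 9 + 9             */
#define INDEX_MASK  0x1ffULL /* 9 bits, 512 entries/table */

/* Hypothetical container for the pieces of one virtual address. */
struct va_parts {
    unsigned pgd, pud, pmd, pte; /* table indices, 0..511           */
    uint64_t off_4k;             /* offset if mapped by a pte (4 KB) */
    uint64_t off_2m;             /* offset if mapped by a pmd (2 MB) */
    uint64_t off_1g;             /* offset if mapped by a pud (1 GB) */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    p.pgd = (unsigned)((va >> PGDIR_SHIFT) & INDEX_MASK);
    p.pud = (unsigned)((va >> PUD_SHIFT) & INDEX_MASK);
    p.pmd = (unsigned)((va >> PMD_SHIFT) & INDEX_MASK);
    p.pte = (unsigned)((va >> PAGE_SHIFT) & INDEX_MASK);
    p.off_4k = va & ((1ULL << PAGE_SHIFT) - 1);
    p.off_2m = va & ((1ULL << PMD_SHIFT) - 1);
    p.off_1g = va & ((1ULL << PUD_SHIFT) - 1);
    return p;
}
```

For example, split_va(0xffff88000009a123) yields pgd index 272, pud index 0, pmd index 0, pte index 154 (0x9a), and a 4 KB page offset of 0x123. Which offset field applies depends on the level at which the walk stops.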
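How Linux prints the kernel page table
  The code is in /arch/x86/mm/dump_pagetables.c.
  All of the following slides use kernel 4.10 as the example.
  The comments in the 4.19 source are clearer, but 4.10 is the kernel we can debug directly with QEMU, so the examples use 4.10.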
Initialization of dump_pagetables.c
  /arch/x86/mm/dump_pagetables.c
  static int __init pt_dump_init(void)
  {
      // PAGE_OFFSET is where the kernel's mapping starts; note that it is a virtual address
      address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
      address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
      address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
      return 0;
  }
  // Tell the kernel to initialize this "functional module" by calling pt_dump_init
  __initcall(pt_dump_init);
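The address markers are what produce section headers such as ---[ Low Kernel Mapping ]--- in the dump: each marker pairs a start address with a name, and an address belongs to the last marker whose start it has reached. The sketch below models that lookup in user space. It is not the kernel's code; section_for is a made-up helper. The first three entries come from the dump shown earlier. The last two names and the address 0xffffea0000000000 are assumptions about a 4.x x86-64 kernel without address randomization.

```c
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's address_markers[]: a start address and a label. */
struct addr_marker {
    uint64_t start_address;
    const char *name;
};

/* Must stay sorted by start_address.
 * The last two entries are assumed values, not taken from the slides. */
static const struct addr_marker markers[] = {
    { 0x0000000000000000ULL, "User Space" },
    { 0xffff800000000000ULL, "Kernel Space" },
    { 0xffff880000000000ULL, "Low Kernel Mapping" }, /* PAGE_OFFSET   */
    { 0xffffc90000000000ULL, "vmalloc() Area" },     /* VMALLOC_START */
    { 0xffffea0000000000ULL, "Vmemmap" },            /* VMEMMAP_START */
};

/* Hypothetical helper: name of the section that contains addr. */
static const char *section_for(uint64_t addr)
{
    const char *name = markers[0].name;
    size_t n = sizeof(markers) / sizeof(markers[0]);

    for (size_t i = 0; i < n; i++) {
        if (addr >= markers[i].start_address)
            name = markers[i].name;
    }
    return name;
}
```

With these values, section_for(0xffff880000099000) returns "Low Kernel Mapping", which matches the section in which that address appears in the dump.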
Registering the file operations
  /arch/x86/mm/debug_pagetables.c
  static const struct file_operations ptdump_fops = {
      .owner   = THIS_MODULE,
      .open    = ptdump_open,    /* called when someone opens this file */
      .read    = seq_read,       /* called on read */
      .llseek  = seq_lseek,      /* called on lseek */
      .release = single_release, /* called when the file is no longer needed, e.g. on close */
      /* there is no .write, so this file does not support writing */
  };
  static struct dentry *pe;
  static int __init pt_dump_debug_init(void)
  {
      /* Tell the kernel which functions to call when someone opens the file
         kernel_page_tables; the functions are listed in ptdump_fops */
      pe = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL, &ptdump_fops);
      return 0;
  }
  module_init(pt_dump_debug_init);
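The file_operations structure is a table of callbacks: the kernel looks up the function pointer for the requested operation and refuses the operation when the pointer is missing. The user-space sketch below imitates that pattern so it can be compiled and run outside the kernel. Every name beginning with demo_ is invented for this illustration, and returning -EINVAL for a missing write callback is a simplification of what the kernel's VFS layer does.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct file_operations. */
struct demo_file_ops {
    int  (*open)(void);
    long (*read)(char *buf, size_t len);
    long (*write)(const char *buf, size_t len);
};

static int demo_open(void)
{
    return 0; /* nothing to set up in this sketch */
}

/* Copies a fixed line into buf and returns the number of bytes copied. */
static long demo_read(char *buf, size_t len)
{
    static const char text[] = "---[ User Space ]---\n";
    size_t n = sizeof(text) - 1;

    if (n > len)
        n = len;
    memcpy(buf, text, n);
    return (long)n;
}

/* No .write member is set, so it stays NULL: the "file" is read-only. */
static const struct demo_file_ops demo_fops = {
    .open = demo_open,
    .read = demo_read,
};

/* Dispatchers: call the registered callback, or fail if there is none. */
static long demo_vfs_read(const struct demo_file_ops *ops,
                          char *buf, size_t len)
{
    if (ops->read == NULL)
        return -EINVAL;
    return ops->read(buf, len);
}

static long demo_vfs_write(const struct demo_file_ops *ops,
                           const char *buf, size_t len)
{
    if (ops->write == NULL)
        return -EINVAL;
    return ops->write(buf, len);
}
```

Calling demo_vfs_write(&demo_fops, "x", 1) fails with -EINVAL because no write callback was registered, in the same way that ptdump_fops leaves .write unset to make kernel_page_tables read-only.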
(gdb)  b ptdump_show
(QEMU) $ cat /sys/kernel/debug/kernel_page_tables
Appendix: memory address translation
  // for kernel mapping only
  phys_addr = virt_to_phys(virt_addr);
  virt_addr = phys_to_virt(phys_addr);
  bus_addr  = virt_to_bus(virt_addr);
  virt_addr = bus_to_virt(bus_addr);
  #define virt_to_bus virt_to_phys
  #define bus_to_virt phys_to_virt
  void *phys_to_virt(phys_addr_t address) { return __va(address); }
  #define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
  phys_addr_t virt_to_phys(volatile void *address) { return __pa(address); }
  #define __pa(x) __phys_addr((unsigned long)(x))
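For addresses inside the kernel's direct mapping, the conversion above is a constant offset: add PAGE_OFFSET to go from physical to virtual, subtract it to go back. The sketch below models only that case in user space with plain integers. The value 0xffff880000000000 is taken from the start of the Low Kernel Mapping in the dump and is an assumption about the kernel being examined. The model_ function names are invented, and the kernel's real __phys_addr also handles addresses in the kernel image mapping, which this sketch ignores.

```c
#include <stdint.h>

/* Assumed start of the direct mapping (Low Kernel Mapping in the dump). */
#define MODEL_PAGE_OFFSET 0xffff880000000000ULL

/* Model of __va(): physical address -> direct-mapped virtual address. */
static uint64_t model_phys_to_virt(uint64_t phys)
{
    return phys + MODEL_PAGE_OFFSET;
}

/* Model of __pa() for direct-mapped addresses only. */
static uint64_t model_virt_to_phys(uint64_t virt)
{
    return virt - MODEL_PAGE_OFFSET;
}
```

Under this assumption, physical address 0x99000 corresponds to virtual address 0xffff880000099000, the second range listed under Low Kernel Mapping.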
cr2 = read_cr2(); cr3 = read_cr3(); https://elixir.bootlin.com/linux/v4.0/source/arch/x86/mm/dump_pagetables.c
Control registers
  https://en.wikipedia.org/wiki/Control_register
  We use only CR2 and CR3.
MMU-related registers: CR2 and CR3
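CR2 holds the linear address that caused the most recent page fault, and CR3 holds the physical address of the top-level page table together with a few control bits in its low 12 bits. Both registers can be read only in kernel mode, so the sketch below works on values passed in as plain integers. The function names and the sample values in the usage note are made up. The mask assumes physical addresses of at most 52 bits.

```c
#include <stdint.h>

/* Bits 12..51 of CR3: physical base of the top-level table (the pgd). */
#define CR3_ADDR_MASK 0x000ffffffffff000ULL
#define PAGE_MASK_4K  (~0xfffULL)

/* Hypothetical helper: physical address of the pgd from a raw CR3 value. */
static uint64_t cr3_to_pgd_phys(uint64_t cr3)
{
    return cr3 & CR3_ADDR_MASK;
}

/* Hypothetical helper: start of the 4 KB page containing the
 * faulting address read from CR2. */
static uint64_t cr2_to_fault_page(uint64_t cr2)
{
    return cr2 & PAGE_MASK_4K;
}
```

For instance, a CR3 value of 0x1c0e018 gives a pgd base of 0x1c0e000, and a CR2 value of 0x7ffd1234abcd lies in the page starting at 0x7ffd1234a000.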
4KB page
2MB page
1GB page
CR3 & PTE
format of 4KB page table entry
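The flag columns in the kernel_page_tables dump (RW or ro, GLB, NX or x) come straight from bits of the page table entry. The sketch below decodes a raw 64-bit entry for a 4 KB page into a string in a similar style. The bit positions follow the x86-64 entry format (bit 0 present, bit 1 writable, bit 8 global, bit 63 no-execute). The function names and the simplified output format are choices made for this example, not the kernel's.

```c
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT   (1ULL << 0)  /* P:   entry is valid          */
#define PTE_RW        (1ULL << 1)  /* R/W: writable if set         */
#define PTE_GLOBAL    (1ULL << 8)  /* G:   global translation      */
#define PTE_NX        (1ULL << 63) /* XD:  no-execute if set       */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* bits 12..51: frame  */

/* Hypothetical helper: physical address of the 4 KB frame. */
static uint64_t pte_frame(uint64_t pte)
{
    return pte & PTE_ADDR_MASK;
}

/* Hypothetical helper: writes e.g. "RW GLB NX" or "ro GLB x" into buf.
 * A non-present entry produces an empty string. */
static void pte_flags(uint64_t pte, char *buf, size_t len)
{
    if (!(pte & PTE_PRESENT)) {
        snprintf(buf, len, "%s", "");
        return;
    }
    snprintf(buf, len, "%s%s %s",
             (pte & PTE_RW) ? "RW" : "ro",
             (pte & PTE_GLOBAL) ? " GLB" : "",
             (pte & PTE_NX) ? "NX" : "x");
}
```

An entry value of 0x8000000000099163 decodes to frame 0x99000 with flags "RW GLB NX", and 0x9a161 decodes to frame 0x9a000 with flags "ro GLB x". Both values were constructed for this example.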
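About the x86-64 paging mechanism
  For more detail, see the Intel technical manual, Volume 3, Chapter 4.5: https://goo.gl/4w8qxr
  When reading the document, pay attention to its date, marked in red in the figure on the right.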
Default fault handlers
  do_anonymous_page: no page and no file
  do_linear_fault: vm_ops registered?
  do_swap_page: page backed by swap
  do_nonlinear_fault: page backed by file
  do_wp_page: write protected page (CoW)
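The list above can be read as a decision tree over the state of the faulting page table entry. The sketch below is a simplified user-space model of that decision, loosely following the order of checks in older kernels that still have do_linear_fault and do_nonlinear_fault. The struct fault_state, the enum, and pick_handler are invented for this illustration. Real kernels check more conditions and the details differ between versions.

```c
#include <stdbool.h>

enum fault_handler {
    HANDLER_NONE,        /* nothing to do in this model          */
    DO_ANONYMOUS_PAGE,   /* no page and no file                  */
    DO_LINEAR_FAULT,     /* vm_ops registered                    */
    DO_NONLINEAR_FAULT,  /* page backed by file                  */
    DO_SWAP_PAGE,        /* page backed by swap                  */
    DO_WP_PAGE           /* write to a write-protected page, CoW */
};

/* Hypothetical summary of what the fault path knows about the entry. */
struct fault_state {
    bool pte_present;  /* entry maps a page that is in memory       */
    bool pte_none;     /* entry is completely empty                 */
    bool pte_file;     /* non-present entry refers to a file offset */
    bool vma_has_ops;  /* the memory area registered vm_ops         */
    bool write_access; /* the faulting access was a write           */
    bool pte_writable; /* entry allows writes                       */
};

static enum fault_handler pick_handler(const struct fault_state *s)
{
    if (!s->pte_present) {
        if (s->pte_none)
            return s->vma_has_ops ? DO_LINEAR_FAULT : DO_ANONYMOUS_PAGE;
        if (s->pte_file)
            return DO_NONLINEAR_FAULT;
        return DO_SWAP_PAGE;
    }
    if (s->write_access && !s->pte_writable)
        return DO_WP_PAGE;
    return HANDLER_NONE;
}
```

In this model, a first touch of an anonymous mapping (empty entry, no vm_ops) selects do_anonymous_page, and a write to a present but read-only entry selects do_wp_page.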