mm: fix race in COW logic · tjh.dev/kernel@945754a

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


mm: fix race in COW logic

There is a race in the COW logic. It contains a shortcut to avoid the
COW and reuse the page if we have the sole reference on the page;
however, it is possible to have two racing do_wp_page()ers with one
causing the other to mistakenly believe it is safe to take the shortcut
when it is not. This could lead to data corruption.

Process 1 and process 2 each have a wp pte of the same anon page (i.e.
one forked the other). The page's mapcount is 2. Then they both
attempt to write to it around the same time...

proc1                             proc2 thr1                        proc2 thr2
CPU0                              CPU1                              CPU3
do_wp_page()                      do_wp_page()
                                   trylock_page()
                                    can_share_swap_page()
                                     load page mapcount (==2)
                                    reuse = 0
                                   pte unlock
                                   copy page to new_page
                                   pte lock
                                   page_remove_rmap(page);
 trylock_page()
  can_share_swap_page()
   load page mapcount (==1)
  reuse = 1
 ptep_set_access_flags (allow W)

write private key into page
                                                                    read from page
                                  ptep_clear_flush()
                                  set_pte_at(pte of new_page)

Fix this by moving the page_remove_rmap of the old page after the pte
clear and flush. Potentially the entire branch could be moved down
here, but in order to stay consistent, I won't (should probably move all
the *_mm_counter stuff with one patch).
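
The ordering problem can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, a page is reduced to a mapcount and one int of data, and locking and TLB behaviour are not modelled at all. It is not kernel code; it only shows why decrementing the mapcount before the pte switch lets the other process take the reuse shortcut.

```c
#include <stdbool.h>

/* Toy stand-ins for struct page and a pte; hypothetical, not kernel types. */
struct toy_page {
	int mapcount;	/* number of ptes mapping this page */
	int data;	/* page contents, reduced to one int */
};

struct toy_pte {
	struct toy_page *page;
	bool writable;
};

enum toy_order {
	TOY_RMAP_BEFORE_PTE_SWITCH,	/* ordering before this patch */
	TOY_RMAP_AFTER_PTE_SWITCH,	/* ordering after this patch */
};

/* The reuse shortcut: only safe when ours is the sole mapping. */
bool toy_can_reuse(const struct toy_page *page)
{
	return page->mapcount == 1;
}

/*
 * Replay the interleaving from the commit message in one thread.
 * Returns true if proc2 thr2 could observe proc1's private write.
 */
bool toy_replay(enum toy_order order)
{
	struct toy_page old_page = { .mapcount = 2, .data = 0 };
	struct toy_page new_page = { .mapcount = 0, .data = 0 };
	struct toy_pte pte1 = { .page = &old_page, .writable = false };
	struct toy_pte pte2 = { .page = &old_page, .writable = false };
	const int secret = 42;
	bool leaked = false;

	/* proc2 thr1: mapcount == 2, so it must copy rather than reuse. */
	if (toy_can_reuse(&old_page))
		return false;	/* not reached with mapcount == 2 */
	new_page.data = old_page.data;
	if (order == TOY_RMAP_BEFORE_PTE_SWITCH)
		old_page.mapcount--;	/* page_remove_rmap() too early */

	/* proc1: faults while pte2 still points at old_page. */
	if (toy_can_reuse(&old_page)) {
		pte1.writable = true;		/* ptep_set_access_flags (allow W) */
		pte1.page->data = secret;	/* write private key into page */
		/* proc2 thr2: reads through the stale pte2. */
		leaked = (pte2.page->data == secret);
	}
	/* Otherwise proc1 would do its own copy; omitted in this model. */

	/* proc2 thr1: ptep_clear_flush() + set_pte_at(pte of new_page). */
	pte2.page = &new_page;
	pte2.writable = true;
	new_page.mapcount = 1;
	if (order == TOY_RMAP_AFTER_PTE_SWITCH)
		old_page.mapcount--;	/* page_remove_rmap() after the switch */

	return leaked;
}
```

In this model, toy_replay(TOY_RMAP_BEFORE_PTE_SWITCH) reports the leak, while toy_replay(TOY_RMAP_AFTER_PTE_SWITCH) does not, because proc1 still sees a mapcount of 2 during the window and cannot take the shortcut.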

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Nick Piggin and committed by Linus Torvalds 18 years ago · 945754a1 672ca28e

+26 -1

1 changed file


mm/memory.c

···
 	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
 	if (likely(pte_same(*page_table, orig_pte))) {
 		if (old_page) {
-			page_remove_rmap(old_page, vma);
 			if (!PageAnon(old_page)) {
 				dec_mm_counter(mm, file_rss);
 				inc_mm_counter(mm, anon_rss);
···
 		update_mmu_cache(vma, address, entry);
 		lru_cache_add_active(new_page);
 		page_add_new_anon_rmap(new_page, vma, address);
+
+		if (old_page) {
+			/*
+			 * Only after switching the pte to the new page may
+			 * we remove the mapcount here. Otherwise another
+			 * process may come and find the rmap count decremented
+			 * before the pte is switched to the new page, and
+			 * "reuse" the old page writing into it while our pte
+			 * here still points into it and can be read by other
+			 * threads.
+			 *
+			 * The critical issue is to order this
+			 * page_remove_rmap with the ptp_clear_flush above.
+			 * Those stores are ordered by (if nothing else,)
+			 * the barrier present in the atomic_add_negative
+			 * in page_remove_rmap.
+			 *
+			 * Then the TLB flush in ptep_clear_flush ensures that
+			 * no process can access the old page before the
+			 * decremented mapcount is visible. And the old page
+			 * cannot be reused until after the decremented
+			 * mapcount is visible. So transitively, TLBs to
+			 * old page will be flushed before it can be reused.
+			 */
+			page_remove_rmap(old_page, vma);
+		}
 
 		/* Free the old page.. */
 		new_page = old_page;
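
The comment in the patch relies on the barrier in the atomic_add_negative inside page_remove_rmap. As a rough userspace illustration, the decrement can be sketched with C11 atomics. The assumptions here are mine and not taken from the patch: all `toy_*` names are hypothetical, the counter is stored biased by -1 (so -1 means "no mappings"), and a sequentially consistent `atomic_fetch_add` stands in for the kernel primitive. The TLB flush half of the argument is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a page's map counter; not the kernel's struct page. */
struct toy_rmap_page {
	atomic_int biased_mapcount;	/* mappings - 1, so -1 means unmapped */
};

void toy_rmap_init(struct toy_rmap_page *page, int mappings)
{
	atomic_init(&page->biased_mapcount, mappings - 1);
}

/*
 * Add i to *v and report whether the result is negative. The default
 * seq_cst read-modify-write orders it against surrounding atomic
 * accesses, which is the property the patch comment depends on.
 */
bool toy_atomic_add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;
}

/* Number of mappings as seen by a reuse check. */
int toy_page_mapcount(struct toy_rmap_page *page)
{
	return atomic_load(&page->biased_mapcount) + 1;
}

/* Drop one mapping; returns true when the last mapping went away. */
bool toy_page_remove_rmap(struct toy_rmap_page *page)
{
	return toy_atomic_add_negative(-1, &page->biased_mapcount);
}
```

With two initial mappings, the first toy_page_remove_rmap() call leaves a mapcount of 1 and returns false; only from that point on can a racing reuse check succeed, which is why the patch issues the call after the pte has been cleared and flushed.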
