Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'futex-fixes' (futex fixes from Thomas Gleixner)

Merge futex fixes from Thomas Gleixner:
"So with more awake and less futex wreckaged brain, I went through my
list of points again and came up with the following 4 patches.

1) Prevent pi requeueing on the same futex

I kept Kees' check for uaddr1 == uaddr2 as an early check for private
futexes and added a key comparison to both futex_requeue and
futex_wait_requeue_pi.

Sebastian, sorry for the confusion yesterday night. I really
misunderstood your question.

You are right the check is pointless for shared futexes where the
same physical address is mapped to two different virtual addresses.

2) Sanity check atomic acquisition in futex_lock_pi_atomic

That's basically what Darren suggested.

I just simplified it to use futex_top_waiter() to find kernel
internal state. If state is found return -EINVAL and do not bother
to fix up the user space variable. It's corrupted already.

3) Ensure state consistency in futex_unlock_pi

The code is silly versus the owner died bit. There is no point to
preserve it on unlock when the user space thread owns the futex.

What's worse is that it does not update the user space value when
the owner died bit is set. So the kernel itself creates observable
inconsistency.

Another "optimization" is to retry an atomic unlock. That's
pointless as in a sane environment user space would not call into
that code if it could have unlocked it atomically. So we always
check whether there is kernel state around and only if there is
none, we do the unlock by setting the user space value to 0.

4) Sanitize lookup_pi_state

lookup_pi_state is ambiguous about TID == 0 in the user space value.

This can be a valid state even if there is kernel state on this
uaddr, but we miss a few corner case checks.

I tried to come up with a smaller solution hacking the checks into
the current cruft, but it turned out to be ugly as hell and I got
more confused than I was before. So I rewrote the sanity checks
along the state documentation with awful lots of commentary"
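
Point 1 of the message says the uaddr1 == uaddr2 check is pointless for shared futexes, because the same memory can be mapped at two different virtual addresses. The following user-space sketch illustrates only that premise; it does not call futex() at all. The helper name map_same_word_twice() and the use of a tmpfile() backing file are assumptions made for this illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first page of one temporary file twice
 * with MAP_SHARED. On success, *first and *second are two different
 * virtual addresses that refer to the same underlying memory.
 * Returns 0 on success, -1 on failure.
 */
int map_same_word_twice(uint32_t **first, uint32_t **second)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *backing = tmpfile();
	void *p1, *p2;

	if (!backing || page <= 0) {
		if (backing)
			fclose(backing);
		return -1;
	}

	/* Give the file one page of zeroed backing store. */
	if (ftruncate(fileno(backing), page) != 0) {
		fclose(backing);
		return -1;
	}

	p1 = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		  fileno(backing), 0);
	p2 = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		  fileno(backing), 0);

	/* The mappings remain valid after the file is closed. */
	fclose(backing);

	if (p1 == MAP_FAILED || p2 == MAP_FAILED) {
		if (p1 != MAP_FAILED)
			munmap(p1, (size_t)page);
		if (p2 != MAP_FAILED)
			munmap(p2, (size_t)page);
		return -1;
	}

	*first = p1;
	*second = p2;
	return 0;
}
```

A store through one pointer is visible through the other although the two pointer values differ, so a pointer comparison alone cannot tell that both refer to the same futex word. That is why the patch also compares the futex keys with match_futex().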

* emailed patches from Thomas Gleixner <tglx@linutronix.de>:
futex: Make lookup_pi_state more robust
futex: Always cleanup owner tid in unlock_pi
futex: Validate atomic acquisition in futex_lock_pi_atomic()
futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
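
The patches reason about three parts of the user space futex value: the owner TID, the waiters bit and the owner died bit. As a reading aid, here is a small sketch that splits a value into those parts. The three constants are copied from the kernel's uapi linux/futex.h; the struct futex_word_parts type and both helper functions are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values defined in the uapi <linux/futex.h>. */
#define FUTEX_WAITERS		0x80000000u
#define FUTEX_OWNER_DIED	0x40000000u
#define FUTEX_TID_MASK		0x3fffffffu

/* Hypothetical container for the decoded user space value. */
struct futex_word_parts {
	uint32_t tid;		/* owner TID, 0 means unowned */
	bool waiters;		/* FUTEX_WAITERS bit */
	bool owner_died;	/* FUTEX_OWNER_DIED bit */
};

/* Split a PI futex user space value into its three components. */
struct futex_word_parts decode_futex_word(uint32_t uval)
{
	struct futex_word_parts parts;

	parts.tid = uval & FUTEX_TID_MASK;
	parts.waiters = (uval & FUTEX_WAITERS) != 0;
	parts.owner_died = (uval & FUTEX_OWNER_DIED) != 0;
	return parts;
}

/*
 * Mirrors the new condition in futex_unlock_pi():
 * !(uval & ~FUTEX_TID_MASK), i.e. neither the waiters bit nor the
 * owner died bit is set.
 */
bool no_state_bits_set(uint32_t uval)
{
	return !(uval & ~FUTEX_TID_MASK);
}
```

For example, the value 0x80000000 decodes to TID 0 with the waiters bit set, which is the corrupted value mentioned in one of the comments removed by the lookup_pi_state patch.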

+160 -53
kernel/futex.c
···
 	raw_spin_unlock_irq(&curr->pi_lock);
 }
 
+/*
+ * We need to check the following states:
+ *
+ *      Waiter | pi_state | pi->owner | uTID      | uODIED | ?
+ *
+ * [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
+ * [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid
+ *
+ * [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid
+ *
+ * [4]  Found  | Found    | NULL      | 0         | 1      | Valid
+ * [5]  Found  | Found    | NULL      | >0        | 1      | Invalid
+ *
+ * [6]  Found  | Found    | task      | 0         | 1      | Valid
+ *
+ * [7]  Found  | Found    | NULL      | Any       | 0      | Invalid
+ *
+ * [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
+ * [9]  Found  | Found    | task      | 0         | 0      | Invalid
+ * [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid
+ *
+ * [1]	Indicates that the kernel can acquire the futex atomically. We
+ *	came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
+ *
+ * [2]	Valid, if TID does not belong to a kernel thread. If no matching
+ *	thread is found then it indicates that the owner TID has died.
+ *
+ * [3]	Invalid. The waiter is queued on a non PI futex
+ *
+ * [4]	Valid state after exit_robust_list(), which sets the user space
+ *	value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
+ *
+ * [5]	The user space value got manipulated between exit_robust_list()
+ *	and exit_pi_state_list()
+ *
+ * [6]	Valid state after exit_pi_state_list() which sets the new owner in
+ *	the pi_state but cannot access the user space value.
+ *
+ * [7]	pi_state->owner can only be NULL when the OWNER_DIED bit is set.
+ *
+ * [8]	Owner and user space value match
+ *
+ * [9]	There is no transient state which sets the user space TID to 0
+ *	except exit_robust_list(), but this is indicated by the
+ *	FUTEX_OWNER_DIED bit. See [4]
+ *
+ * [10] There is no transient state which leaves owner and user space
+ *	TID out of sync.
+ */
 static int
 lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
-		union futex_key *key, struct futex_pi_state **ps,
-		struct task_struct *task)
+		union futex_key *key, struct futex_pi_state **ps)
 {
 	struct futex_pi_state *pi_state = NULL;
 	struct futex_q *this, *next;
···
 	plist_for_each_entry_safe(this, next, &hb->chain, list) {
 		if (match_futex(&this->key, key)) {
 			/*
-			 * Another waiter already exists - bump up
-			 * the refcount and return its pi_state:
+			 * Sanity check the waiter before increasing
+			 * the refcount and attaching to it.
 			 */
 			pi_state = this->pi_state;
 			/*
-			 * Userspace might have messed up non-PI and PI futexes
+			 * Userspace might have messed up non-PI and
+			 * PI futexes [3]
 			 */
 			if (unlikely(!pi_state))
 				return -EINVAL;
···
 			WARN_ON(!atomic_read(&pi_state->refcount));
 
 			/*
-			 * When pi_state->owner is NULL then the owner died
-			 * and another waiter is on the fly. pi_state->owner
-			 * is fixed up by the task which acquires
-			 * pi_state->rt_mutex.
-			 *
-			 * We do not check for pid == 0 which can happen when
-			 * the owner died and robust_list_exit() cleared the
-			 * TID.
+			 * Handle the owner died case:
 			 */
-			if (pid && pi_state->owner) {
+			if (uval & FUTEX_OWNER_DIED) {
 				/*
-				 * Bail out if user space manipulated the
-				 * futex value.
+				 * exit_pi_state_list sets owner to NULL and
+				 * wakes the topmost waiter. The task which
+				 * acquires the pi_state->rt_mutex will fixup
+				 * owner.
 				 */
-				if (pid != task_pid_vnr(pi_state->owner))
+				if (!pi_state->owner) {
+					/*
+					 * No pi state owner, but the user
+					 * space TID is not 0. Inconsistent
+					 * state. [5]
+					 */
+					if (pid)
+						return -EINVAL;
+					/*
+					 * Take a ref on the state and
+					 * return. [4]
+					 */
+					goto out_state;
+				}
+
+				/*
+				 * If TID is 0, then either the dying owner
+				 * has not yet executed exit_pi_state_list()
+				 * or some waiter acquired the rtmutex in the
+				 * pi state, but did not yet fixup the TID in
+				 * user space.
+				 *
+				 * Take a ref on the state and return. [6]
+				 */
+				if (!pid)
+					goto out_state;
+			} else {
+				/*
+				 * If the owner died bit is not set,
+				 * then the pi_state must have an
+				 * owner. [7]
+				 */
+				if (!pi_state->owner)
 					return -EINVAL;
 			}
 
 			/*
-			 * Protect against a corrupted uval. If uval
-			 * is 0x80000000 then pid is 0 and the waiter
-			 * bit is set. So the deadlock check in the
-			 * calling code has failed and we did not fall
-			 * into the check above due to !pid.
+			 * Bail out if user space manipulated the
+			 * futex value. If pi state exists then the
+			 * owner TID must be the same as the user
+			 * space TID. [9/10]
 			 */
-			if (task && pi_state->owner == task)
-				return -EDEADLK;
+			if (pid != task_pid_vnr(pi_state->owner))
+				return -EINVAL;
 
+		out_state:
 			atomic_inc(&pi_state->refcount);
 			*ps = pi_state;
-
 			return 0;
 		}
 	}
 
 	/*
 	 * We are the first waiter - try to look up the real owner and attach
-	 * the new pi_state to it, but bail out when TID = 0
+	 * the new pi_state to it, but bail out when TID = 0 [1]
 	 */
 	if (!pid)
 		return -ESRCH;
···
 		return ret;
 	}
 
+	/*
+	 * No existing pi state. First waiter. [2]
+	 */
 	pi_state = alloc_pi_state();
 
 	/*
···
 		return -EDEADLK;
 
 	/*
-	 * Surprise - we got the lock. Just return to userspace:
+	 * Surprise - we got the lock, but we do not trust user space at all.
 	 */
-	if (unlikely(!curval))
-		return 1;
+	if (unlikely(!curval)) {
+		/*
+		 * We verify whether there is kernel state for this
+		 * futex. If not, we can safely assume, that the 0 ->
+		 * TID transition is correct. If state exists, we do
+		 * not bother to fixup the user space state as it was
+		 * corrupted already.
+		 */
+		return futex_top_waiter(hb, key) ? -EINVAL : 1;
+	}
 
 	uval = curval;
 
···
 	 * We dont have the lock. Look up the PI state (or create it if
 	 * we are the first waiter):
 	 */
-	ret = lookup_pi_state(uval, hb, key, ps, task);
+	ret = lookup_pi_state(uval, hb, key, ps);
 
 	if (unlikely(ret)) {
 		switch (ret) {
···
 	struct task_struct *new_owner;
 	struct futex_pi_state *pi_state = this->pi_state;
 	u32 uninitialized_var(curval), newval;
+	int ret = 0;
 
 	if (!pi_state)
 		return -EINVAL;
···
 		new_owner = this->task;
 
 	/*
-	 * We pass it to the next owner. (The WAITERS bit is always
-	 * kept enabled while there is PI state around. We must also
-	 * preserve the owner died bit.)
+	 * We pass it to the next owner. The WAITERS bit is always
+	 * kept enabled while there is PI state around. We cleanup the
+	 * owner died bit, because we are the owner.
 	 */
-	if (!(uval & FUTEX_OWNER_DIED)) {
-		int ret = 0;
+	newval = FUTEX_WAITERS | task_pid_vnr(new_owner);
 
-		newval = FUTEX_WAITERS | task_pid_vnr(new_owner);
-
-		if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
-			ret = -EFAULT;
-		else if (curval != uval)
-			ret = -EINVAL;
-		if (ret) {
-			raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
-			return ret;
-		}
+	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
+		ret = -EFAULT;
+	else if (curval != uval)
+		ret = -EINVAL;
+	if (ret) {
+		raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
+		return ret;
 	}
 
 	raw_spin_lock_irq(&pi_state->owner->pi_lock);
···
 
 	if (requeue_pi) {
 		/*
+		 * Requeue PI only works on two distinct uaddrs. This
+		 * check is only valid for private futexes. See below.
+		 */
+		if (uaddr1 == uaddr2)
+			return -EINVAL;
+
+		/*
 		 * requeue_pi requires a pi_state, try to allocate it now
 		 * without any locks in case it fails.
 		 */
···
 			    requeue_pi ? VERIFY_WRITE : VERIFY_READ);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
+
+	/*
+	 * The check above which compares uaddrs is not sufficient for
+	 * shared futexes. We need to compare the keys:
+	 */
+	if (requeue_pi && match_futex(&key1, &key2)) {
+		ret = -EINVAL;
+		goto out_put_keys;
+	}
 
 	hb1 = hash_futex(&key1);
 	hb2 = hash_futex(&key2);
···
 			 * rereading and handing potential crap to
 			 * lookup_pi_state.
 			 */
-			ret = lookup_pi_state(ret, hb2, &key2, &pi_state, NULL);
+			ret = lookup_pi_state(ret, hb2, &key2, &pi_state);
 		}
 
 		switch (ret) {
···
 	/*
 	 * To avoid races, try to do the TID -> 0 atomic transition
 	 * again. If it succeeds then we can return without waking
-	 * anyone else up:
+	 * anyone else up. We only try this if neither the waiters nor
+	 * the owner died bit are set.
 	 */
-	if (!(uval & FUTEX_OWNER_DIED) &&
+	if (!(uval & ~FUTEX_TID_MASK) &&
 	    cmpxchg_futex_value_locked(&uval, uaddr, vpid, 0))
 		goto pi_faulted;
 	/*
···
 	/*
 	 * No waiters - kernel unlocks the futex:
 	 */
-	if (!(uval & FUTEX_OWNER_DIED)) {
-		ret = unlock_futex_pi(uaddr, uval);
-		if (ret == -EFAULT)
-			goto pi_faulted;
-	}
+	ret = unlock_futex_pi(uaddr, uval);
+	if (ret == -EFAULT)
+		goto pi_faulted;
 
 out_unlock:
 	spin_unlock(&hb->lock);
···
 	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
 		goto out_key2;
+
+	/*
+	 * The check above which compares uaddrs is not sufficient for
+	 * shared futexes. We need to compare the keys:
+	 */
+	if (match_futex(&q.key, &key2)) {
+		ret = -EINVAL;
+		goto out_put_keys;
+	}
 
 	/* Queue the futex_q, drop the hb lock, wait for wakeup. */
 	futex_wait_queue_me(hb, &q, to);
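
The state table documented above lookup_pi_state() can be exercised outside the kernel with a small model. The sketch below is not the kernel code: it assumes that the pi_state owner is represented by its TID, with 0 standing in for a NULL owner, and it stops at "no waiter found" instead of performing the owner lookup that the kernel does for states [1] and [2]. The function name check_pi_state() and the enum are hypothetical; the constants are copied from the uapi linux/futex.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values defined in the uapi <linux/futex.h>. */
#define FUTEX_WAITERS		0x80000000u
#define FUTEX_OWNER_DIED	0x40000000u
#define FUTEX_TID_MASK		0x3fffffffu

/* Hypothetical result codes of the model (errors are negative errno). */
enum pi_check {
	PI_ATTACH = 0,		/* valid: take a ref on the existing pi_state */
	PI_NO_WAITER = 1,	/* [1]/[2]: no waiter, owner lookup would follow */
};

/*
 * Model of the sanity checks for states [3] to [10].
 * owner_tid == 0 models pi_state->owner == NULL (an assumption of
 * this sketch; the kernel compares against task_pid_vnr(owner)).
 */
int check_pi_state(bool waiter_found, bool has_pi_state,
		   uint32_t owner_tid, uint32_t uval)
{
	uint32_t pid = uval & FUTEX_TID_MASK;

	if (!waiter_found)
		return PI_NO_WAITER;

	/* [3] Waiter queued on a non PI futex. */
	if (!has_pi_state)
		return -EINVAL;

	if (uval & FUTEX_OWNER_DIED) {
		if (!owner_tid) {
			/* [5] No owner but user space TID != 0, else [4]. */
			return pid ? -EINVAL : PI_ATTACH;
		}
		/* [6] Owner set, user space TID not yet fixed up. */
		if (!pid)
			return PI_ATTACH;
	} else {
		/* [7] Without the owner died bit an owner must exist. */
		if (!owner_tid)
			return -EINVAL;
	}

	/* [9]/[10] Owner TID and user space TID must match, else [8]. */
	if (pid != owner_tid)
		return -EINVAL;

	return PI_ATTACH;
}
```

Calling check_pi_state(true, true, 1234, FUTEX_WAITERS | 1234) corresponds to row [8] and yields PI_ATTACH, while the same call with a user space TID of 4321 corresponds to row [10] and yields -EINVAL.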