switch-linux/kernel
Thomas Gleixner cdf71a10c7 futex: Prevent stale futex owner when interrupted/timeout
Roland Westrelin did a great analysis of a long standing thinko in the
return path of futex_lock_pi.

While we fixed the lock steal case long ago, which was easy to trigger,
we never had a test case which exposed this problem and stupidly never
thought about the reverse lock stealing scenario and the return to user
space with a stale state.

When a blocked task returns from rt_mutex_timed_lock() without holding
the rt_mutex (due to a signal or timeout) while the task holding the
futex is releasing it and assigning ownership to the returning task, a
third task can acquire the rt_mutex before the returning task's final
rt_mutex_trylock() happens under the futex hash bucket lock. The
returning task then returns to user space with ETIMEDOUT or EINTR, but
the user space futex value still names it as the owner. The task which
acquired the rt_mutex fixes the user space futex value right after the
hash bucket lock has been released by the returning task, but for a
short period of time the user space value is wrong.

A detailed description is available at:

   https://bugzilla.redhat.com/show_bug.cgi?id=400541

The fix for this is the same as we do when the rt_mutex was acquired by
a higher priority task via lock stealing from the designated new owner.
In that case we already fix the user space value and the internal
pi_state up before we return. This mechanism can be used to fix up the
above corner case as well. When the returning task, which failed to
acquire the rt_mutex, notices that it is the designated owner of the
futex, then it fixes up the stale user space value and the pi_state,
before returning to user space. This happens with the futex hash bucket
lock held, so the task which acquired the rt_mutex is guaranteed to be
blocked on the hash bucket lock. We can access the rt_mutex owner, which
gives us the pid of the new owner, safely here as the owner is not able
to modify (release) it while waiting on the hash bucket lock.

Rename the "curr" argument of fixup_pi_state_owner() to "newowner" to
avoid confusion with current and add the check for the stale state into
the failure path of rt_mutex_trylock() in the return path of
futex_lock_pi(). If the situation is detected, use
fixup_pi_state_owner() to assign everything to the owner of the
rt_mutex.
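
The return path described above can be sketched as a small user space
model. This is an illustration only, not the code in kernel/futex.c:
struct pi_model and the model_*() helpers are hypothetical names, the
rt_mutex and pi_state are reduced to plain TIDs, and the hash bucket
lock is assumed to be held by the caller. Only the FUTEX_* bit values
match <linux/futex.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit layout of the user space futex word (values as in <linux/futex.h>). */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Hypothetical, simplified stand-in for the state of one PI futex. */
struct pi_model {
    uint32_t uval;          /* user space futex word */
    uint32_t pi_owner;      /* TID recorded as pi_state owner */
    uint32_t rtmutex_owner; /* TID holding the rt_mutex, 0 if free */
};

/* Model of rt_mutex_trylock(): succeeds only if the rt_mutex is free. */
static int model_trylock(struct pi_model *m, uint32_t tid)
{
    if (m->rtmutex_owner != 0)
        return 0;
    m->rtmutex_owner = tid;
    return 1;
}

/* Model of fixup_pi_state_owner(): point uval and pi_state at newowner. */
static void model_fixup_owner(struct pi_model *m, uint32_t newowner)
{
    assert((newowner & ~FUTEX_TID_MASK) == 0);
    m->uval = (m->uval & FUTEX_OWNER_DIED) | FUTEX_WAITERS | newowner;
    m->pi_owner = newowner;
}

/*
 * Model of the return path of the blocked task `self`.
 * `ret` is the result of the timed lock attempt: 0, -ETIMEDOUT or -EINTR.
 * Assumption: the caller holds the hash bucket lock, so the rt_mutex
 * owner cannot change while this runs.
 */
int model_lock_pi_return(struct pi_model *m, uint32_t self, int ret)
{
    /* Final trylock: the lock may have become free in the meantime. */
    if (ret != 0 && model_trylock(m, self))
        ret = 0;

    if (ret == 0) {
        /* We hold the rt_mutex; fix the state if we stole the lock. */
        if (m->pi_owner != self)
            model_fixup_owner(m, self);
        return 0;
    }

    /*
     * We failed to get the rt_mutex but are still the designated
     * owner: hand everything over to the real rt_mutex owner so no
     * stale TID is visible in user space after we return.
     */
    if (m->pi_owner == self)
        model_fixup_owner(m, m->rtmutex_owner);
    return ret;
}
```

For example, if task 100 is the designated owner, task 300 already
holds the rt_mutex and task 100 times out, the model returns -ETIMEDOUT
and leaves both the futex word and the pi_state owner pointing at 300.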

Pointed-out-and-tested-by: Roland Westrelin <roland.westrelin@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-08 16:21:39 -08:00
..
irq genirq: revert lazy irq disable for simple irqs 2007-12-18 18:05:58 +01:00
power hibernate: fix lockdep report 2007-11-14 18:45:43 -08:00
time clockevents: fix reprogramming decision in oneshot broadcast 2007-12-18 18:05:58 +01:00
.gitignore
acct.c acct: real_parent ppid 2008-01-07 14:55:37 -08:00
audit.c
audit.h
audit_tree.c
auditfilter.c
auditsc.c
capability.c
cgroup.c
cgroup_debug.c
compat.c
configs.c
cpu.c
cpuset.c
delayacct.c
dma.c
exec_domain.c
exit.c wait_task_stopped(): pass correct exit_code to wait_noreap_copyout() 2007-11-29 09:24:55 -08:00
extable.c
fork.c fix clone(CLONE_NEWPID) 2007-12-05 09:21:18 -08:00
futex.c futex: Prevent stale futex owner when interrupted/timeout 2008-01-08 16:21:39 -08:00
futex_compat.c
hrtimer.c hrtimers: avoid overflow for large relative timeouts 2007-12-07 19:16:17 +01:00
itimer.c
kallsyms.c FRV: fix the extern declaration of kallsyms_num_syms 2007-11-29 09:24:54 -08:00
Kconfig.hz
Kconfig.instrumentation Tiny clean-up of OPROFILE/KPROBES configuration 2007-12-06 09:41:12 -08:00
Kconfig.preempt
kexec.c vmcoreinfo: add the array length of "free_list" for filtering free pages 2008-01-08 16:10:36 -08:00
kfifo.c
kmod.c
kprobes.c
ksysfs.c
kthread.c
latency.c
lockdep.c lockdep: make cli/sti annotation warnings clearer 2007-12-07 19:02:47 +01:00
lockdep_internals.h
lockdep_proc.c
Makefile
marker.c
module.c module: fix and elaborate comments 2007-11-19 11:20:43 +11:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
panic.c debug: add end-of-oops marker 2007-12-20 15:01:17 +01:00
params.c Modules: fix memory leak of module names 2007-12-22 23:09:05 -08:00
pid.c
posix-cpu-timers.c
posix-timers.c
printk.c [SERIAL]: Fix section mismatches in Sun serial console drivers. 2007-12-29 01:19:49 -08:00
profile.c
ptrace.c Fix kernel/ptrace.c compile problem (missing "may_attach()") 2008-01-02 13:48:27 -08:00
rcupdate.c
rcutorture.c
relay.c
resource.c
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c sched: mark rwsem functions as __sched for wchan/profiling 2007-12-18 15:21:13 +01:00
sched.c sched: touch softlockup watchdog after idling 2007-12-18 15:21:13 +01:00
sched_debug.c sched: fix gcc warnings 2007-12-30 17:24:35 +01:00
sched_fair.c sched: do not hurt SCHED_BATCH on wakeup 2007-12-18 15:21:13 +01:00
sched_idletask.c
sched_rt.c sched: rt: account the cpu time during the tick 2007-12-20 15:01:17 +01:00
sched_stats.h sched: clean up kernel/sched_stat.h 2007-11-28 15:52:56 +01:00
seccomp.c
signal.c
softirq.c
softlockup.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys.c x86: ignore the sys_getcpu() tcache parameter 2007-11-17 16:27:00 +01:00
sys_ni.c
sysctl.c sched: sysctl, proc_dointvec_minmax() expects int values for 2007-12-18 15:21:13 +01:00
sysctl_check.c sysctl: fix ax25 checks 2007-12-17 19:28:17 -08:00
taskstats.c kernel/taskstats.c: fix bogus nlmsg_free() 2007-11-14 18:45:44 -08:00
time.c
timer.c timer: kernel/timer.c section fixes 2007-12-18 18:05:58 +01:00
tsacct.c
uid16.c
user.c sched: don't forget to unlock uids_mutex on error paths 2007-11-26 21:21:49 +01:00
user_namespace.c
utsname.c
utsname_sysctl.c Isolate the UTS namespace's domainname and hostname back 2007-11-29 09:24:53 -08:00
wait.c
workqueue.c