HTC put a hack into the kernel that kills a process once it has triggered more than 10 page faults that raise SIGSEGV!
It was really hard to believe until I discovered this in arch/arm/mm/fault.c:
    void
    __do_user_fault(struct task_struct *tsk, unsigned long addr,
                    unsigned int fsr, unsigned int sig, int code,
                    struct pt_regs *regs)
    {
        struct siginfo si;
        struct task_struct *g, *p, *selected = NULL;

    #ifdef CONFIG_DEBUG_USER
        if (user_debug & UDBG_SEGV) {
            printk(KERN_DEBUG "%s: unhandled page fault (%d) at 0x%08lx, code 0x%03x\n",
                   tsk->comm, sig, addr, fsr);
            show_pte(tsk->mm, addr);
            show_regs(regs);
        }
    #endif

        if (sig == SIGSEGV)
            tsk->segfault_count++;

        if (tsk->segfault_count > 10) {
            tsk->segfault_count = 0;
            printk(KERN_ERR "unhandled page fault at 0x%08lx, code 0x%03x\n",
                   addr, fsr);
            show_pte(tsk->mm, addr);
            show_regs(regs);

            do_each_thread(g, p) {
                task_lock(p);
                if (p == tsk)
                    selected = g;
                task_unlock(p);
            } while_each_thread(g, p);

            if (selected) {
                printk(KERN_ERR "%s: triggered too many segfaults, force killing parent: %s\n",
                       tsk->comm, selected->comm);
                force_sig(SIGKILL, selected);
                return;
            }
        }

        tsk->thread.address = addr;
        tsk->thread.error_code = fsr;
        tsk->thread.trap_no = 14;
        si.si_signo = sig;
        si.si_errno = 0;
        si.si_code = code;
        si.si_addr = (void __user *)addr;
        force_sig_info(sig, &si, tsk);
    }
However, looking at the kernel sources wasn't as easy as it sounds: first, you have to actually get them. One of my coworkers pointed me at this interesting article:
http://www.freedom-to-tinker.com/blog/sjs/htc-willfully-violates-gpl-t-mobiles-new-g2-android-phone
It contains a link to a download location.
My kernel is 2.6.32.21-g1e30168, but for some reason, this doesn't work in Germany.
The entire story started about a month ago, when I discovered some very weird crashes while debugging on my device that nobody else on the team was seeing. It was extremely frustrating and I kept thinking: why me, what am I doing wrong here?
After some investigation, it looked very much like something during variable evaluation was killing the app. We could insert a breakpoint and run to it, but once the app had stopped at the breakpoint and I tried to evaluate some variables, it crashed ... silently, without anything in adb logcat.
I wrote a simple test app and a soft debugger client application (both can be found in the martins-playground module on GitHub) and soon discovered that we were only crashing during single-threaded invokes.
So what's the difference here? Well, there's this bug in SDB: single-stepping isn't always disabled during single-threaded invokes. However, this should only impact performance and never cause the app to actually crash. But it gave me the idea ...
As a next step, I wrote a small native C application called NativeTest which installs a SIGSEGV handler and then creates 100 page faults in a loop.
The same thing can also be accomplished with something like this:
    for (int i = 0; i < 1000; i++) {
        try {
            object o = null;
            o.GetType ();
        } catch {
        }
    }
Compile something like that with Mono for Android and execute it on the device - it should not crash.
Luckily, Mono's JIT already has a feature called "explicit null checks" and I also have a patch for the Soft Debugger to check some variable instead of using page faults for single-step and breakpoint events.
However, I'm not entirely sure whether this really covers all possible scenarios where a page fault would normally be handled gracefully. And seeing something like this in their kernel also makes me a bit worried that there might be other surprises.
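To make the two workarounds a bit more concrete, here is a rough C sketch of the general idea: test a value with an ordinary branch instead of letting the MMU raise a fault. This is not Mono's actual code; `single_step_requested`, `report_step_event`, `sequence_point` and `read_checked` are hypothetical names chosen for this illustration.

```c
#include <stddef.h>

/* Hypothetical flag that a debugger agent would set to request stepping. */
static volatile int single_step_requested = 0;
static int step_events = 0;

/* Hypothetical stand-in for "report a single-step event to the debugger". */
static void report_step_event(void)
{
    step_events++;
}

/* Variable-polling sequence point: a plain load and branch, so no
 * page fault is involved when stepping is enabled. */
static void sequence_point(void)
{
    if (single_step_requested)
        report_step_event();
}

/* Explicit null check: test the pointer before dereferencing it.
 * Returns 0 on success, -1 where a managed runtime would throw. */
static int read_checked(const int *p, int *out)
{
    if (p == NULL)
        return -1;
    *out = *p;
    return 0;
}
```

Both variants trade a little extra work on every access or sequence point for never entering the kernel's fault path, which is why they sidestep the counter. Faults that originate anywhere else are not covered by either of them.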