LibreOffice only works in the terminal: the weirdest GNU/Linux issue I've encountered to date
You’re right Richard, this is your fault too.
First, some context. At work, we have Linux workstations with NFS home folders. This is awesome, since you can move basically seamlessly between computers, but it also brings with it many interesting bugs. This is one of them.
When I tried to open LibreOffice today by double-clicking on a file, it didn’t
work. So I ran LibreOffice from my terminal to see its output and, lo and
behold, it worked just fine. The behaviour was perfectly consistent: started
from a terminal, LibreOffice worked; started any other way (from dmenu, via
i3-msg or from a GUI), it didn’t. All terminals worked, all shells worked.
So I called in my colleagues and we started investigating.
LEdoian (who also helped with most of the
investigation) suggested looking into the .xsession-errors file, but the error
found there was not very helpful:
grep: write error: Permission denied
Error: The debug options --record, --backtrace, --strace, and --valgrind cannot be used together.
Please, use them one by one.
At first I thought that some calling script along the way added these
parameters in response to something in the environment, so I started to dig there.
Eventually, I found the /usr/bin/soffice file, which was responsible for
launching LibreOffice.
The error is generated by the following snippet of code:
if echo "$checks" | grep -q "cc" ; then
    echo "Error: The debug options --record, --backtrace, --strace, and --valgrind cannot be used together."
    echo " Please, use them one by one."
    exit 1;
fi
Let’s see where that $checks variable comes from:
# count number of selected checks; only one is allowed
checks=
EXTRAOPT=
# force the --valgrind option if the VALGRIND variable is set
test -n "$VALGRIND" && EXTRAOPT="--valgrind"
# force the --record option if the RR variable is set
test -n "$RR" && EXTRAOPT="--record"
for arg in "$@" $EXTRAOPT ; do
    case "$arg" in
        --record)
            if which rr >/dev/null 2>&1 ; then
                # smoketest may already be recorded => ignore nested
                RRCHECK="rr record --nested=ignore"
                checks="c$checks"
            else
                echo "Error: Can't find the tool \"rr\", --record option will be ignored."
                exit 1
            fi
            ;;
        --backtrace)
            if which gdb >/dev/null 2>&1 ; then
                GDBTRACECHECK="gdb -nx --command=$sd_prog/gdbtrace --args"
                checks="c$checks"
            else
                echo "Error: Can't find the tool \"gdb\", --backtrace option will be ignored."
                exit 1
            fi
            ;;
        # (other options redacted)
    esac
done
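Stripped of the tool lookups, the loop is just counting: every debug option prepends one c to $checks. Here is a minimal sketch of that idea. count_checks is a made-up helper, not something from soffice, and I'm assuming the redacted --strace and --valgrind branches prepend a c the same way the two branches shown do.

```shell
# Hypothetical helper (not part of soffice): builds the "checks" string
# by prepending one "c" for every debug option found in the arguments.
# Assumption: --strace and --valgrind are counted like --record and --backtrace.
count_checks() {
    checks=
    for arg in "$@"; do
        case "$arg" in
            --record|--backtrace|--strace|--valgrind)
                checks="c$checks"
                ;;
        esac
    done
    printf '%s\n' "$checks"
}

count_checks --calc                  # prints an empty line
count_checks --calc --record         # prints "c"
count_checks --record --backtrace    # prints "cc"
```

A string containing "cc" therefore means that at least two debug options were requested.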
The variable is used, as the error message suggests, to ensure that two or more conflicting options are not set at the same time: each debug option prepends a c, so cc only shows up when at least two of them were given. You can also see that some of the options can be set through environment variables, but there was quite an easy way to check this: I added the following snippet just before the multiple-option check:
echo "--------------- Cut here ---------------"
echo "$checks"
echo "$@"
echo "--------------- Stop cutting here ------"
When I launched localc again, I got the expected output:
--------------- Cut here ---------------
--calc
--------------- Stop cutting here ------
Only at this point did I notice the first line of the original error message, which I managed to overlook:
grep: write error: Permission denied
So I added another set of lines to the debug prints:
echo "--------------- Cut here ---------------"
echo "$checks"
echo "$@"
echo "$checks" | grep "cc"
echo $?
echo "--------------- Stop cutting here ------"
This, again, produced the expected output:
--------------- Cut here ---------------
--calc
grep: write error: Permission denied
2
--------------- Stop cutting here ------
This made me question everything I knew about UNIX. I thought that in a POSIX
shell the if statement evaluates the first branch when the command returns 0
and the second one with all other return codes, but clearly, grep returned 2
and sh still went into the first branch. HUH?
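For reference, that rule is easy to check in isolation. The sketch below uses a made-up helper, branch_for, which runs a command as the condition of an if and reports which branch was taken. For a pipeline, the status that counts is that of the last command, which in the soffice snippet is grep.

```shell
# Hypothetical helper: runs its arguments as the condition of an if
# statement and prints which branch the shell took.
branch_for() {
    if "$@"; then
        echo "then"
    else
        echo "else"
    fi
}

branch_for true              # prints "then"
branch_for false             # prints "else"
branch_for sh -c 'exit 2'    # prints "else"
```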
Undeterred, we started investigating why grep was getting that permission
denied error in the first place. After adding ls -la /proc/self/fd to see what
stdin, stdout and stderr were connected to, we got the following output:
total 0
dr-x------ 2 jan users 0 Oct 31 18:18 .
dr-xr-xr-x 9 jan users 0 Oct 31 18:18 ..
lr-x------ 1 jan users 64 Oct 31 18:18 0 -> pipe:[45923638]
l-wx------ 1 jan users 64 Oct 31 18:18 1 -> /nfs/home/jan/.xsession-errors
l-wx------ 1 jan users 64 Oct 31 18:18 2 -> /nfs/home/jan/.xsession-errors
lr-x------ 1 jan users 64 Oct 31 18:18 3 -> /proc/23830/fd
ls: write error: Permission denied
First of all, stdout and stderr point to .xsession-errors, so that’s
probably the culprit. Sure enough, running localc >> .xsession-errors in a
terminal behaves as if it had been executed from dmenu: it reports the
aforementioned error and exits. Second of all, ls also seems to be affected by
the issue but, weirdly, that doesn’t impede its functionality in any way.
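If you only care about a single descriptor, the same information can be read with readlink instead of listing the whole directory. This is a rough sketch that only works on Linux, since it relies on /proc; fd_target is a name I made up for illustration.

```shell
# Hypothetical helper: prints what file descriptor $1 is connected to.
# Linux only. readlink inherits the caller's descriptors, so its own
# /proc/self/fd/N entry describes the caller's descriptor N.
fd_target() {
    readlink "/proc/self/fd/$1"
}

# Open descriptor 3 on /dev/null for the duration of the call and inspect it.
fd_target 3 3</dev/null    # prints "/dev/null"
```

Dropping fd_target 1 and fd_target 2 into a script launched from dmenu would show the same .xsession-errors path as the listing above.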
So we turned to strace, tracing id (as it is the simplest command we could
think of in a hurry that also exhibited the error) and we saw the first properly
cursed thing of the investigation:
write(1, "uid=3262(jan) gid=1000(users) gr"...,) = 162
close(1) = -1 EACCES (Permission denied)
write(2, "id: ", 4id: ) = 4
write(2, "write error", 11write error) = 11
write(2, ": Permission denied", 19: Permission denied) = 19
write(2, "\n", 1
) = 1
exit_group(1) = ?
+++ exited with 1 +++
The kernel replied with EACCES to a close syscall, which is, unsurprisingly,
not expected behaviour. The close(2) manual page lists only EBADF, EINTR,
EIO, ENOSPC and EDQUOT, and POSIX allows only the first three. We suspected
this was caused by a crash of our main server earlier this week: the server
hosts NFS, and my workstation, including my session, had been running since
before the crash, so we figured that something about NFS didn’t close properly.
But still, why in the name of sanity would the if statement from earlier act in
such a peculiar way? Unless grep for some reason returned 0. But we checked
that, right? Well, not quite. The grep in the if statement is called with the
-q option, whereas the one in our debug print was not.
The man page states:
-q, --quiet, --silent
Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if
an error was detected. Also see the -s or --no-messages option.
At first we were a bit confused by the wording of the man page, thinking it
says (match or error) ⇒ exit 0, but it most likely means the more
reasonable (match and error) ⇒ exit 0, in addition to match ⇒
exit 0. This is confirmed by subjecting grep to any other error:
$ grep -q . /aaa; echo $?
grep: /aaa: No such file or directory
2
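Put together, the exit statuses we would expect from grep -q under normal circumstances look like this. The helper names grep_q_status and missing_file_status are made up for this sketch, and the missing path is assumed not to exist on your machine.

```shell
# Hypothetical helper: feeds a string to grep -q and prints grep's exit status.
# Usage: grep_q_status PATTERN INPUT
grep_q_status() {
    pattern=$1
    input=$2
    status=0
    printf '%s\n' "$input" | grep -q -- "$pattern" || status=$?
    echo "$status"
}

# Hypothetical helper: runs grep -q on a file that is assumed not to exist.
missing_file_status() {
    status=0
    grep -q . /nonexistent/path/for/demo 2>/dev/null || status=$?
    echo "$status"
}

grep_q_status cc cc     # match:    prints 0
grep_q_status cc c      # no match: prints 1
missing_file_status     # error:    prints 2
```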
However, there seems to be a bug in GNU grep that causes it to return 0 when it
is run with -q and receives EACCES in response to close. This happens even
in situations in which grep encounters other errors:
$ grep -q > .xsession-errors; echo $?
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
grep: write error: Permission denied
0
Without the -q option this doesn’t happen: grep correctly returns 2 even
where it would otherwise return 0:
$ echo 'cc' | grep "cc" > .xsession-errors; echo $?
grep: write error: Permission denied
2
Coming back to the original snippet, you can now see why it behaved the way it did:
if echo "$checks" | grep -q "cc" ; then
    echo "Error: The debug options --record, --backtrace, --strace, and --valgrind cannot be used together."
    echo " Please, use them one by one."
    exit 1;
fi
When run from dmenu, its output is redirected to .xsession-errors, which
happens to be broken by some (possibly) NFS magic. This causes grep to
receive EACCES in response to a close syscall and, combined with -q, a bug
causes it to return 0 when it really shouldn’t.
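If you want a check that cannot be tripped up this way, one option is to skip the external grep altogether and let the shell do the matching, so that no process has to close a broken stdout. This is only a sketch of that idea, not the fix LibreOffice ships; has_conflicting_checks is a name I made up.

```shell
# Hypothetical replacement for the grep-based test (not the upstream fix):
# returns 0 when the checks string contains "cc", i.e. when at least two
# debug options were selected, and 1 otherwise. Uses only shell built-ins.
has_conflicting_checks() {
    case "$1" in
        *cc*) return 0 ;;
        *)    return 1 ;;
    esac
}

for sample in "" "c" "cc" "ccc"; do
    if has_conflicting_checks "$sample"; then
        echo "'$sample': conflicting options"
    else
        echo "'$sample': ok"
    fi
done
```

The loop reports the empty string and "c" as ok, and "cc" and "ccc" as conflicting, regardless of where stdout happens to point.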