LibreOffice only works in the terminal: the weirdest GNU/Linux issue I've encountered to date

You’re right Richard, this is your fault too.

First, some context. At work, we have Linux workstations with NFS home folders. This is awesome, since you can move basically seamlessly between computers, but can also bring with it many interesting bugs. This is one of them.

When I tried to open LibreOffice today by double-clicking on a file, it didn’t work. So I ran LibreOffice from my terminal to see its output and, lo and behold, it worked just fine. And it was quite reliable: when started from a terminal, it worked; started any other way (from dmenu, using i3-msg or from a GUI), it didn’t. All terminals worked, all shells worked. So I called in my colleagues and we started investigating.

LEdoian (who also helped with most of the investigation) suggested looking into the .xsession-errors file, and the error we found there was not very helpful:

grep: write error: Permission denied
Error: The debug options --record, --backtrace, --strace, and --valgrind cannot be used together.
       Please, use them one by one.

At first I thought that some calling script along the way had added these parameters in response to something in the environment, so I started to dig there. Eventually, I found the /usr/bin/soffice file, which is responsible for launching LibreOffice.

The error is generated by the following snippet of code:

if echo "$checks" | grep -q "cc" ; then
    echo "Error: The debug options --record, --backtrace, --strace, and --valgrind cannot be used together."
    echo "       Please, use them one by one."
    exit 1;
fi

Let’s see where that $checks variable comes from:

# count number of selected checks; only one is allowed
checks=
EXTRAOPT=
# force the --valgrind option if the VALGRIND variable is set
test -n "$VALGRIND" && EXTRAOPT="--valgrind"

# force the --record option if the RR variable is set
test -n "$RR" && EXTRAOPT="--record"

for arg in "$@" $EXTRAOPT ; do
    case "$arg" in
        --record)
            if which rr >/dev/null 2>&1 ; then
                # smoketest may already be recorded => ignore nested
                RRCHECK="rr record --nested=ignore"
                checks="c$checks"
            else
                echo "Error: Can't find the tool \"rr\", --record option will be ignored."
                exit 1
            fi
            ;;
        --backtrace)
            if which gdb >/dev/null 2>&1 ; then
                GDBTRACECHECK="gdb -nx --command=$sd_prog/gdbtrace --args"
                checks="c$checks"
            else
                echo "Error: Can't find the tool \"gdb\", --backtrace option will be ignored."
                exit 1
            fi
            ;;
          # (other options redacted)
    esac
done

The variable is used, as the error message would suggest, to ensure that two or more conflicting options are not set at the same time. You can also see that some of the options can be set using environment variables, but there was quite an easy way to check this: I added the following snippet just before the multiple-option check:

echo "--------------- Cut here ---------------"
echo "$checks"
echo "$@"
echo "--------------- Stop cutting here ------"

When I launched localc again, I got the expected output:

--------------- Cut here ---------------

--calc
--------------- Stop cutting here ------

Only at this point did I notice the first line of the original error message, which I managed to overlook:

grep: write error: Permission denied

So I added another set of lines to the debug prints:

echo "--------------- Cut here ---------------"
echo "$checks"
echo "$@"
echo "$checks" | grep "cc"
echo $?
echo "--------------- Stop cutting here ------"

This, again, produced the expected output:

--------------- Cut here ---------------

--calc
grep: write error: Permission denied
2
--------------- Stop cutting here ------

This made me question everything I knew about UNIX. I thought that in a POSIX shell the if statement evaluates the first branch when the command returns 0 and the second one with all other return codes, but clearly, grep returned 2 and sh still went into the first branch. HUH?
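For reference, the rule itself is easy to confirm in isolation. A minimal sketch, where branch_for is a hypothetical helper (it is not part of soffice) that reports which branch the if statement takes for a given exit status:

```shell
#!/bin/sh
# branch_for is a hypothetical helper, for illustration only.
# It runs a child shell that exits with the given status and
# reports which branch of the if statement was taken.
branch_for() {
    if sh -c "exit $1"; then
        echo "then"
    else
        echo "else"
    fi
}

branch_for 0   # prints "then"
branch_for 2   # prints "else"
```

So whenever the first branch runs, the command in the condition itself must have exited with 0.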

Undeterred, we started investigating why grep gets that permission denied error in the first place. After adding ls -la /proc/self/fd to see what stdin, stdout and stderr are connected to, we got the following output:

total 0
dr-x------ 2 jan users  0 Oct 31 18:18 .
dr-xr-xr-x 9 jan users  0 Oct 31 18:18 ..
lr-x------ 1 jan users 64 Oct 31 18:18 0 -> pipe:[45923638]
l-wx------ 1 jan users 64 Oct 31 18:18 1 -> /nfs/home/jan/.xsession-errors
l-wx------ 1 jan users 64 Oct 31 18:18 2 -> /nfs/home/jan/.xsession-errors
lr-x------ 1 jan users 64 Oct 31 18:18 3 -> /proc/23830/fd
ls: write error: Permission denied

First of all, stdout and stderr point to .xsession-errors, so that’s probably the culprit. Sure enough, running localc >> .xsession-errors in a terminal behaves as if it were executed from dmenu: it reports the aforementioned error and exits. Second, ls also seems to be affected by the issue, but weirdly, the error doesn’t impede its functionality in any way.

So we turned to strace, tracing id (as it is the simplest command we could think of in a hurry that also exhibited the error) and we saw the first properly cursed thing of the investigation:

write(1, "uid=3262(jan) gid=1000(users) gr"...,) = 162
close(1)                                = -1 EACCES (Permission denied)
write(2, "id: ", 4id: )                     = 4
write(2, "write error", 11write error)             = 11
write(2, ": Permission denied", 19: Permission denied)     = 19
write(2, "\n", 1
)                       = 1
exit_group(1)                           = ?
+++ exited with 1 +++

The kernel replied with EACCES to a close syscall, which is, unsurprisingly, not expected behaviour. The close(2) manual page lists EBADF, EINTR, EIO, ENOSPC and EDQUOT; POSIX only allows the first three. We figured this was probably caused by a crash of our main server earlier this week: the server hosts NFS, and the workstation, including my session, has been running since before the crash, so presumably something about NFS didn’t close properly.

But still, why in the name of sanity would the if statement from earlier act in such a peculiar way? Unless grep for some reason returned 0. But we checked that, right? Well, not quite. The second grep, the one in the original if statement, is called with the -q option; the one in our debug print was not.

The man page states:

-q, --quiet, --silent
      Quiet;  do not write anything to standard output.  Exit immediately with zero status if any match is found, even if
      an error was detected.  Also see the -s or --no-messages option.

At first we were a bit confused by the wording of the man page, thinking it says (match or error) ⇒ exit 0, but it most likely means the more reasonable (match and error) ⇒ exit 0, in addition to match ⇒ exit 0. This is confirmed by subjecting grep to any other error:

 $ grep -q . /aaa; echo $?
grep: /aaa: No such file or directory
2
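The match-and-error case can be checked in the same way. A small sketch, assuming GNU grep; the temporary file and the missing path are made up for illustration. The missing file produces an error in both runs, but only the run that also finds a match should exit with 0:

```shell
#!/bin/sh
# Illustration only: the temporary file and the missing path are made up.
tmpfile=$(mktemp)
echo 'cc' > "$tmpfile"

# Error on the missing file, match in the second file.
grep -q 'cc' /nonexistent-file-for-demo "$tmpfile"
match_and_error=$?

# Error on the missing file, no match anywhere.
grep -q 'zz' /nonexistent-file-for-demo "$tmpfile"
error_only=$?

rm -f "$tmpfile"
echo "match and error: $match_and_error"
echo "error only:      $error_only"
```

With GNU grep, the first status should be 0 and the second one non-zero, which matches the (match and error) ⇒ exit 0 reading.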

But there seems to be a bug in GNU grep that causes it to return 0 when it is run with -q and receives EACCES as a response to close. This includes situations in which grep encounters other errors:

 $ grep -q > .xsession-errors; echo $?
Usage: grep [OPTION]... PATTERNS [FILE]...
Try 'grep --help' for more information.
grep: write error: Permission denied
0

Without the -q option this doesn’t happen: grep correctly returns 2 even when it would otherwise return 0:

 $ echo 'cc' | grep "cc"  > .xsession-errors; echo $?
grep: write error: Permission denied
2
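If you want to see the behaviour without -q and don’t have a broken NFS mount at hand, a rough stand-in is /dev/full, which on Linux fails every write with ENOSPC. This is only an analogy: the error comes from write rather than close, and it is a different errno, so it does not reproduce the -q bug itself.

```shell
#!/bin/sh
# /dev/full is Linux-specific; every write to it fails with ENOSPC.
# This exercises grep's "write error" path, not the EACCES-on-close case.
echo 'cc' | grep 'cc' > /dev/full
write_error_status=$?
echo "exit status with failing stdout: $write_error_status"
```

With GNU grep this should report a write error on stderr and a non-zero exit status, even though the pattern matched.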

Coming back to the original snippet, you can now see why it behaved the way it did:

if echo "$checks" | grep -q "cc" ; then
    echo "Error: The debug options --record, --backtrace, --strace, and --valgrind cannot be used together."
    echo "       Please, use them one by one."
    exit 1;
fi

When run from dmenu, its output is redirected to .xsession-errors, which is coincidentally broken by some (possibly) NFS magic. This causes grep to receive EACCES in response to a close syscall and, combined with -q, a bug causes it to return 0 when it really shouldn’t.
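For what it’s worth, the check itself doesn’t need grep at all. Below is a sketch of an equivalent test using the shell’s own case pattern matching, which never writes to stdout and so cannot trip over a broken file descriptor. The function name has_conflicting_checks is hypothetical, and this is not what the soffice script does, just an illustration of the idea:

```shell
#!/bin/sh
# Hypothetical stand-in for: echo "$checks" | grep -q "cc"
# Returns 0 if two or more checks were selected, i.e. if the
# string contains "cc", and 1 otherwise.
has_conflicting_checks() {
    case "$1" in
        *cc*) return 0 ;;
        *)    return 1 ;;
    esac
}

checks=""
if has_conflicting_checks "$checks"; then
    echo "conflict"
else
    echo "ok"
fi
```

With an empty $checks, as in the debug output above, this takes the else branch regardless of where stdout points.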