Showing posts with label nvidia-tls bug. Show all posts

03 March 2012

91. Downgrading nvidia drivers from 295.20 to 290.10 on debian testing

How to downgrade your nvidia drivers

WARNING -- you must use the terminal during these steps. If you don't know how to use e.g. cd, ls and nano or vim you will want to be careful:
1. Make sure that you have internet access even without a graphical environment
2. Make sure that you have a basic understanding of how to navigate in the terminal
3. Print out or write down these instructions before starting

I typically test all my instructions on several different computers as a form of proof-reading. For various reasons I can't do that with this blog post, so read through the instructions first to make sure that you understand what they do and that any typos won't throw you off.

If you've been having the gnome-shell crash bug
http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-nvidia-bug.html
[ 7011.967820] gnome-shell[32742]: segfault at 10 ip 00007fa1b6d98c0f sp 00007fa1914a1638 error 6 in libnvidia-tls.so.295.20[7fa1b6d98000+3000]
or the evolution crash bug
http://verahill.blogspot.com.au/2012/02/debian-testing-wheezy-64-no-real.html
[22129.426444] evolution[20435]: segfault at 10 ip 00007f2a05bf8c0f sp 00007f29e5725508 error 6 in libnvidia-tls.so.295.20[7f2a05bf8000+3000]
which are both caused by nvidia driver 295.20, here's how to gracefully downgrade to the previous version of the nvidia driver: 290.10. Be aware that evolution crashes occasionally under 290.10 too, but not nearly as consistently as under 295.20 -- chances are that evolution is a bit buggy on its own.

This will make use of the dkms package, which is what you should use anyway. We'll pull the old stuff from a snapshot archive.

1. Setting up your computer 
I prefer not to be forced to boot into X when I'm mucking about with graphics drivers, so:

Edit your /etc/default/grub
find your
GRUB_CMDLINE_LINUX_DEFAULT
line and add "text" to it e.g.
change
GRUB_CMDLINE_LINUX_DEFAULT="nomodeset nouveau.modeset=0"
to
GRUB_CMDLINE_LINUX_DEFAULT="text nomodeset nouveau.modeset=0"
Do
sudo update-grub
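
If you'd rather not edit the file by hand, the same change can be sketched with grep and sed. This is only a rough sketch: add_text_param is a made-up helper name, and it assumes GNU sed (for -i) and that the GRUB_CMDLINE_LINUX_DEFAULT line uses double quotes as in the example above. Try it on a copy of the file first.

```shell
# add_text_param FILE
# Hypothetical helper: puts the word "text" at the start of the
# GRUB_CMDLINE_LINUX_DEFAULT value in FILE, unless it is already there.
add_text_param() {
    file="$1"
    # Leave the file alone if "text" is already one of the parameters.
    if grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT="([^"]* )?text( [^"]*)?"' "$file"; then
        return 0
    fi
    # Insert "text " right after the opening double quote (GNU sed -i).
    sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&text /' "$file"
}

# Possible usage, working on a copy so nothing breaks if the pattern is off:
#   cp /etc/default/grub /tmp/grub.new
#   add_text_param /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```

Running it twice changes nothing the second time, so it is safe to re-run if you are unsure whether you already made the edit.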

Now is a good time to do a reboot to see if you have internet in text-only mode. 
sudo shutdown -r now
To start your graphical environment again do 
startx


2. Set up snapshot archive
To your /etc/apt/sources.list add this line:
deb http://snapshot.debian.org/archive/debian/20120120T092809Z/ wheezy main contrib non-free
Also, create a
/etc/apt/apt.conf.d/60ignore_repo_date_check
file with the following in it:
Acquire
{
Check-Valid-Until "false";
}
Run
sudo apt-get update
Don't install anything yet.
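
The two file changes in this step can also be scripted. The sketch below is just one way of doing it: setup_snapshot is a made-up helper name, and the optional directory argument is my own addition so that the function can be tried against a scratch directory before touching the real /etc.

```shell
# The snapshot line from step 2, exactly as given above.
SNAPSHOT_LINE='deb http://snapshot.debian.org/archive/debian/20120120T092809Z/ wheezy main contrib non-free'

# setup_snapshot [ROOT]
# Hypothetical helper: ROOT is an optional prefix (e.g. a scratch directory);
# with no argument the real /etc/apt is used, which requires root.
setup_snapshot() {
    root="${1:-}"
    sources="$root/etc/apt/sources.list"
    conf="$root/etc/apt/apt.conf.d/60ignore_repo_date_check"
    mkdir -p "$root/etc/apt/apt.conf.d"
    touch "$sources"
    # Append the snapshot line only if it is not already in the file.
    grep -qxF "$SNAPSHOT_LINE" "$sources" || printf '%s\n' "$SNAPSHOT_LINE" >> "$sources"
    # Tell apt not to reject the snapshot because its Release file has expired.
    printf 'Acquire\n{\nCheck-Valid-Until "false";\n}\n' > "$conf"
}

# Possible usage:
#   setup_snapshot /tmp/apt-test     # dry run into a scratch directory
#   sudo sh -c '. ./this-file.sh; setup_snapshot' && sudo apt-get update
```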

3. Get the nvidia binary driver
Go to e.g. ~/tmp and
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/290.10/NVIDIA-Linux-x86_64-290.10.run
chmod +x NVIDIA-Linux-x86_64-290.10.run

Don't forget where you put it.

4. Remove your existing drivers and packages:
First reboot:
sudo shutdown -r now
You'll now boot into a text-only environment, so you had better have printed this out first.

Then
sudo apt-get autoremove 'nvidia-*'
The following packages will be REMOVED:
  diffstat glx-alternative-mesa glx-alternative-nvidia glx-diversions libcublas4 libcuda1 libcudart4 libcufft4
  libcurand4 libcusparse4 libgl1-nvidia-alternatives libgl1-nvidia-glx libglx-nvidia-alternatives libnpp4
  libthrust-dev libvdpau-dev nvidia-alternative nvidia-glx nvidia-installer-cleanup nvidia-kernel-common
  nvidia-kernel-dkms nvidia-support nvidia-vdpau-driver opencl-headers quilt xserver-xorg-video-nvidia
0 upgraded, 0 newly installed, 26 to remove and 3 not upgraded.
After this operation, 435 MB disk space will be freed.
Do you want to continue [Y/n]?Y
[..]
Reboot for good luck:
sudo shutdown -r now

If you do
locate nvidia.ko
chances are you'll find
/lib/modules/3.2.9/updates/dkms/nvidia.ko
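where 3.2.9 is the current kernel version. That is most likely a stale entry: locate reads from a database which isn't refreshed until updatedb is run.

So do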
sudo updatedb
locate nvidia.ko
to make sure that the nvidia.ko is gone from your current kernel.
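
If locate isn't installed, or you don't want to depend on its database, a direct search with find should answer the same question. This is a sketch only: find_nvidia_modules is a made-up name, and it assumes the modules of the running kernel live under /lib/modules/$(uname -r), as in the path shown above.

```shell
# find_nvidia_modules [MODULE_DIR]
# Hypothetical helper: lists every nvidia.ko below MODULE_DIR, which
# defaults to the module directory of the currently running kernel.
find_nvidia_modules() {
    moddir="${1:-/lib/modules/$(uname -r)}"
    find "$moddir" -name 'nvidia.ko' 2>/dev/null
}

# Possible usage: no output means the module is gone.
#   find_nvidia_modules
```

Unlike locate, this looks at the disk as it is right now, so there is no need to run updatedb first.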

5. Install your old nvidia driver:
Go to the directory you downloaded the driver in, e.g. ~/tmp
sudo ./NVIDIA-Linux-x86_64-290.10.run
Reboot afterwards:
sudo shutdown -r now

After the reboot do
startx

Did it work? If yes, you're in good shape. You can confirm that the module loaded with
dmesg | grep nvidia
[    7.540166] nvidia: module license 'NVIDIA' taints kernel.
[    8.509525] nvidia 0000:01:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    8.509600] nvidia 0000:01:00.0: setting latency timer to 64
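
The dmesg lines above show that a module loaded, but not which version. As far as I know the loaded module also reports itself in /proc/driver/nvidia/version on a line beginning with "NVRM version:"; treat both that path and that line format as assumptions. loaded_nvidia_version is a made-up helper name.

```shell
# loaded_nvidia_version [FILE]
# Hypothetical helper. Assumption: FILE (default /proc/driver/nvidia/version)
# contains a line starting with "NVRM version:" that includes the driver
# version as a dotted number.
loaded_nvidia_version() {
    file="${1:-/proc/driver/nvidia/version}"
    # Take the first dotted number found on the NVRM line.
    grep '^NVRM version:' "$file" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Possible usage, after the reboot in step 5:
#   loaded_nvidia_version
```

If the downgrade worked, the number printed should be 290.10 rather than 295.20.
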
6. Setting up the kernel-dkms
You really want to use dkms so that you don't have to re-install the graphics driver each time you upgrade your kernel.

First check
apt-cache showpkg nvidia-kernel-dkms
Provides:
295.20-1 - nvidia-kernel-295.20
290.10-1 - nvidia-kernel-290.10
195.36.31-6 - nvidia-kernel-195.36.31

OK, time to get rocking:
sudo apt-get install nvidia-kernel-dkms=290.10-1 nvidia-glx=290.10-1 libgl1-nvidia-glx=290.10-1 xserver-xorg-video-nvidia=290.10-1 nvidia-vdpau-driver=290.10-1 nvidia-alternative=290.10-1

You'll be warned about removing nvidia-install etc. That's fine.

Once the installation is done it's time to put holds on the packages so that they aren't accidentally upgraded:

sudo su
echo "nvidia-kernel-dkms hold"| dpkg --set-selections
echo "nvidia-glx hold"| dpkg --set-selections
echo "libgl1-nvidia-glx hold"| dpkg --set-selections
echo "xserver-xorg-video-nvidia hold"| dpkg --set-selections
echo "nvidia-vdpau-driver hold"| dpkg --set-selections
echo "nvidia-alternative hold"| dpkg --set-selections
exit
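
The six echo lines differ only in the package name, so they can be generated with a loop instead. A minimal sketch, in which selections and NVIDIA_PKGS are made-up names; the package list is the one from the install command in this step.

```shell
# The six packages installed from the snapshot in step 6.
NVIDIA_PKGS="nvidia-kernel-dkms nvidia-glx libgl1-nvidia-glx xserver-xorg-video-nvidia nvidia-vdpau-driver nvidia-alternative"

# selections STATE
# Hypothetical helper: prints one "<package> <state>" line per package,
# which is the input format that dpkg --set-selections reads.
selections() {
    state="$1"
    for pkg in $NVIDIA_PKGS; do
        printf '%s %s\n' "$pkg" "$state"
    done
}

# Possible usage:
#   selections hold | sudo dpkg --set-selections
#   dpkg --get-selections | grep nvidia     # check what dpkg recorded
```

Passing install instead of hold produces the lines needed to release the holds again later.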

7. Cleaning up
Things to do:
a. comment out the snapshot in /etc/apt/sources.list
b. move the /etc/apt/apt.conf.d/60ignore_repo_date_check file out of the way
c. Run sudo apt-get upgrade and check that the held packages are listed as kept back:
Reading package lists... Done
Building dependency tree    
Reading state information... Done
The following packages have been kept back:
  libgl1-nvidia-glx nvidia-alternative nvidia-glx
  nvidia-kernel-dkms nvidia-vdpau-driver xserver-xorg-video-nvidia
d. edit your /etc/default/grub and remove the "text" item you added.
e. Run sudo update-grub
f. You can now reboot and your computer will be back to normal, sans nvidia 295.20

DONE



8. In the future
Once it is safe to upgrade, all you need to do is

sudo su
echo "nvidia-kernel-dkms install"| dpkg --set-selections
echo "nvidia-glx install"| dpkg --set-selections
echo "libgl1-nvidia-glx install"| dpkg --set-selections
echo "xserver-xorg-video-nvidia install"| dpkg --set-selections
echo "nvidia-vdpau-driver install"| dpkg --set-selections
echo "nvidia-alternative install"| dpkg --set-selections
exit
sudo apt-get update && sudo apt-get upgrade


28 February 2012

85. Nvidia bug causes evolution to crash/segmentation fault. Temporary and permanent fixes on Debian Testing

The nvidia-tls bug is affecting evolution too...
(and it's not just GNOME - http://www.linuxmintusers.de/index.php?topic=6859.0)


UPDATE: I have two nvidia boxes running debian testing. Only the one with GT 430 is exhibiting problems. My GT 520 box is unaffected.


UPDATE: Here's how to downgrade your drivers:
http://verahill.blogspot.com.au/2012/03/debian-testing-downgrading-nvidia.html

If you don't want to read the entire post, here's the summary:
1. I think the only semi-permanent solution is to downgrade from 295.20 to nvidia driver version 290.10
2. you can run evolution with
strace -o evolution.log evolution
and IT WILL NOT CRASH
3. It doesn't matter whether you install the nvidia binary straight from nvidia, use sgfxi, or go the nvidia-kernel-dkms/glx debian way. Evolution still dies.

PS strace is normally used to track system calls for the purpose of troubleshooting. That it prevents evolution from crashing is completely unintended. But it works as a quick-fix.

PPS What nvidia-tls does:

"The nvidia-tls libraries (/usr/lib/libnvidia-tls.so.x.y.z and /usr/lib/tls/libnvidia-tls.so.x.y.z); these files provide thread local storage support for the NVIDIA OpenGL libraries (libGL, libGLcore, and libglx). Each nvidia-tls library provides support for a particular thread local storage model (such as ELF TLS), and the one appropriate for your system will be loaded at run time."




The symptoms:
Start evolution, and it will crash with a segmentation fault within the first ten seconds or so.

dmesg points to the nvidia bug:

[19690.606196] evolution[13032]: segfault at 10 ip 00007f5a0f53ac0f sp 00007f59ddde6508 error 6 in libnvidia-tls.so.295.20[7f5a0f53a000+3000]
[21476.236668] evolution[18197]: segfault at 10 ip 00007fd4389c2c0f sp 00007fd418d56508 error 6 in libnvidia-tls.so.295.20[7fd4389c2000+3000]
[21513.224145] evolution[18387]: segfault at 10 ip 00007f2cd3e85c0f sp 00007f2cb3a44508 error 6 in libnvidia-tls.so.295.20[7f2cd3e85000+3000]
[21954.867694] evolution[19803]: segfault at 10 ip 00007f1680aa9c0f sp 00007f165bffe508 error 6 in libnvidia-tls.so.295.20[7f1680aa9000+3000]
[22129.426444] evolution[20435]: segfault at 10 ip 00007f2a05bf8c0f sp 00007f29e5725508 error 6 in libnvidia-tls.so.295.20[7f2a05bf8000+3000]
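
To see at a glance which programs are hitting the bug, the kernel log can be summarised per program. A rough sketch, assuming the log lines look like the ones above; count_tls_segfaults is a made-up helper name.

```shell
# count_tls_segfaults
# Hypothetical helper: reads kernel log lines on stdin and prints how many
# segfaults inside libnvidia-tls were logged for each program name.
count_tls_segfaults() {
    # Keep only segfaults that happened inside libnvidia-tls ...
    grep 'segfault at .* in libnvidia-tls\.so' |
        # ... reduce each line to the program name before "[pid]:" ...
        sed -E 's/^\[[ 0-9.]+\] ([^[]+)\[[0-9]+\]: .*/\1/' |
        # ... and count the occurrences of each name.
        sort | uniq -c
}

# Possible usage:
#   dmesg | count_tls_segfaults
```

No output at all means that the kernel log has no libnvidia-tls segfaults in it.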


Running
CAMEL_DEBUG=all evolution >& evolution.log
three times had it crash at a different point each time:

First time:

DB SQL operation [BEGIN] started
Camel SQL Exec:
BEGIN
Camel SQL Exec:
COMMIT
DB Operation ended. Time Taken : 0.000060
###########
received: * LSUB (\HasNoChildren) "/" "INBOX"
received: B00005 OK Success
sending : B00006 LIST "" "*"
--> Segmentation fault

Second time:

===========
DB SQL operation [ATTACH DATABASE ':memory:' AS mem] started
Camel SQL Exec:
ATTACH DATABASE ':memory:' AS mem
POP3_STREAM_LINE (25): '-ERR unrecognized command'
DB Operation ended. Time Taken : 0.011516
###########

Database succesfully opened  

===========
DB SQL operation [ATTACH DATABASE ':memory:' AS mem] started
Camel SQL Exec:
ATTACH DATABASE ':memory:' AS mem
DB Operation ended. Time Taken : 0.010961
###########
**
GLib-GIO:ERROR:/tmp/buildd/glib2.0-2.30.2/./gio/gdbusmessage.c:1986:append_value_to_blob: assertion failed: (g_utf8_validate (v, -1, &end) && (end == v + len))
--> Segmentation fault

Third time:

===========
DB SQL operation [BEGIN] started
Camel SQL Exec:
BEGIN
Camel SQL Exec:
COMMIT
DB Operation ended. Time Taken : 0.000070
###########
sending : A00004 SELECT INBOX
--> Segmentation fault

strace:
I can't crash evolution with either strace or valgrind running. Now why is that?


Solution:
Downgrading. Which turns out to be more difficult than one would imagine.

UPDATE: Here's how to downgrade your drivers:
http://verahill.blogspot.com.au/2012/03/debian-testing-downgrading-nvidia.html

If you don't want to downgrade the nvidia drivers:
A temporary solution is, odd as it may seem, to use
strace -o evolution.log evolution
because it just refuses to crash. I don't know why, but it works.