What are some Linux tools to diagnose a system which keeps hanging?

Question

I have a laptop which runs Ubuntu. After running for a while (sometimes 5-10 minutes, sometimes more), the system hangs and stops responding to keyboard or mouse input. Before I open the system up or take it to a technician who can look for hardware issues, I was wondering if there are ways I can diagnose the system using Linux tools. Can the issue be caused by bad blocks in the partition? The issue started after I used fdisk on the hard drive. Are there ways to confirm this?

Bender · Accepted Answer

- Have you switched to a console screen instead of X (Ctrl+Alt+F1) to see if it's actually X that is hanging and not the OS? Hanging 5 to 10 minutes after using fdisk sounds like a red herring. If you are able to switch, check the kernel log there: just type dmesg.
- Do you have SysRq enabled, and if so, does the system respond to an emergency sync, then an emergency remount read-only, then an emergency reboot? [0] To check the setting: sysctl -a 2>/dev/null | grep rq
- If you switch to runlevel 2, does this still occur? [1]
- Is there anything interesting in the log files in /var/log? Most notably the Xorg log (usually Xorg.0.log) and messages (syslog on Ubuntu).
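- Do you have lm-sensors installed to watch temperature? Run sensors | grep -Ei ^temp to see the readings. Also check the temperature of your drives with smartctl -x /dev/sda | grep -i temp, assuming your drive is sda. (The smartctl temperature lines do not start with "temp", so the pattern is left unanchored there.)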
That would be my starting point. It could go a million directions from there.
[0] - https://en.wikipedia.org/wiki/Magic_SysRq_key
[1] - https://www.tecmint.com/change-runlevels-targets-in-systemd/
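To make the SysRq check above a bit more concrete: the value reported for kernel.sysrq is either 0 (disabled), 1 (everything enabled), or a bitmask in which 16 allows sync, 32 allows remount read-only and 128 allows reboot. Below is a rough sketch that decodes a value. The function names are made up for illustration, and it only reads a number, it does not change any setting.

```shell
# Decode a kernel.sysrq value.
# 0 = disabled, 1 = all functions enabled, otherwise a bitmask:
# 16 = sync, 32 = remount read-only, 128 = reboot/poweroff.
# sysrq_allows and report_sysrq are hypothetical helper names.
sysrq_allows() {
    value=$1
    bit=$2
    if [ "$value" -eq 1 ]; then
        return 0
    fi
    [ $(( value & bit )) -ne 0 ]
}

# Print whether the three functions used for an emergency
# sync / remount / reboot are permitted by the given value.
report_sysrq() {
    value=$1
    for pair in 16:sync 32:remount-ro 128:reboot; do
        bit=${pair%%:*}
        name=${pair#*:}
        if sysrq_allows "$value" "$bit"; then
            echo "$name: allowed"
        else
            echo "$name: blocked"
        fi
    done
}

# Usage on a live system:
#   report_sysrq "$(cat /proc/sys/kernel/sysrq)"
```

If all three are allowed, the usual key sequence during a hang is Alt+SysRq+S, then U, then B, with a second or two between each.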

davydm · Answer

First, I'd watch dmesg from boot to see if something useful pops up.
Open a console and type:
watch -n 1 "dmesg | tail -n 40"
to watch the last 40 lines reported by dmesg. Some distros require root for this by default, so if you get an error, try again as root.
Also, you may find information in system logs:
A traditional (non-systemd) distro will store text files under /var/log; often /var/log/messages is what you're looking for. You can follow the tail as above:
tail -f -n 40 /var/log/messages
If your system is running systemd, you would do:
journalctl -f -n 40
(again, you may need to be root to do this)
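Watching the log live only helps if the screen is still readable when the machine hangs, so it can also be worth filtering the log of the previous boot after a forced restart. A minimal sketch follows. The keyword list is my own guess at messages that often precede a hang (thermal throttling, disk I/O errors, hung tasks, lockups, out-of-memory kills), not anything exhaustive, and suspicious_lines is a made-up name.

```shell
# Keep only log lines that match a few common trouble patterns.
# The pattern list is an assumption; extend it for your hardware.
suspicious_lines() {
    grep -Ei 'thermal|temperature above threshold|throttled|I/O error|blocked for more than|soft lockup|hard lockup|Out of memory|Hardware Error|Call Trace'
}

# Usage on a systemd system (previous boot, kernel messages only;
# needs a persistent journal and possibly root):
#   journalctl -k -b -1 --no-pager | suspicious_lines
# Usage on a traditional syslog system:
#   suspicious_lines < /var/log/messages
```

journalctl -b -1 only has something to show if the journal is stored persistently (a /var/log/journal directory exists); otherwise the previous boot's messages are gone after the restart.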
I recently had lockups that seem to have been thermal in nature. There weren't many related messages: just once I saw something about hitting a thermal limit, but that was a while before the lockup, so I dismissed it as the normal behavior of modern CPUs, which run full tilt until they reach thermal throttling. The machine was sent in for repair and had its liquid-metal thermal paste reapplied. Temperatures are down by about 10 degrees, which is not much, and they are still sitting in the 90s, but there has been no lockup over the last 1.5 days. I'd need to run for a while longer to be sure.
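If you suspect the same thing, the kernel exposes its thermal zones under /sys/class/thermal, with each temp file holding the reading in thousandths of a degree Celsius. Here is a rough sketch that lists the zones at or above a limit. The 90 degree default, the function names and the THERMAL_ROOT override (there so it can be tried against sample files) are my own choices, and not every machine exposes useful zones this way.

```shell
# Convert a millidegree reading to whole degrees Celsius.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print every thermal zone at or above a limit in degrees Celsius.
# The default limit of 90 is an arbitrary illustration.
# THERMAL_ROOT is a hypothetical override for trying this on sample data.
hot_zones() {
    limit=${1:-90}
    root=${THERMAL_ROOT:-/sys/class/thermal}
    for f in "$root"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        c=$(millideg_to_c "$(cat "$f")")
        if [ "$c" -ge "$limit" ]; then
            echo "$(basename "$(dirname "$f")"): ${c}C"
        fi
    done
}

# Usage: leave this running in a console before the next hang:
#   watch -n 2 'sensors'        (if lm-sensors is installed)
# or call hot_zones 85 from a loop to log only the hot zones.
```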

electricant · Answer

This happened to me on what, if I recall correctly, was a ThinkPad T61. In the end it was due to an outdated BIOS; updating it stopped the PC from hanging.
Perhaps check that your BIOS is the latest version first.
Also, run a scan of the RAM with memtest.
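To find out which BIOS version is currently installed without rebooting into the setup screen, the kernel exposes the DMI data under /sys/class/dmi/id. A small sketch, where bios_info is a made-up helper name and DMI_ROOT is a hypothetical override used only for testing:

```shell
# Print the BIOS vendor, version and release date from sysfs.
# DMI_ROOT is a hypothetical override; the default is the standard path.
bios_info() {
    root=${DMI_ROOT:-/sys/class/dmi/id}
    for field in bios_vendor bios_version bios_date; do
        if [ -r "$root/$field" ]; then
            echo "$field: $(cat "$root/$field")"
        fi
    done
}

# Usage:
#   bios_info
# The same data is available as root via: dmidecode -s bios-version
```

Compare the reported version and date against what the laptop vendor lists on its support page for your exact model.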
