I am working on an embedded Linux system (kernel 5.10.188) that uses /dev/ttyS2 as the serial console, with BusyBox ash as the login shell.

After logging in, I ran top -d 1 on the serial console (I use MobaXterm on Windows 11 to access it), and it worked well. Then I closed the PC's lid, which suspended Windows.

A few minutes later I resumed the PC and found that the serial connection in MobaXterm was down. I pressed 'R' to reconnect, but there was NO output from the serial console.

I then logged in to the system through adb shell and found the following.

  • top -d 1 is the process with PID 345. Its status showed:
Name:   top
Umask:  0022
State:  S (sleeping)
Tgid:   345
Ngid:   0
Pid:    345
PPid:   210
.....
voluntary_ctxt_switches:        51
nonvoluntary_ctxt_switches:     22

The last two lines showed the same values when I ran cat /proc/345/status several times, so the process was not being scheduled.
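
This comparison can be scripted. The following is only a minimal sketch, assuming a POSIX shell with awk (BusyBox normally provides both); the function name ctxt_total is made up for this illustration. It sums the two context-switch counters from a status file so that two samples taken a few seconds apart can be compared.

```shell
# ctxt_total FILE: print the sum of the voluntary and nonvoluntary
# context-switch counters found in a /proc/<pid>/status-style file.
# (Hypothetical helper name, not part of any standard tool.)
ctxt_total() {
    awk '/ctxt_switches:/ { sum += $2 } END { print sum + 0 }' "$1"
}

# Usage sketch (PID 345 is the top process from this question):
#   a=$(ctxt_total /proc/345/status); sleep 5
#   b=$(ctxt_total /proc/345/status)
#   [ "$a" = "$b" ] && echo "PID 345 was not scheduled in between"
```

If the two sums are equal, the process did not enter or leave the CPU between the samples, which matches what I saw by hand.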

Running cat /proc/345/stack showed the following.

# cat /proc/345/stack
[<0>] wait_woken+0x74/0x94
[<0>] n_tty_write+0x480/0x4f0
[<0>] file_tty_write.isra.36+0x1c8/0x358
[<0>] vfs_write+0x3e8/0x4d8
[<0>] ksys_write+0xe0/0x124
[<0>] syscall_common+0x34/0x58

The process is blocked in n_tty_write, reached through vfs_write (I think it comes from a printf or puts call in the top utility).

I could kill the top process with kill -9 345.
But there was still NO response on the console, so I checked the login shell process.

  • Check process 210 (the login shell and the parent of top).
# cat /proc/210/status
Name:   sh
Umask:  0022
State:  S (sleeping)
Tgid:   210
Ngid:   0
Pid:    210
PPid:   1
......
voluntary_ctxt_switches:        236
nonvoluntary_ctxt_switches:     45

# cat /proc/210/stack
[<0>] wait_woken+0x74/0x94
[<0>] n_tty_write+0x480/0x4f0
[<0>] file_tty_write.isra.36+0x1c8/0x358
[<0>] vfs_write+0x3e8/0x4d8
[<0>] ksys_write+0xe0/0x124
[<0>] syscall_common+0x34/0x58

The login shell is also blocked in vfs_write and is not being scheduled.
I had to run kill -9 210 to bring the login shell back.
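
Since both top and the shell were stuck at the same place, it can help to list every process that is blocked there. A rough sketch, assuming a POSIX shell and root access (reading /proc/<pid>/stack requires root); the function name blocked_in_tty_write and its optional directory argument are made up for this illustration.

```shell
# blocked_in_tty_write [PROC_ROOT]: print the PID of every process
# whose kernel stack contains n_tty_write. PROC_ROOT defaults to
# /proc; the parameter exists only so the function can be tried
# against a copy of the files. (Hypothetical helper name.)
blocked_in_tty_write() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/stack; do
        [ -r "$f" ] || continue
        if grep 'n_tty_write' "$f" >/dev/null 2>&1; then
            pid=${f%/stack}      # strip the trailing /stack
            echo "${pid##*/}"    # keep only the PID directory name
        fi
    done
}

# Usage sketch, as root from the adb shell:
#   blocked_in_tty_write
```

In my case this would be expected to report 345 and 210, the two processes whose stacks are shown above.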

I can definitely reproduce this issue with Windows suspend/resume. I went through the long list of kernel commits on tty, but I did NOT find the same issue or a fix.

So what causes this hang on the serial console, and how can I fix it? Or where should I report this issue or bug for help?

1 Answer

Through testing, searching, and debugging, I found the root cause of the hang and ways to fix it (thanks to https://zhuanlan.zhihu.com/p/706612622).

First, it is NOT a kernel bug. It comes from how the tty console is being used.

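The hang of the top process is triggered by software flow control (XON/XOFF) in the tty layer. When the PC goes into suspend, MobaXterm apparently sends XOFF to the Linux side, which stops the tty from transmitting; once output is stopped, writes to the console block in n_tty_write, as the stack traces show. (My MobaXterm session is configured with XON/XOFF flow control.)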

top can be resumed by typing Ctrl-Q (XON) in MobaXterm.

So the solution is one of the following.

  1. Set flow control to 'None' in MobaXterm.
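  2. Use stty -F /dev/ttyS2 -ixon so that the serial console port ignores a received XOFF. (ixon is the flag that makes a received XOFF stop output; ixoff only controls whether the Linux side itself sends XON/XOFF.)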
  3. Type Ctrl-Q (XON) to restart transmission on the serial console port.
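
For the stty option, it may be worth checking the port's current flags before changing anything. In termios, ixon is the flag that makes a received XOFF stop output, and stty -a prints a disabled flag with a leading minus sign. The sketch below assumes a POSIX shell with tr and grep; the function name ixon_enabled is made up for this illustration.

```shell
# ixon_enabled: read `stty -a` output on stdin and succeed if the
# ixon flag is set (a disabled flag is printed as "-ixon").
# (Hypothetical helper name.)
ixon_enabled() {
    # Put every flag on its own line, then look for an exact match.
    tr ' ;' '\n\n' | grep -x 'ixon' >/dev/null
}

# Usage sketch on the target (for example from the adb shell),
# assuming /dev/ttyS2 is the console port as in the question:
#   if stty -F /dev/ttyS2 -a | ixon_enabled; then
#       stty -F /dev/ttyS2 -ixon
#   fi
```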
