Server-side logs indicate that the client is sending stale stateids /
sequence IDs, and the server responds with NFS4ERR_OLD_STATEID.
--
https://bugs.launchpad.net/bugs/2146310
Title:
NFSv4 client hang in OPEN reclaim path waiting for RPC completion
Status in linux package in Ubuntu:
New
Bug description:
Hi,
We are seeing an NFSv4.1 client hang on Linux kernel 5.15 (Ubuntu
22.04).
The issue starts when the server returns NFS4ERR_EXPIRED. The client
then enters recovery, but reclaim never completes.
The state manager thread is stuck with the following stack:
rpc_wait_bit_killable
__rpc_wait_for_completion_task
nfs4_run_open_task
nfs4_open_recover_helper
nfs4_open_recover
nfs4_do_open_expired
nfs40_open_expired
__nfs4_reclaim_open_state
nfs4_reclaim_open_state
nfs4_do_reclaim
nfs4_state_manager
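For anyone trying to confirm the same symptom on another client, the check can be scripted. The following is only a sketch: it assumes the stack text comes from something like /proc/<pid>/stack for the state manager thread, and the helper names (frame_symbols, is_stuck_in_open_reclaim) are hypothetical, not part of any existing tool.

```python
import re

# Frames from the stack above that together mark the OPEN reclaim path.
RECLAIM_FRAMES = (
    "nfs4_state_manager",
    "nfs4_do_reclaim",
    "nfs4_run_open_task",
    "rpc_wait_bit_killable",
)

# Accepts bare symbol lines as well as "[<0>] symbol+0x1e/0xa0 [module]" lines.
_FRAME_RE = re.compile(r"^(?:\[<[0-9a-fx]+>\]\s*)?([A-Za-z_][\w.]*)")


def frame_symbols(stack_text):
    """Return the symbol names found in a stack dump, top frame first."""
    symbols = []
    for line in stack_text.splitlines():
        match = _FRAME_RE.match(line.strip())
        if match:
            symbols.append(match.group(1))
    return symbols


def is_stuck_in_open_reclaim(stack_text):
    """True if every marker frame of the OPEN reclaim path is present."""
    present = set(frame_symbols(stack_text))
    return all(frame in present for frame in RECLAIM_FRAMES)
```

Sampling the stack several times and getting the same answer each time would indicate a thread that is not making progress, rather than one that merely passes through the reclaim path.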
Meanwhile:
- The server repeatedly returns NFS4ERR_EXPIRED
- The client does not successfully reclaim state
- IO continues and repeatedly fails
RPC stats show:
- ~30M calls
- very low retransmissions (94)
This suggests the issue is unlikely to be caused by network loss or
server unresponsiveness.
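To put the two figures in proportion, here is a small sketch that computes the retransmission ratio. It assumes the counters are read from the "rpc" line of /proc/net/rpc/nfs (calls, retransmissions, auth refreshes); the function names are hypothetical.

```python
def parse_rpc_counters(proc_text):
    """Return (calls, retransmissions) from the 'rpc' line of /proc/net/rpc/nfs."""
    for line in proc_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "rpc":
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'rpc' line found")


def retrans_ratio(calls, retrans):
    """Fraction of RPC calls that were retransmitted."""
    return retrans / calls if calls else 0.0


# Figures from this report: ~30M calls, 94 retransmissions.
ratio = retrans_ratio(30_000_000, 94)
print(f"{ratio:.2e}")  # 3.13e-06
```

That is roughly three retransmissions per million calls.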
Additionally, we have verified that:
- Network connectivity is stable
- The NFS server is operating normally (no restart or failover observed)
Importantly:
- We do observe that RENEW/SEQUENCE-related traffic is being sent from the client
- However, the client still ends up with an expired lease (NFS4ERR_EXPIRED)
This raises the question of whether lease renewal is being properly
processed or completed on the client side.
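To illustrate what we suspect, here is a toy model of a lease, not the kernel's or the server's actual logic. It assumes that only a SEQUENCE the server accepts counts as a renewal, and the 90-second lease time and the timings are made-up illustrative values. Under that assumption, a client can keep sending SEQUENCE on schedule and still lose its lease.

```python
from dataclasses import dataclass


@dataclass
class ToyLease:
    lease_time: float          # seconds; 90 below is an assumed, illustrative value
    last_renewed: float = 0.0  # time of the last accepted SEQUENCE

    def on_sequence(self, now, accepted):
        """Record a SEQUENCE sent at `now`; renew only if the server accepted it."""
        if accepted:
            self.last_renewed = now

    def expired(self, now):
        """True once more than lease_time has passed since the last renewal."""
        return now - self.last_renewed > self.lease_time


lease = ToyLease(lease_time=90.0)
# SEQUENCE goes out every 30 s, but nothing is accepted after t=30.
for t, accepted in [(30, True), (60, False), (90, False), (120, False), (150, False)]:
    lease.on_sequence(t, accepted)

print(lease.expired(150))  # True: 150 - 30 = 120 s, which exceeds 90 s
```

If something like this is happening, the traffic we see on the wire would not by itself show that the lease is being renewed; the replies to those SEQUENCE calls would need to be checked as well.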
Given that we are using NFSv4.1 (where lease renewal is implicit via
SEQUENCE), we would like to understand:
1. Under what conditions could the client still hit NFS4ERR_EXPIRED despite ongoing renew/SEQUENCE activity and a healthy server/network?
2. Is it possible that RPC completion, session slot handling, or sequence handling issues could prevent the lease from being effectively renewed?
3. Could this be a known issue in the NFSv4.1 recovery or session handling path in 5.15?
It appears the client is stuck in the OPEN reclaim path waiting for
RPC completion, and recovery cannot make forward progress.
Are there known fixes or patches in newer kernels (e.g., 5.19 or 6.x)
that address this class of issue?
Any pointers or suggestions would be greatly appreciated.
Thanks