public inbox for goredo-devel@lists.stargrave.org
From: "Niklas Böhm" <mail@jnboehm•com>
To: goredo-devel@lists.cypherpunks.su
Subject: Handling EINTR in unix.FcntlFlock
Date: Sat, 4 Jan 2025 10:59:09 +0100
Message-ID: <98f44f62-1f44-4375-8cf3-d10b0e1c81f9@jnboehm.com>

Greetings everyone,

I was using goredo on an NFS mount and noticed that my program would 
sometimes fail with the following error:

	run.go:234: interrupted system call /gpfs01/.../folders/.redo/1.zip.lock

After doing some digging, it seems that calling unix.FcntlFlock with 
F_SETLKW can block for a long time over NFS, and the blocked call can 
then be interrupted by a signal and return EINTR (see `man 2 fcntl`, 
section on errors [1]).  Apparently there is an automatic restart 
mechanism [2], but it seems to be unreliable, so I think it is better to 
handle EINTR explicitly and extend the error check from

	if errors.Is(err, unix.EDEADLK) {

to

	if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {

This seems to resolve the interrupted system call error above. 
Unfortunately I cannot reliably reproduce this error, but since the fix 
is reasonably easy, I was hoping that it could be incorporated into 
goredo proper.

I have attached the small diff; it is also reproduced here, in case the 
change is not clear from the lines above:

diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, traced bool) error {
 			tracef(CLock, "LOCK_EX: %s", fdLock.Name())
 		LockAgain:
 			if err = unix.FcntlFlock(fdLock.Fd(), unix.F_SETLKW, &flock); err != nil {
-				if errors.Is(err, unix.EDEADLK) {
+				if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
 					time.Sleep(10 * time.Millisecond)
 					goto LockAgain
 				}


Cheers and happy belated new year
Nik

[1]: https://www.man7.org/linux/man-pages/man2/fcntl.2.html#ERRORS
[2]: https://unix.stackexchange.com/questions/509375/what-is-interrupted-system-call
