public inbox for goredo-devel@lists.stargrave.org
Atom feed
* Handling EINTR in unix.FcntlFlock
@ 2025-01-04  9:59 Niklas Böhm
  2025-01-04 12:43 ` Sergey Matveev
  0 siblings, 1 reply; 4+ messages in thread
From: Niklas Böhm @ 2025-01-04  9:59 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1932 bytes --]

Greetings everyone,

I was using goredo on an NFS and noticed that I sometimes ran into 
issues where my program would fail with the following error:

	run.go:234: interrupted system call /gpfs01/.../folders/.redo/1.zip.lock

After doing some digging, it seems like the problem is that calling 
unix.FcntlFlock with F_SETLKW can be too slow over an NFS and will get 
interrupted (see `man 2 flock`, Section on errors [1]).  Apparently 
there is an automatic restart mechanism [2], but it's also unreliable, 
so I thought it's better to handle it explicitly and basically extend 
the error check from

	if errors.Is(err, unix.EDEADLK) {

to

	if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {

This seems to resolve the interrupted system call error above. 
Unfortunately I cannot realiably reproduce this error, but since the fix 
is reasonaly easy, I was hoping that it could be incorporated into 
goredo proper.

I have attached the small diff, and here it is also reproduced, in case 
the explicit line is not clear:

diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, 
traced bool) error {
                         tracef(CLock, "LOCK_EX: %s", fdLock.Name())
                 LockAgain:
                         if err = unix.FcntlFlock(fdLock.Fd(), 
unix.F_SETLKW, &flock); err != nil {
-                               if errors.Is(err, unix.EDEADLK) {
+                               if errors.Is(err, unix.EDEADLK) || 
errors.Is(err, unix.EINTR) {
                                         time.Sleep(10 * time.Millisecond)
                                         goto LockAgain
                                 }


Cheers and happy belated new year
Nik

[1]: https://www.man7.org/linux/man-pages/man2/fcntl.2.html#ERRORS
[2]: 
https://unix.stackexchange.com/questions/509375/what-is-interrupted-system-call

[-- Attachment #2: goredo-eintr.diff --]
[-- Type: text/x-patch, Size: 493 bytes --]

diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, traced bool) error {
 			tracef(CLock, "LOCK_EX: %s", fdLock.Name())
 		LockAgain:
 			if err = unix.FcntlFlock(fdLock.Fd(), unix.F_SETLKW, &flock); err != nil {
-				if errors.Is(err, unix.EDEADLK) {
+				if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
 					time.Sleep(10 * time.Millisecond)
 					goto LockAgain
 				}

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-01-07 11:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-04  9:59 Handling EINTR in unix.FcntlFlock Niklas Böhm
2025-01-04 12:43 ` Sergey Matveev
2025-01-04 14:04   ` Niklas Böhm
2025-01-07 11:05     ` Sergey Matveev