public inbox for goredo-devel@lists.stargrave.org Atom feed
* Handling EINTR in unix.FcntlFlock
@ 2025-01-04 9:59 Niklas Böhm
2025-01-04 12:43 ` Sergey Matveev
0 siblings, 1 reply; 4+ messages in thread
From: Niklas Böhm @ 2025-01-04 9:59 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1932 bytes --]
Greetings everyone,
I was using goredo on an NFS and noticed that I sometimes ran into
issues where my program would fail with the following error:
run.go:234: interrupted system call /gpfs01/.../folders/.redo/1.zip.lock
After doing some digging, it seems like the problem is that calling
unix.FcntlFlock with F_SETLKW can be too slow over an NFS and will get
interrupted (see `man 2 flock`, Section on errors [1]). Apparently
there is an automatic restart mechanism [2], but it's also unreliable,
so I thought it's better to handle it explicitly and basically extend
the error check from
if errors.Is(err, unix.EDEADLK) {
to
if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
This seems to resolve the interrupted system call error above.
Unfortunately I cannot realiably reproduce this error, but since the fix
is reasonaly easy, I was hoping that it could be incorporated into
goredo proper.
I have attached the small diff, and here it is also reproduced, in case
the explicit line is not clear:
diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced,
traced bool) error {
tracef(CLock, "LOCK_EX: %s", fdLock.Name())
LockAgain:
if err = unix.FcntlFlock(fdLock.Fd(),
unix.F_SETLKW, &flock); err != nil {
- if errors.Is(err, unix.EDEADLK) {
+ if errors.Is(err, unix.EDEADLK) ||
errors.Is(err, unix.EINTR) {
time.Sleep(10 * time.Millisecond)
goto LockAgain
}
Cheers and happy belated new year
Nik
[1]: https://www.man7.org/linux/man-pages/man2/fcntl.2.html#ERRORS
[2]:
https://unix.stackexchange.com/questions/509375/what-is-interrupted-system-call
[-- Attachment #2: goredo-eintr.diff --]
[-- Type: text/x-patch, Size: 493 bytes --]
diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, traced bool) error {
tracef(CLock, "LOCK_EX: %s", fdLock.Name())
LockAgain:
if err = unix.FcntlFlock(fdLock.Fd(), unix.F_SETLKW, &flock); err != nil {
- if errors.Is(err, unix.EDEADLK) {
+ if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
time.Sleep(10 * time.Millisecond)
goto LockAgain
}
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: Handling EINTR in unix.FcntlFlock 2025-01-04 9:59 Handling EINTR in unix.FcntlFlock Niklas Böhm @ 2025-01-04 12:43 ` Sergey Matveev 2025-01-04 14:04 ` Niklas Böhm 0 siblings, 1 reply; 4+ messages in thread From: Sergey Matveev @ 2025-01-04 12:43 UTC (permalink / raw) To: goredo-devel [-- Attachment #1: Type: text/plain, Size: 282 bytes --] Greetings, Niklas! *** Niklas Böhm [2025-01-04 10:59]: >[...] I have attached the small diff [...] Thanks for finding the issue and fixing it! Released in 2.6.4. -- Sergey Matveev (http://www.stargrave.org/) OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Handling EINTR in unix.FcntlFlock 2025-01-04 12:43 ` Sergey Matveev @ 2025-01-04 14:04 ` Niklas Böhm 2025-01-07 11:05 ` Sergey Matveev 0 siblings, 1 reply; 4+ messages in thread From: Niklas Böhm @ 2025-01-04 14:04 UTC (permalink / raw) To: goredo-devel Thank you for the quick response and release. I have updated the package on AUR. Now that I was looking at the implementation, would it maybe make sense to change the sleep time from 10ms to something a bit more randomized? So that two processes don't repeatedly run into the same deadlock error? (And I think it also would not hurt for an NFS to have the file calls de-synchronized.) Could take a uniform distribution between 0 and 20ms so that on average they will sleep for 10ms still. On 1/4/25 1:43 PM, Sergey Matveev wrote: > Greetings, Niklas! > > *** Niklas Böhm [2025-01-04 10:59]: >> [...] I have attached the small diff [...] > > Thanks for finding the issue and fixing it! Released in 2.6.4. > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Handling EINTR in unix.FcntlFlock 2025-01-04 14:04 ` Niklas Böhm @ 2025-01-07 11:05 ` Sergey Matveev 0 siblings, 0 replies; 4+ messages in thread From: Sergey Matveev @ 2025-01-07 11:05 UTC (permalink / raw) To: goredo-devel [-- Attachment #1: Type: text/plain, Size: 626 bytes --] *** Niklas Böhm [2025-01-04 15:04]: >Now that I was looking at the implementation, would it maybe make sense to >change the sleep time from 10ms to something a bit more randomized? I am not sure that goroutine scheduler, OS'es schedulers delays are not jitter-ed enough, but adding randomised sleep time is easy thing, so why not. Added in d0d5eb2605461569e985f9d502162b0f54392873 commit master branch. Not sure if it is so important to make a release for that change, so it will appear in the next one. -- Sergey Matveev (http://www.stargrave.org/) OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-01-07 11:06 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-01-04 9:59 Handling EINTR in unix.FcntlFlock Niklas Böhm 2025-01-04 12:43 ` Sergey Matveev 2025-01-04 14:04 ` Niklas Böhm 2025-01-07 11:05 ` Sergey Matveev