Multiple calls to redo-* for same target results in multiple .rec entries

public inbox for goredo-devel@lists.stargrave.org

* Multiple calls to redo-* for same target results in multiple .rec entries
@ 2021-10-27 17:18 goredo
  2021-10-31  8:21 ` Sergey Matveev
  0 siblings, 1 reply; 11+ messages in thread
From: goredo @ 2021-10-27 17:18 UTC
  To: goredo-devel

Hi,

Thanks for the quick response. :)

I just discovered that when redo-ifchange / redo-ifcreate is called multiple times on the same target, multiple entries get created in the .rec file. Order doesn't matter; all four combinations have a similar effect.

At least multiple calls to the same type shouldn't be recorded twice, I guess. Not sure what recording an ifchange and ifcreate dependency simultaneously would actually mean...

Kind regards,
–Michael

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-10-27 17:18 Multiple calls to redo-* for same target results in multiple .rec entries goredo
@ 2021-10-31  8:21 ` Sergey Matveev
  2021-11-04 15:35   ` goredo
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Matveev @ 2021-10-31  8:21 UTC
  To: goredo-devel

Greetings!

*** goredo [2021-10-27 17:18]:
>At least multiple calls to the same type shouldn't be recorded twice, I guess. Not sure what recording an ifchange and ifcreate dependency simultaneously would actually mean...

Dependency recording is done in a very simple way: when we run some
redo-* commands, we open a temporary .rec file and pass its opened file
descriptor to those redo-* commands. Each of them just writes to it,
appending records to the already opened file. So there is literally no
"aggregator", just an append-only file.

I thought that I could replace it with a pipe that is read by the
upper-level redo-* process, which could then process it in any way:
remove the duplicates or do any other kind of checks. But I am not sure
what behaviour we desire. Is it so wrong to have multiple entries? It is
rather silly, of course, but I assume it won't break anything, and it
clearly shows the whole timeline of redo-ifchange/redo-ifcreate calls. I
thought that we could warn the user that duplicate entries were
recorded, but at the same time there are pretty ordinary use-cases where
redo-* is called multiple times just for convenience and simplicity. So
I think it is ok to leave everything as-is.
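To make the append-only behaviour concrete, here is a minimal shell simulation. It is only a sketch under stated assumptions: the `record_dep` helper, the use of fd 3 and the one-line "TYPE TARGET" record format are hypothetical stand-ins, not goredo's actual descriptor or .rec layout.

```shell
# Hypothetical simulation of append-only dependency recording: the parent
# opens the temporary .rec file once, and every child redo-* invocation
# just appends to the inherited descriptor (fd 3 is an assumption).
rec=$(mktemp)

# record_dep TYPE TARGET stands in for a child redo-ifchange/redo-ifcreate:
# it writes one simplified "TYPE TARGET" line to the already open fd 3.
record_dep() {
    printf '%s %s\n' "$1" "$2" >&3
}

exec 3>>"$rec"             # the parent opens the file in append mode
record_dep ifchange foo
record_dep ifchange foo    # the same call again...
record_dep ifcreate foo    # ...and one of the other type
exec 3>&-                  # the parent closes it when the .do finishes

cat "$rec"                 # no aggregator, so all three lines are kept
```

Since nothing reads the file until the .do finishes, repeated calls simply produce repeated lines; removing duplicates would need a reader such as the pipe-based process described above.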

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-10-31  8:21 ` Sergey Matveev
@ 2021-11-04 15:35   ` goredo
  2021-11-09  9:13     ` Sergey Matveev
  0 siblings, 1 reply; 11+ messages in thread
From: goredo @ 2021-11-04 15:35 UTC
  To: goredo-devel

Hi,

For several days, you had me convinced that dependency recording could be kept that simple, but now I've triggered a bug.

Write a default.do.do and build foo/default.do with it. Try again, and redo fails with
```
main.go:484: foo/default.do: Size missing
```

redo implicitly records an ifcreate dependency on default.do (as it was missing in foo/ the first time).

This can, of course, be remedied by not recording .do files that are the current target as ifcreate dependencies. Removing the ifcreate dependency from the .rec file fixes the issue.

Still, this had me thinking about what it means to have both an ifcreate and an ifchange dependency on a file, whether the order matters or should matter, and how other redo implementations behave in that case, especially because redo provides no way to check or remove prior decisions. I'm leaning towards an approach that either records exactly one dependency per target, or where only the last dependency on a specific target is considered (i.e. last-writer-wins semantics).
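The last-writer-wins idea can be sketched as a post-processing step over the recorded entries. This assumes hypothetical one-line "TYPE TARGET" records with whitespace-free target names, not goredo's real .rec format; the `last_writer_wins` helper is made up for illustration.

```shell
# last_writer_wins: read hypothetical "TYPE TARGET" records on stdin and keep
# only the final record for each target, in the order those final records
# were written. Target names are assumed to contain no whitespace.
last_writer_wins() {
    awk '
        { tgt[NR] = $2; rec[NR] = $0; last[$2] = NR }
        END {
            for (i = 1; i <= NR; i++)
                if (last[tgt[i]] == i) print rec[i]
        }
    '
}

printf '%s\n' 'ifcreate foo/default.do' 'ifchange bar' \
    'ifchange foo/default.do' 'ifchange bar' | last_writer_wins
# ifchange foo/default.do
# ifchange bar
```

Keeping exactly one record per target is the same reduction; the open question in the thread is only which record should win.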

What do you think? goredo should treat dependencies the same as the other implementations, shouldn't it?

Regards,
–Michael


* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-11-04 15:35   ` goredo
@ 2021-11-09  9:13     ` Sergey Matveev
  2021-11-09 13:43       ` goredo
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Matveev @ 2021-11-09  9:13 UTC
  To: goredo-devel

Greetings!

*** goredo [2021-11-04 15:35]:
>main.go:484: foo/default.do: Size missing

Fixed in the 1.19.0 release: the code did not check that it was looking
for an "ifchange" dependency. Thanks for the report!
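A rough sketch of the kind of check the fix describes: when reading dependency sizes, skip records whose type is not ifchange, because ifcreate records carry no size. The blank-line-separated `Type:`/`Target:`/`Size:` layout below is a simplified assumption loosely modelled on a recfile, and `dep_sizes` is a hypothetical helper, not goredo code.

```shell
# dep_sizes [TYPE]: read records in a simplified, hypothetical .rec layout
# (blank-line separated "Type:", "Target:", "Size:" lines) from stdin and
# print "TARGET SIZE" for each one. With TYPE given, other record types are
# skipped; without it, an ifcreate record (which has no Size) is an error.
dep_sizes() {
    awk -v want="${1:-}" '
        BEGIN { RS = ""; FS = "\n" }        # paragraph mode: one record per block
        {
            ty = t = sz = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Type: /)   ty = substr($i, 7)
                if ($i ~ /^Target: /) t  = substr($i, 9)
                if ($i ~ /^Size: /)   sz = substr($i, 7)
            }
            if (want != "" && ty != want) next
            if (sz == "") {
                printf "%s: Size missing\n", t > "/dev/stderr"
                rc = 1
                exit
            }
            print t, sz
        }
        END { exit rc }
    '
}

rec_text='Type: ifcreate
Target: default.do

Type: ifchange
Target: ../default.do.do
Size: 42'

printf '%s\n' "$rec_text" | dep_sizes || echo "unfiltered lookup failed"
printf '%s\n' "$rec_text" | dep_sizes ifchange    # prints: ../default.do.do 42
```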

Another funny bug also showed up: when you redo foo/default.do, it
passes, ok. But when you redo it again, that foo/default.do target
itself is used as the .do file to rebuild itself. Also fixed in 1.19.0.

>What do you think? goredo should treat dependencies the same as the other implementations, shouldn't it?

I am sure that in practice everyone does it as they wish, and there is
no common denominator among implementations at all. goredo's initial
design fully resembled github.com/leahneukirchen/redo-c and it has the
same dependency tracking behaviour. So at least there are two of us :-)
and redo-c seems to be quite popular.

Anyway, I am still not sure whether the current state is a problem, and
if it is, what behaviour/tracking we should expect. The current "Size
missing" error is a bug in the code, which does not look at the record
type.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-11-09  9:13     ` Sergey Matveev
@ 2021-11-09 13:43       ` goredo
  2021-11-10 10:47         ` Sergey Matveev
  2021-11-10 12:22         ` redo-stamp Sergey Matveev
  0 siblings, 2 replies; 11+ messages in thread
From: goredo @ 2021-11-09 13:43 UTC
  To: goredo-devel

Thanks!

I was just wondering about the exact semantics.

For instance, I can force a non-empty target to be always OOD by creating an ifcreate dependency on itself, because the output is now clearly there, which signals OOD. If this is intended, could you document it in the FAQ? It's hard for newcomers to grasp the idioms of redo.
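The self-ifcreate effect follows directly from what an ifcreate dependency means. A minimal sketch, assuming a hypothetical `dep_ood` helper and using `cksum` as a stand-in for the real hash function:

```shell
# dep_ood TYPE FILE [RECORDED_HASH]: hypothetical per-dependency check that
# returns 0 (true) when the dependency makes its target out of date.
dep_ood() {
    case $1 in
    ifcreate) [ -e "$2" ] ;;            # OOD as soon as FILE exists
    ifchange) [ ! -e "$2" ] || [ "$(cksum < "$2")" != "$3" ] ;;
    *) return 2 ;;                      # unknown dependency type
    esac
}

target=$(mktemp -d)/out
echo built > "$target"          # the target's output now exists...
# ...so an ifcreate dependency on the target itself is always OOD:
dep_ood ifcreate "$target" && echo "OOD"
```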

I was also wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that whether a target is OOD can be independent of its output's hash. They use it like this:
```
redo-ifchange $input_files
cmd $input_files >$3
for f in $input_files; do
  redo-stamp <$f
done
```

So you may use the output for further computation, but whether the target is OOD depends entirely on the input files. This circumvents problems when cmd produces non-reproducible output, e.g. output including timestamps or PIDs.

Supporting redo-stamp would mean that whenever a .rec file contains a stamp entry, change times, size and hash are ignored in favor of the stamp's hash.
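The proposed rule can be written as a small decision function. This is a sketch of the semantics in the paragraph above, not of any existing implementation; the helper name and argument order are hypothetical, and `cksum` stands in for a real hash.

```shell
# is_ood_stamped RECORDED_STAMP RECORDED_HASH NEW_STAMP FILE
# Proposed rule (hypothetical): if a stamp was recorded, compare stamps only
# and ignore the file's hash; otherwise compare the file's current checksum
# with the recorded one. Returns 0 (true) when the target is out of date.
is_ood_stamped() {
    if [ -n "$1" ]; then
        [ "$1" != "$3" ]
    else
        [ "$2" != "$(cksum < "$4")" ]
    fi
}

dir=$(mktemp -d)
printf 'input\n' > "$dir/in"
printf 'result built at 10:00\n' > "$dir/out"
stamp=$(cksum < "$dir/in")                      # stamp over the inputs
hash=$(cksum < "$dir/out")                      # plain hash of the output
printf 'result built at 11:30\n' > "$dir/out"   # non-reproducible rebuild

is_ood_stamped "$stamp" "$hash" "$(cksum < "$dir/in")" "$dir/out" || echo "stamped: up to date"
is_ood_stamped "" "$hash" "" "$dir/out" && echo "hash only: out of date"
```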

Would you consider this? This is a feature that implicit output hashing cannot recreate.

Kind regards,
–Michael


* Re: Multiple calls to redo-* for same target results in multiple .rec entries
  2021-11-09 13:43       ` goredo
@ 2021-11-10 10:47         ` Sergey Matveev
  2021-11-10 12:22         ` redo-stamp Sergey Matveev
  1 sibling, 0 replies; 11+ messages in thread
From: Sergey Matveev @ 2021-11-10 10:47 UTC
  To: goredo-devel

*** goredo [2021-11-09 13:43]:
>For instance, I can force a non-empty target always OOD by creating an -ifcreate dependency on itself. Because the output is now clearly there, which signals OOD. If this is intended, could you document that in the FAQ? Because it's hard for newcomers to grasp the idioms of redo.

First of all, I could definitely be wrong about assumptions and idioms
:-). Of course, the ability of redo-ifcreate to make the target itself
OOD is not something I had thought about before. It is just a side
effect: nothing explicitly prevents you from recording an ifcreate
dependency on it. Of course, a redo implementation could forbid it
explicitly, but I do not see why it should, or how it would do harm.

I think that if the target's author called redo-ifcreate on something
that already exists, it is a problem (if it is a problem at all) of the
.do author. Well, ok, the redo tool can help catch as many mistakes or
strange things as it can, like warning about simultaneous stdout+$3
output, or about touching $1 directly. It seems that redo-ifcreate on an
already (currently) existing file is strange anyway, so goredo can print
a warning that it records an ifcreate dependency on an already existing
file. But just a warning, because that file could appear a microsecond
after the check anyway, while the .do target is still not completed.
I will add that, because it seems harmless, but possibly helpful to
somebody.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* Re: redo-stamp
  2021-11-09 13:43       ` goredo
  2021-11-10 10:47         ` Sergey Matveev
@ 2021-11-10 12:22         ` Sergey Matveev
  1 sibling, 0 replies; 11+ messages in thread
From: Sergey Matveev @ 2021-11-10 12:22 UTC
  To: goredo-devel

*** goredo [2021-11-09 13:43]:
>I also was wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that the output of a target can be independent of it's hash. They use it like the following:

Initially goredo tried to fully resemble the behaviour of apenwarr/redo,
and redo-stamp had (or should have had) exactly the same behaviour. But
soon I became confident that redo-stamp is just a useless, completely
unnecessary thing and complication.

The main difference between apenwarr's view on redo and mine is that I
am confident that it is ok to always (cryptographically) checksum the
target.
https://redo.readthedocs.io/en/latest/FAQImpl/#why-not-always-use-checksum-based-dependencies-instead-of-timestamps
http://www.goredo.cypherpunks.ru/FAQ.html
In my practice, there was a huge quantity of .do files ending with
something like "command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3".
I realized (and I assume that applies to most redo users using it for
software building) that redo-stamping is nearly always what is wished
for. apenwarr/redo's documentation states somewhere that
always-checksumming is mainly useful for making fewer false-positive OOD
decisions. That is true. But I am confident that hashing can be
considered a pretty cheap operation. Even if it sometimes slows
something down, it greatly simplifies .do files and the overall redo
implementation.

apenwarr/redo basically has two ways of determining whether the target has changed:

* either it has different mtime+size+whatever metainformation
* or it used redo-stamp and has a different hash

goredo, like redo-c, has a single way:

* it has a different hash
* and, just as an optimization, that check can be skipped if the ctime
  is the same (goredo's REDO_INODE_NO_TRUST=1 forcefully distrusts
  everything related to the inode's metainformation, so hash checking is
  done anyway: the most trustworthy OOD decision)
* and, as another optimization, the target is OOD if its size differs
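The order of checks in the list above can be sketched as a shell function. Assumptions: the recorded ctime, size and hash are passed in as arguments, `cksum` stands in for the cryptographic hash, the ctime is read with GNU `stat -c %Z` or BSD `stat -f %c`, and `REDO_INODE_NO_TRUST` is only checked for the value 1. It illustrates the idea, not goredo's Go code; `file_ctime` and `dep_changed` are hypothetical helpers.

```shell
# file_ctime FILE: inode change time in seconds since the epoch.
# GNU stat is tried first because BSD stat rejects -c without writing to stdout.
file_ctime() {
    stat -c %Z "$1" 2>/dev/null || stat -f %c "$1"
}

# dep_changed FILE REC_CTIME REC_SIZE REC_HASH: returns 0 (true) if FILE
# differs from what was recorded. Hypothetical sketch of the checks above.
dep_changed() {
    [ -e "$1" ] || return 0                          # gone: changed
    if [ "${REDO_INODE_NO_TRUST:-}" != 1 ] &&
        [ "$(file_ctime "$1")" = "$2" ]; then
        return 1                                     # same ctime: trusted, no hashing
    fi
    [ "$(wc -c < "$1" | tr -d ' ')" = "$3" ] || return 0   # other size: changed
    [ "$(cksum < "$1")" != "$4" ]                    # same size: the hash decides
}

f=$(mktemp)
printf 'hello\n' > "$f"
c=$(file_ctime "$f"); s=$(wc -c < "$f" | tr -d ' '); h=$(cksum < "$f")
dep_changed "$f" "$c" "$s" "$h" || echo "unchanged"
```

Note the trade-off the list describes: with a trusted, unchanged ctime the hash is never computed, which is exactly what `REDO_INODE_NO_TRUST` is meant to turn off.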

1. Can we trust mtime and other metainformation to be guaranteed to
   change if the underlying file definitely changed? According to
   https://apenwarr.ca/log/20181113 it is good enough in practice, but
   it can break on some FUSE filesystems. So if we want strong
   confidence in OOD determination, then we should check the hash: it
   will definitely differ if something changed (let's forget about
   possible collisions of a long enough, strong cryptographic hash;
   their probability is negligible).
2. Or we can use the more "reliable" ctime check (again, that can also
   fail on strange/broken FUSE filesystems/drivers, for example).
   apenwarr/redo does not use ctime, because it could create too many
   false positives (like changing the number of hard links). But ctime
   can also be broken/untrusted, so cryptographic hashing will again
   save us here.

As far as I have seen and understood, redo-stamp is used mainly with
redo-always targets. Because redo-always will change the inode enough
to satisfy the OOD decision anyway, people use redo-stamp to skip the
false-positive OOD decision and the resource-wasting rebuild.
redo-c/goredo's OOD determination based on inodes/hashes is very simple
from the implementation point of view, while redo-always+redo-stamp
hugely complicates the overall logic and code. I look at redo-stamp as
a kind of hack to prevent redo-always targets from making OOD everything
that depends on them (which redo-always is intended to do by
definition).

And I came to the conclusion that redo-always is just an ugly idea. Not
redo-always as such, but the huge complications aimed at skipping the
rebuild of everything all the time, because the OOD decision definitely
should say "it is OOD, because it depends on an always-target, which is
always OOD by definition". redo-always should just be used. At least
that is the way many people (as I saw, and I assume) use it: to create
some kind of target like this:
    redo-always
    env | sort
    command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3
    # command check is for compatibility with implementations without redo-stamp
I used to do that all the time. But I got tired of those stamps (for
preventing the rebuild of literally everything, because everything
depends on environment variables, for example) and of all the
complications introduced with redo-always. For me, redo-always is just
a harmful idea. I tried to note all of that in
http://www.goredo.cypherpunks.ru/FAQ.html

Another issue with hashes/stamps is that you do not always want to
checksum the target's value itself. If someone decides that the hash of
a nonexistent target equals that of an empty string, and if the redo
implementation creates the resulting file even if nothing was sent to
stdout, then of course there is no way to make that target always OOD
(possibly that was the reason people invented redo-always?). But with
goredo (and redo-c, as I remember) there is no problem: if nothing was
sent to stdout, then no output file is created, and a nonexistent file
is always OOD. But if you wish to explicitly create an empty file, then
you can just always touch "$3". Constant hashing won't harm you here in
any way.

If you really, really wish to check only some metainformation (only the
mtime, say), then nothing prevents you from creating some intermediate
target that contains the output of (stat -f %m $1), and depending not
on the (probably) huge file, but on that intermediate metainformation
file holding only the data you wish to check.
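Such an intermediate metainformation target needs little more than a stat call. A sketch, assuming GNU `stat -c %Y` or BSD `stat -f %m` (the form quoted above); the `mtime_of` helper and the `huge.bin.mtime` target name are hypothetical, and the use of redo-always in that .do is an assumption so that the stat is re-run on every build.

```shell
# mtime_of FILE: modification time in seconds since the epoch.
# GNU stat takes -c %Y; BSD stat takes -f %m (the form quoted above).
# GNU is tried first because BSD stat rejects -c without writing to stdout.
mtime_of() {
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# A hypothetical huge.bin.mtime.do could then contain just:
#     redo-always
#     stat -c %Y huge.bin 2>/dev/null || stat -f %m huge.bin
# and consumers would run "redo-ifchange huge.bin.mtime" instead of
# "redo-ifchange huge.bin", so only this tiny file ever gets hashed.
```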

>redo-ifchange $input_files
>cmd $input_files >$3
>for f in $input_files; do
>  redo-stamp <$f
>done

I do not understand where the catch is :-). redo-ifchange "$input_files"
clearly and explicitly states: rebuild this target (run cmd
$input_files) and everyone who depends on it, if any of $input_files
changed. If $input_files did not change, then this target won't be OOD,
won't be rebuilt, and no one who depends on it will be rebuilt either
(if it is their only dependency, of course). In your example, redo-stamp
literally says: this target is OOD if the hash of all $input_files data
changed. redo-ifchange $input_files (with implicit hashing) says exactly
that too. Doesn't it?

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263  6422 AE1A 8109 E498 57EF


* redo-stamp
@ 2025-03-31 17:58 Christian G. Warden
  2025-04-02  7:45 ` redo-stamp Sergey Matveev
  0 siblings, 1 reply; 11+ messages in thread
From: Christian G. Warden @ 2025-03-31 17:58 UTC
  To: goredo-devel

I'm exploring switching from apenwarr's redo to goredo, and found
that one of my common conventions doesn't work.

I frequently use redo for data pipelines.  This typically involves
retrieving some data from a remote resource.  Fetching lots of data
can be time consuming.  Working with data that's an hour or a day old
is often fine.

So the convention I follow in data.csv.do looks like this:
redo-ifchange date
... fetch data > $3

And date.do looks like this:
redo-always
date +%Y%m%d | redo-stamp

When I run `redo data.csv`, data is only fetched if I haven't already
fetched data today.

If I'm doing the same analysis with data from multiple remote sources,
I'll similarly have data.csv.do include `redo-ifchange user`, where
user.do looks like:
redo-always
force active | redo-stamp

So if I change my active user, data.csv will be out of date.

I can of course generate `date` and `user` files rather than use
redo-stamp, but is there any reason to intentionally not support this
functionality?  Any other suggestions?

Thanks,
Christian


* Re: redo-stamp
  2025-03-31 17:58 redo-stamp Christian G. Warden
@ 2025-04-02  7:45 ` Sergey Matveev
  2025-04-02 10:54   ` redo-stamp spacefrogg
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Matveev @ 2025-04-02  7:45 UTC
  To: goredo-devel

Greetings!

*** Christian G. Warden [2025-03-31 12:58]:
>I can of course generate `date` and `user` files rather than use
>redo-stamp, but is there any reason to intentionally not support this
>functionality?  Any other suggestions?

Your case with date.do can (and should) easily be handled solely by
honestly generating the "date" file:

    data.csv.do:
        redo-ifchange date
        ... fetch data > $3

    date.do:
        redo-always
        date +%Y%m%d

That works in goredo as expected. redo-ifchange means "redo the given
target (data.csv) if the date file has changed". "date | redo-stamp"
does not change the contents of the "date" file, as its stdout was empty.
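What "works as expected" means here can be simulated without redo at all: date is regenerated on every run, but data.csv is rebuilt only when the checksum of date changes. In this hypothetical sketch `run_redo_data` stands in for `redo data.csv`, a day string replaces `date +%Y%m%d`, and `cksum` replaces the real hash.

```shell
# Hypothetical simulation of date.do + data.csv.do under hash-based OOD:
# date is regenerated on every run (redo-always), but data.csv is rebuilt
# only when date's contents, compared via cksum, actually change.
dir=$(mktemp -d)
fetches=0

# run_redo_data TODAY: TODAY stands in for the output of `date +%Y%m%d`.
run_redo_data() {
    printf '%s\n' "$1" > "$dir/date"                 # date.do always reruns
    new=$(cksum < "$dir/date")
    if [ "$new" != "$(cat "$dir/date.hash" 2>/dev/null)" ]; then
        fetches=$((fetches + 1))                     # "... fetch data > $3"
        printf '%s\n' "$new" > "$dir/date.hash"      # record the dependency
    fi
}

run_redo_data 20250331
run_redo_data 20250331    # same day: date rebuilt, contents equal, no fetch
run_redo_data 20250401    # next day: contents differ, data.csv rebuilt
echo "fetches: $fetches"  # prints: fetches: 2
```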

As far as I remember, redo-stamp is just a hack to skip the
checksum/hash computation otherwise needed to determine whether the
file has really changed. apenwarr/redo considers "date" changed if its
inode's metainformation (some of its fields) is altered, without
checking whether its contents are still the same (unless redo-stamp was
used, of course). Unlike apenwarr/redo, goredo always checks the
contents (by comparing the cryptographic hash) if the inode was altered.

You can treat goredo's default behaviour as "always feeding the
target's output to redo-stamp". apenwarr/redo stamps it only if
explicitly asked to. I assume that:

    date.do:
        redo-always
        date +%Y%m%d >$3
        redo-stamp <$3

will work the same expected way in both apenwarr/redo and goredo.
I am convinced that redo-stamp was just a hack to skip the relatively
expensive SHA1 computation, and that hack should not exist at all.
It adds unnecessary complications to the out-of-date decision code.
The redo-stamp command was left in goredo only to make it possible to
write targets that need the redo-stamp hack and also run under
apenwarr/redo.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A


* Re: redo-stamp
  2025-04-02  7:45 ` redo-stamp Sergey Matveev
@ 2025-04-02 10:54   ` spacefrogg
  2025-04-03 22:45     ` redo-stamp Andrew Chambers
  0 siblings, 1 reply; 11+ messages in thread
From: spacefrogg @ 2025-04-02 10:54 UTC
  To: goredo-devel

I agree with Sergey's assessment.

> It adds unnecessary complications to the out-of-date decision code.
> redo-stamp command was left in goredo only to be able to write targets
> that have to have redo-stamp-hack and be run under apenwarr/redo too.

I want to add one thing, though. `redo-stamp` does add one small
convenience. It allows you to create the hash over *some* data that
describes the identity of your target, while using the original output
for further computation. This makes working with noisy data (e.g.
time-stamped data) more convenient. But I grant you that this is a
small convenience with the potential of confusing tool chain users down
the line.
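The convenience described above amounts to checksumming a filtered view of the output. A sketch, assuming the noise is a hypothetical `# generated at` header line; with apenwarr/redo the same filtered stream would be piped into `redo-stamp` (which reads the data to stamp on stdin), while `identity_stamp` here is a made-up helper that just prints a checksum.

```shell
# identity_stamp FILE: checksum only the lines that describe the target's
# identity; the assumed "# generated at ..." noise line is filtered out,
# while FILE itself keeps it for further computation.
identity_stamp() {
    grep -v '^# generated at ' "$1" | cksum
}

dir=$(mktemp -d)
printf '# generated at 10:00\nid,value\n1,42\n' > "$dir/a.csv"
printf '# generated at 11:30\nid,value\n1,42\n' > "$dir/b.csv"
# Different bytes on disk, same identity stamp:
[ "$(identity_stamp "$dir/a.csv")" = "$(identity_stamp "$dir/b.csv")" ] &&
    echo "same stamp"
```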

–Michael


* Re: redo-stamp
  2025-04-02 10:54   ` redo-stamp spacefrogg
@ 2025-04-03 22:45     ` Andrew Chambers
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Chambers @ 2025-04-03 22:45 UTC
  To: goredo-devel

For a while I had wondered about an extension to goredo such as 'redo-impure' that disables stamps and reverts back to timestamps.

-- 
  Andrew Chambers
