public inbox for goredo-devel@lists.stargrave.org
* Multiple calls to redo-* for same target results in multiple .rec entries
From: goredo @ 2021-10-27 17:18 UTC
To: goredo-devel

Hi,

Thanks for the quick response. :)

I just discovered that when redo-ifchange / redo-ifcreate is called multiple times on the same target, multiple entries get created in the .rec file. Order doesn't matter, all four combinations have a similar effect.

At least multiple calls of the same type shouldn't be recorded twice, I guess. Not sure what recording an ifchange and ifcreate dependency simultaneously would actually mean...

Kind regards,
–Michael
* Re: Multiple calls to redo-* for same target results in multiple .rec entries
From: Sergey Matveev @ 2021-10-31 8:21 UTC
To: goredo-devel

Greetings!

*** goredo [2021-10-27 17:18]:
>At least multiple calls to the same type shouldn't be recorded twice, I guess. Not sure what recording an ifchange and ifcreate dependency simultaneously would actually mean...

Dependency recording is done in a very simple way: when we run a redo-* command, we open a temporary .rec file and pass its open file descriptor to that command. The redo-* command just writes to it, appending records to the already opened file. So there is literally no "aggregator", just an append-only file. I thought that I could replace it with a pipe that is read by the upper-level redo-* process, which could then process the records in any way: remove the duplicates or do any other kind of checks.

But I am not sure what behaviour we desire. Is it so wrong to have multiple entries? It is rather silly of course, but, as I assume, it won't break anything, and it clearly shows the whole timeline of redo-ifchange/redo-ifcreate calls. I thought that we could warn the user that duplicate entries were recorded, but at the same time there may be pretty ordinary use-cases where redo-* is called multiple times just for convenience and simplicity. So I think it is ok to leave everything as-is.

--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
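The pipe idea from the message above, where an upper-level process reads the records and drops duplicates, could look roughly like the following shell sketch. It assumes a hypothetical one-line-per-record format of `TYPE TARGET`, which is not meant to match goredo's actual .rec layout, and `dedup_records` is an invented name, not something goredo provides.

```shell
# Hypothetical record format: one "TYPE TARGET" pair per line.
# Keep the first occurrence of each exact record and preserve the order,
# so the timeline of distinct calls stays readable.
dedup_records() {
    awk '!seen[$0]++'
}

# The repeated "ifchange foo.c" is dropped; "ifcreate foo.c" is kept,
# because it is a different record type for the same target.
printf 'ifchange foo.c\nifcreate foo.c\nifchange foo.c\n' | dedup_records
```

This only removes exact duplicates. What an ifchange and an ifcreate record on the same target should mean together is left open, as in the discussion.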
* Re: Multiple calls to redo-* for same target results in multiple .rec entries
From: goredo @ 2021-11-04 15:35 UTC
To: goredo-devel

Hi,

For several days, you had me convinced that dependency recording could be kept that simple, but now I've triggered a bug.

Write a default.do.do and create a foo/default.do from it. Try again and redo fails with

```
main.go:484: foo/default.do: Size missing
```

redo implicitly records an ifcreate dependency on default.do (as it was missing in foo/ the first time). This can, of course, be remedied by not recording ifcreate dependencies on .do files that are the current target. Removing the ifcreate dependency from the rec file fixes the issue.

Still, this had me thinking about what it means to have both an ifcreate and an ifchange dependency on a file, whether the order matters or should matter, and how other redo implementations behave in that case. Especially because redo provides no way to check or remove prior decisions.

I'm leaning towards an approach that either records exactly one dependency per target, or where only the last dependency on a specific target is considered (i.e. last-writer-wins semantics).

What do you think? goredo should treat dependencies the same as the other implementations, shouldn't it?

Regards,
–Michael
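The last-writer-wins proposal above can be illustrated with a small shell filter. This is a sketch under assumptions: the `TYPE TARGET` line format is hypothetical (not goredo's real .rec layout), target names must not contain spaces, and `last_writer_wins` is an invented helper name.

```shell
# Hypothetical record format: one "TYPE TARGET" pair per line.
# For each target keep only the most recently recorded dependency type,
# listing targets in the order they first appeared.
last_writer_wins() {
    awk '
        !($2 in kind) { order[++n] = $2 }   # remember first-seen order
        { kind[$2] = $1 }                   # a later record overwrites an earlier one
        END { for (i = 1; i <= n; i++) print kind[order[i]], order[i] }
    '
}

# The implicit ifcreate on foo/default.do is superseded by the later ifchange.
printf 'ifcreate foo/default.do\nifchange bar\nifchange foo/default.do\n' | last_writer_wins
```

With this rule the order of calls inside a .do file becomes significant, which is exactly the semantic question raised in the message.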
* Re: Multiple calls to redo-* for same target results in multiple .rec entries
From: Sergey Matveev @ 2021-11-09 9:13 UTC
To: goredo-devel

Greetings!

*** goredo [2021-11-04 15:35]:
>main.go:484: foo/default.do: Size missing

Fixed in the 1.19.0 release. The code did not check that the record it was looking at is an "ifchange" dependency. Thanks for the report!

Another funny bug also showed up: when you redo foo/default.do, it passes, ok. But when you redo it again, that foo/default.do target itself is used as the .do to rebuild itself. Also fixed in 1.19.0.

>What do you think? goredo should treat dependencies the same as the other implementations, shouldn't it?

I am sure that in practice everyone does it as they wish and there is no common denominator at all among implementations. goredo's initial design fully resembled github.com/leahneukirchen/redo-c and it has the same dependency tracking behaviour. So at least there are two of us :-) and redo-c seems to be quite popular.

And anyway I am still not sure whether the current state is a problem and, if it is, what behaviour/tracking we should expect. The "Size missing" error was a bug in the code, which did not look at the record type.
* Re: Multiple calls to redo-* for same target results in multiple .rec entries
From: goredo @ 2021-11-09 13:43 UTC
To: goredo-devel

Thanks!

I was just wondering about the exact semantics. For instance, I can force a non-empty target always OOD by creating an -ifcreate dependency on itself. Because the output is now clearly there, which signals OOD. If this is intended, could you document that in the FAQ? Because it's hard for newcomers to grasp the idioms of redo.

I also was wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that the output of a target can be independent of its hash. They use it like the following:

```
redo-ifchange $input_files
cmd $input_files >$3
for f in $input_files; do
    redo-stamp <$f
done
```

So, you may use the output for further computation, but the notion of the target being OOD depends entirely on the input files. This circumvents problems when cmd produces non-reproducible output, i.e. output including time stamps or PIDs.

Supporting redo-stamp would mean that, whenever a .rec file contains a stamp entry, change times, size and hash are ignored in favor of the stamp's hash.

Would you consider this? This is a feature that implicit output hashing cannot recreate.

Kind regards,
–Michael
* Re: Multiple calls to redo-* for same target results in multiple .rec entries
From: Sergey Matveev @ 2021-11-10 10:47 UTC
To: goredo-devel

*** goredo [2021-11-09 13:43]:
>For instance, I can force a non-empty target always OOD by creating an -ifcreate dependency on itself. Because the output is now clearly there, which signals OOD. If this is intended, could you document that in the FAQ? Because it's hard for newcomers to grasp the idioms of redo.

First of all -- I could definitely be wrong in my assumptions and idioms :-).

Of course the ability of redo-ifcreate to make the target itself OOD is not something that was thought out in advance. It is just a side effect: nothing explicitly prevents you from recording such an ifcreate-dependency. Of course a redo implementation could forbid it explicitly, but I do not see why it should, or what harm it would do.

I think that if the target's author ran redo-ifcreate on something already existing, it is the problem (if it is a problem at all) of the .do author. Well, ok, the redo tool can help to catch as many mistakes or strange things as it can, like warning about simultaneous stdout+$3 output, or about touching $1 directly. redo-ifcreate on an already (currently) existing file seems to be something strange anyway -- goredo can print a warning that it records an ifcreate-dependency on an already existing file. But just a warning, because that file can appear a microsecond after the check anyway, while the .do target is still not completed. I will add that, because it seems to be harmless, but possibly helpful to somebody.
* Re: redo-stamp
From: Sergey Matveev @ 2021-11-10 12:22 UTC
To: goredo-devel

*** goredo [2021-11-09 13:43]:
>I also was wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that the output of a target can be independent of its hash. They use it like the following:

Initially goredo tried to fully resemble the behaviour of apenwarr/redo, and redo-stamp had (should have had) completely the same behaviour. But soon I became convinced that redo-stamp is just a useless and completely unnecessary complication.

The main difference between apenwarr's view of redo and mine is that I am confident that it is ok to always (cryptographically) checksum the target.

    https://redo.readthedocs.io/en/latest/FAQImpl/#why-not-always-use-checksum-based-dependencies-instead-of-timestamps
    http://www.goredo.cypherpunks.ru/FAQ.html

In my practice, there was a huge number of .do files ending with something like "command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3". I realized (and I assume that applies to most redo users using it for software building) that redo-stamping is the thing that is nearly always wished for.

apenwarr/redo's documentation states somewhere that always-checksumming is mainly useful for making fewer false-positive OOD decisions. That is true. But I am confident that hashing can be considered a pretty cheap operation. Even if it sometimes slows something down, it greatly simplifies .do files and the overall redo implementation.
apenwarr/redo basically has two ways of determining if the target is changed:

* either it has different mtime+size+whatever metainformation
* or it used redo-stamp and has a different hash

goredo, like redo-c, has a single way:

* it has a different hash
* and just as an optimization, that check can be skipped if ctime is the same (goredo's REDO_INODE_NO_TRUST=1 can forcefully distrust everything related to the inode's metainformation, so hash checking will be done anyway -- the most trustworthy OOD)
* and as another optimization, the target is OOD if its size differs

1. Can we trust that mtime and other metainformation are guaranteed to change if the underlying file was definitely changed? According to https://apenwarr.ca/log/20181113 it is good enough in practice, but can be broken on some FUSEd filesystems. So if we want strong confidence in the OOD determination, then we should check the hash -- it will definitely be different if something has changed (let's forget about possible collisions of a long enough strong cryptographic hash -- their probability is negligible).

2. Or we can use the more "reliable" ctime check (again, that can also fail on strange/broken FUSE filesystems/drivers, for example). apenwarr/redo does not use ctime, because it could create too many false positives (like changing the number of hard links). But ctime can also be broken/untrusted, so cryptographic hashing will save us here again.

As I saw and as I understand it, redo-stamp is used mainly with redo-always targets. Because redo-always will change the inode enough to satisfy the OOD decision anyway, people use redo-stamp to skip the false-positive OOD decision and the resource-wasting rebuilding.

redo-c/goredo's OOD determination based on inodes/hashes is very simple from the implementation point of view. redo-always+redo-stamp hugely complicates the overall logic and code. I look at redo-stamp as some kind of hack to prevent redo-always targets from making everything they touch OOD (which redo-always is intended to do by definition).
And I came to the conclusion that redo-always itself is just an ugly idea. Or rather, not redo-always itself, but the huge complications aimed at avoiding rebuilding everything all the time, because the OOD check definitely should say "it is OOD, because it depends on an always-target, that is always OOD by definition". redo-always should just be used, at least in the way many people (as I saw and assume) use it, to create some kind of target like this:

    redo-always
    env | sort
    command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3
    # command check is for compatibility with implementations without redo-stamp

I used to do that all the time. But I got tired of those stamps (needed to prevent rebuilding of literally everything, because everything depends on environment variables, for example) and of all the complications introduced with redo-always. For me, redo-always is just a harmful idea. I tried to note all of that in http://www.goredo.cypherpunks.ru/FAQ.html

Another issue with hashes/stamps is that you do not always want to checksum the target's value itself. If someone decides that the hash of a nonexistent target equals the hash of an empty string, and if the redo implementation creates the resulting file even if nothing was sent to stdout, then of course there is no way to make that target always OOD (possibly that was the reason people invented redo-always?). But with goredo (and redo-c, as I remember) there are no problems: if nothing was sent to stdout, then no output file is created -- a nonexistent file is always OOD. But if you wish to explicitly create an empty file, then you can just always touch "$3". Constant hashing won't harm you here in any way.

If you really, really wish to check only some metainformation (only mtime, say), then nothing prevents you from creating an intermediate target that contains the output of (stat -f %m $1) and depending not on the (probably) huge file, but on that intermediate metainformation file holding only the data you wish to check.
>redo-ifchange $input_files
>cmd $input_files >$3
>for f in $input_files; do
>    redo-stamp <$f
>done

I do not understand where the catch is :-). redo-ifchange "$input_files" clearly and explicitly states: rebuild that target (do cmd $input_files), and everyone who depends on it, if any of $input_files are changed. If $input_files are not changed, then that target won't be OOD, won't be rebuilt, and no one who depends on it will be rebuilt either (if it is their only dependency, of course). In your example redo-stamp literally tells: this target is OOD if the hash of all $input_files data is changed. redo-ifchange $input_files (with implicit hashing) tells exactly that too. Doesn't it?
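The single-way OOD check described in this message (hash as the final word, with size and ctime as shortcuts) can be sketched in shell as follows. This is an illustration under assumptions, not goredo's code: all function names are invented, SHA-256 stands in for whatever hash the implementation really uses, the recorded size, ctime and hash are passed in as arguments, and the `INODE_NO_TRUST` variable only imitates the effect described for REDO_INODE_NO_TRUST=1.

```shell
# Hypothetical helpers; none of these names come from goredo.
file_size()  { wc -c < "$1" | tr -d ' '; }
file_ctime() { stat -c %Z "$1" 2>/dev/null || stat -f %c "$1"; }  # GNU stat, then BSD stat
file_hash()  {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum < "$1" | cut -d' ' -f1
    else
        shasum -a 256 < "$1" | cut -d' ' -f1
    fi
}

# is_ood FILE RECORDED_SIZE RECORDED_CTIME RECORDED_HASH
# Exit status 0 means FILE is out of date, 1 means it is up to date.
is_ood() {
    [ -e "$1" ] || return 0                      # nonexistent file: always OOD
    [ "$(file_size "$1")" = "$2" ] || return 0   # size differs: OOD without hashing
    if [ "${INODE_NO_TRUST:-0}" != 1 ] && [ "$(file_ctime "$1")" = "$3" ]; then
        return 1                                 # ctime unchanged: trust it, skip hashing
    fi
    [ "$(file_hash "$1")" != "$4" ]              # otherwise the content hash decides
}
```

A caller would record the three values after a successful build and pass them back on the next run, for example `is_ood data.csv "$size" "$ctime" "$hash" && echo rebuild`. The order of the two shortcuts is one possible choice; the message only says that both exist.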
* redo-stamp
From: Christian G. Warden @ 2025-03-31 17:58 UTC
To: goredo-devel

I'm exploring switching from apenwarr's redo to goredo, and found that one of my common conventions doesn't work.

I frequently use redo for data pipelines. This typically involves retrieving some data from a remote resource. Fetching lots of data can be time consuming. Working with data that's an hour or a day old is often fine. So the convention I follow in data.csv.do looks like this:

    redo-ifchange date
    ... fetch data > $3

And date.do looks like this:

    redo-always
    date +%Y%m%d | redo-stamp

When I run `redo data.csv`, data is only fetched if I haven't already fetched data today.

If I'm doing the same analysis with data from multiple remote sources, I'll similarly have data.csv.do include `redo-ifchange user`, where user.do looks like:

    redo-always
    force active | redo-stamp

So if I change my active user, data.csv will be out of date.

I can of course generate `date` and `user` files rather than use redo-stamp, but is there any reason to intentionally not support this functionality? Any other suggestions?

Thanks,
Christian
* Re: redo-stamp
From: Sergey Matveev @ 2025-04-02 7:45 UTC
To: goredo-devel

Greetings!

*** Christian G. Warden [2025-03-31 12:58]:
>I can of course generate `date` and `user` files rather than use
>redo-stamp, but is there any reason to intentionally not support this
>functionality? Any other suggestions?

Your case with date.do can (and should) be handled simply by honestly generating the "date" file:

data.csv.do:

    redo-ifchange date
    ... fetch data > $3

date.do:

    redo-always
    date +%Y%m%d

That works in goredo as expected. redo-ifchange means "redo the given target (data.csv) if the date file is changed". "date | redo-stamp" does not change the contents of the "date" file, as its stdout was empty.

As far as I remember, redo-stamp is just a hack to skip the checksum/hash computation of the file when determining if it is really changed. apenwarr/redo thinks that "date" is changed if its inode's metainformation (some of its fields) is altered, without checking if its contents are still the same (unless redo-stamp was used, of course). Unlike apenwarr/redo, goredo always checks the contents (by comparing a cryptographic hash) if the inode is altered. You can treat goredo's default behaviour as "always feeding the target's output to redo-stamp". apenwarr/redo redo-stamps it only if explicitly asked to.

I assume that:

date.do:

    redo-always
    date +%Y%m%d >$3
    redo-stamp <$3

will work the same expected way both in apenwarr/redo and in goredo.

I am convinced that redo-stamp was just a hack to skip a relatively expensive SHA1 computation. And that hack should not exist at all. It adds unnecessary complications to the out-of-date decision code.
The redo-stamp command was left in goredo only so that targets which need the redo-stamp hack can be written once and also run under apenwarr/redo.

--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
* Re: redo-stamp
From: spacefrogg @ 2025-04-02 10:54 UTC
To: goredo-devel

I agree with Sergey's assessment.

> It adds unnecessary complications to the out-of-date decision code.
> redo-stamp command was left in goredo only to be able to write targets
> that have to have redo-stamp-hack and be run under apenwarr/redo too.

I want to add one thing, though. `redo-stamp` does add one small convenience. It allows you to create the hash over *some* data that describes the identity of your target, while using the original output for further computation. This makes working with noisy data (e.g. time-stamped data) more convenient. But I grant you that this is a small convenience with the potential to confuse tool chain users down the line.

–Michael
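Without redo-stamp, a similar effect for noisy data can be approximated with the intermediate-target idea mentioned earlier in the thread: a small target that contains only the stable part of the data, which dependents track instead of the noisy file. The filter below is a minimal sketch; `stable_part` and the `generated-at:` line prefix are invented for illustration and do not come from goredo or from any message here.

```shell
# Hypothetical filter for an intermediate "identity" target:
# drop the volatile lines so that only stable content ends up being hashed.
# The "generated-at:" prefix is an invented example of noisy data.
stable_part() {
    grep -v '^generated-at:'
}

# Two runs that differ only in their time stamp yield the same identity data.
printf 'generated-at: 2025-04-02\nrows: 42\n' | stable_part
printf 'generated-at: 2025-04-03\nrows: 42\n' | stable_part
```

In a .do file for such an intermediate target one would run `redo-ifchange` on the noisy file and write the filtered data as the target's output; targets that should ignore the noise then depend on the intermediate target only.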
* Re: redo-stamp
From: Andrew Chambers @ 2025-04-03 22:45 UTC
To: goredo-devel

For a while I had wondered about an extension to goredo such as 'redo-impure' that disables stamps and reverts back to timestamps.

--
Andrew Chambers