Memory exhaustion issues

public inbox for goredo-devel@lists.stargrave.org
Atom feed

* Memory exhaustion issues
@ 2024-08-19 10:12 spacefrogg
  2024-09-05 18:49 ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: spacefrogg @ 2024-08-19 10:12 UTC (permalink / raw)
  To: goredo-devel

Hi,

In a fairly large project in the order of several 100k build targets I 
face memory exhaustion issues.  The build closure has roughly the 
following shape:
- ca. 50 immediate dependencies max. in a single redo-ifchange call at 
several levels of the dependency graph
- max dependency chain length of 14 levels
- 1 Python process per active target

The build is run with a single active job `-j 1`.

The following happens: For each target, the Python process eventually 
waits on a redo-ifchange call to finish its sub targets.  One could 
expect that with a single job, only one process per dependency depth 
level (+1 for each redo-if*) would be created until reaching the end of 
the chain.  But this is not what happens.  Whenever a target calls 
redo-if*, redo activates a next-in-line target, which is not necessarily 
in the chain of the same process.  Instead it is likely some other 
target from an earlier redo-if* call up the dependency chain.  This 
leads to a large tree of processes, because the current chain is not 
finished. Instead it is waiting on a redo-if* call while another chain 
opens up – also eventually waiting on a redo-if* call and so forth.  
This creates hundreds of new processes effectively sleeping all the time 
(which is fine).  Due to them being active processes, they consume a lot 
of memory until exhaustion (which is not fine).

I believe the high fan out of 50 direct dependencies accommodates the 
growth of the active process tree, because it makes selecting the next 
target on a deeper level more unlikely.

It is obvious that using Python immediately contributes to the severity 
of the issue but a) a change is out of the question for several reasons 
and b) it is just uncovering a more fundamental issue which would have 
hit the next guy on a project 3 times the size or so.

There is no way, using the current redo-if* interface or any of its 
command-line options to control this development.

My suggestion would be that, if in any way possible, redo should try to 
follow the dependency graph in depth-first-order to minimize the number 
of half-executed targets waiting on redo-if* calls.

Thanks for reading!
–Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory exhaustion issues
  2024-08-19 10:12 Memory exhaustion issues spacefrogg
@ 2024-09-05 18:49 ` Sergey Matveev
  2024-09-05 19:49   ` goredo
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2024-09-05 18:49 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1084 bytes --]

Greetings!

*** spacefrogg [2024-08-19 12:12]:
>My suggestion would be that, if in any way possible, redo should try to
>follow the dependency graph in depth-first-order to minimize the number of
>half-executed targets waiting on redo-if* calls.

This is not possible, because, unlike Make, targets that are going to be
built are not known in advance. We just can *assume* what will be built,
according to our previous run, but we can not (have no right) to build
them, if they are not explicitly requested by redo-* command. Each
redo-* invocation literally tells redo to "here, build those targets
too", and those targets also tells redo to build another ones. All of
that are dynamically generated by invokers of redo-* commands. And we
have to wait for their finishing, because as a rule we are inside some
.do file, that will have other commands after redo* calls. Personally I
see no way to change that behaviour somehow. That is the redo's nature.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory exhaustion issues
  2024-09-05 18:49 ` Sergey Matveev
@ 2024-09-05 19:49   ` goredo
  2024-09-08 10:11     ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: goredo @ 2024-09-05 19:49 UTC (permalink / raw)
  To: goredo-devel

Hi,

My apologies, this has gotten a long message.

I see that I was not precise enough in my suggestion. I will try to sketch the issue and maybe we find a better resolve than the current implementation. Given, we run redo with a single job, just for the argument.

We start building a target that depends directly on 10 targets, each of them also depends on 10 more targets, none of the targets repeat. This leaves us with a dependency tree of depth 3 and 111 targets total. Let us name the target on the first level target 1, on the next level targets 10-19 and the third level 100-199. Now, because we are building with one job, we enter target 1, hit redo-ifchange 10 11 12 ... 19, and block until the call returns. Right now we have three open processes, redo, target 1 and redo-ifchange.

Right now redo is free to run anyone of targets 10-19. Let's say it picks target 10. Starts executing until it hits redo-ifchange 100 ... 109. Blocks again. Now, redo can pick any target from 11-19 and 100-109. We have two more processes open.

Now, it is important at this point that we both can agree that any of the 19 targets can be selected. If you don't agree, chime in here.

Let's continue and say that redo selects target 11, runs until redo-ifchange 110-119, picks target 12, ..., until it picks target 19 which blocks on targets 190-199. Each new 2-digit target opens two more processes that must stay open until enough of the 3-digit targets are finished. We have 23 processes open and blocked.

Let's stop here and jump back to an alternative version. Red can pick from targets 11-19 and 100-109 again and selects target 100, executes it, returns and picks 101 and so on until 109 is finished. It must pick one of 11-19 now, takes 11. Now, we can repeat this argument for each of the two-digit targets. In all these runs, we had no more than six processes running, redo, target 1, redo-ifchange, one 2-digit target, redo-ifchange and one 3-digit target.

In my first example, redo executed known and ready targets breadth first. The targets coming from the top-most redo-ifchange call (2-digit targets) were picked first. In my second example, redo executed targets depth-first. Of all known and ready targets, it prefered those of the most recent or the deepest redo-ifchange call.

I hope I could show that I was not cheating and that the depth-first order was, in fact, saving on the number of processes.

Why is that important to me? I have a project which needs to build 170k targets, distributed over a graph that is 13 layers deep. I observed that redo opens hundreds of targets from higher layers that all block on redo-ifchange.

I hope I could get my point across. I am hopeful that redo could implement some sort of priority queue that favours targets of the last redo-ifchange call over more remote ones.

Thanks for reading. :)

–Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory exhaustion issues
  2024-09-05 19:49   ` goredo
@ 2024-09-08 10:11     ` Sergey Matveev
  2024-09-09 10:49       ` spacefrogg
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Matveev @ 2024-09-08 10:11 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1403 bytes --]

Greetings!

*** goredo [2024-09-05 21:49]:
>Right now redo is free to run anyone of targets 10-19. Let's say it picks target 10. Starts executing until it hits redo-ifchange 100 ... 109. Blocks again. Now, redo can pick any target from 11-19 and 100-109. We have two more processes open.

I see you point, ok. But how can we run targets in parallel? The very
first process has redo-ifchange with 10 targets. It runs one of them.
But it does not know when (or if at all) that running job will run any
other targets further. So if we want to build two jobs in parallel, when
we should run the second job? We do not know if already running one will
run jobs further. If we will wait for its completion, then all spent
time we were running only a single target, without even trying to run
another job, wasted time.

If goredo runs redo-ifchange with thousand of targets, then it runs
runScript() function inside itself for each of those target. But it does
not create additional process until it gets the job token. If we run
with two jobs (-j 2) at most, and if already two jobs are running, then
no additional process will be created, until any of those jobs finishes
and returns job token back. Of course all of that will not work, if
number of allowed jobs is infinite.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory exhaustion issues
  2024-09-08 10:11     ` Sergey Matveev
@ 2024-09-09 10:49       ` spacefrogg
  2024-09-10  6:50         ` Sergey Matveev
  0 siblings, 1 reply; 6+ messages in thread
From: spacefrogg @ 2024-09-09 10:49 UTC (permalink / raw)
  To: goredo-devel

Hi,

> I see you point, ok. But how can we run targets in parallel? The very 
> first process has redo-ifchange with 10 targets. It runs one of them.

I see no problem there. If redo has job tokens left, it spends them as I 
described earlier. Now, that I think more about it, I think it is a 
rather simple graph traversal problem on the dependency graph.

In a graph like this:
a -> b0 -> c00 -> d000
                -> d001
         -> c01 -> d010
                -> d011
   -> b1 -> c10 -> d100
                -> d101
         -> c11 -> d110
                -> d111

Say, redo is started with 'a' and 2 job tokens. Then it picks 'b0' and 
'b1', blocks on 'b1' first and selects 'c10'. Then it blocks on 'b0'. At 
this point, 'c00', 'c01' and 'c11' are available. I would like redo to 
now prefer tasks that are dependencies of 'b1', because it has already 
started 'c10'. Preferring everything coming from the 'b1' sub tree helps 
reducing the number of blocked processes. So, it starts 'c11'. Then it 
blocks on 'c10' and starts 'd100'.
When it blocks 'c11', it, again, prefers 'd101' over 'd110' or 'd111' 
for the same reason. The path a->b1->c10->d100 is already active and 
finishing 'c10' should be preferred over opening anything that goes via 
'c11' or 'b0' (which is still blocked on its 'c0*' targets).

So, the general rules seem to be as follows.
Each target is annotated with the depth at which is was discovered. 
Start at level 0, which are the explicit arguments to redo. With each 
redo-ifchange the levels of its argument list are updated with the 
current level + 1. For example, in the following  graph:
x -> y0 -> z00
   -> z00
   -> v0
Both 'x' and 'y0' depend on 'z00'. 'y0', 'v0' and 'z00' initially get 
level 1. Later 'z00' updated to level 2. This makes 'z00' more valuable 
than 'v0', because it appears deeper in the hierarchy and should 
preferred. (Because finishing 'z00' potentially closes a long chain of 
blocked processes.)

The second important property is: Each node is ordered by order of 
discovery. You may still randomize the order of arguments to 
redo-ifchange, but this may not change the order of already discovered 
targets. Already known targets are just kept in their place and new 
targets are added to the end. This order is important to break ties if 
two targets have the same level.

Coming back to the previous example with 2 job tokens. The order of 
discovered targets annotated with levels looks like this:
a    :0
b0   :1
b1   :1
c10  :2 # because it blocked on b1 first. So these are discovered first.
c11  :2
c00  :2 # discovered later
c01  :2
d000 :3
d001 :3
d110 :3
d111 :3
...

Deeper levels always win. So, after 'a' discovered 'b0' and 'b1' and 
started both, it discovered 'c10' and 'c11' at level 2. They are the new 
deepest level and 'c10' gets executed. Now, at the time 'b0' blocks and 
adds 'c00' and 'c01' to the list, 'c11' is still available and level 2 
is still the highest priority. So, 'c11' is executed first, because it 
is higher on the list. By ordering targets on the same level this way, 
you are grouping targets that are on the same level and come from the 
same redo-ifchange call together. This ensures that, on the same level, 
you compute the dependencies of one redo-ifchange call, first, 
potentially finishing the task altogether and closing the process early. 
This also ensures that you compute deeper dependencies first, again, 
preferring to finish already opened processes before opening new ones on 
shallower levels.

Again, you can order targets on the same level any way you want as long 
as the order does not change later and targets from the same 
redo-ifchange call are grouped together. This also means that if, say, 
'c10' is discovered again on the same level, it is *not* moved in the 
list to group the targets together.

> Of course all of that will not work, if number of allowed jobs is 
> infinite.

True, if you allow infinite jobs, you are also not caring about memory, 
because you cannot know how much memory each process will take. By allow 
infinitely many processes, you already say that you think you have 
enough memory for all of them.

All of this has been very specific, but I hope you get the idea. I am 
sure you have better implementations in mind that fulfill the same 
rules.

Thanks for reading,
–Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Memory exhaustion issues
  2024-09-09 10:49       ` spacefrogg
@ 2024-09-10  6:50         ` Sergey Matveev
  0 siblings, 0 replies; 6+ messages in thread
From: Sergey Matveev @ 2024-09-10  6:50 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 784 bytes --]

Greetings!

Ok, I agreed that if we can give job tokens taking the depth/priority in
consideration, then your idea should work. Currently that is not
possible easily, because there is single pipe, where job tokens is a
single byte and who catches its first -- it will start the next job.
Some kind of centralised arbiter will be required, that will know about
each awaiting job (and its depth/priority) and will distribute job
tokens not randomly.

Seems to be not a hard task and seems that even compatibility to Make's
jobserver should not be broken. Will try to look and implement that on a
weekend, but not sure about free time.

Thanks for suggestion!

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-09-10  6:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-19 10:12 Memory exhaustion issues spacefrogg
2024-09-05 18:49 ` Sergey Matveev
2024-09-05 19:49   ` goredo
2024-09-08 10:11     ` Sergey Matveev
2024-09-09 10:49       ` spacefrogg
2024-09-10  6:50         ` Sergey Matveev