
Refactored tokenizer and parser output to help reduce the amount of memory needed. #7602

Merged: erictraut merged 1 commit into main from reduceTokenOverhead on Apr 2, 2024
Conversation

erictraut (Collaborator) commented Apr 2, 2024

@rchiodo, this is an alternate approach to optimizing the tokenizer data structures. With this change, the tokens for a file are not cached in memory unless the file is currently open in the editor. They are recomputed on demand if/when needed. This should theoretically reduce the memory usage more than your proposed change. I think this change is less complex than yours — and therefore easier to maintain.
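A minimal sketch of the caching policy described here (names are illustrative assumptions, not pyright's actual API): token output is retained only while a file is open in the editor; for closed files it is recomputed from the source text on demand.

```typescript
// Hypothetical sketch of "don't pin tokens for closed files".
// The tokenizer below is a stand-in (whitespace splitting), not the real one.
type Token = { kind: string; start: number; length: number };

function tokenize(text: string): Token[] {
    const tokens: Token[] = [];
    const re = /\S+/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
        tokens.push({ kind: 'word', start: m.index, length: m[0].length });
    }
    return tokens;
}

class TokenAccess {
    private cached: Token[] | undefined;

    constructor(private text: string, private isOpenInEditor: () => boolean) {}

    getTokens(): Token[] {
        // Serve from cache if the tokens are pinned.
        if (this.cached) return this.cached;
        const tokens = tokenize(this.text);
        // Pin the tokens only while the file is open in the editor;
        // otherwise let them be garbage-collected and recompute next time.
        if (this.isOpenInEditor()) {
            this.cached = tokens;
        }
        return tokens;
    }
}
```

The trade is memory for repeat tokenization cost on closed files, which is the perf question examined later in this thread.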

Could you please run your memory benchmarks on this branch to confirm the memory savings and quantify the savings relative to your proposed change? We can then use this data to inform which change is preferable.

github-actions bot (Contributor) commented Apr 2, 2024

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

rchiodo (Collaborator) commented Apr 2, 2024

Thanks, will do.

rchiodo (Collaborator) commented Apr 2, 2024

So I renamed the tokens' TextRangeCollection to a TokenCollection class so I could compare them (this makes it show up in the heap snapshot output).

In the worst-case scenario, where I open a file that has a ton of tokens, the compressed version is better:

Non-pinned tokens (just one reference, for the single open file):
[screenshot]

Compressed tokens (11 references, for all the files parsed):
[screenshot]

However, if I close that file and open a different one, the amount of memory goes way down (non-pinned version):
[screenshot]

Whereas the compressed version still holds onto all of the compressed tokens.

I'm going to try with Pylance to see what the difference there is.

rchiodo (Collaborator) commented Apr 2, 2024

With Pylance, I tried loading the example project that used to crash.

The non-pinned version would get close to the memory limit, but it would GC and move back away from crashing.

The compressed version never got close to the memory limit; however, it did hold onto a lot more memory.

Token collection retained size in the non-pinned version:
[screenshot]

Token collection retained size in the compressed version:
[screenshot]

The non-pinned version seems better in an actual project.

You can see the total memory allocated is smaller:

Non-pinned tokens:

[screenshot]

Token compression:

[screenshot]

rchiodo (Collaborator) commented Apr 2, 2024

I'm going to do some perf analysis of the non-pinned version to see if there's any perf impact.

rchiodo (Collaborator) commented Apr 2, 2024

Perf analysis of the non-pinned version shows it has zero impact on a couple of different projects. I'd imagine the only real impact would be in having to GC more often.

erictraut (Collaborator, Author):

Thanks for the analysis.

One thing that's still bothering me... Why are we seeing heap exhaustion in the first place? We have code in place to monitor heap usage and deallocate cached information (including all tokenizer output, parser output, and type cache data) when we exceed a high-water mark for memory usage. The only reason we should ever run out of heap space is if a single operation requires that much memory and does not "come up for air" to check whether the heap is approaching exhaustion, and I have a hard time believing that's happening here. What operation are you performing when you see this occur? The only other explanation is that we have some bug where we're missing a heap exhaustion check along some code path or we're failing to deallocate all of the cached information under certain circumstances. It would be good to understand this root cause.

rchiodo (Collaborator) commented Apr 2, 2024

> Thanks for the analysis.
>
> One thing that's still bothering me... Why are we seeing heap exhaustion in the first place? We have code in place to monitor heap usage and deallocate cached information (including all tokenizer output, parser output, and type cache data) when we exceed a high-water mark for memory usage. The only reason we should ever run out of heap space is if a single operation requires that much memory and does not "come up for air" to check whether the heap is approaching exhaustion, and I have a hard time believing that's happening here. What operation are you performing when you see this occur? The only other explanation is that we have some bug where we're missing a heap exhaustion check along some code path or we're failing to deallocate all of the cached information under certain circumstances. It would be good to understand this root cause.

I believe the reason is that we have 3 threads running, and the heap information doesn't accurately reflect the usage of the other threads. Each thread thinks it has only 1 GB or so allocated. This seems to be a limitation that comes with pointer compression (it only happens in VS Code). Without using VS Code's node executable, each thread gets its own heap and we don't run out of memory. In the VS Code case, the heap stats function doesn't report the usage of all threads.
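For reference, a sketch of the per-isolate view described here, using Node's built-in v8 module (this is illustrative, not the actual monitoring code): each thread sees statistics for its own isolate only, so when worker threads share one 4 GB pointer-compression cage, per-thread numbers understate the true total.

```typescript
import * as v8 from 'v8';

// Reports heap usage for the *calling* isolate only. When several worker
// threads share a single pointer-compression cage, each thread's
// heap_size_limit looks like the full limit and used_heap_size excludes
// the other threads' allocations, so a per-thread high-water-mark check
// can pass even when the shared cage is nearly exhausted.
function heapUsageFraction(): number {
    const stats = v8.getHeapStatistics();
    return stats.used_heap_size / stats.heap_size_limit;
}
```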

When I did the initial investigation on the microsoft/pylance-release#5440 bug, I implemented something that would force a cache cleanup in the middle of binding and would share heap stats between threads using a SharedArrayBuffer. That did help, but it was slow, and dumping parse results while binding other files would break some parse trees.

rchiodo (Collaborator) commented Apr 2, 2024

@heejaechang and I both did independent analyses of the shared-memory problem. He got the same results. In VS Code, the different worker threads don't each get their own 4 GB; they share it. Meaning our heap high-water mark is always wrong when there are multiple threads involved.

So maybe we should also modify the heap analysis code to share the total memory allocated between threads?

I had written something to do that and it seemed to work. That would help with this issue too.

erictraut (Collaborator, Author):

> Meaning our heap high water mark is always wrong when there are multiple threads involved.

That sounds like a big problem. Without solving this, we'll continue to see heap exhaustion issues under certain conditions. I think we need to invest in a solution for this underlying problem rather than playing whack-a-mole. Reducing memory usage is good in general, but it's not addressing the core problem here.

rchiodo (Collaborator) commented Apr 2, 2024

> Meaning our heap high water mark is always wrong when there are multiple threads involved.

> That sounds like a big problem. Without solving this, we'll continue to see heap exhaustion issues under certain conditions. I think we need to invest in a solution for this underlying problem rather than playing whack-a-mole. Reducing memory usage is good in general, but it's not addressing the core problem here.

Okay, I'll submit a change that uses a SharedArrayBuffer to share memory usage between threads to see what you think.
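One possible shape for that change (slot layout and names are assumptions, not the actual implementation): each thread periodically publishes its own isolate's usage into a SharedArrayBuffer, and the high-water-mark check sums all slots instead of consulting the local isolate alone.

```typescript
// Hypothetical shared ledger of per-thread heap usage, in whole megabytes.
// Int32Array is used so the slots can be accessed with Atomics.
const MAX_THREADS = 8;
const shared = new SharedArrayBuffer(MAX_THREADS * Int32Array.BYTES_PER_ELEMENT);
const usageMb = new Int32Array(shared);

// Each thread calls this periodically with its own isolate's usage.
function publishUsage(threadIndex: number, usedBytes: number): void {
    Atomics.store(usageMb, threadIndex, Math.round(usedBytes / 1048576));
}

// Any thread can approximate total process usage by summing all slots,
// which is what the high-water-mark check would compare against the limit.
function totalUsageMb(): number {
    let total = 0;
    for (let i = 0; i < MAX_THREADS; i++) {
        total += Atomics.load(usageMb, i);
    }
    return total;
}
```

In a real setup, the SharedArrayBuffer would be handed to each worker at creation time (e.g. via workerData) so that all threads write into the same buffer.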

heejaechang (Collaborator):

> That sounds like a big problem. Without solving this, we'll continue to see heap exhaustion issues under certain conditions. I think we need to invest in a solution for this underlying problem rather than playing whack-a-mole. Reducing memory usage is good in general, but it's not addressing the core problem here.

We introduced a new option - https://github.com/microsoft/pylance-release/blob/main/TROUBLESHOOTING.md#pylance-is-crashing - that lets users run the language server with their own Node rather than with VS Code's Node. That should generally solve the issue for users who have big workspaces needing more than 4 GB.

We didn't make it the default since 1. using VS Code's Node is generally more convenient, 2. it uses less memory, and 3. it's a bit faster (10%-20%) through pointer compression. And we believe that for most users, 4 GB is enough.

erictraut (Collaborator, Author):

I don't think it makes sense to say "if pylance is crashing, try this option". Pylance should not crash by default.

The heap usage monitoring logic should prevent heap exhaustion if the information it receives is accurate. The SharedArrayBuffer approach sounds promising.

heejaechang (Collaborator) commented Apr 2, 2024

I also did some investigation on microsoft/pylance-release#5440 using Chrome/Edge's heap snapshot.

Of all heap usage, about 50% is text and tokens: about 25% is used by text, and another 25% is used by tokens.

So, @rchiodo is trying to reduce the tokens' memory footprint, and I am also experimenting with reducing the text's footprint.

...

As to why our memory pressure handling didn't work for this issue: that particular library has a lot of transitive imports, so importing the top module caused hundreds of files to be parsed and bound, and several of them have texts over 20 MB.

So the issue is that the existing cache dump only works between top-level operations; it can't dump things from memory in the middle of import lookups.
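A sketch of what a mid-operation check could look like (all names hypothetical): the transitive import walk periodically consults the memory monitor and dumps caches in the middle of the operation, rather than only between top-level operations.

```typescript
// Hypothetical "come up for air" check inside a long import-resolution
// chain. The dependency graph, pressure check, and cache dump are all
// injected so the walk itself stays testable.
function resolveImportsTransitively(
    roots: string[],
    getDeps: (file: string) => string[],
    underMemoryPressure: () => boolean,
    dumpCaches: () => void
): string[] {
    const seen = new Set<string>();
    const queue = [...roots];
    let processed = 0;
    while (queue.length > 0) {
        const file = queue.shift()!;
        if (seen.has(file)) continue;
        seen.add(file);
        queue.push(...getDeps(file));
        // Check memory pressure every few files, not just when the whole
        // top-level operation completes.
        if (++processed % 4 === 0 && underMemoryPressure()) {
            dumpCaches();
        }
    }
    return [...seen];
}
```

The hard part, as noted in the thread, is that dumping parse results mid-operation must not invalidate trees that other in-flight work still depends on.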

heejaechang (Collaborator):

> I don't think it makes sense to say "if pylance is crashing, try this option". Pylance should not crash by default.

With pointer compression imposing a hard 4 GB limit, crashes will still happen for some people. And even if they don't crash, the UX will be bad, since we will continuously dump and reparse/rebind files.

Probably the best option is convincing VS Code not to do pointer compression, but that comes with its own drawbacks, such as using more memory for the majority of people for whom 4 GB is enough.

erictraut (Collaborator, Author):

> about 50% is text

By "text", I assume you mean the contents of source files that have been tokenized & parsed? We could use the same technique that I implemented in this branch but apply it to the file contents in addition to the tokenizer output. This would require that we reload the text on demand (from disk) if and when it's needed.

> that particular library has a lot of imports transitively

Ah yes, I remember this one now. We had talked about eliminating the need to follow imports transitively at the expense of not having the correct icon in the completion suggestion menu. That seems like a good tradeoff to me. You mentioned in this issue that you fixed the problem, but you never responded to my question about which fix you implemented.

heejaechang (Collaborator):

> I am also experimenting on reducing text's footprint

By the way, the experiment I am doing is introducing an ISourceText abstraction and making ParseResults and AnalyzeNodeFileInfo hold onto the ISourceText rather than a string directly.

According to the experiment I did, Node with pointer compression has a 4 GB limit for the managed heap, but the array buffer heap gets its own 4 GB. So I am planning to put text in the array buffer heap instead of the managed heap, where regular JS objects live.

I looked through the code for where text is used once it is parsed/bound, and this is the only place - https://github.com/microsoft/pyright/blob/main/packages/pyright-internal/src/analyzer/typeEvaluator.ts#L26689 - except on the LS side.

So there is no reason to hold onto the text in the managed heap, consuming 25% of the total heap.

My plan is to put the text in the array buffer, convert it to a string when needed for closed files (for things like find-all-references, etc.), and cache the string for open files.
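A minimal sketch of that idea (ISourceText's actual shape in pylance is not shown here; this interface and class are assumptions based on the description): keep file contents as UTF-8 bytes in an ArrayBuffer-backed array, which under pointer compression lives in the array buffer heap rather than the 4 GB managed heap, and materialize a JS string only on demand.

```typescript
// Hypothetical buffer-backed source text: the string is encoded once and
// then released; consumers pay a decode cost only when they need a string.
interface ISourceText {
    getText(): string;
    readonly length: number;
}

class BufferBackedSourceText implements ISourceText {
    private readonly bytes: Uint8Array;

    constructor(text: string) {
        // UTF-8 bytes live in the array buffer heap, not the managed heap.
        this.bytes = new TextEncoder().encode(text);
    }

    get length(): number {
        return this.bytes.byteLength;
    }

    getText(): string {
        // Decode on demand for closed files (find-all-references, etc.).
        // An open-file implementation would cache the decoded string.
        return new TextDecoder().decode(this.bytes);
    }
}
```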

heejaechang (Collaborator) commented Apr 2, 2024

> You mentioned in microsoft/pylance-release#4241 (comment) that you fixed the problem, but you never responded to my question about which fix you implemented.

Ah, I linked to the code in pyrx, and I guess that isn't visible to you. This is where I put it - https://github.com/microsoft/pyright/blob/main/packages/pyright-internal/src/languageService/completionProvider.ts#L637

and

> We had talked about eliminating the need to follow imports transitively at the expense of not having the correct icon in the completion suggestion menu

That is just one code path that reaches import lookup, through the resolve-alias call. There are many code paths to import lookup, such as importing symbols through * in the binder, or resolving imported symbols in type evaluation.

All of those code paths can potentially hit the same issue. So if we want to fix the root cause, we probably need to figure out a way to dump memory in the middle of this import lookup chain.

heejaechang (Collaborator):

> This would require that we reload the text on demand (from disk) if and when it's needed.

I think this is a bit dangerous, since the text and the parse tree's position info can then get out of sync. I think putting it in the array buffer, in its own heap, is safer, since that guarantees data integrity. Otherwise, we either have to live with a possible race or add some other way to check data integrity, such as a content hash somewhere, and we would also need to deal with the invalid case. So putting it in the array buffer seems simpler.

Also, another reason we want the ISourceText abstraction is so that pylance can abstract it out for other features (such as synthesized text).
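For illustration, the content-hash safeguard mentioned above could look like this (a hedged sketch, not actual pylance code): record a hash of the text the parse tree was built from, and verify it when the text is reloaded from disk so stale position info can be detected.

```typescript
import { createHash } from 'crypto';

// Hash recorded at parse time, keyed alongside the parse results.
function contentHash(text: string): string {
    return createHash('sha256').update(text, 'utf8').digest('hex');
}

// A mismatch means the file changed on disk after parsing, so the parse
// tree's position info can no longer be trusted and a reparse is needed.
function isTextStillValid(reloaded: string, expectedHash: string): boolean {
    return contentHash(reloaded) === expectedHash;
}
```

This catches the out-of-sync case but still leaves the "invalid case" to handle, which is the complexity the array-buffer approach avoids.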

erictraut merged commit 5fd8830 into main on Apr 2, 2024
erictraut deleted the reduceTokenOverhead branch on April 3, 2024 at 14:20