Save some memory in token creation #7434
Conversation
let escapedString = node.token.escapedValue;
if ((flags & PrintExpressionFlags.DoNotLimitStringLength) === 0) {
    const maxStringLength = 32;
    escapedString = escapedString.substring(0, maxStringLength);
These slices aren't really necessary; it's just that substring has been deprecated. I figured I'd switch them over in the files I was looking at. I can put these back if we'd rather not bother with these changes.
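For reference, a minimal illustration of the swap being discussed; the string contents below are made up, and only maxStringLength = 32 comes from the diff above:

```typescript
// For non-negative, in-range indexes, substring and slice return the same
// result, so truncating the printed string with either call is equivalent here.
const escaped = 'a fairly long escaped string value used for the example';
const maxStringLength = 32;

const viaSubstring = escaped.substring(0, maxStringLength);
const viaSlice = escaped.slice(0, maxStringLength);

console.log(viaSubstring === viaSlice); // true
```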
let escapedValueParts: number[] = [];
const start = this._cs.position;
let escapedValueLength = 0;
const getEscapedValue = () => this._cs.getText().slice(start, start + escapedValueLength);
This is where the real change happens. Instead of building the escaped value from the char codes, it can just be built from the literal string itself.
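To make the technique concrete, here is a simplified sketch under my own names (source, scanStringLiteral, and the token shape are illustrative, not the actual pyright tokenizer):

```typescript
// Instead of pushing char codes into an array and materializing a brand-new
// string, remember where the literal starts in the source and how long it is,
// then slice the original text only when the escaped value is requested.
// V8 can often back the result of slice() with a reference into the parent
// string rather than a separate copy.
interface StringLiteralToken {
    start: number;
    length: number;
    getEscapedValue(): string;
}

function scanStringLiteral(source: string, start: number, length: number): StringLiteralToken {
    return {
        start,
        length,
        getEscapedValue: () => source.slice(start, start + length),
    };
}

// Usage: the token holds two numbers plus a closure over `source`;
// no per-token string is allocated until getEscapedValue() is called.
const text = 'x = "hello world, this is a long literal"';
const token = scanStringLiteral(text, 5, 35);
console.log(token.getEscapedValue()); // hello world, this is a long literal
```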
    }
}

// String.fromCharCode.apply crashes (stack overflow) if passed an array
This is also unnecessary now since we're just slicing into the original string.
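For context, a sketch of the kind of workaround that becomes unnecessary; the helper name and chunk size are illustrative, not the removed pyright code:

```typescript
// String.fromCharCode.apply(null, codes) spreads every element onto the call
// stack, so a large enough array overflows it. The usual workaround is to
// convert in bounded chunks. Once the escaped value is just a slice of the
// original source string, there is no char-code array to convert at all.
function charCodesToString(codes: number[], chunkSize = 0x8000): string {
    const pieces: string[] = [];
    for (let i = 0; i < codes.length; i += chunkSize) {
        pieces.push(String.fromCharCode(...codes.slice(i, i + chunkSize)));
    }
    return pieces.join('');
}
```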
    let curChar = getEscapedCharacter();
    if (curChar === Char.EndOfText) {
-       return completeUnescapedString(output);
+       return completeUnescapedString(output, escapedString);
Found this one too. Netted another 200K in memory savings with the test app I'm using.
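My reading of why the extra argument helps, expressed as a hypothetical shape for the helper (the real completeUnescapedString in pyright may look different):

```typescript
// If the literal contained no escape sequences, hand back the original
// escaped string, which is itself a cheap slice of the source text, instead
// of joining the accumulated output into yet another new string.
interface UnescapedOutput {
    valueParts: string[];
    hasEscapes: boolean;
}

function completeUnescapedString(output: UnescapedOutput, escapedString: string): string {
    return output.hasEscapes ? output.valueParts.join('') : escapedString;
}
```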
erictraut left a comment:
A relatively simple change that results in pretty significant memory savings. Love it!
I left one small comment. Looks like we can remove some additional code.
// a string literal or a docstring, so this should be fine.
if (escapedValueParts.length > maxStringTokenLength) {
    escapedValueParts = escapedValueParts.slice(0, maxStringTokenLength);
    flags |= StringTokenFlags.ExceedsMaxSize;
Are you removing the generation of ExceedsMaxSize? If this is no longer a limitation, let's completely remove ExceedsMaxSize from the StringTokenFlags enumeration along with any references to it.
Thanks, will do.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
While investigating microsoft/pylance-release#5440, Erik and I noticed that the escapedValue for a bunch of different tokens was not actually allocated from the original string. It was an entirely new string, meaning a new allocation. It seems that it doesn't need to be that way; it can just point to the original string.
I verified this uses less memory by snapshotting a run of pylance with a file with a lot of string literals in it.
In the original version, we end up with a parse tree like so:
You can see that it's allocated 18.9 MB for the tokens themselves.
If we expand into these tokens, you can see one of the largest ones has its own copy of the giant literal string at the top of the file:
That token has 32K allocated to hold that large literal string.
After this change, the amount of memory for the same run goes down by 800K:
And if we find that same token with the escaped value, it's now a sliced value, which is really just a reference to the original string:
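If you want to poke at this outside of pylance, here is a rough way to take comparable heap snapshots; the input file name and the copy-vs-slice setup are made up for the example:

```typescript
// Writes V8 heap snapshots that can be opened in the Chrome DevTools Memory tab.
// Compare retained sizes when string values are fresh copies versus slices
// that reference the original source text.
import { writeHeapSnapshot } from 'v8';
import { readFileSync } from 'fs';

const source = readFileSync('big_string_literals.py', 'utf8'); // hypothetical input

// Fresh copies: split/join forces each value into its own flat allocation.
const copies = source.split('\n').map((line) => line.split('').join(''));
writeHeapSnapshot('copies.heapsnapshot');

// Slices: V8 can back sufficiently long slice() results with a reference
// into `source` instead of a separate buffer.
const slices: string[] = [];
let pos = 0;
while (pos <= source.length) {
    let end = source.indexOf('\n', pos);
    if (end === -1) {
        end = source.length;
    }
    slices.push(source.slice(pos, end));
    pos = end + 1;
}
writeHeapSnapshot('slices.heapsnapshot');

console.log(copies.length, slices.length); // keep both arrays reachable
```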