Skip to content

Conversation

@rchiodo
Copy link
Collaborator

@rchiodo rchiodo commented Mar 9, 2024

While investigating this issue:
microsoft/pylance-release#5440

Erik and I noticed that the escapedValue for a bunch of different tokens was not actually allocated from the original string. It was an entirely new string, meaning a new allocation.

It seems that it doesn't need to be that way. It can just point to the original string.

I verified this uses less memory by snapshotting a run of pylance with a file with a lot of string literals in it.

In the original version, we end up with a parse tree like so:

image

You can see that it's allocated 18.9 MB for the tokens themselves.

If we expand into these tokens, you can see one of the largest ones has its own copy of the giant literal string at the top of the file:

image

That token has 32K allocated to hold that large literal string.

After this change, the amount of memory for the same run goes down by 800K:

image

And if we find that same token with the escaped value, it's now a sliced value, which is really just a reference to the original string:

image

@rchiodo rchiodo requested review from debonte and erictraut March 9, 2024 00:10
let escapedString = node.token.escapedValue;
if ((flags & PrintExpressionFlags.DoNotLimitStringLength) === 0) {
const maxStringLength = 32;
escapedString = escapedString.substring(0, maxStringLength);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These slices aren't really necessary, it's just that substring has been deprecated. I figured I'd switch on the files I was looking at. I can put these back if we want to not bother with these changes.

let escapedValueParts: number[] = [];
const start = this._cs.position;
let escapedValueLength = 0;
const getEscapedValue = () => this._cs.getText().slice(start, start + escapedValueLength);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the real change happens. Instead of building the escaped value from the the char codes, it can just be built from the literal string itself.

@github-actions

This comment has been minimized.

}
}

// String.fromCharCode.apply crashes (stack overflow) if passed an array
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also unnecessary now since we're just slicing into the original string.

let curChar = getEscapedCharacter();
if (curChar === Char.EndOfText) {
return completeUnescapedString(output);
return completeUnescapedString(output, escapedString);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this one too. Netted another 200K in memory savings with the test app I'm using

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Copy link
Collaborator

@erictraut erictraut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A relatively simple change that results in pretty significant memory savings. Love it!

I left one small comment. Looks like we can remove some additional code.

// a string literal or a docstring, so this should be fine.
if (escapedValueParts.length > maxStringTokenLength) {
escapedValueParts = escapedValueParts.slice(0, maxStringTokenLength);
flags |= StringTokenFlags.ExceedsMaxSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you removing the generation of ExceedsMaxSize? If this is no longer a limitation, let's completely remove ExceedsMaxSize from the StringTokenFlags enumeration along with any references to it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will do.

@github-actions
Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@rchiodo rchiodo merged commit 06bc912 into microsoft:main Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants