Save some memory in token creation #7434
Conversation
let escapedString = node.token.escapedValue;
if ((flags & PrintExpressionFlags.DoNotLimitStringLength) === 0) {
    const maxStringLength = 32;
    escapedString = escapedString.substring(0, maxStringLength);
These slices aren't really necessary; it's just that substring has been deprecated. I figured I'd switch them over in the files I was looking at. I can put these back if we'd rather not bother with these changes.
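For reference, a minimal illustration of the swap being discussed; the string contents below are made up, and only maxStringLength = 32 comes from the diff above:

```typescript
// For non-negative, in-range indexes, substring and slice return the same
// result, so truncating the printed string with either call is equivalent here.
const escaped = 'a fairly long escaped string value used for the example';
const maxStringLength = 32;

const viaSubstring = escaped.substring(0, maxStringLength);
const viaSlice = escaped.slice(0, maxStringLength);

console.log(viaSubstring === viaSlice); // true
```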
let escapedValueParts: number[] = [];
const start = this._cs.position;
let escapedValueLength = 0;
const getEscapedValue = () => this._cs.getText().slice(start, start + escapedValueLength);
This is where the real change happens. Instead of building the escaped value from the char codes, it can just be built from the literal string itself.
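To make the technique concrete, here is a simplified sketch under my own names (source, scanStringLiteral, and the token shape are illustrative, not the actual pyright tokenizer):

```typescript
// Instead of pushing char codes into an array and materializing a brand-new
// string, remember where the literal starts in the source and how long it is,
// then slice the original text only when the escaped value is requested.
// V8 can often back the result of slice() with a reference into the parent
// string rather than a separate copy.
interface StringLiteralToken {
    start: number;
    length: number;
    getEscapedValue(): string;
}

function scanStringLiteral(source: string, start: number, length: number): StringLiteralToken {
    return {
        start,
        length,
        getEscapedValue: () => source.slice(start, start + length),
    };
}

// Usage: the token holds two numbers plus a closure over `source`;
// no per-token string is allocated until getEscapedValue() is called.
const text = 'x = "hello world, this is a long literal"';
const token = scanStringLiteral(text, 5, 35);
console.log(token.getEscapedValue()); // hello world, this is a long literal
```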
    }
}

// String.fromCharCode.apply crashes (stack overflow) if passed an array
This is also unnecessary now since we're just slicing into the original string.
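For context, a sketch of the kind of workaround that becomes unnecessary; the helper name and chunk size are illustrative, not the removed pyright code:

```typescript
// String.fromCharCode.apply(null, codes) spreads every element onto the call
// stack, so a large enough array overflows it. The usual workaround is to
// convert in bounded chunks. Once the escaped value is just a slice of the
// original source string, there is no char-code array to convert at all.
function charCodesToString(codes: number[], chunkSize = 0x8000): string {
    const pieces: string[] = [];
    for (let i = 0; i < codes.length; i += chunkSize) {
        pieces.push(String.fromCharCode(...codes.slice(i, i + chunkSize)));
    }
    return pieces.join('');
}
```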
    let curChar = getEscapedCharacter();
    if (curChar === Char.EndOfText) {
-       return completeUnescapedString(output);
+       return completeUnescapedString(output, escapedString);
Found this one too. Netted another 200K in memory savings with the test app I'm using.
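My reading of why the extra argument helps, expressed as a hypothetical shape for the helper (the real completeUnescapedString in pyright may look different):

```typescript
// If the literal contained no escape sequences, hand back the original
// escaped string, which is itself a cheap slice of the source text, instead
// of joining the accumulated output into yet another new string.
interface UnescapedOutput {
    valueParts: string[];
    hasEscapes: boolean;
}

function completeUnescapedString(output: UnescapedOutput, escapedString: string): string {
    return output.hasEscapes ? output.valueParts.join('') : escapedString;
}
```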
erictraut left a comment:
A relatively simple change that results in pretty significant memory savings. Love it!
I left one small comment. Looks like we can remove some additional code.
// a string literal or a docstring, so this should be fine.
if (escapedValueParts.length > maxStringTokenLength) {
    escapedValueParts = escapedValueParts.slice(0, maxStringTokenLength);
    flags |= StringTokenFlags.ExceedsMaxSize;
Are you removing the generation of ExceedsMaxSize? If this is no longer a limitation, let's completely remove ExceedsMaxSize from the StringTokenFlags enumeration along with any references to it.
Thanks, will do.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
While investigating microsoft/pylance-release#5440, Erik and I noticed that the escapedValue for a bunch of different tokens was not actually allocated from the original string. It was an entirely new string, meaning a new allocation. It seems that it doesn't need to be that way; it can just point to the original string.
I verified this uses less memory by snapshotting a run of pylance with a file with a lot of string literals in it.
In the original version, we end up with a parse tree like so:
You can see that it's allocated 18.9 MB for the tokens themselves.
If we expand into these tokens, you can see one of the largest ones has its own copy of the giant literal string at the top of the file:
That token has 32K allocated to hold that large literal string.
After this change, the amount of memory for the same run goes down by 800K:
And if we find that same token with the escaped value, it's now a sliced value, which is really just a reference to the original string:
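If you want to poke at this outside of pylance, here is a rough way to take comparable heap snapshots; the input file name and the copy-vs-slice setup are made up for the example:

```typescript
// Writes V8 heap snapshots that can be opened in the Chrome DevTools Memory tab.
// Compare retained sizes when string values are fresh copies versus slices
// that reference the original source text.
import { writeHeapSnapshot } from 'v8';
import { readFileSync } from 'fs';

const source = readFileSync('big_string_literals.py', 'utf8'); // hypothetical input

// Fresh copies: split/join forces each value into its own flat allocation.
const copies = source.split('\n').map((line) => line.split('').join(''));
writeHeapSnapshot('copies.heapsnapshot');

// Slices: V8 can back sufficiently long slice() results with a reference
// into `source` instead of a separate buffer.
const slices: string[] = [];
let pos = 0;
while (pos <= source.length) {
    let end = source.indexOf('\n', pos);
    if (end === -1) {
        end = source.length;
    }
    slices.push(source.slice(pos, end));
    pos = end + 1;
}
writeHeapSnapshot('slices.heapsnapshot');

console.log(copies.length, slices.length); // keep both arrays reachable
```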