Description
Upgrade from 0.15.1
I have a CLI command that used to finish in less than 1 second. After the upgrade, the command freezes, even though the source code has not changed. I'm not sure what's happening under the hood, as there are no error logs or any other output.
Basically, it reads Parquet files from S3, joins them with another data frame (loaded from the database), performs some aggregations (groupBy(), sum(), rank()), renames the columns, and then calls collect() and write(to_output()).
One thing I tried while debugging: the command uses collect() to display the output, which might lead to memory issues. Once collect() is removed, the command completes, but it takes about 12 seconds and uses double the memory compared to 0.15.1 with collect().
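To make the time and memory comparison between versions reproducible, something like the following could be wrapped around the pipeline. This is only a minimal sketch, assuming a Node.js runtime. `measure` is a hypothetical helper, not part of any library, and the RSS delta is a rough indicator rather than a precise peak-memory figure.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical helper (an assumption, not a library API): runs one step and
// reports its wall-clock time and the change in resident set size (RSS).
async function measure<T>(
  label: string,
  step: () => Promise<T> | T,
): Promise<{ result: T; ms: number; rssDeltaBytes: number }> {
  const rssBefore = process.memoryUsage().rss;
  const start = performance.now();

  // Works for both synchronous and promise-returning steps.
  const result = await step();

  const ms = performance.now() - start;
  const rssDeltaBytes = process.memoryUsage().rss - rssBefore;

  const mib = (rssDeltaBytes / 1024 / 1024).toFixed(1);
  console.log(`${label}: ${ms.toFixed(1)} ms, RSS delta ${mib} MiB`);

  return { result, ms, rssDeltaBytes };
}
```

Running the same command through `measure` once with the collect() call and once without it, on both 0.15.1 and the upgraded version, would give four comparable numbers to attach to the report.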