
Performance issue with 0.16.1 #1649

@jenky

Upgraded from 0.15.1 to 0.16.1.

I have a CLI command that used to finish in under 1 second. After the upgrade, with no changes to the source code, the command hangs. I'm not sure what is happening under the hood, because there are no error logs or any other output.

In short, the command reads Parquet files from S3, joins them with another data frame (loaded from the database), performs some aggregations (groupBy(), sum(), rank()), renames the columns, and then calls collect() and write(to_output()).

While debugging, I suspected that collect(), which the command uses to display the output, might be causing memory issues. With collect() removed, the command does complete. However, it takes about 12 seconds and uses roughly double the memory that 0.15.1 used with collect() still in place.

Metadata

Assignees: No one assigned
Labels: No labels
Status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
