4,630 questions
1 vote · 0 answers · 40 views
Modin + Dask distributed: AttributeError: type object 'ABCMeta' has no attribute 'deploy_axis_func'
I'm trying to use Modin with a Dask LocalCluster to parallelize pandas DataFrame operations in a Django application (Python 3.13). Even with processes=False (thread-based workers, same process), the ...
0 votes · 0 answers · 77 views
Sentence Transformer Stuck at Loading (Google Cloud Instance)
I use this code to load a Sentence Transformer on a GCP VM instance (no GPU). It is a Dask plugin that runs on a Dask worker:
class NLPSetup(WorkerPlugin):
    def __init__(self, bucket_uri):
        self....
1 vote · 1 answer · 47 views
How can I get dask to schedule these tasks on different specialized workers?
I have a flow based on a dictionary mapping tasks to their dependencies. I loop over the tasks whose dependencies have all been submitted and submit those, which eventually exhausts the task list. ...
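The submit-when-ready loop described above can be sketched with the standard library's `graphlib` before adding Dask into the mix; task names here are hypothetical, and the commented `client.submit(..., workers=...)` call shows where Dask's worker-pinning keyword would go:

```python
from graphlib import TopologicalSorter

# deps maps each task to the set of tasks it depends on, mirroring the
# question's dictionary (task names are hypothetical).
deps = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "report": {"train", "clean"},
}

ts = TopologicalSorter(deps)
ts.prepare()
submitted = []
while ts.is_active():
    for task in ts.get_ready():   # every task whose deps are satisfied
        # With Dask you would submit here, pinning to a specialized worker:
        #   future = client.submit(run, task, workers=["tcp://worker-a:8786"])
        submitted.append(task)
        ts.done(task)             # with Dask, mark done from a completion callback
```

In a real cluster, `ts.done(task)` would be driven by future completion (e.g. `as_completed`) rather than called immediately.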
Best practices
0 votes · 1 reply · 18 views
scatter(x, broadcast=True) vs replicate(x)
I am trying to understand the difference in Dask between
scatter(x, broadcast=True) and replicate(x).
Both seem to provide a way to ensure copies of the data are available on all nodes.
Are they actually ...
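A minimal sketch of the distinction, using a throwaway in-process cluster (all names are real `dask.distributed` API; the cluster setup is just for illustration): `scatter(..., broadcast=True)` uploads local data to every worker in one step, while `replicate(...)` copies data that is already on the cluster to more workers after the fact.

```python
from dask.distributed import Client, LocalCluster

# Tiny in-process cluster purely for illustration (2 threaded workers).
cluster = LocalCluster(n_workers=2, processes=False, dashboard_address=None)
client = Client(cluster)

data = {"payload": list(range(10))}

# scatter(..., broadcast=True): moves LOCAL data onto the cluster and
# copies it to every worker in a single step, returning futures.
futures = client.scatter(data, broadcast=True)
fut = futures["payload"]

# replicate(...): takes data ALREADY on the cluster (futures) and
# copies it to additional workers after the fact.
client.replicate([fut])

# who_has shows which workers hold a copy of each key.
placement = client.who_has([fut])

client.close()
cluster.close()
```

So the practical difference is where the data starts: local process (`scatter`) versus already-distributed futures (`replicate`).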
0 votes · 1 answer · 74 views
How should I be using Dask .compute() to perform relatively simple operations
I'm trying to use Dask to do some relatively simple computations and operations that I was doing with Pandas but on a larger dataset. I have approximately 1500 .csv files that range in size from 1KB ...
3 votes · 1 answer · 91 views
Dask client connects successfully but no workers are available [closed]
I am using Dask for some processing. The client starts successfully, but I am seeing zero workers.
This is how I am creating the client:
client = Client("tls://localhost:xxxx")
This is the ...
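The connected-but-idle symptom can be reproduced locally: a client connects to a scheduler even when no workers have registered, and submitted work simply queues. A sketch using an in-process cluster (the cluster setup is illustrative; `scheduler_info` and `wait_for_workers` are real client methods useful for diagnosing this):

```python
from dask.distributed import Client, LocalCluster

# Start with zero workers to reproduce the symptom: the client
# connects fine, but nothing can run yet.
cluster = LocalCluster(n_workers=0, processes=False, dashboard_address=None)
client = Client(cluster)

n_before = len(client.scheduler_info()["workers"])  # 0: connected, no workers

cluster.scale(2)              # ask the cluster for two workers
client.wait_for_workers(2)    # block until they have registered
n_after = len(client.scheduler_info()["workers"])

client.close()
cluster.close()
```

With a remote TLS scheduler the same checks apply; zero workers usually means the worker processes failed to start or cannot reach the scheduler with matching TLS credentials.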
0 votes · 0 answers · 61 views
TokenizationError when loading h5py dataset as dask dataframe
My goal is to process (sklearn Pipeline) a large HDF file that doesn't fit into RAM.
The core data is an irregular multivariate time-series (a very long 2D array). It could be split columnwise to fit ...
3 votes · 1 answer · 75 views
task works on local, but errors on Dask cluster: "SystemError: error return without exception set"
I have the following code, which passes an array to a task and submits it to a Dask cluster. The Dask cluster runs in Docker with several Dask workers. Docker starts with:
scheduler:
docker run -d \
-...
3 votes · 0 answers · 91 views
How to optimize NetCDF files and dask for processing long-term climatological indices with xclim (e.g. SPI using a 30-day rolling window)?
I am trying to analyze the 30 day standardized precipitation index for a multi-state range of the southeastern US for the year 2016. I'm using xclim to process a direct pull of gridded daily ...
0 votes · 0 answers · 30 views
Introducing new dimension in xarray apply_ufunc
There has been at least one other question regarding the introduction of new dimensions in the output of xarray.apply_ufunc; I have two problems with this answer: First, I feel like the answer avoids ...
0 votes · 0 answers · 56 views
Dask distributed stores old version of my code
I am analysing some data using dask distributed on a SLURM cluster. I am also using jupyter notebook. I am changing my codebase frequently and running jobs. Recently, a lot of my jobs started to crash....
2 votes · 0 answers · 90 views
How to drop rows with a boolean mask in xarray/dask without .compute() blowing up memory?
I’m trying to subset a large xarray.Dataset backed by Dask and save it back to Zarr, but I’m running into a major memory problem when attempting to drop rows with a boolean mask.
Here’s a minimal ...
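The memory blow-up typically comes from materializing the whole masked result at once. The underlying idea of a chunk-at-a-time boolean filter can be sketched in plain NumPy (the function and array here are hypothetical stand-ins for the Dask-backed Dataset; with xarray the analogous lazy operation is `ds.isel(dim=mask)` followed by `to_zarr`):

```python
import numpy as np

# Stand-in for a large chunked array: apply a boolean row mask chunk by
# chunk, so only one chunk is ever dense in memory at a time.
def filter_rows_chunked(arr, mask, chunk=4):
    for start in range(0, arr.shape[0], chunk):
        sl = slice(start, start + chunk)
        yield arr[sl][mask[sl]]   # each piece is at most `chunk` rows

arr = np.arange(20).reshape(10, 2)   # rows 0..9, two columns
mask = arr[:, 0] % 4 == 0            # keep rows whose first column is 0, 4, 8, 12, 16
out = np.concatenate(list(filter_rows_chunked(arr, mask)))
```

Streaming each filtered chunk straight to the Zarr store, instead of concatenating, keeps peak memory at roughly one chunk.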
-1 votes · 1 answer · 61 views
How to connect to Dask Gateway Server from inside a Docker container?
I have a method that connects my app to a Dask Gateway Server
def set_up_dask(dashboard=False, num_workers=4, min_workers=4, max_workers=50):
    gateway = Gateway("http://127.0.0.1:8000")
    ...
0 votes · 0 answers · 60 views
How to properly use joblib files in Dask?
from joblib import load
ntrees_16_model = load(r"ntrees_quantile_16_model_watermask.joblib")
ntrees_50_model = load(r"ntrees_quantile_50_model_watermask.joblib")
ntrees_84_model = ...
0 votes · 0 answers · 86 views
Why does XGBoost training (with DMatrix) write heavily to disk instead of using RAM?
I am training an XGBoost model in Python on a dataset with approximately 20k features and 30M records.
The features are sparse, and I am using xgboost.DMatrix for training.
Problem
During training, ...