1 vote
0 answers
40 views

I'm trying to use Modin with a Dask LocalCluster to parallelize pandas DataFrame operations in a Django application (Python 3.13). Even with processes=False (thread-based workers, same process), the ...
asked by Atul Jaiswal
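The cluster shape described in the question above (Modin itself left out) can be reproduced in a few lines. A minimal sketch, assuming dask[distributed] is installed, showing that `processes=False` keeps every worker inside the calling process:

```python
# A minimal sketch (assuming dask[distributed] is installed) of the setup the
# question describes: LocalCluster(processes=False) starts thread-based
# workers, so every worker lives inside the calling process.
import os

from distributed import Client, LocalCluster

cluster = LocalCluster(processes=False, n_workers=2, threads_per_worker=1)
client = Client(cluster)

# Each worker reports the same PID as the driver: no subprocesses involved.
pids = client.run(os.getpid)
print(sorted(set(pids.values())))

client.close()
cluster.close()
```

Because everything shares one process, anything that relies on process isolation (or that releases the GIL poorly) behaves differently than with the default process-based workers.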
0 votes
0 answers
77 views

I use this code to load a sentence transformer in a GCP VM instance (no GPU). This is a Dask WorkerPlugin used on the Dask workers: class NLPSetup(WorkerPlugin): def __init__(self, bucket_uri): self....
asked by cuneyttyler (1,414 rep)
1 vote
1 answer
47 views

I have a flow based on a dictionary mapping tasks to their dependent tasks. I loop over the tasks whose dependencies have all been submitted and submit them, which eventually exhausts the tasks. ...
asked by Andy III
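The submission loop described above can be sketched in pure Python. Here `run` stands in for `client.submit`; with Dask you would pass the dependency futures into `client.submit` so the scheduler sees the edges (the function name and the toy graph are invented for illustration):

```python
# Walk a {task: [dependencies]} mapping and submit each task once all of its
# dependencies have been submitted. `run` stands in for client.submit.
def submit_in_dependency_order(deps, run):
    done = {}
    pending = dict(deps)
    while pending:
        # Tasks whose dependencies have all been submitted already.
        ready = [t for t, ds in pending.items() if all(d in done for d in ds)]
        if not ready:
            raise ValueError(f"dependency cycle among {sorted(pending)}")
        for t in ready:
            done[t] = run(t, [done[d] for d in pending.pop(t)])
    return done

order = []
submit_in_dependency_order(
    {"c": ["a", "b"], "a": [], "b": ["a"]},
    lambda task, dep_results: (order.append(task), task)[1],
)
print(order)  # every task appears after its dependencies
```

The `ready` check is what guarantees the loop eventually drains: each pass submits at least one task, or fails loudly on a cycle.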
Best practices
0 votes
1 reply
18 views

I am trying to understand the difference in Dask between scatter(x, broadcast=True) and replicate(x). Both seem to provide a way to ensure copies of the data are available on all nodes. Are they actually ...
asked by Frames Catherine White
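The two calls in the question can be contrasted directly. A minimal sketch, assuming dask[distributed] is installed: `scatter(..., broadcast=True)` copies the data to every worker up front, while `replicate()` takes an already-scattered future and copies it to every worker after the fact (`hash=False` is used only so the second scatter gets a distinct key from the first):

```python
from distributed import Client, LocalCluster

cluster = LocalCluster(processes=False, n_workers=2, threads_per_worker=1)
client = Client(cluster)

data = list(range(1000))

# Copied to both workers at scatter time.
[broadcasted] = client.scatter([data], broadcast=True)

# Lands on one worker first, then replicate() copies it to all workers.
replicated = client.scatter([data], hash=False)[0]
client.replicate(replicated)

n_broadcast = len(client.who_has(broadcasted)[broadcasted.key])
n_replicated = len(client.who_has(replicated)[replicated.key])
print(n_broadcast, n_replicated)

client.close()
cluster.close()
```

`who_has` shows both keys ending up on every worker; the difference is when the copies are made, not where they end up.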
0 votes
1 answer
74 views

I'm trying to use Dask to do some relatively simple computations and operations that I was doing with Pandas but on a larger dataset. I have approximately 1500 .csv files that range in size from 1KB ...
asked by Chris Bennett
3 votes
1 answer
91 views

I am using Dask for some processing. The client starts successfully, but I am seeing zero workers. This is how I am creating the client: client = Client("tls://localhost:xxxx") This is the ...
asked by martian muonhunter
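A zero-worker client like the one above can be probed explicitly: `Client.wait_for_workers` blocks until the scheduler reports at least n workers, which separates "no workers have joined yet" from a connection problem. A sketch against an in-process LocalCluster rather than the TLS address from the question:

```python
from distributed import Client, LocalCluster

cluster = LocalCluster(processes=False, n_workers=2, threads_per_worker=1)
client = Client(cluster)

client.wait_for_workers(n_workers=2)  # blocks until workers register
n_workers = len(client.scheduler_info()["workers"])
print(n_workers)

client.close()
cluster.close()
```

If `wait_for_workers` never returns against the real TLS cluster, the workers are failing to register with the scheduler (often a certificate or address mismatch) rather than the client failing to connect.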
0 votes
0 answers
61 views

My goal is to process (sklearn Pipeline) a large HDF file that doesn't fit into RAM. The core data is an irregular multivariate time-series (a very long 2D array). It could be split columnwise to fit ...
asked by Axel (1 rep)
3 votes
1 answer
75 views

I have the following code that passes an array to a task and submits it to a Dask cluster. The Dask cluster is running in Docker with several Dask workers. Docker starts with: scheduler: docker run -d \ -...
asked by eric feng
3 votes
0 answers
91 views

I am trying to analyze the 30-day standardized precipitation index for a multi-state region of the southeastern US for the year 2016. I'm using xclim to process a direct pull of gridded daily ...
asked by helpmeplease
0 votes
0 answers
30 views

There has been at least one other question regarding the introduction of new dimensions in the output of xarray.apply_ufunc; I have two problems with this answer: First, I feel like the answer avoids ...
asked by derM (13.9k rep)
0 votes
0 answers
56 views

I am analysing some data using dask distributed on a SLURM cluster. I am also using jupyter notebook. I am changing my codebase frequently and running jobs. Recently, a lot of my jobs started to crash....
asked by Yatharth
2 votes
0 answers
90 views

I’m trying to subset a large xarray.Dataset backed by Dask and save it back to Zarr, but I’m running into a major memory problem when attempting to drop rows with a boolean mask. Here’s a minimal ...
asked by Gary Frewin
-1 votes
1 answer
61 views

I have a method that connects my app to a Dask Gateway Server def set_up_dask(dashboard=False, num_workers=4, min_workers=4, max_workers=50): gateway = Gateway("http://127.0.0.1:8000") ...
asked by BallpenMan
0 votes
0 answers
60 views

from joblib import load ntrees_16_model = load(r"ntrees_quantile_16_model_watermask.joblib") ntrees_50_model = load(r"ntrees_quantile_50_model_watermask.joblib") ntrees_84_model = ...
asked by Adriano Matos
0 votes
0 answers
86 views

I am training an XGBoost model in Python on a dataset with approximately 20k features and 30M records. The features are sparse, and I am using xgboost.DMatrix for training. Problem During training, ...
asked by cool_heisenberg
