4,634 questions
0 votes · 0 answers · 41 views
TokenizationError when loading h5py dataset as dask dataframe
My goal is to process (via an sklearn Pipeline) a large HDF file that doesn't fit into RAM.
The core data is an irregular multivariate time-series (a very long 2D array). It could be split columnwise to fit ...
2 votes · 0 answers · 59 views
Task works locally but errors on a Dask cluster: "SystemError: error return without exception set"
I have the following code that passes an array to the task and submits it to a Dask cluster. The Dask cluster runs in Docker with several Dask workers. Docker starts with:
scheduler:
docker run -d \
-...
2 votes · 0 answers · 67 views
How to optimize NetCDF files and dask for processing long-term climatological indices with xclim (e.g. SPI using a 30-day rolling window)?
I am trying to analyze the 30-day standardized precipitation index for a multi-state range of the southeastern US for the year 2016. I'm using xclim to process a direct pull of gridded daily ...
0 votes · 0 answers · 25 views
Introducing new dimension in xarray apply_ufunc
There has been at least one other question regarding the introduction of new dimensions in the output of xarray.apply_ufunc; I have two problems with this answer: First, I feel like the answer avoids ...
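For context, the standard way to introduce a brand-new dimension with `apply_ufunc` is `output_core_dims` plus `dask_gufunc_kwargs={"output_sizes": ...}`; the function must return the new dimension as its last axis. A minimal sketch with hypothetical data (quantiles over time):

```python
import numpy as np
import xarray as xr

def per_quantile(arr, q):
    # reduce the last ("time") axis; move the new quantile axis to the end,
    # since output core dims are expected as trailing axes
    return np.moveaxis(np.quantile(arr, q, axis=-1), 0, -1)

da = xr.DataArray(
    np.arange(500.0).reshape(5, 100), dims=("space", "time")
).chunk({"space": 2})
q = [0.25, 0.5, 0.75]

out = xr.apply_ufunc(
    per_quantile,
    da,
    input_core_dims=[["time"]],        # dimension consumed by the function
    output_core_dims=[["quantile"]],   # dimension newly introduced
    kwargs={"q": q},
    dask="parallelized",
    dask_gufunc_kwargs={"output_sizes": {"quantile": len(q)}},
    output_dtypes=[float],
).assign_coords(quantile=q)
```

`output_sizes` is what lets dask build the output graph without running the function first.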
0 votes · 0 answers · 43 views
Dask distributed stores an old version of my code
I am analysing data using Dask distributed on a SLURM cluster, working from a Jupyter notebook. I change my codebase frequently and rerun jobs; recently, a lot of my jobs started to crash....
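A hedged note on the usual remedies: long-lived workers keep imported modules cached, so edited code is not picked up until the workers are replaced. `Client.restart()` (or `Client.upload_file` for a single module) addresses this; the sketch below uses a throwaway `LocalCluster` in place of the SLURM deployment.

```python
from dask.distributed import Client, LocalCluster

# a throwaway local cluster standing in for the SLURM deployment
cluster = LocalCluster(n_workers=1, threads_per_worker=1,
                       dashboard_address=None)
client = Client(cluster)

# restart() replaces every worker process, so freshly edited modules are
# re-imported instead of being served from a stale cache
client.restart()

# alternatively, push one edited file to all current workers:
# client.upload_file("my_module.py")

check = client.submit(lambda x: x + 1, 41).result()
client.close()
cluster.close()
```

Restarting the Jupyter kernel fixes the client-side copy of the module, but only a worker restart fixes the cluster-side copy.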
3 votes · 0 answers · 84 views
How to drop rows with a boolean mask in xarray/dask without .compute() blowing up memory?
I’m trying to subset a large xarray.Dataset backed by Dask and save it back to Zarr, but I’m running into a major memory problem when attempting to drop rows with a boolean mask.
Here’s a minimal ...
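One common pattern worth noting: the boolean mask along a single dimension is tiny compared with the data, so computing only the mask and then doing a lazy integer `isel` avoids materialising the dataset. A minimal sketch with a hypothetical dataset and dimension names:

```python
import numpy as np
import xarray as xr

# hypothetical Dask-backed dataset with a "row" dimension to filter
ds = xr.Dataset(
    {"temp": (("row", "col"), np.arange(20.0).reshape(10, 2))}
).chunk({"row": 5})

# the 1-D mask is cheap to materialise; the data selection stays lazy
mask = (ds["temp"].isel(col=0) > 5).compute()
subset = ds.isel(row=np.flatnonzero(mask.values))
# subset.to_zarr("out.zarr") would then stream the result chunk by chunk
```

The key point is that only `mask` is computed eagerly; `subset` remains a lazy view until written.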
0 votes · 1 answer · 54 views
How to connect to Dask Gateway Server from inside a Docker container?
I have a method that connects my app to a Dask Gateway Server
def set_up_dask(dashboard=False, num_workers=4, min_workers=4, max_workers=50):
    gateway = Gateway("http://127.0.0.1:8000")
    ...
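For context: inside a container, `127.0.0.1` resolves to the container itself, not the host where the Gateway server is listening. A hedged sketch, assuming the Gateway runs on the host and Docker 20.10+ on Linux (Docker Desktop provides `host.docker.internal` out of the box):

```shell
# give the container a route back to the host, then point the Gateway
# client at host.docker.internal:8000 instead of 127.0.0.1:8000
docker run --add-host=host.docker.internal:host-gateway my-app
```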
0 votes · 0 answers · 49 views
How to properly use joblib files in Dask?
from joblib import load
ntrees_16_model = load(r"ntrees_quantile_16_model_watermask.joblib")
ntrees_50_model = load(r"ntrees_quantile_50_model_watermask.joblib")
ntrees_84_model = ...
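A hedged sketch of the usual pattern for large loaded models in Dask: `Client.scatter(..., broadcast=True)` ships the object to the workers once, and tasks then receive a cheap handle instead of re-serialising the model with every submission. `TinyModel` below is a hypothetical stand-in for the joblib-loaded quantile forests.

```python
from dask.distributed import Client, LocalCluster

class TinyModel:
    # hypothetical stand-in for a joblib-loaded quantile forest
    def predict(self, xs):
        return [v * 2 for v in xs]

cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                       processes=False, dashboard_address=None)
client = Client(cluster)

# scatter the model once; each task then receives a lightweight reference
model_ref = client.scatter(TinyModel(), broadcast=True)
preds = client.submit(lambda m, xs: m.predict(xs),
                      model_ref, [1, 2, 3]).result()

client.close()
cluster.close()
```

With real joblib files, `load(...)` would replace the `TinyModel()` construction; everything downstream is unchanged.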
0 votes · 0 answers · 66 views
Why does XGBoost training (with DMatrix) write heavily to disk instead of using RAM?
I am training an XGBoost model in Python on a dataset with approximately 20k features and 30M records.
The features are sparse, and I am using xgboost.DMatrix for training.
Problem
During training, ...
0 votes · 2 answers · 69 views
Issues getting PyCaret/Fugue to work with a Dask backend
I am trying to use PyCaret with Fugue on a Dask backend and I'm running into an issue.
Using the following:
pycaret 3.3.2
fugue 0.9.1
dask ...
1 vote · 1 answer · 80 views
How to reduce xarray.coarsen with majority vote?
I'm currently trying to resample a large geotiff file to a coarser resolution. This file contains classes of tree species (indicated by integer values) at each pixel, so I want to resample each block (...
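A hedged sketch of one way to do this: `coarsen(...).reduce(func)` accepts any callable of the form `func(arr, axis)`, where `axis` is the tuple of window axes, so a custom majority vote can be plugged in. The class raster below is a tiny hypothetical stand-in; labels are assumed to be small non-negative integers (typical for species codes), which lets `np.bincount` serve as the mode.

```python
import numpy as np
import xarray as xr

def majority(arr, axis):
    # move the window axes to the end, flatten them, then take the most
    # frequent value in each window
    arr = np.moveaxis(arr, axis, tuple(range(-len(axis), 0)))
    flat = arr.reshape(arr.shape[:-len(axis)] + (-1,))
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), -1, flat)

# hypothetical 4x4 raster of integer class labels
classes = xr.DataArray(
    np.array([[1, 1, 2, 2],
              [1, 3, 2, 2],
              [4, 4, 5, 5],
              [4, 4, 5, 6]]),
    dims=("y", "x"),
)
coarse = classes.coarsen(y=2, x=2).reduce(majority)
```

Ties resolve to the lowest label here (`argmax` on `bincount`); a different tie-break would need an explicit rule.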
0 votes · 1 answer · 66 views
High RAM usage when using Datashader with a Dask-backed xarray
I have a Dask-backed xarray which is about 150k x 90k with a chunk size of 8192 x 8192. I am working on a Windows virtual machine with 100 GB of RAM and 16 cores.
I want to plot it using the Datashader ...
0 votes · 0 answers · 30 views
Is it possible to use Dask distributed for pandas apply instead of multiprocessing?
I need some advice.
Right now I do some computation with the pandas library.
The program uses multiprocessing and df.apply.
The simple example showing my idea is here:
import multiprocessing
import ...
0 votes · 0 answers · 52 views
Combining two .nc files with different dimensions using Icechunk, VirtualiZarr, and Xarray
My overall goal is to set up a virtual dataset of ERA5 data using Icechunk. As a smaller test example, I'm trying to pull all the data located in the 194001 ERA5 folder. I've been mostly able to ...
0 votes · 1 answer · 70 views
Dask large outer join with gzip files
I'm working with an omics dataset (1000+ files): a folder of roughly 1 GB of tab-separated .txt.gz files. Each looks roughly like this for a patient ABC:
pos	ABC_count1	ABC_count2	...