0 votes · 0 answers · 41 views

My goal is to process, with an sklearn Pipeline, a large HDF file that doesn't fit into RAM. The core data is an irregular multivariate time series (a very long 2D array). It could be split columnwise to fit ...
asked by Axel
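The excerpt suggests incremental (out-of-core) processing; here is a minimal sketch, assuming the HDF file was written in pandas "table" format so it can be read in chunks (file name, key, and column names are hypothetical). Note that an sklearn Pipeline has no partial_fit, so the incremental steps are chained by hand:

    import pandas as pd
    from sklearn.linear_model import SGDRegressor
    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    model = SGDRegressor()

    # First pass: fit the scaler incrementally on row chunks.
    for chunk in pd.read_hdf("data.h5", key="series", chunksize=100_000):
        scaler.partial_fit(chunk[["x1", "x2"]])

    # Second pass: train the estimator chunk by chunk.
    for chunk in pd.read_hdf("data.h5", key="series", chunksize=100_000):
        X = scaler.transform(chunk[["x1", "x2"]])
        model.partial_fit(X, chunk["y"])
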
2 votes · 0 answers · 59 views

I have the following code that passes an array to a task and submits it to a Dask cluster. The cluster is running in Docker with several Dask workers. Docker starts with: scheduler: docker run -d \ -...
asked by eric feng
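For this pattern, the usual fix is to scatter the array once so it lives on a worker rather than being serialized into every task graph. A minimal sketch; the scheduler address is hypothetical and must match the port the Docker container publishes:

    import numpy as np
    from dask.distributed import Client

    client = Client("tcp://localhost:8786")  # scheduler published by Docker

    arr = np.random.rand(1_000_000)
    arr_future = client.scatter(arr)  # store the array on a worker once

    def task(a):
        return a.sum()

    result = client.submit(task, arr_future).result()
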
2 votes · 0 answers · 67 views

I am trying to analyze the 30-day standardized precipitation index for a multi-state region of the southeastern US for the year 2016. I'm using xclim to process a direct pull of gridded daily ...
asked by helpmeplease
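A hedged sketch of a 30-day SPI computation; the argument names follow recent xclim documentation and may differ across versions, and the file, variable, and calibration period are hypothetical:

    import xarray as xr
    from xclim.indices import standardized_precipitation_index

    # pr needs CF units metadata (e.g. "kg m-2 s-1" or "mm/day")
    ds = xr.open_dataset("precip_daily.nc", chunks={"time": 365})

    spi30 = standardized_precipitation_index(
        ds.pr,
        freq="D",                  # daily output
        window=30,                 # 30-day accumulation
        dist="gamma",
        method="APP",
        cal_start="1981-01-01",
        cal_end="2010-12-31",
    )
    spi30.sel(time="2016").to_netcdf("spi30_2016.nc")
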
0 votes · 0 answers · 25 views

There has been at least one other question regarding the introduction of new dimensions in the output of xarray.apply_ufunc; I have two problems with the answer given there: First, I feel that the answer avoids ...
asked by derM
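For reference, the documented route for a new output dimension is output_core_dims plus output_sizes; a minimal sketch with hypothetical names ("sample", n=5):

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(10.0), dims="x").chunk({"x": 5})

    def expand(arr, n=5):
        # add a new trailing axis of length n
        return np.stack([arr * i for i in range(n)], axis=-1)

    out = xr.apply_ufunc(
        expand,
        da,
        output_core_dims=[["sample"]],  # declare the new dimension
        dask="parallelized",
        output_dtypes=[da.dtype],
        dask_gufunc_kwargs={"output_sizes": {"sample": 5}},
    )
    print(out.compute().sizes)  # {'x': 10, 'sample': 5}
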
0 votes · 0 answers · 43 views

I am analysing some data using dask distributed on a SLURM cluster, from a Jupyter notebook. I change my codebase frequently and rerun jobs. Recently, a lot of my jobs started to crash ...
asked by Yatharth
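One common cause: workers keep stale module imports after the codebase changes. A hedged sketch of restarting workers between runs (the cluster settings are hypothetical):

    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(cores=8, memory="32GB", walltime="02:00:00")
    cluster.scale(jobs=4)
    client = Client(cluster)

    # After editing local modules, restart workers so they re-import code.
    client.restart()

    # Alternatively, push a changed module file to all workers.
    client.upload_file("my_analysis.py")
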
3 votes · 0 answers · 84 views

I’m trying to subset a large xarray.Dataset backed by Dask and save it back to Zarr, but I’m running into a major memory problem when attempting to drop rows with a boolean mask. Here’s a minimal ...
asked by Gary Frewin
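A common workaround is to materialize only the (small) boolean mask, turn it into integer indices, and subset lazily before writing; a minimal sketch with hypothetical variable and dimension names:

    import numpy as np
    import xarray as xr

    ds = xr.open_zarr("input.zarr")              # dask-backed
    mask = (ds["quality_flag"] == 0).compute()   # 1-D mask along "row"

    idx = np.flatnonzero(mask.values)
    subset = ds.isel(row=idx)                    # still lazy

    # Rechunk to uniform blocks before writing, as Zarr requires.
    subset = subset.chunk({"row": 100_000})
    subset.to_zarr("output.zarr", mode="w")
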
0 votes · 1 answer · 54 views

I have a method that connects my app to a Dask Gateway server:

    def set_up_dask(dashboard=False, num_workers=4, min_workers=4, max_workers=50):
        gateway = Gateway("http://127.0.0.1:8000")
        ...
asked by BallpenMan
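One plausible completion of a function like set_up_dask, following the dask-gateway client API (the address and scaling bounds are hypothetical):

    from dask_gateway import Gateway

    def set_up_dask(min_workers=4, max_workers=50):
        gateway = Gateway("http://127.0.0.1:8000")
        cluster = gateway.new_cluster()
        cluster.adapt(minimum=min_workers, maximum=max_workers)
        return cluster.get_client()

    client = set_up_dask()
    print(client.dashboard_link)
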
0 votes · 0 answers · 49 views

    from joblib import load

    ntrees_16_model = load(r"ntrees_quantile_16_model_watermask.joblib")
    ntrees_50_model = load(r"ntrees_quantile_50_model_watermask.joblib")
    ntrees_84_model = ...
asked by Adriano Matos
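If these models are large and loading them exhausts memory, joblib can memory-map the numpy arrays inside them instead of copying them into RAM; a hedged sketch (the third file name is hypothetical, completing the truncated excerpt's pattern):

    from joblib import load

    paths = [
        "ntrees_quantile_16_model_watermask.joblib",
        "ntrees_quantile_50_model_watermask.joblib",
        "ntrees_quantile_84_model_watermask.joblib",  # hypothetical
    ]
    models = {p: load(p, mmap_mode="r") for p in paths}
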
0 votes · 0 answers · 66 views

I am training an XGBoost model in Python on a dataset with approximately 20k features and 30M records. The features are sparse, and I am using xgboost.DMatrix for training. Problem: During training, ...
asked by cool_heisenberg
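For sparse data with the hist tree method, building the matrix from a SciPy CSR and using QuantileDMatrix can cut peak memory versus a plain DMatrix; a hedged sketch with synthetic shapes:

    import numpy as np
    import scipy.sparse as sp
    import xgboost as xgb

    X = sp.random(100_000, 20_000, density=0.001, format="csr")
    y = np.random.rand(X.shape[0])

    dtrain = xgb.QuantileDMatrix(X, label=y)  # hist-only, lower peak memory
    booster = xgb.train(
        {"tree_method": "hist", "max_bin": 256,
         "objective": "reg:squarederror"},
        dtrain,
        num_boost_round=100,
    )
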
0 votes · 2 answers · 69 views

I am trying to use PyCaret with Fugue for a Dask backend and I'm running into an issue. I am using: pycaret 3.3.2, fugue 0.9.1, dask ...
asked by yojimbo
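For reference, a minimal working sketch of the Fugue/Dask integration as documented for PyCaret 3.x (the dataset and target are illustrative):

    from pycaret.datasets import get_data
    from pycaret.parallel import FugueBackend
    from pycaret.regression import compare_models, setup

    df = get_data("insurance")
    setup(data=df, target="charges", session_id=42, n_jobs=1)

    # "dask" starts a local cluster; pass a distributed Client instead
    # to target a remote one.
    best = compare_models(parallel=FugueBackend("dask"))
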
1 vote · 1 answer · 80 views

I'm currently trying to resample a large geotiff file to a coarser resolution. This file contains classes of tree species (indicated by integer values) at each pixel, so I want to resample each block (...
asked by dtm34
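For categorical rasters (class codes), mode resampling assigns each coarse cell the majority class; a hedged sketch using rioxarray (the file names and target resolution are hypothetical, and reproject operates in memory, not lazily):

    import rioxarray
    from rasterio.enums import Resampling

    da = rioxarray.open_rasterio("tree_species.tif")

    coarse = da.rio.reproject(
        da.rio.crs,                 # keep the CRS, change only the grid
        resolution=100.0,           # target cell size in CRS units
        resampling=Resampling.mode, # majority class per output cell
    )
    coarse.rio.to_raster("tree_species_100m.tif")
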
0 votes · 1 answer · 66 views

I have a Dask-backed xarray of about 150k x 90k with a chunk size of 8192 x 8192. I am working on a Windows virtual machine which has 100 GB of RAM and 16 cores. I want to plot it using the Datashader ...
asked by Nanoputian
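A minimal sketch of rasterizing a large Dask-backed DataArray with Datashader, which aggregates down to screen resolution instead of materializing the full array (the array here is synthetic and smaller than the question's):

    import dask.array as da
    import numpy as np
    import xarray as xr
    import datashader as ds
    from datashader import transfer_functions as tf

    arr = xr.DataArray(
        da.random.random((20_000, 20_000), chunks=(8192, 8192)),
        dims=("y", "x"),
        coords={"y": np.arange(20_000), "x": np.arange(20_000)},
    )

    canvas = ds.Canvas(plot_width=1200, plot_height=800)
    agg = canvas.raster(arr)  # aggregate to the canvas grid
    img = tf.shade(agg)
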
0 votes · 0 answers · 30 views

I need some advice. Right now I do some computation with the pandas library. The program uses multiprocessing and df.apply. A simple example showing my idea:

    import multiprocessing
    import ...
asked by luki
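A minimal sketch of the chunk-per-process pattern the excerpt hints at; the row function is hypothetical:

    import multiprocessing as mp

    import numpy as np
    import pandas as pd

    def slow_row_op(row):
        return row["a"] * 2 + row["b"]

    def apply_chunk(chunk):
        return chunk.apply(slow_row_op, axis=1)

    if __name__ == "__main__":
        df = pd.DataFrame({"a": np.arange(1_000_000), "b": 1.0})
        chunks = np.array_split(df, mp.cpu_count())
        with mp.Pool() as pool:
            result = pd.concat(pool.map(apply_chunk, chunks))
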
0 votes · 0 answers · 52 views

My overall goal is to set up a virtual dataset of ERA5 data using Icechunk. As a smaller test example, I'm trying to pull all the data located in the 194001 ERA5 folder. I've been mostly able to ...
asked by Kieran Bartels
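A heavily hedged sketch, following the VirtualiZarr and Icechunk documentation at the time of writing (both APIs are evolving quickly; the paths are hypothetical):

    import glob

    import icechunk
    import xarray as xr
    from virtualizarr import open_virtual_dataset

    files = sorted(glob.glob("era5/194001/*.nc"))
    vds_list = [open_virtual_dataset(f, indexes={}) for f in files]
    vds = xr.combine_nested(vds_list, concat_dim="time",
                            coords="minimal", compat="override")

    storage = icechunk.local_filesystem_storage("era5_virtual")
    repo = icechunk.Repository.create(storage)
    session = repo.writable_session("main")
    vds.virtualize.to_icechunk(session.store)
    session.commit("add ERA5 1940-01 virtual references")
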
0 votes · 1 answer · 70 views

I'm working with an omics dataset (1000+ files): a folder of roughly 1 GB of tab-separated .txt.gz files. Each one looks roughly like this for a patient ABC: pos ABC_count1 ABC_count2 ...
asked by AnthonyML
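Since gzip is not splittable, Dask can map one file to one partition; a minimal sketch (the glob pattern is hypothetical):

    import dask.dataframe as dd

    df = dd.read_csv(
        "omics/*.txt.gz",
        sep="\t",
        compression="gzip",
        blocksize=None,            # one partition per file
        include_path_column=True,  # keep each row's file of origin
    )
    print(df.npartitions)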
