Advice
0 votes
4 replies
48 views

I want to connect to a Snowflake database from a Databricks notebook. I have an RSA key (.pem file) and I don't want to use a traditional method like username and password as it is not as secure as ...
Prafulla's user avatar
0 votes
0 answers
49 views

I'm using Databricks SQL and have SQL UDFs for GeoIP / ISP lookups. Each UDF branches on IPv4 vs IPv6 using a CASE expression like: CASE WHEN ip_address LIKE '%:%:%' THEN -- IPv6 path ... ...
YJCMS's user avatar
  • 1
-1 votes
0 answers
47 views

I am building a data profiling tool to iterate through all tables in our data warehouse (a mix of Hive and MySQL tables) to identify and extract all possible values for "Enum-like" columns. ...
Unnamed's user avatar
1 vote
0 answers
71 views

Why do I get multiple warnings WARN delta_kernel::engine::default::json] read_json receiver end of channel dropped before sending completed when scanning (pl.scan_delta(temp_path)) a Delta table that ...
gaut's user avatar
  • 6,048
1 vote
1 answer
24 views

I have a class that extends SparkListener and has access to SparkContext. I'm wondering if there is any way to check in onApplicationEnd whether the Spark application stopped because of an error or ...
tnazarew's user avatar
0 votes
1 answer
44 views

When using the Delta format, it is possible to time-travel to a specific version of the table. In my case, some of these versions are corrupted. I would like to delete/remove/drop them. For instance, ...
pgrandjean's user avatar
0 votes
0 answers
28 views

I am working on a custom materialization in dbt using the dbt-spark adapter (writing to Delta tables on S3). The goal is to handle a hybrid SCD Type 1 and Type 2 strategy. The logic: I compare the ...
HoanggLB2k2's user avatar
2 votes
0 answers
43 views

I have the following setup: a Kubernetes cluster with Spark Connect 4.0.1 and MLflow tracking server 3.5.0. The MLflow tracking server should serve all artifacts and is configured this way: --backend-store-...
hage's user avatar
  • 6,213
0 votes
1 answer
57 views

I have a Spark job that runs daily to load data from S3. The data consists of thousands of gzip files. However, in some cases, there are one or two corrupted files in S3, and this causes the whole ...
Nakeuh's user avatar
  • 1,933
-1 votes
2 answers
47 views

On an Azure VM, I have installed standalone Spark 4.0. On the same VM I have Python 3.11 with Jupyter deployed. In my notebook I submitted the following program: from pyspark.sql import SparkSession ...
Ziggy's user avatar
  • 43
1 vote
1 answer
114 views

I am very new to Spark (I have only just started learning it), and I have encountered a recursion error in some very simple code. Background: Spark version 3.5.7, Java version 11.0.29 (Eclipse ...
GINzzZ100's user avatar
-1 votes
0 answers
373 views

I'm working on a large-scale ETL pipeline processing ~500GB daily across multiple data sources. We're currently using Great Expectations for data quality validation, but we are facing performance bottlenecks ...
Vijay Savaliya's user avatar
2 votes
1 answer
187 views

I’m trying to create a Delta Lake table in MinIO using Spark 4.0.0 inside a Docker container. I’ve added the required JARs: delta-spark_2.13-4.0.0.jar, delta-storage-4.0.0.jar, hadoop-aws-3.3.6.jar, aws-...
Tutu ツ's user avatar
  • 155
0 votes
0 answers
27 views

Long story short, my team was hired to take on some legacy code that was taking around 5 hours to run. We began making some minor changes that shouldn't have affected the runtimes in any significant ...
Ben Fuqua's user avatar
3 votes
1 answer
102 views

I’m experiencing data loss when writing a large DataFrame to Redis using the Spark-Redis connector. Details: I have a DataFrame with millions of rows. Writing to Redis works correctly for small ...
gianfranco de siena's user avatar
