
rmoff

u/rmoff

What is Write-Audit-Publish - and why is it useful for data engineers?
r/dataengineering
News & discussion on Data Engineering topics, including but not limited to: data pipelines, databases, data formats, storage, data modeling, data governance, cleansing, NoSQL, distributed systems, streaming, batch, Big Data, and workflow engines.


Blog

Write-Audit-Publish (WAP) is a pattern in data engineering that gives teams greater control over data quality. It was popularized by Netflix back in 2017 in a talk by Michelle Winters at the DataWorks Summit called “Whoops the Numbers are wrong! Scaling Data Quality @ Netflix.”

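The name is fairly self-descriptive: write the data somewhere consumers don't read from, audit it, and publish it only once the audit passes.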

What is WAP not (in this context)?

Why is WAP useful for data engineers?

The data engineering world has always lagged behind its software engineering brethren.

Concepts like source control were well established in software engineering for a decade or more before data engineers realised that there might just be something in the idea of not emailing around files called DIM_DATE_V1_FINAL_REVISED_v2_PROD.sql. (In fairness, it took a shift away from the old mindset of the established vendors too, in parallel with the emergence of the modern data stack, for things to really click.)

Write-Audit-Publish is very similar to (or perhaps the same as, if you squint) the idea of Blue-Green deployments in the software engineering world. As data engineers we can learn a lot from established and proven patterns, and the Blue-Green one is a good example of this.

Why wouldn’t we want to adopt this, other than perhaps inertia and fear of something new? WAP is a good fit for both regular data pipelines and one-off data processing jobs.
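To make the pattern a bit more concrete, here's a minimal sketch of the three steps in plain Python. Everything in it is hypothetical and only there for illustration: the file names, the `order_id` and `amount` columns, and the audit rules. It also stands in a local file rename for the publish step, whereas a real data lake would more likely use something like a branch merge or a table swap.

```python
import csv
import os
import tempfile
from pathlib import Path


def write(rows, staging_path):
    """WRITE: land the new data somewhere consumers do not read from."""
    with open(staging_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)


def audit(staging_path):
    """AUDIT: return a list of problems found in the staged data."""
    problems = []
    with open(staging_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        problems.append("no rows written")
    for row in rows:
        if not row["order_id"]:
            problems.append("missing order_id")
        elif float(row["amount"]) < 0:
            problems.append(f"negative amount for order {row['order_id']}")
    return problems


def publish(staging_path, published_path):
    """PUBLISH: swap the audited file into the location consumers read."""
    # os.replace is an atomic rename when both paths are on the same filesystem.
    os.replace(staging_path, published_path)


def write_audit_publish(rows, staging_path, published_path):
    """Run the three steps; publish only if the audit finds no problems."""
    write(rows, staging_path)
    problems = audit(staging_path)
    if problems:
        # Leave the staged file in place so it can be inspected and fixed.
        return problems
    publish(staging_path, published_path)
    return []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "orders.staging.csv"
        published = Path(tmp) / "orders.csv"
        # A clean batch is published; a bad batch never reaches consumers.
        print(write_audit_publish([{"order_id": "A1", "amount": 10.5}], staging, published))
        print(write_audit_publish([{"order_id": "A2", "amount": -3}], staging, published))
```

The point of the sketch is the ordering: consumers only ever see `orders.csv`, and that file changes only after the audit has passed, which is the same idea as switching traffic in a Blue-Green deployment.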

Wanna Read More?

👉🏻Check out the full article that I wrote: Data Engineering Patterns: Write-Audit-Publish (WAP)

---

Full disclosure: I work for Treeverse, the company behind lakeFS.



🤖Building a Telegram bot with Apache Kafka, Go, and ksqlDB [SLIDES/CODE/RECORDING]
r/golang
Ask questions and post articles about the Go programming language and related tools, events etc.



I had the pleasure of presenting this ✨brand new talk✨ at the DataEngBytes conference today, and am delighted to now share the slides, code, and recording with you:

🔗 Building a Telegram bot with Apache Kafka, Go, and ksqlDB

Blogged: Building a Telegram Bot Powered by Apache Kafka and ksqlDB
r/java

News, Technical discussions, research papers and assorted things of interest related to the Java programming language. NO programming help, NO learning Java related questions, NO installing or downloading Java questions, NO JVM languages - Exclusively Java



Talk and share advice about the most popular distributed log, Apache Kafka and its ecosystem. This includes Apache Kafka itself, and compatible implementations of the protocol.


r/apachekafka


[Mod notice] Sockpuppets are not welcome on this sub

rmoff
commented

(Anyone is welcome to engage with good intentions - I wrote some notes up here: https://rmoff.net/2026/01/23/interacting-with-developers-on-reddit/)


[Mod notice] Sockpuppets are not welcome on this sub
r/apachekafka


The mod team have noticed an increase in sockpuppet accounts shilling for certain vendors. This behaviour is not tolerated, and will result in mod action.

If you are a vendor engaging a marketing agency who do this, please ask them to stop.


r/dataengineering

Reading 'Fundamentals of data engineering' has gotten me confused

rmoff
commented

Bear in mind the book is ~4 years old. A lot has changed since then.


Cosplaying as a webdev with Claude Code in January 2026
r/ClaudeCode
a community where claude code enthusiasts build, share, and solve together.


Showcase

Along with half the world, I've been experimenting with what you can do with Claude Code. I've written up some notes + tips & tricks here:

I've also written a bit of a beard-stroking post about how we use LLMs as developers:

Would love feedback on either :)