Chris (@criccomini) / X

13.6K posts

Chris

@criccomini

37% context left

Sunnyvale, CA

Joined April 2009

Pinned
Chris
@criccomini
Apr 27
Spent the past week on SlateDB's DST harness. It was a bit of a slog. The more state I explored, the more false positives I encountered.
Deterministic Simulation Testing Is Really Hard
From rng.md
15K
Chris
@criccomini
Mar 16, 2024
I got a chance to sit in on some @ycombinator pitches this week. A few thoughts: 1⃣ I have AI fatigue--SO MUCH. Very little of it is deep tech; mostly applying OpenAI FM to stuff. Investors in this space: I have no idea how you do this. I feel like there's a lot of $ to be lost.
488K
Chris
@criccomini
May 20, 2019
Successful intern projects: 1. High value if completed. 2. Low risk if not completed. 3. Able to finish in allotted time (2-3 months). 4. Exciting to work on and talk about. Anything else I'm missing?
Chris
@criccomini
Feb 20, 2023
Embedded DBs are having a renaissance. RDBMS: SQLite; OLAP: DuckDB; Graph: KuzuDB; Search: Chroma. The developer experience is so good on these. Things just work. Really cool to see.
79K
Chris
@criccomini
Dec 5, 2019
My @InfoQ talk 🎙️ on the "Future of Data Engineering" is up! I cover the six stages of data pipeline maturity: 0. None, 1. Batch, 2. Realtime, 3. Integration, 4. Automation, 5. Decentralization. Check it out! 👀 (I'm so sorry for the link picture)
Future of Data Engineering
From infoq.com
Chris
@criccomini
Aug 14, 2024
It's out! I've been working with @paulgb, @vigneshc, the team @responsive_apps, and others to put together an LSM storage engine built on object storage. Contributors, users, and feedback would all be great!
GitHub - slatedb/slatedb: A cloud native embedded storage engine built on object storage.
From github.com
23K
Chris
@criccomini
Oct 25, 2023
Some interesting infra projects: WarpStream, Turbopuffer, LanceDB, Neon, AWS Neptune, TigerBeetle, Modal, Materialize, Tabular (Iceberg), DuckDB/Motherduck, Arrow Data Fusion/Substrate, gvisor, KIP-932 (Kafka), VeniceDB, Bauplan, Buf schema registry, Apicurio
103K
Chris
@criccomini
Feb 6, 2024
TIL about Apache DataFusion Comet. @Apple has replaced @ApacheSpark's guts with @ApacheArrow DataFusion. And they're donating it. 🤯 github.com/apache/arrow-d… This is an alternative to @MetaOpenSource's Velox Spark implementation. facebookincubator.github.io/velox/spark_fu… /ht @philippemnoel
43K
Chris
@criccomini
Aug 20, 2021
Replying to @sethrosen
“Reddit’s database has two tables” “Instead, they keep a Thing Table and a Data Table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date” 🥴 kevin.burke.dev/kevin/reddits-…
Chris
@criccomini
Nov 26, 2023
This is the future. Kafka writing Parquet to S3 (via tiered storage). Instant data lake.
Gunnar Morling 🌍
@gunnarmorling
Nov 26, 2023
"KIP-1008: ParKa - the Marriage of Parquet and Kafka" That's an interesting proposal: writing #Kafka segments as #Parquet files. Can see the appeal for data lake ingest; wondering though how well the columnar file structure plays with Kafka semantics 🤔. cwiki.apache.org/confluence/dis…
53K
Chris
@criccomini
Sep 18, 2024
Uber's actually doing the thing. uber.com/blog/datamesh If they keep going, this could be a first-class reference architecture.
16K
Chris
@criccomini
Jan 23, 2023
DBs are getting totally ripped apart right now and I love it. Query engines (trino, duck), storage (s3, gcs), and indexing (iceberg, hudi) all separate.
Gunnar Morling 🌍
@gunnarmorling
Jan 23, 2023
"Querying SQLite databases with DuckDB" Enjoyed watching this fast-paced video by @markhneedham demoing how to use #DuckDB's query engine to run analytics queries against data in a #SQLite file. 5:50 well spent 🦆! youtube.com/watch?v=ogge3k…
58K
Chris
@criccomini
Jan 5, 2023
I'm open sourcing Recap, a dead simple data catalog for engineers! Unlike traditional catalogs, Recap is built to power infrastructure and tools that need metadata. Read the docs: docs.recap.cloud Or dive straight into the GitHub repo: github.com/recap-cloud/re…
47K
Chris
@criccomini
Aug 28, 2024
Big news: I'm helping @martinkl with a second edition of Designing Data-Intensive Applications! An early release of the first 3 chapters is now available (O'Reilly Learning subscribers only at this point) and we're hoping to finish it next year.
Designing Data-Intensive Applications, 2nd Edition
From oreilly.com
8.2K