Lamin

Software Development

Munich, Bavaria · 597 followers

Open data framework for biology. Context and memory for datasets and models at scale.

About us

Query, trace & validate with a lineage-native lakehouse that supports bio-formats, registries & ontologies. From the creators of Scanpy.

Scaling AI for complex biological systems means scaling its iterative cycle: laboratory results inform models, models design the next round of experiments. While leading ML and Comp Bio teams in the industry we experienced firsthand that — beyond data generation itself — the primary bottleneck was closing the feedback loop across heterogeneous modalities at scale. We needed a way to query, trace, and validate complex biological datasets with context and memory to effectively learn from mistakes and train the next generation of models.

We founded Lamin in 2022 to build an open-source, zero-lock-in solution to this problem. We started with a "git for R&D data" that enabled traceability & versioning alongside a biological data catalog. Today, LaminDB has evolved into a lineage-native lakehouse: a programmable context and memory layer for biological R&D that scales — just like git — from personal projects to pharma-scale enterprise deployments.

We are grateful to work with thousands of scientists across academia, biotech, and global pharma. Together, we build a more reliable foundation for data-driven research at scale. Headquartered in Munich and New York City, we are always looking for people who are excited about our mission — please reach out!
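The "git for R&D data" idea above — data objects that are versioned, content-addressed, and traceable back to their inputs — can be illustrated with a toy sketch in plain Python. This is a conceptual illustration only, not LaminDB's actual API; the `Artifact` class and its fields are invented for this example:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A versioned data object that remembers its inputs ('git for R&D data')."""
    description: str
    content: bytes
    inputs: list = field(default_factory=list)  # upstream Artifacts

    @property
    def hash(self) -> str:
        # Content-address the artifact so identical data gets identical ids.
        return hashlib.sha256(self.content).hexdigest()[:12]

    def trace(self) -> list:
        """Walk the lineage graph back to the raw inputs."""
        lineage = [self.description]
        for parent in self.inputs:
            lineage.extend(parent.trace())
        return lineage


raw = Artifact("raw counts", b"cell,gene,count\n")
normalized = Artifact("normalized counts", b"cell,gene,lognorm\n", inputs=[raw])
print(normalized.trace())  # ['normalized counts', 'raw counts']
```

The point of the sketch is that lineage is a property of the data object itself, so "query, trace, and validate" works on any artifact without consulting a separate log.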

Website
https://lamin.ai
Industry
Software Development
Company size
11-50 employees
Headquarters
Munich, Bavaria
Type
Privately Held
Founded
2022

Updates

  • Lamin reposted this

    When spatial datasets accumulate across experiments and technologies, managing, querying, and training models on them becomes a major challenge. To address this, we built support for scverse's SpatialData format into LaminDB, enabling cross-dataset queries, dataset validation, and lineage tracking. The main challenge was extending pandera-based schema validation to the complicated structure of SpatialData; Parquet and AnnData are easier!
    Blog: https://lnkd.in/dV5T5Ht2
    Code: https://lnkd.in/dsAGM3BN
    Led by Lukas Heumos in collaboration with Altana Namsaraeva, Tim Treis, Mark Keller, Wouter-Michiel Vierdag, Luca Marconato, Lea Zimmermann, and Sunny Sun.

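The column-schema validation the post describes can be sketched minimally in plain Python. This is a toy illustration of the idea — per-column dtypes plus value checks — not pandera's or LaminDB's actual API, and the "spatial" columns are invented for the example:

```python
# Toy column-schema validation in the spirit of pandera, applied to a
# simplified "spatial" table: each column declares a dtype and a check.
schema = {
    "x": (float, lambda v: v >= 0),          # spatial coordinate
    "y": (float, lambda v: v >= 0),
    "cell_type": (str, lambda v: v != ""),   # must be annotated
}


def validate(rows: list[dict]) -> list[str]:
    """Return a list of human-readable schema violations."""
    errors = []
    for i, row in enumerate(rows):
        for col, (dtype, check) in schema.items():
            if col not in row:
                errors.append(f"row {i}: missing column '{col}'")
            elif not isinstance(row[col], dtype) or not check(row[col]):
                errors.append(f"row {i}: invalid value {row[col]!r} in '{col}'")
    return errors


rows = [
    {"x": 1.5, "y": 2.0, "cell_type": "T cell"},
    {"x": -3.0, "y": 0.5, "cell_type": ""},
]
print(validate(rows))
```

The difficulty the post mentions is that SpatialData is not one flat table: the same kind of check has to recurse over images, shapes, points, and annotation tables, each with its own schema.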
  • Lamin reposted this

    Hi friends, I wrote a guest post for Lamin on using the open-source LaminR package in an R workflow with the PBMC 3k dataset. The link is in the comments. What drew me in was a familiar handoff problem: coming back to an analysis, or inheriting one from someone else, and not being fully sure what was done, in what order, or in which environment. I have some "scar tissue" here from my consulting projects over the years. In the post, I walk through a standard single-cell workflow and show what LaminR stores along the way. What I found most interesting was the provenance side of it: being able to inspect the code, report, and environment tied to a data object later. That feels increasingly important as datasets, tools, and teams get more complex (e.g. when some of your "colleagues" are OpenClaw agents). Thanks to Luke Zappia, Robrecht Cannoodt, and Alex Wolf for collaborating on this.

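The provenance idea in this post — a data object that carries the code, report, and environment that produced it, so you can inspect them later — can be sketched in a few lines of plain Python. This is a conceptual toy, not LaminR's or LaminDB's actual API; all names and values are invented:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """The context in which a data object was created."""
    script: str       # source code of the analysis
    environment: str  # e.g. a lockfile or package list
    report: str       # rendered notebook/report


@dataclass(frozen=True)
class DataObject:
    name: str
    run: Run  # provenance travels with the object

    def describe(self) -> str:
        # Summarize the attached provenance for a quick later inspection.
        env_hash = hashlib.sha256(self.run.environment.encode()).hexdigest()[:8]
        n_lines = len(self.run.script.splitlines())
        return f"{self.name} (created by {n_lines}-line script, env {env_hash})"


run = Run(script="library(LaminR)\n# ... PBMC 3k workflow ...",
          environment="R 4.4.0\nLaminR (example versions)",
          report="<html>...</html>")
obj = DataObject("pbmc3k_processed", run)
print(obj.describe())
```

The handoff problem in the post disappears when this record is attached to the object itself: whoever inherits the analysis queries the object, not the person who made it.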
  • We partnered with Jesse Johnson to help us explain the relationship between biology’s sparse measurements and the data lakehouse concept. Jesse is both a technical expert — having led data platform work across Google, Verily, Sanofi, Cellarity, and others — and a thoughtful analyst at the interface of biology, AI, and data infrastructure. TL;DR: Existing data infrastructure can't make sparse measurements across millions of features queryable. Warehouses are too rigid, data lakes can't be queried, and tabular lakehouses don't understand the formats. Biology needs a data lakehouse with support for bio-formats and registries. The link to the post is in the comments!

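The sparsity argument can be made concrete with a toy example: when each sample observes only a handful of features out of millions, a dense table is mostly empty, while a sparse mapping stays compact and queryable. This is illustrative only — real systems use dedicated sparse formats — and the gene names and values are invented:

```python
# A toy sparse representation: each sample stores only its observed features,
# out of a feature space that may span millions of genes/proteins.
samples = {
    "cell_1": {"CD3E": 5.0, "CD8A": 2.0},
    "cell_2": {"CD19": 7.0},
    "cell_3": {"CD3E": 1.0, "CD19": 3.0},
}


def query(feature: str, min_value: float = 0.0) -> dict[str, float]:
    """Find all samples where `feature` was measured above a threshold."""
    return {
        sample: values[feature]
        for sample, values in samples.items()
        if values.get(feature, 0.0) > min_value
    }


print(query("CD3E"))  # {'cell_1': 5.0, 'cell_3': 1.0}
```

A tabular warehouse would need a column per feature to answer this query; a sparse, format-aware lakehouse can answer it without materializing the millions of absent entries.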
  • Lamin reposted this

    Two years ago we partnered with Mark Keller from Nils Gehlenborg’s Lab at Harvard to make Vitessce work seamlessly with LaminDB for interactive visualization of multimodal + spatial datasets. The integration has found much use across academia, biotech, and pharma — so we wrote up its design principles & use cases. This was a team effort involving Altana Namsaraeva, Chaichontat Sriworarat & Sunny Sun in addition to Mark. Read the post: https://lnkd.in/dBQr3z4E

  • Lamin reposted this

    What should the shared memory layer for agents and humans look like? Will it live in embeddings or in records? A high-level note.

    The shared memory layer. I started working on improving the interface between machine learning and data infrastructure for scRNA-seq in 2016 with scanpy & anndata. The goal was to reduce the cognitive load & time scientists spend wrangling data. After moving on to lead computational teams at Cellarity, working closely with wetlab and automation, it became clear that the biggest gains were in how humans and models learn and improve together. I felt this would best be achieved through a "shared memory layer": a query layer that would allow teams to close larger-scale feedback loops.

    Now we have these very useful AI agents -- which behave like a mix of junior colleagues, general experts, and small models -- and I wondered whether their arrival changed that premise. For a moment, I worried it had and that I'd been working on the wrong problem. But it quickly became clear that it hadn't: this new abundance of peculiar intelligent agency is completely lost if it isn't exposed to the right context. On top of that, it suffers from short context windows & context rot. Just as junior colleagues benefit from supervision with rich & relevant context that lets them find the right way to think about a problem, agents perform remarkable work if they have context. And they produce unusable slop if they don't.

    Good memory for agents. I don't think anyone debates that context is the key bottleneck of current AI systems, so I want to spend most of this note on what an optimal "shared memory layer" for agents and humans could look like. That angle is particularly relevant because a good part of the magic of agents is -- unlike the merely intelligent compute machine that is an LLM -- their ability to autonomously retrieve context if it's presented to them in the right way. Agents can solve their own biggest bottleneck if "context engineering" is done right. In this note, I refer to the "layer" that achieves this as the "shared memory layer"; it complements the procedural side of context engineering.

    The past year revealed that, for now, the "right way to present context to agents" seems to be files in storage paired with established API-based systems of record -- not tensors, vector databases, or RAG systems. I found that interesting because I was never in the "symbolic camp" when it came to developing machine learning models. So I asked Gemini 3 to clear this up: Can you disentangle why the symbolic representations in markdown notes & standard SQL databases, i.e., words and tables representing related concepts, are preferable for persisting thoughts, whereas we use tensors for modeling/enabling thoughts in the first place? Here is what it responded: https://lnkd.in/druiUApg
