<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[ProteinQure - Medium]]></title>
        <description><![CDATA[Peptide-based drug delivery. - Medium]]></description>
        <link>https://medium.com/proteinqure?source=rss----c68ae32210c1---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>ProteinQure - Medium</title>
            <link>https://medium.com/proteinqure?source=rss----c68ae32210c1---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 11 Apr 2026 15:12:34 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/proteinqure" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Peptide binders for all]]></title>
            <link>https://medium.com/proteinqure/peptide-binders-for-all-7c6b6ae1772d?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/7c6b6ae1772d</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[drug-discovery]]></category>
            <category><![CDATA[phage-display]]></category>
            <category><![CDATA[peptides]]></category>
            <dc:creator><![CDATA[Lucas Siow]]></dc:creator>
            <pubDate>Wed, 13 Dec 2023 22:00:46 GMT</pubDate>
            <atom:updated>2023-12-20T22:15:03.529Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zbye_1ojyOnJB9afiVqZNw.png" /></figure><p><strong>Meet PQ Librarian: AI-driven Library Scoring</strong></p><p>Today, we’re thrilled to introduce a game-changer in the world of peptide discovery — PQ Librarian. This tool is the result of years of experience working at the intersection of AI and display screening research. ProteinQure invites you to join the revolution and make data-driven decisions in drug discovery a reality.</p><p>Want to try it for yourself? Visit: <a href="https://librarian.proteinqure.com/">https://librarian.proteinqure.com/</a></p><p><strong>How to find the right peptide library for your target: A User’s Guide</strong></p><p>Ready to dive into the world of accelerated peptide drug discovery? PQ Librarian is now at your fingertips with a sleek online interface, ready to make early-stage drug discovery a whole lot more efficient:</p><ol><li>All it takes is a human gene name, UniProt ID, or full name of a protein, and you’re set. We’ve pre-computed the viability of commonly available libraries for the entire human proteome!</li><li>The result? Color-coded scores for multiple peptide libraries against your chosen target: green for high hit potential, yellow for caution, and red for a low chance of success — simple, right?</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*o5Hv16VDu_Y4ZdnAHwLvsQ.png" /><figcaption>Fig 1. Screenshot of using PQ Librarian for three different protein targets (KRAS, SCN9A and BCL2A1)</figcaption></figure><p>Let’s test it out with “<a href="https://www.uniprot.org/uniprotkb/P01116/entry">KRAS</a>”. Predictions range from the optimistic green for our proprietary libraries to the cautious yellow for fully randomized linear and cyclic peptide libraries. 
Specifically, we define a Green target as one with an approximately 80% chance of finding at least one hit with an affinity better than 1 µM. Note that for some targets you will see Red as well.</p><p>It is important to note that the ProteinQure score is the maximum score across multiple peptide libraries, each with a unique scaffold.</p><p><strong>Why PQ Librarian Matters: Our Motivation</strong></p><p>Peptides are the unsung heroes of therapeutic modalities, yet many library screening projects hit roadblocks. Display screening projects like phage display or mRNA display fail for unknown reasons, go unreported, and their data is not fed back to improve the probability of success of future peptide screens. Enter PQ Librarian, which combines Large Language Models, predictive neural networks, and training on a huge database of peptide-receptor interactions. We’ve also validated it on proprietary data from 20+ targets!</p><p>Not all hits are created equal! Our computationally designed libraries are not your average players. They are optimized for affinity, specificity, solubility, charge, and other properties that set them apart from fully randomized libraries. That being said, a simple library can be a useful starting point. That is why we are making PQ Librarian available to all — to help everyone get started with peptides.</p><p>Become part of a community that simplifies the complex, and let’s redefine the future of peptide library screening together — welcome to PQ Librarian!</p><p><strong>Solving the Puzzle: Your Peptide Quest Begins! (Contact us at </strong><a href="mailto:bd@proteinqure.com"><strong>bd@proteinqure.com</strong></a><strong>)</strong></p><p>Peptide drug development is like a puzzle — every project is a unique challenge waiting to be conquered. But here’s the fun part: there’s no one-size-fits-all solution. Since we founded ProteinQure, our team has been on a wild ride, tackling the trickiest puzzles in peptide drug design. Some problems require more bespoke solutions. 
Reach out for any of the following:</p><ul><li><strong>Tailored Libraries: </strong>We specialize in designing biased combinatorial libraries for specific targets and applications, such as membrane proteins or drug delivery (like siRNA).</li><li><strong>Enhanced Scoring Solutions: </strong>Our capabilities extend to scoring bespoke peptide libraries including non-canonical amino acid-containing libraries (like mRNA display) with novel scaffolds.</li><li><strong>Custom Target Scoring: </strong>We can score libraries against any antigen sequence, including proteins with mutations or alternative splicing, as well as higher precision library scoring against specific epitopes of targets (e.g. the extracellular domain of a target protein).</li><li><strong>Target Identification: </strong>We can use AI-driven methods to suggest possible targets across the entire human proteome if a specific cell-binding peptide sequence is known.</li></ul><h3>FAQ: <strong>PQ Librarian in Practice</strong></h3><p><strong>Q) What are some technical assumptions being made in your model?</strong></p><p>We’re looking forward to sharing behind-the-scenes details on how PQ Librarian was created, but for now, here are some important points.</p><ul><li>Targets are evaluated as monomers with no post-translational modifications</li><li>We do not evaluate the binding of all peptides in a combinatorial library (often larger than 10⁹). We subsample probabilistically within a library to predict binding scores.</li><li>Peptide binding prediction is predicted in the absence of any linker or phage coat protein. In some circumstances, we expect this would result in overestimation of the binding score to some targets.</li><li>PQ Librarian is not making any predictions about solubility, specificity, or other developability properties (we have other tools for that!)</li></ul><p><strong>Q) My target was found in your database, but all libraries had a “Red” score. 
Is there any hope for hit identification?</strong></p><p>Yes. Contact us and we can share custom solutions for challenging targets, along with feasibility evaluations for those solutions (subject to Confidential Disclosure Agreements, of course). See “Solving the Puzzle”.</p><p><strong>Q) My target was not found in your database, can you provide a score?</strong></p><p>Yes. Contact us and we would be happy to score any protein antigen sequence. However, we cannot score oligomeric targets where binding may be expected to occur at the interface of two proteins. See “Solving the Puzzle”.</p><p><strong>Q) My target was found in your database, but all libraries had a “Green” score. Does this mean that I should screen all libraries?</strong></p><p>Yes, if you have the resources. Low-cost, easily accessible libraries can be a good place to start. However, there is always the potential to produce hits that are non-specific, insoluble, or carry other developability liabilities. Given the choice between multiple libraries, you should seek a library with the potential to generate drug-like properties beyond binding. Contact us to screen ProteinQure’s proprietary libraries to maximize downstream success in hit validation and lead optimization.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7c6b6ae1772d" width="1" height="1" alt=""><hr><p><a href="https://medium.com/proteinqure/peptide-binders-for-all-7c6b6ae1772d">Peptide binders for all</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Differences in breast cancer through a biochemical lens]]></title>
            <link>https://medium.com/proteinqure/differences-in-breast-cancer-through-a-biochemical-lens-c3453b21b0c2?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/c3453b21b0c2</guid>
            <category><![CDATA[protein-therapeutics]]></category>
            <category><![CDATA[breast-cancer-awareness]]></category>
            <category><![CDATA[breast-cancer]]></category>
            <category><![CDATA[peptides]]></category>
            <category><![CDATA[cancer-research]]></category>
            <dc:creator><![CDATA[Mark Fingerhuth]]></dc:creator>
            <pubDate>Fri, 27 Oct 2023 16:46:21 GMT</pubDate>
            <atom:updated>2023-10-27T17:04:57.867Z</atom:updated>
<content:encoded><![CDATA[<blockquote>October is <a href="https://www.nationalbreastcancer.org/breast-cancer-awareness-month/">Breast Cancer Awareness Month</a>. Breast cancer is different for everyone, and this heterogeneity originates from differences in genetics and other features of cancer cells, affecting how tumors grow, spread, and respond to treatment. This article is intended to raise awareness about common types of breast cancer, knowledge of which helps doctors provide more personalized treatments. Advancements in research and technology are paving the way for more effective breast cancer treatments for unique cancer subtypes, offering hope for improved outcomes and a brighter future in the fight against this disease.</blockquote><p>Breast cancer is a <a href="https://cancer.ca/en/cancer-information/cancer-types/breast/what-is-breast-cancer">complex disease</a> that affects millions of individuals worldwide, and it comes in various forms. One crucial aspect of breast cancer classification revolves around the presence or absence of specific receptors on cancer cells. These receptors — HER2, ER, and PR — play a pivotal role in determining the course of the disease and guiding treatment decisions.</p><p>In this article, we’ll delve into the different types of breast cancer in relation to these receptors and explore how this knowledge is transforming the way we approach breast cancer treatment.</p><p><strong>HER2, ER, and PR Receptors</strong></p><p>HER2 (human epidermal growth factor receptor 2), ER (estrogen receptor), and PR (progesterone receptor) are proteins that are expressed in breast cancer cells. These receptors play crucial roles in how breast cancer cells grow and respond to treatments.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*4JuqJ7Uby_D8DZXHXiohAg.png" /><figcaption>Schematic showing the three different types of receptors (PR, ER and HER2) that define the different types of breast cancer. 
The three boxes at the bottom highlight some of the 3D structural features of these receptors.</figcaption></figure><ul><li><strong>PR</strong>: Progesterone receptors are nuclear receptors (expressed inside the cell) that interact with progesterone, a steroid hormone present in the body. The binding of progesterone can stimulate growth and survival of cancer cells.</li><li><strong>ER</strong>: Estrogen receptors are also nuclear receptors (expressed inside the cell), and like PR, they respond to hormones — in this case, estrogen. When estrogen binds to these receptors, it can stimulate cancer cell growth.</li><li><strong>HER2</strong>: A transmembrane protein (expressed on the cell surface) that helps breast cancer cells grow quickly. It can be overactive in some breast cancers, causing them to grow more aggressively.</li></ul><p>Understanding the presence or absence of these receptors helps doctors determine the most effective treatment strategies — also known as targeted therapy — for each patient’s unique type of breast cancer.</p><p><strong>Hormone Receptor-Positive (HR+) Breast Cancer:</strong></p><p>HR+ breast cancers are the most prevalent subtypes of breast cancer according to the <a href="https://seer.cancer.gov/statfacts/html/breast-subtypes.html">National Cancer Institute</a>.</p><p><strong><em>ER-Positive (ER+)</em></strong>: When breast cancer cells have estrogen receptors (ER+), it means they can respond to estrogen, a hormone that promotes cell growth. To combat ER+ breast cancer, doctors often employ hormone therapy, such as Tamoxifen or aromatase inhibitors, to block the effects of estrogen and slow down cancer cell growth.</p><p><strong><em>PR-Positive (PR+)</em></strong>: Similarly, some breast cancers express progesterone receptors (PR+), which can also fuel cancer growth when progesterone binds to them. 
Hormone therapy is used to counteract the impact of progesterone in PR+ breast cancer.</p><p><strong><em>ER/PR-Positive (ER+/PR+)</em></strong>: In some cases, breast cancers express both ER and PR receptors. These are categorized as ER+/PR+ breast cancers and are typically treated with hormone therapy.</p><p><strong>HER2-Positive (HER2+) Breast Cancer:</strong></p><p><strong><em>HER2-Positive (HER2+)</em></strong>: HER2 is a protein that can be overexpressed in certain breast cancers. HER2+ breast cancer often grows more aggressively since HER2 is a growth-promoting protein. Targeted therapies, such as Herceptin (trastuzumab) and other HER2 inhibitors, are employed alongside chemotherapy to effectively treat HER2+ breast cancer.</p><p><strong>Triple-Negative Breast Cancer (TNBC):</strong></p><p><a href="https://www.nature.com/articles/s41572-019-0111-2">10–15% of breast cancer patients</a> are diagnosed with triple-negative breast cancer. It is more common in women younger than age 40, tends to spread faster, currently has no good treatment options, and has significantly lower survival rates than other breast cancer subtypes.</p><p><strong><em>Triple-Negative Breast Cancer (TNBC)</em></strong>: In this challenging subtype, breast cancer cells lack the expression of all three receptors — ER, PR, and HER2. Since hormone therapies and HER2-targeted treatments are ineffective, treatment options typically revolve around chemotherapy and, in some cases, immunotherapy.</p><p>Breast cancer is a heterogeneous disease, meaning it varies greatly from one patient to another. To provide the most effective treatment, physicians often perform molecular testing, such as genomic sequencing, to gain deeper insights into a patient’s specific subtype of breast cancer.</p><p>This personalized approach to treatment is transforming the field of oncology, offering new hope to patients facing a breast cancer diagnosis. 
As our <a href="https://onlinelibrary.wiley.com/doi/full/10.1002/cac2.12358">understanding of breast cancer</a> continues to evolve, so too do the treatment options available to those in need.</p><p>At ProteinQure, we’re dedicated to advancing the fight against breast cancer through cutting-edge research and innovative therapies. <strong>ProteinQure’s first internal drug discovery program focuses on treating triple-negative breast cancer</strong> with a novel computationally designed peptide-drug conjugate. Check out this <a href="https://medium.com/proteinqure/proteinqures-recent-breakthrough-a-smart-breast-cancer-drug-912d36d78b7d">blog post</a>, our <a href="https://www.biospace.com/article/releases/proteinqure-inc-announces-robust-preclinical-efficacy-in-a-triple-negative-breast-cancer-mouse-tumour-model-study/">press release</a> and our <a href="https://proteinqure.com/files/sortilin_whitepaper.pdf">whitepaper</a> if you want to learn more about it. By staying at the forefront of scientific discovery, we aim to provide patients with the best possible treatment options and a brighter outlook on their journey to recovery.</p><p>Breast cancer is a challenging adversary, but with knowledge, research, and the right therapies, we’re working together to overcome it. 
Join us in the fight against breast cancer and help us spread awareness about the importance of HER2, ER, and PR receptors in breast cancer diagnosis and treatment!</p><p><em>All figures were created with BioRender.com and VMD.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c3453b21b0c2" width="1" height="1" alt=""><hr><p><a href="https://medium.com/proteinqure/differences-in-breast-cancer-through-a-biochemical-lens-c3453b21b0c2">Differences in breast cancer through a biochemical lens</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[ProteinQure’s Recent Breakthrough: A ‘Smart’ Breast Cancer Drug]]></title>
            <link>https://medium.com/proteinqure/proteinqures-recent-breakthrough-a-smart-breast-cancer-drug-912d36d78b7d?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/912d36d78b7d</guid>
            <category><![CDATA[peptide-therapeutics]]></category>
            <category><![CDATA[cancer-treatments]]></category>
            <category><![CDATA[protein-design]]></category>
            <category><![CDATA[peptides]]></category>
            <category><![CDATA[cancer]]></category>
            <dc:creator><![CDATA[Mark Fingerhuth]]></dc:creator>
            <pubDate>Wed, 23 Aug 2023 15:55:20 GMT</pubDate>
            <atom:updated>2023-09-12T13:46:21.960Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HZLIe3bWKPd-egZbEpealA.png" /></figure><p>Chemotherapy, one of the primary treatments for cancer, unfortunately doesn’t differentiate between healthy cells and cancer cells. Imagine sending a SWAT team into a building to remove the bad guys, but they can’t tell friend from foe. The result? A lot of unnecessary damage.</p><p>At ProteinQure we are working to solve this problem with our <strong>first internal drug discovery program</strong> focused on an especially deadly and currently <strong>untreatable type of breast cancer</strong>. We are thrilled to share some exciting news from our recent <a href="https://www.biospace.com/article/releases/proteinqure-inc-announces-robust-preclinical-efficacy-in-a-triple-negative-breast-cancer-mouse-tumour-model-study/">press release</a> with you:</p><blockquote>We conducted our first animal study with our most promising internal drug candidate, ARB-1-6. The study was overwhelmingly successful in demonstrating ARB-1-6’s ability to shrink breast cancer tumours without killing surrounding healthy cells.</blockquote><p>Our groundbreaking research revolves around a special protein on the surface of some breast cancer cells called ‘sortilin’ or SORT1.</p><h4>SORT1: The Doorway to Breast Cancer Cells</h4><p>So, what’s so unique about SORT1? This protein acts like a doorway into the cell. Certain forms of aggressive cancer, such as triple-negative breast cancer (TNBC), have a lot of these doorways. Recognizing the prevalence of these cellular doorways in TNBC, ProteinQure has innovatively designed a ‘smart’ drug. 
This ‘smart’ drug is like a specially crafted key, tailored to target and unlock the SORT1 doorway, ensuring the direct and targeted delivery of a medicine right into the cancerous cell.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wdLn4pE3ZzCB3pCXDrhuJg.png" /><figcaption>A ‘smart’ cancer drug ideally only kills cancer cells without harming the neighbouring healthy cells.</figcaption></figure><p>This focused approach has two significant benefits. First, it directly targets the harmful cancer cells, making the treatment more effective. Second, it minimizes damage to the healthy cells surrounding the cancer, which means fewer side effects for patients.</p><h4>Pioneering ‘smart’ drugs</h4><p>ProteinQure’s first ‘smart’ drug ARB-1-6, scientifically named a peptide-drug conjugate (PDC), is a combination of a peptide, a linker and a toxic payload.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rcTfK9pjJp9PT1mOQV67bA.png" /><figcaption>ARB-1-6: a peptide drug conjugate (PDC) consisting of three components: peptide, linker and toxic payload.</figcaption></figure><blockquote>ARB-1-6 was computationally designed using our proprietary software platform <strong>ProteinStudio</strong>. This platform utilizes a combination of molecular simulations on supercomputers and advanced AI models to design and optimize protein molecules for various properties.</blockquote><p>ProteinStudio allowed us to tweak and optimize the peptide in such a way that it strongly &amp; selectively binds to the SORT1 protein.</p><p>You can think of the peptide as the GPS system of the PDC, guiding the toxic payload directly to the SORT1 protein on the outside of the breast cancer cells. In other words, we’re creating guided missiles. 
Instead of damaging everything in their path, these missiles (or “smart” drugs) target only the harmful cancer cells, leaving healthy cells unharmed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hvMmGaX71cfMddxgzesMaw.png" /><figcaption>ARB-1-6 binds strongly to SORT1 protein on the outside of breast cancer tumour cells. It does not bind to healthy cells since they express barely any SORT1 protein.</figcaption></figure><p>Upon reaching a tumour cell, ARB-1-6 latches onto the SORT1 protein and the cell absorbs the drug. Once inside, the cell naturally starts to disassemble the PDC by breaking it down into its three components: peptide, linker and toxic payload.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*a6N4uCrOvPtbCtyBRqw9sA.png" /><figcaption>After binding to the SORT1 receptor on the outside of the cell, the ARB-1-6 molecule gets absorbed into the inside of the cell.</figcaption></figure><p>Initially, the peptide and linker shield the payload’s toxic effects, allowing it to travel through the patient’s body without toxic side effects. However, once released, the payload unleashes its full toxic potential, spelling disaster for the tumour cell. The toxic payload induces so-called ‘cell death’, effectively leading the tumour cell to self-destruct.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*97K4B0uWibQYoBDj_cRqjQ.png" /><figcaption>Once ARB-1-6 is inside the cell, the machinery inside the cell starts to split the molecule into its three components. Freeing the toxic payload unleashes its full toxic potential and kills the cancer cell.</figcaption></figure><p>Our recent animal studies have bolstered our excitement around ARB-1-6. When tested, this drug significantly slowed down the growth of breast cancer tumours in animals, without causing the harmful side effects typically associated with chemotherapy. 
These positive outcomes emphasize the potential of the targeted ‘smart’ drug approach, offering hope for more effective and safer treatments in the future.</p><h4>What’s Next?</h4><p>ProteinQure isn’t stopping here — we’re pressing on to test our PDC molecules further, aiming to refine and expand their applications. Combining ProteinStudio and the PDC delivery platform, the team is investigating the potential of delivering non-toxic payloads such as gene snippets or radioactive atoms. Gene snippets or ‘oligonucleotides’ can turn off specific harmful components in diseases and radioactive atoms can be used to diagnose cancers within the body. This not only amplifies the therapeutic capabilities of our delivery platform but also broadens the scope of diseases and conditions we can tackle.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cLyntChpCUU17IPBCplRqw.png" /><figcaption>Peptide-drug conjugates (PDCs) can deliver different payloads into cells of our choice.</figcaption></figure><h4>Final Thoughts</h4><p>In a world where we often hear of the harsh realities and side effects of cancer treatment, it’s important to make treatments safer and more efficient. With triple-negative breast cancer still awaiting FDA-approved solutions, our PDC molecules present a beacon of hope. 
Our novel approach offers a promising step forward in the ongoing battle against breast cancer.</p><p>For those interested in diving deeper into the technical details, check out the <a href="https://www.biospace.com/article/releases/proteinqure-inc-announces-robust-preclinical-efficacy-in-a-triple-negative-breast-cancer-mouse-tumour-model-study/">press release</a> and our <a href="https://proteinqure.com/files/sortilin_whitepaper.pdf">SORT1 whitepaper</a>.</p><p><em>All figures were created with BioRender.com</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=912d36d78b7d" width="1" height="1" alt=""><hr><p><a href="https://medium.com/proteinqure/proteinqures-recent-breakthrough-a-smart-breast-cancer-drug-912d36d78b7d">ProteinQure’s Recent Breakthrough: A ‘Smart’ Breast Cancer Drug</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why you should write [Weekly Thoughts #1] — 5min]]></title>
            <link>https://medium.com/proteinqure/why-you-should-write-weekly-thoughts-1-5min-3bdf55d38b52?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/3bdf55d38b52</guid>
            <category><![CDATA[startup]]></category>
            <category><![CDATA[startup-lessons]]></category>
            <category><![CDATA[proteinqure]]></category>
            <dc:creator><![CDATA[Lucas Siow]]></dc:creator>
            <pubDate>Mon, 11 Apr 2022 16:22:42 GMT</pubDate>
            <atom:updated>2022-04-11T16:32:27.179Z</atom:updated>
<content:encoded><![CDATA[<h3>Why you should write more [Weekly Thoughts #1] — 5min</h3><p><a href="https://medium.com/proteinqure/a-new-series-of-writing-on-proteinqure-weekly-thoughts-introduction-aacb742558db">Find the introduction to this series</a></p><p>Originally written: Dec 18, 2020</p><p>Editing: Minimal</p><p>Lucas’ Updated Notes: At ProteinQure we have a page on Notion where decisions are documented and which is publicly available to all. This has been really helpful in providing a place where we collect and can review important decisions we made as a company. This is one of my favorite things we’ve done to create alignment within ProteinQure.</p><p>See a redacted set of some of the most recent entries.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/535/1*QooBOcH-6dxZpiHAjDW2WA.png" /><figcaption>Screenshot from internal ProteinQure Notion</figcaption></figure><h3>Why you should write</h3><p>There are lots of ways to share information (slides, videos, writing of various sorts, meetings, etc.). We’ve experimented with many of those already. What I’ve come to realize is that the best form tends to be mostly about your goals and secondarily about your personal style. I wanted to focus a bit on why I like to write (both these weekly emails and longer memos, but also Notion entries).</p><p>Note that these are in order of importance/frequency to me, though I think they apply to pretty much everyone (just in differing frequencies).</p><p><strong>1. To help the writer make better decisions.</strong></p><p>Writing forces you to be articulate in much more detail than any other form of communication.</p><p>The act of forcing yourself to write (and be concise) is extremely helpful for making your own case better. You often start to see the gaps in your case and prioritize much better. In the absence of writing, you are often able to trick yourself about how carefully thought out your plan is. 
This is because you are inevitably thinking about it deeply and will be able to answer questions fluidly. But writing convincingly (even to yourself) is a higher bar.</p><blockquote><em>This and the 2nd point are by far the most valuable to me. It is why we should write about any decisions that are important and hard to reverse (hiring being the number 1 example).</em></blockquote><p><strong>2. Help us evaluate the quality of our decision-making in retrospect.</strong></p><p>The act of writing makes the author more articulate, which allows us to go back and make sure we made good decisions. Otherwise, all people are extremely vulnerable to hindsight bias. This has stood out most to me in the hiring processes we have. It’s much easier to evaluate whether someone is good at interviewing by reading their comments at the time of the interview. Whereas, in retrospect, we always think we knew someone was a superstar or had a red flag.</p><p>But it also stands out when you read some of the memos that Tomas (about the direction of the tech team/platform) or I wrote (about the company). And while we have both been wrong often, we are MUCH MUCH better off than people who have never written anything. In cases where people don’t document their thinking (for example, people who don’t submit details on Lever re: hiring), they almost always remember making decisions for reasons that look correct now in hindsight.</p><p>As a general rule of thumb, if you want me to look back on your decision-making (for the positive) it requires a written-out argument. This was also the purpose of the “decisions” Notion page that we have created.</p><p><strong>3. To disseminate information broadly, more quickly</strong></p><p>For a lot of information, you aren’t looking to get feedback, but rather you are just trying to share it broadly. 
In that case, reading information is by far the most convenient (it is fast and can be done asynchronously).</p><p>Writing also tends to have the lowest amount of miscommunication (fewer memory issues or mishearing), though it still suffers from the fact that the recipient cannot really know if they are misunderstanding.</p><p>The issue is that writing often takes more time than sharing that info through alternative methods if the audience ends up being small (&lt;6 people).</p><p><strong>4. To inform future PQ employees</strong></p><p>This is a corollary of <strong>(3)</strong>: as new employees join, there is more and more institutional knowledge and context. You can never supplant mentorship from your colleagues, but good documentation is a company superpower. It helps get newer employees up to speed much faster. It also helps them avoid mistakes we’ve already made. This is ranked here not because of its importance, but because realistically good writing alone is not sufficient. As information in a company grows, even well-documented and organized information becomes overwhelming. So you end up needing colleagues to curate that information for you anyway.</p><p><strong>5. To help get feedback and helpful outside information more efficiently</strong></p><blockquote><em>This, I think, was the biggest myth that comes from the pro-writing world.</em></blockquote><p>The reality is that for most important decisions you want one person making that decision. And once that person is given that authority they can certainly benefit from alternative points of view, but the value diminishes. The main issue is that no one will be as familiar with, or care as much about, the context of the project as you, the decision-maker. So their advice is almost inherently going to be less relevant. The longer you’ve been at ProteinQure, the more true this will become. 
What is somewhat relevant is laying out your assumptions for your decision and having others help validate or reject those assumptions (often by providing facts/sources you simply weren’t aware of). You can also get improvements in grammar, spelling, clarity, etc., but those are usually lower value.</p><p><strong>TL;DR</strong></p><p>Write when you have to make an important decision. Use it to help focus your own decision-making more than anything else.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=3bdf55d38b52" width="1" height="1" alt=""><hr><p><a href="https://medium.com/proteinqure/why-you-should-write-weekly-thoughts-1-5min-3bdf55d38b52">Why you should write [Weekly Thoughts #1] — 5min</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Company consistency vs individual efficiency for company processes [Weekly Thoughts #3] — 3 min]]></title>
            <link>https://medium.com/proteinqure/company-consistency-vs-individual-efficiency-for-company-processes-weekly-thoughts-3-3-min-cd7924e5dc10?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/cd7924e5dc10</guid>
            <category><![CDATA[proteinqure]]></category>
            <category><![CDATA[startup-lessons]]></category>
            <category><![CDATA[startup]]></category>
            <dc:creator><![CDATA[Lucas Siow]]></dc:creator>
            <pubDate>Mon, 11 Apr 2022 16:15:54 GMT</pubDate>
            <atom:updated>2022-04-11T16:32:00.688Z</atom:updated>
            <content:encoded><![CDATA[<h3>Company consistency vs individual efficiency for company processes [Weekly Thoughts #3] — 3 min</h3><p><a href="https://medium.com/proteinqure/a-new-series-of-writing-on-proteinqure-weekly-thoughts-introduction-aacb742558db">Find the introduction to this series</a></p><p>Originally written: Jul 17, 2020</p><p>Editing: For clarity (and making the title more relevant). Thanks to Sean for help here. Changed name to employee ABC.</p><p>Lucas’ Updated Notes: This is a problem that seems to only increase as we grow. I mention multiple project management tools and systems that are no longer used at ProteinQure. These include ClickUp, Monday.com, and agile.</p><p>As of Q1 2022, we tried to center all teams (Tech, ML, Comp Bio, Wetlab, and Business Operations) on GitLab for project management. For example, there is a BusOps GitLab board for issue tracking which is integrated with Slack. I now believe this could easily work for orgs of up to 40 people at least. The hardest part is getting non-software hires acclimated to the toolset.</p><p>Individual efficiency could possibly be better understood as individual productivity/quality.</p><h3>Company consistency vs individual efficiency for company processes</h3><p>This week’s project management meetings highlighted a common conflict (or tension) that any company has to resolve over and over again. And we should avoid the mistaken assumption that it will EVER be solved. I don’t have any personal solutions or great insights at the moment; it&#39;s just something on my mind these days.</p><p>The tension is over whether you can build systems that are appropriate across individuals, projects, and teams simultaneously.</p><blockquote>Making <em>an individual</em> (or small team) maximally productive requires adjusting to their particular working styles and preferences. But making <em>a company</em> maximally productive means ensuring people are aligned to the goals and consistent across areas. 
So you sacrifice some amount of individual-level productivity to reduce the overhead at a company level.</blockquote><p>And those choices change with company stage and maturity. So in a fast-growing company, you have to keep adjusting.</p><p>The business operations team is probably a good microcosm of this:</p><p><strong>Jan 2018 — March 2019: </strong>I was the only member of the team (with support from our part-time accountant). Any company-wide systems that I participated in (Sprints, <a href="http://www.clickup.com">ClickUp</a>, <a href="http://www.monday.com">Monday.com</a>, etc.) didn’t help me do my job. But it was also relatively costless for me to participate. Poor documentation on my part (for example tracking partnerships) didn’t matter either as long as it didn’t hurt performance, because no one else actually depended on the info.</p><p><strong>March — Jul 2019:</strong> When we raised the seed round, it was clear the concept of sprint tasks and the Monday.com + agile system wasn’t great for the act of raising a round. Using those tools would just incur overhead. And at the time it felt really costly for me (I worked every day except 2 out of a period of 8 weeks straight). There are also real costs to trying to update everyone all the time on fundraising (it creates distractions).</p><p><strong>Jul 2019 — Dec 2019: </strong>EMPLOYEE ABC joins and the company grows from 6 to 11 people. ABC’s tasks and mine often overlap, so we need to document and co-ordinate more. Sprint tasks are not great for us (because we often work on very many small tasks and alignment via meetings). But we can put our most important things into Monday.com and it is easy for people to get some insight into what we are working on. We chose to participate in the company systems, despite them not being valuable for our team specifically.</p><p><strong>Jan 2020 — Jul 2020: </strong>ProteinQure is about 14 people for most of this period. 
The operations team is the same size and is using a private board on Monday.com (as well as Notion) to help coordinate tasks on a more granular level (e.g. Pay X invoice, add Y person to insurance). Many of our tasks are internal meetings (and meetings more generally). It is hard to think of those tasks (and the same is true of helping with project management) as individual-specific tickets that can be tracked. It&#39;s also becoming less useful for everyone across the company to hear the nitty-gritty of what we work on (SRED credits, COVID grants, etc.).</p><p><strong>August 2020 +: </strong>We are using GitLab along with the rest of the team (we’ve gotten rid of Monday.com). Though, for example, the “Assignment” rules don’t necessarily work for us. We tend to assign all tasks between us and then constantly change what we prioritize. Documentation is kept mostly on Notion, but we don’t tend to need context on each other&#39;s work (just clear distinctions on who is handling what). I would think this is likely to make sense until we have the next round of financing (and a significantly larger business ops organization). <strong><em>We are leaning into consistency with the rest of the company, the biggest benefit of which is not the operations team’s consistency, but the fact that all technical teams are also bought in.</em></strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=cd7924e5dc10" width="1" height="1" alt=""><hr><p><a href="https://medium.com/proteinqure/company-consistency-vs-individual-efficiency-for-company-processes-weekly-thoughts-3-3-min-cd7924e5dc10">Company consistency vs individual efficiency for company processes [Weekly Thoughts #3] — 3 min</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Value of Startup Accelerators (Why we attend) [Weekly Thoughts #2] — 2min]]></title>
            <link>https://medium.com/proteinqure/value-of-startup-accelerators-why-we-attend-weekly-thoughts-2-2min-326ae1a63284?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/326ae1a63284</guid>
            <category><![CDATA[proteinqure]]></category>
            <category><![CDATA[accelerator]]></category>
            <category><![CDATA[startup-lessons]]></category>
            <category><![CDATA[startup]]></category>
            <dc:creator><![CDATA[Lucas Siow]]></dc:creator>
            <pubDate>Mon, 11 Apr 2022 16:13:28 GMT</pubDate>
            <atom:updated>2022-04-11T16:31:33.662Z</atom:updated>
            <content:encoded><![CDATA[<h3>Value of Startup Accelerators (Why we attend) [Weekly Thoughts #2] — 2min</h3><p><a href="https://medium.com/proteinqure/a-new-series-of-writing-on-proteinqure-weekly-thoughts-introduction-aacb742558db">Find the introduction to this series</a></p><p>Originally written: Sep 4, 2020</p><p>Editing: None</p><p>Lucas’ Updated Notes: We in fact did attend all three accelerators here.</p><h3>Value of Startup Accelerators (Why we attend)</h3><p>I am currently applying to three accelerators:</p><ul><li>Google (which we accepted)</li><li>ScaleUp by the Lazaridis Institute in Waterloo</li><li>Biology at Endless Frontiers (NYU)</li></ul><p>How valuable are accelerators and incubators? The truth is no one really knows. Contrary to what you might see in the news or the startup ecosystem, I expect all of these programs to be of weakly positive value. With the exception of maybe Y Combinator, most accelerators are of unknown value. The costs are almost exclusively my (and other attendees&#39;) time (sometimes some travel costs as well). And the benefits vary but can include:</p><h3>Valuable Advice</h3><blockquote>If 10% of advice is valuable, that&#39;s a good incubator.</blockquote><p>Most of it will be from people without context or knowledge of our stage/sector. I don’t think we have ever gotten useful technical advice from one of these programs, though I hope that changes with Google. We have gotten some useful strategic and industry advice.</p><h3>Network and connections</h3><p>Various programs have different types and structures for their mentors. <em>Getting one good mentor out of a program can make the entire thing worth it.</em> For example, in the San Francisco Canadian Technology Accelerator we had one mentor who had done BD in pharma and helped us with our discussions with one of our largest partners.</p><h3>Marketing and FOMO</h3><p>Particularly in the case of investors, these programs can help create a bit of an auction for fundraising. 
You get access to numerous investors simultaneously and they get to see the progress you make (hence hopefully inducing FOMO). It is efficient because of how concentrated the audience is.</p><p>As we get closer to needing to fundraise, I will be doing more activities like these: high risk (because you can get zero return), high reward (because you can get one lead investor or customer).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=326ae1a63284" width="1" height="1" alt=""><hr><p><a href="https://medium.com/proteinqure/value-of-startup-accelerators-why-we-attend-weekly-thoughts-2-2min-326ae1a63284">Value of Startup Accelerators (Why we attend) [Weekly Thoughts #2] — 2min</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A new series of writing on ProteinQure [Weekly Thoughts Introduction]]]></title>
            <link>https://medium.com/proteinqure/a-new-series-of-writing-on-proteinqure-weekly-thoughts-introduction-aacb742558db?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/aacb742558db</guid>
            <category><![CDATA[startup]]></category>
            <category><![CDATA[startup-lessons]]></category>
            <category><![CDATA[proteinqure]]></category>
            <category><![CDATA[startup-life]]></category>
            <dc:creator><![CDATA[Lucas Siow]]></dc:creator>
            <pubDate>Mon, 11 Apr 2022 16:10:27 GMT</pubDate>
            <atom:updated>2022-04-11T16:23:27.887Z</atom:updated>
            <content:encoded><![CDATA[<p>We’ve decided to create a series of blog posts sharing a subset of my weekly thoughts publicly.</p><p>If you don’t want the context, you can skip to the links to some content at the bottom.</p><h3>The goals of making this writing public are</h3><p>a) To help other startup companies, especially in the techbio space, with our musings and evolution on a variety of topics related to drug discovery, company building, and processes for startups.</p><p>b) To share more about our culture for the benefit of potential employees at ProteinQure.</p><p>c) To potentially improve our own thinking via feedback or commentary from more experienced folks who read and comment on the ideas.</p><h3>What are my weekly thoughts?</h3><p>Since mid-2019 I have written a weekly summary email about the company. In addition to basic information (important upcoming events, OKR updates, readme information, etc.), I write a section on what I am thinking about. The section is pretty free-form and is geared towards internal consumption, but isn’t necessarily meant to be relevant to the whole team.</p><p>I am pretty good about being consistent with this. Given 52 weeks, we usually do about 50 updates (often skipping Christmas and New Year). We have guest writers 2–4 times a year (often when I am on vacation), but overall I have probably written 45+ times a year for the last three years. Ten to twenty percent of the thoughts might be shorter (1–2 paragraphs) or repeated information. A few times a year they will be a long (5–6 pages) memo.</p><p>I hope to put up 10–15 over the course of 2022. Redacted as appropriate.</p><h3>What are they not?</h3><p><em>They are not meant to be completely original thoughts. </em>They are my thinking on a topic. This means they could simply be a synthesis of books I’ve read or advice from founders and others I’ve spoken to. 
I try to attribute where I can remember, but I may not.</p><p><em>They are not meant to be well-cited or extremely detailed.</em> Over time I’ve learned that shorter thoughts lead to much more engagement from employees. Too much detailed reasoning is actually counterproductive. Citations usually cost me a lot of time and don’t add anything for most readers internally.</p><p><em>They are not in chronological order.</em> Partly because we shifted away from Protonmail in early 2020 and didn’t document those ones as well. But also because I am trying to curate the ones I think are useful and have some flow (thanks to many people at PQ for helping with that).</p><h3>They are lightly edited</h3><p>I’ve decided to try and preserve the authenticity of the writing by mostly posting text very close to the original. I include the date of posting in each post and any updated reflections I have on the work. In many cases, I would love to expand on some of the topics in more detail, but I am trying to keep these a balance of short, useful, and authentic.</p><h3>The first three thoughts (click links)</h3><p><a href="https://medium.com/@lucas.siow/why-you-should-write-weekly-thoughts-1-5min-3bdf55d38b52">Why you should write more [Weekly Thoughts #1]</a></p><p><a href="https://medium.com/@lucas.siow/value-of-startup-accelerators-why-we-attend-weekly-thoughts-2-2min-326ae1a63284">The value of startup accelerators [Weekly Thoughts #2]</a></p><p><a href="https://medium.com/@lucas.siow/company-consistency-vs-individual-efficiency-for-company-processes-weekly-thoughts-3-3-min-cd7924e5dc10">Company consistency vs individual efficiency [Weekly Thoughts #3]</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=aacb742558db" width="1" height="1" alt=""><hr><p><a href="https://medium.com/proteinqure/a-new-series-of-writing-on-proteinqure-weekly-thoughts-introduction-aacb742558db">A new series of writing on ProteinQure [Weekly Thoughts 
Introduction]</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AlphaFold Quickstart on AWS]]></title>
            <link>https://medium.com/proteinqure/alphafold-quickstart-on-aws-9ba20692c98e?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/9ba20692c98e</guid>
            <category><![CDATA[deep-neural-networks]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[alphafold]]></category>
            <category><![CDATA[protein-folding]]></category>
            <category><![CDATA[drug-design]]></category>
            <dc:creator><![CDATA[Eugene Brodsky]]></dc:creator>
            <pubDate>Thu, 22 Jul 2021 22:55:18 GMT</pubDate>
            <atom:updated>2021-07-23T00:37:50.892Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Xzyws5XvdahABw8RLz6N5w.png" /></figure><p>AlphaFold 2.0 is widely regarded as a breakthrough milestone in predicting 3D structures of proteins using a Deep Neural Network approach. Naturally, when the AlphaFold <a href="https://www.nature.com/articles/s41586-021-03819-2">paper</a> was published and its <a href="https://github.com/deepmind/alphafold">source code</a> made publicly available earlier this month, we curious folks at <a href="https://proteinqure.com">ProteinQure</a> could not resist the temptation to take it for a spin ourselves. This blog post details our process for getting it running on a GPU-capable instance using the AWS Deep Learning AMI, and documents some workarounds for potentially unexpected pitfalls.</p><p>For the purposes of this post, we’ll assume you already have an AWS account and are somewhat comfortable with its use and related terminology. It is also worth mentioning that this experimentation isn’t free (but not insurmountably expensive either), and we’ll mention some cost saving techniques along the way. With this post we are aiming to provide a high-level overview of the process, but would be happy to reply in the comments with additional technical clarifications.</p><p>With that out of the way, let’s get started!</p><p><em>TL;DR: to just see how we fixed the GPU not being recognized, skip to step 6.</em></p><h3>1. Pick your (virtual) hardware</h3><p>You will presumably want to run the actual workload on a GPU-capable instance, but you will quickly see that a large chunk of time will be spent downloading the datasets required by AlphaFold. There’s no reason to run an expensive GPU instance for this menial task. So, we will use two instances in this experiment:</p><ul><li>A <strong>t3a.large</strong> instance as our downloading workhorse. It offers 2 vCPUs and 8GB of RAM at $0.0752/hr. 
The download will take <em>many</em> hours, so we can expect this to cost somewhere between $3 and $6. The reason we want to pick a large instance type is the archive extraction step that follows the download. If we pick a smaller instance, we may exhaust our CPU credits and the extraction will slow down to a crawl.</li><li>A <strong>p3.2xlarge </strong>instance as our AlphaFold runner. It offers one NVIDIA Tesla V100 GPU, which is perfectly capable for our purpose. AlphaFold supports multiple GPUs, but currently has some <a href="https://github.com/deepmind/alphafold/issues/30">issues</a> using them reliably.</li></ul><p>💡 To be clear, if you have deep pockets or just don’t want to bother, it’s totally fine to do the downloading on the GPU instance and save yourself the hassle. In that case, adjust accordingly while following the walk-through.</p><h3>2. Create a persistent volume to hold your data</h3><p>AlphaFold databases, once downloaded and uncompressed, will require just over 2.5TB of disk space. We certainly wouldn’t want this to be accidentally lost if we terminate our instances. For this, we’ll navigate to EBS and create a dedicated data volume with the following parameters:</p><ul><li>Volume type: <strong>gp3</strong>. It has a good balance of cost and performance, and you can increase its IOPS and throughput as necessary.</li><li>Capacity: <strong>3TB</strong>. You don’t want to run out of space!</li><li>IOPS/Throughput: This is a tricky one. One of the steps during AlphaFold inference will be bottlenecked by I/O, but you also don’t want to overpay for this. We recommend leaving IOPS at the default value of <em>3,000, </em>but bumping the throughput to at least <em>300MB/s</em> from the default <em>125</em>. 
But we haven’t experimentally A/B tested this, and this is <em>not likely</em> to be a major performance bottleneck.</li></ul><p>Make sure to give your volume a name so you can find it later in the EBS volume list.</p><p>Make <strong>especially </strong>sure that you stick to only a single Availability Zone (e.g. us-east-2b) when you create the volume AND your instances, as you cannot attach EBS volumes to instances in different AZs! Check that your preferred GPU instance type is available in your selected AZ.</p><h3>3. Set up the downloader instance</h3><p>Let’s run a new <strong>t3a.large </strong>instance using an Ubuntu AMI (any version).</p><ol><li>Attach the data volume you created in the previous step, for example at /dev/sdc, which will make it visible in the OS at /dev/xvdc.</li><li>Log in to the instance and verify this with lsblk — you should see /dev/xvdc as the last block device on the list.</li><li>Create a partition, format it with xfs, and mount it under /data:</li></ol><pre>$ sudo parted -s /dev/xvdc mklabel gpt mkpart primary xfs 0% 100%<br>$ sudo mkfs.xfs /dev/xvdc1<br>$ sudo mkdir /data<br>$ sudo chmod 777 /data<br>$ sudo mount /dev/xvdc1 /data<br>$ df -h /data <br>Filesystem      Size  Used Avail Use% Mounted on<br>/dev/xvdc1      3T    0B   3T    0%   /data</pre><p>4. Install the prerequisites:</p><pre>$ sudo apt update &amp;&amp; sudo apt install -y git aria2 rsync tmux</pre><h3>4. Pull down code and databases</h3><p>This is somewhat redundant to the <a href="https://github.com/deepmind/alphafold">official documentation</a>, but worth mentioning here. For clarity, we will <strong>not </strong>be building any Docker images or testing GPU capabilities (we have no GPU) just yet. We’re only here to download the databases.</p><p>💡 <em>A couple of things worth noting here.</em><strong><em> First</em></strong><em>, notice how we installed </em><em>tmux in the previous step. Using </em><em>tmux is a great way to not lose your console session if your SSH connection drops. 
You simply run </em><em>tmux as the first command when you </em><em>ssh into the instance. A </em><em>tmux tutorial is definitely out of scope for this post, but now you have some terms to fuel your search, if you’re not familiar with it already! </em><strong><em>Second</em></strong><em>, it is possible to speed up your downloads by: 1) running them in concurrent </em><em>tmux windows (or sessions), and 2) modifying the individual download scripts (under </em><em>scripts/) by adding </em><em>-s8 -x8 options to the </em><em>aria2c commands. However, </em><strong><em>note: </em></strong><em>this puts additional load on the servers hosting the data, some of them belonging to academic institutions. So, having been armed with this knowledge, please use this in moderation and consider being a good citizen of the internet!</em></p><p>With that out of the way, let’s download:</p><pre>$ cd /data<br>$ git clone <a href="https://github.com/deepmind/alphafold.git">https://github.com/deepmind/alphafold.git</a><br>$ mkdir af_databases<br>$ alphafold/scripts/download_all_data.sh /data/af_databases<br>$ ### (OR use the individual download scripts)</pre><p>This will take a looooong time. While that’s happening, we can set up our AlphaFold GPU runner instance.</p><h3>5. Spin up the runner</h3><p>Let’s create a <strong>p3.2xlarge </strong>instance in the AWS console. Use the AWS Deep Learning AMI for this purpose (Ubuntu 18.04). This is actually one of the easiest steps in this entire tutorial: you don’t need to do much about this machine. Once it has started and you’ve logged in to it, install a couple of missing prerequisites (the others are already present):</p><pre>$ sudo apt update &amp;&amp; sudo apt install -y aria2 rsync tmux</pre><p>You can skip the <em>First Time Setup </em>section of the official documentation: the NVIDIA container toolkit and Docker are pre-installed 🙌. 
Let’s check that the GPUs are usable by Dockerized workloads (as per official docs):</p><pre>$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi</pre><p>The databases are still being downloaded on the other instance, so we will only run a <em>modified</em> subset of commands from the <a href="https://github.com/deepmind/alphafold#running-alphafold">Running AlphaFold section of the official docs</a> to build the Docker image and install prerequisites:</p><pre>$ cd<br>$ git clone <a href="https://github.com/deepmind/alphafold.git">https://github.com/deepmind/alphafold.git</a><br>$ cd alphafold<br>$ docker build -f docker/Dockerfile -t alphafold .<br>$ pip3 install -U -r docker/requirements.txt<br>$ cd<br>$ rm -rf alphafold</pre><p>Yes, we removed the cloned directory in the last step. Why is that? Because we already have it saved on the data disk (still in use by the download), we’ll be working from there, and we don’t like getting confused!</p><p>Now all we can do is wait for the downloads to complete. It’s advisable to stop (NOT terminate!) the GPU instance at this time, and go for a long walk.</p><h4>once the downloads are done…</h4><p>…(and finished unarchiving, which will take a non-trivial amount of time as well; aim for ~3 hours!), we have a very precious 3TB EBS volume on our hands. We could optionally make a snapshot of it for safekeeping, depending on our future use cases.</p><p>Let’s stop our downloader instance and detach the data volume ( /dev/sdc if you were following along). We can now attach it to our GPU instance, also at /dev/sdc for consistency.</p><p>Start the GPU instance and SSH into it. Mount the data volume:</p><pre>$ sudo mkdir /data<br>$ sudo chmod 777 /data<br>$ sudo mount /dev/xvdc1 /data<br>$ cd /data</pre><p>You should see the contents of the volume just as they were on the downloader instance.</p><h3>6. Fix the issue with GPUs not being recognized</h3><p>Now we’ve finally gotten to the meat of the issue. 
This is what prompted the writing of this blog post in the first place.</p><p>If you were to run the AlphaFold code right now as per the docs, you might (or might not!) notice that the GPUs are not actually being used. This is evidenced by the following log gnarliness:</p><pre>I0716 20:25:31.123608 139790799320896 run_docker.py:180] 2021-07-16 20:25:31.123045: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library &#39;libcusolver.so.11&#39;; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64<br>I0716 20:25:31.124151 139790799320896 run_docker.py:180] 2021-07-16 20:25:31.123767: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library &#39;libcudnn.so.8&#39;; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64<br>I0716 20:25:31.124289 139790799320896 run_docker.py:180] 2021-07-16 20:25:31.123807: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.<br>I0716 20:25:31.124416 139790799320896 run_docker.py:180] Skipping registering GPU devices...</pre><p>But this is a GPU-enabled instance, and the tests ran fine, you might say? Well, it seems that the Docker image we’ve built is <a href="https://github.com/deepmind/alphafold/issues/26">missing CUDA libs</a> (though a pull request is open that should hopefully fix it).</p><p>Thankfully, the AWS Deep Learning AMI we’re using comes pre-installed with CUDA 11.1 and CuDNN libraries that are required by TensorFlow. 
They are located, as expected, under /usr/local/cuda-11.1/. All we need to do is get them into the container, and modify the LD_LIBRARY_PATH variable such that TensorFlow can actually find these libraries. This can be easily tested by running a Docker container, like so:</p><pre>$ docker run --rm -it --gpus=all --entrypoint bash -v /usr/local/cuda-11.1/:/usr/local/cuda-11.1 -e LD_LIBRARY_PATH=/usr/local/cuda:/usr/local/cuda-11.1/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 alphafold<br># python<br>Python 3.8.5 (default, Sep  4 2020, 07:30:14)<br>[GCC 7.3.0] :: Anaconda, Inc. on linux<br>Type &quot;help&quot;, &quot;copyright&quot;, &quot;credits&quot; or &quot;license&quot; for more information.<br>&gt;&gt;&gt; import tensorflow as tf<br>2021-07-16 21:05:17.623154: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0<br>&gt;&gt;&gt; tf.config.list_physical_devices(&#39;GPU&#39;)<br>2021-07-16 21:05:21.849962: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1<br>...<br>tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:<br>pciBusID: 0000:00:1b.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0<br>coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB<br>...<br>2021-07-16 21:05:22.037939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 3 with properties:<br>pciBusID: 0000:00:1e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0<br>coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s<br>2021-07-16 21:05:22.037987: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0<br>2021-07-16 21:05:22.041155: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11<br>...<br>...<br>2021-07-16 21:29:20.397608: I 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1, 2, 3<br>[PhysicalDevice(name=&#39;/physical_device:GPU:0&#39;, device_type=&#39;GPU&#39;), PhysicalDevice(name=&#39;/physical_device:GPU:1&#39;, device_type=&#39;GPU&#39;), PhysicalDevice(name=&#39;/physical_device:GPU:2&#39;, device_type=&#39;GPU&#39;), PhysicalDevice(name=&#39;/physical_device:GPU:3&#39;, device_type=&#39;GPU&#39;)]</pre><p>Success! We’ve proven that there is a solution. But we’re not done yet. We now need to modify the docker/run_docker.py script to bake this fix into AlphaFold itself.</p><p>To accomplish this, we modify the docker/run_docker.py script as described in this pull request: <a href="https://github.com/ProteinQure/alphafold/pull/1/files">https://github.com/ProteinQure/alphafold/pull/1/files</a>. The change would be too tedious to post here in its entirety, but we are simply doing two things:</p><ol><li>Mounting the CUDA libraries from the host;</li><li>Setting the LD_LIBRARY_PATH environment variable (admittedly in a very brutal and clobbering way) to include the paths to these libraries.</li></ol><p><strong>And that’s all there is to it!</strong> You’re ready to Fold!</p><p>Please carefully read the official documentation and make sure you <strong>do</strong> set the database dir as well as modify the output paths in your docker/run_docker.py script. Ensuring these directories exist and are writable is left as an exercise to the reader. We highly recommend all this data be kept on your 3TB data drive. In the original script, the outputs are going to /tmp/..., so pay attention.</p><h4>Finally…</h4><p>Don’t forget that you’re paying for all of these resources. Watch your bill carefully. If you’re feeling adventurous, you can also try to use AWS Spot instances to reduce your spend, but this is an advanced topic requiring a non-trivial amount of automation. If you’re going down this route, we recommend running your workloads on Kubernetes. 
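Stepping back to the run_docker.py change for a moment: the two modifications (mounting the host CUDA libraries and clobbering LD_LIBRARY_PATH) can be sketched roughly as follows. This is an illustrative sketch only; the helper name patch_docker_config and its arguments are hypothetical, and the real diff is in the ProteinQure pull request linked above.

```python
# Sketch of the two-part fix described in the post (hypothetical helper):
# 1) mount the host's CUDA 11.1 libraries into the container, and
# 2) set LD_LIBRARY_PATH so TensorFlow can dlopen libcusolver/libcudnn.
HOST_CUDA = "/usr/local/cuda-11.1"  # pre-installed by the AWS Deep Learning AMI


def patch_docker_config(mounts, env):
    """Return updated (mounts, env) for launching the AlphaFold container."""
    # Mount the CUDA libraries from the host (host_path:container_path).
    mounts = list(mounts) + [f"{HOST_CUDA}:{HOST_CUDA}"]
    # Clobber LD_LIBRARY_PATH with the mounted libs plus the NVIDIA defaults,
    # matching the value used in the manual `docker run` test above.
    env = dict(env)
    env["LD_LIBRARY_PATH"] = ":".join([
        "/usr/local/cuda",
        f"{HOST_CUDA}/lib64",
        "/usr/local/nvidia/lib",
        "/usr/local/nvidia/lib64",
    ])
    return mounts, env
```

In the actual run_docker.py the mounts are docker.types.Mount objects passed to the Docker SDK rather than plain strings; the sketch only shows the shape of the change.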
We’d be happy to share how we do this at <a href="https://medium.com/u/ae69b14ee588">ProteinQure!</a></p><h3>Conclusion</h3><p>In this post, we walked through the process of setting up a functional AlphaFold environment on AWS, including some cost-management techniques, and proposed a fix for an issue with missing CUDA libraries that seems to be a blocker for many people trying to run AlphaFold in their environments.</p><hr><p><a href="https://medium.com/proteinqure/alphafold-quickstart-on-aws-9ba20692c98e">AlphaFold Quickstart on AWS</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Auto-fixing annoyances with Black and Pylint]]></title>
            <link>https://medium.com/proteinqure/auto-fixing-annoyances-with-black-and-pylint-56564811c031?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/56564811c031</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Sean Aubin]]></dc:creator>
            <pubDate>Fri, 18 Jun 2021 15:20:09 GMT</pubDate>
            <atom:updated>2021-06-21T14:22:40.349Z</atom:updated>
            <content:encoded><![CDATA[<p>In the <a href="https://medium.com/proteinqure/better-code-with-less-typing-7893fa171e17">previous post</a>, we discussed using Visual Studio Code to enhance Python editing with various features, such as syntax highlighting and keyboard shortcuts. This post introduces the tools <a href="https://www.pylint.org/">Pylint</a> and <a href="https://black.readthedocs.io/en/stable/">Black</a> for automatically resolving Python code-style problems.</p><p>If you’re new to Python, these two pieces of code might seem a bit contrived, but otherwise functional:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c942036be25a3ce988d7e4cfd3056baf/href">https://medium.com/media/c942036be25a3ce988d7e4cfd3056baf/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ac5912b808d2de525211bf60a74b8744/href">https://medium.com/media/ac5912b808d2de525211bf60a74b8744/href</a></iframe><p>However, an experienced Python developer will have a hard time reading them and will find one bug. Luckily, there are tools to give you this same insight!</p><h3>Auto-formatting with Black</h3><p>Every <a href="https://en.wiktionary.org/wiki/Pythonista">Pythonista</a>, left to their own devices, will come up with their own conventions and implement them with copious manual insertion of whitespace. This is tedious for an individual, but disastrous when collaborating with others, as formatting preferences clash. Consistent formatting is essential for enabling readers/reviewers to focus on logic, instead of being distracted by style.</p><p>Fortunately, most programming languages have an auto-formatter which inserts the appropriate whitespace in your code. For Python, this is Black.</p><p>Black takes awkwardly formatted (but still runnable!) 
code:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c942036be25a3ce988d7e4cfd3056baf/href">https://medium.com/media/c942036be25a3ce988d7e4cfd3056baf/href</a></iframe><p>And gives you code with sensible whitespace:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c068445167797ee5465ce7e7d2d382e3/href">https://medium.com/media/c068445167797ee5465ce7e7d2d382e3/href</a></iframe><p>Black also automatically breaks up long or nested lines of code into multiple lines.</p><p>You may find Black clashes with the conventions you’ve developed over the years. You will have to get over it. Black is purposely not configurable. If arguing about code style is the most futile conversation one can have, discussing an auto-formatter’s configuration is a close second. Instead, I recommend spending time developing opinions on other matters, such as music or <a href="https://www.cbc.ca/radio/asithappens/as-it-happens-tuesday-edition-1.3991263/canadian-inventor-of-hawaiian-pizza-defends-pineapple-after-iceland-s-president-disses-fruit-topping-1.3992890">pizza toppings</a>.</p><h3>Integrating Black with Visual Studio Code</h3><p>Visual Studio Code’s settings can be found under File -&gt; Preferences -&gt; Settings. In the top-right corner, there should be an icon to Open Settings (JSON). 
You should see something that looks like a Python dictionary (it is actually JSON):</p><pre>{<br>    &quot;editor.formatOnSave&quot;: true,<br>    &quot;editor.renderWhitespace&quot;: &quot;all&quot;,<br>    &quot;files.autoSave&quot;: &quot;afterDelay&quot;,<br>    &quot;python.analysis.typeCheckingMode&quot;: &quot;basic&quot;,<br>    &quot;python.formatting.provider&quot;: &quot;black&quot;,<br>    &quot;python.formatting.blackArgs&quot;: [<br>        &quot;--line-length=120&quot;<br>    ],<br>}</pre><p>Although I do personally recommend all of these settings, the most important one is “python.formatting.provider”: “black”, which will allow you to run Black with the keyboard shortcut CTRL+SHIFT+I.</p><h3>Auto-check with Pylint</h3><p>Python is a big, old language. There are many ways to do things. Some of them are very wrong. Finding out which ones are wrong and why usually involves someone more experienced than you reading over your code.</p><p>However, many common mistakes can be detected automatically with Pylint.</p><p>For example, if you want to pass a list as a default argument to a function, you may write something like this:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ac5912b808d2de525211bf60a74b8744/href">https://medium.com/media/ac5912b808d2de525211bf60a74b8744/href</a></iframe><p>This is <a href="https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments">a common mistake</a> in Python. If you run Pylint from the command line:</p><pre>pylint bad_code.py</pre><p>It will warn you:</p><pre>Dangerous default value [] as argument (dangerous-default-value)</pre><p>This allows you to implement the following fix:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1aad06a90057b72e9eaa5dc089908033/href">https://medium.com/media/1aad06a90057b72e9eaa5dc089908033/href</a></iframe><p>Pylint can check for many such code problems. 
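</p><p>Spelled out, the gotcha and its fix look something like this (a minimal sketch; the function names are illustrative, not from the gists above):</p>

```python
# The pitfall: a mutable default is created once, when the function is
# defined, and is then shared across every call that omits the argument.
def append_bad(item, items=[]):
    items.append(item)
    return items

# The fix Pylint's warning nudges you toward: default to None and build
# a fresh list inside the function body on each call.
def append_good(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2] -- the "empty" default remembered the first call!
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```

<p>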
Some of them are more annoying than helpful, but Pylint lets you ignore many of them.</p><p><a href="https://gist.github.com/Seanny123/ae47c0184da93101640c87ead510590a">Here’s an extremely conservative list of ones I ignore</a>, based on my years of working with Python, pastable into your pyproject.toml at the root of your project. Alternatively, you can slowly add to your own list every time you get annoyed.</p><h3>Integrating Pylint with Visual Studio Code</h3><p>Linting is turned on by default in Visual Studio Code. Additionally, if you have a pyproject.toml file in the root directory of the project you&#39;re working on, it will be detected and used by Pylint automatically, so there&#39;s no configuration required.</p><p>This post continued the theme of using tools to make good coding habits painless and automatic. You can learn consistent formatting and avoid Python’s pitfalls with hard-earned experience, but using Pylint and Black is so much easier! In addition to reducing individual suffering, these tools allow collaborators of various skill levels to reach a baseline of quality. The next post in the series will cover automatically detecting accidental code breakage with Pytest.</p><hr><p><a href="https://medium.com/proteinqure/auto-fixing-annoyances-with-black-and-pylint-56564811c031">Auto-fixing annoyances with Black and Pylint</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Better Code with Less Typing]]></title>
            <link>https://medium.com/proteinqure/better-code-with-less-typing-7893fa171e17?source=rss----c68ae32210c1---4</link>
            <guid isPermaLink="false">https://medium.com/p/7893fa171e17</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Sean Aubin]]></dc:creator>
            <pubDate>Tue, 26 Jan 2021 20:02:59 GMT</pubDate>
            <atom:updated>2021-05-20T15:32:44.433Z</atom:updated>
            <content:encoded><![CDATA[<h4>Choosing a good code editor for science</h4><p>If an artist is only as good as their tools, then I spent years finger-painting, followed by years getting hand cramps from paint-brushes I couldn’t figure out how to hold correctly.</p><p>When I was in high school, I used to write HTML in Notepad. Then my friend showed me Notepad++ and I was astonished by how much easier it made so many tasks.</p><p>When I started writing in Python, I kept using Notepad++, which really isn’t suited for extensive code editing. Then I spent several years trying and failing to configure and master Vim and Emacs.</p><p>Years later, I discovered there was a middle ground between using rudimentary tools and spending days configuring esoteric editors.</p><h3>Visual Studio Code or PyCharm</h3><p>Both Visual Studio Code and PyCharm offer first-class Python support. They have features like:</p><ul><li><strong>Syntax Highlighting</strong> Making code easier to read with colours differentiating constants, variables, and so on.</li><li><strong>Code Completion</strong> Like your phone’s auto-complete, but for code.</li><li><strong>Tool Tips</strong> Documentation and hint pop-ups when hovering your cursor over a piece of code.</li></ul><p>All enabled without any configuration!</p><p>They’re both good, but I have a slight preference towards Visual Studio Code, because it’s faster to start up. So the links and keyboard shortcuts in the following discussion will be Visual Studio Code-specific.</p><h3>Everyday commands you should know</h3><p>In each editor, there are a few commands which are absolutely essential to leverage:</p><ul><li><strong>Go to definition</strong> Find where a function was defined or where a variable was declared with F12. Faster and more accurate than scrolling or searching the full text with Ctrl+F!</li><li><strong>Find all references</strong> Find where a function is used in your project using Shift+Alt+F12. 
Especially useful when you want to change a function&#39;s parameters without breaking your code in several places.</li><li><strong>Rename symbol</strong> Rename a variable using F2, instead of using find-and-replace and accidentally changing something unrelated!</li><li><strong>Multi-cursor editing</strong> Although the find-and-replace functionality of Visual Studio Code is powerful, I sometimes find it hard to track which parts I’m changing. Instead, I like to use multi-cursor editing, wherein I highlight the text I want to change and use Ctrl+D to spawn a cursor at the next occurrence. This is difficult to explain textually, so try it out for yourself!</li></ul><p>If you have trouble remembering keyboard shortcuts, both editors have the <a href="https://code.visualstudio.com/docs/getstarted/userinterface#_command-palette">Command Palette</a>, which lets you type the full command name, instead of hunting through drop-down menus.</p><p>I use the Command Palette every day. However, I also recommend the tried and true method of printing out a cheat sheet and taping it to the side of your monitor.</p><p>In the next post, we’ll keep building on the theme of small changes to coding habits and environment with significant benefits. Specifically, we’ll cover how to avoid bugs without running code and how to stop arguing with your collaborators over code formatting.</p><hr><p><a href="https://medium.com/proteinqure/better-code-with-less-typing-7893fa171e17">Better Code with Less Typing</a> was originally published in <a href="https://medium.com/proteinqure">ProteinQure</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>