Roshan Rao (@proteinrosh) / X

Roshan Rao

@proteinrosh

Research scientist @Biohub. Foundation models for biology. Prev: Co-Founder/RS @EvoscaleAI, RS @MetaAI, PhD @berkeley_ai.

New York, NY

Joined February 2019

Roshan Rao
@proteinrosh
Feb 16, 2021
Excited to release our new paper on unsupervised attention-based MSA protein language modeling! 🧵 (1/8) Paper: biorxiv.org/content/10.110… Code: github.com/facebookresear…
Roshan Rao
@proteinrosh
Nov 1, 2022
We are thrilled to announce the ESM Metagenomic Atlas (esmatlas.com)! In this effort we folded the entirety of MGnify90 and are releasing all folded structures. This database contains >617 million structures, of which >225 million are predicted with high confidence.
Roshan Rao
@proteinrosh
Dec 10, 2019
Are you a computational biologist trying to embed some proteins? Jealous of NLP researchers for @huggingface's easy-to-use repository of models? Come check out our re-release of our TAPE code (github.com/songlab-cal/ta…), complete with a huggingface-style API for loading models!
Roshan Rao
@proteinrosh
Jun 25, 2024
We have trained ESM3, a generative bidirectional masked language model that reasons over the sequence, structure, and function of proteins. ESM3 is trained at three model scales - 1.4B, 7B, and 98B. x.com/alexrives/stat…
Alex Rives
@alexrives
Jun 25, 2024
We have trained ESM3 and we're excited to introduce EvolutionaryScale. ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins. Read more: evolutionaryscale.ai/blog/esm3-rele…
Roshan Rao
@proteinrosh
Dec 16, 2020
Our new paper on unsupervised contact prediction with protein LMs is up on bioRxiv! Examining Transformers trained with MLM on protein sequences, we find attention maps predict contacts *better* than Potts models trained on the corresponding MSA. 1/12
biorxiv.org
Transformer protein language models are unsupervised structure learners
Unsupervised contact prediction is central to uncovering physical, structural, and functional constraints for protein structure determination and design. For decades, the predominant approach has...
Roshan Rao
@proteinrosh
Dec 9, 2021
Here’s a video of my talk! I tried to make it relatively accessible to people without much background in either biology or ML. Definitely unbiased reviewers (my roommates) suggest I at least partially succeeded. youtu.be/hcJS9d09ECA
Roshan Rao
@proteinrosh
May 1, 2020
TAPE v0.4 is released! In addition to a number of bugfixes, we've added the TRRosetta model for structure prediction so that you can play around with predicting structure in pytorch!
Roshan Rao
@proteinrosh
Dec 8, 2021
Post defense drinks ❤️
Roshan Rao
@proteinrosh
Oct 3, 2021
This is a helpful thread, but I want to point out that I and many others got into competitive PhD programs without having any publications. Strong letters of recommendation can and will outweigh publications.
Chaitanya K. Joshi
@chaitjo
Oct 3, 2021
Are you applying for a PhD in Machine Learning, Artificial Intelligence, and beyond? Here's a thread of high-quality resources that helped me understand the process + craft my application better. 👇
Roshan Rao
@proteinrosh
Jun 25, 2024
Working on ESM3 has been the most challenging and the most rewarding part of my career. I am incredibly proud of the team we have built - y’all make it so fun to come in to work each day.
Thomas Hayes
@THayes427
Jun 25, 2024
Replying to @THayes427
I’m incredibly grateful to work with this amazing team. This is the most dedicated and creative team I’ve ever worked with, and I’m so excited to continue building. Please don’t hesitate to reach out if you’re interested!
Roshan Rao
@proteinrosh
Jun 7, 2021
Interesting paper! Arguably, this is exactly what our MSA Transformer is - a model that alternates between attention within a sequence and attention across different sequences. Big difference is they use random mini batching as opposed to an explicit search for related points.
Jannik Kossen
@janundnik
Jun 7, 2021
🗞New Paper🗞 🤖🧪Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning 🧪🤖 Huge thanks to @neilbband* as well as @clarelyle, @AidanNGomez, @tom_rainforth, @yaringal, and @OATML_Oxford ! Introducing 🚀Non-Parametric Transformers🚀 1/
Roshan Rao
@proteinrosh
Dec 8, 2021
Please call me Roshan, Dr. Rao is my mother’s name. #phdlife
Roshan Rao
@proteinrosh
Sep 22, 2021
I don't fully agree here. Large deep network models can have emergent behavior, beyond what they are explicitly designed to do. I wouldn't have predicted that AlphaFold could model protein complexes, but it is clearly able to do so in some cases, even without paired MSAs. (1/5)
Ewan Birney
@ewanbirney
Sep 21, 2021
A reminder for bioinformaticians - AlphaFold works off the *real* multiple alignment, created by evolution. Flipping an amino acid in the target protein to model a mutation will not work in AlphaFold. Please don't do it. Please don't write papers about how it doesn't work.
Roshan Rao
@proteinrosh
Dec 7, 2022
I figured the right approach to learning a model of protein stability was to wait for Gabe Rocklin to make a big enough dataset
bioRxiv Biophysics
@biorxiv_biophys
Dec 7, 2022
Mega-scale experimental analysis of protein folding stability in biology and protein design biorxiv.org/cgi/content/sh… #biorxiv_biophys