{"id":17,"date":"2014-04-28T08:38:16","date_gmt":"2014-04-28T00:38:16","guid":{"rendered":"https:\/\/ronghanghu.com\/\/\/?page_id=17"},"modified":"2026-02-28T10:05:15","modified_gmt":"2026-02-28T18:05:15","slug":"about-me","status":"publish","type":"page","link":"https:\/\/ronghanghu.com\/","title":{"rendered":"Ronghang Hu (\u80e1\u620e\u822a)"},"content":{"rendered":"<p style=\"padding-left: 6%;\"><img loading=\"lazy\" decoding=\"async\" class=\"ronghang_photo size-full wp-image-323 alignright\" src=\"https:\/\/ronghanghu.com\/wp-content\/uploads\/RH_avatar.png\" alt=\"\" width=\"200\" height=\"200\" \/>Member of Technical Staff, xAI<br \/>\nEmail: ronghang.hu@gmail.com<\/p>\n<p style=\"padding-left: 6%;\"><a href=\"https:\/\/scholar.google.com\/citations?user=rTw-pq0AAAAJ\" target=\"_blank\" rel=\"noopener\">Google Scholar<\/a> \u2022 <a href=\"https:\/\/github.com\/ronghanghu\" target=\"_blank\" rel=\"noopener\">GitHub<\/a> \u2022 <a href=\"https:\/\/x.com\/RonghangHu\" target=\"_blank\" rel=\"noopener\">X (Twitter)<\/a> \u2022 <a href=\"https:\/\/www.linkedin.com\/in\/ronghanghu\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a><\/p>\n<h3 style=\"text-align: center;\">About Me \/ Bio<\/h3>\n<ul>\n<li>Ronghang Hu is a member of technical staff at <a href=\"https:\/\/x.ai\/\">xAI<\/a>, focusing on pushing the frontier of multimodal AI.<\/li>\n<li>Previously, Ronghang Hu was a research scientist at <a href=\"https:\/\/ai.meta.com\/\">Meta FAIR<\/a> (formerly Facebook AI Research), and was devoted to the Segment Anything series of projects to build strong visual perception models, and was a core contributor to <a href=\"https:\/\/github.com\/facebookresearch\/sam2\">SAM 2<\/a> and <a href=\"https:\/\/github.com\/facebookresearch\/sam3\">SAM 3<\/a>. Ronghang obtained his Ph.D. degree in Computer Science from the University of California, Berkeley in 2020, and his B.Eng. degree from Tsinghua University in 2015.<\/li>\n<\/ul>\n<h3 style=\"text-align: center;\">Experiences<\/h3>\n<ul>\n<li><strong>xAI<\/strong> (Palo Alto, CA; 11\/2025 \u2014 present)<br \/>\nMember of Technical Staff<\/li>\n<li><strong>Meta FAIR<\/strong> (Menlo Park, CA; 06\/2020 \u2014 11\/2025)<br \/>\nResearch Scientist<\/li>\n<li><strong>Facebook AI Research<\/strong> (Menlo Park, CA; 05\/2019 \u2014 08\/2019)<br \/>\nResearch Intern<\/li>\n<li><strong>Facebook AI Research<\/strong> (Seattle, WA; 05\/2017 \u2014 08\/2017)<br \/>\nResearch Intern<\/li>\n<\/ul>\n<h3 style=\"text-align: center;\">Education<\/h3>\n<ul>\n<li><strong>University of California, Berkeley<\/strong> (Berkeley, CA; 08\/2015 \u2014 05\/2020)<br \/>\nPh.D. and M.S. in Computer Science<\/li>\n<li><strong>Tsinghua University<\/strong> (Beijing, China; 08\/2011 \u2014 07\/2015)<br \/>\nB.Eng. in Electronic Information Science and Technology<\/li>\n<\/ul>\n<h3 style=\"text-align: center;\">Selected Projects<\/h3>\n<hr \/>\n<h5><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-2538 size-medium\" src=\"https:\/\/ronghanghu.com\/wp-content\/uploads\/sam3-250x169.png\" alt=\"\" width=\"250\" height=\"169\" \/>SAM 3: Segment Anything with Concepts<\/h5>\n<p><span class=\"authorlist\">N. Carion, L. Gustafson, Y.-T. Hu, S. Debnath, <strong>R. Hu<\/strong>, D. S. Coll-Vinent, C. Ryali, K. V. Alwala, H. Khedr, A. Huang, J. Lei, T. Ma, B. Guo, A. Kalla, M. Marks, J. Greer, M. Wang, P. Sun, R. R\u00e4dle, T. Afouras, E. Mavroudi, K. Xu, T.-H. Wu, Y. Zhou, L. Momeni, R. Hazra, S. Ding, S. Vaze, F. Porcher, F. Li, S. Li, A. Kamath, H. K. Cheng, P. Doll\u00e1r, N. Ravi, K. Saenko, P. Zhang, C. Feichtenhofer<\/span><br \/>\n<span class=\"conference\"><em>International Conference on Learning Representations (ICLR), 2026<\/em><\/span><br \/>\n(<a href=\"https:\/\/arxiv.org\/pdf\/2511.16719\">PDF<\/a>, <a href=\"https:\/\/ai.meta.com\/sam3\/\">Project<\/a>, <a href=\"https:\/\/github.com\/facebookresearch\/sam3\">Code<\/a>, <a href=\"https:\/\/aidemos.meta.com\/segment-anything\">Demo<\/a>, <a href=\"https:\/\/ai.meta.com\/blog\/segment-anything-model-3\">Blog<\/a>)<\/p>\n<ul>\n<li><strong>Segment Anything Model 3 (SAM 3)<\/strong> is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. Compared to its predecessor SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase or exemplars.<\/li>\n<\/ul>\n<hr \/>\n<h5><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2375 alignright\" src=\"https:\/\/ronghanghu.com\/wp-content\/uploads\/sam2-1024x1024.jpg\" alt=\"\" width=\"250\" height=\"250\" \/>SAM 2: Segment Anything in Images and Videos<\/h5>\n<p><span class=\"authorlist\">N. Ravi<sup>*,\u2020<\/sup>, V. Gabeur<sup>*<\/sup>, Y.-T. Hu<sup>*<\/sup>, <strong>R. Hu<sup>*<\/sup><\/strong>, C. Ryali<sup>*<\/sup>, T. Ma<sup>*<\/sup>, H. Khedr<sup>*<\/sup>, R. R\u00e4dle<sup>*<\/sup>, C. Rolland, L. Gustafson, E. Mintun, J. Pan, K. V. Alwala, N. Carion, C.-Y. Wu, R. Girshick, P. Doll\u00e1r<sup>\u2020<\/sup>, C. Feichtenhofer<sup>*,\u2020<\/sup> (<sup>*<\/sup>: equal technical contribution, <sup>\u2020<\/sup>: equal advising)<\/span><br \/>\n<span class=\"conference\"><em>International Conference on Learning Representations (ICLR), 2025 \u2014 <strong>Outstanding Paper Honorable Mentions<\/strong><\/em><\/span><br \/>\n(<a href=\"https:\/\/arxiv.org\/pdf\/2408.00714\">PDF<\/a>, <a href=\"https:\/\/ai.meta.com\/sam2\/\">Project<\/a>, <a href=\"https:\/\/github.com\/facebookresearch\/sam2\">Code<\/a>, <a href=\"https:\/\/sam2.metademolab.com\/\">Demo<\/a>, <a href=\"https:\/\/ai.meta.com\/datasets\/segment-anything-video\">Dataset<\/a>, <a href=\"https:\/\/ai.meta.com\/blog\/segment-anything-2\">Blog<\/a>)<\/p>\n<ul>\n<li><strong>Segment Anything Model 2 (SAM 2)<\/strong> is a foundation model towards solving promptable visual segmentation in images and videos. We extend SAM to video by considering images as a video with a single frame. The model design is a simple transformer architecture with streaming memory for real-time video processing.<\/li>\n<\/ul>\n<hr \/>\n<h5><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-2328 size-full\" src=\"https:\/\/ronghanghu.com\/wp-content\/uploads\/flip-e1678313236861.jpg\" alt=\"\" width=\"250\" height=\"145\" \/>Scaling Language-Image Pre-training via Masking<\/h5>\n<p><span class=\"authorlist\">Y. Li<sup>*<\/sup>, H. Fan<sup>*<\/sup>, <strong>R. Hu<sup>*<\/sup><\/strong>, C. Feichtenhofer<sup>\u2020<\/sup>, K. He<sup>\u2020<\/sup> (<sup>*<\/sup>: equal technical contribution, <sup>\u2020<\/sup>: equal advising)<\/span><br \/>\n<span class=\"conference\"><em>Computer Vision and Pattern Recognition (CVPR), 2023<\/em><\/span><br \/>\n(<a href=\"https:\/\/arxiv.org\/pdf\/2212.00794.pdf\">PDF<\/a>, <a href=\"https:\/\/github.com\/facebookresearch\/flip\">Code<\/a>)<\/p>\n<ul>\n<li>We present <strong>Fast Language-Image Pre-training (FLIP)<\/strong>, which gives ~3.7x speedup over the traditional CLIP and improves accuracy using the same training data on a large variety of downstream tasks.<\/li>\n<\/ul>\n<hr \/>\n<h5><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2336 size-full alignright\" src=\"https:\/\/ronghanghu.com\/wp-content\/uploads\/convnext_v2-e1678564497247.jpg\" alt=\"\" width=\"250\" height=\"141\" \/>ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders<\/h5>\n<p><span class=\"authorlist\">S. Woo, S. Debnath, <strong>R. Hu<\/strong>, X. Chen, Z. Liu, I. S. Kweon, S. Xie.<\/span><br \/>\n<span class=\"conference\"><em>Computer Vision and Pattern Recognition (CVPR), 2023<\/em><\/span><br \/>\n(<a href=\"https:\/\/arxiv.org\/pdf\/2301.00808.pdf\">PDF<\/a>, <a href=\"https:\/\/github.com\/facebookresearch\/ConvNeXt-V2\">Code<\/a>)<\/p>\n<ul>\n<li>We propose <strong>ConvNeXt V2<\/strong>, a fully convolutional masked autoencoder framework (FCMAE) and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition.<\/li>\n<\/ul>\n<hr \/>\n<h5><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2278 size-full alignright\" src=\"https:\/\/ronghanghu.com\/wp-content\/uploads\/flava_teaser-e1648688349905.jpg\" alt=\"\" width=\"250\" height=\"172\" \/>FLAVA: A Foundational Language And Vision Alignment Model<\/h5>\n<p><span class=\"authorlist\">A. Singh<sup>*<\/sup>, <strong>R. Hu<sup>*<\/sup><\/strong>, V. Goswami<sup>*<\/sup>, G. Couairon, W. Galuba, M. Rohrbach, D. Kiela (<sup>*<\/sup>: equal contribution)<\/span><br \/>\n<span class=\"conference\"><em>Computer Vision and Pattern Recognition (CVPR), 2022<\/em><\/span><br \/>\n(<a href=\"https:\/\/arxiv.org\/pdf\/2112.04482.pdf\">PDF<\/a>, <a href=\"https:\/\/flava-model.github.io\/\">Project Page<\/a>)<\/p>\n<ul>\n<li>We propose <strong>FLAVA, a foundational model<\/strong> that performs well over a wide variety of 35 tasks on all three target modalities: 1) vision, 2) language, and 3) vision &amp; language, and develop an efficient joint pretraining approach on both unimodal sources.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Member of Technical Staff, xAI Email: ronghang.hu@gmail.com Google Scholar \u2022 GitHub \u2022 X (Twitter) \u2022 LinkedIn About Me \/ Bio Ronghang Hu is a member of technical staff at xAI, focusing on pushing the frontier of multimodal AI. Previously, Ronghang Hu was a research scientist at Meta FAIR (formerly Facebook AI Research), and was devoted &hellip; <a href=\"https:\/\/ronghanghu.com\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> \"Ronghang Hu (\u80e1\u620e\u822a)\"<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"full-width-page.php","meta":{"footnotes":""},"class_list":["post-17","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/pages\/17","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/comments?post=17"}],"version-history":[{"count":600,"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/pages\/17\/revisions"}],"predecessor-version":[{"id":2553,"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/pages\/17\/revisions\/2553"}],"wp:attachment":[{"href":"https:\/\/ronghanghu.com\/wp-json\/wp\/v2\/media?parent=17"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}