London Area, United Kingdom
8K followers
500+ connections
Activity
-
Tim Rocktäschel reposted this
🚨🚨 We're hiring for two executive roles at the Advanced Research + Invention Agency (ARIA): CFO and Chief Translation Officer. Each has the opportunity to have an enormous impact on the future of science and technology. ARIA has built an extraordinarily special team, and these are pivotal roles for us. What is a Chief Translation Officer? This person will be responsible for turning the scientific breakthroughs ARIA funds into real-world impact. We believe there are literally trillions of pounds up for grabs. Ideal for former founders, investors, execs... Full JDs in the links in the comments.
-
Tim Rocktäschel reposted this
Thrilled to be featured in the first ALife Newsletter of the year! I share how I got into open-endedness research, the inspiration behind my recent work, and how I think open-endedness fits into the future of AI and AI researchers. Huge thanks to the newsletter's editors! Newsletter: https://lnkd.in/g5Dd25Sh
-
Tim Rocktäschel reposted this
We’re partnering with Waymo to introduce the Waymo World Model, powered by Genie 3. By combining Genie 3’s world knowledge with Waymo’s precise sensor data, the model generates photorealistic, interactive environments to train autonomous vehicles. This allows engineers to simulate “what if” scenarios – like extreme weather or reckless drivers – to stress-test systems against unpredictable events before encountering them in reality. Find out more → https://lnkd.in/eg9NU2jG
-
Tim Rocktäschel reposted this
Einsum is All You Need? I recently started working with JAX and realized how universal and awesome einsum really is. Even today, this 2018 article by Tim Rocktäschel remains relevant and still proved useful. It’s crazy how much easier it makes handling tensors. And boy, does it make coding and tweaking Attention simpler. On another note, I’m absolutely in love with personal blogs hosted on people’s own pages. Do let me know if you know more folks who post cool, deep stuff like this. Look at how many lines it saves -> #JAX #DeepLearning
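For readers who haven't used einsum, here is a minimal sketch of the kind of saving the post alludes to: single-head batched scaled dot-product attention written with numpy's einsum. The shapes and names are illustrative, not taken from the original article.

```python
import numpy as np

def attention(q, k, v):
    # q, k, v: (batch, heads, seq, dim)
    # One subscript string replaces a transpose + matmul dance.
    scores = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("bhqk,bhkd->bhqd", weights, v)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(2, 4, 5, 8)) for _ in range(3))
out = attention(q, k, v)
print(out.shape)  # (2, 4, 5, 8)
```

The same two `einsum` calls work unchanged in JAX (`jax.numpy.einsum`), which is part of why the pattern travels so well between frameworks.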
-
Tim Rocktäschel reposted this
ICLR 2026, here we come! 🚀 The Oxford Young Statisticians Seminar is thrilled to announce that our members (and their supervisors) have had a total of 9 papers accepted to the main track of this year's conference. See the list below ⬇️ A huge congratulations to our brilliant researchers and their co-authors for this amazing achievement. We are proud to showcase their contributions to the machine learning community. All these young researchers will soon be looking for funding to attend the conference in Rio de Janeiro. Please reach out if you'd like to sponsor part of their travel.
Accepted Papers:
📔 GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning, Silvia Sapora, Devon Hjelm, Omar Attia, Alexander Toshev, Bogdan Mazoure (https://lnkd.in/eEr--Ny2)
📔 Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training, Jonathan Cook*, Silvia Sapora*, Arash Ahmadian, Akbir Khan, Tim Rocktäschel, Jakob Foerster, Laura Ruis (https://lnkd.in/dVqMyGkJ)
📔 Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing, Iskander Azangulov, Teodora Pandeva, Niranjani Prasad, Javier Zazo, Sushrut Karmalkar (https://lnkd.in/eb2SF9WA)
📔 Revisiting the scaling properties of downstream metrics in large language training, Jakub Karajewski, Amitis Shidani, Dan Busbridge, Sam Wiseman, Jason Ramapuram (https://lnkd.in/erkNHqjj)
📔 BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design, Deepro Choudhury, Sinead Williamson, Adam Golinski, Ning Miao, Freddie Bickford Smith, Michael Kirchhof, Yizhe Zhang, Tom Rainforth (https://lnkd.in/euHtZfbT)
📔 Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis, Tyler Farghly, Patrick Rebeschini, George Deligiannidis, Arnaud Doucet (https://lnkd.in/ey2ufNWE)
📔 Accelerated Parallel Tempering via Neural Transports, Leo Zhang, Peter Potaptchik, Jiajun He, Yuanqi Du, Arnaud Doucet, Francisco Vargas, Hai-Dang Dau, Saifuddin Syed (https://lnkd.in/eTAGWdHs)
📔 CREPE: Controlling Diffusion with Replica Exchange, Jiajun He, Paul JEHA, Peter Potaptchik, Leo Zhang, Jose Miguel Hernandez Lobato, Yuanqi Du, Saifuddin Syed, Francisco Vargas (https://lnkd.in/e47n832C)
📔 SigmaDock: Untwisting Molecular Docking With Fragment-Based SE(3) Diffusion, Alvaro Prat, Leo Zhang, Charlotte M Deane, Yee Whye Teh, Garrett Morris (https://lnkd.in/eJ62pKJJ)
#ICLR #MachineLearning #Statistics
-
Tim Rocktäschel reposted this
Project Genie is our experimental research prototype that lets you create and explore virtual worlds. 🌎 Powered by our Genie 3 model, it lets you generate infinite interactive environments. Here’s how it works:
🔵 Create: Design characters and worlds using text and visual prompts.
🔵 Refine: Nano Banana Pro makes an image preview that you can adjust.
🔵 Step inside: Your environment is generated in real-time as you move around.
🔵 Remix: Build on existing worlds or discover new ones in the gallery.
We’re rolling this out now to Google AI Ultra subscribers in the U.S. (18+). Find out more → https://lnkd.in/eiz_Sj2G
Project Genie: Experimenting with infinite, interactive worlds
-
Tim Rocktäschel reposted this
🚨 Final call: 2 AI Full Professorship openings at HPI / University of Potsdam (Berlin region, Germany). Happy to answer questions about what makes our institute really unique! Deadline: Jan. 15, but late applications may be considered. Application details: https://lnkd.in/d8WQ_SgC
-
Tim Rocktäschel shared this
"Denk ich an Deutschland in der Nacht, / Dann bin ich um den Schlaf gebracht" ("When I think of Germany in the night, I am robbed of my sleep") https://lnkd.in/exv7fCz8 🤦♂️
55 million euros, five years of standstill: construction begins on Berlin's largest AI data centre
-
Tim Rocktäschel reposted this
Please join me, Doina Precup, KyungHyun Cho, Andrew Ng, Yoshua Bengio, Ruslan Salakhutdinov and Fernando Pereira in providing financial support for OpenReview. It is one of the most important open platforms for quality AI research. We must ensure that it is well funded and can fulfill its mission. https://lnkd.in/e9M32h9g
-
Tim Rocktäschel reacted to this
My PhD student Sam Earle very successfully defended his thesis today before a committee consisting of Sebastian Risi, Phillip Isola, Tim Rocktäschel, Eugene Vinitsky, and myself. Interested? You can watch a recording of his defense talk here: https://lnkd.in/damCaRkZ
-
Tim Rocktäschel reacted to this
We were delighted to celebrate the launch of Sovereign AI alongside the Office for Investment (OfI) at the London Stock Exchange. Thank you to everyone who joined us - founders, investors, and ecosystem partners - to mark the moment. Bring back the bell! 🔔
-
Tim Rocktäschel reacted to this
come join us in london for the 10th Research and Applied AI Summit (RAAIS) on 12 june '26 - frontier ai - world models - robotics/embodied ai - open-endedness - ai for science and medicine - ai in space w/ 200 ai researchers, engineers, startups, big cos, and academia https://raais.co/
-
Tim Rocktäschel liked this
Congrats to Meryem Arik at Doubleword and Alistair Pullen at Cosine for being amongst the first 7 UK AI startups backed by Joséphine Kant and the team at Sovereign AI. Great to see my angel investments getting the recognition they deserve. https://lnkd.in/e_QVeT_R
-
Tim Rocktäschel reacted to this
100 days ago I walked into Whitehall with no idea what I'd signed up for. This week I helped launch the £500m Sovereign AI fund for the UK 🇬🇧 A week that took me from a garage at Wayve to the floor of the London Stock Exchange. Not a sentence I expected to write 100 days ago.
Quick aside on the LSE: the bell is now a button. Whoever decided a button was more efficient than a bell owes British capitalism an apology. Consider this my first unofficial policy ask, Julia Hoggett. Bring the bell back.
It takes a country to build the next generation of British AI companies. And we've already started. Our first equity investment is in Danyal Akarca and Jascha Achterberg at Callosum - the orchestration layer for the future of AI compute. And compute backing for six more brilliant British AI companies spanning coding agents, inference, world models, and biotech: Cosine, Cursive, Doubleword, Odyssey, Prima Mente, twig.
Britain is building. Come build with us.
-
Tim Rocktäschel liked this
JUST IN: One of the companies backed by Sovereign AI is outperforming OpenAI, Anthropic, Mistral AI, and DeepSeek AI! 🇬🇧 Cosine was founded by Alistair Pullen, Yang Li, and Sam Stenner and is a British sovereign AI frontier lab developing advanced models and coding agents. It's purpose-built for defence, national security and regulated industries where foreign-built AI is off the table and is consistently beating OpenAI, Anthropic, Mistral, and DeepSeek on coding benchmarks. It's now being backed by Sovereign AI, which has awarded Cosine 500,000 GPU hours on Isambard-AI: one of the most powerful supercomputers in Europe. For the first time EVER, this makes it possible to build and deploy a fully sovereign AI model entirely on British soil, with no foreign dependency at any stage. This is AMAZING. People saying that £500m is too small to make a difference are missing the point. It's about access to compute and infrastructure and making the UK resilient. An amazing initiative. NICE James Wise, Joséphine Kant, Kanishka Narayan MP, Liz Kendall 👏
-
Tim Rocktäschel reacted to this
Last night, we officially launched Sovereign AI with Alex Kendall at Wayve. Our first investment? 💥 Callosum - Danyal Akarca & Jascha Achterberg 💥 Proud to be backing Danyal, Jascha, and the team as they build one of the defining layers of next-generation AI systems. We also announced the first companies receiving compute, with a number of Right of First Refusal (ROFR) agreements in place.
Prima Mente - Hannah Madan and Ravi Solanki
Cosine - Yang Li, Alistair Pullen, James White
Doubleword - Meryem Arik, Jamie Dborin (PhD), & Fergus Finn
Cursive - Olivier Henaff, Talfan Evans, & Oliver Vikbladh
Odyssey - Jeff Hawke & Oliver Cameron
twig - Russ Tucker, James Allen, & Satnam Surae, PhD
What do we offer? Early-stage equity investment of up to £20M. Market terms. Market speed. Fully funded access to the UK’s largest AI supercomputers, with up to 1 million GPU hours available per startup... Fast-tracked, cost-free visas to bring world-class AI talent from anywhere in the world to the companies we invest in... Strategic Assets programme providing support to create high-quality AI datasets and autonomous lab infrastructure. And that's just the start.... 💻⚡🇬🇧 Want to know more? Our new website is live in the comments... Britmaxxing....🇬🇧🚀
Liz Kendall Kanishka Narayan MP James Wise Department for Science, Innovation and Technology HM Treasury
-
Tim Rocktäschel reacted to this
Exciting atmosphere at the Sovereign AI launch party last night. This feels like it might actually work.
Experience & Education
-
UCL
********* ** ********** ************
-
***** * ******** ********** *** ******** *** *********** *******
******
-
****** ********
******** * ********* *********
-
***
****** ** ********** ***** ******** *******
-
-
******************** ** ******
********** *********** ** ****** ******** *******
-
Explore more posts
-
Bradley Voytek
UC San Diego • 9K followers
Great couple of weeks for the lab, and for Andrew Bender, PhD in particular! Two of Andrew's papers diving deep into the meaning of neural alpha oscillations are now published. The first, out in PNAS, is "Differential representations of spatial location by aperiodic and alpha oscillatory activity in working memory" [1]. There's been a ton of super cool work in the last decade or so showing that the amplitude and distribution of visual cortical alpha rhythms encode the contents and spatial location of information held in working memory. Our lab has shown that our ability to measure neural oscillations is confounded by non-oscillatory aperiodic activity, so we were curious how much of this encoding is *really* oscillatory. Analyzing seven different EEG datasets from Ed Awh and Ed Vogel's labs, we found that, after correcting for aperiodic activity, alpha encoding actually *improved*! And time-resolved aperiodic activity seemed to encode the stimulus mainly while it was on the screen. We're now looking much more deeply into this idea that the sensory information coming into our brains is often aperiodic, and encoded that way, but then somehow transforms into an oscillatory neural code for maintenance. The second paper, in the Journal of Cognitive Neuroscience, is "Resting-state Alpha and Mu Rhythms Change Shape across Development But Lack Diagnostic Sensitivity for Attention-Deficit/Hyperactivity Disorder and Autism" [2]. LinkedIn's word limits keep me from being able to dive too deeply into this one but WOW did Andrew and Natalie Schaworonkow do an incredible amount of beautiful, technically sophisticated work. They analyzed EEG data from thousands of children, classified alpha band activity either as visual cortical alpha or sensorimotor mu based on their spatial topography, and then quantified the fine-scale waveform features of both. (By the way, the pre-prints of both these papers are freely available on bioRxiv if you can't access the published versions.)
[1] https://lnkd.in/eN9rxdZZ [2] https://lnkd.in/eNCjyQYh
-
Lionel Briand
University of Ottawa • 5K followers
Automated software unit testing (including oracles) using a collaborative, multi-agent architecture based on Large Language Models "Hallucination to Consensus: Multi-Agent LLMs for End-to-End JUnit Test Generation", Qinghua Xu, Guancheng Wang, Lionel Briand, Kui Liu This paper, led by Qinghua Xu in my team at the Lero Centre, was accepted for publication in ACM Transactions on Software Engineering and Methodology. The project was supported by Huawei. All code and data are publicly available. We propose CANDOR, a novel end-to-end, prompt engineering-based LLM framework for automated unit test generation in Java. CANDOR orchestrates multiple specialized LLM agents to collaboratively generate complete JUnit tests, including both high-quality test prefixes and accurate oracles. To mitigate the notorious hallucinations in LLMs and improve oracle correctness, we introduce a novel strategy that engages multiple reasoning LLMs in a panel discussion and generates accurate oracles based on consensus. Preprint: https://lnkd.in/enu5adY3
-
Uday Khankhoje
Indian Institute of… • 812 followers
Reconfigurable intelligent surfaces (RIS/IRS) are likely to be key enablers of enhanced wireless communications going forward. A key problem, determining the "coding patterns" of the RIS elements, was solved by us in a recent paper https://lnkd.in/d6HPfu4n. Today, we're happy to release the source code for our algorithms for the community to build on: https://lnkd.in/d3GHihm8 [With Radha Krishna Ganti and Sai Sanjay Narayanan]
-
Ashish Kulkarni
eBay • 5K followers
🚀 Really excited to see this go out. Over the past several months, our focus has been on building not just models, but the infrastructure layer required to make LLMs truly work in the Indian context—across training, alignment, and evaluation. With the open-sourcing of ForgeLM and IndicEvalHarness, we’re sharing two important pieces of that stack: 🔧 ForgeLM: a practical, scalable framework for post-training and alignment (SFT, DPO, GRPO) 📊 IndicEvalHarness: a comprehensive evaluation suite for Indic languages spanning reasoning, generation, and domain benchmarks This also builds on a broader set of contributions we’ve been releasing: 🧠 Foundational models (text, vision-language, speech) https://lnkd.in/gM7Un7KE 🗂️ Synthetic pretraining data — BhashaKritika https://lnkd.in/gepQBNEf 📄 Indic post-training data — Pragyaan https://lnkd.in/gK-PsBee 🧪 Indic & multimodal benchmarks — IndicVisionBench, VoiceAgentBench, and others https://lnkd.in/gY8G_A2S Taken together, these represent our attempt to build a full-stack ecosystem for Indic AI spanning data, models, alignment, and evaluation. 🌏 We believe progress in AI, especially for diverse and multilingual ecosystems like India, requires strong shared infrastructure. Open-sourcing is a step in that direction. Would love to see how the community builds on top of this. 🤝 #krutrim #LLM #AI #MachineLearning #GenerativeAI #OpenSource #NLP #MultilingualAI #IndicAI #DeepLearning #AIResearch #Alignment #AIInfrastructure
-
Juan Moreno-Cruz
University of Waterloo • 3K followers
This whole series is incredibly useful and accessible. But this idea of independence errors of hallucination across languages, which I found hidden in the Referee 2 personality that Scott wrote, is one of the coolest things I've seen implemented with Claude (and it should be implemented for other LLMs, too). This post is very cool because it formalizes the idea. I am definitely doing this all the time now. I think it is one of the most useful things I've done in my work with Claude. Thanks, Scott!
-
Giancarlo Sperlì
Università degli Studi di… • 3K followers
If you are interested in a human-centered analysis investigating the acceptability of LLM tools in code translation, take a look at this paper! This work was done in collaboration with Anna Rita Fasolino Andrea Vignali Gabriele Dario De Siano. #LLMs #HumanCenteredAI #codetranslation
-
Sophia Drossopoulou
Imperial College London • 474 followers
Excited that our paper Reasoning about External Calls was accepted for OOPSLA 2025. Here is the current version: https://lnkd.in/eFPyJj39. Abstract: In today's complex software, internal, trusted code is tightly intertwined with external untrusted code. To reason about internal code, programmers must reason about the potential effects of calls to external code, even though that code is not trusted and may not even be available. The effects of external calls can be limited, if internal code is programmed defensively, limiting potential effects by limiting access to the capabilities necessary to cause those effects. This paper addresses the specification and verification of internal code that relies on encapsulation and object capabilities to limit the effects of external calls. We propose new assertions for access to capabilities, new specifications for limiting effects, and a Hoare logic to verify that a module satisfies its specification, even while making external calls. We illustrate the approach through a running example with mechanised proofs, and prove soundness of the Hoare logic.
-
Ganesh Venkatesh
Waymo • 2K followers
🚀 Exciting News! Our New Paper on helping Multimodal LLMs see Beyond Language Prior got accepted to the CVPR Workshop on Visual Concepts! 🚀 Ever wonder why Multimodal LLMs (MLLMs), despite seeing images, sometimes miss the bigger picture or rely too much on text? While text-only LLMs get rich feedback from every token, MLLMs often face a "sparse feedback" problem. They struggle to learn about image concepts not explicitly mentioned in text descriptions and can default to just predicting text based on language patterns, rather than truly seeing. 🔥 Our latest research tackles this head-on! We've developed novel training strategies that: 1️⃣ Deepen Visual Understanding using Visual Loss: Teach the MLLM to build a much richer internal representation of ALL visual concepts in an image. 2️⃣ Boost Visual Attention using Blank Tokens: Encourage the model to pay significantly more attention to what it sees by subtly weakening its over-reliance on predicting just from previous text tokens. 💡 The Impact? We're seeing strong performance improvements on demanding visual tasks in both upstream (core understanding) and downstream (application-level) settings! 🏆 Key Achievement: Our approach enables our Llama 3.1 8B based Llava model – which is smaller, uses a simpler model architecture, and operates on lower-resolution visual inputs – to MATCH the performance of the much larger Llama 3.2 11B on challenging visual reasoning benchmarks! 🤯 This shows the power of a smarter training recipe! This is a crucial first step towards MLLMs that are more visually intelligent, reliable, and truly understand the world around them. Please stay tuned for our upcoming updates on model architecture advancements more conducive to capturing visual and language concepts, as well as post-training MLLMs to reason through tricky questions. 👉 Dive into the details!
Read the full paper here: https://lnkd.in/gZcbw5yC A huge thank you to my incredible co-authors Aarti Ghatkesar Uddeshya Upadhyay and everyone who supported this journey! #MultimodalLLM #MLLM #VisualGrounding #VisCon
-
Paolo Ceravolo
Università degli Studi di… • 1K followers
The topic of the latent representation of event logs is interesting because it combines aspects related to individual events, inter-case (a sequence of events) and intra-case. Florence Wong has devised an effective strategy that combines semantic completeness and architectural simplicity. GitHub: https://lnkd.in/dY33F3qy arXiv paper: https://lnkd.in/dkpPWzJ2
-
HGPU group
HGPU group • 270 followers
Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation Large language models (LLMs) have shown remarkable capabilities in code translation, yet their performance deteriorates in low-resource programming domains such as Fortran and emerging frameworks like CUDA, where high-quality parallel data are scarce. We present an automated dataset generation pipeline featuring a dual-LLM Questioner-Solver design that incorporates external knowledge from compilers and runtime feedback. Beyond traditional source-target code pair datasets, our approach additionally generates (1) verified translations with unit tests for assessing functional consistency, and (2) multi-turn dialogues that capture the reasoning process behind translation refinement....
-
Furu Wei
Microsoft Research Asia • 13K followers
Introducing Generative Adversarial Distillation (GAD): a novel GAN-style formulation and framework that facilitates both on-policy and black-box distillation of large language models (LLMs). GAD is the first technique to enable black-box on-policy distillation from proprietary teachers where internal logits or parameters are inaccessible, or distillation between teacher and student LLMs with incompatible vocabularies. GAD expands our prior work on white-box on-policy distillation (i.e., MiniLLM), pioneering black-box on-policy distillation for LLM training. Specifically, GAD frames the student LLM as a generator and trains a discriminator to distinguish its responses from the teacher LLM’s, creating a minimax game. The discriminator acts as an on-policy reward model that co-evolves with the student, providing stable, adaptive feedback. Experimental results show that GAD consistently surpasses the commonly used sequence-level knowledge distillation. In particular, Qwen2.5-14B-Instruct (student) trained with GAD becomes comparable to its teacher, GPT-5-Chat, on the LMSYS-Chat automatic evaluation. The results establish GAD as a promising and effective paradigm for black-box LLM distillation. Our team has been conducting fundamental research in knowledge distillation with wide adoption across the industry. - MiniLM: We introduced multi-head attention distillation, establishing the most effective distillation method for BERT-style models. The open-source MiniLM models (e.g., 6x384) have become the most widely utilized small encoder models on Hugging Face. - MiniLLM: Our proposed Reverse KLD is recognized as one of the most effective, de facto on-policy distillation approaches for modern LLM training, which has been widely used by Thinking Machines, Gemma, and many other teams and models.
- BitDistill: We proposed BitNet Distillation to finetune off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (ternary weights {-1, 0, 1}), achieving performance parity with the full-precision counterparts on specific downstream tasks. - GAD: The development of Generative Adversarial Distillation (GAD) now allows for black-box on-policy distillation, overcoming two major prior limitations: (1) Distillation from proprietary teachers where internal logits or parameters are inaccessible; (2) Distillation between teacher and student LLMs with incompatible vocabularies. https://lnkd.in/gMaP2c7w
-
José Pretel
University of Victoria • 233 followers
I am very excited to share our latest publication on the arXiv! The ATLAS collaboration has just released our latest measurements of production cross-sections of W-boson pairs 🥳🎉 This work presents precise measurements of WW production in proton-proton collisions at 13 TeV, using 140 inverse fb of ATLAS data collected between 2015 and 2018. Huge efforts were devoted to derive precise data-driven background estimations, enabling a measurement in a fully jet-inclusive phase space. We achieve a fiducial cross-section measurement with just 3.1% uncertainty, the most precise ever performed in a hadron collider to date. Differential cross-section measurements show excellent agreement with the state-of-the-art theoretical predictions at the same level of precision, providing important tests of the strong and electroweak sectors of the Standard Model. These have been used to constrain anomalous interactions in the framework of the Standard Model effective field theory. Check out the full pre-print for details: https://lnkd.in/e69Bhg6k Proud to contribute to this endeavor within the ATLAS collaboration and so grateful to all colleagues involved! #ATLAS #CERN #LHC #HighEnergyPhysics #arXiv #ParticlePhysics #WW #StandardModel #SM #Electroweak #Diboson
-
Qing Q.
University of Michigan • 4K followers
We recently arXived a paper exploring the out-of-distribution generalization of In-Context Learning (ICL) in Transformers, offering new geometric insights. Focusing on linear regression tasks where task vectors lie in a union of low-dimensional subspaces, we rigorously prove: (i) Transformers fail to generalize under subspace shifts, incurring error proportional to r∗sin(θ), where θ is the angle between subspaces; (ii) ICL generalizes well to their linear span when trained on a union of subspaces, with required prompt length governed by the intrinsic dimension; (iii) this generalization gap in (i) can be effectively closed via Low-Rank Adaptation (LoRA) fine-tuning. Empirically, we demonstrated them through experiments on nonlinear Transformers (GPT-2 models) and nonlinear tasks beyond the simplified setting we studied theoretically. Paper link: https://lnkd.in/gm2zVXMB
-
Jim Caton
North Dakota State University • 857 followers
If you need custom output or evaluation for a series of prompts or questions, you can build an adjacent program that evaluates each one at a time, using a locally or remotely run LLM (for example, via ollama). Pass each prompt to the LLM along with reference material. After creating the RAG-informed instance, you can use that instance to generate responses for the full list of questions. The simplest approach is to pass one question at a time to the LLM and have it generate a response to that specific question.
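As a rough sketch of that loop: the helper and stub names below are hypothetical, and a real `generate` would wrap a client for a local model server such as ollama rather than the stand-in used here.

```python
def answer_questions(questions, reference, generate):
    """Send one question at a time to the model, each with the same
    reference material prepended (a simple RAG-style prompt)."""
    answers = []
    for question in questions:
        prompt = (f"Reference material:\n{reference}\n\n"
                  f"Question: {question}\nAnswer:")
        answers.append(generate(prompt))
    return answers

# A stub stands in for the LLM so the loop runs anywhere; in practice
# `generate` would call a locally or remotely hosted model.
stub = lambda prompt: "stub answer"
print(answer_questions(["Q1", "Q2"], "Some notes.", stub))
# → ['stub answer', 'stub answer']
```

Asking one question per call, as the post suggests, keeps each prompt short and makes failures easy to attribute to a specific question.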
-
TriBix
569 followers
Recursive Language Models https://lnkd.in/diPEteqY We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs and common long-context scaffolds across four diverse long context tasks, while having comparable (or cheaper) cost per query.
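The strategy in the abstract reads like a recursive map-reduce over the prompt. A toy sketch under that reading follows; the function names, the fixed halving scheme, and the combine prompt are assumptions for illustration, not the paper's actual method.

```python
def recursive_answer(prompt, query, llm, max_len=2000):
    """If the prompt fits the length budget, query the model directly;
    otherwise split it, recurse on each snippet, and ask the model to
    combine the partial answers."""
    if len(prompt) <= max_len:
        return llm(f"{prompt}\n\nQ: {query}")
    mid = len(prompt) // 2
    partials = [recursive_answer(prompt[:mid], query, llm, max_len),
                recursive_answer(prompt[mid:], query, llm, max_len)]
    return llm("Combine these partial answers:\n"
               + "\n".join(partials) + f"\n\nQ: {query}")

calls = []
def stub(p):
    # Stand-in for a real LLM call; records each prompt it receives.
    calls.append(p)
    return "partial"

recursive_answer("x" * 5000, "What is x?", stub)
print(len(calls))  # 7 model calls for a 5000-char prompt
```

Each level of recursion only ever hands the model a snippet within budget, which is how inputs far beyond the context window stay processable.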
-
Vinicius Mikuni
Nagoya University • 697 followers
New paper! Symmetries often provide strong inductive biases in the physical sciences, encoding much of what we understand about nature. However, many known symmetries are not fully realized in real, messy, experimental data. Can we steer AI models to understand symmetries without fully enforcing them? Yes! In this new paper with Inbar Savoray, Pradyun Hebbar, Thandi Madula, Benjamin Nachman, and Nadav Outmezguine, we investigate different methods of penalizing models that are not fully symmetric under transformations of the Lorentz group, resulting in models that are more robust than a generic architecture, but more flexible than completely invariant implementations. Our proposed method, Symmetry EncourAging Losses (SEAL), is simple to implement and doesn't require any changes to the model architecture! Check out the details of the paper here: https://lnkd.in/g2EHhGxD
-
Joseph Vantassel
Virginia Tech • 2K followers
New release of the open-source Python package hvsrpy, for horizontal-to-vertical spectral ratio (HVSR) processing! 🎉 hvsrpy v2.1.0 makes reading data more robust, adds two new interactive examples, and improves overall code quality. More details on the project's GitHub: https://lnkd.in/er9BVPpm #opensource #software #python #hvsr #hv
-
Victor Fung
Lila Sciences • 2K followers
Atomistic foundation models are great as interatomic potentials: they have remarkable generalizability and excellent data efficiency and accuracy when fine-tuned compared to bespoke models. However they are also significantly more memory intensive and slower when compared to bespoke models. To address this limitation, in our latest preprint we show how we can use a graph partitioning strategy together with model pruning to enable large scale and parallelizable simulations of millions of atoms with foundation models. Pruning is a critical step in reducing model size as well as reducing the computational overhead of evaluating duplicated nodes in the partitioned subgraphs. What's also neat about this approach is that we can run million atom-plus simulations even when there is only a single GPU available. We have integrated this method within the MatterTune package to allow users to fine-tune AFMs and run large scale simulations in one place. Made possible by the excellent team at Georgia Tech. Links below: Preprint: https://lnkd.in/etSHjyCe Code: https://lnkd.in/epZ3xzFQ
Explore collaborative articles
We’re unlocking community knowledge in a new way. Experts add insights directly into each article, started with the help of AI.
Explore More