inference Archives - Inside HPC & AI News | High-Performance Computing & Artificial Intelligence

FriendliAI Partners with NVIDIA on Nemotron 3 for Agentic AI Inference

December 19, 2025 by staff

Redwood City, CA – FriendliAI, an AI inference platform company, announced a partnership with NVIDIA to launch the Nemotron 3 model family, available on FriendliAI’s Dedicated Endpoints. Developers can deploy Nemotron 3 models on FriendliAI’s inference platform. Highlights include: Up to 13× faster token generation via hybrid Mamba-Transformer MoE architecture and multi-token prediction (MTP) technique MoE routing […]

Filed Under: AI News, HPC-AI Hardware, HPC-AI Software, News Tagged With: AI, AI inference, FriendliAI, inference, nvidia

d-Matrix and Andes Collaborate on RISC-V Accelerator for AI Inference

November 17, 2025 by staff

ST. LOUIS (SC25) – Nov 17, 2025 – Generative AI inference compute company d-Matrix and Andes Technology, a supplier of RISC-V processor cores, announced that d-Matrix has selected the AndesCore AX46MPV for its next-generation Raptor inference architecture. The companies said the collaboration represents a convergence of memory-centric computing and open-standard processor innovation for AI workloads […]

Filed Under: AI News, HPC-AI Hardware, News Tagged With: AI inference, Andes Technology, d-Matrix, inference, RISC-V

NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta’s Llama 4 Maverick

May 23, 2025 by staff

NVIDIA said it has achieved a record large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs achieved more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model. NVIDIA said the model is the largest and most powerful in the Llama 4 […]

Filed Under: Compute, CPUs, GPUs, FPGAs, Machine Learning, News Tagged With: AI, AI compute, AI inference, Blackwell, inference, nvidia, tokens per second, TPS

NeuReality Announces Inference Appliance Is Preloaded with AI Models

May 16, 2025 by staff

Caesarea, Israel – May 14, 2025 – NeuReality announced that its NR1 Inference Appliance now comes preloaded with enterprise AI models, including Llama, Mistral, Qwen, Granite, plus support for private generative AI clouds and on premise clusters. The company said the appliance is up and running in under 30 minutes and “delivers 3x better time-to-value, allowing customers […]

Filed Under: Compute, Machine Learning, News Tagged With: AI, AI inference, artificial intelligence, inference, NeuReality, NR1 Inference Appliance

Rafay Launches Serverless Inference Offering

May 12, 2025 by staff

Sunnyvale, CA – May 8, 2025 – Rafay Systems, a cloud-native and AI infrastructure orchestration and management company, announced general availability of the company’s Serverless Inference offering, a token-metered API for running open-source and privately trained or tuned LLMs. The company said many NVIDIA Cloud Providers (NCPs) and GPU Clouds are already leveraging the Rafay […]

Filed Under: AI News Tagged With: inference, Rafay, serverless computing, serverless inference

AI Inference: Meta Teams with Cerebras on Llama API

May 2, 2025 by staff

Meta has teamed with Cerebras on AI inference in Meta’s new Llama API, combining Meta’s open-source Llama models with inference technology from Cerebras. Developers building on the Llama 4 Cerebras model in the API can expect speeds up to 18 times faster than traditional GPU-based solutions ….

Filed Under: AI News Tagged With: AI inference, Cerebras, inference, Meta

AI Inference: Meta Collaborates with Cerebras on Llama API

May 2, 2025 by staff

Sunnyvale, CA — Meta has teamed with Cerebras on AI inference in Meta’s new Llama API, combining Meta’s open-source Llama models with inference technology from Cerebras. Developers building on the Llama 4 Cerebras model in the API can expect speeds up to 18 times faster than traditional GPU-based solutions, according to Cerebras. “This acceleration unlocks […]

Filed Under: Compute, CPUs, GPUs, FPGAs, Machine Learning, News Tagged With: AI compute, AI inference, Cerebras, inference, Meta

GigaIO and d-Matrix to Build Inference Platform for Enterprise AI

May 1, 2025 by staff

CARLSBAD, Calif. – Edge-to-core AI platform company GigaIO today announced the next phase of its partnership with d-Matrix to deliver an inference solution for enterprises deploying AI at scale. Integrating d-Matrix’s Corsair inference platform into GigaIO’s SuperNODE architecture creates a solution designed to eliminate “the complexity and performance bottlenecks traditionally associated with large-scale AI inference deployment.” […]

Filed Under: Machine Learning, News Tagged With: d-Matrix, enterprise AI, GigaIO, inference

MLCommons Releases MLPerf Inference v5.0 Benchmark Results

April 2, 2025 by staff

Today, MLCommons announced new results for its MLPerf Inference v5.0 benchmark suite, which delivers machine learning (ML) system performance benchmarking. The organization said the results highlight that the AI community is focusing on generative AI ….

Filed Under: AI News Tagged With: AI, AI inference, artificial intelligence, inference, MLCommons, MLPerf

Blaize Received Approval to List its Common Stock and Warrants on Nasdaq

December 28, 2024 by Doug Black

WASHINGTON & EL DORADO HILLS, Calif., Jan 13, 2024 – Blaize, Inc., a provider of artificial intelligence-enabled edge computing solutions, and acquisition company BurTech today announced that they expect to complete their previously announced business combination on January 12, 2025. The combined company will be named “Blaize Holdings, Inc.” and its common stock and warrants […]

Filed Under: AI News, Main Feature, News Tagged With: AI, AI inference, artificial intelligence, Blaize, Edge AI, edge artificial intelligence, inference, Weekly Newsletter Articles
