FriendliAI Partners with NVIDIA on Nemotron 3 for Agentic AI Inference

Redwood City, CA – FriendliAI, an AI inference platform company, announced a partnership with NVIDIA to launch the Nemotron 3 model family, available on FriendliAI’s Dedicated Endpoints. Developers can deploy Nemotron 3 models on FriendliAI’s inference platform. Highlights include: Up to 13× faster token generation via hybrid Mamba-Transformer MoE architecture and multi-token prediction (MTP) technique MoE routing […]

d-Matrix and Andes Collaborate on RISC-V Accelerator for AI Inference

ST. LOUIS (SC25) — Nov 17, 2025 – Generative AI inference compute company d-Matrix and Andes Technology, a supplier of RISC-V processor cores, announced that d-Matrix has selected the AndesCore AX46MPV for its next-generation Raptor inference architecture. The companies said the collaboration represents a convergence of memory-centric computing and open-standard processor innovation for AI workloads […]

NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta’s Llama 4 Maverick

NVIDIA said it has achieved a record large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs achieved more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model. NVIDIA said the model is the largest and most powerful in the Llama 4 […]

NeuReality Announces Inference Appliance Is Preloaded with AI Models

Caesarea, Israel – May 14, 2025 – NeuReality announced that its NR1 Inference Appliance now comes preloaded with enterprise AI models, including Llama, Mistral, Qwen, Granite, plus support for private generative AI clouds and on-premise clusters. The company said the appliance is up and running in under 30 minutes and “delivers 3x better time-to-value, allowing customers […]

Rafay Launches Serverless Inference Offering

Sunnyvale, CA – May 8, 2025 – Rafay Systems, a cloud-native and AI infrastructure orchestration and management company, announced general availability of the company’s Serverless Inference offering, a token-metered API for running open-source and privately trained or tuned LLMs. The company said many NVIDIA Cloud Providers (NCPs) and GPU Clouds are already leveraging the Rafay […]

AI Inference: Meta Teams with Cerebras on Llama API

Meta has teamed with Cerebras on AI inference in Meta’s new Llama API, combining Meta’s open-source Llama models with inference technology from Cerebras. Developers building on the Llama 4 Cerebras model in the API can expect speeds up to 18 times faster than traditional GPU-based solutions ….

AI Inference: Meta Collaborates with Cerebras on Llama API

Sunnyvale, CA — Meta has teamed with Cerebras on AI inference in Meta’s new Llama API, combining Meta’s open-source Llama models with inference technology from Cerebras. Developers building on the Llama 4 Cerebras model in the API can expect speeds up to 18 times faster than traditional GPU-based solutions, according to Cerebras. “This acceleration unlocks […]

GigaIO and d-Matrix to Build Inference Platform for Enterprise AI

CARLSBAD, Calif.– Edge-to-core AI platform company GigaIO today announced the next phase of its partnership with d-Matrix to deliver an inference solution for enterprises deploying AI at scale. Integrating d-Matrix’s Corsair inference platform into GigaIO’s SuperNODE architecture creates a solution designed to eliminate “the complexity and performance bottlenecks traditionally associated with large-scale AI inference deployment.” […]

MLCommons Releases MLPerf Inference v5.0 Benchmark Results

Today, MLCommons announced new results for its MLPerf Inference v5.0 benchmark suite, which delivers machine learning (ML) system performance benchmarking. The organization said the results highlight that the AI community is focusing on generative AI ….

Blaize Received Approval to List its Common Stock and Warrants on Nasdaq

WASHINGTON & EL DORADO HILLS, Calif., Jan 13, 2025 – Blaize, Inc., a provider of artificial intelligence-enabled edge computing solutions, and acquisition company BurTech today announced that they expect to complete their previously announced business combination on January 12, 2025. The combined company will be named “Blaize Holdings, Inc.” and its common stock and warrants […]