Apache Spark Archives - Inside HPC & AI News | High-Performance Computing & Artificial Intelligence

Databricks Announces Major Contributions to Flagship Open Source Projects

July 2, 2022 by staff

Databricks announced that the company will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation and open source all Delta Lake APIs as part of the Delta Lake 2.0 release. In addition, the company announced MLflow 2.0, which includes MLflow Pipelines, a new feature to accelerate and simplify ML model deployments. Finally, the company introduced Spark Connect, which enables the use of Spark on virtually any device, and Project Lightspeed, a next-generation Spark Structured Streaming engine for data streaming on the lakehouse.

Filed Under: AI News, Google News Feed, Machine Learning, Main Feature, News, Uncategorized Tagged With: Apache Spark, Databricks, lakehouse, MLflow, Weekly Newsletter Articles

GigaOm Radar for Evaluating Data Warehouse Platforms

July 6, 2020 by staff

This new GigaOm Radar report, “GigaOm Radar for Evaluating Data Warehouse Platforms,” provided by our friends over at Vertica, examines the leading platforms in the data warehouse marketplace, describes the fundamentals of the technology, identifies key criteria and evaluation metrics by which organizations can evaluate competing platforms, describes potential technology developments to watch for, and classifies platforms across those criteria and metrics.

Filed Under: Business of HPC, Data Center, Enterprise HPC, Featured, Google News Feed, HPC-AI Hardware, HPC-AI Software, Industry Segments, News, Sponsored Post, Storage, Uncategorized, White Papers Tagged With: AI, Apache Spark, data warehouse, Hadoop, Vertica, Weekly featured Newsletter Articles, Weekly Featured Newsletter Post

GigaOm Radar for Evaluating Data Warehouse Platforms

July 1, 2020

This new GigaOm Radar report, provided by our friends over at Vertica, examines the leading platforms in the data warehouse marketplace, describes the fundamentals of the technology, identifies key criteria and evaluation metrics by which organizations can evaluate competing platforms, describes potential technology developments to watch for, and classifies platforms across those criteria and metrics.

Tagged With: Apache Spark, Data Lake, data warehouse, Hadoop, Spark, Vertica

StreamSets Launches StreamSets Transformer

September 15, 2019 by staff

StreamSets, Inc., provider of the DataOps platform for modern data integration, released StreamSets® Transformer, a simple-to-use, drag-and-drop UI tool for creating native Apache Spark applications. Designed for a wide range of users, including those without specialized skills, StreamSets Transformer enables the creation of pipelines for performing ETL, stream processing and machine-learning operations. Data engineers, scientists, architects and operators gain deep visibility into the execution of Apache Spark while broadening its usage across the business.

Filed Under: AI News, Google News Feed, Main Feature, News, Uncategorized Tagged With: Apache Spark, Weekly Newsletter Articles

Podcast: HPC & AI Convergence Enables AI Workload Innovation

August 25, 2019 by Doug Black

In this Conversations in the Cloud podcast, Esther Baldwin from Intel describes how the convergence of HPC and AI is driving innovation. “On the topic of HPC & AI converged clusters, there’s a perception that if you want to do AI, you must stand up a separate cluster, which Esther notes is not true. Existing HPC customers can do AI on their existing infrastructure with solutions like HPC & AI converged clusters.”

Filed Under: Compute, Enterprise HPC, HPC-AI Hardware, HPC-AI Software, Industry Segments, Machine Learning, News, Podcast, Research / Education, Resources Tagged With: AI, Apache Spark, BigDL, HPC AI convergence, Inferencing, Intel, Intel Select Solutions, Slurm

Accelerate Your Apache Spark with Intel Optane DC Persistent Memory

July 28, 2019 by Doug Black

Piotr Balcer and Cheng Xu from Intel gave this talk at the 2019 Spark+AI Summit. “Intel Optane DC persistent memory breaks the traditional memory/storage hierarchy and scales up the computing server with higher capacity persistent memory. Also it brings higher bandwidth & lower latency than storage like SSD or HDD. And Apache Spark is widely used in the analytics like SQL and Machine Learning on the cloud environment.”

Filed Under: Enterprise HPC, Events, High Performance Analytics, HPC-AI Hardware, HPC-AI Software, Industry Segments, Machine Learning, Main Feature, News, Research / Education, Resources, Storage, Videos Tagged With: AI, Apache Spark, big data, Intel, Intel Optane DC Persistent Memory, Weekly Newsletter Articles

NEC Embraces Open Source Frameworks for SX-Aurora Vector Computing

July 10, 2019 by staff

In this video from ISC 2019, Dr. Erich Focht from NEC Deutschland GmbH describes how the company is embracing open source frameworks for the SX-Aurora TSUBASA Vector Supercomputer. “Until now, with the existing server processing capabilities, developing complex models on graphical information for AI has consumed significant time and host processor cycles. NEC Laboratories has developed the open-source Frovedis framework over the last 10 years, initially for parallel processing in Supercomputers. Now, its efficiencies have been brought to the scalable SX-Aurora vector processor.”

Filed Under: Compute, Enterprise HPC, Events, High Performance Analytics, HPC-AI Hardware, HPC-AI Software, Industry Segments, Machine Learning, Main Feature, News, Research / Education, Resources, Videos Tagged With: AI, Apache Spark, Frovedis, NEC, NEC SX-Aurora, NEC-X, Vector computing, Weekly Newsletter Articles

Deep Learning Open Source Framework Optimized on Apache Spark*

July 9, 2018 by Richard Friedman

Intel recently released BigDL. It’s an open source, highly optimized, distributed, deep learning framework for Apache Spark*. It makes Hadoop/Spark into a unified platform for data storage, data processing and mining, feature engineering, traditional machine learning, and deep learning workloads, resulting in better economy of scale, higher resource utilization, ease of use/development, and better TCO.

Filed Under: HPC-AI Software, Machine Learning, Parallel Programming, Sponsored Post Tagged With: Apache Spark, Deep Learning, Intel BigDL, intel mkl, Intel TEC, Weekly Newsletter Articles

State of the Art Natural Language Processing at Scale

July 5, 2018 by staff

The two-part presentation below from the Spark+AI Summit 2018 is a deep dive into key design choices made in the NLP library for Apache Spark. The library natively extends the Spark ML pipeline APIs, which enables zero-copy, distributed, combined NLP, ML & DL pipelines, leveraging all of Spark’s built-in optimizations.

Filed Under: AI News, Featured, Google News Feed, Machine Learning, News, Uncategorized, Videos Tagged With: Apache Spark, NLP, Weekly Newsletter Articles

Databricks Partners with RStudio To Increase Productivity of Data Science Teams

June 29, 2018 by staff

Databricks, a leader in unified analytics founded by the original creators of Apache Spark™, announced a partnership with RStudio, provider of a free and open-source integrated development environment for R, to increase the productivity of data science teams. The partnership will allow the two companies to seamlessly integrate Databricks’ Unified Analytics Platform with RStudio Server, simplifying R programming on big data.

Filed Under: AI News, Google News Feed, Main Feature, News, Uncategorized Tagged With: Apache Spark, RStudio, Weekly Newsletter Articles
