
LineShine Is Fastest Supercomputer At Over 2 Exaflops

There is a phenomenon where, as you get older, your sense of scale becomes somewhat fixed in the earlier era that shaped you – things like expecting the Dollar Store to carry items for $1, or to get a burger and fries for less than twenty bucks – or, in this case, thinking of supercomputers as being petaflop-scale machines. That’s not wrong, per se – most of the world’s fastest machines’ benchmarks are best measured in petaflops – but when you’re clocking in at roughly 2,200 of the things, it becomes easier just to say that the LineShine computer can do about 2.2 exaflops. At double precision. With CPUs only. Yes, we are impressed.

Even more impressive is that this machine just debuted in China, which means it was built without the benefit of the latest-and-greatest Western chips, thanks to US sanctions. It’s using a made-in-China LX2 CPU with 304 ARMv9 cores onboard. Well, it’s actually using around 46 thousand of them, but who’s counting?

Each CPU actually consists of two separate compute dies plus onboard high bandwidth memory (HBM) and DRAM – 4GB of HBM and 32GB of DDR5. The 152 ARMv9 CPU cores on each die are all built with Scalable Vector Extensions (SVE) and Scalable Matrix Extensions (SME), so despite the lack of GPUs, LineShine will have no problem doing the sorts of vector processing that is traditional for high-performance computing, given the 13.79 million cores.
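For anyone who does want to count, the figures quoted above can be cross-checked with a few lines of Python. This is only a back-of-the-envelope sketch using the article's own numbers (304 cores per CPU, two dies per CPU, 13.79 million cores in total); the variable and function names are made up for illustration.

```python
# Figures quoted in the article; treat them as approximate.
TOTAL_CORES = 13_790_000  # "13.79 million cores"
CORES_PER_CPU = 304       # ARMv9 cores per LX2 package
DIES_PER_CPU = 2          # two compute dies per package


def cpus_needed(total_cores, cores_per_cpu):
    """Number of CPU packages implied by a total core count."""
    return total_cores / cores_per_cpu


cores_per_die = CORES_PER_CPU // DIES_PER_CPU
cpu_count = cpus_needed(TOTAL_CORES, CORES_PER_CPU)

print(f"{cores_per_die} cores per die, roughly {cpu_count:,.0f} CPUs")
```

The result lands a little over 45 thousand packages, which is in the same ballpark as the "around 46 thousand" figure; the small gap is presumably down to rounding in the published numbers.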

On the other hand, the lack of GPUs shows when you change benchmarks – LineShine is number one in the rankings for High Performance Linpack (HPL), but getting outside the 64-bit box, the supercomputer only hits number four on the HPL-MxP mixed-precision benchmark, behind machines that pair their CPUs with accelerators like GPUs or NPUs. That may mollify the American ego, as while their El Capitan was bumped to second place on the HPL list, they can still claim the pole position on HPL-MxP. Which computer is actually more capable depends entirely on what you want to do with it, and neither Lawrence Livermore National Laboratory nor China’s National Supercomputing Centre in Shenzhen advertises its compute queues, though this paper suggests at least one job will be crunching earth observation data.

The definition of a supercomputer has shifted over time, and it’s only a matter of time before LineShine and El Capitan end up on the auction block, like other supercomputers before them. We might question it when it comes to desktops, but for institutional HPC, no amount of computing ever seems to be enough.


The Challenges Of Simulating A Human Brain On A Supercomputer

It’s quite the understatement to say that at this point in time we don’t quite understand how even the tiniest brain works exactly. Much of this is due to the sheer complexity and scale of these little biological marvels: with the human brain packing billions of neurons and their associated supportive scaffolding into about 1.4 kilograms of gooey pink-white mass, the sheer connectivity density is more than we can reasonably hope to measure in-situ. Hence the attempts to create digital simulations of small sections of such brains, a process that’s making gradual progress.

Most recently we have been mapping the neurons and their connections in the brain of the humble fruit fly, D. melanogaster. Despite their brains being minuscule, with only about 140,000 neurons and 50 million connections, we’re not quite at the level where we can have a simulated fruit fly brain spark to life. This should probably give us some hints as to the sheer complexity of mapping the human brain, never mind simulating even a small part like a cubic millimeter of the temporal cortex with about 57,000 cells and 150 million synapses.

Even once you have all the connectome data of such a bit of brain, it’s not like you can just toss it onto a supercomputer and expect a meaningful simulation. All supercomputers today are massively parallel, meaning thousands of networked computers. The computing task has to be split up across them, and communication between nodes has to be restricted as much as possible so that nodes are not left starved while waiting on each other.
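To get a feel for why the splitting matters, here is a minimal Python sketch. It is a toy model and not how any real brain simulator works: neurons are just integer indices, the "connectome" is randomly generated, and the helper names (`block_partition`, `cross_node_fraction`, `random_synapses`) are hypothetical. It assigns neurons to nodes in contiguous blocks and measures what fraction of synapses end up crossing a node boundary, since every one of those implies network traffic.

```python
import random


def block_partition(num_neurons, num_nodes):
    """Assign neuron i to a node in contiguous blocks; returns the owner list."""
    block = -(-num_neurons // num_nodes)  # ceiling division
    return [i // block for i in range(num_neurons)]


def cross_node_fraction(synapses, owner):
    """Fraction of synapses whose two endpoints live on different nodes."""
    if not synapses:
        return 0.0
    cut = sum(1 for pre, post in synapses if owner[pre] != owner[post])
    return cut / len(synapses)


def random_synapses(num_neurons, count, locality, rng):
    """Toy connectome: each synapse links a neuron to one 1..locality indices away."""
    synapses = []
    for _ in range(count):
        pre = rng.randrange(num_neurons)
        post = (pre + rng.randint(1, locality)) % num_neurons
        synapses.append((pre, post))
    return synapses


rng = random.Random(0)
owner = block_partition(1000, 8)
local = random_synapses(1000, 5000, 10, rng)    # mostly short-range wiring
spread = random_synapses(1000, 5000, 500, rng)  # long-range wiring

print(f"short-range wiring: {cross_node_fraction(local, owner):.1%} cross-node")
print(f"long-range wiring:  {cross_node_fraction(spread, owner):.1%} cross-node")
```

In this toy setup the short-range wiring keeps almost all synapses inside one node, while the long-range wiring sends most of them across the network. Real connectomes are far messier, which is part of why finding a good partition is a problem in its own right.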



NextSilicon’s Maverick-2: The Future Of High-Performance Computing?

A few months back, Sandia National Laboratories announced they had acquired a new supercomputer. It wasn’t the biggest, but it still offered in their eyes something unique. This particular supercomputer contains NextSilicon’s much-hyped Maverick-2 ‘dataflow accelerator’ chips. Targeting the high-performance computing (HPC) market, these chips are claimed to hold a 10x advantage over the best GPU designs.

NextSilicon Maverick-2 OAM-2 module. (Credit: NextSilicon)

The strategy here appears to be somewhat of a mixture between VLIW, FPGAs and Sony’s Cell architecture, with a dedicated compiler that determines the best mapping of a particular calculation across the compute elements inside the chip. Naturally, the exact details about the internals are a closely held secret by NextSilicon and its partners (like Sandia), so we basically have only the public claims and PR material to go by.

Last year The Register covered this architecture along with a more in-depth look. What we can surmise from this is that it should perform pretty well for just about all applications, except for single-threaded performance. Of course, as a dedicated processor it cannot do CPU things, which is where NextSilicon’s less spectacular RISC-V-based CPU comes into the picture.

What’s apparent from glancing at the product renders on the NextSilicon site is that these Maverick-2 chips have absolutely massive dies, so they’re absolutely not cheap to manufacture. Whether they’ll make more of a splash than Intel’s Itanium or NVIDIA’s brute force remains to be seen.


A Gentle Introduction To Fortran

Originally known as FORTRAN, but written in lower case since the 1990s with Fortran 90, this language was developed initially by John Backus as a way to make writing programs for the IBM 704 mainframe easier. The 704 was a 1954 mainframe with the honor of being the first mass-produced computer that supported hardware-based floating point calculations. This functionality opened it up to a whole new dimension of scientific computing, with use by Bell Labs, US national laboratories, NACA (later NASA), and many universities.

Much of this work involved turning equations for fluid dynamics and similar into programs that could be run on mainframes like the 704. This translating of formulas used to be done tediously in assembly languages before Backus’ Formula Translator (FORTRAN) was introduced to remove most of this tedium. With it, engineers and physicists could focus on doing their work and generating results rather than deal with the minutiae of assembly code. Decades later, this is still what Fortran is used for today, as a domain-specific language (DSL) for scientific computing and related fields.

In this introduction to Fortran 90 and its later updates we will be looking at what exactly it is that makes Fortran still such a good choice today, as well as how to get started with it.



FLOSS Weekly Episode 834: It Was Cool In 2006

This week Jonathan chats with Ben Meadors and Rob Campbell about the boatload of software Microsoft just released as Open Source! What’s the motivation, why is the new Edit interesting, and what’s up with Copilot? Watch to find out!



NVIDIA Announces $59 Jetson Nano 2GB, A Single Board Computer With Makers In Mind

NVIDIA kicked off their line of GPU-accelerated single board computers back in 2014 with the Jetson TK1, a $200 USD development system for those looking to get involved with the burgeoning world of so-called “edge computing”. It was designed to put high performance computing in a small and energy efficient enough package that it could be integrated directly into products, rather than connecting to a data center half-way across the world.

The TK1 was an impressive piece of hardware, but not something the hacker and maker community was necessarily interested in. For one thing, it was fairly expensive. But perhaps more importantly, it was clearly geared more towards industry types than consumers. We did see the occasional project using the TK1 and the subsequent TX1 and TX2 boards, but they were few and far between.

Then came the Jetson Nano. Its 128 core Maxwell GPU still packed plenty of power and was fully compatible with NVIDIA’s CUDA architecture, but its smaller size and $99 price tag made it far more attractive for hobbyists. According to the company’s own figures, the number of active Jetson developers has more than tripled since the Nano’s introduction in March of 2019. With the platform accessible to a larger and more diverse group of users, new and innovative applications for machine learning started pouring in.

Cutting the price of the entry level Jetson hardware in half was clearly a step in the right direction, but NVIDIA wanted to bring even more developers into the fray. So why not see if lightning can strike twice? Today they’ve officially announced that the new Jetson Nano 2GB will go on sale later this month for just $59. Let’s take a close look at this new iteration of the Nano to see what’s changed (and what hasn’t) from last year’s model.
