October 12, 2021 – Database company Kinetica now offers its database as a service on the Microsoft Azure cloud platform. The service is designed to give organizations real-time contextual analysis and location intelligence on massive data sets while reducing computing infrastructure and costs. Kinetica’s vectorized database is used to analyze data from sensors and machines in real time. For […]
The Role of Middleware in Optimizing Vector Processing
This whitepaper delves into the world of unstructured data and describes some of the technologies, especially vector processors and their optimization software, that play key roles in solving the problems that arise as a result of the rapidly growing volume of data generated globally.
Improving HPC Performance with the Roofline Model
“When we are optimizing, our objective is to determine which hardware resource the code is exhausting (there must be one, otherwise it would run faster!), and then see how to modify the code to reduce its need for that resource. It is therefore essential to understand the maximum theoretical performance of that aspect of the machine, since if we are already achieving the peak performance, we should give up or choose a different algorithm.”
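The roofline bound behind this advice can be written as attainable performance = min(peak compute, arithmetic intensity × peak memory bandwidth). The minimal Python sketch below assumes that model; the function names and the machine figures in the usage lines (1000 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical and do not describe any particular processor.

```python
def attainable_gflops(peak_gflops, peak_bw_gbs, arithmetic_intensity):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity is FLOPs performed per byte moved to or from memory.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, arithmetic_intensity * peak_bw_gbs)


def limiting_resource(peak_gflops, peak_bw_gbs, arithmetic_intensity):
    """Name the resource that caps performance under the roofline model."""
    ridge_point = peak_gflops / peak_bw_gbs  # intensity where the two roofs meet
    if arithmetic_intensity >= ridge_point:
        return "compute"
    return "memory bandwidth"


def fraction_of_roof(achieved_gflops, peak_gflops, peak_bw_gbs, arithmetic_intensity):
    """How close a measured kernel is to its roofline bound (1.0 = at the roof)."""
    return achieved_gflops / attainable_gflops(peak_gflops, peak_bw_gbs, arithmetic_intensity)


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
# STREAM-style triad a[i] = b[i] + s * c[i] in double precision:
# 2 FLOPs per 24 bytes (two 8-byte loads, one 8-byte store).
triad_ai = 2 / 24
print(attainable_gflops(1000.0, 100.0, triad_ai))  # about 8.3 GFLOP/s
print(limiting_resource(1000.0, 100.0, triad_ai))  # memory bandwidth
```

A kernel whose `fraction_of_roof` is already close to 1.0 is the case the quote describes: tuning cannot push it past that roof, so the remaining option is an algorithm with a higher arithmetic intensity.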
Video: Speed Your Code with Intel Parallel Studio XE
“Modern processors perform their best with parallel code that’s both vectorized and threaded, which can run more than 100 times faster than serial code. So how can you accomplish this more easily through parallel programming? Enter Parallel Studio XE, a suite of tools that simplifies and speeds the design, building, tuning, and scaling of applications with the latest code modernization methods.”
Vectorization Now More Important Than Ever
Vectorization, the hardware optimization technique synonymous with early vector supercomputers like the Cray-1 (1975), has reappeared with even greater importance than before. Today, more than 40 years later, the AVX-512 vector instructions in the most recent many-core Intel Xeon and Intel® Xeon Phi™ processors can increase application performance by up to 16x for single-precision codes, because each 512-bit register holds sixteen 32-bit floating-point values.
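The 16x figure is the ideal per-instruction gain: one 512-bit register holds 16 single-precision (4-byte) values, so one vector instruction does the work of 16 scalar ones. The small Python sketch below only does that arithmetic; the function name is hypothetical, and measured speedups are usually lower because of memory bandwidth limits, loop remainders, and code that does not vectorize.

```python
def simd_lanes(register_bits, element_bytes):
    """Elements one vector register holds, i.e. the ideal per-instruction speedup."""
    element_bits = element_bytes * 8
    if register_bits % element_bits:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


# Register widths of successive x86 vector extensions.
for isa, bits in [("SSE", 128), ("AVX/AVX2", 256), ("AVX-512", 512)]:
    floats = simd_lanes(bits, 4)   # single precision (4 bytes)
    doubles = simd_lanes(bits, 8)  # double precision (8 bytes)
    print(f"{isa}: {floats} floats, {doubles} doubles per register")
```

The same arithmetic explains why double-precision codes top out at 8x on AVX-512.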
Intel Compilers 18.0 Tune for AVX-512 ISA Extensions
Intel Compilers 18.0 and Intel Parallel Studio XE 2018 tuning software fully support the AVX-512 instructions. By widening the vector registers to 512 bits and doubling their number from 16 to 32, the new instructions and added enhancements let the compiler squeeze more vector parallelism out of applications than before. Compiling an application with the -xCORE-AVX512 option generates an executable that uses these new high-performance instructions.
OpenMP at 20: Moving Forward to 5.0
This year, OpenMP*, the widely used API for shared memory parallelism supported in many C/C++ and Fortran compilers, turns 20. OpenMP is a great example of how hardware and software vendors, researchers, and academia, volunteering to work together, can successfully design a specification that benefits the entire developer community.
The Importance of Vectorization Resurfaces
Vectorization offers potential speedups in codes with significant array-based computations, and these speedups amplify the performance gained through higher-level parallelism using threads and distributed execution on clusters. Key features of vectorizable code include array sizes that can be tuned to match the cache sizes and instruction capabilities of different processors, and stride-1 (unit-stride) accesses within inner loops.
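Whether an inner loop is stride-1 depends on the loop order and the array layout. The sketch below, in Python for illustration, lists the distance in elements between consecutive inner-loop accesses to a row-major (C-order) array; the function name is hypothetical. Fortran arrays are column-major, so there the roles of the two loop orders are reversed.

```python
def inner_loop_strides(rows, cols, inner):
    """Distances (in elements) between consecutive accesses made by the inner
    loop when traversing a row-major rows x cols array.

    inner="j": for i: for j: a[i][j]  (walks along a row)
    inner="i": for j: for i: a[i][j]  (walks down a column)
    """
    def offset(i, j):
        return i * cols + j  # row-major (C-order) flat index

    strides = set()
    if inner == "j":
        for i in range(rows):
            for j in range(1, cols):
                strides.add(offset(i, j) - offset(i, j - 1))
    elif inner == "i":
        for j in range(cols):
            for i in range(1, rows):
                strides.add(offset(i, j) - offset(i - 1, j))
    else:
        raise ValueError("inner must be 'i' or 'j'")
    return strides


print(inner_loop_strides(4, 6, "j"))  # {1}: contiguous, vector-friendly
print(inner_loop_strides(4, 6, "i"))  # {6}: strided by the row length
```

In C, interchanging the loops so that `j` is innermost turns the stride-6 pattern into the stride-1 pattern that compilers can vectorize with contiguous vector loads.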
Let The Compiler Do Its Thing
“In the past, developers would get the best results if a loop was unrolled, that is, if its body was duplicated as many times as needed so that the operations could be performed on full vectors. The number of copies would reflect the hardware that the code was targeted towards. Since the application may have to run on different hardware in the future, code tuned for today’s generation of hardware may perform poorly later. In fact, it is better to let modern compilers do the unrolling.”
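To make the quote concrete, the sketch below shows the kind of hand unrolling it describes, written in Python only to illustrate the transformation (the interpreter gains no speed from it). The function names are hypothetical. The factor of 4 is hard-coded, which is exactly the portability problem the quote raises: four lanes match 256-bit registers holding doubles, but an AVX-512 register holds eight.

```python
def sum_plain(xs):
    """The loop as a developer should write it; the compiler picks the unrolling."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_unrolled4(xs):
    """Hand-unrolled by 4, with one partial sum per copy of the loop body."""
    a0 = a1 = a2 = a3 = 0.0
    n = len(xs)
    main = n - n % 4  # elements covered by full unrolled iterations
    for i in range(0, main, 4):
        a0 += xs[i]      # body duplicated four times,
        a1 += xs[i + 1]  # mimicking four vector lanes
        a2 += xs[i + 2]
        a3 += xs[i + 3]
    total = (a0 + a1) + (a2 + a3)
    for i in range(main, n):  # remainder loop for leftover elements
        total += xs[i]
    return total


data = [float(i) for i in range(10)]
print(sum_plain(data), sum_unrolled4(data))
```

In C or Fortran, the advice amounts to writing the plain loop and letting the compiler choose the unroll factor and vector width for the target instruction set. Splitting a sum across several accumulators changes the order of floating-point additions, so compilers typically apply it to reductions only when the floating-point model permits reassociation.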