Author Archive TEXTAROSSA Project


Innovative two-phase cooling system

An innovative two-phase cooling system has been developed within the EuroHPC Joint Undertaking (EuroHPC JU) project TEXTAROSSA. This innovative solution addresses the pressing cooling challenges faced by HPC systems, enabling more efficient heat dissipation and a significant reduction in energy consumption.

Curious to see how it works? Check out our video by Elisabetta Boella (E4 HPC Product Specialist), for a closer look at this groundbreaking technology.


TEXTAROSSA @ HiPEAC Conference by E4

E4 Computer Engineering presented TEXTAROSSA at the HiPEAC international conference in Munich.


TEXTAROSSA presented at Supercomputing 2023 by PSNC

PSNC presented TEXTAROSSA at Supercomputing conference 2023.



Our TEXTAROSSA poster (you can find a version here) has been presented at the 2nd Italian Conference on Big Data and Data Science (ITADATA) by E4 during the even in Naples (Italy).


New EuroHPC website

The new EuroHPC website on European Exascale projects is available: Don’t forget to check the ExaBlog!


TEXTAROSSA @ HPCAI Advisory Council

The TEXTAROSSA project has been presented by E4 at the HPCAI Advisory Council (Lugano, Svizzera – 03-06/04/2023).


Streaming Programming Models

One of the aims of the TEXTAROSSA project is to define and develop a stream-based programming paradigm able to integrate vertically with the heterogeneous TEXTAROSSA node.

Towards this activity the TEXTAROSSA project leverages the FastFlow [1] C++ header-only library which provides application designers abstractions for parallel programming (e.g., Pipeline, ordered Task-Farm, Divide & Conquer, Parallel-For-Reduce, Macro Data-Flow) and a carefully designed run-time system. At the lower layer, the library defines so-called Building Blocks (BB), i.e., recurrent data-flow compositions of concurrent activities working in a streaming fashion, which represent the primary abstraction layer to build FastFlow parallel patterns and streaming topologies [2, 3]. A parallel application is conceived by adequately selecting and assembling a small set of BBs modelling data and control flows. The BBs can be combined and nested in different ways forming either acyclic or cyclic concurrency graphs, where nodes are FastFlow concurrent entities and edges are communication channels.

Figure 1: Example of a FastFlow application: the communication topology is described as a composition of building blocks in a data-flow graph.

Within the project the aim is to extend FastFlow with a new offloader node able to delegate the computation to an FPGA accelerator hosted on the TEXTAROSSA node. This is done by programmatically loading the desired compute kernel onto the FPGA and streaming the input/output data to/from the FPGA accelerator. This will allow to have FastFlow applications which leverage seamlessly heterogeneous compute resources by simply using traditional nodes, using CPU threads, and offloader nodes, delegating work to accelerators, in the same concurrency graph.

Figure 2: FastFlow node comparison: Traditional vs. Offloader.

The main challenge in designing the offloader node is how to maximize the performance gain from using the accelerator card. Indeed, while the accelerator is expected to compute the results of the compute kernel substantially faster than the CPU, the communication with the card causes extra delays. The aim is to design schemes which can minimize/hide the impact of the extra time needed to send/receive data from the accelerator card.

Works Cited
[1] M. Aldinucci, M. Danelutto, P. Kilpatrick and M. Torquati, “FastFlow: High-level and Efficient Streaming on Multi-core,” in Programming Multi-core and Many-core Computing Systems, John Wiley & Sons, Ltd, 2017, pp. 261-280.
[2] T. Massimo, Harnessing Parallelism in Multi/Many-Cores with Streams and Parallel Patterns, University Of Pisa, 2019.
[3] M. Aldinucci, S. Campa, M. Danelutto, P. Kilpatrick and M. Torquati, “Design patterns percolating to parallel programming framework implementation,” International Journal of Parallel Programming, vol. 42, no. 6, pp. 1012-1031, 2013.

Leading Partner: CINI/UNITO


Webinar “PATC: Heterogeneous Programming on FPGA with OmpSs@FPGA”

Carlo Alvarez, BSC, will present on March 24, 2023, 09.00-17.30 a webinar entitled “PATC: Heterogeneous Programming on FPGA with OmpSs@FPGA” in the context of the TEXTAROSSA project.



Approximate computing for AI

Machine Learning in general, and Deep Neural Networks (DNNs) in particular, have recently been shown to tolerate low-precision representations of their parameters.

This represents an opportunity to accelerate computations, reduce storage, and, most importantly, reduce power consumption. At the edge and on embedded devices, the latter is critical.

In neural networks, two game-changing factors are developing.

The RISC-V open instruction set architecture (ISA) enables for the seamless implementation of custom instruction sets. Second, several novel formats for real number arithmetic exist. In TextaRossa we aim to merge these two major components by developing an accelerator for mixed precision, employing one or more promising low-precision formats (e.g., Posit, bfloat). We aim to develop an enhancement to an original RISC-V ISA that allows for the computation such formats as well as the interoperability of these formats alongside the standard 32-bit IEEE Floats (a.k.a. binary32) or traditional fixed-point formats to provide a compact representation of real numbers with minimal to no accuracy deterioration and with a compression factor of 2 to 4 times. In TextaRossa we have two main paths in exploiting low-precision format.

The first one is the design by UNIPISA of an IP core for a lightweight PPU (Posit Processing Unit) to be connected to a 64b RISC-V processor in the form of a co-processor with an extension of the Instruction Set Architecture (ISA). We focus on the compression abilities of posits by providing a co-processor with only conversions in mind, called light PPU. We can convert binary32 floating point numbers to posit numbers with 16 and 8 bits. This co-processor can be paired with a RISCV-V core that already has a floating-point unit (e.g., Ariane 64b RISC-V) without interrupting the existing pipeline. On the other hand, we can use this unit to enable ALU computation of posit numbers with the posit-to-fixed conversion modules on a RISCV-V core that does not support floating-point.

The second one is the design by UNIPISA of a complete Posit Processing Unit (namely Full PPU, FPPU) that can be connected to a RISC-V processor core with a further extension of the ISA, adding the capabilities of complete posit arithmetic to such core. This approach enables us to deliver efficient real number arithmetic with 8 or 16 bits (thus reducing the bits used by a factor 4 or 2, compared to binary32 numbers), even in low-power processors that are not equipped with a traditional floating-point unit. Low power performance of the PPU coprocessors has been also validated by UNIPISA and POLIMI.

Leading partner: UNIPI


  1. M. Cococcioni, F. Rossi, E. Ruffaldi and S. Saponara, “A Lightweight Posit Processing Unit for RISC-V Processors in Deep Neural Network Applications,” in IEEE Transactions on Emerging Topics in Computing, vol. 10, no. 4, pp. 1898-1908, 1 Oct.-Dec. 2022, doi: 10.1109/TETC.2021.3120538.
  2. Michele Piccoli, Davide Zoni, William Fornaciari, Marco Cococcioni, Federico Rossi, Emanuele Ruffaldi, Sergio Saponara, and Giuseppe Massari. “Dynamic Power consumption of the Full Posit Processing Unit: Analysis and Experiments”. In: PARMA-DITAM 2023. Open Access Series in Informatics (OASIcs). Dagstuhl, Germany, 2023, to appear.

TEXTAROSSA won the Favorite Zany Acronym Award

The TEXTAROSSA project won the superlative award for the Favorite Zany Acronym by HPC Wire. Not the most important of the scientific achievements, but for sure funny!

Read the full story here.