Exploiting Hardware-Level Parallelism in the Manticore Hardware-Accelerated RTL Simulator

Before a chip design is turned from a hardware description language (HDL) like VHDL or Verilog into physical hardware, testing and validating the design is an essential step. Yet simulating an HDL design is rather slow, because simulators either use only a single CPU thread or are limited in their multi-threading by the fine-grained concurrency involved. This stems from the strict timing requirements of simulating hardware and the various clock domains that ultimately determine whether a design passes or fails. In a recent attempt to speed up register-transfer level (RTL) simulations like these, Mahyar Emami and colleagues propose a custom processor architecture, called Manticore, that can be used to run an HDL design after nothing more than compiling the HDL source and some processing.


In the preprint paper they detail their implementation, covering the static bulk-synchronous parallel (BSP) execution model that underlies the architecture and associated tooling. Rather than having the simulator (hardware or software) work out the synchronization and communication needs of different elements of the design-under-test at run time, the compiler determines these moments ahead of time. This simplifies the requirements of the Manticore execution units, which are optimized to execute just this simulation task.
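
To make the idea of a static BSP schedule more concrete, here is a minimal Python sketch. It is not Manticore's compiler, instruction set, or tooling: the two toy "cores", the register names (`count`, `acc`, `count_in`) and the `SCHEDULE` table are all hypothetical, chosen only to show how a simulated clock cycle can be split into a local compute phase followed by a communication phase whose transfers were fixed ahead of time.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: each "core" owns a small dict of local registers.
State = Dict[str, int]


def core0_step(local: State) -> State:
    # A 4-bit counter: count <= count + 1 (wraps at 16).
    return {"count": (local["count"] + 1) % 16}


def core1_step(local: State) -> State:
    # An accumulator that adds the counter value received in the previous
    # cycle: acc <= acc + count_in. It never reads core 0's state directly.
    return {"acc": local["acc"] + local["count_in"],
            "count_in": local["count_in"]}


# Static communication schedule, fixed before the simulation starts
# (standing in for what a compiler would emit). Each entry is:
# (source core, source register, destination core, destination register)
SCHEDULE: List[Tuple[int, str, int, str]] = [(0, "count", 1, "count_in")]


def simulate(cycles: int) -> List[State]:
    states: List[State] = [{"count": 0}, {"acc": 0, "count_in": 0}]
    steps: List[Callable[[State], State]] = [core0_step, core1_step]
    for _ in range(cycles):
        # Compute phase: every core works only on its own local state,
        # so in principle all cores could run this step in parallel.
        next_states = [step(s) for step, s in zip(steps, states)]
        # Communication phase: apply the precomputed transfers. Finishing
        # this loop acts as the barrier that ends the simulated cycle.
        for src, src_reg, dst, dst_reg in SCHEDULE:
            next_states[dst][dst_reg] = next_states[src][src_reg]
        states = next_states
    return states


if __name__ == "__main__":
    print(simulate(4))
```

Because the transfer list never changes during the run, the simulation loop needs no locks or run-time dependency tracking. That is the property the paragraph above attributes to the static BSP model, shown here at a toy scale.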


Although an ASIC version of Manticore would obviously be significantly faster than the FPGA version the researchers used in this implementation, the 475 MHz, 225-core implementation on a Xilinx UltraScale+ FPGA (Alveo U200 card) compared favorably against the Verilator simulator, which was run on three x86 systems ranging from an Intel Core i7-9700K to an AMD EPYC 7V73X. Best of all ..
