Veryl Simulator: Performance Comparison with Verilator

We have been working on a native Veryl simulator built on the new IR-based analyzer introduced earlier this year. This post shares early performance numbers comparing it against Verilator, the de facto standard open-source SystemVerilog simulator.

Approach

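The Veryl simulator combines two execution backends: Cranelift, whose output is ready quickly enough for the simulation to start with little delay, and GCC, which takes longer to compile but produces a faster, optimized binary.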

In practice the simulation starts running almost immediately on the Cranelift output, and then speeds up mid-run once GCC has finished compiling.
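The hand-off described above can be sketched as a loop that begins on a quickly available step function and switches to an optimized one as soon as a background build reports that it is ready. This is only an illustrative sketch, not the simulator's actual implementation: the names `run_tiered`, `quick_step`, and `build_optimized_step` are hypothetical, and the use of a Python thread to stand in for the GCC compile is an assumption.

```python
import threading


def run_tiered(cycles, quick_step, build_optimized_step):
    """Run `cycles` steps, switching to the optimized step once it is ready.

    quick_step: step function that is available immediately (stands in for
        the Cranelift output).
    build_optimized_step: slow callable that returns a faster step function
        (stands in for the GCC compile). Both names are hypothetical.
    """
    ready = threading.Event()
    slot = {}

    def compile_in_background():
        # Store the result before signalling, so the main loop never sees
        # the event set without the step function being present.
        slot["step"] = build_optimized_step()
        ready.set()

    worker = threading.Thread(target=compile_in_background, daemon=True)
    worker.start()

    state = 0
    step = quick_step
    switched_at = None  # cycle at which the optimized step took over, if any
    for cycle in range(cycles):
        if switched_at is None and ready.is_set():
            step = slot["step"]
            switched_at = cycle
        state = step(state)

    # Let the build finish even if the run ended first; a real tool could
    # keep the artifact for a later run (caching is not modeled here).
    worker.join()
    return state, switched_at
```

Both step functions must compute the same next state, so the result does not depend on when the switch happens; only the time taken to reach it changes.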

Benchmark

We ran a Linux boot (about 30M simulated cycles) on Heliodor, an out-of-order RISC-V core written in Veryl, in 1-, 2-, and 4-core configurations.

The Veryl simulator also supports 4-state simulation. We used 2-state mode for this benchmark because the Verilator version used here is 2-state only.

For each configuration we measured both the first run (no cached artifacts) and the cached run (re-running after the optimized binary has been built), on two machines representing different CPU generations.

Intel Xeon Gold 6434 (Sapphire Rapids, 2023)

AMD Ryzen Threadripper 1950X (Zen 1, 2017)

Observations

The Veryl simulator is faster than Verilator in both modes: substantially on the first run, and more modestly on the cached run. Most edit-compile-run cycles during development are dominated by the first-run number. Once a simulation runs long enough, as in regression sweeps or full OS boots, the cached-run number takes over.
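To see why run length decides which number matters, consider a deliberately simplified model in which a first run pays a fixed build cost before simulating at a constant rate, while a cached run pays only the simulation time. The function names and the figures in the usage note are illustrative placeholders, not measurements from this benchmark, and the model ignores the mid-run backend switch described earlier.

```python
def first_run_seconds(cycles, build_s, cycles_per_s):
    """Wall time of a run that must build before simulating (simplified model)."""
    return build_s + cycles / cycles_per_s


def cached_run_seconds(cycles, cycles_per_s):
    """Wall time of a run that reuses an already built binary."""
    return cycles / cycles_per_s


def build_share(cycles, build_s, cycles_per_s):
    """Fraction of first-run wall time spent on the build rather than simulating."""
    return build_s / first_run_seconds(cycles, build_s, cycles_per_s)
```

With a placeholder build cost of 10 s and a placeholder rate of 1,000,000 cycles/s, `build_share(1_000_000, 10.0, 1_000_000.0)` is about 0.91, while `build_share(100_000_000, 10.0, 1_000_000.0)` is about 0.09. The short run is almost entirely build time, whereas the long run is almost entirely simulation throughput.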

What's next

The simulator is still under active development. We plan to extend the benchmark to other CPU architectures and a wider range of designs, and to publish the benchmark setup so the numbers can be reproduced.