Flycast world’s first Dreamcast emulator to receive Vulkan renderer – available later today on RetroArch with nightly core!

The first Dreamcast emulator ever to get a Vulkan renderer. Completely open-source, written from scratch, and available later today on RetroArch. Update your core later today to get the latest version with the Vulkan renderer! Available for Android, Windows, and Linux.

For more information, read down below…

Wait … a new what?

The renderer is the emulator component that emulates the Dreamcast/Naomi GPU chip, namely the PowerVR Series2. It was one of the first generations of 3D chips, with only a fixed pipeline. The PowerVR2 supported DirectX 6.0, which was the graphics API used by Windows CE games on the Dreamcast. Successors of the PowerVR2 would later be found in the original iPhone and iPod Touch (PowerVR4), iPhone 4 and iPad (PowerVR5) and many many other mobile devices. Now the Dreamcast GPU is more than 20 years old. You might think it should be easy to emulate such an ancient chip on modern hardware, right? Well … yes for the most part. But there’s one thing that the PVR2 does really well, and it’s order-independent transparency. And even today this is still not trivial to implement even on modern hardware. You won’t find this feature in Open GL or DirectX, and you need a pretty recent version of these APIs to be able to emulate it, which means manually sorting individual pixels from back to front and blending them together, and doing this for each visible pixel on the screen!

OK, but what about Vulkan?

For those of you who are not familiar with Vulkan, it is a relatively new 3D graphics API, basically a follow-on to Open GL. Open GL is quite permissive and has little declarative constraints. You just throw stuff at the driver when you need to and the driver’s job is to figure it out. The downside of this is that the Open GL driver often needs to guess what you’ll do next and he might not guess right. And when it doesn’t, performance suffers. Vulkan is radically different in that everything must be declared in advance, in great details, and there’s very little room for improvisation on the part of the driver. Vulkan works much closer to the hardware than Open GL does. So you can expect less overhead, more reliability and better performance in many cases.

The downside of Vulkan is the sheer amount of code you have to write to display just a single triangle on the screen, let alone a full-featured Dreamcast renderer. Last time I checked, the Vulkan renderer had 47 source files and around 7800 lines of code. (The Open GL renderer only has around 6000 lines of code.)

So what do we get?

As with Open GL, there are actually two Vulkan renderers: The first one uses a traditional single render pass with per-triangle or per-mesh sorting done by the CPU. The second one is capable of order-independent transparency with per-pixel sorting performed by the GPU. It uses multiple subpasses to compose the final image: the first subpass draws the opaque geometry depth map and the shadows casted on them. The second subpass renders all opaque geometry to a temporary color framebuffer, and transparent geometry into a huge pixel linked list. The last subpass then renders shadow volumes for translucent geometry. And finally all pixels are sorted and blended together using the opaque framebuffer of the previous subpass as background.

The next Flycast nightly build will have support for Vulkan on all major platforms: Windows, Linux and Android. In terms of features, the new renderer should be on par with the Open GL renderer, with the notable exception of lightgun crosshair and VMU screens display, which will be added soon. However, expect to find bugs and crashes here and there as is expected with any new piece of software. Also it may be slower than Open GL depending on many factors such as GPU, driver version, game being played, etc. We’ll do our best to fix any issue encountered and overcome performance issues. When reporting problems, make sure to indicate what GPU you’re using and the Vulkan driver version. It is highly recommended to upgrade your drivers to the latest version available, especially on mobile.

Here is a showcase of the differences between the basic and OIT renderers. By the way, this also applies to Open GL.

Here the hair of these ladies show glitching triangles in basic mode.

In Speed Devils 2, the shadow volumes (called “Modifier Volumes” in Dreamcast literature) are used in a special way to project headlights. This is only possible by using deferred rendering.

In this example, look at Ryo’s cast shadow on his left. There is a fog effect applied to this scene, but the basic single pass renderer cannot apply a fog effect to the cast shadow. In the OIT renderer, the shadow is perfectly fogged.

In Jet Set Radio, the character is composed of translucent polygons, and these polygons can be shadowed as well. Only the OIT renderer can properly render shadows cast on translucent polygons.

To finish, here is another seldom used GPU features: secondary accumulation buffer. It can be used to do tri-linear filtering and other effects. This is Evil Dead – Hail to the King and it is clear that the basic renderer is having a hard time here.

Final thoughts

Yes, the per-pixel alpha transparency option which to this date was only available on Windows and Linux now also works on Android with the Vulkan renderer. However, keep in mind that per-pixel alpha sorting is heavily memory bandwidth-limited. It has been tested on a Mali G76 (Samsung Galaxy S10+) – and it runs acceptably at 640×480 or 800×600 resolution. Your mileage may vary depending on the GPU power inside your Android phone. We recommend you to find that sweet spot which works best for you, and if results are too bad with per-pixel alpha enabled, turn back to per-triangle.

Some clear advantages of the Vulkan renderer is that frame pacing is much better than the OpenGL renderer, and performance is far higher when it comes to texture uploads and/or framebuffer manipulation. For example – when you KO an opponent in Dead Or Alive 2 against an explosive wall – the framerate would often tumble a bit on GL, but no such issues with Vulkan. Similar improvements can be noticed in Virtua Tennis 2 – when certain framebuffer effects happen after a replay, performance is much more steady with Vulkan thanks to the high degree of parallelism.

With Vulkan, we have heard reports that virtually all sound crackles and stutters are gone. That’s because with vulkan you choose the sync points where you wait. In GL the driver has to guess and sometimes it fails. These effects are using render to texture, and with OpenGL this creates sync issues.

RetroArch – Hardware video decoding – coming soon!

As you may well know, RetroArch has embedded video player support on platforms such as Windows, and Linux. Just like VLC, Kodi, mpv and other video players out there, it accomplishes this by leveraging the ffmpeg project.

Up until now, all video decoding was performed entirely in software. This means that the CPU has to do all the decoding instead of being able to delegate it to the GPU. This meant that on some systems, video playback could be too slow if the CPU was too underpowered. This so happens to be the case on many ARM SoC devices out there, such as the Raspberry Pi and Odroids.

Now, we finally support hardware video decoding through ffmpeg’s own APIs! This should really help on systems where there is a CPU bottleneck and the GPU happens to support hardware decoding. Whether or not you are able to decode 1080p, 1440p or 4K on hardware depends entirely on your GPU’s capabilities however.

In addition to hardware decoding, frame based multithreading is now enabled for SW based video decoders, but actual effectiveness hasn’t been proven yet.

The core switches back to SW based decoding if the HW based decoding couldn’t be initialized.

The following backends have been tested:

  • DXVA2 [Windows]
  • D3D11VA [Windows] (it will use this when using the D3D11 driver
  • VDPAU [Linux] (Tested on an AMD System with VDPAU to VAAPI layer)
  • VAAPI

We have performed the following tests so far:

  • Nvidia Titan XP/RTX 2080 Ti
  • – Can hardware decode 1080p/1440p/4K content.

  • Intel UHD 630
  • – Can hardware decode 1080p/1440p/4K content.

  • AMD Radeon R9 290x
  • – This is a slightly older card from 2014. It only supports 1080p hardware video decoding at best. 1440p and 4K content therefore falls back to software video decoding. This means that if your CPU is not up to the task, you won’t be able to run this content at fullspeed.

As a stress test video, we picked a 4K video (3840×2160) with a total bitrate of 29561 kb/s (h264/AVC1, YUV420P), running at 30 frames per second. The CPU we’re using for this test is an Intel Core i7 7700k. With such a CPU, we don’t really have a CPU bottleneck and we are merely GPU bound when it comes to rendering the content.

With software decoding (the current default in RetroArch) – we averaged around 55fps with the 2080 Ti. Our CPU load averages around 15% with GPU load averaging around 11%.

With hardware decoding (the 2080 Ti defaults to DXVA2 for this test) – we averaged 77fps with the 2080 Ti. Our CPU load averages around 11% with GPU load averaging around 20%.

NOTE: The above is long since out of date – the same video is now 256fps with hardware decoding and 224fps with threaded video decoding at an automatically defined amount of threads. Quite the improvement from 55fps I’m sure you’ll agree.

What remains to be done

We will still need to gather tests for the following backends:

  • Cuda
  • Videotoolbox
  • DRM
  • OpenCL
  • Mediacodec

Future plans

In short, we hope this will really help out RetroArch’s video playback capabilities not only on desktops such as Windows and Linux, but also on the ARM SoCs, and in specific our own Linux distribution, Lakka.

But hardware video decoding is not the end-all-be all. There is certainly a lot of room for improvement for future speedups, and these are being investigated. But that’s the subject of another blog post somewhere down the line.

For now, rest assured that big things are coming up for the next version of RetroArch!