Goran Adrinek, Lead gameplay programmer
This article describes the work it took to implement mixed reality in our game using our own engine. Besides building on approaches already taken by other teams, we developed some new techniques that solve latency issues and make camera calibration really simple for the user. We haven’t found anyone describing such solutions to these common problems, so our implementation should be interesting both to people new to the subject and to people who have already done mixed reality and want to improve their approach.
How it all started for us
Mixed reality has already proven to be a great way to demonstrate a VR experience outside of VR. Naturally, we wanted to use mixed reality to promote Serious Sam VR: The Last Hope. We started implementing mixed reality two weeks before the EGX 2016 show so we could be featured on the Vive stand, capture some promotional footage, and show off our game in all its glory with the physical minigun controller.
The people from HTC already had some experience with mixed reality video, and they gave us the requirements for their mixing and recording setup. We were supposed to render multiple views from the game: the foreground view, the background view, etc. All of that, plus the chroma-keyed player, would be mixed in the open source software OBS, and videos could be captured there. This approach was described in a great article by Kert Gartner, and that was our starting point for the research and implementation.
Since we use our own Serious Engine, it was fairly straightforward to implement separate rendering of the foreground and background views from a spectator camera that could be controlled by a Vive controller (we went for a moving spectator camera from the start). As soon as we did that, we knew we weren’t happy with the approach, since it suffered from multiple problems:
- There would be sorting issues when enemies, projectiles and particle effects get close to the player.
- Performance would suffer, since we had to render the scene more times than necessary (twice for the eyes, plus at least twice more for the foreground and background spectator views).
- We couldn’t light the player with in-game lighting.
We knew that to solve these problems we had to find a way to put the player inside the game instead of relying on mixing software. For that to work, we needed to capture the video and feed it into our own engine. After some more research, we found that we weren’t the first to come up with that approach. There was a great article by Shaun McCabe that described exactly what we wanted to do.
Getting the player inside the game
In it, Shaun described how they used the example code from Microsoft Media Foundation to capture the video, so we implemented video capture in our engine the same way. It worked nicely with the Logitech HD Webcam C270 we had in our office. The video was not good enough for promo recording, but it was a start to get a prototype working.
The video was uploaded to the GPU, where we used a shader to perform chroma keying the same way it is done in OBS (because people were already familiar with the OBS parameters).
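To illustrate what that keying amounts to per pixel, here is a minimal Python sketch of OBS-style chroma keying. The `similarity`/`smoothness` names mirror the OBS filter parameters; the BT.709-style chroma coefficients are our assumption (the exact matrix depends on the color space of the video feed), and the real thing runs as a pixel shader on the GPU:

```python
import math

def rgb_to_uv(r, g, b):
    # Chroma (U, V) components, BT.709-style coefficients (an assumption;
    # the exact matrix depends on the color space of the camera feed).
    u = -0.1146 * r - 0.3854 * g + 0.5000 * b
    v =  0.5000 * r - 0.4542 * g - 0.0458 * b
    return u, v

def chroma_key_alpha(pixel, key, similarity=0.4, smoothness=0.08):
    """Alpha for one pixel: 0 = keyed out (green screen), 1 = opaque."""
    pu, pv = rgb_to_uv(*pixel)
    ku, kv = rgb_to_uv(*key)
    dist = math.hypot(pu - ku, pv - kv)  # distance in the chroma plane
    if dist <= similarity:
        return 0.0
    if dist >= similarity + smoothness:
        return 1.0
    return (dist - similarity) / smoothness  # soft edge around the silhouette
```

Working in the chroma plane (ignoring luma) is what makes the key tolerant of uneven green screen lighting; `smoothness` gives the soft transition around the player’s outline.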
To make the player from the video visible inside the game, we used a simple technique: rendering a single camera-facing rectangle that was alpha keyed (matching the chroma keying). That rectangle is centered on the player’s position in the scene (determined by the tracked headset). Parts of the scene that aren’t around the player are cut away (the garbage matte technique), which was very useful in our makeshift studio, where the green screen didn’t cover much of the scene.
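The camera-facing rectangle is a plain billboard built from the player’s position and the spectator camera position. A minimal sketch of the corner math, assuming a Y-up world (function names and helpers are ours for illustration, not engine code):

```python
def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def _norm(a):
    l = sum(x * x for x in a) ** 0.5
    return tuple(x / l for x in a)

def billboard_corners(center, cam_pos, width, height, world_up=(0.0, 1.0, 0.0)):
    """Corners of a rectangle centered on the player, facing the spectator camera."""
    to_cam = _norm(_sub(cam_pos, center))
    right = _norm(_cross(world_up, to_cam))   # horizontal edge direction
    up = _cross(to_cam, right)                # vertical edge direction
    hw, hh = width / 2.0, height / 2.0
    def corner(sx, sy):
        return tuple(c + sx * hw * r + sy * hh * u
                     for c, r, u in zip(center, right, up))
    return [corner(-1, -1), corner(1, -1), corner(1, 1), corner(-1, 1)]
```

The quad is rebuilt every frame from the headset position, so the keyed video always faces the (possibly moving) spectator camera.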
Streamlining the camera calibration
Even in the early prototype version, we were really frustrated with the process of manual camera calibration: tweaking the offset parameters (heading, pitch and banking angles, as well as the linear offset) so the position of the physical camera (the one providing the video feed) matched the in-game spectator camera (the one rendering the game world). To make it easy on us, and especially on the people at the trade shows, we came up with an approach that doesn’t require any manual tweaking.
The idea was simple (but it took a while to mature): while viewing yourself in the camera feed, mark five points displayed on the screen by placing the Vive controller so the center of its hole matches the crosshair on the screen, then press the trigger button to take the sample (see image below).
This is repeated once standing closer to the camera and once further away from it. The method yields five rays in 3D space (each ray goes from the far point through the near point taken for the same sample on the screen) for five known positions in screen space. This information is then used to find the camera position, FOV and rotation angles (relative to the attached controller, or relative to the scene origin if an extra controller is not used). The math involved will be described in a separate article, as this approach should be useful to many developers.
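We’ll leave the full derivation to that follow-up article, but the core of recovering the camera position from those rays is a standard least-squares intersection: find the point minimizing the summed squared distance to all five rays. A Python sketch of just that part (our illustration, not the shipped code; FOV and rotation recovery are omitted), using NumPy for the 3x3 solve:

```python
import numpy as np

def intersect_rays(origins, directions):
    """Point minimizing the sum of squared distances to a set of rays.

    Ray i passes through origins[i] along directions[i]."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        # Projector onto the plane perpendicular to the ray direction;
        # distance-to-ray is the length of the projected offset.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```

Each calibration sample contributes one ray (origin at the far point, direction toward the near point); since all five rays converge on the camera’s optical center, the least-squares point recovers it.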
This method was a lifesaver, especially for the people who were supposed to set everything up at the show and were struggling to get even a near match with manual tweaking. If done correctly, by precisely aligning the controller with the on-screen points, it yielded a perfect match in less than a minute! The calibration result can then be verified by simply comparing the size, position and orientation of the Vive controllers rendered in game with the Vive controllers in the video.
The Debut: EGX 2016
With all that implemented at the last possible moment, we impatiently waited for our marketing people and the Vive crew to set everything up at EGX. This proved to be a big problem, because the capture card and camera at the show didn’t work with our video capture module based on Windows Media Foundation! It looked like all that work was going to be for nothing, but we managed to quickly implement a completely new video capture method using the libdshowcapture library. We chose that library because it is used by OBS (which the Vive crew routinely used). The new capture method was implemented in one night of programming frenzy from Zagreb, logged in remotely to the equipment set up on the show floor in Birmingham, and was finished a couple of hours before the start of the show.
The results were magnificent! Lots of people tried our physical minigun (that calls for another article) and were recorded in mixed reality while doing so. That was the mixed reality debut we were hoping for, and we were really happy with the results!
Fixing the video delay
Apart from camera calibration, we encountered another interesting problem: in-game weapons and controllers were rushing ahead of the player’s hands in the video. This was not a problem in sessions involving the physical minigun, since the in-game weapon was not displayed in mixed reality and the physical minigun perfectly followed the physical player. But we knew we wanted to record promotional videos of players wielding the in-game weapons, and that this would be most useful to players who wanted to stream the game after release (not everyone has our physical minigun). Due to the inherent delay of capturing the video and uploading it to the GPU, there was a noticeable lag between the player’s hands and the controllers or weapons in game, and that didn’t look good. It looked even worse when the camera (with the attached controller) was moving.

To fight this problem, we first came up with the idea of delaying the controller input so it matched the video feed. That, of course, made the game feel weird to the player, so instead we delayed only the controller/weapon placement in the mixed reality view! This way, the weapons perfectly matched the player’s movements in the video stream, and it appeared as if the player was really holding the weapons (at least the ones with realistic dimensions). And the delay (around 60 milliseconds in our final setup) was not big enough to notice that the bullets and projectiles are not exactly matched to the delayed weapons.
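The fix can be sketched as a small buffer of timestamped poses: every frame the fresh controller pose goes in, and the mixed reality view reads a pose from roughly 60 ms in the past, interpolating between the two nearest samples. A minimal Python sketch (class and parameter names are our invention for illustration; a real pose would carry orientation too, interpolated with a quaternion slerp):

```python
from collections import deque

class DelayedPoseBuffer:
    """Buffers timestamped positions and replays them with a fixed delay."""

    def __init__(self, delay_s=0.060, max_samples=256):
        self.delay = delay_s
        self.samples = deque(maxlen=max_samples)  # (time, position) pairs

    def push(self, t, pos):
        self.samples.append((t, pos))

    def query(self, now):
        """Position at (now - delay), linearly interpolated between samples."""
        target = now - self.delay
        prev = None
        for t, pos in self.samples:
            if t >= target:
                if prev is None:
                    return pos  # not enough history yet
                t0, p0 = prev
                f = (target - t0) / (t - t0) if t > t0 else 1.0
                return tuple(a + (b - a) * f for a, b in zip(p0, pos))
            prev = (t, pos)
        return self.samples[-1][1] if self.samples else None
```

Only the spectator-view rendering of weapons and controllers would read from such a buffer; the actual gameplay input stays live, so the player feels no lag.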
There was one final problem that took us a while to figure out. Although we applied the same delay compensation to the controller attached to the camera, there was still some offset while the camera was moving. This was really bumming us out for days, until we finally found the cause: the camera had its video stabilizer turned on! So if you’re moving your camera in mixed reality, make sure you disable the in-camera video stabilization and use stabilizer hardware instead, which affects the attached controller as well.
We had come a long way in less than a month since our humble beginnings in mixed reality, in both software and hardware. We’d like to describe our setup, as it may be of help to people starting off in mixed reality. For video capture, we ended up using a Canon EOS 7D Mark II DSLR camera with an 18-135mm lens, connected via HDMI to an Avermedia Live Gamer Extreme external capture card. Videos were recorded on a different machine (for performance reasons) using the Elgato HD60 Pro. Our gaming rig featured an Nvidia GTX 1080 GPU and an Intel Core i7-4790K CPU. We mounted a controller on top of the camera using a 3D printed adapter for the Vive controller, attached to the hot shoe with a hot shoe adapter. The Vive controller attached to the camera was connected via a USB cable, so we could have two free controllers for the player. In moving camera sessions, we used the DSLR Flycam Nano camera stabilizer.
Done for now… What’s next?
It was so much fun implementing all this in our own engine, where we have a lot of freedom to experiment, and we’re very happy with the results. It was also fun experimenting with the videos and seeing ourselves recorded in mixed reality, and we imagine it will be a lot of fun for the players as well. A lot of effort went into developing an easy-to-follow, interactive, step-by-step wizard interface. This should make the setup process easy enough for all the players who want to try out mixed reality in our game and have some kind of camera and a patch of green screen.
Recently, we learned about a method of using a stereo depth camera, which makes it possible to add depth to the player in mixed reality. Now we can’t wait to get our hands on such a camera so we can do even better!