Measuring Performance MS and FPS

https://catlikecoding.com/unity/tutorials/basics/measuring-performance/

Use game window statistics, the frame debugger, and the profiler.
Compare dynamic batching, GPU instancing, and the SRP batcher.
Display a frame rate counter.
Cycle through functions automatically.
Smoothly transition between functions.

This is the fourth tutorial in a series about learning the basics of working with Unity (https://catlikecoding.com/unity/tutorials/basics/). It is an introduction to measuring performance. We will also add the ability to morph from one function to another to our function library.

This tutorial is made with Unity 2019.4.12f1.
1 profiling Unity
Unity continuously renders new frames. To make anything that moves appear fluid, it has to do this fast enough that we perceive the sequence of images as continuous motion. Typically 30 frames per second (FPS for short) is the minimum to aim for, and 60 FPS is ideal.
These numbers appear often because many devices have a display refresh rate of 60 hertz. You cannot draw frames faster than that without turning VSync off, which will cause image tearing. If a consistent 60 FPS cannot be achieved, then the next best rate is 30 FPS, which is once per two display refreshes. Halving the rate once more gives 15 FPS, which is insufficient for fluid motion.
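To make the refresh-rate reasoning concrete, here is a minimal Python sketch under a simplified VSync model: each frame stays on screen for a whole number of refresh intervals, so a frame that misses a refresh waits for the next one. The function name `displayed_fps` is hypothetical, and real systems (triple buffering, variable refresh rate displays) can behave differently.

```python
import math


def displayed_fps(frame_ms, refresh_hz=60.0):
    """Frame rate a steady frame time settles at under a simplified VSync model."""
    refresh_ms = 1000.0 / refresh_hz
    # A frame that is not ready at a refresh must wait for the next one,
    # so it occupies a whole number of refresh intervals.
    refreshes = max(1, math.ceil(frame_ms / refresh_ms))
    return refresh_hz / refreshes


for frame_ms in (10.0, 20.0, 40.0, 60.0):
    print(f"{frame_ms:5.1f} ms per frame -> {displayed_fps(frame_ms):.0f} FPS")
```

Under this model a 60 Hz display shows 60, 30, 20, or 15 FPS for the four frame times above; 20 FPS corresponds to one frame per three refreshes.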

Whether a target frame rate can be achieved depends on how long it takes to process individual frames. To reach 60 FPS we must update and render each frame in less than 16.67 milliseconds. The time budget for 30 FPS is 33.33 ms per frame.
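The budgets follow from dividing one second by the target frame rate. A small Python sketch of that arithmetic, with a hypothetical helper `within_budget` that checks a measured frame time against a target:

```python
def frame_budget_ms(target_fps):
    """Milliseconds available per frame at a given target frame rate."""
    return 1000.0 / target_fps


def within_budget(frame_ms, target_fps):
    """True if a frame that took frame_ms milliseconds fits the target's budget."""
    return frame_ms < frame_budget_ms(target_fps)


for fps in (60, 30):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
# 60 FPS -> 16.67 ms per frame
# 30 FPS -> 33.33 ms per frame

print(within_budget(20.0, 60), within_budget(20.0, 30))  # False True
```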

When our graph is running we can get a sense of how smooth its motion is by simply observing it, but this is a very imprecise way to measure its performance. If motion appears smooth then it probably exceeds 30 FPS, and if it appears to stutter it’s probably below that. It might also be smooth one moment and stutter the next, due to inconsistent performance. This can be caused by variation in our app, but also by other apps running on the same device. If we barely reached 60 FPS then we could end up going back and forth between 30 FPS and 60 FPS rapidly, which would feel jittery despite a high average FPS. So to get a good idea of what’s going on we have to measure performance more precisely. Unity has a few tools to help us with this.
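The gap between a good average and smooth motion can be shown with a few lines of Python. The frame trace below is hypothetical, alternating between frames timed for 60 FPS and frames that fall back to 30 FPS; the average looks acceptable while individual frame times swing by a full refresh interval.

```python
def summarize(frame_times_ms):
    """Return (average FPS, shortest frame ms, longest frame ms) for a frame trace."""
    total_seconds = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_seconds
    return average_fps, min(frame_times_ms), max(frame_times_ms)


# Hypothetical trace: 60 frames alternating between 60 FPS and 30 FPS timing.
trace = [1000.0 / 60, 1000.0 / 30] * 30
average, shortest, longest = summarize(trace)
print(f"average {average:.1f} FPS, frames between {shortest:.2f} and {longest:.2f} ms")
# average 40.0 FPS, frames between 16.67 and 33.33 ms
```

Reporting the longest frame alongside the average is what exposes the jitter; the average alone hides it.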

1.1 game window statistics
The game window has a Statistics overlay panel that can be activated via its Stats toolbar button. It displays measurements taken for the last rendered frame. It doesn’t tell us much, but it’s the simplest tool that we can use to get an indication of what’s going on. While in edit mode the game window usually updates only sporadically, after something has changed. In play mode it refreshes all the time.

The following statistics are for our graph with the torus function and resolution at 100, using the default render pipeline, which I’ll refer to as DRP from now on. I have VSync turned on for the game window, so refreshes are synchronized with my 60 Hz display.

The statistics show a frame during which the CPU main thread took 23.6 ms and the render thread took 27.8 ms. You’ll likely get different results, depending on your hardware. In my case it suggests that the entire frame took 51.4 ms to render, but the statistics panel reported 36 FPS, matching the render thread time. The FPS indicator seems to take the worse of the two and assume that it matches the frame rate. This is an oversimplification that only takes the CPU side into account, ignoring the GPU and display. The real frame rate is likely lower.
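The reported numbers are consistent with the panel deriving its FPS from the slower of the two CPU threads. That is an inference from these measurements, not documented Unity behavior; the Python sketch below only redoes the arithmetic, and `stats_panel_fps` is a hypothetical name.

```python
def stats_panel_fps(main_thread_ms, render_thread_ms):
    """Guess at the panel's estimate: assume the slower CPU thread sets the frame rate."""
    return 1000.0 / max(main_thread_ms, render_thread_ms)


main_ms, render_ms = 23.6, 27.8  # measurements from the statistics panel
print(f"panel-style estimate: {stats_panel_fps(main_ms, render_ms):.0f} FPS")
print(f"both threads summed:  {main_ms + render_ms:.1f} ms "
      f"({1000.0 / (main_ms + render_ms):.1f} FPS if they ran one after the other)")
# panel-style estimate: 36 FPS
# both threads summed:  51.4 ms (19.5 FPS if they ran one after the other)
```

Neither figure includes GPU time or the wait for the display refresh.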

What’s a thread?
A thread is an independent sequence of execution within a process, in this case the Unity app. Multiple threads can run in parallel at the same time. The statistics show how long Unity’s main and render threads were running during the last frame.

Besides the durations and FPS indication, the statistics panel also displays various details about what was rendered. There were 30,003 batches, and apparently zero saved by batching. These are draw commands sent to the GPU. Our graph contains 10,000 points, so it appears that each point got rendered three times: once for a depth pass, once for shadow casters (listed separately as well), and once to render the final cube. The other three batches are for additional work like the sky box and shadow processing that is independent of our graph. There were also six set-pass calls, which can be thought of as the GPU getting reconfigured to render in a different way, like with a different material.
If we switch to URP the statistics are different. It renders faster, and in this case the main CPU thread is slower than the render thread. It’s easy to guess why: there are only 20,001 batches, 10,002 fewer than for DRP, which amounts to one draw less per point. That’s because URP doesn’t use a separate depth pass for directional shadows. The statistics report zero shadow casters, but that’s because the panel can only show this number for DRP.
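Both batch counts can be reproduced with simple per-point bookkeeping. In this Python sketch the pass breakdown comes from the text; the single extra URP batch is taken from the reported total without knowing what it is for.

```python
def expected_batches(resolution, passes_per_point, extra_batches):
    """Draw commands when every point is drawn once per pass, without any batching."""
    points = resolution * resolution
    return points * passes_per_point + extra_batches


# DRP: depth pass, shadow caster, and final cube per point, plus three other batches.
drp = expected_batches(100, passes_per_point=3, extra_batches=3)
# URP: no separate depth pass, so two draws per point, plus one other batch.
urp = expected_batches(100, passes_per_point=2, extra_batches=1)
print(drp, urp, drp - urp)  # 30003 20001 10002
```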

Another strange thing is that a negative number might be shown for Saved by batching. This happens because URP uses the SRP batcher by default and the statistics panel doesn’t understand it. The SRP batcher doesn’t eliminate individual draw commands but can make them much more efficient. To illustrate this, select our URP asset and disable SRP Batcher under the Advanced section at the bottom of its inspector.
With the SRP batcher disabled, URP performance is much worse.
1.2 dynamic batching
Besides the SRP batcher, URP has another toggle for dynamic batching. This is an old technique that dynamically combines small meshes into a single larger one, which then gets rendered instead. Enabling it for URP reduces batches to 10,023 and the statistics panel indicates that 9,978 draws were eliminated.
Statistics for URP with dynamic batching.
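Conceptually, combining meshes means copying vertices into one shared buffer and offsetting triangle indices so they still point at the right vertices. The Python sketch below shows only that idea: it uses a hypothetical `Mesh` type, assumes each mesh is merely translated (real dynamic batching transforms the vertices of each object into a common space on the CPU), and ignores the size limits that decide which meshes qualify.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    vertices: list   # (x, y, z) tuples
    triangles: list  # flat list of vertex indices, three per triangle


def combine(meshes_with_offsets):
    """Merge several small meshes into one, baking each offset into the vertices."""
    vertices, triangles = [], []
    for mesh, (dx, dy, dz) in meshes_with_offsets:
        base = len(vertices)  # this mesh's vertices start after those already added
        vertices.extend((x + dx, y + dy, z + dz) for x, y, z in mesh.vertices)
        triangles.extend(base + index for index in mesh.triangles)
    return Mesh(vertices, triangles)


# A unit quad standing in for a point's cube, to keep the example short.
quad = Mesh(
    vertices=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    triangles=[0, 1, 2, 0, 2, 3],
)
combined = combine([(quad, (0, 0, 0)), (quad, (2, 0, 0)), (quad, (4, 0, 0))])
print(len(combined.vertices), len(combined.triangles))  # 12 18
print(combined.triangles[6:12])  # [4, 5, 6, 4, 6, 7]
```

Three separate draws become one draw of the combined mesh, at the cost of redoing this work on the CPU whenever the objects move.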

In my case the SRP batcher and dynamic batching have comparable performance, because the cube meshes of our graph’s points are ideal candidates for dynamic batching.

The SRP batcher is not available for DRP, but we can enable dynamic batching for it. In this case we can find the toggle in the Other Settings section of the Player project settings, a bit below where we set the color space to linear. It is only visible when no scriptable render pipeline settings are used.
Statistics for DRP with dynamic batching.

Dynamic batching is much more efficient for DRP, eliminating 29,964 batches and reducing them to only 39.
It is an improvement, but still not as fast as URP.
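The reported numbers add up: in both pipelines the remaining batches plus the ones saved by dynamic batching equal the unbatched totals. A quick Python check, using only values quoted in this section:

```python
def saved_by_batching(batches_before, batches_after):
    """Draw commands eliminated by a batching technique."""
    return batches_before - batches_after


measurements = {
    # label: (batches without batching, batches with dynamic batching, panel's "saved")
    "URP": (20_001, 10_023, 9_978),
    "DRP": (30_003, 39, 29_964),
}
for label, (before, after, reported) in measurements.items():
    saved = saved_by_batching(before, after)
    share = saved / before
    print(f"{label}: saved {saved} ({share:.1%}), panel reported {reported}")
# URP: saved 9978 (49.9%), panel reported 9978
# DRP: saved 29964 (99.9%), panel reported 29964
```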

1.3 GPU instancing
Another way to improve rendering performance is by enabling GPU instancing. This makes it possible to use a single draw command to tell the GPU to draw many instances of one mesh with the same material, providing an array of transformation matrices and optionally other instance data. In this case we have to enable it per material. Ours have an Enable GPU Instancing toggle for it.
Material with GPU instancing enabled.
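What instancing hands to the GPU can be sketched in Python: one mesh, one material, and an array with one transformation matrix per point. The point layout (a grid spanning -1 to 1 with a uniform scale of 2 / resolution) and the `max_per_draw` cap are assumptions for illustration, and the sketch is not meant to reproduce the batch counts the statistics panel reports.

```python
def point_matrix(x, y, z, scale):
    """Row-major 4x4 matrix: uniform scale, then translation to (x, y, z)."""
    return [
        [scale, 0.0, 0.0, x],
        [0.0, scale, 0.0, y],
        [0.0, 0.0, scale, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def instance_matrices(resolution):
    """One matrix per point of a resolution x resolution grid on the XZ plane."""
    step = 2.0 / resolution
    matrices = []
    for i in range(resolution * resolution):
        x = (i % resolution + 0.5) * step - 1.0
        z = (i // resolution + 0.5) * step - 1.0
        matrices.append(point_matrix(x, 0.0, z, step))  # y would come from the graph function
    return matrices


def instanced_draw_commands(instance_count, max_per_draw):
    """Draw commands needed if each one can carry at most max_per_draw instances (hypothetical cap)."""
    return -(-instance_count // max_per_draw)  # ceiling division


matrices = instance_matrices(100)
print(len(matrices))                                   # 10000
print(instanced_draw_commands(len(matrices), 10_000))  # 1
print(instanced_draw_commands(len(matrices), 1_000))   # 10
```

Compared with dynamic batching, the mesh is never copied: only the per-instance matrices change from point to point.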

URP prefers the SRP batcher over GPU instancing, so to make it work for our points the SRP batcher has to be disabled. We can then see that the number of batches is reduced to just 45, much better than dynamic batching. We will discover the reason for this difference later.
Statistics for URP with GPU instancing.

We could conclude from this data that for URP, GPU instancing is best, followed by dynamic batching, and then the SRP batcher. But the difference is small and the indicated FPS is higher than my display refresh rate in all cases, so they seem effectively equivalent for our graph. The only clear conclusion is that using none of them is not a good idea.
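The argument about the refresh rate can be written down as a tiny Python model: with VSync on, an indicated rate above the refresh rate is capped at the refresh rate, so configurations that all exceed it look the same on screen. The FPS values below are placeholders, not the measurements from the screenshots.

```python
def visible_fps(indicated_fps, refresh_hz=60.0, vsync=True):
    """Rate actually shown on screen: VSync caps it at the display refresh rate."""
    return min(indicated_fps, refresh_hz) if vsync else indicated_fps


# Placeholder values; only the first three are assumed to exceed 60 FPS, as in the text.
indicated = {
    "GPU instancing": 120.0,
    "dynamic batching": 110.0,
    "SRP batcher": 100.0,
    "none of them": 40.0,
}
for name, fps in indicated.items():
    print(f"{name}: indicated {fps:.0f} FPS, visible {visible_fps(fps):.0f} FPS")
```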

For DRP, GPU instancing appears to perform a bit better than dynamic batching, and both approaches are a lot better than using neither.
Statistics for DRP with GPU instancing.

1.4 frame debugger