Analysis
7Dec99: Marcin Slawicz and Jos Grupping have updated their graphical and statistical analysis of the FSBench benchmark data. By way of explanation: the first graph shows the trend for different types of CPU. Although there is a considerable spread in the individual results, the trend is clear: twice the clock speed gives about twice the performance. Marcin's numbers (following both charts) reflect how much one line is above another line. The spread of data for each type of processor probably shows the effect of the way the different PCs are maintained. The cleaner the system, the better the performance.
It looks as if the AMD K6 is, as Marcin concluded too, the only "slow" CPU. The Intel CPUs all perform about the same. The AMD K7 (Athlon) performs a bit better, probably reflecting its better FPU.

The second graph shows the benchmark results ordered by video card type (or 3D card type). A word of caution: from a true statistical standpoint one can question whether the groups are homogeneous enough for this type of analysis. Nevertheless, note how much the graphs overlap. In general the conclusion is that the type of video card isn't nearly as important as the CPU speed. Most modern cards will give about the same performance. It appears that the G400 is the outlier, while the new GeForce does a little better, but not by very much (10%?).

7 Dec 99 Here is a series of posts recently seen on the simflight.com FS2000 forum concerning the statistical analysis of data from the report system. These are the best of the lot, with the most recent posts at the top:
Subject: Conclusions to FPS statistical analysis
From: "Jim Ho" <jho@dres.dnd.ca>
Date: 3 Dec 1999 21:21:48 GMT
It was pointed out that, to make the statistical tests fairer, the FPS values should be normalized for CPU speed. To do this, the formula [(FPS/MHz)*100] was suggested. This was a good approach as it would avoid the danger of giving the impression of comparing "apples to oranges", a statistical sin. With apologies for having polluted this space with previous postings, here are the results. Again the data sets were taken from Avsim.com/fsbench with converted data (GeForce & TNT2) kindly supplied by a respondent.
Normality Test: Passed (P = 0.024)
Equal Variance Test: Passed (P = 0.395)
Group      N   Mean   Std Dev  SEM*
GeForce    5   4.048  0.630    0.282
TNT2       11  3.697  0.474    0.143
TNT2Ultra  24  3.419  0.470    0.0959
Matrox     10  3.403  0.349    0.110
*N = number of entries; mean = average; std dev = standard deviation; SEM = standard error of the mean
% speed difference compared to GeForce: TNT2 = 8.7; TNT2Ultra = 15.5; Matrox = 15.9
The differences in the mean values among the treatment groups are greater than would be expected by chance; there is a statistically significant difference (P = 0.031). This suggests that at least one of the groups should show some difference. We proceeded to assign GeForce as the "control" group and use Dunnett's test, which is designed for comparing a control against multiple others. For this, the normalized data for TNT2, Matrox and TNT2Ultra were included as items of contrast.
Multiple Comparisons versus Control Group (Dunnett's Method). As mentioned before, the column to read is under "P<0.050"; "Yes" indicates a significant difference at 95% probability:
Comparison             Diff of Means  P<0.050
GeForce vs. Matrox     0.645          Yes
GeForce vs. TNT2Ultra  0.629          Yes
GeForce vs. TNT2       0.351          No
Conclusion: surprisingly, the test revealed a significant difference for TNT2Ultra and Matrox while there was none for TNT2, as some of you have hinted. Not included in this test are the Rage, TNT and V2-3 data sets. The normalized data for these failed the normality test and could not be tested with this methodology.
So, after various attempts at seeking a clearer picture of video performance, it would appear that the GeForce still demonstrates a slight edge (almost 16%). But if there is any lesson to be learned, it is that we have demonstrated an objective approach to answering the popular question of what is faster. Thanks to all the respondents who have shown interest in this work, some of whom gave various suggestions for improving the approach to this query. To the casual observer, all this work may appear to be hair splitting. But it is important to remember we are attempting to arrive at a methodology that is fair and that can be used as an objective way to treat future data. Thanks to B. Wilson for maintaining the database.
As per request, here is the reference: "All data used herein provided by flight sim enthusiasts throughout the world, as reported at FSBench, the flight sim benchmarking site, http://avsim.com/fsbench."
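The "% speed difference" figures in the post above follow directly from the group means of the normalized values. The snippet below is only a sketch of that arithmetic, not the poster's own procedure; the function name `percent_slower_than` is made up for illustration, and the means are copied from the table in the post.

```python
# Group means of 100*FPS/MHz, copied from the table in the post above.
normalized_means = {
    "GeForce": 4.048,
    "TNT2": 3.697,
    "TNT2Ultra": 3.419,
    "Matrox": 3.403,
}


def percent_slower_than(control, means):
    """Return how far each group mean falls below the control mean,
    expressed as a percentage of the control mean (one decimal place)."""
    base = means[control]
    return {
        name: round(100.0 * (base - value) / base, 1)
        for name, value in means.items()
        if name != control
    }


# Hypothetical helper applied with GeForce as the control group.
deficits = percent_slower_than("GeForce", normalized_means)
print(deficits)
```

Note that the percentages are taken relative to the GeForce mean, which is the convention the quoted figures appear to use.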
Subject: Round 3 & Final statistical analysis of Avsim Fsbench data
From: "Jim Ho" <jho@dres.dnd.ca>
Date: 2 Dec 1999 21:44:43 GMT
Statistical Analysis of Fsbench FS2000
1024x768 Nov 27 Data
It has been shown that higher CPU MHz can
give better frame rates. By the same token,
FPS data from slower machines may bias video
board comparisons. To avoid CPU speed
distortion, data selection included only FPS
entries from those registering CPU 400 MHz
and higher. They were sorted according to
video types. Note that each group had unequal
items (N) making choice of test methodology
critical.
One Way Analysis of Variance done on
Thursday, December 02, 1999, 09:11:01
Normality Test: Passed (P = 0.127). This
means that conventional parametric methods
can be used. Equal Variance Test: Passed (P =
0.427). Even though some groups have small N,
they nevertheless did not present
unacceptable variance or "noise" problems.
Again, conventional tests can be used.
Group Name  N   Missing  Mean    Std Dev  SEM
GeForce     7   0        20.900  4.408    1.666
TnT2Ultra   24  0        16.813  3.243    0.662
TnT2        12  0        15.722  1.841    0.531
TnT         16  0        16.862  3.440    0.860
Matrox      10  0        17.826  1.909    0.604
Rage        6   0        13.695  5.193    2.120
V2-3        15  0        15.855  2.607    0.673
Source of Variation  DF  SS        MS      F      P
Between Groups       6   213.133   35.522  3.526  0.004
Residual             83  836.085   10.073
Total                89  1049.217
The meaning of the unfamiliar terms can be
found here:
www.statsoft.com/textbook/stathome.html
The differences in the mean values among the
treatment groups are greater than would be
expected by chance; there is a statistically
significant difference (P = 0.004). It thus
gives the impression that some of the groups
do not come from the same population or that
they are not random (to use a Gatesian term).
Power of performed test with alpha = 0.050:
0.818; this test can flag situations where
there are insufficient data items. The
results are acceptable.
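For readers who want to check the ANOVA table, the F ratio can be rebuilt from the per-group N, mean and standard deviation alone. This is a sketch using the textbook one-way ANOVA formulas, not SigmaStat's code; the function name `anova_from_summary` is an illustrative assumption, and because the inputs are rounded to three decimals the sums of squares only match the post approximately.

```python
# (N, mean FPS, standard deviation) per group, copied from the post's table.
groups = {
    "GeForce":   (7, 20.900, 4.408),
    "TnT2Ultra": (24, 16.813, 3.243),
    "TnT2":      (12, 15.722, 1.841),
    "TnT":       (16, 16.862, 3.440),
    "Matrox":    (10, 17.826, 1.909),
    "Rage":      (6, 13.695, 5.193),
    "V2-3":      (15, 15.855, 2.607),
}


def anova_from_summary(groups):
    """One-way ANOVA computed from group sizes, means and standard deviations."""
    stats = list(groups.values())
    n_total = sum(n for n, _, _ in stats)
    # Weighted mean of the group means is the grand mean of all entries.
    grand_mean = sum(n * mean for n, mean, _ in stats) / n_total
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in stats)
    # Residual (within-groups) sum of squares: recovered from each sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in stats)
    df_between = len(stats) - 1
    df_within = n_total - len(stats)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "F": f_ratio,
    }


result = anova_from_summary(groups)
print(result)
```

The P value is not computed here because that needs the F distribution, which the Python standard library does not provide.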
We want to find out if any one of the video
boards stands out as a different or better
performer. The multiple comparison test is
preferred over the conventional t-test as
this method takes all group variances into
account. A conservative test, one that is
less likely to flag a false positive, is the
Student-Newman-Keuls Method and the test
result is shown below. The column to read is
under "P<0.050" where significant difference
at better than 95% probability is indicated.
All Pairwise Multiple Comparison using
Student-Newman-Keuls Method :
Comparison Diff of Means p q P P<0.050
GeForce vs. Rage 7.205 7 5.771 0.002 Yes
GeForce vs. TnT2 5.178 6 4.852 0.012 Yes
GeForce vs. V2-3 5.045 5 4.911 0.007 Yes
GeForce vs. TnT2Ultra 4.087 4 4.240 0.019 Yes
GeForce vs. TnT 4.038 3 3.970 0.017 Yes
GeForce vs. Matrox 3.074 2 2.779 0.053 No
Matrox vs. Rage 4.131 6 3.565 0.130 No
Matrox vs. TnT2 2.104 5 2.190 0.534 No
Matrox vs. V2-3 1.971 4 2.152 0.430 No
Matrox vs. TnT2Ultra 1.013 3 1.200 0.674 No
Matrox vs. TnT 0.964 2 1.065 0.454 No
TnT vs. Rage 3.167 5 2.948 0.237 No
TnT vs. TnT2 1.141 4 1.331 0.783 No
TnT vs. V2-3 1.008 3 1.250 0.652 No
TnT vs. TnT2Ultra 0.0500 2 0.0690 0.961 No
TnT2Ultra vs. Rage 3.118 4 3.043 0.146 No
TnT2Ultra vs. TnT2 1.091 3 1.375 0.597 No
TnT2Ultra vs. V2-3 0.958 2 1.297 0.362 No
V2-3 vs. Rage 2.160 3 1.992 0.341 No
V2-3 vs. TnT2 0.133 2 0.153 0.914 No
TnT2 vs. Rage 2.027 2 1.806 0.205 No
By SNK analysis, it would appear that the
GeForce performs better than most of the
other boards. However, we can still use an
even more conservative test, the Tukey (no
giggles please). Only a few comparative pairs
are shown to illustrate the results.
All Pairwise Multiple Comparison using Tukey
Test:
Comparison Diff of Means p q P P<0.050
GeForce vs. Rage 7.205 7 5.771 0.002 Yes
GeForce vs. TnT2 5.178 7 4.852 0.016 Yes
GeForce vs. V2-3 5.045 7 4.911 0.014 Yes
GeForce vs. TnT2Ultra 4.087 7 4.240 0.053 No
GeForce vs. TnT 4.038 7 3.970 0.086 No
GeForce vs. Matrox 3.074 7 2.779 0.444 No
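The q column is the same in the SNK and Tukey tables; only the p column (the range of means spanned) and hence the P value differ. The sketch below assumes the usual studentized-range standard error for unequal group sizes, which the reported values appear consistent with; it is not taken from SigmaStat itself, and `q_statistic` is a name chosen for illustration.

```python
import math

# Residual mean square from the ANOVA table in the post.
MS_RESIDUAL = 10.073

# Group sizes from the post's summary table (only the groups used below).
group_sizes = {"GeForce": 7, "Rage": 6, "TnT2": 12, "Matrox": 10}


def q_statistic(diff_of_means, n_a, n_b, ms_residual=MS_RESIDUAL):
    """Studentized range statistic for two groups of unequal size.

    Assumed form: q = diff / sqrt(MS_residual / 2 * (1/n_a + 1/n_b)).
    """
    standard_error = math.sqrt(ms_residual / 2.0 * (1.0 / n_a + 1.0 / n_b))
    return diff_of_means / standard_error


q_rage = q_statistic(7.205, group_sizes["GeForce"], group_sizes["Rage"])
q_tnt2 = q_statistic(5.178, group_sizes["GeForce"], group_sizes["TnT2"])
q_matrox = q_statistic(3.074, group_sizes["GeForce"], group_sizes["Matrox"])
print(round(q_rage, 3), round(q_tnt2, 3), round(q_matrox, 3))
```

Turning q into a P value needs the studentized range distribution, which is why the two methods can disagree on the same q.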
With this tougher test, we now see that the
GeForce is only better than 3 cards: Rage,
TnT2 and V2-3. The Ultra has jumped into "no
difference" and why the TnT held up so well
is a mystery. By a whisker, the G400 boards
also skipped the threshold and Matrox users
can feel smug. Again, statistical tests are
only as good as the soundness of the data
collected. None of the above would have any
meaning if it can be shown that there is a
flaw in the data sets or the way they are
manipulated. It is still possible that with
more data for each group, comparison results
may be different so this exercise should be
considered work in progress. Analysis was
done with SigmaStat 2.03.
(http://www.spss.com/software/science/sigmastat/)
Thanks for your previous comment, Jim.
Subject: Re: Round 3 & Final statistical analysis of Avsim Fsbench data
From: Walt Bertram <wbertram@worldnet.att.net>
Date: Fri, 03 Dec 1999 00:36:49 -0500
Jim, I question whether the broad brush analysis you applied is significant. It seems you are still comparing apples and oranges on one hand, and peaches and apricots on the other. I took a quick look at the copy of FSBench(*) data that I received a few days ago.
First, a quick look shows that there is a lot of bad data in this dataset. In it there are 5 systems reported which used the GeForce chipset (there was a 6th report, but the clock frequency was not reported, so I discarded it). There were 13 systems reported that used the TNT2 chipset and that had a clock frequency greater than 400 MHz. Two of those were identical copies of another entry, so I removed those two, leaving 11. The average clock frequency of the GeForce set was 567 MHz, and that of the TNT2 set was 456 MHz. A significant difference, which could account for the difference in average FPS of the GeForce vs. the TNT2.
These data are summarized below. The FPS used is that in the 4th column of FPS data.
FPS    Clock  100*FPS/Clock  Chipset
18.7   450    4.16           nVidia GeForce 256
23.2   500    4.64           nVidia GeForce 256
16.5   550    3.00           nVidia GeForce 256
24.3   600    4.05           nVidia GeForce 256
32.3   733    4.41           nVidia GeForce 256
------------------------------------------
23.0   567    4.05           Average of GeForce
              0.63           Std Dev, = 16%
FPS    Clock  100*FPS/Clock  Chipset
16.6   400    4.15           nVidia TNT2
14     400    3.50           nVidia TNT2
16.1   400    4.03           nVidia TNT2
16.6   400    4.15           nVidia TNT2
14     433    3.23           nVidia TNT2
19.1   450    4.24           nVidia TNT2
14.2   450    3.16           nVidia TNT2
16.3   466    3.50           nVidia TNT2
17.8   466    3.82           nVidia TNT2
22.2   550    4.04           nVidia TNT2
17.1   600    2.85           nVidia TNT2
-----------------------------------------
16.7   456    3.70           Average of TNT2
              0.47           Std Dev, = 13%
The difference in 100*FPS/Clock is 0.35, or 9.5%. This is a difference of about 0.6 std dev. Is such a difference statistically significant?
Walt
*All data used herein provided by flight sim enthusiasts throughout the world, as reported at FSBench, the flight sim benchmarking site, http://avsim.com/fsbench.
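One conventional way to approach Walt's closing question is a two-sample t-test on the normalized values. The sketch below uses a pooled-variance (Student) t statistic, which is an assumption made here and not something Walt proposed; the data rows are copied from his tables, and the helper names are illustrative.

```python
import math
import statistics

# (FPS, clock in MHz) rows copied from the two tables in Walt's post.
geforce = [(18.7, 450), (23.2, 500), (16.5, 550), (24.3, 600), (32.3, 733)]
tnt2 = [(16.6, 400), (14, 400), (16.1, 400), (16.6, 400), (14, 433),
        (19.1, 450), (14.2, 450), (16.3, 466), (17.8, 466), (22.2, 550),
        (17.1, 600)]


def per_100_mhz(rows):
    """Normalize each entry to frames per second per 100 MHz of CPU clock."""
    return [100.0 * fps / clock for fps, clock in rows]


def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


gf_norm = per_100_mhz(geforce)
tnt2_norm = per_100_mhz(tnt2)
t_value = pooled_t(gf_norm, tnt2_norm)

# Two-sided 5% critical value of Student's t with 5 + 11 - 2 = 14 degrees of freedom.
T_CRIT_14DF = 2.145
print(round(t_value, 2), abs(t_value) > T_CRIT_14DF)
```

With these 16 entries the statistic comes out near 1.25, well short of the critical value, so on this subset alone the GeForce vs. TNT2 gap would not count as significant at the 5% level.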
Subject: Round 2 Statistical Analysis of FPS Data from Avsim.com
From: "Jim Ho" <jho@dres.dnd.ca>
Date: 1 Dec 1999 17:44:32 GMT
On Monday, analysis was done with FPS data from simflight.com and, as someone has noticed, a few ambiguities in the results were probably a function of the noisy data. Similar tests were done with what appear to be better-behaved data from Avsim.com, and the results are shown below.
1. Does MHz affect FPS performance?
Answer: Yes. Spearman Rank Order Correlation Coefficient = 0.831; there is a strong relationship between faster CPU and better FPS.
2. Which video board performs better?
Answer: The GeForce 256 is significantly better than all others. Other than that it's difficult to say if the others perform differently from one another.
All Pairwise Multiple Comparison Procedures (Student-Newman-Keuls Method):
Comparison               Diff of FPS  Probability  95% Sig. Diff.
GeF 256 vs. Banshee      6.921        0.013        Yes
GeF 256 vs. Rage128      6.886        0.014        Yes
GeF 256 vs. V3           6.331        0.003        Yes
GeF 256 vs. TnT2         6.227        0.004        Yes
GeF 256 vs. TnT          6.163        <0.001       Yes
GeF 256 vs. TnT2 Ultra   4.352        0.009        Yes
TnT2 Ultra vs. Banshee   2.569        0.57         No
TnT2 Ultra vs. Rage128   2.534        0.538        No
TnT2 Ultra vs. V3        1.979        0.32         No
TnT2 Ultra vs. TnT2      1.875        0.313        No
TnT2 Ultra vs. TnT       1.811        0.099        No
TnT vs. Banshee          0.758        0.989        No
TnT vs. Rage128          0.723        0.971        No
TnT vs. V3               0.168        0.988        No
TnT vs. TnT2             0.0643       0.96         No
TnT2 vs. Banshee         0.694        0.977        No
TnT2 vs. Rage128         0.659        0.927        No
TnT2 vs. V3              0.104        0.938        No
V3 vs. Banshee           0.59         0.928        No
V3 vs. Rage128           0.555        0.743        No
Rage128 vs. Banshee      0.0355       0.986        No
3. Of the 3 CPU types (P2-3, Celeron and AMD) which one performs better?
Answer:
A. P2-3 vs Celeron: no difference. Compare FPS means: P2-3 = 15.457, Cel = 16.192.
B. Celeron vs AMD: Celeron is better. The equal variance test failed, so the Mann-Whitney Rank Sum Test was used. Compare FPS medians: Cel = 15.935, AMD = 10.450. The difference in the median values between the two groups is greater than would be expected by chance; there is a statistically significant difference (P = 0.025).
C. P2-3 vs AMD: P2-3 is better. The equal variance test failed, so the Mann-Whitney Rank Sum Test was used. Compare FPS medians: P2-3 = 16.18, AMD = 10.45.
In conclusion, it is important that we work with good clean data and approach the task with some objectivity. Having said that, the current analysis supports the general expectation that faster MHz will yield better performance. The other commonly held "wisdom" that the type of graphics card does not matter is NOT supported. The GeForce shows a significant advantage when properly compared with the others. The surprise outcome was that the P2-3 CPUs do not appear better than the Celerons. Overclockers can rejoice. It should be noted that the AMD result is distorted by having too little K3 representation and, overall, the K2s dragged down the group. Thanks to Vince for the encouragement.
Jim.
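For reference, a Spearman rank correlation of the kind quoted in point 1 can be computed as a Pearson correlation on ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration and not the SigmaStat routine; since the full Avsim data set is not reproduced on this page, it is run on the 16 GeForce and TNT2 entries listed in Walt Bertram's post, so the coefficient will not equal the 0.831 reported for the larger data set.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Clock (MHz) and FPS for the 16 systems listed in Walt Bertram's post.
clock = [450, 500, 550, 600, 733, 400, 400, 400, 400, 433,
         450, 450, 466, 466, 550, 600]
fps = [18.7, 23.2, 16.5, 24.3, 32.3, 16.6, 14, 16.1, 16.6, 14,
       19.1, 14.2, 16.3, 17.8, 22.2, 17.1]

rho = spearman(clock, fps)
print(round(rho, 3))
```

Because it works on ranks, the coefficient only measures whether FPS tends to rise with clock speed, not whether the relationship is linear.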
Subject: Re: Round 2 Statistical Analysis of FPS Data from Avsim.com
From: "Jim Ho" <jho@dres.dnd.ca>
Date: 1 Dec 1999 17:50:48 GMT
Sorry about the messy table; here is a cleaner version.
All Pairwise Multiple Comparison Procedures (Student-Newman-Keuls Method):
Comparison               Diff of FPS  Probability  95% Sig. Diff.
GeF 256 vs. Banshee      6.921        0.013        Yes
GeF 256 vs. Rage128      6.886        0.014        Yes
GeF 256 vs. V3           6.331        0.003        Yes
GeF 256 vs. TnT2         6.227        0.004        Yes
GeF 256 vs. TnT          6.163        <0.001       Yes
GeF 256 vs. TnT2 Ultra   4.352        0.009        Yes
TnT2 Ultra vs. Banshee   2.569        0.57         No
TnT2 Ultra vs. Rage128   2.534        0.538        No
TnT2 Ultra vs. V3        1.979        0.32         No
TnT2 Ultra vs. TnT2      1.875        0.313        No
TnT2 Ultra vs. TnT       1.811        0.099        No
TnT vs. Banshee          0.758        0.989        No
TnT vs. Rage128          0.723        0.971        No
TnT vs. V3               0.168        0.988        No
TnT vs. TnT2             0.0643       0.96         No
TnT2 vs. Banshee         0.694        0.977        No
TnT2 vs. Rage128         0.659        0.927        No
TnT2 vs. V3              0.104        0.938        No
V3 vs. Banshee           0.59         0.928        No
V3 vs. Rage128           0.555        0.743        No
Rage128 vs. Banshee      0.0355       0.986        No
This server does not handle tables well.
Jim.
23Nov99 For those who might be interested, here is the breakdown of all Reports submitted, graphed by date and by the sim reported. Looks like FS98 is the winner so far, with FS2000 coming up fast.

18Nov99 Some interesting new analyses of the FS2000 benchmarks have been posted on forums and sent to me lately. Here they are. Marcin's analysis used the 1024x768 data:
What does your frame rate really depend on?
Hello flightsimmers,
I agree the frame rate is not the most important thing during your flights; however, nobody wants to watch the FS2000 slide show. Recently I took some FS2000 benchmarks from http://www.avsim.com/fsbench and summarized the data. To be able to compare different systems (with different CPU clocks) I used the FPS / CPU clock * 100 factor (frames per second for every 100 MHz). I excluded the AMD K6-2 results (see below for why).
FPS for every 100 MHz with different graphic boards (average results):
Banshee 3.73
Voodoo3 3.68
GeForce256 3.58
TnT2 3.57
TnT 3.55
Rage128 3.53
G400 3.27
All current boards do well (or badly if you want). As you can see, the frame rate hardly depends on the type of graphic board. What is more, the best boards (those that do the best benchmarks on Quake, Unreal etc.) don’t necessarily have to be the winners with FS2000. This situation can change when your CPU can pump 50 or 100 frames every second (the graphic board will then be the bottleneck). You will need a 1000 MHz Pentium or a new CPU architecture for this, I’m afraid. In my opinion the image quality, not speed, should decide the choice of graphic board for FS2000.
FPS for every 100 MHz with different CPU clocks:
266 MHz 3.73
400 MHz 3.42
450 MHz 3.41
500 MHz 3.57
600 MHz 3.26
Every additional 100 MHz of CPU clock yields about 3.5 more frames per second. Remember: the numbers apply to the particular FS2000 setup (as described at http://www.avsim.com/fsbench).
What FPS should you expect with the 450 MHz CPU?
TnT or TnT2 15.1
Voodoo3 15.7
Banshee 16.1
It would be about 35 FPS with a future 1000 MHz CPU if nothing else changes.
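The 35 FPS projection is a straight-line extrapolation of the "FPS for every 100 MHz" factor. The sketch below averages the factors from the CPU clock table and scales them to other clock speeds; the function name `predicted_fps` is illustrative, and the calculation assumes the factor stays constant as clock speed rises, which the data in this report can suggest but not guarantee.

```python
# "FPS for every 100 MHz" by CPU clock (MHz), copied from Marcin's table.
fps_per_100mhz_by_clock = {266: 3.73, 400: 3.42, 450: 3.41, 500: 3.57, 600: 3.26}

# Plain average of the five factors, close to the "about 3.5" quoted in the text.
average_factor = sum(fps_per_100mhz_by_clock.values()) / len(fps_per_100mhz_by_clock)


def predicted_fps(cpu_mhz, factor=average_factor):
    """Linear estimate: frames per second = factor * (clock / 100 MHz)."""
    return factor * cpu_mhz / 100.0


print(round(average_factor, 2))
print(round(predicted_fps(450), 1))
print(round(predicted_fps(1000)))
```

The 450 MHz estimate lands in the same range as the per-board figures Marcin lists, which were presumably taken from his own breakdown by board and so need not match the simple average exactly.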
But what about the CPU type? FPS for every 100 MHz for different CPU types:
K7 3.70
Celeron 3.53
P II 3.46
P III 3.43
K6-2 2.51
Only one loser: K6-2 (probably due to the weak FP unit). That’s why it is excluded above.
This summary doesn't take into account some other important factors like the motherboard chipset, memory type, graphic port type, sound board and others; however, it seems that at the moment the best way to make your flights smoother is to use a fast (clock and FP unit) CPU.
Marcin Slawicz
mslawicz@polbox.com
In all I think Marcin's analysis and Jos's graphs are convincing proof that the ONLY thing that matters to FS2000 performance is the raw clock rate of your CPU. The AMD K6-2 is a poor performer, and the Athlon might be a slightly better performer than the rest.
A personal big thanks to both Marcin and Jos for analyzing and plotting the data!