FiringSquad: Home of the Hardcore Gamer - Games, Hardware, Reviews and News

  
Average rating: 86% (64 users)
NVIDIA 8800GTX EXPOSED AND EXPLOITED (25 comments)
by: indigo196 (253) | Posted in cluster Round 3 Editors Challenge Sponsored by Intel
Subject: http://www.gpgpu.org
Posted 27 months ago (edited 27 months ago) in category DEFAULT

» MEDIA (9)
• GPU vs CPU Floating Point Operations
• GPU vs CPU
• BLAS SGEMM
• 2D Complex FFT
• European Option Pricing Black-Scholes
• 7900 GPU Diagram
• 8800 GPU Diagram
• thread processor
• CUDA software stack

Introduction
The G80 series of cards from NVIDIA provides raw processing power that was previously available only in server clusters or mainframe computers. This power is most often used to produce visually stunning and detailed environments for games, but recent work by companies such as RapidMind, PeakStream and Havok shows that these GPUs can be used for a wide variety of math-intensive computing. I appreciate gorgeous graphics as much as the next gamer, but I would love to see advances in AI, environmental physics and other elements that improve the immersive quality of the games I play.

General-purpose computation on GPUs (GPGPU) has recently come to the forefront of technical news, despite getting its start back in the late 1970s.[1] In fact, some experts have labeled GPGPU one of the “5 Disruptive Technologies To Watch in 2007”.[2] Companies such as PeakStream, Acceleware and RapidMind have achieved astonishing results on GeForce 7900 series cards, the predecessors to the G80 series, with one RapidMind benchmark running 120x faster than the original CPU code. Havok announced a partnership with NVIDIA in early 2006 to produce Havok FX, which uses Shader Model 3.0 class GPUs to simulate collisions among thousands of objects in real time on the GPU instead of the CPU.

To demonstrate the potential that the 8800GTX holds to improve games, I have included some detailed information on the GPGPU achievements that were made using the 7900 series of cards. These achievements are astounding in their own right, but when you compare the architecture of the 7900GTX to that of the 8800GTX you may find it hard to contain your enthusiasm. The fact that these examples are all non-gaming applications should make even the most dedicated gamer proud that their hobby may assist man in solving medical mysteries.

Example: Acceleware
Acceleware was established in 2004 and provides solutions that use GPUs to accelerate computation. Its intended markets are cell phone manufacturing, energy, seismic, biomedical, fluid dynamics, pharmaceutical, industrial and military companies. It created a solution for Boston Scientific that sped up the company's simulations by a factor of 25 compared with CPU-based simulations.[3] These simulations allow Boston Scientific “to investigate the influence and mutual dependency of several design variables”.[3] The result is better MRI devices, which in turn improve doctors' ability to diagnose patients.

Example: RapidMind
RapidMind is a company based in Waterloo, Canada, that is built on over five years of advanced research and development. The company was formed in 2004 to commercialize the research on Sh that was started at the University of Waterloo. Sh is a library that acts as a language embedded in C++ and allows programmers to use GPUs for general-purpose computations. RapidMind has taken the knowledge gained from the development of Sh and created the RapidMind Development Platform, which aims to make parallel programming as easy as single-threaded, single-core programming. To show the strength of its solution, RapidMind produced three benchmarks: a BLAS SGEMM routine, a 2D complex-to-complex FFT routine and a quasi-Monte Carlo evaluation of the Black-Scholes option pricing model. These benchmarks were run on a 7900 GT based GPU and on high-end workstation or server-class CPUs. The most impressive result came from the European Option Pricing benchmark, which showed the RapidMind GPU implementation to be 120x faster than the original CPU code.[4] RapidMind itself claims that “RapidMind–enabled applications have achieved performance increases of 3x to 30x”.[5]
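
To give a feel for what the option pricing benchmark computes, here is a minimal Python sketch of a quasi-Monte Carlo estimate of a European call price, checked against the closed-form Black-Scholes value. It is not RapidMind's code: the function names, the van der Corput sequence and the market parameters (spot 100, strike 100, 5% rate, 20% volatility, one year) are all assumptions chosen for illustration. The point to notice is that every payoff in the loop is independent of the others, which is the kind of workload a GPU can spread across its ALUs.

```python
import math
from statistics import NormalDist


def van_der_corput(i: int, base: int = 2) -> float:
    """Return the i-th point (i >= 1) of the van der Corput low-discrepancy sequence."""
    value, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        value += digit / denom
    return value


def black_scholes_call(spot, strike, rate, vol, years):
    """Closed-form Black-Scholes price of a European call (reference value)."""
    n = NormalDist()
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    return spot * n.cdf(d1) - strike * math.exp(-rate * years) * n.cdf(d2)


def qmc_call(spot, strike, rate, vol, years, samples=2 ** 14 - 1):
    """Quasi-Monte Carlo estimate: mean discounted payoff over low-discrepancy draws."""
    inv_cdf = NormalDist().inv_cdf
    drift = (rate - 0.5 * vol ** 2) * years
    scale = vol * math.sqrt(years)
    total = 0.0
    for i in range(1, samples + 1):
        z = inv_cdf(van_der_corput(i))            # uniform point -> standard normal draw
        terminal = spot * math.exp(drift + scale * z)
        total += max(terminal - strike, 0.0)      # call payoff; independent per sample
    return math.exp(-rate * years) * total / samples


# Hypothetical market parameters, chosen only for illustration.
exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = qmc_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"closed form: {exact:.4f}   quasi-Monte Carlo: {estimate:.4f}")
```

On a CPU the loop runs one sample at a time; a GPU version would hand each sample (or a batch of them) to its own thread and sum the results at the end.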

Example: Havok FX
Havok was founded in 1998 in Dublin, Ireland, and provides software and services for digital media creators in both the games and movie industries. Havok FX was announced jointly by Havok and NVIDIA at GDC06. It is an add-on that lets programmers use GPUs supporting Shader Model 3.0 to produce stunning effects that behave in a physically plausible way. At GDC06 NVIDIA claimed that “Havok FX running on a pair of GeForce 7900GTX graphics cards in SLI is more than ten times faster than software physics calculations running on a Pentium Extreme Edition 955”.[6] Havok FX was released in Q2 of 2006. Titles that use Havok software (the earlier Havok engine, not Havok FX itself) include The Elder Scrolls IV: Oblivion, F.E.A.R. and Age of Empires III.

The G80 in perspective
All of the above examples were based on GPGPU running on GeForce 7900 graphics cards, and the results are nothing short of astounding. GPGPU computation makes use of the ALUs in the GPU. The 7900 GT cards had 96 ALUs clocked at 450 MHz [7900 GPU Diagram], while the 8800GTX has 128 ALUs clocked at 1.35 GHz [thread processor]. Let that sink in slowly: 1.3x the number of ALUs, each running at 3x the speed. The GeForce 8800GTX divides those 128 processors into 16 multiprocessors [8800 GPU Diagram]. The 8800GTS has 96 ALUs clocked at 1.2 GHz, grouped into 12 multiprocessors.
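
The ALU and clock figures above can be combined into one rough number. The Python sketch below multiplies ALU count by clock speed for each card and compares the result with the 7900 GT. The figures are the ones quoted in this article; treating ALU count times clock as a measure of throughput is a simplifying assumption that ignores memory bandwidth and how much work each ALU does per cycle, and the names `CARDS`, `raw_rate` and `speedup_over_7900` are made up for the example.

```python
# ALU counts and clocks (MHz) exactly as quoted in the article; not re-measured here.
CARDS = {
    "7900 GT": (96, 450),
    "8800GTS": (96, 1200),
    "8800GTX": (128, 1350),
}


def raw_rate(card: str) -> int:
    """ALU count times clock (MHz): a crude proxy for ALU cycles available per microsecond."""
    alus, mhz = CARDS[card]
    return alus * mhz


def speedup_over_7900(card: str) -> float:
    """Ratio of a card's raw ALU rate to that of the 7900 GT."""
    return raw_rate(card) / raw_rate("7900 GT")


for name in ("8800GTS", "8800GTX"):
    print(f"{name}: {speedup_over_7900(name):.2f}x the raw ALU rate of the 7900 GT")
```

Under that assumption the 8800GTX comes out at 4x the 7900 GT (1.33x the ALUs times 3x the clock) and the 8800GTS at roughly 2.67x.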

Mike Houston of Stanford University has published benchmarks comparing the NVIDIA 7900GTX (G71), NVIDIA 8800GTX (G80) and ATI X1900XTX (R580).[7] They are very technical, but they do show the 8800GTX to be more powerful than either the 7900GTX or the X1900XTX.

Thanks DirectX 10!
The reason for the explosion in usable shaders on the 8800GTX is DX10: its requirement for unified shaders, its geometry shader requirement and its removal of fixed-function components. The result is GPUs that are no longer divided up into ‘x’ vertex shaders and ‘y’ pixel shaders. The elimination of capability bits will also force vendors to produce cards that meet the same basic requirements, removing the variations in floating-point formats that existed under DX9. This consistency will reduce the confusion that developers faced in using the previous generation of hardware.

CUDA: A New Architecture for GPU Computing
CUDA stands for Compute Unified Device Architecture. It is a new hardware and software architecture that lets the GPU be used as a data-parallel computing device without the need to map computations onto the graphics API. CUDA is programmed through an extension of the C programming language, which should keep the learning curve for developers to a minimum. CUDA is available on the GeForce 8800 series and future products.
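
To illustrate what "data-parallel" means here, the following Python sketch emulates the idea in ordinary sequential code: a small kernel function is written for a single element, and a launcher runs it once per index. This is not CUDA and does not use any CUDA API; `saxpy_kernel` and `launch` are hypothetical names invented for the example, and the loop stands in for what a GPU would execute across many threads at once.

```python
from typing import Callable, List


def saxpy_kernel(i: int, a: float, x: List[float], y: List[float], out: List[float]) -> None:
    """Work done by one logical thread: compute a single output element."""
    out[i] = a * x[i] + y[i]


def launch(kernel: Callable[..., None], n: int, *args) -> None:
    """Stand-in for a GPU launch: run the kernel once per index.

    A GPU would run these iterations concurrently; here they simply run in a loop.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

Because no element depends on any other, the per-index calls can run in any order or all at once, which is what makes this style of code a good fit for hardware with many ALUs.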

Game Development Potential
The GPGPU results from Acceleware and RapidMind, coupled with the work of Havok in the arena of games, show that there is potential in harnessing the power of the GPU beyond making games visually stunning. Havok has already started to improve the implementation of physics in game environments, but that is only one part of a game. The next part is speculation on my part: I will suggest areas in which some of today’s games could be improved by tapping into the power of data-parallel programming on the GPU.

Neverwinter Nights 2 and other single player games
The single-player experience in Neverwinter Nights 2 is hampered by the poor AI that controls your companions. The path-finding AI works in most open areas, but fails miserably when your party is exploring dungeons, underground caverns or building interiors. Computer-controlled companions often get stuck on terrain or simply get lost, leaving you to be trounced by encounters designed for a party of four. You could pause the game and make individual adjustments, but that breaks the immersion. The AI also has problems controlling spell-casters: companions burn through their offensive spells in situations that do not require them and fail to use healing spells when party members are on the brink of death. Given the performance improvements shown above, I have to wonder how much more realistic the AI could have been if the developer had been able to make use of the computational power in the 8800GTX.

F.E.A.R. and other FPS games
F.E.A.R. is a game that relies heavily on the spooky factor. As a player you are immersed in creepy atmospheric effects such as steam, smoke and particles floating in beams of light. The AI in F.E.A.R. was some of the best seen in recent shooters, as were the physics effects from shooting bad guys. Injury, death and environmental damage were handled elegantly. So why do I bring this game up? Simple. More could still be done. Imagine using the power of the GPU to alter the map dynamically, in real time, as chemical spills or burning liquids spread through it. The immersion level in the game would be greatly increased.

World of Warcraft and other MMOs
The economy is always an issue in MMOs, and no one ever seems happy with how an in-game economy is modeled. Game developers certainly struggle to build a realistic economic system that can react to unanticipated fluctuations caused by players. In this context I think of the RapidMind European Option Pricing benchmark, which ran 120x faster on a 7900-based GPU than the original code did on a CPU. Apply this muscle to controlling the actions of NPC traders, and MMO economies would take on a complex life of their own, reacting to player-induced trading frenzies.

Ballistics Report

Pros
• The 8800GTX has 128 ALUs vs 96 ALUs on the 7900 GT, and they are clocked 3x higher
• NVIDIA has released the CUDA SDK to help developers exploit the power of the GPU in GPGPU programming
• Companies like RapidMind, Acceleware and Havok are making it easier to implement GPGPU strategies
• GPUs have far outstripped CPUs in floating-point throughput
• GPUs already have a large installed base, which dedicated add-on cards would still have to build
• For server-side implementations, GPUs would be cheaper than buying server clusters

Cons
• GPGPU programming remains difficult and requires programmers to think differently about their applications
• DX10-compatible parts are crucial to expanding GPGPU implementations because of the explosion of shaders required to meet the specification, but the installed base of DX10 cards will be low in the near future

Final Verdict – 100% excitement about possibilities
GPGPU implementations show greatly improved processing capabilities over CPU solutions, and the introduction of DX10-compatible parts should widen that gap. Companies like RapidMind, Acceleware and Havok are making it easier for traditional programmers to use GPUs in their applications. NVIDIA and ATI, with CUDA and CTM respectively, are building tools that expose more of their GPUs to GPGPU programmers. The 8800GTX is a tremendous leap forward in computational power for GPGPU applications, both in the world of computer games and in real-world simulations. It gives me a warm fuzzy feeling knowing that the power of my GPU, which so often sits idle while I perform common tasks like reading Firingsquad.com, could be used in programs similar to Folding@home to cure diseases.

[1] History of GPGPU -- http://www.gpgpu.org/data/history.shtml
[2] 5 Disruptive Technologies To Watch In 2007 by David Strom -- http://www.informationweek.com/internet/showArticle.jhtml?articleID=196800208
[3] Acceleware and Boston Scientific -- http://www.nvidia.com/object/acceleware_boston_scientific_success.html
[4] RapidMind GPU Evaluation -- http://rapidmind.net/case-gpu.php
[5] RapidMind -- http://rapidmind.net/product.php
[6] The Tech Report -- http://www.techreport.com/onearticle.x/9610
[7] Understanding GPUs Through Benchmarking -- http://www.cse.ohio-state.edu/~kerwin/GPGPUPerformance.pdf



25 User Comment(s) • 13 root comment(s)
Page 1 of 3
Apr 07, 2007 - 05:05 pm
All in all a well-written article. Just wish this site would let me log in, might use it a bit more. Shiera

Johnny The Bravo Maestro (3), Apr 07, 2007 - 04:05 am
Thoroughly enjoyed reading your article! Very well written, researched and referenced. Good luck.

indigo196 (253), Apr 06, 2007 - 09:34 am | Edited on Apr 06, 2007 - 09:38 am
Thanks to all the folks who have come, read the article and voted. I also appreciate those of you who have offered comments on the articles.

Thanks!

Arturo02 (5), Apr 05, 2007 - 02:03 am
I think the writing in this article is the best Buc has written so far. It flows well, and has a good synergy.

If I knew more about computers I could be more technical but my forte is writing. That part you did well on, the tech stuff I defer to others to comment on. :)

GrapeApe (36), Apr 04, 2007 - 11:30 pm | Edited on Apr 06, 2007 - 06:12 pm
» Good style just needs a little tweak.
Good writing style, but it needs some minor fact checking, which I'm sure Brandon et al. will gladly help with in the future.

Just some quick points, FYI:

- Havok FX is part of the Havok 4 engine, and there are no games out yet based on Havok 4 (although UBI demos were shown in CA last month). All the examples you listed are based on Havok 3, which does not support Havok FX (although it could be added with a lot of effort). Crysis is supposed to use its own physics engine with support for VPU physics, which would have been a good example just outside the Havok realm.

- The statement "The 8800GTX is a tremendous leap forward in computational power for GPGPU applications both in the world of computer games and in real-world simulations." is not actually true for overall general computational power, as the R580 has more power when using CTM and the MADD + ADD scenario (about 5-10% more).

But overall good article, good flow and nice combination of examples and good projection of future application of the technology.

Edited because, on re-reading, the opening statement looked harsher than I intended. It's not a big criticism, since we often have to remind FS and other reviewers of things to tweak or correct, and they update their articles. Too bad you guys can't make FS-approved changes to update stuff the same way they do after their reviews have launched. I know you and Dave would both like to just tweak stuff.... like adding more flair. >B~)

indigo196 (253), Apr 05, 2007 - 04:36 am | Edited on Apr 05, 2007 - 05:06 am
Grape:

I should have been clearer about the fact that the titles I listed were using the previous version of Havok, which did not include Havok FX.

On the computational power of the R580 (X1900XTX) though I based my statement on the information in the PDF at http://www.cse.ohio-state.edu/~kerwin/GPGPUPerformance.pdf. These were benchmarks though and I did not find performance information on CTM or MADD + ADD. Do you have links to that information? I have started to dig much deeper on this topic so I would love to read about that stuff. I also did not find a comparison between CTM and CUDA -- so if you know of anything about that it would be great.

GrapeApe (36), Apr 06, 2007 - 06:00 pm
Well, CTM and a few other apps (IIRC F@H and PeakStream [mentioned in your article]) let you access the MADD + ADD feature of the R580 shaders, after which it comes down to the mathematical limits:

R580: Has a single MADD/MUL/ADD (2 flops) ALU and a single ADD (1 flop) ALU per shader unit, with 4 components each (Vec3 + Scalar). So 48 x (2+1) x 4 x 650 MHz = 374.4 Gflops for the XTX and Crossfire Editions.

G80: 128 simple scalar stream ALUs at 1350 MHz with MADD (2 flops only), which gives you about 345.6 Gflops for the GTX.

Which is 8.3%, within the 5-10% range I mentioned above.

nV mentions a higher 520 Gflop number, but it's not exposed for other computational uses in CUDA... yet at least. The feeling is that they are accounting for other computational components; however, it has given rise to the idea that there is a hidden bunch of shaders left to be 'unlocked' by nV. IMO that is unlikely, but a case could be made for it if they made a part far stronger than they expected, knew they didn't need to launch it at 100%, and wanted something in reserve as a reply to the R600. That would let them slow down their pace of R&D if they had already hit a home run. However, that is very VERY unlikely IMO.
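
The peak-throughput arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch of that calculation: the unit counts, flops per clock and clock speeds are the ones quoted in the comment, they are theoretical peaks rather than measured performance, and the helper name `peak_gflops` is made up for the example.

```python
def peak_gflops(units: int, flops_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak: units x flops per unit per clock x clock, converted from Mflops to Gflops."""
    return units * flops_per_clock * clock_mhz / 1000.0


# R580: 48 shader units x 4 components, each doing MADD (2 flops) + ADD (1 flop) per clock.
r580 = peak_gflops(48 * 4, 2 + 1, 650)

# G80: 128 scalar ALUs doing one MADD (2 flops) per clock.
g80 = peak_gflops(128, 2, 1350)

advantage_pct = (r580 / g80 - 1) * 100
print(f"R580: {r580:.1f} Gflops, G80: {g80:.1f} Gflops, R580 ahead by {advantage_pct:.1f}%")
```

This reproduces the 374.4 and 345.6 Gflops figures and the 8.3% gap. It says nothing about how much of that peak real code can reach on either card.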

indigo196 (253), Apr 06, 2007 - 06:22 pm
Yeah... the interesting part is that the link to the performance tests I posted showed some significant gains in the G80 over the R580... I did find some sites claiming that the theoretical Gflops for the G80 is 520, but that the max they have achieved so far is 330. I could not verify that stuff with second and third sources so I did not include it.
