Digit Oktavianto Web Log

Digit's Junk Notes

Interview with David Kirk, NVIDIA Chief Scientist: GPUs Help Spread Parallel Computing


I took this excerpt of the interview from http://techon.nikkeibp.co.jp/article/HONSHI/20080729/155654/?P=2 because I think David Kirk's explanations are excellent. Enjoy. :)

Graphics processing units (GPUs) are evolving to provide a diverse range of image processing functions flexibly and at high speed; as a result, they are morphing into architectures suited to general-purpose computing engines. The term “GPU computing” has been applied to the idea of using these capabilities for high-speed processing in applications such as medical image processing. We spoke to David Kirk, chief scientist at NVIDIA Corp, a manufacturer of GPUs. Kirk is a leading proponent of GPU computing and the person behind NVIDIA’s development thrust.

You’ve been heavily promoting GPU computing recently. What made you decide it was such an important development?

Parallel computing is entering a very interesting stage right now. It used to be that parallel computing environments were only available on supercomputers costing hundreds of millions of dollars, but today anyone can buy a system.

I have always been involved in computer graphics technology, such as GPU development for personal computers (PCs). Now that GPUs are programmable, we’re beginning to see general computer users adopt parallel computing. GPUs can make a real contribution to spreading parallel computing technology into new fields such as education and science. GPUs are going to be crucial tools not only for displaying images, but for all types of computation.

There are not many successful examples of using multiple processor cores to run software in parallel, are there?

I think it’s a bit premature to categorize all those attempts as failures. The knowledge, design approaches and other tips gained from them are being carried forward in today’s parallel computing architectures. One thing that has never succeeded, though, is educating software developers in parallel computing.

For the past 20 or 30 years, software developers haven’t needed to consider parallel computing at all, because the advantages gained from higher microprocessor operating frequencies were just there for the taking. Now that microprocessor manufacturers are switching over to multicore designs, however, parallel computing has become a basic assumption in software design. Everyone has to take it into account now.

The actual coding needed to enable parallel processing is simple. In most cases, though, that alone doesn’t provide any improvement in execution performance as the number of cores is increased. The performance improvement gained by automatically converting a sequential program intended for a single-processor system to run under parallel processing is not very impressive, either.
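Kirk does not give a formula here, but Amdahl's law is one standard way to see why spreading a sequential program across more cores yields so little. The sketch below is an illustration under stated assumptions: a fixed problem size in which a fraction `p` of the runtime parallelizes perfectly while the rest stays serial, with no communication or synchronization overhead. The function name `amdahl_speedup` is purely illustrative.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup on n cores when a fraction p of the runtime is parallel.

    Assumes a fixed problem size and zero parallel overhead (Amdahl's law):
        S(n) = 1 / ((1 - p) + p / n)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # With half of the runtime left serial, the speedup can never reach 2x,
    # no matter how many cores are added.
    for cores in (2, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.5, cores), 3))
```

The serial term `1 - p` puts a ceiling of `1 / (1 - p)` on the speedup, which is why restructuring the problem to shrink that term matters more than adding cores.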

The key is the stage before actual program coding: analyzing the problem and visualizing the parallel task structure needed to process it. This approach to parallel processing, which I call “computational thinking”, is something that software developers really need to master.
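As a small illustration of that "analyze before you code" step (an example not taken from the interview), consider summing squares. The sequential loop threads every step through one accumulator, so it cannot be split as written; noticing that addition is associative lets the problem be restated as independent partial sums plus a final combine. The sketch uses Python's `concurrent.futures.ThreadPoolExecutor` only to show the task structure: in CPython the global interpreter lock means pure-Python arithmetic like this will not actually run faster on threads. The function names and the `workers` default are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def sequential_sum_of_squares(xs: List[int]) -> int:
    total = 0
    for x in xs:  # each step depends on the previous value of `total`
        total += x * x
    return total


def chunked_sum_of_squares(xs: List[int], workers: int = 4) -> int:
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Step 1: split the data into independent chunks with no shared state.
    n = len(xs)
    size = max(1, -(-n // workers))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, n, size)]
    # Step 2: reduce each chunk on its own; these tasks do not depend on
    # each other, so they could run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sequential_sum_of_squares, chunks))
    # Step 3: combine the partial results (valid because + is associative).
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1000))
    print(chunked_sum_of_squares(data), sequential_sum_of_squares(data))
```

The interesting work happens in Steps 1 and 3, which come from thinking about the problem, not from the threading call in Step 2.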

For a developer, this should be something fun, like solving a puzzle. It shouldn’t be the very first lesson when educating programmers, of course, but I do think computational thinking for parallel processing should be taught as the second. Parallel computing should be introduced into all kinds of equipment, from supercomputers to mobile phones. We want programmers to use Compute Unified Device Architecture (CUDA) as their development environment.

What are the critical architectural differences between GPUs and microprocessors?

Microprocessors have long supported extended instruction sets for single instruction, multiple data (SIMD) processing, and it has recently become common to integrate multiple central processing unit (CPU) cores on one chip. Microprocessors are designed to execute a given sequence of instructions in the shortest possible time, and this basic approach stays the same even when the chip has multiple CPU cores.

To keep the CPU cores executing instructions without going idle, the time spent waiting for data has to be very short. Even in CPU cores with out-of-order execution, which lets instructions that read data be executed early, there are usually only two computation pipelines, so data wait time remains a critical factor. That’s why a massive cache is used: in terms of area, caches can account for 80% or more of a modern microprocessor.
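One common textbook model of why waiting on data dominates a latency-oriented CPU is average memory access time (AMAT): every access pays the cache hit time, and misses also pay the trip to main memory. The cycle counts below are hypothetical round numbers, not figures from the interview or from any specific chip.

```python
def average_memory_access_time(hit_time: float, miss_rate: float,
                               miss_penalty: float) -> float:
    """Textbook AMAT model: hit_time + miss_rate * miss_penalty (in cycles)."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical numbers: 4-cycle cache hit, 200-cycle trip to DRAM.
    for miss_rate in (0.10, 0.05, 0.01):
        print(miss_rate, average_memory_access_time(4, miss_rate, 200))
```

With these made-up numbers, cutting the miss rate from 10% to 1% drops the average access from 24 cycles to 6, which is the kind of gain that justifies spending most of the die on cache when a core has few other instructions to run while it waits.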

Caches are very different in GPUs. GPUs are designed for applications with high degrees of parallelism. In graphics processing, for example, a GPU might calculate positions, process shading and perform other operations on millions of triangles, and compute the colors of millions of pixels. That means millions of simultaneous tasks. Because GPUs have the threads needed to handle many data items and are designed from the start for parallel processing applications, the data does not need to be cached in advance.

When a particular thread requests data, it sleeps. While that data is being read from memory, a different thread that requested its data earlier and has been sleeping is executed on a different processor core. The architecture overlaps data reads and processing in pipeline fashion, so far less of the chip footprint is taken up by cache memory. Unlike in microprocessors, most of the chip area is processor cores, so the GPU’s processing efficiency per unit area is very high.
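The effect of letting threads sleep while others run can be seen in a toy cycle-by-cycle model. It assumes a single core, threads that alternate a fixed compute burst with a memory read of fixed latency, and simple round-robin selection of whichever thread is awake; every name and number here is hypothetical, and real GPU thread schedulers are far more elaborate.

```python
def simulate_latency_hiding(num_threads: int, compute_cycles: int,
                            mem_latency: int, total_cycles: int) -> float:
    """Toy model: fraction of cycles in which one core does useful work."""
    if compute_cycles < 1 or total_cycles < 1:
        raise ValueError("compute_cycles and total_cycles must be positive")
    wake_at = [0] * num_threads                  # cycle when each thread may run again
    remaining = [compute_cycles] * num_threads   # compute left before the next read
    current = 0
    busy = 0
    for cycle in range(total_cycles):
        # Round-robin search for an awake thread, starting with the last one run.
        for k in range(num_threads):
            t = (current + k) % num_threads
            if wake_at[t] <= cycle:
                break
        else:
            continue  # every thread is asleep waiting for data: the core idles
        current = t
        busy += 1
        remaining[t] -= 1
        if remaining[t] == 0:
            # Burst done: issue a memory read and put this thread to sleep.
            wake_at[t] = cycle + 1 + mem_latency
            remaining[t] = compute_cycles
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        print(n, simulate_latency_hiding(n, compute_cycles=4,
                                         mem_latency=12, total_cycles=1600))
```

With these made-up parameters, utilization climbs from 25% with one thread to 100% once four threads are available, matching the rough estimate min(1, N·C/(C+L)) for N threads, compute burst C and latency L. Enough threads in flight do the job that a large cache does on a CPU.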

Today’s GPUs combine the characteristics of two architectures: SIMD and multiple instruction, multiple data (MIMD). Standard SIMD units cannot independently control branching, random addressing and the like, but a GPU processor core has a great deal of freedom to branch and otherwise vary its behavior depending on the data. Because a single program, rather than a single instruction, is executed across multiple data, I refer to this as a single program, multiple data (SPMD) architecture.
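The SPMD idea can be mimicked in plain Python: one function (the "program") is invoked once per data index, and each invocation is free to take a different branch based on its own element. This is only an analogy; in CUDA C, a kernel instance typically derives its index from built-in variables such as `threadIdx` and `blockIdx` and runs as a real hardware thread. `launch` and `clamp_and_square` are hypothetical names, and the loop below runs sequentially.

```python
from typing import Callable, List

# A "kernel" receives its own index plus shared input and output buffers.
Kernel = Callable[[int, List[float], List[float]], None]


def launch(kernel: Kernel, n: int, inputs: List[float], outputs: List[float]) -> None:
    """Run the same program once per index.

    On a GPU these n instances would run as parallel threads; this
    sequential loop only mimics the programming model.
    """
    for thread_id in range(n):
        kernel(thread_id, inputs, outputs)


def clamp_and_square(i: int, inputs: List[float], outputs: List[float]) -> None:
    """Hypothetical per-element "shader" with a data-dependent branch."""
    x = inputs[i]
    if x < 0.0:        # each instance may take a different path...
        outputs[i] = 0.0
    elif x > 1.0:      # ...depending only on its own data element
        outputs[i] = 1.0
    else:
        outputs[i] = x * x


if __name__ == "__main__":
    data = [-0.5, 0.5, 2.0]
    result = [0.0] * len(data)
    launch(clamp_and_square, len(data), data, result)
    print(result)  # [0.0, 0.25, 1.0]
```

A pure SIMD instruction would apply one operation to all three elements; here the whole function, branches included, is the unit that is replicated per element.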

What are your thoughts on the Larrabee graphics IC plan announced by Intel Corp, or on the idea of AMD Inc to combine CPU and GPU cores at the instruction set architecture level?

I don’t know the details, but I get the feeling that Larrabee is actually a GPU designed by a microprocessor engineer. It looks to me like they are trying to develop a chip that can be used as a GPU by combining microprocessor component elements. Intel’s claim that it will maintain software compatibility with the x86-family instruction set strikes me as a bit strange, though; software developers don’t write code while worrying about the instruction set. They code in a programming language they’re familiar with, like C. By doing that, they ensure that even if the instruction set changes, all they have to do is recompile. I think they’ll have to develop new software to attain peak performance with Larrabee.

AMD’s idea of combining the CPU and GPU cores does have merit. Simply putting both cores on the same chip shortens the interconnect, which boosts communication speed. It is probably the right solution for low-cost PCs, where a single general-purpose chip keeps costs down.

The problem is that the degree of freedom is curtailed. When a GPU is used for general computation and not only for graphics, freedom means a designer can decide how many more GPUs to add to improve performance significantly. That is only possible when the GPU is a separate chip, however. I think the major PC designs will continue to use separate CPU and GPU chips.

Interviewed by Tomohisa Takei