Specifications
What do we know about the rest of the specifications? Microsoft has not gathered them all in a single marketing document; they are scattered across several documents sent to developers. We have compiled as many specifications as we could find in a table, and added those of DirectX 9, the changes brought by SM 3.0 and the two architectures that support it, as well as what Microsoft has announced about DirectX 10.1.

As with each new version of DirectX, there is more of everything: instructions, registers, etc. The goal is to keep developers from being restricted by the limits of the Shader Model, here in version 4.0, without imposing needless and costly complexity on the GPU. As you have probably noticed, the basic specifications (like the instruction set) are now the same for pixel shaders, vertex shaders and, of course, geometry shaders. This is what Microsoft calls shader unification (this is not unification at the hardware level!).

The D3D 10 calculation unit as seen by Microsoft
The number of instructions increases from 512 to 65536 (128x more) and the number of instructions executed is now unlimited. As a reminder, the number of instructions executed can be higher than the number of instructions in the shader because loops repeat a series of instructions. The number of temporary registers jumps from 32 to 4096 but, as is already the case with 32, GPU manufacturers will not have to implement that many in hardware, at least not at optimum performance. The driver will have to accept a shader that uses that many registers and modify it to take hardware restrictions into account.
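To make the difference between the instructions stored in a shader and the instructions actually executed more concrete, here is a minimal Python sketch. The shader layout and its figures are invented for illustration; only the 65536 limit comes from the specifications discussed here.

```python
# Hypothetical shader description: a list of (instruction_count, repeat_count)
# blocks. The figures are made up for illustration only.
MAX_INSTRUCTIONS = 65536  # Shader Model 4.0 limit quoted in the article


def stored_instructions(blocks):
    """Instructions the shader contains: each block is stored once."""
    return sum(count for count, _ in blocks)


def executed_instructions(blocks):
    """Instructions actually run: each block counts once per iteration."""
    return sum(count * repeats for count, repeats in blocks)


shader = [
    (200, 1),    # setup code, runs once
    (50, 4000),  # loop body of 50 instructions, 4000 iterations
    (30, 1),     # final output code
]

print(stored_instructions(shader))                      # 280
print(executed_instructions(shader))                    # 200230
print(stored_instructions(shader) <= MAX_INSTRUCTIONS)  # True
```

A shader of only 280 instructions therefore executes more than 200,000 of them, which is why the two limits are specified separately.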
One of the most important changes in this DirectX concerns constants and the way they are updated. Everything has been reworked to make them more flexible to use while reducing the CPU cost of managing them, without affecting access performance. Constants and textures both amount to memory accesses and could have been unified, but their access constraints and performance requirements are very different and still justify keeping them separate.
Each element, whether it is a pixel or a vertex, starts with a maximum of 16 input registers, compared to 10 for pixels with DirectX 9. For a vertex, these are the basic rendering data that come from the CPU and from the objects to render, organised by the Input Assembler, which is an improved version of what is currently done with Geometry Instancing. For a pixel, they are mainly interpolated data, texture colors and addresses. Things get a little more complicated with geometry shaders since they work on primitives. They have to accept as input the data of 3 vertices (a triangle) but also of the adjacent vertices: 6 x 16 registers of 4x FP32, which is enormous. Geometry shaders can pass up to 32 registers on to pixels, or 16 more than without them. We do not know for sure, however, whether geometry shaders can modify all 32 registers or whether they have to pass through the 16 original values that come from the vertex shaders and can only add up to 16 others.
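To give an idea of what "6 x 16 registers 4x FP32" represents, the arithmetic can be sketched in a few lines of Python. The constant and function names are ours, and the result is only the raw size of the input data, not a statement about how a GPU stores it.

```python
REGISTERS_PER_VERTEX = 16    # input registers per element, from the article
COMPONENTS_PER_REGISTER = 4  # each register holds 4 values
BYTES_PER_FP32 = 4           # one 32-bit float


def input_bytes(vertex_count):
    """Raw size in bytes of the input data for a primitive."""
    return (vertex_count * REGISTERS_PER_VERTEX
            * COMPONENTS_PER_REGISTER * BYTES_PER_FP32)


print(input_bytes(3))  # triangle alone: 768
print(input_bytes(6))  # triangle plus 3 adjacent vertices: 1536
```

That is up to 384 FP32 values, or 1.5 KB, for a single geometry shader invocation.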
Texture access has evolved. Today, the number of texturing instructions is already unlimited (though bounded by the maximum number of instructions), but the number of textures supported and the number of texture access modes (= number of samplers) are both set at 16. For example, texture 1 with trilinear filtering uses up one of the 16 access possibilities. With DirectX 10, textures and samplers are separated. The number of samplers is still 16 (in other words, one shader can use 16 filtering modes) but the number of textures increases from 16 to 128. This also applies to vertex shaders (currently 4 for GPUs that meet the Shader Model 3.0 specifications) and geometry shaders. These textures can be up to 8192x8192 pixels, compared to the 2048x2048 currently required, even though recent GPUs all support 4096x4096 textures (this size is sometimes a problem if the GPU cannot find a large enough block of free memory for the texture, which often happens in FP32). FP16 filtering finally becomes mandatory (the GeForce 6 and 7 support it but not the Radeon X1000), as do shadow map access and filtering (PCF, percentage closer filtering, currently supported only by Nvidia). This applies to all types of shaders (there is no filtering in current vertex shaders). That is not all, because a new type of access is added: load. Sampling a texture consists in taking the texel closest to a given coordinate (point sampling) or the group of texels close to it (bilinear or anisotropic filtering). Load consists in fetching one specific texel. This makes it easier to use textures to store data other than images.
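The difference between sampling and load can be illustrated with a small Python sketch. It is a simplified model written for this article, not the behaviour of any particular GPU: the 2x2 texture, the function names and the clamp-to-edge addressing are all assumptions.

```python
import math

# A tiny hypothetical 2x2 single-channel texture, stored as rows of texels.
TEXTURE = [
    [0.0, 1.0],
    [2.0, 3.0],
]


def load(texture, x, y):
    """Load: return exactly the texel at integer coordinates (x, y)."""
    return texture[y][x]


def sample_point(texture, u, v):
    """Point sampling: return the texel whose cell contains (u, v) in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def sample_bilinear(texture, u, v):
    """Bilinear filtering: blend the 4 texels around (u, v), clamped at edges."""
    height, width = len(texture), len(texture[0])
    # Texel centres sit at half-integer positions, hence the -0.5.
    fx = u * width - 0.5
    fy = v * height - 0.5
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0

    def texel(x, y):
        # Clamp-to-edge addressing (an assumption of this sketch).
        return texture[max(0, min(y, height - 1))][max(0, min(x, width - 1))]

    top = texel(x0, y0) * (1 - tx) + texel(x0 + 1, y0) * tx
    bottom = texel(x0, y0 + 1) * (1 - tx) + texel(x0 + 1, y0 + 1) * tx
    return top * (1 - ty) + bottom * ty


print(load(TEXTURE, 1, 0))                 # 1.0
print(sample_point(TEXTURE, 0.9, 0.9))     # 3.0
print(sample_bilinear(TEXTURE, 0.5, 0.5))  # 1.5
```

Sampling works from normalised coordinates and may return a value that no texel holds (1.5 here), whereas load returns the stored value untouched, which is what is needed when the texture holds data rather than an image.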
Calculation accuracy is still FP32, but it has also improved. GPU manufacturers can currently implement FP32 as they see fit: rounding accuracy, support of special numbers (NaN, +/-Inf, etc.)… This can be a problem for developers, who see shaders behave differently from one GPU to another. For example, a current implementation may replace a NaN (for example 0/0 = NaN) with 0. This simplifies the design of calculation units and suits basic 3D rendering, which can deal more easily with a concrete 0 than with a NaN. It does not, however, correspond to usual floating point calculation and Microsoft has decided that, to ease future evolution, it would be best to make the support of these special numbers mandatory. Several other similar points have been settled to get as close as possible to the IEEE 754 standard found in CPUs, without following it completely (this would have been a needless additional cost). The relative error can be larger than in IEEE 754. Note that Microsoft has defined floating point behaviour not only for calculation units but also for texture filtering and blending units.
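The 0/0 example can be reproduced with a short Python sketch. Both functions are illustrative models with names of our choosing: `ieee_divide` spells out the usual IEEE 754 special cases by hand (Python itself raises an exception on a float division by zero), and `legacy_divide` mimics a hypothetical implementation that replaces NaN with 0.

```python
import math


def ieee_divide(a, b):
    """Division with the usual IEEE 754 special cases handled explicitly."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if b == 0.0:
        if a == 0.0:
            return math.nan  # 0/0 = NaN
        # x/0 = +/-Inf, the sign depends on both operands (including -0.0)
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def legacy_divide(a, b):
    """Hypothetical older behaviour: a NaN result is replaced with 0."""
    result = ieee_divide(a, b)
    return 0.0 if math.isnan(result) else result


# The NaN propagates through later calculations, the substituted 0 does not.
print(ieee_divide(0.0, 0.0) + 1.0)    # nan
print(legacy_divide(0.0, 0.0) + 1.0)  # 1.0
print(ieee_divide(1.0, 0.0))          # inf
```

The same shader thus produces two different results depending on the implementation, which is exactly the kind of discrepancy between GPUs that the stricter rules are meant to remove.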
Another major innovation of Direct 3D 10 is full support of 32-bit integers in addition to floating point. Integer support is useful in many situations during development and goes hand in hand with the support of bitwise operators. This is a valuable tool for developers, who now have a set of operations ever closer to that of a CPU.
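As an example of what integers and bitwise operators make possible, here is a minimal Python sketch that packs four 8-bit values into one 32-bit integer and extracts them again. The function names and the channel layout are our own choices for illustration, and the 32-bit mask is there because Python integers are not limited to 32 bits.

```python
MASK_32 = 0xFFFFFFFF  # Python integers are unbounded, so emulate 32 bits


def pack_rgba8(r, g, b, a):
    """Pack four 8-bit channels into one 32-bit integer with shifts and ORs."""
    packed = (r & 0xFF) | ((g & 0xFF) << 8) | ((b & 0xFF) << 16) | ((a & 0xFF) << 24)
    return packed & MASK_32


def unpack_rgba8(packed):
    """Recover the four channels with shifts and AND masks."""
    return (
        packed & 0xFF,
        (packed >> 8) & 0xFF,
        (packed >> 16) & 0xFF,
        (packed >> 24) & 0xFF,
    )


packed = pack_rgba8(0x12, 0x34, 0x56, 0x78)
print(hex(packed))           # 0x78563412
print(unpack_rgba8(packed))  # (18, 52, 86, 120)
```

Without integer and bitwise support, this kind of exact bit manipulation has to be approximated with floating point arithmetic.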
The number of render targets increases to 8. This means that a DirectX 10 GPU will be able to write 8 values to memory in addition to the Z data, instead of 4 today. FP32 blending becomes mandatory, whereas FP16 blending was not in DirectX 9 even though it was supported by most SM 3.0 cards (except for the 6200). Support of multisample antialiasing is still optional. When it is supported, however, manufacturers have to make it possible to read a multisampled render target like a standard texture. This is impossible with current GPUs, and such a render target must be downsampled before being used again. Supporting this is very complex because of the data compression algorithms that underpin MSAA.
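The difference between downsampling a multisampled render target and reading it directly can be sketched in Python. This is a simplified model: the data layout, the function names and the plain average used for downsampling are assumptions made for illustration, and real hardware may filter and store samples differently.

```python
# Hypothetical 4x multisampled render target: one row of two pixels,
# each pixel storing 4 samples of a single channel.
msaa_target = [
    [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]],  # second pixel lies on an edge
]


def resolve(target):
    """Downsample: average the samples of each pixel into a single value."""
    return [[sum(samples) / len(samples) for samples in row] for row in target]


def read_sample(target, x, y, index):
    """Direct read: fetch one individual sample of a pixel."""
    return target[y][x][index]


print(resolve(msaa_target))               # [[1.0, 0.5]]
print(read_sample(msaa_target, 1, 0, 0))  # 1.0
print(read_sample(msaa_target, 1, 0, 3))  # 0.0
```

Once downsampled, the edge pixel only holds 0.5 and the individual values of its samples are lost, whereas a direct read keeps them available to the shader.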

Direct 3D 10 is not even released and Microsoft is already talking about its successor! Direct 3D 10 hardware will work with Direct 3D 10.1, even if it will not be able to exploit all its possibilities. The same is true of a DX8 GPU with DX9. DX9 GPUs, however, will not work with DX10, and D3D 10 games will have to include a D3D 9 rendering path to support them. MSAA 4x will be required with Direct 3D 10.1 (all of ATI's and NVIDIA's GPUs support it, but this is not the case for products from Intel, S3, etc.). With this new version of Direct 3D, Microsoft will finally be able to specify in detail how antialiasing works and give developers more control. This has probably not been done yet for lack of time and to avoid delaying the release of DirectX 10. Some Direct 3D 10 GPUs may support this more advanced antialiasing management (by the way, the NVIDIA G80 will have a brand new antialiasing engine).
FP32 texture filtering will become mandatory, and Microsoft speaks of increased calculation accuracy but has not said whether this means an FP32 even closer to IEEE 754 or another format. Blending will also evolve to become more flexible. We can roughly assume that Direct 3D 10.1 will be an evolution of the GPU's remaining fixed-function units. Is this a step before making them completely programmable too in a future version?