One of the more intresting things to come out of it was some details on the 'global data store' (GDS), wihch while only given an overview had an intresting nugget of information in it which would have been easy to skip over.
While not directly exposed in DXCompute or OpenCL the GDS does come into play with DXCompute's 'appendbuffers' (and whatever the OpenCL version of this same construct is) as that is where the data is written to thus allowing the GPU to accelerate the process.
What this means in real terms is that if you compute shader needs to store memory which everyone in dispatch needs to get to for some reason then you could use these append buffers with only a small hit (25 cycles on the HD5 series) as long as the data will fit into the memory block. Granted, you would still need to place barriers into your shader/OpenCL code to ensure that everyone is done writing but it might allow for faster data sharing in some situations.
I don't know if NV does anything simular however, maybe I'll check that out later..
Right, back to the Webinars...