Commodity computers are now complex heterogeneous systems that provide a huge amount of computational power. To take advantage of this power, however, we have to orchestrate the use of processing units with different characteristics, such as general-purpose multi-core CPUs and GPUs. Moreover, these heterogeneous systems can be interconnected to form a cluster of heterogeneous nodes, and exploiting the available computational power then raises the same kind of problems at a different level. A collaborative execution environment [1] is presented for exploiting data parallelism in a heterogeneous system composed of CPUs and GPUs, and an extension of CUDA is proposed for use in clusters of message-passing systems (MPI-CUDA [2]), in order to take advantage of clusters of such heterogeneous nodes.