High Performance Computing (HPC) refers to any computational activity that requires more than a single computer to execute a given task. Examples include supercomputers and clusters, which are used to solve advanced and complex computational problems. HPC can also handle and analyze massive amounts of data at high speed. In our research, we used HPC to explore three major issues (see sub-topics): HPC for large and distributed data applications, HPC programming, and fault tolerance and HPC.
|HPC for large and distributed data applications||
Due to the exponential growth of data, there is an ever-increasing need to process and analyze big data. The huge size of the available data, together with its high heterogeneity and high dimensionality, makes large-scale data mining applications computationally very demanding. Moreover, the quality of data mining results often depends directly on the amount of computing resources available. In fact, data mining applications are poised to become the dominant consumers of supercomputing in the near future. Effective parallel algorithms are therefore needed for various data mining techniques, but designing such algorithms is challenging. In this sub-topic, we develop and experiment with data mining algorithms that can benefit from the capabilities of HPC infrastructures.
|HPC programming||
While discussions of HPC architectures have long centered on performance gains, performance is not the only measure of success in HPC. The programmability of these architectures is as important as their performance. Because of the high heterogeneity of HPC architectures, programmability concerns a variety of HPC platforms: multiprocessors, clusters, grid computing, multicores, etc. Today’s Petascale computing algorithms and applications are not structured to take advantage of these architectures. Recent studies estimate that 80-90% of applications use a single level of parallelism. Hence, it is important to revisit older parallel programming models and languages in order to define parallel programming models suited to HPC. One approach is hybrid parallelism, i.e. parallelism at multiple levels with multiple programming models (MPI, OpenMP, MapReduce, Hadoop, Spark, CUDA, etc.), which is likely to be used in many applications. Hybrid parallelism may emerge because the speedups obtained at each level can be multiplied on future architectures.
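The idea that speedups at different levels can multiply may be easier to see with a small model. The sketch below is an illustration only, not a method from this research: it assumes an Amdahl-style model in which the parallel part of each level (for example, MPI across nodes) is itself parallelized by the next level down (for example, OpenMP threads within a node). The function name `nested_speedup` and the `(parallel_fraction, workers)` parameters are hypothetical.

```python
from typing import Sequence, Tuple


def nested_speedup(levels: Sequence[Tuple[float, int]]) -> float:
    """Estimate the overall speedup of nested (hybrid) parallelism.

    `levels` is ordered from the outermost level to the innermost one.
    Each entry is (parallel_fraction, workers): the fraction of that
    level's work that can run in parallel, and how many workers share it.
    This is an idealized Amdahl-style model that ignores communication
    and synchronization costs.
    """
    relative_time = 1.0  # normalized run time of the innermost work
    # Walk from the innermost level outwards: the parallel part of each
    # level runs the (already accelerated) inner work on `workers` units.
    for parallel_fraction, workers in reversed(levels):
        if not 0.0 <= parallel_fraction <= 1.0:
            raise ValueError("parallel_fraction must be in [0, 1]")
        if workers < 1:
            raise ValueError("workers must be at least 1")
        serial_part = 1.0 - parallel_fraction
        parallel_part = parallel_fraction * relative_time / workers
        relative_time = serial_part + parallel_part
    return 1.0 / relative_time


# Fully parallel work on 4 nodes with 8 threads each: speedups multiply.
ideal = nested_speedup([(1.0, 4), (1.0, 8)])

# With 10% serial work at each level, the gain is much smaller.
realistic = nested_speedup([(0.9, 4), (0.9, 8)])
```

Under this model, the per-level speedups multiply exactly only when every level is fully parallel; any serial fraction at an outer level limits how much the inner levels can contribute, which is one reason programmability across levels matters.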
|Fault tolerance and HPC||
With the emergence of Petascale systems, fault tolerance has received a lot of attention over the last decade. Most of the existing fault tolerance techniques for parallel and distributed applications have been applied to high performance computing applications. Unfortunately, despite their relative success, existing approaches do not fit well with the challenging evolution of large-scale systems. When developing a fault tolerance technique for HPC architectures, scalability and performance are the two most important aspects to consider. The goal of this sub-topic is to develop scalable fault tolerance techniques that help make future high performance computing applications self-adaptive and fault-survivable.
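The text above does not name a specific technique. As a point of reference, one classic approach to surviving failures is checkpoint/restart, sketched below for a single process. This is a minimal illustration under stated assumptions, not the technique developed in this research: the function names, the JSON state layout, and the `fail_at` parameter (used only to simulate a crash) are all hypothetical.

```python
import json
import os


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temporary file, then atomically replace the old one."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk
    os.replace(tmp_path, path)  # atomic rename: never leaves a half-written file


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run_with_checkpoints(path, n_steps, checkpoint_every=10, fail_at=None):
    """Sum the integers 0..n_steps-1, saving progress every few steps.

    If `fail_at` is set, a failure is simulated at that step. Calling the
    function again resumes from the last checkpoint instead of step 0.
    """
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

After a failure, a second call with the same path redoes only the steps since the last checkpoint. The scalability concern raised above shows up here as a trade-off: checkpointing more often loses less work on failure but spends more time writing state, and that cost grows with the size of the system.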