Pleiades: looking back on a success story

What kind of computing infrastructure does a researcher need? An iPad, a workstation, a shared-memory machine, or a supercomputer with tens or hundreds of thousands of cores? Is the application dominated by memory bandwidth, by raw core performance, by inter-processor communication, or by specific software requirements? Instead of installing a single type of infrastructure, it is often cheaper and more efficient to install several, because the hardware needs of applications differ widely. The Pleiades clusters were an attempt in this direction. The curtain will soon come down on a complete success.

What kind of computer does a research scientist need? Is it an iPad, a PC, a multicore server, or a parallel supercomputer with tens, hundreds, thousands or tens of thousands of processors? Is the program dominated by pure CPU (or core) performance, by main memory access, by network communication, or by special software requirements? Each of these cases calls for a different type of computer architecture.


Vincent KELLER

Several different computers instead of one large installation

What kind of computers should universities, industry, and data or research centers install? All of them are expected to meet their users' requirements at minimal cost. The cost includes not only the purchase of hardware and software (estimated at about one third of the overall cost over four to five years of operation), but also the personnel needed for operation and support (about another third), the infrastructure and, very important these days, the energy needed for running and cooling.
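To make the cost argument concrete, the following minimal Python sketch derives lifetime cost components from the purchase price alone. The article gives only the first two thirds (purchase, personnel); attributing the remaining third to infrastructure and energy is an inference, and the function name `estimate_lifetime_costs` and the example price are hypothetical.

```python
# Rough total-cost-of-ownership split over 4-5 years of operation:
#   purchase (hardware + software) ~ 1/3 of the total,
#   personnel                      ~ 1/3 of the total,
#   infrastructure + energy        ~ the remainder (ASSUMPTION: inferred).

def estimate_lifetime_costs(purchase_cost: float) -> dict[str, float]:
    """Estimate lifetime cost components from the purchase price alone."""
    total = 3.0 * purchase_cost  # purchase is about one third of the total
    personnel = total / 3.0      # personnel is about another third
    infrastructure_and_energy = total - purchase_cost - personnel
    return {
        "purchase": purchase_cost,
        "personnel": personnel,
        "infrastructure_and_energy": infrastructure_and_energy,
        "total": total,
    }

if __name__ == "__main__":
    # Hypothetical purchase price of 1,000,000 (any currency).
    for name, value in estimate_lifetime_costs(1_000_000).items():
        print(f"{name:>26}: {value:,.0f}")
```

Under these rough fractions, a machine's purchase price is only about a third of what it really costs over its lifetime, which is why matching each architecture to its applications matters.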
At the beginning of the 21st century, computer centers typically bought one machine to deliver computing power to many users. Such an installation has to satisfy all types of requirements, which increases costs and reduces overall efficiency. Instead of one computer, it could be cheaper to install several with different architectures, each adapted to a specific type of application. The Pleiades cluster was an attempt in this direction, specifically to offer cost-effective computing power for parallel applications needing high main memory bandwidth and little inter-node communication.

The Pleiades cluster development

Pleiades1 started in 2003, when the Intel Pentium 4 (P4), a CPU originally targeted at personal computers, reached a high main memory bandwidth of 6.4 GB/s and became available on the market. Fortran and C compilers and the LAPACK mathematical library already existed, and the vendor-independent MPI message-passing standard, implemented by the MPICH library, was well established. After a short introductory phase with a 24-P4 machine from a European R&D project, a 132-P4 Pleiades1 installation was placed in the mechanical engineering department. Two institutions (the CRPP and the former ISE institute of the STI faculty) agreed to combine their local computing demands. The resulting parallel computer had three times the performance of a single institute's installation (one part being reserved for other potential EPFL users), which made it possible to compute bigger cases than on the individual installations initially planned. The 132 standard 32-bit P4 boxes were placed on IKEA-like shelves and interconnected by a Fast Ethernet network. It was a typical Beowulf-class cluster running Linux, managed by the OpenPBS resource management system (RMS) and offering all the needed scientific software, libraries and compilers. The machine was dismantled in 2008 after 5 years of full utilization and high scientific output.
This idea of pooling computational needs was the origin of the new Pleiades2 cluster (installed in 2005) and its extension Pleiades2+, with altogether 17 contributing laboratories, units and institutes, and close to 1000 cores installed. Pleiades2 was built around the new 64-bit Intel Xeon CPU, with a memory bandwidth similar to that of Pleiades1, and was interconnected by a Gigabit Ethernet network. Pleiades2 also ran Linux and was managed by OpenPBS. This technical choice made porting applications from the old cluster to the new one fully transparent to users. On the hardware side, the Pleiades2 racks benefited from the newly constructed computer room equipped with two CRACs (computer room air conditioning units). Pleiades2+ was installed in November 2006. It was the first large extension of the cluster and marked the arrival, for Pleiades users, of the multicore architecture that has since become the accepted standard. The choice was the dual-core Intel Woodcrest processor, with two processors per node. This machine dramatically increased the performance of CPU-bound applications, and also of memory-bound applications thanks to its high memory bandwidth (21.3 GB/s). The OS remained Linux and the RMS OpenPBS.
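The two bandwidth figures quoted for Pleiades1 (6.4 GB/s) and Pleiades2+ (21.3 GB/s) can be reconstructed as theoretical peaks, bytes per transfer times transfer rate times number of channels. This Python sketch assumes configurations that the article does not state: an 8-byte front-side bus at 800 MT/s for the P4, and four 8-byte DDR2-667 memory channels shared by the two sockets of a Woodcrest node. The function name `peak_bandwidth_gbs` is hypothetical.

```python
# Theoretical peak bandwidth = bytes per transfer * transfers/s * channels.
# ASSUMPTIONS (not stated in the article):
#   - P4: 64-bit (8-byte) front-side bus at 800 MT/s, one core per box;
#   - Woodcrest node: four 64-bit DDR2-667 channels, 2 sockets x 2 cores.

def peak_bandwidth_gbs(bytes_per_transfer: int, mega_transfers: float,
                       channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return bytes_per_transfer * mega_transfers * 1e6 * channels / 1e9

p4_node = peak_bandwidth_gbs(8, 800)                 # Pleiades1 box
woodcrest_node = peak_bandwidth_gbs(8, 2000 / 3, 4)  # DDR2-667 = 666.67 MT/s

p4_per_core = p4_node / 1                 # single-core P4
woodcrest_per_core = woodcrest_node / 4   # four cores share the node's memory

print(f"P4 node:        {p4_node:.1f} GB/s ({p4_per_core:.2f} GB/s per core)")
print(f"Woodcrest node: {woodcrest_node:.1f} GB/s ({woodcrest_per_core:.2f} GB/s per core)")
```

Under these assumptions, a Pleiades2+ node offers more than three times the peak bandwidth of a Pleiades1 box, while the peak bandwidth available per core stays of the same order.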

A new cost distribution model

For the first time, computing resources in a Swiss university computing center had to be paid for by the scientists, at least in part. A Pleiades cost distribution model was developed and is described in [1]. Each lab's financial contribution was used to calculate its priority in the scheduler: the more a lab paid for computing resources, the higher the priority of its jobs. The users, who were consequently also co-owners, showed increasing interest, which led to an increasingly efficient use of the resources (the Pleiades clusters reached an overall average utilization of 80% over 9 years). It was also possible to address specific needs individually, which resulted in the installation of an eight-node parallel I/O system running over the fast inter-node communication network. This parallel I/O system was eight times faster than the old NFS installation, which ran over the overloaded front-end computer. In addition, a special effort was made to help users improve their codes. For this purpose, two courses [2] were offered over 10 years; one of them [3] has been extended and is still being taught [4].
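The "pay more, get higher priority" rule can be illustrated with a small Python sketch. The actual Pleiades model is the one described in [1]; the proportional rule, the function names `priority_shares` and `order_jobs`, and the lab names and amounts below are all hypothetical.

```python
# Hypothetical sketch: turn each lab's financial contribution into a
# scheduler priority share and order queued jobs accordingly.
# ASSUMPTION: priority is simply proportional to the amount paid; the real
# Pleiades cost distribution model is described in reference [1].

def priority_shares(contributions: dict[str, float]) -> dict[str, float]:
    """Return each lab's share of priority, proportional to what it paid."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {lab: paid / total for lab, paid in contributions.items()}

def order_jobs(jobs: list[tuple[str, str]], shares: dict[str, float]) -> list[str]:
    """Sort queued (job_id, lab) pairs so jobs of higher-paying labs come first."""
    ranked = sorted(jobs, key=lambda job: shares[job[1]], reverse=True)
    return [job_id for job_id, _lab in ranked]

if __name__ == "__main__":
    shares = priority_shares({"lab_A": 300_000, "lab_B": 150_000, "lab_C": 50_000})
    print(shares)
    print(order_jobs([("j1", "lab_C"), ("j2", "lab_A"), ("j3", "lab_B")], shares))
```

A real resource manager would also weigh past usage, job size and waiting time; this sketch isolates only the link between contribution and priority.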
When the remaining Pleiades nodes still in operation are definitively switched off in December 2012, the Pleiades success story will not end. Over 9 years of reliable service, Pleiades delivered roughly 50 million CPU-hours to its users. This enormous amount of resources led to a large scientific output (hundreds of publications and tens of PhD theses). Finally, the Pleiades experience has been used to shape the newly adopted EPFL policy for high performance computing machines at DIT.
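To put the figure of 50 million CPU-hours over 9 years into perspective, this Python sketch converts it into the equivalent number of cores kept busy around the clock. It assumes 365.25-day years and spreads the hours uniformly over the whole period, although the installed core count actually grew over time; the function name `equivalent_busy_cores` is hypothetical.

```python
# Convert delivered CPU-hours into an equivalent number of cores running
# continuously. Figures from the article: ~50 million CPU-hours, 9 years.
# ASSUMPTION: 365.25-day years and uniform use over the whole period.

HOURS_PER_YEAR = 365.25 * 24

def equivalent_busy_cores(cpu_hours: float, years: float) -> float:
    """Average number of cores that would have to run non-stop."""
    return cpu_hours / (years * HOURS_PER_YEAR)

print(f"{equivalent_busy_cores(50e6, 9):.0f} cores busy on average")
```

The result, a little over 630 cores running non-stop for nine years, is of the same order as an installation that grew from 132 to close to 1000 cores.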

[1] GRUBER, Ralf, KELLER, Vincent. HPC@GreenIT. Springer, 2010. ISBN 978-3-642-01788-9.

[2] TRAN, Trach-Minh. MPI (course); GRUBER, Ralf. High Performance Computing Methods (course).

[3] TRAN, Trach-Minh. MPI (course).

[4] TRAN, Trach-Minh, KELLER, Vincent. MPI and OpenMP (course).


This page is an article from an EPFL publication.
Its content and some links may no longer be up to date.

Articles reflect only the views of their authors, except those that clearly concern official services (under the responsibility of DIT or other entities). Any reproduction, even partial, requires the agreement of the editors and the authors.
