Durham University upgrades its cosmology supercomputer to a switchless architecture with Rockport Networks

Durham University’s Institute for Computational Cosmology (ICC) is moving to a switchless network architecture for its COSMA7 supercomputer to reduce the risk of network congestion that slows the pace of its research into the origins of the universe.

ICC uses a series of large supercomputers to power its space-related research, which requires performing complex and sophisticated simulations so that its team of 50 researchers can learn more about how the universe works.

The COSMA7 supercomputer is supported by the DiRAC (Distributed Research Using Advanced Computing) network, which provides computing resources and funding for supercomputing setups at the universities of Cambridge, Durham, Leicester and Edinburgh.

The supercomputer is also receiving funding from Exascale Computing Algorithms and Infrastructures Benefiting UK Research (ExCALIBUR), a £45.7m initiative focused on delivering next-generation simulation software to high-priority research areas in the UK.

During a press briefing to discuss the COSMA7 network upgrade, Alastair Basden, technical lead for the COSMA High Performance Computing (HPC) Cluster at Durham University, said the institution works with research teams around the world.

“It’s a very international institution and works with universities all over the world, and what we mainly do is run huge simulations of the universe – starting with the big bang – and then propagate them forward in time to the present day, allowing us to look at the evolution of the universe during this time,” Basden said.

“We can put different physics into the simulations and we can see things that we don’t understand. Things like dark matter, dark energy and that sort of thing.

“There are different parameters for these and we put them into the startup simulation, propagate the simulation, and then compare what we get in the simulation with what we’ve observed using giant telescopes.”

To ensure that the supercomputer can continue to perform its work efficiently and productively, and following a successful proof of concept, the university chose to revamp and upgrade the COSMA7 network architecture to a switchless design using technology from Rockport Networks.

The deployment is funded by the DiRAC and ExCALIBUR programs. Rockport’s technology allows the university to distribute the network switching function to COSMA7’s end nodes, so that the nodes themselves effectively become the network.

This, in turn, helps eliminate layers of switches from the supercomputer infrastructure and reduces the risk of network congestion and bottlenecks, allowing researchers at the university to run their simulations more efficiently and get their hands on the data they produce faster.

Matthew Williams, CTO of Rockport Networks, said the project is indicative of changing attitudes and ideas about how to solve network congestion problems.

“Tackling congestion has gone beyond provisioning more switches and throwing bandwidth at the problem,” he said. “The sophisticated control and architecture means the customer is no longer at the mercy of bottlenecks created by their network infrastructure.”

The ICC was introduced to Rockport Networks while the company was still in stealth mode through mutual contacts at hardware giant Dell, Basden said.

“We are a Dell Center of Excellence here in Durham, and Dell thought we might be interested in Rockport’s technology,” he added. “So we were hooked up with them a little over a year and a half ago, and we had this cluster here that we could test it on and take it from there.”

The final rollout is this week, but Basden told Computer Weekly that user feedback during the testing phase has been entirely positive, with his research teams able to carry out their work without disruption.

“A lot of people didn’t notice [the difference], which is very positive,” he said. “To them it’s just a network and they’ve used it and they haven’t had to adjust their code, but people who have been aware of the work going on behind the scenes have been impressed with it.”

As an example of what the testing showed, Basden pointed to the performance improvements that researchers working on a large and complex smoothed particle hydrodynamics code saw during the testing phase.

This particular code uses “task-based parallelism”, so its operation is not greatly affected by network congestion issues, but its performance still improved once Rockport’s technology was added to the stack.

“We are always on the lookout for advanced technologies that can improve the performance and reliability of the advanced computing workloads we run,” Basden said.

“Based on the results and our first experience with Rockport’s switchless architecture, we were confident in our choice to improve our exascale modeling performance – all supported by the right economics.”
