Server Monitoring and HPC: Interview with Rich Brueckner

Kyle Claypool, Insights for Dev Managers

Server monitoring is critical whether you’re running a website on a shared server or managing a large cluster of high-performance computers. As we talk with DevOps and Cloud Computing experts, we’ve gotten a wide variety of perspectives. One interview we were most excited about was with Rich Brueckner, President of insideHPC and one of Forbes’s Top 20 Big Data Influencers.

Rich provides technical insight into our industry and shares some exciting news about how future hardware will make everyone’s life easier. He’s definitely excited about Google Fiber, and most likely envious of us here in Kansas City!

Here’s our interview with Rich:

What is your opinion of the major cloud providers (e.g., Amazon’s AWS, Windows Azure, and others)?

“As the leader, Amazon is not standing still. I am continually amazed at the pace of their AWS technology rollouts. They don’t do hand-holding so well, so I think companies like Cycle Computing are opening new doors for them.

“For HPC in the Cloud, Penguin Computing has a terrific offering with low-latency clusters optimized for high performance computing. At the same time, I think what DreamHost and Inktank are doing with the open source Ceph file system is going to set them apart with their ability to manage Big Data.”

Server monitoring is obviously important for high-volume websites. What tools do you recommend for the enterprise? What about for cash-strapped startups?

“I came away very impressed after seeing demos from Real Status in the U.K., Boundary, and WildPackets. It’s pretty ground-breaking stuff, and I think those slidecasts are well worth a look.”

Let’s say over the next 5 years (okay, maybe 10), Google Fiber spreads Gigabit connection speeds across the country. How do you see that impacting HPC and the cloud at a consumer level?

“We’re already seeing a huge spike in technology startups as a result of Google Fiber availability in places like Kansas City. For HPC applications, high bandwidth and low latency are keys to scaling performance. As we enter the era of Big Data, I think that the proliferation of gigabit connections will make Cloud HPC much more practical than it is today.”
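(To put gigabit speeds in perspective, here’s a quick back-of-envelope sketch from us, not from Rich: how long it takes to move a 1 TB dataset into the cloud at different link speeds. Real-world throughput would be somewhat lower due to protocol overhead.)

```python
# Rough transfer-time estimate: how long to upload a dataset at a given link speed.
# Illustrative numbers only; real throughput is lower due to protocol overhead.

def transfer_hours(dataset_gb: float, link_mbps: float) -> float:
    """Hours to move dataset_gb gigabytes over a link_mbps megabit/s link."""
    bits = dataset_gb * 8e9            # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600

for mbps in (10, 100, 1000):           # typical DSL, fast broadband, Google Fiber
    print(f"1 TB at {mbps:>4} Mbps: {transfer_hours(1000, mbps):6.1f} hours")

# 1 TB at   10 Mbps:  222.2 hours
# 1 TB at  100 Mbps:   22.2 hours
# 1 TB at 1000 Mbps:    2.2 hours
```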

Where are the most exciting applications you’re seeing in HPC, both in the areas of research and commercial application?

“We just saw a huge HPC cluster spun up on AWS by a company called Cycle Computing. With over 10,600 server instances working together, that on-the-fly system was able to run a huge cancer research job for a Big 10 pharmaceutical company in just a few hours and at a cost of just a few thousand dollars. Considering that kind of hardware costs tens of millions of dollars to procure and operate, this kind of practical Cloud HPC represents a real breakthrough for the future.”

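(As a sanity check on those numbers, here’s a rough cost estimate. The instance count and runtime come from Rich’s description; the hourly rate is our hypothetical spot-market price, purely for illustration.)

```python
# Back-of-envelope cost check for the Cycle Computing run described above.
# The hourly rate is a HYPOTHETICAL spot-instance price chosen for illustration;
# the instance count and runtime come from the interview.

instances = 10_600        # server instances, per the interview
hours = 3                 # "just a few hours"
rate_per_hour = 0.10      # USD per instance-hour (assumed spot price, illustrative)

total = instances * hours * rate_per_hour
print(f"Estimated cost: ${total:,.0f}")   # ~ $3,180, i.e. "a few thousand dollars"
```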

“Just a year or two ago, there were only a handful of machines capable of a Petaflop (a “thousand trillion,” or 10^15, floating point operations per second) of application performance. Today that number is closer to 30 or 40 supercomputers that we know about. As Petascale becomes readily accessible to thousands of scientists around the world, I think we will see quantum leaps in science and engineering.”
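(For anyone new to the units, here’s a quick sketch of what a petaflop means in cluster terms. The per-node figure is an assumption of ours, chosen purely for illustration.)

```python
# What a petaflop means in cluster terms: 10^15 floating point ops per second.
# Per-node performance is an illustrative assumption, not a spec from the interview.

PETAFLOP = 1e15                     # flop/s
node_gflops = 100                   # assumed ~100 GFLOPS per node (illustrative)

nodes_needed = PETAFLOP / (node_gflops * 1e9)
print(f"Nodes for 1 PFLOPS at {node_gflops} GFLOPS each: {nodes_needed:,.0f}")
# -> 10,000 nodes
```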

Server monitoring is critical whether you’ve got a single dedicated server or an entire HPC cluster. What monitoring tools do you use most?

“I do spend a lot of time working with end users, and I think the vendors in the HPC space have a lot to offer the Cloud market in terms of technology and know-how for server management and deployment. I’ve heard a lot from users of Adaptive Computing, PBS Works, StackIQ, and Univa Grid Engine, and they are accomplishing amazing things. I think you’ll see all of these companies as rising stars in middleware for Big Data. On the other side of the fence, Puppet Labs here in Portland has an extremely enthusiastic community of users, as does Opscode in Seattle with their Chef software.”
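(For readers who want a feel for what the core of any monitoring tool does under the hood, here’s a minimal polling sketch in Python. It’s a toy illustration of the general technique, not how any of the products above work, and it assumes the third-party psutil library.)

```python
# Minimal server-monitoring loop: poll basic host metrics and flag thresholds.
# A toy sketch only; assumes the third-party 'psutil' package (pip install psutil).

import time
import psutil

CPU_LIMIT = 90.0   # percent
MEM_LIMIT = 90.0   # percent

def sample():
    """Return a snapshot of CPU, memory, and root-disk utilization."""
    return {
        "cpu": psutil.cpu_percent(interval=1),   # averaged over 1 second
        "mem": psutil.virtual_memory().percent,
        "disk": psutil.disk_usage("/").percent,
    }

if __name__ == "__main__":
    while True:
        m = sample()
        status = "ALERT" if m["cpu"] > CPU_LIMIT or m["mem"] > MEM_LIMIT else "ok"
        print(f"[{status}] cpu={m['cpu']:.1f}% mem={m['mem']:.1f}% disk={m['disk']:.1f}%")
        time.sleep(60)   # poll once a minute
```

Real products add persistence, alert routing, and cluster-wide aggregation on top of a loop like this, which is exactly where the vendors Rich names earn their keep.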

Any projects you’d like to plug, or trends you’re particularly excited about?

“I am very interested in something called the UberCloud Experiment, a project that aims to identify, test, and document potential solutions to the known roadblocks in high performance computing as a service. The project is spearheaded by Wolfgang Gentzsch, a former colleague of mine from Sun Microsystems. If anyone can crack the code and enable the HPC community to finally embrace the Cloud as the way forward, it would be Wolfgang.”

Best known as “the guy in the red hat,” Rich Brueckner has over 25 years of High Performance Computing experience at Cray Research, SGI, and Sun Microsystems. In 2010 he acquired insideHPC, where he focuses on delivering the best and latest news in the HPC industry. In his free time he writes fiction and creates cartoons and parody films. Check him out on Twitter!