Here at Stackify, there is an ethos you will find common to every member of the engineering teams: we love building software. By the same token, we love developer tools that make building software faster, easier, more fun, and more accurate. One of those indispensable tools is Joseph Albahari’s LINQPad, which, as it turns out, is handy for all kinds of rapid prototyping in .NET.
Last week, while I was working on some top-secret new bits for Prefix, I needed to implement an algorithm to find the first instance of any of a given series of bytes inside a larger byte array. The algorithm needed to be entirely accurate and extremely quick, since it will be called often from performance-critical sections of code. Armed with a few different ideas of how the algorithm might be implemented, I dusted off my copy of LINQPad and went to work.
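To make the problem concrete, here is a naive multi-pattern scan as a point of reference. This is a hypothetical baseline, not the Prefix implementation; the class and method names are mine:

```csharp
using System;

static class NaiveSearch
{
    // Returns the index of the earliest occurrence of any of the given
    // byte patterns within the haystack, or -1 if none is found.
    // Assumes every pattern is non-empty.
    public static int FirstIndexOfAny(byte[] haystack, byte[][] patterns)
    {
        for (int i = 0; i < haystack.Length; i++)
        {
            foreach (var pattern in patterns)
            {
                if (Matches(haystack, i, pattern))
                    return i;
            }
        }
        return -1;
    }

    // True if `pattern` occurs in `haystack` starting at index `start`.
    static bool Matches(byte[] haystack, int start, byte[] pattern)
    {
        if (start + pattern.Length > haystack.Length)
            return false;
        for (int k = 0; k < pattern.Length; k++)
        {
            if (haystack[start + k] != pattern[k])
                return false;
        }
        return true;
    }
}
```

This brute-force version is easy to verify for correctness, which makes it a useful yardstick when comparing faster strategies.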
I cannot overstate the value of LINQPad for knocking out rapid C# development tasks in an informal context. It saves me time almost every day of the week and answers questions without my having to compile and run a whole project (e.g. “does this date format string give me the outcome I expect?”). By some scratch math, I figure this adds up to a couple weeks a year of extra productivity for Stackify, just from one developer.
When it comes to comparing the performance of multiple implementation strategies, I’d like to share a boilerplate template that I’ve refined over time. The goal is to make extremely light work of the “comparison” part of the job, leaving only the implementation of the strategies you wish to test.
As you can see in the example, we define a delegate that matches the signature of the various strategies we want to compare, as well as a compact custom return type that encapsulates some values that summarize the results and timing of a particular strategy. Included in the return type is a checksum value that, while not particularly foolproof, helps to identify at a glance if a particular strategy might be mis-implemented in a way that could affect its results.
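Since the original listing isn’t reproduced here, the shape of such a template can be sketched as follows. The `SearchStrategy` delegate, `StrategyResult` type, and `Measure` helper are hypothetical names standing in for the actual code:

```csharp
using System;
using System.Diagnostics;

// Common signature that every strategy under test must match.
delegate int SearchStrategy(byte[] haystack, byte[][] patterns);

// Compact summary of one strategy's run: timing plus a crude checksum
// that helps spot a mis-implemented strategy at a glance.
class StrategyResult
{
    public string Name;
    public TimeSpan Elapsed;
    public long Checksum;

    public override string ToString() =>
        $"{Name}: {Elapsed.TotalMilliseconds:F2} ms (checksum {Checksum})";
}

static class Bench
{
    // Runs one strategy over every input, timing the whole batch and
    // summing the returned indices as a not-foolproof checksum.
    public static StrategyResult Measure(string name, SearchStrategy strategy,
                                         byte[][] inputs, byte[][] patterns)
    {
        long checksum = 0;
        var sw = Stopwatch.StartNew();
        foreach (var input in inputs)
            checksum += strategy(input, patterns);
        sw.Stop();
        return new StrategyResult { Name = name, Elapsed = sw.Elapsed, Checksum = checksum };
    }
}
```

With this in place, comparing strategies reduces to calling `Measure` once per candidate over the same inputs and eyeballing that the checksums agree before comparing the timings.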
Interestingly, the strategy that performed best in our string search tests (several middling strategies are omitted from the example for clarity) is a hybrid: it relies on highly optimized (native-mapped) methods from the .NET Framework for the heavy lifting of seeking quickly through a large string to a position of interest, then compares the combinations of bytes that could match one of our target patterns. In another test not included here, this strategy outperformed even classic multi-pattern string search algorithms like Rabin-Karp, likely due to the efficiency of Array.IndexOf().
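A rough sketch of that hybrid idea (my own illustrative code, not the shipped implementation): let `Array.IndexOf()` do the fast native-backed seek to the next occurrence of each pattern’s first byte, then verify the remaining bytes by hand.

```csharp
using System;

static class HybridSearch
{
    // Finds the earliest occurrence of any pattern in the haystack, or -1.
    // Array.IndexOf handles the fast seek; only candidate positions are
    // verified byte-by-byte. Assumes every pattern is non-empty.
    public static int FirstIndexOfAny(byte[] haystack, byte[][] patterns)
    {
        int best = -1;
        foreach (var pattern in patterns)
        {
            int pos = 0;
            while (pos <= haystack.Length - pattern.Length)
            {
                // Seek to the next candidate: an occurrence of the pattern's
                // first byte, no later than where a full match could still fit.
                int count = haystack.Length - pattern.Length - pos + 1;
                int i = Array.IndexOf(haystack, pattern[0], pos, count);
                if (i < 0)
                    break;
                if (Matches(haystack, i, pattern))
                {
                    if (best < 0 || i < best)
                        best = i;
                    break; // earliest hit for this pattern; move on
                }
                pos = i + 1;
            }
        }
        return best;
    }

    static bool Matches(byte[] haystack, int start, byte[] pattern)
    {
        for (int k = 1; k < pattern.Length; k++)
        {
            if (haystack[start + k] != pattern[k])
                return false;
        }
        return true;
    }
}
```

The inner loop spends almost all of its time inside `Array.IndexOf`, which is why this style of hybrid can beat a fully managed multi-pattern algorithm on inputs where candidate bytes are sparse.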
Final Note on LINQPad:
On Stackify’s .NET Client Tools team, it’s extremely important that the code we create is lightweight and high-performance, because sometimes that same code ships in libraries that may execute inside of your applications! It turns out that implementing performant algorithms in managed code is an interesting challenge; oftentimes, well-understood algorithms like Rabin-Karp, executed in managed code, perform worse than more direct strategies built on simple but efficient BCL features like String.IndexOf() and Array.IndexOf(), since those .NET methods often map to native code.
While software developers are often warned of the perils of “premature optimization,” I believe the more sage advice is that optimization should be early, but highly focused. Critically, optimization efforts should be restricted to areas where experience or intuition suggests there are real gains to be had; the goal should be choosing between quickly identifiable strategies rather than inventing new paradigms; and above all, optimizations should support an already-defined design, rather than re-orienting the design around a particular optimization.
Principal Engineer / Stackify