How to Partition Tables in Azure SQL
There are many reasons why Partitioned Tables in Azure SQL can be useful. In this post we’re going to focus on just one: deleting large quantities of rows from a tall and narrow Azure SQL table without burninating your Log IO.
Burninating your Log IO you say? Pssshh, Azure SQL can handle this if you work your TSQL like the late Muhammad Ali (rest in peace) worked the ring. Yes, while we could effectively TRUNCATE the entire table in seconds, our goal is to hack a large quantity of rows off while leaving the vast majority of this immense table alone. No problem, just DELETE FROM WHERE, right? Well, that’s where our Log IO comes into play. As you may already, know Azure SQL runs in Full Recovery Mode by default and currently does not allow adjustments. A DELETE is a logged operation and executing this request on such a large table would not only take an extraordinary amount of time, but it would also “peg” your Log IO usage on your database during processing. What we need is a way to achieve TRUNCATE like abilities with DELETE like selection criteria. Say hello to Partitioned Tables, which allow us to accomplish a very similar feat.
Let’s start with this amazing table, which as we already imagined is somewhere in the neighborhood of 50 million rows, but only contains a handful of columns. First up is the ID column of type INT. Next in line is DateUtc of type DATETIMEOFFSET, where we keep track of when this particular row entered this dimension. Our meat-and-potatoes value column of type INT quickly and unimaginatively rounds up our column list.
You wouldn’t be making these changes to a live table of the specified size. In our imaginary scenario we’ll be creating a new table for partitioning and then insert our existing data into it.
Let’s start with our Partition Function to define how the Partition column is split into separate table partitions. There are many ways to go about this, but one of my personal favorites is using a rolling-window based on date like the day of the year. If you use a rolling-window based on date, it means your partitions are static and you don’t need to futz around with adding new partitions or splitting existing partitions as your data grows. In this example we’ll be using day of the year, which is effectively 1 to 366.
RANGE LEFT or RANGE RIGHT define whether the values gather from the left or right. For instance, if we used RIGHT, we’d end up with an empty partition before 1 since it gathers only from it’s right. Since we’re using LEFT, we’ll end up with a typically empty/unused partition on the end numbered 366. In more advanced partitioning scenarios LEFT and RIGHT might require more thought (Note: I totally used LINQPad, Enumerable.Range, and String.Join to create that mondo list of day numbers).
We need a Partition Scheme as well to define where each Partition lives in storage, which is also used when hooking our table up for partitioning. I’m stashing all data into PRIMAR. Take that data file.
Now we need a column in the table to represent the partition assignment for each row. I’ve added a Partition column to our new table of type SMALLINT. Considering our partition column is based on the day of the year, at insert time, there’s no reason to go through the pain of populating it manually. Let’s create a quick User-Defined Function to determine the current day of the year and then add a Default Constraint that applies it auto-magically on insert.
Next let’s hook our fancy, new partitioning into our table. We’ll need to replace our ho-hum primary key with a combination of the existing ID and the new Partition column. Then we just make sure the table targets the partition and we’re golden.
While Imagination has a single partition with all rows, Imagination_Partitioned shows 1 row per partition since each of the 10 rows has a different day of the year assigned via DateUtc. Want to view your own partitioning? No sweat. Here’s some codez.
Exciting, isn’t it? Let’s insert our existing data and see how it looks, before and after.
Ok, for the grand-finale, let’s create a basic UDF to line-up partitions using a start and end date.
Now we’re ready for a Stored Procedure which can receive a start and end date, retrieve a partition range, and execute our truncates using partitions!
Take that, Big Data.
Sr. Developer / Stackify