How to Partition Tables in Azure SQL

There are many reasons why Partitioned Tables in Azure SQL can be useful. In this post we’re going to focus on just one: deleting large quantities of rows from a tall and narrow Azure SQL table without burninating your Log IO.

Burninating your Log IO you say? Pssshh, Azure SQL can handle this if you work your TSQL like the late Muhammad Ali (rest in peace) worked the ring. Yes, while we could effectively TRUNCATE the entire table in seconds, our goal is to hack a large quantity of rows off while leaving the vast majority of this immense table alone. No problem, just DELETE FROM WHERE, right? Well, that’s where our Log IO comes into play. As you may already, know Azure SQL runs in Full Recovery Mode by default and currently does not allow adjustments. A DELETE is a logged operation and executing this request on such a large table would not only take an extraordinary amount of time, but it would also “peg” your Log IO usage on your database during processing. What we need is a way to achieve TRUNCATE like abilities with DELETE like selection criteria. Say hello to Partitioned Tables, which allow us to accomplish a very similar feat.

Let’s start with this amazing table, which as we already imagined is somewhere in the neighborhood of 50 million rows, but only contains a handful of columns. First up is the ID column of type INT.   Next in line is DateUtc of type DATETIMEOFFSET, where we keep track of when this particular row entered this dimension. Our meat-and-potatoes value column of type INT quickly and unimaginatively rounds up our column list.

screenshot_1

 

You wouldn’t be making these changes to a live table of the specified size. In our imaginary scenario we’ll be creating a new table for partitioning and then insert our existing data into it.

Let’s start with our Partition Function to define how the Partition column is split into separate table partitions. There are many ways to go about this, but one of my personal favorites is using a rolling-window based on date like the day of the year. If you use a rolling-window based on date, it means your partitions are static and you don’t need to futz around with adding new partitions or splitting existing partitions as your data grows. In this example we’ll be using day of the year, which is effectively 1 to 366.

RANGE LEFT or RANGE RIGHT define whether the values gather from the left or right. For instance, if we used RIGHT, we’d end up with an empty partition before 1 since it gathers only from it’s right. Since we’re using LEFT, we’ll end up with a typically empty/unused partition on the end numbered 366. In more advanced partitioning scenarios LEFT and RIGHT might require more thought (Note: I totally used LINQPad, Enumerable.Range, and String.Join to create that mondo list of day numbers).

screenshot_2https://gist.github.com/strvmarv/07e79bc89bd75d2703650892fcc076d6

 

We need a Partition Scheme as well to define where each Partition lives in storage, which is also used when hooking our table up for partitioning. I’m stashing all data into PRIMAR. Take that data file.

screenshot_3

https://gist.github.com/strvmarv/aae6ded1ff00582f6fe30d8ffd419473

 

Now we need a column in the table to represent the partition assignment for each row. I’ve added a Partition column to our new table of type SMALLINT. Considering our partition column is based on the day of the year, at insert time, there’s no reason to go through the pain of populating it manually. Let’s create a quick User-Defined Function to determine the current day of the year and then add a Default Constraint that applies it auto-magically on insert.

screenshot_4

https://gist.github.com/strvmarv/4fa868f31d354d191f3f29bd485d6cff

 

Next let’s hook our fancy, new partitioning into our table. We’ll need to replace our ho-hum primary key with a combination of the existing ID and the new Partition column. Then we just make sure the table targets the partition and we’re golden.

screenshot_5

https://gist.github.com/strvmarv/58fd5739f4ad0258163d277f11f03c9b

 

While Imagination has a single partition with all rows, Imagination_Partitioned shows 1 row per partition since each of the 10 rows has a different day of the year assigned via DateUtc. Want to view your own partitioning? No sweat. Here’s some codez.

screenshot_8

https://gist.github.com/strvmarv/6f35d8679705f00c2aa3167c0f6661e7

 

Exciting, isn’t it?  Let’s insert our existing data and see how it looks, before and after.

Before:

screenshot_6

After:

screenshot_7

 

Ok, for the grand-finale, let’s create a basic UDF to line-up partitions using a start and end date.

screenshot_9

https://gist.github.com/strvmarv/d5c6b493f4cbc66774615b8032820e9d

 

Now we’re ready for a Stored Procedure which can receive a start and end date, retrieve a partition range, and execute our truncates using partitions!

screenshot_10

https://gist.github.com/strvmarv/cd6abd4769c657c4903ce7fb9f8b518d

 

Take that, Big Data.

 

 

Paul Marvin 

Sr. Developer / Stackify