The LINQ Join Operator: A Complete Tutorial

By: HitSubscribe
  |  March 12, 2024
The LINQ Join Operator: A Complete Tutorial

I think most C# developers would agree that LINQ is an integral part of the experience of writing code with the language. LINQ provides a fluent, intuitive, and consistent way to query data sets. In this post, we’ll help in your LINQ-mastering quest by covering the LINQ join operator.

We’ll start the post with a definition of LINQ itself, so we’re all on the same page. After that, you’ll see an explanation of join operations in LINQ. Then, it’s time to roll up your sleeves and get practical with our hands-on guide to the join operator.

Let’s get started.

What is LINQ?

LINQ stands for Language Integrated Query. It’s a C# feature that offers a unique and consistent syntax for query datasets, regardless of their origin. The main benefit of LINQ is that you can use the same syntax to query data in memory, from a database, XML files, and so on.

LINQ is available in two different flavors, the query syntax and the method syntax.

The query syntax leverages special keywords to create a syntax that is familiar to anyone who’s worked with SQL. Here’s an example that queries a sequence of numbers, filtering the ones greater than 5:

int[] numbers = { 2, 8, 4, 9, 3, 6, 1, 7, 5 };

var largerThanFive = from num in numbers    where num > 5    select num;

The method syntax allows you to use extension methods to perform the same query:

int[] numbers = { 2, 8, 4, 9, 3, 6, 1, 7, 5 };
var largerThanFive = numbers.Where(x => x > 5);

What is The LINQ Join Operator?

When working with data, a common scenario is having two data sources that you want to combine based on some criteria. For instance, you might have a Books table and an Authors table in your database, with a one-to-many relationship between them—i.e., an author can author many books, but each book has only one author. If you need to compile a list of books containing their author’s name, you’d need to perform a join in order to match each line from the Books table to its counterpart in the Authors table.

A join in LINQ is essentially the same: an operation where you can merge two collections according to some criteria you define.

The LINQ Join Operator in Practice

Examples always make things clearer. So, let’s see how to use the join operator in practice.

Starting With a Problem

Let’s say you have an e-commerce application with some data on categories:

IdName
1Electronics
4Toys
5Stationery
7Books
10Clothes

Okay, now let’s have some products:

IdNameCategory_Id
1Amazon Kindle1
2Refactoring7
3C# in Depth7
4Legal Pad 50 sheets5

You can see where this is leading, right? The next thing you’d want to do is to produce a single collection, having the list of products and the names of the categories they belong to. In other words, a view like this:

IdNameCategory
1Amazon KindleElectronics
2RefactoringBooks
3C# in DepthBooks
4Legal Pad 50 sheetsStationery

Solving The Problem: Performing a LINQ Inner Join

What would that operation look like in code? First of all, we need code to represent our categories and products. Thanks to C#’s record feature, two lines of code suffice for that:

public record Product(int Id, string Name, int CategoryId);
public record Category(int Id, string Name);

Now, let’s have a list of each type:

var categories = new List<Category>
{
    new Category(1, "Electronics"),
    new Category(4, "Toys"),
    new Category(5, "Stationery"),
    new Category(7, "Books"),
    new Category(10, "Clothes")
};

var products = new List<Product>
{
    new Product(1, "Amazon Kindle", 1),
    new Product(2, "Refactoring", 7),
    new Product(3, "C# In Depth", 7),
    new Product(4, "Legal Pad 50 Sheets", 5),
    new Product(5, "Surgical Gloves", 12)
};

As you can see, the list of products has an additional product (surgical gloves) whose category id doesn’t match any of the available categories. Keep this in mind; it’ll be relevant in a moment.

Now, let’s write code to perform this join. I’ll show the code in one go and then explain it:

var query =
    from p in products
    join c in categories
    on p.CategoryId equals c.Id
    select new
    {
        Id = p.Id,
        Name = p.Name,
        Category = c.Name
    };

foreach (var line in query)
{
    Console.WriteLine(line);
}

Now, the explanation:

  • from p in products -> we’re defining the origin of one of our data sources
  • join c in categories -> Here, we’re saying that we want to join the previous collection with this one
  • on p.CategoryId equals c.Id -> This is the condition for the join: the CategoryId on each product should match the Id of a category
  • select new… -> Here, we’re leveraging C#’s anonymous objects feature to create a new object on the fly, which has the properties we want

The result of this query is an IEnumerable of our anonymous object. We then iterate through each item of this collection, displaying it on the console. This is the result:

{ Id = 1, Name = Amazon Kindle, Category = Electronics }
{ Id = 2, Name = Refactoring, Category = Books }
{ Id = 3, Name = C# In Depth, Category = Books }
{ Id = 4, Name = Legal Pad 50 Sheets, Category = Stationery }

Those of you who remember your databases will notice that the LINQ join we performed is the equivalent of an inner join in SQL. In other words, only items that have a match are returned. In SQL, the equivalent query would look like this:

SELECT p.Id, p.Name, c.Name AS Category
FROM products AS p
JOIN categories AS c ON p.CategoryId = c.Id

Performing a LINQ Outer Join

What if you wanted to perform the equivalent of a SQL outer join? That is, you want to retrieve all products, even the ones that don’t match any category. How to go about that?

Here’s the updated query:

var query =
    from p in products
    join c in categories
    on p.CategoryId equals c.Id into joinedCategories
    from c in joinedCategories.DefaultIfEmpty()
    select new
    {
        Id = p.Id,
        Name = p.Name,
        Category = c?.Name
    };

  It looks similar, but there are two differences:

  • on p.CategoryId equals c.Id into joinedCategories -> here, after joining products with categories, we send the result, as a grouped sequence, to the joinedCategories range variable
  • from c in joinedCategories.DefaultIfEmpty() -> Then, we retrieve items from the groupedSequence, using the DefaultIfEmpty() method to return the default value when no matches are found
  • Category = c?.Name -> Finally, when assigning the category name to the Category property on our anonymous object, we have to use the null-conditional operator in order to avoid a null-reference exception (since the default value for Category is null because it’s a reference type.)

The result is now different:

{ Id = 1, Name = Amazon Kindle, Category = Electronics }
{ Id = 2, Name = Refactoring, Category = Books }
{ Id = 3, Name = C# In Depth, Category = Books }
{ Id = 4, Name = Legal Pad 50 Sheets, Category = Stationery }
{ Id = 5, Name = Surgical Gloves, Category =  }

As you can see, the “Surgical Gloves” product now appears, even if it doesn’t have a matching category.

LINQ Inner Join With Where Condition

Performing a join with a where clause is quite easy. In this example, we’ll perform an inner join, filtering only the products whose id are equal to or greater than 3:

var query =
    from p in products
    where p.Id >= 3
    join c in categories
    on p.CategoryId equals c.Id
    select new
    {
        Id = p.Id,
        Name = p.Name,
        Category = c.Name
    };

LINQ Inner Join With Multiple Conditions

If you want to use multiple conditions within your join, you can simply use more than one where clause. Let’s update our query once again:

var query =
    from p in products
    join c in categories
    on p.CategoryId equals c.Id
    where p.Id >= 3
    where c.Name.EndsWith('s')
    select new
    {
        Id = p.Id,
        Name = p.Name,
        Category = c.Name
    };

Here, we’re filtering only categories whose names end with the letter s.

LINQ Join With Composite Key

Up until now, all of our examples have used single keys to perform the matching. You can also use composite keys—that is, more than one value—for the matching.

Suppose both our Product and Category classes gained a new property called Status, which is an enum that can vary between three states: Pending, Active, and Archived. Now, the Status property also needs to be used for the match.

All of our products are active, but not all of the categories:

var categories = new List<Category>
{
    new Category(1, "Electronics", Status.Active),
    new Category(4, "Toys", Status.Active),
    new Category(5, "Stationery", Status.Archived),
    new Category(7, "Books", Status.Pending),
    new Category(10, "Clothes", Status.Active)
};

var products = new List<Product>
{
    new Product(1, "Amazon Kindle", 1,  Status.Active),
    new Product(2, "Refactoring", 7,  Status.Active),
    new Product(3, "C# In Depth", 7,  Status.Active),
    new Product(4, "Legal Pad 50 Sheets", 5,  Status.Active),
    new Product(5, "Surgical Gloves", 12,  Status.Active)
};

This is what our updated query looks like now:

var query =
    from p in products
    join c in categories
    on new { Id = p.CategoryId, Status = p.Status }
    equals new { Id = c.Id, Status = c.Status }
    select new
    {
        Id = p.Id,
        Name = p.Name,
        Category = c.Name
    };

It’s not much more complicated than before. The difference is that now, we use an anonymous object to perform the comparison using both the id and the status properties.

A single result is displayed from this query:

{ Id = 1, Name = Amazon Kindle, Category = Electronics }

Conclusion

As we’ve seen, LINQ is an essential part of working with C#. You can leverage LINQ in many different scenarios, from working with data in memory to XML to SQL. You can use LINQ in ORMs such as NHibernate and Entity Framework.

Teams that wish to make their LINQ experiences even better can use the tools at their disposal. For instance, Stackify’s Prefix and Retrace offer powerful capabilities of tracing, profiling, and centralizing logging that helps teams inspect their code to find opportunities for performance improvements, which includes LINQ queries.

This post was written by Carlos Schults. Carlos is a consultant and software engineer with experience in desktop, web, and mobile development. Though his primary language is C#, he has experience with a number of languages and platforms. His main interests include automated testing, version control, and code quality.

Improve Your Code with Retrace APM

Stackify's APM tools are used by thousands of .NET, Java, PHP, Node.js, Python, & Ruby developers all over the world.
Explore Retrace's product features to learn more.

Learn More

Want to contribute to the Stackify blog?

If you would like to be a guest contributor to the Stackify blog please reach out to [email protected]