Introduction

In my previous three articles on CodeProject.com, I explained the fundamentals of Windows Communication Foundation (WCF).
If you have followed those three articles closely, you should be able to work with WCF now. Within the last two articles, I have explained how to utilize LINQ and the Entity Framework with WCF, so by now you should also be able to work with LINQ and EF. However, you may not fully understand LINQ and EF by just reading those two articles. So here I come, to explain all the fundamentals of LINQ and EF.
In addition to LINQ and EF, some people may still be using LINQ to SQL, which is the first ORM product from Microsoft, or a by-product of the C# team, or a simplified version of EF, or whatever you think and say it is. As LINQ to SQL (L2S) is so easy to work with, I will also write some articles to explain it.
Having said so, in the future, I will write the following five articles to explain LINQ, LINQ to SQL, and EF:
  • Introducing LINQ-Language Integrated Query (this article)
  • LINQ to SQL: Basic Concepts and Features (next article)
  • LINQ to SQL: Advanced Concepts and Features (future article)
  • LINQ to Entities: Basic Concepts and Features (future article)
  • LINQ to Entities: Advanced Concepts and Features (future article)
After finishing these five articles, I will come back to write some more articles on WCF based on my real work experience, which should be helpful for your real-world work if you are using WCF right now.
In this article, I will cover the following topics:
  • What is LINQ
  • New data type var
  • Automatic properties
  • Object initializer and Collection initializer
  • Anonymous types
  • Extension Methods
  • Lambda expressions
  • Built-in LINQ Extension Methods and method syntax
  • LINQ query syntax and query expression
  • Built-in LINQ operators

What is LINQ

Language-Integrated Query (LINQ) is a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities.
Let us see an example first. Suppose there is a list of integers like this:
List<int> list = new List<int>() { 1, 2, 3, 4, 5, 6, 100 };
To find all the even numbers in this list, you might write code like this:
List<int> list1 = new List<int>();
foreach (var num in list)
{
    if (num % 2 == 0)
        list1.Add(num);
}
Now with LINQ, you can select all of the even numbers from this list and assign the query result to a variable, in a single statement, like this:
var list2 = from number in list
            where number % 2 == 0
            select number;
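In this example, list2 and list1 are equivalent in content: enumerating list2 yields the same numbers that list1 contains. (Strictly speaking, list2 is an IEnumerable<int> that is evaluated when it is enumerated, not a List<int>.) As you can see, you don't write a foreach loop. Instead, you write a SQL-like query expression.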
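As a quick check of the claim above, here is a minimal sketch that builds both results and compares them. The class name LinqIntroDemo and its method names are made up for illustration and are not part of the test project. It also shows that the query is evaluated lazily: a number added to the list after the query is defined still appears when the query is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqIntroDemo
{
    // Classic approach: loop over the list and collect the even numbers.
    public static List<int> EvensWithLoop(List<int> list)
    {
        List<int> result = new List<int>();
        foreach (var num in list)
        {
            if (num % 2 == 0)
                result.Add(num);
        }
        return result;
    }

    // LINQ approach: this only describes the query; it runs when enumerated.
    public static IEnumerable<int> EvensWithQuery(List<int> list)
    {
        return from number in list
               where number % 2 == 0
               select number;
    }

    public static void Main()
    {
        List<int> list = new List<int>() { 1, 2, 3, 4, 5, 6, 100 };
        List<int> list1 = EvensWithLoop(list);
        IEnumerable<int> list2 = EvensWithQuery(list);

        Console.WriteLine(list1.SequenceEqual(list2)); // True

        // Deferred execution: 8 is added after the query was defined,
        // but it is still returned when list2 is enumerated.
        list.Add(8);
        foreach (int number in list2)
            Console.Write(number + " ");               // 2 4 6 100 8
        Console.WriteLine();
    }
}
```

If you need a snapshot that does not change with the source list, calling ToList() on the query is the usual way to force immediate evaluation.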
But what do from, where, and select mean here? Where are they defined? How and when can you use them? Let us start the exploration now.

Creating the test solution and project

To show these LINQ-related new features, we will need a test project to demonstrate what they are and how to use them. So we first need to create the test solution and the project.
Follow these steps to create the solution and the project:
  1. Start Visual Studio 2010.
  2. Select menu option File | New | Project... to create a new solution.
  3. In the New Project window, select Visual C# | Console Application as the Template.
  4. Enter TestLINQ as the Solution Name, and TestNewFeaturesApp as the (project) Name.
  5. Click OK to create the solution and the project.

New data type var

The first new feature that is very important for LINQ is the var keyword. Despite the heading, var is not a data type in itself: it is a keyword used to declare a local variable whose type the compiler infers from the initializer, and the initializer can be any valid C# expression that has a type.
In the C# 3.0 specification, such variables are called implicitly-typed local variables.
A var variable must be initialized when it is declared. The compile-time type of the initializer expression must not be of null type, but the run time expression can be null. Once it is initialized, its data type is fixed to the type of the initial data.
The following statements are valid uses of the var keyword:
// valid var statements
var x = "1";
var n = 0;
string s = "string";
var s2 = s;
s2 = null;
string s3 = null;
var s4 = s3;
At compile time, the compiler treats the above var statements exactly as if they had been written like this:
string x = "1";
int n = 0;
string s2 = s;
string s4 = s3;
The var keyword is only meaningful to the C# compiler. The compiled assembly is actually a valid .NET 2.0 assembly. It doesn't need any special instructions or libraries to support this feature.
The following statements are invalid usages of the var keyword:
// invalid var statements
var v;
var nu = null;
var v2 = "12"; v2 = 3;
The first one is illegal because it doesn't have an initializer.
The second one initializes the variable nu to null, which is not allowed, although once declared, a var variable of a reference type can be assigned null. The compiler has to infer the variable's type from the initializer, and null by itself carries no type information, which is why the initializer can't be null at compile time.
The third one is illegal because v2 is inferred to be of type string, and an integer can't be implicitly converted to a string.
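To see which types the compiler actually infers, a small sketch like the following can help. The class VarDemo and the helper StaticTypeOf are hypothetical names introduced here for illustration. The helper relies on generic type inference, so it reports the compile-time type of the variable even when its run-time value is null.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // T is bound to the static (compile-time) type of the argument,
    // so this works even when the value itself is null.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static void Main()
    {
        var x = "1";
        var n = 0;
        var price = 9.99m;
        var names = new List<string>();
        string s3 = null;
        var s4 = s3;   // null at run time, but the static type is string

        Console.WriteLine(StaticTypeOf(x));     // System.String
        Console.WriteLine(StaticTypeOf(n));     // System.Int32
        Console.WriteLine(StaticTypeOf(price)); // System.Decimal
        Console.WriteLine(StaticTypeOf(names)); // System.Collections.Generic.List`1[System.String]
        Console.WriteLine(StaticTypeOf(s4));    // System.String
    }
}
```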

Automatic properties

In the past, for a class member, if we wanted to define it as a property member, we had to define a private member variable first. For example, for the Product class, we can define a property ProductName as follows:
private string productName;
public string ProductName
{
    get { return productName; }
    set { productName = value; }
}
This may be useful if we need to add some logic inside the get/set methods. But if we don't need to, the above format gets tedious, especially if there are many members.
Now, with C# 3.0 and above, the above property can be simplified in one statement:
public string ProductName { get; set; }
When the compiler processes this statement, it automatically creates a hidden private backing field and generates the old style's get/set methods around it. This could save lots of typing.
Just as with var, automatic properties are only meaningful to the C# compiler. The compiled assembly is actually a valid .NET 2.0 assembly.
Interestingly, later on, if you find you need to add logic to the get/set methods, you can still convert this automatic property to the old style's property.
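As a rough sketch of such a conversion, assume a hypothetical class PricedItem (not part of the test project) whose UnitPrice started out as an automatic property and later needed validation. Only the property body changes; the code that uses the property stays the same.

```csharp
using System;

public class PricedItem
{
    // Still an automatic property: no extra logic is needed here.
    public string Name { get; set; }

    // Converted from "public decimal UnitPrice { get; set; }" to the
    // old style so that the setter can validate its input.
    private decimal unitPrice;
    public decimal UnitPrice
    {
        get { return unitPrice; }
        set
        {
            if (value < 0)
                throw new ArgumentOutOfRangeException("value", "UnitPrice cannot be negative.");
            unitPrice = value;
        }
    }
}

public static class AutoPropertyDemo
{
    public static void Main()
    {
        PricedItem item = new PricedItem();
        item.Name = "first candy";
        item.UnitPrice = 10.0m;   // callers use the property exactly as before
        Console.WriteLine("{0}: {1}", item.Name, item.UnitPrice);

        try
        {
            item.UnitPrice = -1m; // now rejected by the setter
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("negative price rejected");
        }
    }
}
```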
Now, let us create this class in the test project:
public class Product
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
}
We can put this class inside the Program.cs file, within the namespace TestNewFeaturesApp. We will use this class throughout this article, to test C# features related to LINQ.

Object initializer

In the past, we couldn't initialize an object without using a constructor. For example, we could create and initialize a Product object like this, if the Product class has a constructor with three parameters:
Product p = new Product(1, "first candy", 100.0m);
Or, we could create the object, and then initialize it later, like this:
Product p = new Product();
p.ProductID = 1;
p.ProductName = "first candy";
p.UnitPrice=(decimal)100.0;
Now with the new object initializer feature, we can do it as follows:
Product product = new Product
{
    ProductID = 1,
    ProductName = "first candy",
    UnitPrice = (decimal)100.0
};
At compile time, the compiler automatically inserts the necessary property setter code. So again, this new feature is a C# compiler feature. The compiled assembly is actually a valid .NET 2.0 assembly.
We can also define and initialize a variable with an array like this:
var arr = new[] { 1, 10, 20, 30 };
This array is called an implicitly typed array.
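The following sketch spells out what the object initializer expands to, and shows how the element type of an implicitly typed array is inferred. The Item class and the InitializerDemo class are stand-ins invented for this example, so that the snippet compiles on its own without the Product class.

```csharp
using System;

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal UnitPrice { get; set; }

    public Item() { }
    public Item(int id) { Id = id; }
}

public static class InitializerDemo
{
    // Roughly what the compiler generates for an object initializer.
    public static Item CreateLonghand()
    {
        Item temp = new Item();
        temp.Id = 1;
        temp.Name = "first candy";
        temp.UnitPrice = 100.0m;
        return temp;
    }

    public static Item CreateWithInitializer()
    {
        return new Item { Id = 1, Name = "first candy", UnitPrice = 100.0m };
    }

    public static void Main()
    {
        Item a = CreateLonghand();
        Item b = CreateWithInitializer();
        Console.WriteLine(a.Id == b.Id && a.Name == b.Name && a.UnitPrice == b.UnitPrice); // True

        // An initializer can also follow constructor arguments.
        Item c = new Item(2) { Name = "second candy" };
        Console.WriteLine(c.Id + " " + c.Name);   // 2 second candy

        // Implicitly typed arrays: the element type comes from the elements.
        var ints = new[] { 1, 10, 20, 30 };
        var mixed = new[] { 1, 2.5 };             // int converts to double
        Console.WriteLine(ints.GetType());        // System.Int32[]
        Console.WriteLine(mixed.GetType());       // System.Double[]
    }
}
```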

Collection initializer

Similar to the object initializer, we can also initialize a collection when we declare it, like this:
List<Product> products = new List<Product> {
    new Product { 
        ProductID = 1, 
        ProductName = "first candy", 
        UnitPrice = (decimal)10.0 },
    new Product { 
        ProductID = 2, 
        ProductName = "second candy", 
        UnitPrice = (decimal)35.0 },
    new Product { 
        ProductID = 3, 
        ProductName = "first vegetable", 
        UnitPrice = (decimal)6.0 },
    new Product { 
        ProductID = 4, 
        ProductName = "second vegetable", 
        UnitPrice = (decimal)15.0 },
    new Product { 
        ProductID = 5, 
        ProductName = "another product", 
        UnitPrice = (decimal)55.0 }
};
Here, we created a list and initialized it with five new products. For each new product, we used the object initializer to initialize its value.
Just as with the object initializer, this new feature, collection initializer, is also a C# compiler feature, and the compiled assembly is a valid .NET 2.0 assembly.
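Behind the scenes, a collection initializer is translated into a series of Add calls on the new collection. The sketch below, with the illustrative class name CollectionInitializerDemo, compares the two forms and shows that the same syntax works for a Dictionary, where each pair in braces becomes a call to Add(key, value).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionInitializerDemo
{
    public static List<string> BuildWithInitializer()
    {
        return new List<string> { "first candy", "second candy", "first vegetable" };
    }

    // Equivalent longhand: one Add call per element.
    public static List<string> BuildWithAdd()
    {
        List<string> names = new List<string>();
        names.Add("first candy");
        names.Add("second candy");
        names.Add("first vegetable");
        return names;
    }

    public static Dictionary<int, string> BuildDictionary()
    {
        // Each { key, value } pair becomes Add(key, value).
        return new Dictionary<int, string>
        {
            { 1, "first candy" },
            { 2, "second candy" }
        };
    }

    public static void Main()
    {
        Console.WriteLine(BuildWithInitializer().SequenceEqual(BuildWithAdd())); // True

        Dictionary<int, string> byId = BuildDictionary();
        Console.WriteLine(byId.Count);  // 2
        Console.WriteLine(byId[2]);     // second candy
    }
}
```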

Anonymous types

With the new feature of the object initializer, and the new var data type, we can create anonymous data types easily in C# 3.0.
For example, if we define a variable like this:
var a = new { Name = "name1", Address = "address1" };
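At compile time, the compiler generates a class for the anonymous type that is conceptually similar to the following (the real generated name is different, and the properties are read-only):
class __Anonymous1
{
    private readonly string name;
    private readonly string address;
    public __Anonymous1(string name, string address)
    {
        this.name = name;
        this.address = address;
    }
    public string Name { get { return name; } }
    public string Address { get { return address; } }
}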
The name of the anonymous type is automatically generated by the compiler, and cannot be referenced in the program text.
If two anonymous object initializers in the same assembly specify properties with the same names and the same data types, in the same order, then the compiler uses the same anonymous type for both. For example, if there is another variable defined like this:
var b = new { Name = "name2", Address = "address2" };
Then we can assign a to b like this:
b = a;
Anonymous types are particularly useful for LINQ, because the result of a LINQ query can be shaped to be whatever you like. We will give more examples of this when we discuss LINQ.
As mentioned earlier, this new feature is again a C# compiler feature, and the compiled assembly is a valid .NET 2.0 assembly.
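Here is a small sketch of how anonymous types behave at run time. The class AnonymousTypeDemo and its method names are invented for this illustration. It checks when two anonymous objects share a type, and shows that Equals compares property values.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Same property names and types in the same order: one shared type.
    public static bool SameShapeSharesType()
    {
        var a = new { Name = "name1", Address = "address1" };
        var b = new { Name = "name2", Address = "address2" };
        return a.GetType() == b.GetType();
    }

    // Same properties in a different order: a different type.
    public static bool DifferentOrderSharesType()
    {
        var a = new { Name = "name1", Address = "address1" };
        var c = new { Address = "address1", Name = "name1" };
        return a.GetType() == c.GetType();
    }

    // Equals on anonymous types compares the property values.
    public static bool EqualValuesAreEqual()
    {
        var a = new { Name = "name1", Address = "address1" };
        var a2 = new { Name = "name1", Address = "address1" };
        return a.Equals(a2);
    }

    public static void Main()
    {
        Console.WriteLine(SameShapeSharesType());      // True
        Console.WriteLine(DifferentOrderSharesType()); // False
        Console.WriteLine(EqualValuesAreEqual());      // True

        var a = new { Name = "name1", Address = "address1" };
        // a.Name = "other";   // would not compile: the property is read-only
        Console.WriteLine(a);  // { Name = name1, Address = address1 }
    }
}
```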

Extension Methods

Extension Methods are static methods that can be invoked using the instance method syntax. In effect, Extension Methods make it possible for us to extend existing types and constructed types with additional methods.
For example, we can define an Extension Method as follows:
public static class MyExtensions
{
    public static bool IsCandy(this Product p)
    {
        if (p.ProductName.IndexOf("candy") >= 0)
            return true;
        else
            return false;
    }
}
In this example, the static method IsCandy takes a this parameter of Product type, and searches for the word candy inside the product name. If it finds a match, it assumes this is a candy product and returns true. Otherwise, it returns false, meaning this is not a candy product.
Since all Extension Methods must be defined in top level static classes, to simplify the example, we put this class inside the same namespace as our main test application, TestNewFeaturesApp, and make this class on the same level as the Program class so it is a top level class. Now, in the program, we can call this Extension Method like this:
if (product.IsCandy())
    Console.WriteLine("yes, it is a candy");
else
    Console.WriteLine("no, it is not a candy");
It looks as if IsCandy is a real instance method of the Product class. Actually, it is not a member of the Product class at all. It is defined in a separate static class to extend the functionality of the Product class, which is why it is called an Extension Method.
Not only does it look like a real instance method, but this new Extension Method also pops up in IntelliSense when a dot is typed following the product variable in Visual Studio.

Under the hood, when a method call on an instance is being compiled, the compiler first checks to see if there is a matching instance method in the class. Only if there is no matching instance method does it look for Extension Methods: static methods, in static classes that are in the current namespace or imported with a using directive, whose first (this) parameter is of the instance type or of a type the instance converts to implicitly, such as a base class or an implemented interface. If it finds a match, the compiler will call that Extension Method. This means that instance methods take precedence over Extension Methods, and Extension Methods that are imported in inner namespace declarations take precedence over Extension Methods that are imported in outer namespaces.
In our example, when product.IsCandy() is being compiled, the compiler first checks the Product class and doesn't find a method named IsCandy. It then searches the static class MyExtensions, and finds an Extension Method with the name IsCandy and with a first parameter of type Product.
At compile time, the compiler actually changes product.IsCandy() to this call:
MyExtensions.IsCandy(product)
Surprisingly, Extension Methods can be defined for sealed classes. In our example, you can change the Product class to be sealed and it still runs without any problem. This gives us great flexibility to extend system types, because many of the system types are sealed. On the other hand, Extension Methods are less discoverable and are harder to maintain, so they should be used with great caution. If your requirements can be achieved with an instance method, you should not define an Extension Method to do the same work.
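Not surprisingly, this new feature is again a C# compiler feature. One caveat: the compiler marks Extension Methods with ExtensionAttribute, which is defined in System.Core.dll (.NET 3.5), so unlike the earlier features, an assembly that defines Extension Methods normally needs a reference to .NET 3.5.
Extension Methods are the basis of LINQ. We will discuss the various Extension Methods defined by .NET 3.5 in the namespace System.Linq, later.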
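To illustrate two of the points above (extending a sealed system type, and instance methods taking precedence), here is a minimal sketch. The StringExtensions class and its ContainsWord method are hypothetical helpers written for this example. The ToUpper extension deliberately clashes with the instance method String.ToUpper() to show which one the compiler picks.

```csharp
using System;

public static class StringExtensions
{
    // Extends System.String, which is a sealed class.
    public static bool ContainsWord(this string text, string word)
    {
        return text != null
            && text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Same name and parameters as the instance method String.ToUpper().
    // The instance method wins, so text.ToUpper() never reaches this.
    public static string ToUpper(this string text)
    {
        return "extension was called";
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string name = "First Candy";

        Console.WriteLine(name.ContainsWord("candy"));                   // True
        // What the compiler actually emits for the line above:
        Console.WriteLine(StringExtensions.ContainsWord(name, "candy")); // True

        Console.WriteLine(name.ToUpper());                               // FIRST CANDY

        // An extension call is a static call, so a null "instance"
        // reaches the method body instead of throwing immediately.
        string none = null;
        Console.WriteLine(none.ContainsWord("candy"));                   // False
    }
}
```

The null case is one reason to use Extension Methods carefully: the call site looks like an instance call, but it does not behave like one.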
Now, the Program.cs file should be like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace TestNewFeaturesApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // valid var statements
            var x = "1";
            var n = 0;
            string s = "string";
            var s2 = s;
            s2 = null;
            string s3 = null;
            var s4 = s3;
            /*
            string x = "1";
            int n = 0;
            string s2 = s;
            string s4 = s3;
           */
            // invalid var statements
            /*
            var v;
            var nu = null;
            var v2 = "12"; v2 = 3;
            */
            // old way to create and initialize an object
            /*
            Product p = new Product(1, "first candy", 100.0m);
            Product p = new Product();
            p.ProductID = 1;
            p.ProductName = "first candy";
            p.UnitPrice=(decimal)100.0;
            */
           
            //object initializer
            Product product = new Product
            {
                 ProductID = 1,
                 ProductName = "first candy",
                 UnitPrice = (decimal)100.0
            };
            var arr = new[] { 1, 10, 20, 30 };
            // collection initializer
            List<Product> products = new List<Product> {
                 new Product { 
                           ProductID = 1, 
                           ProductName = "first candy", 
                           UnitPrice = (decimal)10.0 },
                 new Product { 
                           ProductID = 2, 
                           ProductName = "second candy", 
                           UnitPrice = (decimal)35.0 },
                 new Product { 
                           ProductID = 3, 
                           ProductName = "first vegetable", 
                           UnitPrice = (decimal)6.0 },
                 new Product { 
                           ProductID = 4, 
                           ProductName = "second vegetable", 
                           UnitPrice = (decimal)15.0 },
                 new Product { 
                           ProductID = 5, 
                           ProductName = "third product", 
                           UnitPrice = (decimal)55.0 }
            };
            // anonymous types
            var a = new { Name = "name1", Address = "address1" };
            var b = new { Name = "name2", Address = "address2" };
            b = a;
            /*
            class __Anonymous1
            {
                private readonly string name;
                private readonly string address;
                public __Anonymous1(string name, string address)
                {
                    this.name = name;
                    this.address = address;
                }
                public string Name { get { return name; } }
                public string Address { get { return address; } }
            }
            */
            // extension methods
            if (product.IsCandy()) //if(MyExtensions.IsCandy(product))
                   Console.WriteLine("yes, it is a candy");
            else
                   Console.WriteLine("no, it is not a candy");
         }
    }
     public sealed class Product
    {
        public int ProductID { get; set; }
        public string ProductName { get; set; }
        public decimal UnitPrice { get; set; }
    }
    public static class MyExtensions
    {
        public static bool IsCandy(this Product p)
       {
            if (p.ProductName.IndexOf("candy") >= 0)
                 return true;
            else
                 return false;
        }
    }
}
So far in Program.cs, we have:
  • Defined several var type variables
  • Defined a sealed class Product
  • Created a product with the name of "first candy"
  • Created a product list containing five products
  • Defined a static class, and added a static method IsCandy with a this parameter of type Product, to make this method an Extension Method
  • Called the Extension Method on the candy product, and printed out a message according to its name
If you run the program, it prints a single line:
yes, it is a candy

Lambda expressions

With the new Extension Method feature of C# 3.0 and the anonymous method (or inline method) feature of C# 2.0 in place, C# 3.0 also introduces a new kind of expression called the lambda expression.
A lambda expression is essentially a more concise syntax for writing anonymous methods. Next, let's see what a lambda expression is, step by step.
First, alongside C# 3.0, there is a new generic delegate type, Func<A, R>, which represents a function that takes an argument of type A and returns a value of type R:
delegate R Func<A, R>(A Arg);
In fact, there are several overloaded versions of Func, of which Func<A, R> is one.
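Before going further, it may help to see Func used on its own. The sketch below assigns a named method and a C# 2.0 anonymous method to Func<int, bool>, and also uses the two-argument overload Func<int, int, int>. The class FuncDemo and the methods IsEven and CountMatches are illustrative names and are not part of the test project.

```csharp
using System;

public static class FuncDemo
{
    public static bool IsEven(int number)
    {
        return number % 2 == 0;
    }

    // Accepts any function from int to bool and counts the matches.
    public static int CountMatches(int[] numbers, Func<int, bool> predicate)
    {
        int count = 0;
        foreach (int number in numbers)
        {
            if (predicate(number))
                count++;
        }
        return count;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6, 100 };

        // 1. A named method converted to the delegate type.
        Func<int, bool> named = IsEven;

        // 2. A C# 2.0 anonymous method assigned to the same delegate type.
        Func<int, bool> anonymous = delegate(int number) { return number > 5; };

        Console.WriteLine(CountMatches(numbers, named));     // 4
        Console.WriteLine(CountMatches(numbers, anonymous)); // 2

        // One of the other overloads: two arguments and a result.
        Func<int, int, int> add = delegate(int x, int y) { return x + y; };
        Console.WriteLine(add(2, 3));                        // 5
    }
}
```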
Now, we will use this new generic delegate type to define an extension:
public static IEnumerable<T> Get<T>(this IEnumerable<T> source, Func<T, bool>