Foreach() VS LINQ Cross Join and how to make SQLBulkCopy Faster

Are you using foreach loop to iterate over a collection and then make a datatable and send the query to SQLBulkCopy API.

Here is a faster way to do it in Linq

Recently I had a scenario where I had a set of values and a set of keys -

Values

ID Values
Alpha 100
Beta 200
Gamma 300
Delta 400
keys 
-3 -4 1000 999 888
I Needed to create a datatable that looks like this 
-3 -4 1000 999 888 Alpha 100
-3 -4 1000 999 888 Beta 200
-3 -4 1000 999 888 Gamma 300
-3 -4 1000 999 888 Delta 400
Once I have this datatable I could pass it to SQLBulkCopy API and bulk insert these rows 
in SQL Server 2008 database.
The way the application was currently working was it was using a ForeachLoop which would 
iterate over each Values collection
and the one row of the Keys collection and create a datatable object. I was asked to perf 
tune it and after realising this 
was the bottleneck I used LINQ Cross Join and a EntityDataReader class mentioned below which 
you can download from code.msdn(link is
given below)
I have a small snippet that mimics the functionality and compares performance of both Foreach 
Loop in this scenarion and LINQ Cross JOIN
along with this EntityDataReader class.
First of all download this class from http://code.msdn.microsoft.com/LinqEntityDataReader and add 
it to your solution and use this
namespace in your code. Then copy paste this code in a console application .
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.Data.Objects;
using Microsoft.Samples.EntityDataReader;
// Download this from http://code.msdn.microsoft.com/LinqEntityDataReader
namespace LINQCrossJoin
{
    class Program
    {
        static void Main(string[] args)
        {
            //Simulate the Vectors and Context
            List<string> Vectors = new List<string>();
            List<string> Context = new List<string>();
            DataTable dt = new DataTable();
            dt.Columns.Add("Context");
            dt.Columns.Add("Vectors");
            
            for (int i = 0; i < 7500000; i++)
            {
                Vectors.Add(i.ToString());
            }
            Context.Add("Foo");
            //Simulate the Cross Join using Foreach
            Console.WriteLine("Started the Foreach Loop at {0}", DateTime.Now);
            foreach (var vector in Vectors)
            {
                dt.Rows.Add(Context, vector);

            }
            Console.WriteLine("Finished the foreach Loop at {0}", DateTime.Now);
            //Simulate Cross Join Using LINQ
            Console.WriteLine("Started the LINQ Loop at {0}", DateTime.Now);
            var combo = from C in Context
                        from V in Vectors
                        select new { C, V };
            IDataReader dr = combo.AsDataReader();
            //This dr can then be sent to the SQLBulkCopy API. These is a sample code to do so at 
            //http://code.msdn.microsoft.com/LinqEntityDataReader
            Console.WriteLine("Finished the LINQ Loop at {0}", DateTime.Now);
            Console.Read();

        }
       
    }
}

 

The output is

Total elapsed time in a foreach loop 25758 MilliSeconds

Total elapsed time in a LINQ and EntityDataReader loop 31 MilliSeconds

Performance optimization 830 times

Awesome !! In this scenario you are almost 1000 times faster . In my application with other bits of the code and

other bottlenecks like SAN IO , Network IO , SQL Server CPU bottleneck etc I could tune the application to go

from 200,000 rows / Sec to approx 2 Million rows/sec. I thought I will share the nugget with you guys.

 

2 thoughts on “Foreach() VS LINQ Cross Join and how to make SQLBulkCopy Faster

  1. claire says:

    Incredible! Are there scenarios where it wouldn’t be faster, do you think?

    • ashwaniroy says:

      Interesting question. I am not 100 % sure of a scenario where it will be slower but I will pose this to C# MVPs I know and see if they have something. The reason this is faster is because of the lazy evaluation due to “Yield” keyword. So unless you call any iterator over the LINQ object it is not holding any data in memory for you. So apart from being just faster it also becomes optimized in terms of memory foot prints.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.