Are you using foreach loop to iterate over a collection and then make a datatable and send the query to SQLBulkCopy API.
Here is a faster way to do it in Linq
Recently I had a scenario where I had a set of values and a set of keys -
Values
| ID | Values |
| Alpha | 100 |
| Beta | 200 |
| Gamma | 300 |
| Delta | 400 |
keys
| -3 | -4 | 1000 | 999 | 888 |
I Needed to create a datatable that looks like this
| -3 | -4 | 1000 | 999 | 888 | Alpha | 100 |
| -3 | -4 | 1000 | 999 | 888 | Beta | 200 |
| -3 | -4 | 1000 | 999 | 888 | Gamma | 300 |
| -3 | -4 | 1000 | 999 | 888 | Delta | 400 |
Once I have this datatable I could pass it to SQLBulkCopy API and bulk insert these rows
in SQL Server 2008 database.
The way the application was currently working was it was using a ForeachLoop which would
iterate over each Values collection
and the one row of the Keys collection and create a datatable object. I was asked to perf
tune it and after realising this
was the bottleneck I used LINQ Cross Join and a EntityDataReader class mentioned below which
you can download from code.msdn(link is
given below)
I have a small snippet that mimics the functionality and compares performance of both Foreach
Loop in this scenarion and LINQ Cross JOIN
along with this EntityDataReader class.
First of all download this class from http://code.msdn.microsoft.com/LinqEntityDataReader and add
it to your solution and use this
namespace in your code. Then copy paste this code in a console application .
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Data; using System.Data.SqlClient; using System.Data.Objects; using Microsoft.Samples.EntityDataReader;
// Download this from http://code.msdn.microsoft.com/LinqEntityDataReader
namespace LINQCrossJoin { class Program { static void Main(string[] args) { //Simulate the Vectors and Context List<string> Vectors = new List<string>(); List<string> Context = new List<string>(); DataTable dt = new DataTable(); dt.Columns.Add("Context"); dt.Columns.Add("Vectors"); for (int i = 0; i < 7500000; i++) { Vectors.Add(i.ToString()); } Context.Add("Foo"); //Simulate the Cross Join using Foreach Console.WriteLine("Started the Foreach Loop at {0}", DateTime.Now); foreach (var vector in Vectors) { dt.Rows.Add(Context, vector); } Console.WriteLine("Finished the foreach Loop at {0}", DateTime.Now); //Simulate Cross Join Using LINQ Console.WriteLine("Started the LINQ Loop at {0}", DateTime.Now); var combo = from C in Context from V in Vectors select new { C, V }; IDataReader dr = combo.AsDataReader(); //This dr can then be sent to the SQLBulkCopy API. These is a sample code to do so at
//http://code.msdn.microsoft.com/LinqEntityDataReader
Console.WriteLine("Finished the LINQ Loop at {0}", DateTime.Now); Console.Read(); } } }
The output is
Total elapsed time in a foreach loop 25758 MilliSeconds
Total elapsed time in a LINQ and EntityDataReader loop 31 MilliSeconds
Performance optimization 830 times
Awesome !! In this scenario you are almost 1000 times faster . In my application with other bits of the code and
other bottlenecks like SAN IO , Network IO , SQL Server CPU bottleneck etc I could tune the application to go
from 200,000 rows / Sec to approx 2 Million rows/sec. I thought I will share the nugget with you guys.
Incredible! Are there scenarios where it wouldn’t be faster, do you think?
Interesting question. I am not 100 % sure of a scenario where it will be slower but I will pose this to C# MVPs I know and see if they have something. The reason this is faster is because of the lazy evaluation due to “Yield” keyword. So unless you call any iterator over the LINQ object it is not holding any data in memory for you. So apart from being just faster it also becomes optimized in terms of memory foot prints.