Producer Consumer model using TPL, Tasks in .net 4.0 - c#

I have a fairly large XML file(around 1-2GB).
The requirement is to persist the xml data in to database.
Currently this is achieved in 3 steps.
Read the large file with less memory foot print as much as possible
Create entities from the xml-data
Store the data from the created entities in to the database using SqlBulkCopy.
To achieve better performance I want to create a Producer-consumer model where the producer creates a set of entities say a batch of 10K and adds it to a Queue. And the consumer should take the batch of entities from the queue and persist to the database using sqlbulkcopy.
Thanks,
Gokul
void Main()
{
int iCount = 0;
string fileName = #"C:\Data\CatalogIndex.xml";
DateTime startTime = DateTime.Now;
Console.WriteLine("Start Time: {0}", startTime);
FileInfo fi = new FileInfo(fileName);
Console.WriteLine("File Size:{0} MB", fi.Length / 1048576.0);
/* I want to change this loop to create a producer consumer pattern here to process the data parallel-ly
*/
foreach (var element in StreamElements(fileName,"title"))
{
iCount++;
}
Console.WriteLine("Count: {0}", iCount);
Console.WriteLine("End Time: {0}, Time Taken:{1}", DateTime.Now, DateTime.Now - startTime);
}
private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
{
using (var rdr = XmlReader.Create(fileName))
{
rdr.MoveToContent();
while (!rdr.EOF)
{
if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
{
var e = XElement.ReadFrom(rdr) as XElement;
yield return e;
}
else
{
rdr.Read();
}
}
rdr.Close();
}
}

Is this what you are trying to do?
void Main()
{
const int inputCollectionBufferSize = 1024;
const int bulkInsertBufferCapacity = 100;
const int bulkInsertConcurrency = 4;
BlockingCollection<object> inputCollection = new BlockingCollection<object>(inputCollectionBufferSize);
Task loadTask = Task.Factory.StartNew(() =>
{
foreach (object nextItem in ReadAllElements(...))
{
// this will potentially block if there are already enough items
inputCollection.Add(nextItem);
}
// mark this collection as done
inputCollection.CompleteAdding();
});
Action parseAction = () =>
{
List<object> bulkInsertBuffer = new List<object>(bulkInsertBufferCapacity);
foreach (object nextItem in inputCollection.GetConsumingEnumerable())
{
if (bulkInsertBuffer.Length == bulkInsertBufferCapacity)
{
CommitBuffer(bulkInsertBuffer);
bulkInsertBuffer.Clear();
}
bulkInsertBuffer.Add(nextItem);
}
};
List<Task> parseTasks = new List<Task>(bulkInsertConcurrency);
for (int i = 0; i < bulkInsertConcurrency; i++)
{
parseTasks.Add(Task.Factory.StartNew(parseAction));
}
// wait before exiting
loadTask.Wait();
Task.WaitAll(parseTasks.ToArray());
}

Related

Parallel execution issue

I need a little education here with regards to the execution of parallel tasks.
I have created a small fiddle:
https://dotnetfiddle.net/JO2a4m
What I am trying to do send a few accounts to process in batches to another method and creating a unit of work (task) for each batch but when I execute the tasks, it only executes the last task which was added. This is something I am trying to break my head around.
Code:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
public class Program
{
public static void Main()
{
var accounts = GenerateAccount();
var accountsProcess = new List<Account>();
var taskList = new List<Task>();
var batch = 4;
var count = 0;
foreach (var account in accounts)
{
if (count == batch)
{
taskList.Add(new Task(() => ProcessAccount(accountsProcess)));
count = 0;
accountsProcess.Clear();
}
count++;
accountsProcess.Add(account);
}
Parallel.ForEach(taskList, t =>
{
t.Start();
}
);
Task.WaitAll(taskList.ToArray());
if (accountsProcess.Count > 0)
ProcessAccount(accountsProcess);
}
public static List<Account> GenerateAccount()
{
var accounts = new List<Account>();
var first = "First";
var second = "Second";
for (int i = 0; i <= 1000; i++)
{
var account = new Account();
account.first = first + i;
account.second = second + i;
accounts.Add(account);
}
return accounts;
}
public static void ProcessAccount(List<Account> accounts)
{
Console.WriteLine(accounts.Count);
foreach (var account in accounts)
{
Console.WriteLine(account.first + account.second);
}
}
}
public class Account
{
public string first;
public string second;
}
foreach (var account in accounts)
{
if (count == batch)
{
taskList.Add(new Task(() => ProcessAccount(accountsProcess)));
count = 0;
accountsProcess.Clear();
}
count++;
accountsProcess.Add(account);
}
The issue is that all of the Tasks are sharing the same List<Account> object.
I would suggest changing the code to:
foreach (var account in accounts)
{
if (count == batch)
{
var bob = accountsProcess;
taskList.Add(new Task(() => ProcessAccount(bob)));
count = 0;
accountsProcess = new List<Account>();
}
count++;
accountsProcess.Add(account);
}
By using bob and assigning a new List to accountsProcess we ensure each Task gets its own List - rather than sharing a single List.
Also, consider using MoreLINQ's Batch rather than rolling your own.

How to process directory files in Task parallel library?

I have a scenario in which i have to process the multiple files(e.g. 30) parallel based on the processor cores. I have to assign these files to separate tasks based on no of processor cores. I don't know how to make a start and end limit of each task to process. For example each and every task knows how many files it has to process.
private void ProcessFiles(object e)
{
try
{
var diectoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
var FilePaths = Directory.EnumerateFiles(diectoryPath);
int numCores = System.Environment.ProcessorCount;
int NoOfTasks = FilePaths.Count() > numCores ? (FilePaths.Count()/ numCores) : FilePaths.Count();
for (int i = 0; i < NoOfTasks; i++)
{
Task.Factory.StartNew(
() =>
{
int startIndex = 0, endIndex = 0;
for (int Count = startIndex; Count < endIndex; Count++)
{
this.ProcessFile(FilePaths);
}
});
}
}
catch (Exception ex)
{
throw;
}
}
For problems such as yours, there are concurrent data structures available in C#. You want to use BlockingCollection and store all the file names in it.
Your idea of calculating the number of tasks by using the number of cores available on the machine is not very good. Why? Because ProcessFile() may not take the same time for each file. So, it would be better to start the number of tasks as the number of cores you have. Then, let each task read file name one by one from the BlockingCollection and then process the file, until the BlockingCollection is empty.
try
{
var directoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
var filePaths = CreateBlockingCollection(directoryPath);
//Start the same #tasks as the #cores (Assuming that #files > #cores)
int taskCount = System.Environment.ProcessorCount;
for (int i = 0; i < taskCount; i++)
{
Task.Factory.StartNew(
() =>
{
string fileName;
while (!filePaths.IsCompleted)
{
if (!filePaths.TryTake(out fileName)) continue;
this.ProcessFile(fileName);
}
});
}
}
And the CreateBlockingCollection() would be as follows:
private BlockingCollection<string> CreateBlockingCollection(string path)
{
var allFiles = Directory.EnumerateFiles(path);
var filePaths = new BlockingCollection<string>(allFiles.Count);
foreach(var fileName in allFiles)
{
filePaths.Add(fileName);
}
filePaths.CompleteAdding();
return filePaths;
}
You will have to modify your ProcessFile() to receive a file name now instead of taking all the file paths and processing its chunk.
The advantage of this approach is that now your CPU won't be over or under subscribed and the load will be evenly balanced too.
I haven't run the code myself, so there might be some syntax error in my code. Feel free to correct the error, if you come across any.
Based on my admittedly limited understanding of the TPL, I think your code could be rewritten as such:
private void ProcessFiles(object e)
{
try
{
var diectoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
var FilePaths = Directory.EnumerateFiles(diectoryPath);
Parallel.ForEach(FilePaths, path => this.ProcessFile(path));
}
catch (Exception ex)
{
throw;
}
}
regards

Use Task.Run instead of Delegate.BeginInvoke

I have recently upgraded my projects to ASP.NET 4.5 and I have been waiting a long time to use 4.5's asynchronous capabilities. After reading the documentation I'm not sure whether I can improve my code at all.
I want to execute a task asynchronously and then forget about it. The way that I'm currently doing this is by creating delegates and then using BeginInvoke.
Here's one of the filters in my project with creates an audit in our database every time a user accesses a resource that must be audited:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
var invoker = new MethodInvoker(delegate
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
});
invoker.BeginInvoke(StopAsynchronousMethod, invoker);
base.OnActionExecuting(filterContext);
}
But in order to finish this asynchronous task, I need to always define a callback, which looks like this:
public void StopAsynchronousMethod(IAsyncResult result)
{
var state = (MethodInvoker)result.AsyncState;
try
{
state.EndInvoke(result);
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
}
I would rather not use the callback at all due to the fact that I do not need a result from the task that I am invoking asynchronously.
How can I improve this code with Task.Run() (or async and await)?
If I understood your requirements correctly, you want to kick off a task and then forget about it. When the task completes, and if an exception occurred, you want to log it.
I'd use Task.Run to create a task, followed by ContinueWith to attach a continuation task. This continuation task will log any exception that was thrown from the parent task. Also, use TaskContinuationOptions.OnlyOnFaulted to make sure the continuation only runs if an exception occurred.
Task.Run(() => {
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}).ContinueWith(task => {
task.Exception.Handle(ex => {
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(ex, username);
});
}, TaskContinuationOptions.OnlyOnFaulted);
As a side-note, background tasks and fire-and-forget scenarios in ASP.NET are highly discouraged. See The Dangers of Implementing Recurring Background Tasks In ASP.NET
It may sound a bit out of scope, but if you just want to forget after you launch it, why not using directly ThreadPool?
Something like:
ThreadPool.QueueUserWorkItem(
x =>
{
try
{
// Do something
...
}
catch (Exception e)
{
// Log something
...
}
});
I had to do some performance benchmarking for different async call methods and I found that (not surprisingly) ThreadPool works much better, but also that, actually, BeginInvoke is not that bad (I am on .NET 4.5). That's what I found out with the code at the end of the post. I did not find something like this online, so I took the time to check it myself. Each call is not exactly equal, but it is more or less functionally equivalent in terms of what it does:
ThreadPool: 70.80ms
Task: 90.88ms
BeginInvoke: 121.88ms
Thread: 4657.52ms
public class Program
{
public delegate void ThisDoesSomething();
// Perform a very simple operation to see the overhead of
// different async calls types.
public static void Main(string[] args)
{
const int repetitions = 25;
const int calls = 1000;
var results = new List<Tuple<string, double>>();
Console.WriteLine(
"{0} parallel calls, {1} repetitions for better statistics\n",
calls,
repetitions);
// Threads
Console.Write("Running Threads");
results.Add(new Tuple<string, double>("Threads", RunOnThreads(repetitions, calls)));
Console.WriteLine();
// BeginInvoke
Console.Write("Running BeginInvoke");
results.Add(new Tuple<string, double>("BeginInvoke", RunOnBeginInvoke(repetitions, calls)));
Console.WriteLine();
// Tasks
Console.Write("Running Tasks");
results.Add(new Tuple<string, double>("Tasks", RunOnTasks(repetitions, calls)));
Console.WriteLine();
// Thread Pool
Console.Write("Running Thread pool");
results.Add(new Tuple<string, double>("ThreadPool", RunOnThreadPool(repetitions, calls)));
Console.WriteLine();
Console.WriteLine();
// Show results
results = results.OrderBy(rs => rs.Item2).ToList();
foreach (var result in results)
{
Console.WriteLine(
"{0}: Done in {1}ms avg",
result.Item1,
(result.Item2 / repetitions).ToString("0.00"));
}
Console.WriteLine("Press a key to exit");
Console.ReadKey();
}
/// <summary>
/// The do stuff.
/// </summary>
public static void DoStuff()
{
Console.Write("*");
}
public static double RunOnThreads(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var stopwatch = new Stopwatch();
var resetEvent = new ManualResetEvent(false);
var threadList = new List<Thread>();
for (var i = 0; i < calls; i++)
{
threadList.Add(new Thread(() =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
}));
}
stopwatch.Start();
foreach (var thread in threadList)
{
thread.Start();
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnThreadPool(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var resetEvent = new ManualResetEvent(false);
var stopwatch = new Stopwatch();
var list = new List<int>();
for (var i = 0; i < calls; i++)
{
list.Add(i);
}
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
ThreadPool.QueueUserWorkItem(
x =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
},
list[i]);
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnBeginInvoke(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var beginInvokeStopwatch = new Stopwatch();
var delegateList = new List<ThisDoesSomething>();
var resultsList = new List<IAsyncResult>();
for (var i = 0; i < calls; i++)
{
delegateList.Add(DoStuff);
}
beginInvokeStopwatch.Start();
foreach (var delegateToCall in delegateList)
{
resultsList.Add(delegateToCall.BeginInvoke(null, null));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(rs => !rs.IsCompleted))
{
Thread.Sleep(10);
}
beginInvokeStopwatch.Stop();
totalMs += beginInvokeStopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnTasks(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var resultsList = new List<Task>();
var stopwatch = new Stopwatch();
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
resultsList.Add(Task.Factory.StartNew(DoStuff));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(task => !task.IsCompleted))
{
Thread.Sleep(10);
}
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
}
Here's one of the filters in my project with creates an audit in our database every time a user accesses a resource that must be audited
Auditing is certainly not something I would call "fire and forget". Remember, on ASP.NET, "fire and forget" means "I don't care whether this code actually executes or not". So, if your desired semantics are that audits may occasionally be missing, then (and only then) you can use fire and forget for your audits.
If you want to ensure your audits are all correct, then either wait for the audit save to complete before sending the response, or queue the audit information to reliable storage (e.g., Azure queue or MSMQ) and have an independent backend (e.g., Azure worker role or Win32 service) process the audits in that queue.
But if you want to live dangerously (accepting that occasionally audits may be missing), you can mitigate the problems by registering the work with the ASP.NET runtime. Using the BackgroundTaskManager from my blog:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
BackgroundTaskManager.Run(() =>
{
try
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
});
base.OnActionExecuting(filterContext);
}

How to start processing task simultaneously in azure worker role

I have developed worker role application to process different tasks. For example Task1, task2 has 100 records and i have stored theme in queue and i would like to start processing both simultaneously and span load over multiple instances.
In future there will be more task and records inside task to process are going to increase.
so how can i improve below method to process records efficiently?
Currently I have done code sequentially as below
private void ProcessTaskQueues()
{
var currentInterval1 = 0;
var maxInterval1 = 15;
var currentInterval2 = 0;
var maxInterval2 = 15;
string queueName1 = RoleEnvironment.GetConfigurationSettingValue("Task1Queue");
CloudQueue queue1 = storageAccount.CreateCloudQueueClient().GetQueueReference(queueName1);
queue1.CreateIfNotExists();
string queueName2 = RoleEnvironment.GetConfigurationSettingValue("Task2Queue");
CloudQueue queue2 = storageAccount.CreateCloudQueueClient().GetQueueReference(queueName2);
queue2.CreateIfNotExists();
while (true)
{
try
{
TaskPerformer tp = new TaskPerformer();
// Task 1
Trace.WriteLine(string.Format("[{0}] - [TASK1] Fetch Message queue", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss")));
var cloudQueueMessage1 = queue1.GetMessage();
if (cloudQueueMessage1 != null)
{
currentInterval1 = 0;
if (cloudQueueMessage1.DequeueCount <= 1)
{
var item = cloudQueueMessage1.FromMessage<Task1Item>();
tp.ExecuteTask1(item);
Trace.WriteLine(string.Format("[{0}] - [TASK1] Message Executed for ID : {1}", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"), item.MLPID));
queue2.DeleteMessage(cloudQueueMessage1);
}
}
else
{
if (currentInterval1 < maxInterval1)
{
currentInterval1++;
Trace.WriteLine(string.Format("[{0}] - Waiting for {1} seconds", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"), currentInterval1));
}
Thread.Sleep(TimeSpan.FromSeconds(currentInterval1));
}
// Task 2
Trace.WriteLine(string.Format("[{0}] - [TASK2] Fetch Message queue", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss")));
var cloudQueueMessage2 = queue2.GetMessage();
if (cloudQueueMessage2 != null)
{
currentInterval2 = 0;
if (cloudQueueMessage2.DequeueCount <= 1)
{
var dns = cloudQueueMessage2.FromMessage<DNS>();
tp.ExecuteTask2(dns);
Trace.WriteLine(string.Format("[{0}] - [TASK2] Message Executed for ID : {1}", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"), dns.ID));
queue2.DeleteMessage(cloudQueueMessage2);
}
}
else
{
if (currentInterval2 < maxInterval2)
{
currentInterval2++;
Trace.WriteLine(string.Format("[{0}] - Waiting for {1} seconds", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"), currentInterval2));
}
Thread.Sleep(TimeSpan.FromSeconds(currentInterval2));
}
}
catch (Exception)
{ }
}
}
From comment of this question it leads me to search for alternate solution with only 1 queue for multiple type of data items and i got solution of that here
How to use one object to store multiple type of data

Does a separate thread need it's own VersionControlServer instance to work simultaneously with another?

Suppose there are an arbitrary number of threads in my C# program. Each thread needs to look up the changeset ids for a particular path by looking up it's history. The method looks like so:
public List<int> GetIdsFromHistory(string path, VersionControlServer tfsClient)
{
IEnumerable submissions = tfsClient.QueryHistory(
path,
VersionSpec.Latest,
0,
RecursionType.None, // Assume that the path is to a file, not a directory
null,
null,
null,
Int32.MaxValue,
false,
false);
List<int> ids = new List<int>();
foreach(Changeset cs in submissions)
{
ids.Add(cs.ChangesetId);
}
return ids;
}
My question is, does each thread need it's own VersionControlServer instance or will one suffice? My intuition tells me that each thread needs its own instance since the TFS SDK uses webservices and I should probably have more than one connection open if I'm really going to get the parallel behavior. If I only use one connection, my intuition tells me that I'll get serial behavior even though I've got multiple threads.
If I need as many instances as there are threads, I think of using an Object-Pool pattern, but will the connections time out and close over a long period if not being used? The docs seem sparse in this regard.
It would appear that threads using the SAME client is the fastest option.
Here's the output from a test program that runs 4 tests 5 times each and returns the average result in milliseconds. Clearly using the same client across multiple threads is the fastest execution:
Parallel Pre-Alloc: Execution Time Average (ms): 1921.26044
Parallel AllocOnDemand: Execution Time Average (ms): 1391.665
Parallel-SameClient: Execution Time Average (ms): 400.5484
Serial: Execution Time Average (ms): 1472.76138
For reference, here's the test program itself (also on GitHub):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.TeamFoundation;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.VersionControl.Client;
using System.Collections;
using System.Threading.Tasks;
using System.Diagnostics;
namespace QueryHistoryPerformanceTesting
{
class Program
{
static string TFS_COLLECTION = /* TFS COLLECTION URL */
static VersionControlServer GetTfsClient()
{
var projectCollectionUri = new Uri(TFS_COLLECTION);
var projectCollection = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(projectCollectionUri, new UICredentialsProvider());
projectCollection.EnsureAuthenticated();
return projectCollection.GetService<VersionControlServer>();
}
struct ThrArg
{
public VersionControlServer tfc { get; set; }
public string path { get; set; }
}
static List<string> PATHS = new List<string> {
// ASSUME 21 FILE PATHS
};
static int NUM_RUNS = 5;
static void Main(string[] args)
{
var results = new List<TimeSpan>();
for (int i = NUM_RUNS; i > 0; i--)
{
results.Add(RunTestParallelPreAlloc());
}
Console.WriteLine("Parallel Pre-Alloc: Execution Time Average (ms): " + results.Select(t => t.TotalMilliseconds).Average());
results.Clear();
for (int i = NUM_RUNS; i > 0; i--)
{
results.Add(RunTestParallelAllocOnDemand());
}
Console.WriteLine("Parallel AllocOnDemand: Execution Time Average (ms): " + results.Select(t => t.TotalMilliseconds).Average());
results.Clear();
for (int i = NUM_RUNS; i > 0; i--)
{
results.Add(RunTestParallelSameClient());
}
Console.WriteLine("Parallel-SameClient: Execution Time Average (ms): " + results.Select(t => t.TotalMilliseconds).Average());
results.Clear();
for (int i = NUM_RUNS; i > 0; i--)
{
results.Add(RunTestSerial());
}
Console.WriteLine("Serial: Execution Time Average (ms): " + results.Select(t => t.TotalMilliseconds).Average());
}
static TimeSpan RunTestParallelPreAlloc()
{
var paths = new List<ThrArg>();
paths.AddRange( PATHS.Select( x => new ThrArg { path = x, tfc = GetTfsClient() } ) );
return RunTestParallel(paths);
}
static TimeSpan RunTestParallelAllocOnDemand()
{
var paths = new List<ThrArg>();
paths.AddRange(PATHS.Select(x => new ThrArg { path = x, tfc = null }));
return RunTestParallel(paths);
}
static TimeSpan RunTestParallelSameClient()
{
var paths = new List<ThrArg>();
var _tfc = GetTfsClient();
paths.AddRange(PATHS.Select(x => new ThrArg { path = x, tfc = _tfc }));
return RunTestParallel(paths);
}
static TimeSpan RunTestParallel(List<ThrArg> args)
{
var allIds = new List<int>();
var stopWatch = new Stopwatch();
stopWatch.Start();
Parallel.ForEach(args, s =>
{
allIds.AddRange(GetIdsFromHistory(s.path, s.tfc));
}
);
stopWatch.Stop();
return stopWatch.Elapsed;
}
static TimeSpan RunTestSerial()
{
var allIds = new List<int>();
VersionControlServer tfsc = GetTfsClient();
var stopWatch = new Stopwatch();
stopWatch.Start();
foreach (string s in PATHS)
{
allIds.AddRange(GetIdsFromHistory(s, tfsc));
}
stopWatch.Stop();
return stopWatch.Elapsed;
}
static List<int> GetIdsFromHistory(string path, VersionControlServer tfsClient)
{
if (tfsClient == null)
{
tfsClient = GetTfsClient();
}
IEnumerable submissions = tfsClient.QueryHistory(
path,
VersionSpec.Latest,
0,
RecursionType.None, // Assume that the path is to a file, not a directory
null,
null,
null,
Int32.MaxValue,
false,
false);
List<int> ids = new List<int>();
foreach(Changeset cs in submissions)
{
ids.Add(cs.ChangesetId);
}
return ids;
}

Categories