Hashset First() + Remove() performance - c#

I encountered a performance issue today involving HashSet.Remove and I'm still unclear on what the problem was. Why is the Approach2 method significantly faster than the Approach1 method? I assume it's the calls to HashSet.Remove, but the MSDN docs say HashSet.Remove is O(1).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public class HashSetTester
{
int TestNum = 20000;
public void Run()
{
var hashset2 = CreateTestHashSet();
var watch2 = new Stopwatch();
watch2.Start();
Approach2(hashset2);
watch2.Stop();
var hashset1 = CreateTestHashSet();
var watch1 = new Stopwatch();
watch1.Start();
Approach1(hashset1);
watch1.Stop();
Console.WriteLine("Approach1 is {0:0.0}x slower than Approach2", watch1.Elapsed.TotalSeconds / watch2.Elapsed.TotalSeconds);
}
HashSet<object> CreateTestHashSet()
{
var result = new HashSet<object>();
var rnd = new Random();
for (int i = 0; i < TestNum; i++)
{
result.Add(rnd.Next());
}
return result;
}
void Approach1(HashSet<object> hashset)
{
while (hashset.Any())
{
var instance = hashset.First();
hashset.Remove(instance);
DoSomething(instance, hashset);
}
}
void Approach2(HashSet<object> hashset)
{
var tempItems = new List<object>();
while (hashset.Any())
{
tempItems.Clear();
tempItems.AddRange(hashset);
hashset.Clear();
foreach (var instance in tempItems)
{
DoSomething(instance, hashset);
}
}
}
void DoSomething(object obj, HashSet<object> hashset)
{
// In some cases, hashset would be added to here
}
public static void Main()
{
new HashSetTester().Run();
}
}

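If you need to do something for every entry of the hashset and then just remove the entries, why not iterate once and clear the set afterwards? Note that HashSet<T> has no ForEach method (that is List<T>.ForEach, and it is not LINQ), so use a plain foreach:
foreach (var item in hashSet)
{
    DoSomething(item, hashSet);
}
hashSet.Clear();
This only works if DoSomething does not add to hashSet: adding to a HashSet<T> while it is being enumerated throws an InvalidOperationException. Within that limit, it avoids the repeated First() and Remove() calls and will probably give you acceptable performance when dealing with big collections.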
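One likely explanation for the slowdown, based on how HashSet<T> has been implemented in .NET Framework (an assumption about internals, not something the documentation promises): Remove marks a slot as free instead of compacting storage, and First() creates an enumerator that scans from the start of the internal slot array, skipping freed slots. After k removals each First() call skips roughly k slots, so draining n items costs about O(n^2) even though each Remove is O(1). If DoSomething has to add new items while the set is drained, a queue paired with a set avoids the scan. The sketch below is hypothetical: Worklist<T> and WorklistDemo are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): a de-duplicating work list.
public sealed class Worklist<T>
{
    private readonly HashSet<T> _pending = new HashSet<T>();
    private readonly Queue<T> _order = new Queue<T>();

    public int Count => _order.Count;

    // Adds an item unless it is already pending; returns true if it was added.
    public bool Add(T item)
    {
        if (!_pending.Add(item)) return false;
        _order.Enqueue(item);
        return true;
    }

    // Removes and returns the oldest pending item without scanning.
    public T Take()
    {
        T item = _order.Dequeue();
        _pending.Remove(item);
        return item;
    }
}

public static class WorklistDemo
{
    // Drains the work list and returns how many items were processed.
    public static int Drain(int initialCount)
    {
        var work = new Worklist<int>();
        for (int i = 0; i < initialCount; i++) work.Add(i);

        int processed = 0;
        while (work.Count > 0)
        {
            int item = work.Take();
            processed++;
            // Stand-in for DoSomething: even initial items schedule one more item.
            if (item < initialCount && item % 2 == 0) work.Add(item + initialCount);
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(Drain(10)); // prints 15 (10 initial items + 5 added while draining)
    }
}
```

Unlike a bare HashSet<T>, this processes items in insertion order, which may or may not matter for your case.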

Related

Starting tasks inside another task is duplicating my WebRequests

I use the code below to check some pdf files online and return a string accordingly.
The problem is: when I added the second Task.Factory.StartNew(), it started duplicating all requests, while still returning only one answer (as it should).
I need this to be as fast as possible so I can't waste time sending two requests to the server.
public static void Main(string[] args)
{
var listT = new List<string>()
{
"24006025062"
};
var task = listT.Select(x => Task.Factory.StartNew(() => TesteTask(x)));
Task.WaitAll(task.ToArray(), TimeSpan.FromSeconds(120));
List<string> results = new List<string>();
foreach (var result in task)
{
results.Add(result.Result);
}
}
private static string TesteTask(string codCart)
{
var teste = new Consulta();
var retorno = string.Empty;
var session = teste.GetCaptcha();
for (int i = 0; i < 10; i++)
{
session.CaptchaResolvida = QuebraCaptcha(session.CaptchaCodificada).CaptchaResolvida;
if (session.CaptchaResolvida.Length > 0)
{
var links = teste.Consulta(codCart, session).Retorno;
if (links.Any())
{
var tasks = links.Select(x => Task.Factory.StartNew(() => Executa(teste, session, x)));
Task.WaitAll(tasks.ToArray(), TimeSpan.FromSeconds(120));
var modelList = from Result in tasks select Result.Result;
retorno = teste.FinalizaProcesso(modelList.ToList());
break;
}
}
}
return retorno;
}
private static string Executa(Consulta teste, Model<Request> session, string link)
{
var retorno = string.Empty;
for (int i = 0; i < 10; i++)
{
var CaptchaResolvida = QuebraCaptcha(teste.GetCaptchaPdf(session)).CaptchaResolvida;
if (CaptchaResolvida != null && CaptchaResolvida != string.Empty)
{
var status = teste.BaixaPdf(link, CaptchaResolvida, session);
if (status != string.Empty)
{
retorno = status;
break;
}
}
}
return retorno;
}
Ps: This is my first post on stack overflow, if I'm not clear enough please let me know!
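You are getting this behavior because you enumerate the IEnumerable returned by Select twice: once in task.ToArray() and once in the foreach. Each enumeration re-runs the selector and starts new tasks. Try this: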
public static void Main(string[] args)
{
var listT = new List<string>()
{
"24006025062"
};
var task = listT
.Select(x => Task.Factory.StartNew(() => TesteTask(x)))
.ToArray();
Task.WaitAll(task, TimeSpan.FromSeconds(120));
List<string> results = new List<string>();
foreach (var result in task)
{
results.Add(result.Result);
}
}
Calling ToArray() right after Select() runs the query, and therefore starts the tasks, only once. WaitAll and the foreach then both work on the same array instead of each re-running the query.
Hope it helps!
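The inner query in TesteTask follows the same pattern (tasks.ToArray() for WaitAll, then a second pass over tasks to read the results), so it would presumably start every Executa call twice as well. The sketch below counts task starts for the lazy and the materialized variant. StartDownload is a hypothetical stand-in for Executa that makes no web requests.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MaterializeOnceDemo
{
    private static int _started;

    // Hypothetical stand-in for Executa: records that a request was started.
    private static Task<string> StartDownload(string link)
    {
        Interlocked.Increment(ref _started);
        return Task.Run(() => "ok:" + link);
    }

    // Lazy query: enumerated once by ToArray() and again when reading results.
    public static int StartsWithLazyQuery(string[] links)
    {
        _started = 0;
        var tasks = links.Select(link => StartDownload(link));
        Task.WaitAll(tasks.ToArray());
        var results = tasks.Select(t => t.Result).ToList(); // second enumeration starts new tasks
        return _started;
    }

    // Materialized query: the selector runs exactly once per link.
    public static int StartsWithArray(string[] links)
    {
        _started = 0;
        var tasks = links.Select(link => StartDownload(link)).ToArray();
        Task.WaitAll(tasks);
        var results = tasks.Select(t => t.Result).ToList(); // reads the same task objects
        return _started;
    }

    public static void Main()
    {
        var links = new[] { "a", "b", "c" };
        Console.WriteLine(StartsWithLazyQuery(links)); // prints 6
        Console.WriteLine(StartsWithArray(links));     // prints 3
    }
}
```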

Task fired again after WaitAll

Using HttpClient.GetAsync, or any other BCL async method, inside a LINQ Select can result in the request being fired twice.
Here is a unit test case:
[TestMethod]
public void TestTwiceShoot()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = items.Select(d =>
{
k++;
var client = new System.Net.Http.HttpClient();
return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
});
Task.WaitAll(tasks.ToArray());
foreach (var r in tasks)
{
}
Assert.AreEqual(1, k);
}
The test fails because k is 2. Somehow the program runs the delegate that fires GetAsync twice. Why?
If I remove foreach (var r in tasks), the test passes. Why?
[TestMethod]
public void TestTwiceShoot()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = items.Select(d =>
{
k++;
var client = new System.Net.Http.HttpClient();
return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
});
Task.WaitAll(tasks.ToArray());
Assert.AreEqual(1, k);
}
If I use foreach instead of items.Select, the test passes. Why?
[TestMethod]
public void TestTwiceShoot()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = new List<Task<System.Net.Http.HttpResponseMessage>>();
foreach (var item in items)
{
k++;
var client = new System.Net.Http.HttpClient();
tasks.Add( client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1")));
};
Task.WaitAll(tasks.ToArray());
foreach (var r in tasks)
{
}
Assert.AreEqual(1, k);
}
Apparently the enumerator returned by items.Select does not get along with the Task objects it returns: as soon as I walk the enumerator again, the delegate is fired again.
This test passes:
[TestMethod]
public void TestTwiceShoot()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = items.Select(d =>
{
k++;
var client = new System.Net.Http.HttpClient();
return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
});
var tasksArray = tasks.ToArray();
Task.WaitAll(tasksArray);
foreach (var r in tasksArray)
{
}
Assert.AreEqual(1, k);
}
Scott mentioned that the Select may run again when walking the enumerator; however, this test passes:
[TestMethod]
public void TestTwiceShoot()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = items.Select(d =>
{
k++;
return int.Parse(d);
});
foreach (var r in tasks)
{
};
Assert.AreEqual(1, k);
}
My guess is that LINQ's Select has some special treatment for Task.
In the end, what is a good way to fire multiple async methods in LINQ and then examine the results after WaitAll?
It is because tasks is an IEnumerable<Task<HttpResponseMessage>>, and each time you enumerate it, the .Select() operation is re-run. Currently you enumerate it twice: once when you call .ToArray() and once in the foreach.
To fix the problem, keep the .ToArray() call but move it up so that it runs directly on the Select:
var tasks = items.Select(d =>
{
k++;
var client = new System.Net.Http.HttpClient();
return client.GetAsync(new Uri("http://testdevserver.ibs.local:8020/prestashop/api/products/1"));
}).ToArray(); // This makes tasks a Task<HttpResponseMessage>[] instead of a lazy IEnumerable<Task<HttpResponseMessage>>.
Task.WaitAll(tasks);
foreach (var r in tasks)
{
};
Things like this are why Microsoft recommends that LINQ queries have no side effects (like incrementing k): it is hard to tell how many times the query will be run, especially if the resulting IEnumerable<T> leaves your control by being returned as a result or passed to another function.
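As for a good way to fire several async calls from LINQ and examine the results afterwards, one common option is to materialize the tasks with ToList() and await Task.WhenAll instead of blocking in WaitAll. The following is a minimal sketch under stated assumptions: FetchLengthAsync is a hypothetical stand-in for client.GetAsync that simulates latency with Task.Delay and performs no network access.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    public static int Calls;

    // Hypothetical stand-in for client.GetAsync: no network access.
    private static async Task<int> FetchLengthAsync(string item)
    {
        Calls++;              // runs synchronously, before the first await
        await Task.Delay(10); // simulated latency
        return item.Length;
    }

    public static async Task<int[]> FetchAllAsync(string[] items)
    {
        // ToList() runs the selector exactly once per item and stores the tasks.
        var tasks = items.Select(item => FetchLengthAsync(item)).ToList();
        // WhenAll yields the results in the same order as the tasks.
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        int[] lengths = FetchAllAsync(new[] { "a", "bb", "ccc" }).GetAwaiter().GetResult();
        Console.WriteLine(string.Join(",", lengths)); // prints 1,2,3
        Console.WriteLine(Calls);                     // prints 3
    }
}
```

In a test method that can be declared async Task, awaiting FetchAllAsync directly would avoid the blocking GetResult() call.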
I think the problem is my misconception about how enumeration works. These tests pass:
[TestMethod]
public void TestTwiceShoot()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = items.Select(d =>
{
k++;
return int.Parse(d);
});
foreach (var r in tasks)
{
};
foreach (var r in tasks)
{
};
Assert.AreEqual(2, k);
}
[TestMethod]
public void TestTwiceShoot2()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = items.Where(d =>
{
k++;
return true;
});
foreach (var r in tasks)
{
};
foreach (var r in tasks)
{
};
Assert.AreEqual(2, k);
}
I had thought that a LINQ statement returns an IEnumerable object that stores the results of the delegate. However, it evidently stores only a reference to the delegate, so every walk of the enumerator triggers the delegate again. Therefore, it is good to use ToArray() or ToList() to get a list of results, like this:
[TestMethod]
public void TestTwiceShoot2()
{
List<string> items = new List<string>();
items.Add("1");
int k = 0;
var tasks = items.Where(d =>
{
k++;
return true;
}).ToList();
foreach (var r in tasks)
{
};
foreach (var r in tasks)
{
};
Assert.AreEqual(1, k);
}

Which is Faster List<string>.foreach or listString.GetEnumerator()

I have written the following code to print the item values in a list. I want to measure which approach is faster, since I will have huge lists to deal with later. Please also explain why you think one is better (with evidence, if any).
How do I measure the processing time? Is manually creating a bulky List the only option?
public void printMedod(string strPrintListVal)
{
Console.WriteLine(strPrintListVal);
}
static void Main(string[] args)
{
Program p1 = new Program();
List<string> listString = new List<string> { "Rox","Stephens","Manahat","Lexus",":)"};
listString.ForEach(p1.printMedod);
Console.ReadKey();
}
I can also do the same thing using GetEnumerator:
static void Main(string[] args)
{
List<string> listString = new List<string> { "Rox","Stephens","Manahat","Lexus",":)"};
var enumerator = listString.GetEnumerator();
while (enumerator.MoveNext())
{
var pair = enumerator.Current;
Console.WriteLine(pair);
}
Console.ReadKey();
}
I'd ForEach my way through the list, but I've been told a for loop is faster than anything else:
for (int i = 0; i < stringList.Count; i++) // List<T> exposes Count, not Length
{
    Console.WriteLine(stringList[i]);
}
You can use LINQPad; it displays the execution time.
Neither is the best option here; just use a plain foreach:
static void Main(string[] args)
{
var stringList = new List<string>
{ "Rox", "Stephens", "Manahat", "Lexus", ":)" };
foreach(var s in stringList)
{
Console.WriteLine(s);
}
Console.ReadKey();
}
EDIT
Use a Stopwatch (more info in this question):
using System.Diagnostics;
var stopWatch = new Stopwatch();
stopWatch.Start();
// Do something
// If something is really fast, do lots of it
stopWatch.Stop();
// The duration is a TimeSpan in stopWatch.Elapsed
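To compare the three loop styles without hand-writing a bulky list, the list can be generated and each variant timed with the same helper. This is only a rough sketch: LoopTimingDemo, Time and BuildList are hypothetical names, the work per item (summing string lengths) replaces Console.WriteLine so that console I/O does not dominate the measurement, and the numbers will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LoopTimingDemo
{
    // Runs an action repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Action action, int repetitions)
    {
        action(); // warm-up run so JIT compilation is not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++) action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Generates a test list instead of writing one by hand.
    public static List<string> BuildList(int count)
    {
        var list = new List<string>(count);
        for (int i = 0; i < count; i++) list.Add("item" + i);
        return list;
    }

    public static long SumWithForEachMethod(List<string> list)
    {
        long total = 0;
        list.ForEach(s => total += s.Length);
        return total;
    }

    public static long SumWithForeach(List<string> list)
    {
        long total = 0;
        foreach (string s in list) total += s.Length;
        return total;
    }

    public static long SumWithFor(List<string> list)
    {
        long total = 0;
        for (int i = 0; i < list.Count; i++) total += list[i].Length;
        return total;
    }

    public static void Main()
    {
        List<string> list = BuildList(1000000);
        long sink = 0; // keeps the results alive so the loops are not optimized away
        Console.WriteLine("List.ForEach: {0}", Time(() => sink += SumWithForEachMethod(list), 20));
        Console.WriteLine("foreach:      {0}", Time(() => sink += SumWithForeach(list), 20));
        Console.WriteLine("for:          {0}", Time(() => sink += SumWithFor(list), 20));
        Console.WriteLine(sink);
    }
}
```

For anything beyond a quick comparison, a dedicated benchmarking harness gives more trustworthy numbers than a single Stopwatch run.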

Creating multiple array using a foreach loop and a select statement

I have a database table whose entire contents I select; it has 18000+ items. I have a method that uses a web service which accepts an array of up to ten elements. Right now I am calling it item by item instead of with an array. I want to create arrays of ten and call the function once per array. But what if the last batch has only, say, three records left over?
public static void Main()
{
inventoryBLL inv = new inventoryBLL();
DataSet1.sDataTable dtsku = inv.SelectEverything();
foreach (DataSet1.Row row in dtsku)
{
webservicefunction(row.item);
}
}
My question is: how would I transform this?
A generic solution to your problem could look like this:
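using System.Collections.Generic;
using System.Linq;

static class LinqHelper
{
    // Splits a sequence into arrays of N items; the last array may be shorter.
    public static IEnumerable<T[]> SplitIntoGroups<T>(this IEnumerable<T> items, int N)
    {
        if (items == null || N < 1)
            yield break;
        T[] group = new T[N];
        int size = 0;
        // foreach disposes the enumerator, which a manual MoveNext loop does not
        foreach (T item in items)
        {
            group[size++] = item;
            if (size == N)
            {
                yield return group;
                size = 0;
                group = new T[N];
            }
        }
        // Return the leftover items (fewer than N) as a right-sized array
        if (size > 0)
            yield return group.Take(size).ToArray();
    }
}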
So your Main function becomes:
public static void Main()
{
inventoryBLL inv = new inventoryBLL();
DataSet1.sDataTable dtsku = inv.SelectEverything();
foreach (var items in dtsku.Select(r => r.item).SplitIntoGroups(10))
{
webservicefunction(items);
}
}
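Alternatively, with Skip and Take:
var taken = 0;
var takecount = 10;
var total = list.Count(); // count once instead of on every iteration
while (taken < total) // "<" avoids a final call with an empty batch
{
    callWebService(list.Skip(taken).Take(takecount));
    taken += takecount;
}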
Generic Extension Method version:
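public static void AtATime<T>(this IEnumerable<T> list, int eachTime, Action<IEnumerable<T>> action)
{
    var taken = 0;
    var total = list.Count(); // count once instead of on every iteration
    while (taken < total) // "<" avoids a final call with an empty batch
    {
        action(list.Skip(taken).Take(eachTime));
        taken += eachTime;
    }
}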
Usage:
inv.SelectEverything().AtATime<Row>(10, webservicefunction);
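If the project targets .NET 6 or later (an assumption, since the question does not state a framework version), the built-in Enumerable.Chunk method does the same batching, including a shorter final batch. In the sketch below, SendInBatches is a hypothetical stand-in for the loop around webservicefunction. It only records the batch sizes so the behavior is easy to check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Hypothetical stand-in for the web service loop: records each batch size.
    public static List<int> SendInBatches(IEnumerable<string> skus, int batchSize)
    {
        var batchSizes = new List<int>();
        // Chunk (available from .NET 6) yields arrays of at most batchSize items.
        foreach (string[] batch in skus.Chunk(batchSize))
        {
            batchSizes.Add(batch.Length); // a real call would pass `batch` to the service
        }
        return batchSizes;
    }

    public static void Main()
    {
        IEnumerable<string> skus = Enumerable.Range(1, 23).Select(i => "SKU" + i);
        Console.WriteLine(string.Join(",", SendInBatches(skus, 10))); // prints 10,10,3
    }
}
```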

How can I shuffle my list of strings? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Randomize a List<T> in C#
I thought I had my code working but now it seems not. Here's what I have:
public class NoteDetail
{
public NoteDetail()
{
_noteDetails = new List<string>();
}
public IList<string> NoteDetails { get { return _noteDetails; } }
private readonly List<string> _noteDetails;
}
I populate my details like this:
var noteDetail = new NoteDetail ();
noteDetail.NoteDetails.Add("aaa");
noteDetail.NoteDetails.Add("bbb");
noteDetail.NoteDetails.Add("ccc");
Now I want to shuffle so I used this routine:
public static void ShuffleGenericList<T>(IList<T> list)
{
//generate a Random instance
var rnd = new Random();
//get the count of items in the list
var i = list.Count();
//do we have a reference type or a value type
T val = default(T);
//we will loop through the list backwards
while (i >= 1)
{
//decrement our counter
i--;
//grab the next random item from the list
var nextIndex = rnd.Next(i, list.Count());
val = list[nextIndex];
//start swapping values
list[nextIndex] = list[i];
list[i] = val;
}
}
My problem is that I am not sure how to do the shuffle. I have tried the following but it gives:
Error 237 Argument 1: cannot convert from 'System.Collections.Generic.IList<string>' to 'System.Collections.Generic.IList<.Storage.Models.NoteDetail>'
Sort.ShuffleGenericList<NoteDetail>(noteDetail.NoteDetails);
Can anyone see what I am doing wrong. It all looks okay to me and I can't see why I should get this error :-(
You should change this:
Sort.ShuffleGenericList<NoteDetail>(noteDetail.NoteDetails);
To:
Sort.ShuffleGenericList<string>(noteDetail.NoteDetails);
Because noteDetail.NoteDetails is an IList<string>, not an IList<NoteDetail>.
You are using the wrong type to parametrize your generic method. Do this instead and let the compiler infer the type:
Sort.ShuffleGenericList(noteDetail.NoteDetails);
or, more explicitly (but unnecessarily):
Sort.ShuffleGenericList<string>(noteDetail.NoteDetails);
You were passing NoteDetail as the type argument rather than string, which won't work.
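Type inference also works if the shuffle is written as an extension method, which reads naturally at the call site. The sketch below is one possible variant rather than the original routine: ShuffleExtensions and Shuffle are hypothetical names, it uses the common form of Fisher-Yates that picks the swap index from 0..i, and it takes the Random as a parameter so that a single instance can be reused across calls.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleExtensions
{
    // Fisher-Yates shuffle in place; T is inferred from the argument.
    public static void Shuffle<T>(this IList<T> list, Random rnd)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1); // random index with 0 <= j <= i
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}

public static class ShuffleDemo
{
    public static void Main()
    {
        IList<string> details = new List<string> { "aaa", "bbb", "ccc" };
        details.Shuffle(new Random()); // no explicit <string> needed
        Console.WriteLine(string.Join(",", details)); // order differs from run to run
    }
}
```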
I took your code and threw it into VS. The code below executes okay with a few small modifications:
using System;
using System.Collections.Generic;
using System.Linq;
namespace MsgBaseSerializeationTest
{
class StackOverflow
{
public void Test()
{
var noteDetail = new NoteDetail<string>();
noteDetail.NoteDetails.Add("aaa");
noteDetail.NoteDetails.Add("bbb");
noteDetail.NoteDetails.Add("ccc");
// Shuffle the populated inner list; noteDetail itself is an empty List<string>
NoteDetail<string>.ShuffleGenericList(noteDetail.NoteDetails);
}
}
public class NoteDetail<T> : List<T>
{
public NoteDetail()
{
_noteDetails = new List<string>();
}
public IList<string> NoteDetails { get { return _noteDetails; } }
private readonly List<string> _noteDetails;
public static void ShuffleGenericList(IList<T> list)
{
//generate a Random instance
var rnd = new Random();
//get the count of items in the list
var i = list.Count();
//do we have a reference type or a value type
T val = default(T);
//we will loop through the list backwards
while (i >= 1) {
//decrement our counter
i--;
//grab the next random item from the list
var nextIndex = rnd.Next(i, list.Count());
val = list[nextIndex];
//start swapping values
list[nextIndex] = list[i];
list[i] = val;
}
}
}
}
