Unreliable parallel loop fails 4 out of 400 times - c#

I have a Parallel.ForEach loop that creates a new instance of a class, which manipulates a picture and saves it to disk.
However, approximately 4 times out of 400 the picture gets saved to disk without being manipulated. My theory is that when this happens, some of the properties in my class are null when they are not supposed to be.
The 4 (sometimes 3) errors mostly occur in the first 10 images of the parallel loop.
There is no error message; it just skips some of my code for some reason. My breakpoints don't work when the loop runs in parallel, so it is hard to debug.
Any advice on how to proceed, debug, or fix this?
The code as requested
private static void GenerateIcons(Effects effect)
{
    DirectoryInfo dir = new DirectoryInfo(HttpContext.Current.Server.MapPath(@"~\Icons\Original\"));
    FileInfo[] ff = dir.GetFiles();
    string mappath = HttpContext.Current.Server.MapPath(@"~\Icons\");
    List<string> paths = new List<string>();
    string ids = GetAllEffectIds(effect.TinyUrlCode);
    Parallel.ForEach(ff, item =>
    {
        if (!File.Exists(mappath + @"Generated\" + ids + "-" + item.Name))
        {
            paths.Add(mappath + @"Generated\" + ids + "-" + item.Name);
            ApplyEffects f = new ApplyEffects(effect, item.Name, mappath);
            f.SaveIcon();
        }
    });
    //Zip icons!
    ZipFiles(paths, effect.TinyUrlCode, ids, effect.General.Prefix);
}

You can rewrite it in a more functional style, which removes the shared list and hopefully the threading issues with it:
private static void GenerateIcons(Effects effect)
{
    var dir = new DirectoryInfo(HttpContext.Current.Server.MapPath(@"~\Icons\Original\"));
    var mappath = HttpContext.Current.Server.MapPath(@"~\Icons\");
    var ids = GetAllEffectIds(effect.TinyUrlCode);
    // Work out the target path of every icon up front and keep only the missing ones.
    var filesToProcess = dir
        .EnumerateFiles()
        .AsParallel()
        .Select(f => new { info = f, path = mappath + @"Generated\" + ids + "-" + f.Name })
        .Where(f => !File.Exists(f.path))
        .ToList();
    Parallel.ForEach(filesToProcess, file =>
    {
        new ApplyEffects(effect, file.info.Name, mappath).SaveIcon();
    });
    //Zip icons!
    // ZipFiles receives the generated paths, as in the question.
    ZipFiles(filesToProcess.Select(f => f.path).ToList(), effect.TinyUrlCode, ids, effect.General.Prefix);
}

My theory is that your list of paths is not being updated properly due to List<T> not being thread safe. Essentially if two threads try and add an item to the list at the same time any number of weird things could happen, like 4 items missing from the resulting list. Try using the lock statement.
Parallel.ForEach(ff, item =>
{
    if (!File.Exists(mappath + @"Generated\" + ids + "-" + item.Name))
    {
        lock (paths)
        {
            paths.Add(mappath + @"Generated\" + ids + "-" + item.Name);
        }
        ApplyEffects f = new ApplyEffects(effect, item.Name, mappath);
        f.SaveIcon();
    }
});

1) Have you checked with the non-parallel version?
2) Are you using any API functions that are not marked as thread safe?
To answer 1), mutex-lock the entire function and test for errors.
To answer 2), reduce the amount of code inside the mutex until you find the offending function. You can do this by bisection.
ConcurrentBag<T> is a thread-safe container.
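As a minimal sketch of the ConcurrentBag<T> suggestion: the shared List<string> can be swapped for a thread-safe collection so that no lock is needed around Add. The CollectPaths helper, the file names, and the "Generated\" prefix below are hypothetical stand-ins, not the asker's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentBagDemo
{
    // Collects one output path per input name from many threads at once.
    // The "Generated\" prefix is only illustrative.
    public static List<string> CollectPaths(IEnumerable<string> names)
    {
        var paths = new ConcurrentBag<string>();
        Parallel.ForEach(names, name =>
        {
            // ConcurrentBag<T>.Add is safe to call from several threads.
            paths.Add(@"Generated\" + name);
        });
        // The bag is unordered, so sort if the order matters.
        return paths.OrderBy(p => p).ToList();
    }

    public static void Main()
    {
        var names = Enumerable.Range(0, 400).Select(i => "icon" + i + ".png");
        Console.WriteLine(CollectPaths(names).Count); // prints 400
    }
}
```

This only protects the collection itself. If ApplyEffects or any API it calls shares state between threads, that still has to be checked separately.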


LINQ results change at end of for loop

When performing a set of LINQ queries against a data-source (I'm using LINQ-to-SQL, but it happens here too using just a List<string> object), I end up getting a different result at the end of my checks.
Specifically, the code below is trying to find whether a Fully Qualified Domain Name (FQDN) exists in a list of host names (not all of which will be FQDNs or in the same domain, but the host identifier is what matters to me). The search is trying to find whether "host-6.domain.local" or any of its sub-components (i.e., "host-6.domain" or "host-6") exist in the list, which they do not. While inside the for loop, we get the results we expect, but as soon as the for loop is finished, I get a result that has all of the contents of the list, which to me sounds like it is trying to find elements that match the empty string.
void MyMethod()
{
    string fqdn = "host-6.domain.local";
    string[] splitFqdn = fqdn.Split('.');
    List<string> values = new List<string>();
    values.Add("host-1");
    values.Add("host-2.domain.local");
    values.Add("host-3.domain.local");
    values.Add("host-4");
    values.Add("host-5.other.local");
    IEnumerable<string> result = null;
    for (int i = splitFqdn.Length; i > 0; i--)
    {
        result =
            from value in values
            where value.StartsWith(
                string.Join(".", splitFqdn.Take(i)))
            select value;
        Console.WriteLine(
            "Inside for loop, take " + i + ": " + result.Count());
    }
    Console.WriteLine();
    Console.WriteLine(
        "Outside for loop: " + result.Count());
}
Why is this happening and how can I get accurate results that I can still access after the for loop is finished?
You are getting bitten by LINQ's lazy execution and closure.
When you create an enumerable like you are doing here...
result =
from value in values
where value.StartsWith(
string.Join(".", splitFqdn.Take(i)))
select value;
It doesn't get evaluated until you do something that forces evaluation, for instance when you call result.Count().
Later, outside of your loop, result.Count() evaluates the query again, this time with the final value of i. By then i is 0, so splitFqdn.Take(0) is empty, string.Join returns the empty string, and StartsWith("") is true for every value in the list.
Try forcing evaluation by calling .ToList() on your enumerable, like so. This code shows both values so you can compare.
void MyMethod()
{
    string fqdn = "host-6.domain.local";
    string[] splitFqdn = fqdn.Split('.');
    List<string> values = new List<string>();
    values.Add("host-1");
    values.Add("host-2.domain.local");
    values.Add("host-3.domain.local");
    values.Add("host-4");
    values.Add("host-5.other.local");
    IEnumerable<string> queryResult = null;
    List<string> correctResult = null;
    for (int i = splitFqdn.Length; i > 0; i--)
    {
        queryResult =
            from value in values
            where value.StartsWith(
                string.Join(".", splitFqdn.Take(i)))
            select value;
        // ToList() runs the query now, while i still has the intended value.
        correctResult = queryResult.ToList();
        Console.WriteLine(
            "Inside for loop, take " + i + ": " + queryResult.Count());
    }
    Console.WriteLine();
    Console.WriteLine(
        "Outside for loop queryResult: " + queryResult.Count());
    Console.WriteLine(
        "Outside for loop correctResult: " + correctResult.Count());
}
EDIT: Thanks nlips for pointing out that I hadn't fully answered the question... and apologies for converting to method syntax but it would have taken longer to convert to query syntax.
void MyMethod()
{
string fqdn = "host-6.domain.local";
string[] splitFqdn = fqdn.Split('.');
List<string> values = new List<string>();
values.Add("host-1");
values.Add("host-2.domain.local");
values.Add("host-3.domain.local");
values.Add("host-4");
values.Add("host-5.other.local");
values.Add("host-5.other.local");
IEnumerable<string> queryResult = null;
List<string> correctResult = new List<string>();
for (int i = splitFqdn.Length; i > 0; i--)
{
correctResult = correctResult
.Union(values.Where(
value => value.StartsWith(string.Join(".", splitFqdn.Take(i)))))
.ToList();
}
}
I really like Kevin's answer to my question, but I wasn't a huge fan of calling .ToList() on the result. That pulls every matching object from the database (eating up more memory), whereas a query that only gets the count of matching objects is a little faster and needs no memory to store the objects. Using the information from his post, I have this additional solution, which doesn't pull all objects from the database and only runs a COUNT query (in the SQL sense).
To avoid the issue caused by capturing i, which then becomes 0 at the end of the for loop, I simply set up a temporary variable to hold the value I'm searching for.
void MyMethod()
{
    string fqdn = "host-6.domain.local";
    string[] splitFqdn = fqdn.Split('.');
    List<string> values = new List<string>();
    values.Add("host-1");
    values.Add("host-2.domain.local");
    values.Add("host-3.domain.local");
    values.Add("host-4");
    values.Add("host-5.other.local");
    IEnumerable<string> result = null;
    for (int i = splitFqdn.Length; i > 0; i--)
    {
        //taking the line referencing i out of the
        //query expression prevents referencing i
        //after it is set to 0 outside the for loop
        string temp = string.Join(".", splitFqdn.Take(i));
        //since temp isn't changed anywhere else, it won't
        //get set to an invalid value after the loop exits
        result =
            from value in values
            where value.StartsWith(temp)
            select value;
        Console.WriteLine(
            "Inside for loop, take " + i + ": " + result.Count());
    }
    Console.WriteLine();
    Console.WriteLine(
        "Outside for loop: " + result.Count());
}
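The difference between capturing i and copying it into a per-iteration variable can be reproduced in isolation. The following is a small sketch with a shortened, made-up host list; the DeferredDemo class and its Run helper are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns { count of the query that captured i, count of the query that used a copy }.
    public static int[] Run()
    {
        var values = new List<string> { "host-1", "host-2.domain.local", "host-4" };
        string[] parts = "host-6.domain.local".Split('.');

        IEnumerable<string> lazy = null;
        IEnumerable<string> copied = null;
        for (int i = parts.Length; i > 0; i--)
        {
            // The lambda captures the loop variable i itself, not its current value.
            lazy = values.Where(v => v.StartsWith(string.Join(".", parts.Take(i))));

            // prefix is a fresh variable on every iteration, so each query keeps its own value.
            string prefix = string.Join(".", parts.Take(i));
            copied = values.Where(v => v.StartsWith(prefix));
        }

        // By now i is 0: the lazy query joins zero parts into "" and
        // StartsWith("") is true for every string.
        return new[] { lazy.Count(), copied.Count() };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("captured i: " + counts[0]);    // 3
        Console.WriteLine("copied prefix: " + counts[1]); // 0
    }
}
```

Both queries stay lazy, so against a database provider the second form should still let Count() be translated into a count query rather than materialising rows.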
I think you need to call ToList when assigning to the result variable like this:
result =
(from value in values
where value.StartsWith(
string.Join(".", splitFqdn.Take(i)))
select value).ToList();

Perform a linq expression for 'contains' with searching through a list for 'like' not exact matches

Okay, so I am stumped. I have looked around for this, and I know I am making the implementation of something very simple more complex than it needs to be. Basically I have a POCO object with a member that contains a string of other members. This is labeled 'str', and it may hold a comma-separated series in one string. Thus one object may have the string 'images, reports' and another 'cms, crm'. I have a list of values that I want to match against PART OF those strings, not necessarily all of them, and get back a DISTINCT LIST. So a value of 'cms' should return anything that contains 'cms'; thus 'cms, crm' would be returned.
I want to hook this up so a generic List can be queried, but I cannot get it to work. I was looking at other threads, but their methods do not work in my case. I keep thinking it is something simple but I am missing it completely. Please let me know if anyone has better ideas. I was looking here but could not get the logic to apply correctly:
Linq query list contains a list
I keep trying methods of 'Select', 'SelectMany', 'Contains', 'Any', 'All' at different levels of scope of the continuations to no avail. Here is a simple excerpt of where I am at with a simple Console app example:
public class Program
{
public class StringModel
{
public string name { get; set; }
public string str { get; set; }
}
static void Main(string[] args)
{
string s = "";
List<StringModel> sm = new List<StringModel>
{
new StringModel
{
name = "Set1",
str = "images, reports"
},
new StringModel
{
name = "Set2",
str = "cms, crm"
},
new StringModel
{
name = "Set3",
str = "holiday, pto, cms"
}
};
sm.ForEach(x => s += x.name + "\t" + x.str + "\n");
var selected = new List<object> {"cms", "crm"};
s += "\n\nITEMS TO SELECT: \n\n";
selected.ForEach(x => s += x + "\n");
s += "\n\nSELECTED ITEMS: \n\n";
// works on a single item just fine
var result = sm.Where(p => p.str.Contains("cms")).Select(x => new { x.name, x.str}).ToList();
// I am not using select to get POCO on other methods till I can get base logic to work.
// Does not return anything
var result2 = sm.Where(p => selected.Any(x => x == p.str)).ToList();
// Does not return anything
var result3 = sm.Where(p => selected.Any(x => selected.Contains(p.str))).ToList();
result.ForEach(y => s += y + "\n");
s += "\n\n2nd SET SELECTED: \n\n";
result2.ForEach(y => s += y + "\n");
s += "\n\n3rd SET SELECTED: \n\n";
result3.ForEach(y => s += y + "\n");
Console.WriteLine(s);
Console.ReadLine();
}
}
result2 is empty because you're comparing an object (x) with a string (StringModel.str). This will be a reference comparison. Even if you convert x to a string, you'll be comparing each value in selected ("cms", "crm") with your comma-separated string values ("images, reports", "cms, crm", "holiday, pto, cms").
result3 is empty because selected ("cms", "crm") does not contain any of the string values ("images, reports", "cms, crm", "holiday, pto, cms"), although in this case at least the comparisons are value comparisons.
I think you're looking for something like:
var result = sm.Where(p => selected.Any(x => p.str.Contains((string)x)));
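For reference, here is that query as a small self-contained program using the sample data from the question. It assumes selected is declared as List<string> instead of List<object>, which removes the cast; the ContainsAnyDemo class and Match helper are names made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsAnyDemo
{
    public class StringModel
    {
        public string name { get; set; }
        public string str { get; set; }
    }

    // Returns the names of all models whose str contains at least one selected term.
    // Where yields each model at most once, even when several terms match it.
    public static List<string> Match(List<StringModel> models, List<string> selected)
    {
        return models
            .Where(m => selected.Any(term => m.str.Contains(term)))
            .Select(m => m.name)
            .ToList();
    }

    public static void Main()
    {
        var sm = new List<StringModel>
        {
            new StringModel { name = "Set1", str = "images, reports" },
            new StringModel { name = "Set2", str = "cms, crm" },
            new StringModel { name = "Set3", str = "holiday, pto, cms" }
        };
        var selected = new List<string> { "cms", "crm" };
        Console.WriteLine(string.Join(", ", Match(sm, selected))); // Set2, Set3
    }
}
```

Note that Contains is a substring test, so "cms" would also match an entry such as "cmsadmin". If whole entries must match, splitting str on ',' and trimming each part before comparing is one option.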

Adding to a Session ASP.NET C#

I would like to store more than one set of results in Session["i"].
Currently it only shows one set of results,
e.g. School1 11/10/2011 14/11/2011 GCSE AAA
I would like to add more sets, but they do not seem to be getting stored in the Session,
e.g.
School1 11/10/2011 14/11/2011 GCSE AAA
School2 11/10/2012 14/11/2012 ALevels AAA
Education addResults = new Education(schoolName, fromDate, toDate, qualification, grades);
Session["i"] = addResults;
//schoolarraylist.Add(addResults);
foreach (Education currentschool in schoolarraylist)
{
    Session["i"] = currentschool.Schoollocation + "," + currentschool.Datefrom + "," + currentschool.Dateto + "," + currentschool.Qualifications + "," + currentschool.Grade + "<br />";
    string tmp = Session["i"].ToString();
    string[] sb = tmp.Split(',');
    string[] ii = new string[sb.GetUpperBound(0) + 1];
    for (int i = 0; i <= sb.GetUpperBound(0); i++)
    {
        ii[i] = sb[i];
    }
    foreach (string j in ii)
    {
        Response.Write(j);
    }
}
You can assign a list of objects to the session and get it back later. But you should not put data in the session without need. Sessions are maintained on the server side for each user, so data stored in the session takes up server memory and could degrade the performance of the application. It's worth reading about sessions before using them.
List<string> lst = new List<string>();
Session["i"] = lst;
Getting list back from session object.
List<string> lst = (List<string>)Session["i"];
The problem is that you assign something to Session["i"], and when you try to add something to the session, you actually overwrite your previous value. In order to add objects to the Session, you either have to choose another name, e.g. Session["j"], or wrap some sort of container around your objects (List, Array, Dictionary, etc.) and store that container in your Session.
Also try to find better names for your session keys. If you look at your code at a later point, you probably won't know what Session["i"] is actually supposed to be.
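A rough sketch of the container approach follows. So that it runs outside ASP.NET, a Dictionary<string, object> stands in for the real Session object (both are indexed by string and store object); the "Educations" key and the AddEducation helper are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class SessionListDemo
{
    // Stand-in for the ASP.NET Session, used only to keep the sketch runnable.
    static readonly Dictionary<string, object> Session = new Dictionary<string, object>();

    // Appends an entry to the list stored under "Educations", creating the list
    // on first use, and returns the number of entries stored so far.
    public static int AddEducation(string entry)
    {
        object stored;
        Session.TryGetValue("Educations", out stored);
        var list = stored as List<string>;
        if (list == null)
        {
            list = new List<string>();
            Session["Educations"] = list;
        }
        list.Add(entry); // adds to the existing list instead of overwriting it
        return list.Count;
    }

    public static void Main()
    {
        AddEducation("School1,11/10/2011,14/11/2011,GCSE,AAA");
        AddEducation("School2,11/10/2012,14/11/2012,ALevels,AAA");
        foreach (string entry in (List<string>)Session["Educations"])
        {
            Console.WriteLine(entry);
        }
    }
}
```

In a real page the same pattern would read Session["Educations"], cast it with `as List<string>`, and create the list when the result is null.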
You can also use an ArrayList:
ArrayList list = new ArrayList();
foreach (Education currentschool in schoolarraylist)
{
    list.Add(currentschool.Schoollocation + "," + currentschool.Datefrom + "," + currentschool.Dateto + "," + currentschool.Qualifications + "," + currentschool.Grade);
}
Then loop through the ArrayList and display the entries in whatever format you want.

Rewriting foreach using IObservable and Reactive Framework

I'm in VS2008 with Entity Framework. I'm accessing objects from the database using esql for WHERE IN functionality. I'm passing a ton of IDs to the select statement so I chunk it up into sets of 800. Then I merge the results together from each chunk. My goal is to obtain results for each chunk in parallel, rather than waiting synchronously. I installed Reactive Framework and am pretty sure I need to make use of ForkJoin. However, I can't figure out how to convert this function to use it. Here's my existing code:
public static IList<TElement> SelectWhereIn<TElement, TValue>(this ObjectContext context, string fieldName, IList<TValue> idList)
{
var chunkedIds = idList.Split(CHUNK_SIZE);
string entitySetName = typeof(TElement).Name + "Set";
var retList = new List<TElement>();
foreach (var idChunk in chunkedIds)
{
string delimChunk = string.Join(",", idChunk.Select(x => x.ToString()).ToArray());
ObjectQuery<TElement> query = context.CreateQuery<TElement>("SELECT VALUE x FROM " + entitySetName + " AS x");
query = query.Where("it." + fieldName + " IN {" + delimChunk + "}");
retList.AddRange(query);
}
return retList;
}
Thanks!
EDIT >>>
I modified the code to use the Poor Man's Parallel.ForEach, as below:
public static IList<TElement> SelectWhereIn<TElement, TValue>(this ObjectContext context, string fieldName, IList<TValue> idList)
{
var chunkedIds = idList.Split(CHUNK_SIZE);
string entitySetName = typeof(TElement).Name + "Set";
var chunkLists = new List<IEnumerable<TElement>>();
Parallel.ForEach(chunkedIds, idChunk =>
{
string delimChunk = string.Join(",", idChunk.Select(x => x.ToString()).ToArray());
ObjectQuery<TElement> query = context.CreateQuery<TElement>("SELECT VALUE x FROM " + entitySetName + " AS x");
query = query.Where("it." + fieldName + " IN {" + delimChunk + "}");
chunkLists.Add(query.ToList());
});
var retList = new List<TElement>();
foreach (var chunkList in chunkLists)
{
retList.AddRange(chunkList);
}
return retList;
}
It worked great the first time. But the second time I ran it, I got this error:
The connection was not closed. The connection's current state is connecting.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.InvalidOperationException: The connection was not closed. The connection's current state is connecting.
Source Error:
Line 49: foreach (var iAsyncResult in resultList)
Line 50: {
Line 51: del.EndInvoke(iAsyncResult);
Line 52: iAsyncResult.AsyncWaitHandle.Close();
Line 53: }
It's interesting, because Emre (the author of the library) has an edit to his original post talking about how he added those lines of code for added safety. Am I using it right? Or was his v1 safer after all?
VS2010 does have that with PLINQ. Using the extensions AsParallel().WithDegreeOfParallelism(nbProcessors) would do what you need.
With VS2008, I've used Poor Man's Parallel.ForEach Iterator by Emre Aydinceren in the past when I was trying to work around a performance bottleneck, try to give it a shot.
EDIT: In reaction to the error you added: this may be a shot in the dark, but have you tried a separate context for each thread? Like so:
Parallel.ForEach(chunkedIds, idChunk =>
{
    // One context per iteration, so no connection is shared between threads.
    // How MyContext is constructed (with or without a connection string)
    // depends on your configuration.
    using (ObjectContext context = new MyContext(connStr))
    {
        string delimChunk = string.Join(",", idChunk.Select(x => x.ToString()).ToArray());
        ObjectQuery<TElement> query = context.CreateQuery<TElement>("SELECT VALUE x FROM " + entitySetName + " AS x");
        query = query.Where("it." + fieldName + " IN {" + delimChunk + "}");
        var chunk = query.ToList();
        // List<T> is not thread safe, so guard the shared list.
        lock (chunkLists)
        {
            chunkLists.Add(chunk);
        }
    }
});
You might have to tweak a few things (for example, take the connection string from the existing context to instantiate the new contexts).

Get list of titles from xml files

I am trying to get the titles of XML files from a folder called "bugs".
My code:
public virtual List<IBug> FillBugs()
{
string folder = xmlStorageLocation + "bugs" + Path.DirectorySeparatorChar;
List<IBug> bugs = new List<IBug>();
foreach (string file in Directory.GetFiles(folder, "*.xml", SearchOption.TopDirectoryOnly))
{
var q = from b in bugs
select new IBug
{
Title = b.Title,
Id = b.Id,
};
return q.ToList();
}
return bugs;
}
But I'm not getting the titles from all the XML files in the folder "bugs".
The biggest problem is getting each file into a single string and not a string[].
Your code as written doesn't make any sense. Perhaps you meant something more like this:
public virtual List<IBug> FillBugs()
{
// is this actually correct or did you mix up the concatenation order?
// either way, I suggest Path.Combine() instead
string folder = xmlStorageLocation + "bugs" + Path.DirectorySeparatorChar;
List<IBug> bugs = new List<IBug>();
foreach (string file in Directory.GetFiles(folder, "*.xml",
SearchOption.TopDirectoryOnly))
{
// i guess IBug is not actually an interface even though it starts
// with "I" since you made one in your code
bugs.Add(new IBug {
Title = file, Id = 0 /* don't know where you get an ID */ });
}
return bugs;
}
"from b in bugs" selects from an empty list. you need to initialize bugs from the file at the start of your foreach loop
Do you need a backslash (Path.DirectorySeparatorChar) between xmlStorageLocation and "bugs"?
You don't use file anywhere in your loop. Is that intentional, or did you forget to add it to the collection?
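Putting these comments together, one possible shape for the loop is sketched below. The question does not show the XML layout, so the <Title> child of the root element is purely an assumption, as are the BugTitlesDemo class and ReadTitles helper; adjust the element name to the real files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public static class BugTitlesDemo
{
    // Reads the <Title> child of the root element of every *.xml file in folder.
    // The <Title> element name is an assumption about the file layout.
    public static List<string> ReadTitles(string folder)
    {
        var titles = new List<string>();
        foreach (string file in Directory.GetFiles(folder, "*.xml", SearchOption.TopDirectoryOnly))
        {
            XElement title = XDocument.Load(file).Root.Element("Title");
            // Fall back to the file name when the element is missing.
            titles.Add(title != null ? title.Value : Path.GetFileNameWithoutExtension(file));
        }
        return titles;
    }

    public static void Main()
    {
        // Build a throwaway folder with one sample file to demonstrate the call.
        string folder = Path.Combine(Path.GetTempPath(), "bugs-demo-" + Guid.NewGuid());
        Directory.CreateDirectory(folder);
        File.WriteAllText(Path.Combine(folder, "1.xml"),
            "<Bug><Id>1</Id><Title>Crash on save</Title></Bug>");
        Console.WriteLine(string.Join(", ", ReadTitles(folder)));
        Directory.Delete(folder, true);
    }
}
```

Each file is loaded inside the loop, which is where the string-per-file that the question asks about comes from: Directory.GetFiles returns a string[] of paths, and the foreach hands them over one at a time.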
