Looking for a better pattern for a visitor-IAsyncEnumerable? - c#

Consider this snippet (much simplified than the original code):
async IAsyncEnumerable<(DateTime, double)> GetSamplesAsync()
{
// ...
var cbpool = new CallbackPool(
HandleBool: (dt, b) => { },
HandleDouble: (dt, d) =>
{
yield return (dt, d); //not allowed
});
while (await cursor.MoveNextAsync(token))
{
this.Parse(cursor.Current, cbpool);
}
}
private record CallbackPool(
Action<DateTime, bool> HandleBool,
Action<DateTime, double> HandleDouble
);
Then, the below Parse is just a behavior-equivalent of the original.
Random _rnd = new Random();
void Parse(object cursor, CallbackPool cbpool)
{
double d = this._rnd.NextDouble(); //d = [0..1)
if (d >= 0.5)
{
cbpool.HandleDouble(new DateTime(), d);
}
else if (d >= 0.25)
{
cbpool.HandleBool(new DateTime(), d >= 0.4);
}
}
However, I do like the GetSamplesAsync code, but the compiler does not: the yield cannot be used within a lambda.
So, I changed the function as follows, although it became much less readable (and also error-prone):
async IAsyncEnumerable<(DateTime, double)> GetSamplesAsync()
{
// ...
(DateTime, double) pair = default;
bool flag = false;
var cbpool = new CallbackPool(
HandleBool: (dt, b) => { },
HandleDouble: (dt, d) =>
{
pair = (dt, d);
flag = true;
});
while (await cursor.MoveNextAsync(token))
{
this.Parse(cursor.Current, cbpool);
if (flag)
{
yield return pair;
}
flag = false;
}
}
I wonder if there is a better way to solve this kind of pattern.

The external flag/pair is pretty dangerous and unnecessary (and it forces a closure); it seems like this bool could be returned from the Parse method, for example:
await foreach (var item in cursor)
{
if (Parse(item, cbpool, out var result))
yield return result;
}
(everything could also be returned via a value-tuple if you don't like the out)

Related

Make using statement usable for multiple disposable objects

I have a bunch of text files in a folder, and all of them should have identical headers. In other words the first 100 lines of all files should be identical. So I wrote a function to check this condition:
private static bool CheckHeaders(string folderPath, int headersCount)
{
var enumerators = Directory.EnumerateFiles(folderPath)
.Select(f => File.ReadLines(f).GetEnumerator())
.ToArray();
//using (enumerators)
//{
for (int i = 0; i < headersCount; i++)
{
foreach (var e in enumerators)
{
if (!e.MoveNext()) return false;
}
var values = enumerators.Select(e => e.Current);
if (values.Distinct().Count() > 1) return false;
}
return true;
//}
}
The reason I am using enumerators is memory efficiency. Instead of loading all file contents in memory I enumerate the files concurrently line-by-line until a mismatch is found, or all headers have been examined.
My problem is evident by the commented lines of code. I would like to utilize a using block to safely dispose all the enumerators, but unfortunately using (enumerators) doesn't compile. Apparently using can handle only a single disposable object. I know that I can dispose the enumerators manually, by wrapping the whole thing in a try-finally block, and running the disposing logic in a loop inside finally, but is seems awkward. Is there any mechanism I could employ to make the using statement a viable option in this case?
Update
I just realized that my function has a serious flaw. The construction of the enumerators is not robust. A locked file can cause an exception, while some enumerators have already been created. These enumerators will not be disposed. This is something I want to fix. I am thinking about something like this:
var enumerators = Directory.EnumerateFiles(folderPath)
.ToDisposables(f => File.ReadLines(f).GetEnumerator());
The extension method ToDisposables should ensure that in case of an exception no disposables are left undisposed.
You can create a disposable-wrapper over your enumerators:
class DisposableEnumerable : IDisposable
{
private IEnumerable<IDisposable> items;
public event UnhandledExceptionEventHandler DisposalFailed;
public DisposableEnumerable(IEnumerable<IDisposable> items) => this.items = items;
public void Dispose()
{
foreach (var item in items)
{
try
{
item.Dispose();
}
catch (Exception e)
{
var tmp = DisposalFailed;
tmp?.Invoke(this, new UnhandledExceptionEventArgs(e, false));
}
}
}
}
and use it with the lowest impact to your code:
private static bool CheckHeaders(string folderPath, int headersCount)
{
var enumerators = Directory.EnumerateFiles(folderPath)
.Select(f => File.ReadLines(f).GetEnumerator())
.ToArray();
using (var disposable = new DisposableEnumerable(enumerators))
{
for (int i = 0; i < headersCount; i++)
{
foreach (var e in enumerators)
{
if (!e.MoveNext()) return false;
}
var values = enumerators.Select(e => e.Current);
if (values.Distinct().Count() > 1) return false;
}
return true;
}
}
The thing is you have to dispose those objects separately one by one anyway. But it's up to you where to encapsulate that logic. And the code I've suggested has no manual try-finally,)
To the second part of the question. If I get you right this should be sufficient:
static class DisposableHelper
{
public static IEnumerable<TResult> ToDisposable<TSource, TResult>(this IEnumerable<TSource> source,
Func<TSource, TResult> selector) where TResult : IDisposable
{
var exceptions = new List<Exception>();
var result = new List<TResult>();
foreach (var i in source)
{
try { result.Add(selector(i)); }
catch (Exception e) { exceptions.Add(e); }
}
if (exceptions.Count == 0)
return result;
foreach (var i in result)
{
try { i.Dispose(); }
catch (Exception e) { exceptions.Add(e); }
}
throw new AggregateException(exceptions);
}
}
Usage:
private static bool CheckHeaders(string folderPath, int headersCount)
{
var enumerators = Directory.EnumerateFiles(folderPath)
.ToDisposable(f => File.ReadLines(f).GetEnumerator())
.ToArray();
using (new DisposableEnumerable(enumerators))
{
for (int i = 0; i < headersCount; i++)
{
foreach (var e in enumerators)
{
if (!e.MoveNext()) return false;
}
var values = enumerators.Select(e => e.Current);
if (values.Distinct().Count() > 1) return false;
}
return true;
}
}
and
try
{
CheckHeaders(folderPath, headersCount);
}
catch(AggregateException e)
{
// Prompt to fix errors and try again
}
I'm going to suggest an approach that uses recursive calls to Zip to allow parallel enumeration of a normal IEnumerable<string> without the need to resort to using IEnumerator<string>.
bool Zipper(IEnumerable<IEnumerable<string>> sources, int take)
{
IEnumerable<string> ZipperImpl(IEnumerable<IEnumerable<string>> ss)
=> (!ss.Skip(1).Any())
? ss.First().Take(take)
: ss.First().Take(take).Zip(
ZipperImpl(ss.Skip(1)),
(x, y) => (x == null || y == null || x != y) ? null : x);
var matching_lines = ZipperImpl(sources).TakeWhile(x => x != null).ToArray();
return matching_lines.Length == take;
}
Now build up your enumerables:
IEnumerable<string>[] enumerables =
Directory
.EnumerateFiles(folderPath)
.Select(f => File.ReadLines(f))
.ToArray();
Now it's simple to call:
bool headers_match = Zipper(enumerables, 100);
Here's a trace of running this code against three files with more than 4 lines:
Ben Petering at 5:28 PM ACST
Ben Petering at 5:28 PM ACST
Ben Petering at 5:28 PM ACST
From a call 2019-05-23, James mentioned he’d like the ability to edit the current shipping price rules (eg in shipping_rules.xml) via the admin.
From a call 2019-05-23, James mentioned he’d like the ability to edit the current shipping price rules (eg in shipping_rules.xml) via the admin.
From a call 2019-05-23, James mentioned he’d like the ability to edit the current shipping price rules (eg in shipping_rules.xml) via the admin.
He also mentioned he’d like to be able to set different shipping price rules for a given time window, e.g. Jan 1 to Jan 30.
He also mentioned he’d like to be able to set different shipping price rules for a given time window, e.g. Jan 1 to Jan 30.
He also mentioned he’d like to be able to set different shipping price rules for a given time window, e.g. Jan 1 to Jan 30.
These storyishes should be considered when choosing the appropriate module to use.
These storyishes should be considered when choosing the appropriate module to use.X
These storyishes should be considered when choosing the appropriate module to use.
Note that the enumerations stop when they encountered a mismatch header in the 4th line on the second file. All enumerations then stopped.
Creating an IDisposable wrapper as #Alex suggested is correct. It needs just a logic to dispose already opened files if some of them is locked and probably some logic for error states. Maybe something like this (error state logic is very simple):
public class HeaderChecker : IDisposable
{
private readonly string _folderPath;
private readonly int _headersCount;
private string _lockedFile;
private readonly List<IEnumerator<string>> _files = new List<IEnumerator<string>>();
public HeaderChecker(string folderPath, int headersCount)
{
_folderPath = folderPath;
_headersCount = headersCount;
}
public string LockedFile => _lockedFile;
public bool CheckFiles()
{
_lockedFile = null;
if (!TryOpenFiles())
{
return false;
}
if (_files.Count == 0)
{
return true; // Not sure what to return here.
}
for (int i = 0; i < _headersCount; i++)
{
if (!_files[0].MoveNext()) return false;
string currentLine = _files[0].Current;
for (int fileIndex = 1; fileIndex < _files.Count; fileIndex++)
{
if (!_files[fileIndex].MoveNext()) return false;
if (_files[fileIndex].Current != currentLine) return false;
}
}
return true;
}
private bool TryOpenFiles()
{
bool result = true;
foreach (string file in Directory.EnumerateFiles(_folderPath))
{
try
{
_files.Add(File.ReadLines(file).GetEnumerator());
}
catch
{
_lockedFile = file;
result = false;
break;
}
}
if (!result)
{
DisposeCore(); // Close already opened files.
}
return result;
}
private void DisposeCore()
{
foreach (var item in _files)
{
try
{
item.Dispose();
}
catch
{
}
}
_files.Clear();
}
public void Dispose()
{
DisposeCore();
}
}
// Usage
using (var checker = new HeaderChecker(folderPath, headersCount))
{
if (!checker.CheckFiles())
{
if (checker.LockedFile is null)
{
// Error while opening files.
}
else
{
// Headers do not match.
}
}
}
I also removed .Select() and .Distinct() when checking the lines. The first just iterates over the enumerators array - the same as foreach above it, so you are enumerating this array twice. Then creates a new list of lines and .Distinct() enumerates over it.

How to use IEnumerable.GroupBy comparing multiple properties between elements?

How do I group "adjacent" Sites:
Given data:
List<Site> sites = new List<Site> {
new Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
new Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m },
new Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m },
new Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m },
new Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
new Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m },
};
I want result:
[
[
Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m }
],
[
Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m }
],
[
Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m }
],
[
Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m }
]
]
I tried using GroupBy with a custom comparer function checking routeIds match and first site's end milepost is equal to the next sites start milepost. My HashKey function just checks out routeId so all sites within a route will get binned together but I think the comparer makes an assumption like if A = B, and B = C, then A = C, so C won't get grouped with A,B,C since in my adjacency case, A will not equal C.
First, let Site class be (for debugging / demonstration)
public class Site {
public Site() { }
public string RouteId;
public Decimal StartMilepost;
public Decimal EndMilepost;
public override string ToString() => $"{RouteId} {StartMilepost}..{EndMilepost}";
}
Well, as you can see we have to break the rules: equality must be transitive, i.e. whenever
A equals B
B equals C
then
A equals C
It's not the case in your example. However, if we sort the sites by StartMilepost we, technically, can implement IEqualityComparer<Site> like this:
public class MySiteEqualityComparer : IEqualityComparer<Site> {
public bool Equals(Site x, Site y) {
if (ReferenceEquals(x, y))
return true;
else if (null == x || null == y)
return false;
else if (x.RouteId != y.RouteId)
return false;
else if (x.StartMilepost <= y.StartMilepost && x.EndMilepost >= y.StartMilepost)
return true;
else if (y.StartMilepost <= x.StartMilepost && y.EndMilepost >= x.StartMilepost)
return true;
return false;
}
public int GetHashCode(Site obj) {
return obj == null
? 0
: obj.RouteId == null
? 0
: obj.RouteId.GetHashCode();
}
}
then GroupBy as usual; please, note that OrderBy is required, since order of comparision matters here. Suppose we have
A = {RouteId="X", StartMilepost=0.00m, EndMilepost=1.00m}
B = {RouteId="X", StartMilepost=1.00m, EndMilepost=2.00m}
C = {RouteId="X", StartMilepost=2.00m, EndMilepost=3.00m}
Here A == B, B == C (so in case of A, B, C all items will be in the same group) but A != C (and thus in A, C, B will end up with 3 groups)
Code:
List<Site> sites = new List<Site> {
new Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
new Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m },
new Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m },
new Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m },
new Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
new Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m },
};
var result = sites
.GroupBy(item => item.RouteId)
.Select(group => group
// Required Here, since MySiteEqualityComparer breaks the rules
.OrderBy(item => item.StartMilepost)
.GroupBy(item => item, new MySiteEqualityComparer())
.ToArray())
.ToArray();
// Let's have a look
var report = string.Join(Environment.NewLine, result
.Select(group => string.Join(Environment.NewLine,
group.Select(g => string.Join("; ", g)))));
Console.Write(report);
Outcome:
A 0.00..1.00; A 1.00..2.00
A 5.00..7.00
B 3.00..5.00
B 11.00..13.00; B 13.00..14.00
Here are a couple of implementations where order of Site's does not matter. You can use the LINQ Aggregate function:
return sites.GroupBy(x => x.RouteId)
.SelectMany(x =>
{
var groupedSites = new List<List<Site>>();
var aggs = x.Aggregate(new List<Site>(), (contiguous, next) =>
{
if (contiguous.Count == 0 || contiguous.Any(y => y.EndMilepost == next.StartMilepost))
{
contiguous.Add(next);
}
else if (groupedSites.Any(y => y.Any(z => z.EndMilepost == next.StartMilepost)))
{
var groupMatchIndex = groupedSites.FindIndex(y => y.Any(z => z.EndMilepost == next.StartMilepost));
var el = groupedSites.ElementAt(groupMatchIndex);
el.Add(next);
groupedSites[groupMatchIndex] = el;
}
else
{
groupedSites.Add(contiguous);
contiguous = new List<Site>();
contiguous.Add(next);
}
return contiguous;
}, final => { groupedSites.Add(final); return final; });
return groupedSites;
});
Alternatively, just with foreach:
return sites.GroupBy(x => x.RouteId)
.SelectMany(x =>
{
var groupedSites = new List<List<Site>>();
var aggList = new List<Site>();
foreach (var item in x)
{
if (aggList.Count == 0 || aggList.Any(y => y.EndMilepost == item.StartMilepost))
{
aggList.Add(item);
continue;
}
var groupMatchIndex = groupedSites.FindIndex(y => y.Any(z => z.EndMilepost == item.StartMilepost));
if (groupMatchIndex > -1)
{
var el = groupedSites.ElementAt(groupMatchIndex);
el.Add(item);
groupedSites[groupMatchIndex] = el;
continue;
}
groupedSites.Add(aggList);
aggList = new List<Site>();
aggList.Add(item);
}
groupedSites.Add(aggList);
return groupedSites;
});
I was surprised that GroupBy doesn't have overload with Func<..., bool> for in-place grouping without hassle of implementing custom class.
So I've created one:
public static IEnumerable<IEnumerable<T>> GroupBy<T>(this IEnumerable<T> source, Func<T, T, bool> func)
{
var items = new List<T>();
foreach (var item in source)
{
if (items.Count != 0)
if (!func(items[0], item))
{
yield return items;
items = new List<T>();
}
items.Add(item);
}
if (items.Count != 0)
yield return items;
}
The usage:
var result = sites.GroupBy((x, y) => x.RouteId == y.RouteId &&
x.StartMilepost <= y.EndMilepost && x.EndMilepost >= y.StartMilepost).ToList();
This should produce wanted result.
Few words about implementation. In above extension method you have to supply delegate which should return true if x and y should be grouped. The method is dumb and will simply compare adjacent items in same order as they come. Your input is ordered, but you may have to use OrderBy/ThenBy before using it with something else.
Here is an extension method for grouping lists of the specific class (Site). It is implemented with an inner iterator function GetGroup that produces one group with adjacent sites. This function is called in a while loop to produce all groups.
public static IEnumerable<IEnumerable<Site>> GroupAdjacent(
this IEnumerable<Site> source)
{
var ordered = source
.OrderBy(item => item.RouteId)
.ThenBy(item => item.StartMilepost);
IEnumerator<Site> enumerator;
bool finished = false;
Site current = null;
using (enumerator = ordered.GetEnumerator())
{
while (!finished)
{
yield return GetGroup();
}
}
IEnumerable<Site> GetGroup()
{
if (current != null) yield return current;
while (enumerator.MoveNext())
{
var previous = current;
current = enumerator.Current;
if (previous != null)
{
if (current.RouteId != previous.RouteId) yield break;
if (current.StartMilepost != previous.EndMilepost) yield break;
}
yield return current;
}
finished = true;
}
}
Usage example:
var allGroups = sites.GroupAdjacent();
foreach (var group in allGroups)
{
foreach (var item in group)
{
Console.WriteLine(item);
}
Console.WriteLine();
}
Output:
A 0,00..1,00
A 1,00..2,00
A 5,00..7,00
B 3,00..5,00
B 11,00..13,00
B 13,00..14,00

How to use an object from try in catch - c#

I want to use an object in catch block, which get me an exception in try block. I'm parsing some strings to int and need to catch the exception when it's impossible and see, what object was mistaken and in what line. Is that possible or not?
Some code dor example. Thanks.
static void Main(string[] args)
{
var result = Parse(new List<string>() { "3;5;7", "qwe;3;70" });
}
public static List<int[]> Parse(List<string> list)
{
try
{
return list.Select(str => str.Split(';'))
.Select(str => Tuple.Create(int.Parse(str[0]), int.Parse(str[1]), int.Parse(str[2])))
/// something happening
.ToList();
}
catch
{
//here in braces I want to know, which element was wrong
//"qwe" and whole line "qwe;3;70"
throw new FormatException($"Wrong line [{}]");
}
}
Declare the line and value item counters outside the try/catch block and increase them in the LINQ expression body:
public static List<int[]> Parse(List<string> list)
{
int line = 0;
int item = 0;
try
{
return list
.Select(str => {
line++;
item = 0;
return str
.Split(';')
.Select(i => { item++; return int.Parse(i); })
.ToArray();
})
.ToList();
}
catch
{
throw new FormatException($"Wrong line [{line}]; item [{item}]");
}
}
Demo: https://dotnetfiddle.net/uGtw7A
You need a reference to the object causing the exception. However as the instance lives only in the scope of the try-block you can´t access it any more (try and catch don´t share the same scope and thus can´t access the same variables) unless you´d declare the reference to that instance outside the try-bloc
As already mentioned in the comments you should use a normal foreach-loop to have access to the current line:
public static List<int[]> Parse(List<string> list)
{
var result = new List<int[]>();
foreach(var str in list)
{
try
{
var values = str.Split(';');
result.Add(Tuple.Create(
int.Parse(values[0]),
int.Parse(values[1]),
int.Parse(values[2]))
);
}
catch
{
//here in braces I want to know, which element was wrong
throw new FormatException($"Wrong line " + str");
}
}
return result;
}
However you can simply avoid all those exceptions by useing TryParse instead which returns false if parsing failed. So this boils down to something like this:
var values = str.Split(';');
int v0, v1, v2;
if(int.TryParse(values[0], out v0 &&
int.TryParse(values[1], out v1 &&
int.TryParse(values[2], out v2 &&))
result.Add(Tuple.Create(v0, v1, v2));
else
throw new FormatException($"Wrong line " + str");
I recommend manually looping through, splitting the data, checking you have enough elements, and then using TryParse on the numbers. I know this is a departure from using Linq, but it's the better way to do this with error checking:
public static List<int[]> Parse(List<string> list)
{
if (list == null)
{
throw new ArgumentNullException("list");
// you can use nameof(list) instead of "list" in newer versions of C#
}
List<int[]> result = new List<int[]>();
// Loop through the entries
for (int i = 0; i < list.Count; ++i)
{
// Be safe and check we don't have a null value
// I'm just skipping the 'bad' entries for now but
// you can throw an error, etc.
if (list[i] == null)
{
// do something about this? (an exception of your choosing, etc.)
continue;
}
// split the entry
string[] entryData = list[i].Split(';');
// check we have 3 items
if (entryData.Length != 3)
{
// do something about this?
continue;
}
// try to parse each item in turn
int a;
int b;
int c;
if (!int.TryParse(entryData[0], out a))
{
// do something about this?
continue;
}
if (!int.TryParse(entryData[1], out b))
{
// do something about this?
continue;
}
if (!int.TryParse(entryData[2], out c))
{
// do something about this?
continue;
}
// add to the results list
result.Add(new int[] { a, b, c });
}
// return the result
return result;
}
Scope is scope. Anything you define inside your try block and don't explicitly pass on is not going to be available in your catch block.
If you need this information you have to iterate manually over the list and try catch each attempt individually...
There are too many problems with your code, you're assuming that parameter list is not null and contains items that can be splitted in 3 strings, and that every string can be safely parsed to int.
If you not have all the above guaranties just check everything:
public static List<int[]> Parse(List<string> list)
{
if (list == null)
{
throw new ArgumentNullException(nameof(list));
}
var arrayOfStringArray = list
.Select(x => x.Split(';'))
.ToArray();
var resultList = new List<int[]>();
for (var i = 0; i < arrayOfStringArray.Length; i++)
{
var arrayOfString = arrayOfStringArray[i];
if (arrayOfString.Length != 3)
{
throw new InvalidOperationException("meaningfull message there!");
}
var arrayOfInt = new int[3];
for (var j = 0; j < arrayOfInt.Length; j++)
{
arrayOfInt[j] = TryParse(arrayOfString[j], i, j);
}
resultList.Add(arrayOfInt);
}
return resultList;
}
static int TryParse(string value, int line, int position)
{
int result;
if (!int.TryParse(value, out result))
{
throw new FormatException($"Item at position {line},{position} is invalid.");
}
return result;
}
I think that you just got a wrong approach here. Yes, using Tuple + Linq would be the laziest way to get your result but you can't generate custom errors as so.
Here is an example of how you can achieve something alike:
static void Main(string[] args)
{
var result = Parse(new List<string>() { "3;5;7", "qwe;3;70" });
}
public static List<Tuple<int, int, int>> Parse(List<string> list)
{
List<Tuple<int, int, int>> result = new List<Tuple<int, int, int>>();
int line = 0;
int errorCol = 0;
try
{
for (line = 0; line < list.Count; line++)
{
string[] curentLine = list[line].Split(';');
int result0, result1, result2;
errorCol = 1;
if (curentLine.Length > 0 && int.TryParse(curentLine[0], out result0))
errorCol = 2;
else
throw new Exception();
if (curentLine.Length > 1 && int.TryParse(curentLine[1], out result1))
errorCol = 3;
else
throw new Exception();
if (curentLine.Length > 2 && int.TryParse(curentLine[2], out result2))
result.Add(new Tuple<int, int, int>(result0, result1, result2));
else
throw new Exception();
}
return result;
}
catch
{
//here in braces I want to know, which element was wrong
throw new FormatException("Wrong line " + line + " col" + errorCol);
}
}
PS: Line and column start at 0 here.

Using LinqToSql without load base in memory

I have one problem. If i'm using LinqToSql, my program load my database in memory.
little example:
//pageNumber = 1; pageSize = 100;
var result =
(
from a in db.Stats.AsEnumerable()
where (DictionaryFilter(a, sourceDictionary) && DateFilter(a, beginTime, endTime) && ErrorFilter(a, WarnLevel))
select a
);
var size = result.Count(); // size = 1007
var resultList = result.Skip((pageNumber-1)*pageSize).Take(pageSize).ToList();
return resultList;
DictionaryFilter, DateFilter and ErrorFilter are functions that filter my datebase.
after this my program use ~250Mb of Ram.
if i dont use:
var size = result.Count();
My program use ~120MB Ram.
Before use this code, my program use ~35MB Ram.
How can I use count and take functions not loading all my datebase in memory?
static bool DateFilter(Stat table, DateTime begin, DateTime end)
{
if ((table.RecordTime >= begin.ToFileTime()) && (table.RecordTime <= end.ToFileTime()))
{
return true;
}
return false;
}
static bool ErrorFilter(Stat table, bool[] WarnLevel)
{
if (WarnLevel[table.WarnLevel]) return true;
else return false;
}
static bool DictionaryFilter(Stat table, Dictionary<GetSourcesNameResult, bool> sourceDictionary)
{
foreach (var word in sourceDictionary)
{
if (table.SourceName == word.Key.SourceName)
{
return word.Value;
}
}
//
return false;
}
Simple: don't use .AsEnumerable(). That means "switch to LINQ-to-Objects". Before that, db.Stats was IQueryable<T>, which is a composable API, and would do what you expect.
That, however, means that you can't use C# methods like DictionaryFilter and DateFilter, and must instead compose things in terms of the Expression API. If you can illustrate what they do I can probably advise further.
With your edit, the filtering can be tweaked, for example:
static IQueryable<Stat> ErrorFilter(IQueryable<Stat> source, bool[] WarnLevel) {
// extract the enabled indices (match to values)
int[] levels = WarnLevel.Select((val, index) => new { val, index })
.Where(pair => pair.val)
.Select(pair => pair.index).ToArray();
switch(levels.Length)
{
case 0:
return source.Where(x => false);
case 1:
int level = levels[0];
return source.Where(x => x.WarnLevel == level);
case 2:
int level0 = levels[0], level1 = levels[1];
return source.Where(
x => x.WarnLevel == level0 || x.WarnLevel == level1);
default:
return source.Where(x => levels.Contains(x.WarnLevel));
}
}
the date filter is simpler:
static IQueryable<Stat> DateFilter(IQueryable<Stat> source,
DateTime begin, DateTime end)
{
var from = begin.ToFileTime(), to = end.ToFileTime();
return source.Where(table => table.RecordTime >= from
&& table.RecordTime <= to);
}
and the dictionary is a bit like the levels:
static IQueryable<Stat> DictionaryFilter(IQueryable<Stat> source,
Dictionary<GetSourcesNameResult, bool> sourceDictionary)
{
var words = (from word in sourceDictionary
where word.Value
select word.Key.SourceName).ToArray();
switch (words.Length)
{
case 0:
return source.Where(x => false);
case 1:
string word = words[0];
return source.Where(x => x.SourceName == word);
case 2:
string word0 = words[0], word1 = words[1];
return source.Where(
x => x.SourceName == word0 || x.SourceName == word1);
default:
return source.Where(x => words.Contains(x.SourceName));
}
}
and:
IQueryable<Stat> result = db.Stats;
result = ErrorFilter(result, WarnLevel);
result = DateFiter(result, beginTime, endTime);
result = DictionaryFilter(result, sourceDictionary);
// etc - note we're *composing* a filter here
var size = result.Count(); // size = 1007
var resultList = result.Skip((pageNumber-1)*pageSize).Take(pageSize).ToList();
return resultList;
The point is we're now using IQueryable<T> and Expression exclusively.
The following SO question might explain things: Understanding .AsEnumerable in Linq To Sql
.AsEnumerable() loads the entire table.

Parallel.Foreach + yield return?

I want to process something using parallel loop like this :
public void FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
});
}
Ok, it works fine. But How to do if I want the FillLogs method return an IEnumerable ?
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
yield return cpt // KO, don't work
});
}
EDIT
It seems not to be possible... but I use something like this :
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
return computers.AsParallel().Select(cpt => cpt);
}
But where I put the cpt.Logs = cpt.GetRawLogs().ToList(); instruction
Short version - no, that isn't possible via an iterator block; the longer version probably involves synchronized queue/dequeue between the caller's iterator thread (doing the dequeue) and the parallel workers (doing the enqueue); but as a side note - logs are usually IO-bound, and parallelising things that are IO-bound often doesn't work very well.
If the caller is going to take some time to consume each, then there may be some merit to an approach that only processes one log at a time, but can do that while the caller is consuming the previous log; i.e. it begins a Task for the next item before the yield, and waits for completion after the yield... but that is again, pretty complex. As a simplified example:
static void Main()
{
foreach(string s in Get())
{
Console.WriteLine(s);
}
}
static IEnumerable<string> Get() {
var source = new[] {1, 2, 3, 4, 5};
Task<string> outstandingItem = null;
Func<object, string> transform = x => ProcessItem((int) x);
foreach(var item in source)
{
var tmp = outstandingItem;
// note: passed in as "state", not captured, so not a foreach/capture bug
outstandingItem = new Task<string>(transform, item);
outstandingItem.Start();
if (tmp != null) yield return tmp.Result;
}
if (outstandingItem != null) yield return outstandingItem.Result;
}
static string ProcessItem(int i)
{
return i.ToString();
}
I don't want to be offensive, but maybe there is a lack of understanding. Parallel.ForEach means that the TPL will run the foreach according to the available hardware in several threads. But that means, that ii is possible to do that work in parallel! yield return gives you the opportunity to get some values out of a list (or what-so-ever) and give them back one-by-one as they are needed. It prevents of the need to first find all items matching the condition and then iterate over them. That is indeed a performance advantage, but can't be done in parallel.
Although the question is old I've managed to do something just for fun.
class Program
{
static void Main(string[] args)
{
foreach (var message in GetMessages())
{
Console.WriteLine(message);
}
}
// Parallel yield
private static IEnumerable<string> GetMessages()
{
int total = 0;
bool completed = false;
var batches = Enumerable.Range(1, 100).Select(i => new Computer() { Id = i });
var qu = new ConcurrentQueue<Computer>();
Task.Run(() =>
{
try
{
Parallel.ForEach(batches,
() => 0,
(item, loop, subtotal) =>
{
Thread.Sleep(1000);
qu.Enqueue(item);
return subtotal + 1;
},
result => Interlocked.Add(ref total, result));
}
finally
{
completed = true;
}
});
int current = 0;
while (current < total || !completed)
{
SpinWait.SpinUntil(() => current < total || completed);
if (current == total) yield break;
current++;
qu.TryDequeue(out Computer computer);
yield return $"Completed {computer.Id}";
}
}
}
public class Computer
{
public int Id { get; set; }
}
Compared to Koray's answer this one really uses all the CPU cores.
You can use the following extension method
public static class ParallelExtensions
{
public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
{
var unorderedResult = new ConcurrentBag<(long, T1)>();
Parallel.ForEach(list, (o, state, i) =>
{
unorderedResult.Add((i, action.Invoke(o)));
});
var ordered = unorderedResult.OrderBy(o => o.Item1);
return ordered.Select(o => o.Item2);
}
}
use like:
public void FillLogs(IEnumerable<IComputer> computers)
{
cpt.Logs = computers.OrderedParallel(o => o.GetRawLogs()).ToList();
}
Hope this will save you some time.
How about
Queue<string> qu = new Queue<string>();
bool finished = false;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(get_list(), (item) =>
{
string itemToReturn = heavyWorkOnItem(item);
lock (qu)
qu.Enqueue(itemToReturn );
});
finished = true;
});
while (!finished)
{
lock (qu)
while (qu.Count > 0)
yield return qu.Dequeue();
//maybe a thread sleep here?
}
Edit:
I think this is better:
public static IEnumerable<TOutput> ParallelYieldReturn<TSource, TOutput>(this IEnumerable<TSource> source, Func<TSource, TOutput> func)
{
ConcurrentQueue<TOutput> qu = new ConcurrentQueue<TOutput>();
bool finished = false;
AutoResetEvent re = new AutoResetEvent(false);
Task.Factory.StartNew(() =>
{
Parallel.ForEach(source, (item) =>
{
qu.Enqueue(func(item));
re.Set();
});
finished = true;
re.Set();
});
while (!finished)
{
re.WaitOne();
while (qu.Count > 0)
{
TOutput res;
if (qu.TryDequeue(out res))
yield return res;
}
}
}
Edit2: I agree with the short No answer. This code is useless; you cannot break the yield loop.

Categories