Join two List<T> with foreach loop - c#

I am attempting to move slightly away from LINQ which has proven very useful overall, but also quite difficult to read at times.
I used to use LINQ to perform joins (full outer join) but would prefer to do so using for/foreach loops for their simplicity. I just converted one LINQ statement (not PLINQ) into a nested foreach loop and the performance took a severe hit. What used to take seconds is now taking around a minute, see code below.
foreach (var p in PortfolioELT)
{
double meanloss;
double expvalue;
double stddevc;
double stddevi;
bool matched = false;
foreach (var a in AccountELT)
{
if (a.eventid == p.eventid)
{ DO SOME MATH HERE <-----
Any ideas on either:
1. Why this is slower than the LINQ join, and
2. How can I speed it up?
The program fairly obviously does what it needs to, but is too slow.
EDIT:
OLD CODE FULL
public static ConcurrentList<Event> CreateNewELTSUB(IList<Event> AccountELT, IList<Event> PortfolioELT)
{
if (AccountELT == null)
{
return (ConcurrentList<Event>)PortfolioELT;
}
else
{
//Subtract the Account ELT from the Portfolio ELT
var newELT = from p in PortfolioELT
join a in AccountELT
on p.eventid equals a.eventid into g
from e in g.DefaultIfEmpty()
select new
{
EventID = p.eventid,
Rate = p.rate,
meanloss = p.meanloss - (e == null ? 0d : e.meanloss),
expValue = p.expValue - (e == null ? 0d : e.expValue),
stddevc = Math.Sqrt(Math.Pow(p.stddevc, 2) - (e == null ? 0d : Math.Pow(e.stddevc, 2))),
stddevi = Math.Sqrt(Math.Pow(p.stddevi, 2) - (e == null ? 0d : Math.Pow(e.stddevi, 2)))
};
ConcurrentList<Event> list = new ConcurrentList<Event>();
foreach (var x in newELT)
{
list.Add(new Event(x.meanloss, x.EventID, x.expValue, x.Rate, x.stddevc, x.stddevi));
}
return list;
}
}
NEW CODE FULL:
public static ConcurrentList<Event> CreateNewELTSUB(IList<Event> AccountELT, IList<Event> PortfolioELT)
{
if (AccountELT == null)
{
return (ConcurrentList<Event>)PortfolioELT;
}
else
{
//Subtract the Account ELT from the Portfolio ELT
ConcurrentList<Event> newlist = new ConcurrentList<Event>();
//Outer Join on Portfolio ELT
foreach (var p in PortfolioELT)
{
double meanloss;
double expvalue;
double stddevc;
double stddevi;
bool matched = false;
foreach (var a in AccountELT)
{
if (a.eventid == p.eventid)
{
matched = true;
meanloss = p.meanloss - a.meanloss;
expvalue = p.expValue - a.expValue;
stddevc = Math.Sqrt((Math.Pow(p.stddevc, 2)) - (Math.Pow(a.stddevc, 2)));
stddevi = Math.Sqrt((Math.Pow(p.stddevi, 2)) - (Math.Pow(a.stddevi, 2)));
newlist.Add(new Event(meanloss, p.eventid, expvalue, p.rate, stddevc, stddevi));
}
else if (a.eventid != p.eventid) //Outer Join on Account
{
newlist.Add(a);
}
}
if (!matched)
{
newlist.Add(p);
}
}
return newlist;
}
}

Why this is slower than LINQ Join and
I'm skipping this one on purpose.
How can I speed it up?
You're looping over the entire AccountELT collection for every item in PortfolioELT. Instead, loop over one collection and convert the other to a Dictionary, so that finding a specific record is a keyed lookup rather than a full scan. Something like:
var accountELTIdx = AccountELT.ToDictionary(k => k.eventid);
then
foreach (var p in PortfolioELT)
{
double meanloss;
double expvalue;
double stddevc;
double stddevi;
bool matched = false;
if (accountELTIdx.ContainsKey(p.eventid))
{
var acct = accountELTIdx[p.eventid];
// some maths
}
....
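A fuller sketch of the dictionary approach, under a few assumptions: Event below is a hypothetical stand-in with only the fields the question uses, eventid is assumed to be unique within AccountELT (ToDictionary throws on duplicate keys), and the result mirrors the left outer join of the original LINQ query rather than a full outer join. LINQ to Objects' join operators build a hash-based lookup of the inner sequence once, which is most likely why the query version was so much faster than the nested loops.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Event class.
public sealed class Event
{
    public int eventid;
    public double rate, meanloss, expValue, stddevc, stddevi;
}

public static class EltMath
{
    // Left outer join: every portfolio event appears exactly once in the result,
    // with the matching account event (if any) subtracted from it.
    public static List<Event> Subtract(IList<Event> accountELT, IList<Event> portfolioELT)
    {
        // Built once. Assumes eventid is unique within accountELT.
        Dictionary<int, Event> accountById = accountELT.ToDictionary(a => a.eventid);

        var result = new List<Event>(portfolioELT.Count);
        foreach (Event p in portfolioELT)
        {
            // One hash lookup per portfolio event instead of a full scan.
            if (accountById.TryGetValue(p.eventid, out Event a))
            {
                result.Add(new Event
                {
                    eventid = p.eventid,
                    rate = p.rate,
                    meanloss = p.meanloss - a.meanloss,
                    expValue = p.expValue - a.expValue,
                    stddevc = Math.Sqrt(p.stddevc * p.stddevc - a.stddevc * a.stddevc),
                    stddevi = Math.Sqrt(p.stddevi * p.stddevi - a.stddevi * a.stddevi),
                });
            }
            else
            {
                result.Add(p); // no matching account event: keep the portfolio event as-is
            }
        }
        return result;
    }
}
```

TryGetValue does the lookup once, where ContainsKey followed by the indexer does it twice.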

You're declaring local variables on every iteration that may never be used:
double meanloss;
double expvalue;
double stddevc;
double stddevi;
bool matched = false;
You are doing a linear search for matching event IDs. If just one of the lists is ordered by eventid, you could use a binary search instead of wasting effort on a full linear scan:
foreach (var a in AccountELT)
{
if (a.eventid == p.eventid)
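A minimal sketch of the binary-search idea, assuming the account side can be held as two parallel arrays (ids and the objects they belong to). SortedEventIndex and its method names are hypothetical; Array.Sort(keys, items) and Array.BinarySearch are the framework calls doing the work.

```csharp
using System;

public static class SortedEventIndex
{
    // Sorts ids ascending and reorders items so both arrays stay aligned.
    // Paid once, before the outer loop starts.
    public static void Build<T>(int[] ids, T[] items) => Array.Sort(ids, items);

    // Logarithmic lookup per probe; Array.BinarySearch returns a negative
    // number when the id is not present.
    public static bool TryFind<T>(int[] sortedIds, T[] items, int eventId, out T item)
    {
        int i = Array.BinarySearch(sortedIds, eventId);
        if (i >= 0)
        {
            item = items[i];
            return true;
        }
        item = default(T);
        return false;
    }
}
```

Inside the outer foreach, the inner loop would then become a single TryFind call per portfolio event.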


Split a list of objects into sub-lists of contiguous elements using LINQ?

I have a simple class Item:
public class Item
{
public int Start { get; set;}
public int Stop { get; set;}
}
Given a List<Item> I want to split this into multiple sublists of contiguous elements. e.g. a method
List<Item[]> GetContiguousSequences(Item[] items)
Each element of the returned list should be an array of Item such that list[i].Stop == list[i+1].Start for each element
e.g.
{[1,10], [10,11], [11,20], [25,30], [31,40], [40,45], [45,100]}
=>
{{[1,10], [10,11], [11,20]}, {[25,30]}, {[31,40],[40,45],[45,100]}}
Here is a simple (and not guaranteed bug-free) implementation that simply walks the input data looking for discontinuities:
List<Item[]> GetContiguousSequences(Item []items)
{
var ret = new List<Item[]>();
var i1 = 0;
for(var i2=1;i2<items.Length;++i2)
{
//discontinuity
if(items[i2-1].Stop != items[i2].Start)
{
var num = i2 - i1;
ret.Add(items.Skip(i1).Take(num).ToArray());
i1 = i2;
}
}
//end of array
ret.Add(items.Skip(i1).Take(items.Length-i1).ToArray());
return ret;
}
It's not the most intuitive implementation and I wonder if there is a way to have a neater LINQ-based approach. I was looking at Take and TakeWhile thinking to find the indices where discontinuities occur but couldn't see an easy way to do this.
Is there a simple way to use IEnumerable LINQ algorithms to do this in a more descriptive (not necessarily performant) way?
I set up a simple test case here: https://dotnetfiddle.net/wrIa2J
I'm really not sure this is much better than your original, but as another solution, the general process is:
1. Use Select to project each item together with a group number
2. Use GroupBy to group by that number
3. Use Select again to project each group to an array of Item
4. Use ToList to collect the result into a list
public static List<Item[]> GetContiguousSequences2(Item []items)
{
var currIdx = 1;
return items.Select( (item,index) => new {
item = item,
index = index == 0 || items[index-1].Stop == item.Start ? currIdx : ++currIdx
})
.GroupBy(x => x.index, x => x.item)
.Select(x => x.ToArray())
.ToList();
}
Live example: https://dotnetfiddle.net/mBfHru
Another way is to do an aggregation using Aggregate. This means maintaining a final Result list and a Curr list where you can aggregate your sequences, adding them to the Result list as you find discontinuities. This method looks a little closer to your original
public static List<Item[]> GetContiguousSequences3(Item []items)
{
var res = items.Aggregate(new {Result = new List<Item[]>(), Curr = new List<Item>()}, (agg, item) => {
if(!agg.Curr.Any() || agg.Curr.Last().Stop == item.Start) {
agg.Curr.Add(item);
} else {
agg.Result.Add(agg.Curr.ToArray());
agg.Curr.Clear();
agg.Curr.Add(item);
}
return agg;
});
res.Result.Add(res.Curr.ToArray()); // Remember to add the last group
return res.Result;
}
Live example: https://dotnetfiddle.net/HL0VyJ
You can implement ContiguousSplit as a coroutine: loop over the source and either add each item to the current range, or return the range and start a new one.
private static IEnumerable<Item[]> ContiguousSplit(IEnumerable<Item> source) {
List<Item> current = new List<Item>();
foreach (var item in source) {
if (current.Count > 0 && current[current.Count - 1].Stop != item.Start) {
yield return current.ToArray();
current.Clear();
}
current.Add(item);
}
if (current.Count > 0)
yield return current.ToArray();
}
then if you want materialization
List<Item[]> GetContiguousSequences(Item []items) => ContiguousSplit(items).ToList();
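For illustration, here is the iterator run against the sample data from the question. The iterator body is repeated so the sketch compiles on its own; ContiguousDemo is a hypothetical wrapper class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Start { get; set; }
    public int Stop { get; set; }
}

public static class ContiguousDemo
{
    // Same iterator as above, made public for the demo.
    public static IEnumerable<Item[]> ContiguousSplit(IEnumerable<Item> source)
    {
        var current = new List<Item>();
        foreach (var item in source)
        {
            if (current.Count > 0 && current[current.Count - 1].Stop != item.Start)
            {
                yield return current.ToArray(); // copy, so Clear() below is safe
                current.Clear();
            }
            current.Add(item);
        }
        if (current.Count > 0)
            yield return current.ToArray();
    }

    public static void Main()
    {
        var items = new[]
        {
            new Item { Start = 1, Stop = 10 }, new Item { Start = 10, Stop = 11 },
            new Item { Start = 11, Stop = 20 }, new Item { Start = 25, Stop = 30 },
            new Item { Start = 31, Stop = 40 }, new Item { Start = 40, Stop = 45 },
            new Item { Start = 45, Stop = 100 },
        };

        // Prints:
        // [1,10], [10,11], [11,20]
        // [25,30]
        // [31,40], [40,45], [45,100]
        foreach (Item[] run in ContiguousSplit(items))
            Console.WriteLine(string.Join(", ", run.Select(i => $"[{i.Start},{i.Stop}]")));
    }
}
```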
Your solution is okay. I don't think that LINQ adds any simplification or clarity in this situation. Here is a fast solution that I find intuitive:
static List<Item[]> GetContiguousSequences(Item[] items)
{
var result = new List<Item[]>();
int start = 0;
while (start < items.Length) {
int end = start + 1;
while (end < items.Length && items[end].Start == items[end - 1].Stop) {
end++;
}
int len = end - start;
var a = new Item[len];
Array.Copy(items, start, a, 0, len);
result.Add(a);
start = end;
}
return result;
}

Calculate sum of a column and display in a label

I am using this query to calculate the sum of the amounts and display it in a label, but it doesn't display the values. Is that because I am storing the result in a string? Any suggestions?
string a;
var query = from r in dt.AsEnumerable()
where r.Field<string>("Code") == strCode
select decimal.Parse(
r.Field<string>("Amount")
.Replace("$", "")
.Replace(",", "")
);
if (query.Count() == 0)
{
a = "0";
}
else
{
foreach (var item in query)
{
a = item.ToString();
}
}
return a;
Parsing dollar amounts
The NumberStyles enumeration from System.Globalization can be used to parse dollar amounts into decimals:
decimal.Parse(r.Field<string>("Amount"), NumberStyles.Currency)
See also Problem parsing currency text to decimal type.
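A small sketch of the parsing step on its own, assuming the amounts are formatted US-style ("$1,234.56"). The two-argument decimal.Parse overload uses the current culture, so the sketch passes en-US explicitly; CurrencyParsing and ParseAmount are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class CurrencyParsing
{
    // Assumption: the Amount column always uses US formatting.
    private static readonly CultureInfo UsCulture = CultureInfo.GetCultureInfo("en-US");

    // NumberStyles.Currency accepts the currency symbol, thousands
    // separators and a decimal point, so no Replace calls are needed.
    public static decimal ParseAmount(string text)
        => decimal.Parse(text, NumberStyles.Currency, UsCulture);

    public static void Main()
    {
        decimal amount = ParseAmount("$1,234.56");
        Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture)); // 1234.56
    }
}
```

If some rows may hold text that is not a valid amount, decimal.TryParse with the same arguments avoids the exception.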
Summing dollar amounts
Also, your foreach loop isn't actually adding up the amounts. It just assigns each amount to the variable a in turn, so a ends up holding only the last amount:
foreach (var item in query)
{
a = item.ToString();
}
return a;
What you really want to do is to add the amounts:
decimal a = 0;
foreach (var item in query)
{
a = a + item;
// Or even shorter:
// a += item;
}
return a.ToString();
Summing with LINQ
But LINQ provides the Sum() method, which can replace the loop altogether. Also, when Sum() is called on an empty sequence of decimals it returns 0, so you don't need the if block that checks whether the count is 0:
// Don't need this IF block
if (query.Count() == 0)
{
a = "0";
}
return query.Sum().ToString();
Altogether now...
So putting it all together, you get the code below:
// Top of file...
using System.Globalization;
// In your method...
var query =
from r in dt.AsEnumerable()
where r.Field<string>("Code") == strCode
select decimal.Parse(r.Field<string>("Amount"), NumberStyles.Currency);
return query.Sum().ToString();
Try this,
You need to use Sum to get the total amount
string TotalSum = "";
var query = from r in dt.AsEnumerable()
where r.Field<string>("Code") == strCode
select decimal.Parse(r.Field<string>("Amount"), System.Globalization.NumberStyles.Any);
if (query.Count() > 0)
{
TotalSum = string.Format("{0:C}", query.Sum());
}
Label1.Text = TotalSum;

iterating through IEnumerable<string> causing serious performance issue

I am clueless about what happened to the performance of my foreach loop when I tried to iterate through an IEnumerable.
The following code causes a serious performance issue:
foreach (IEdge ed in edcol)
{
IEnumerable<string> row =
from r in dtRow.AsEnumerable()
where (((r.Field<string>("F1") == ed.Vertex1.Name) &&
(r.Field<string>("F2") == ed.Vertex2.Name))
|| ((r.Field<string>("F1") == ed.Vertex2.Name) &&
(r.Field<string>("F2") == ed.Vertex1.Name)))
select r.Field<string>("EdgeId");
int co = row.Count();
//foreach (string s in row)
//{
//}
x++;
}
The outer foreach (IEdge ed in edcol) has about 11,000 iterations to complete.
It runs in a fraction of a second if I remove the line
int co = row.Count();
from the code.
row.Count() is at most 10 in every iteration.
If I uncomment the
//foreach (string s in row)
//{
//}
it takes about 10 minutes to complete.
Does the IEnumerable type really have such serious performance issues?
This answer is for the implicit question of "how do I make this much faster"? Apologies if that's not actually what you were after, but...
You can go through the rows once, grouping by the names. (I haven't done the ordering like Marc has - I'm just looking up twice when querying :)
var lookup = dtRow.AsEnumerable()
.ToLookup(r => new { F1 = r.Field<string>("F1"),
F2 = r.Field<string>("F2") });
Then:
foreach (IEdge ed in edcol)
{
// Need to check both ways round...
var first = new { F1 = ed.Vertex1.Name, F2 = ed.Vertex2.Name };
var second = new { F1 = ed.Vertex2.Name, F2 = ed.Vertex1.Name };
var firstResult = lookup[first];
var secondResult = lookup[second];
// Due to the way Lookup works, this is quick - much quicker than
// calling query.Count()
var count = firstResult.Count() + secondResult.Count();
var query = firstResult.Concat(secondResult);
foreach (var row in query)
{
...
}
}
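A self-contained sketch of the lookup behaviour this relies on. It uses value tuples as keys instead of anonymous types (both compare by value) so that the lookup can be passed between methods, and in-memory rows stand in for the DataTable; EdgeLookup and its members are hypothetical names. Indexing an ILookup with a missing key returns an empty sequence rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdgeLookup
{
    // One pass over the rows, done once before the edge loop.
    public static ILookup<(string F1, string F2), string> Build(
        IEnumerable<(string F1, string F2, string EdgeId)> rows)
        => rows.ToLookup(r => (r.F1, r.F2), r => r.EdgeId);

    // Checks both orientations of the edge.
    public static IEnumerable<string> EdgeIds(
        ILookup<(string F1, string F2), string> lookup, string v1, string v2)
    {
        IEnumerable<string> forward = lookup[(v1, v2)];
        // A self-loop (v1 == v2) would otherwise be returned twice.
        return v1 == v2 ? forward : forward.Concat(lookup[(v2, v1)]);
    }

    public static void Main()
    {
        var rows = new[] { ("a", "b", "e1"), ("b", "a", "e2"), ("c", "d", "e3") };
        var lookup = Build(rows);
        Console.WriteLine(string.Join(",", EdgeIds(lookup, "a", "b"))); // e1,e2
        Console.WriteLine(EdgeIds(lookup, "x", "y").Count());           // 0
    }
}
```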
At the moment you have O(N*M) performance, which could be problematic if both N and M are large. I would be inclined to pre-compute some of the DataTable info. For example, we could try:
var lookup = dtRow.AsEnumerable().ToLookup(
row => string.Compare(row.Field<string>("F1"),row.Field<string>("F2"))<0
? Tuple.Create(row.Field<string>("F1"), row.Field<string>("F2"))
: Tuple.Create(row.Field<string>("F2"), row.Field<string>("F1")),
row => row.Field<string>("EdgeId"));
then we can iterate that:
foreach (IEdge ed in edcol)
{
var name1 = string.Compare(ed.Vertex1.Name,ed.Vertex2.Name) < 0
? ed.Vertex1.Name : ed.Vertex2.Name;
var name2 = string.Compare(ed.Vertex1.Name,ed.Vertex2.Name) < 0
? ed.Vertex2.Name : ed.Vertex1.Name;
var matches = lookup[Tuple.Create(name1,name2)];
// ...
}
(note I enforced ascending alphabetical pairs in there, for convenience)

C# Lookup optimisation suggestions are welcome

I have the code below which works for the purpose of what I need but I have an idea that it could be made faster. Please let me know if this code can be improved in any way...
The main issue is that I need to query "data" several times. I just want to make sure that there is no shortcut I could have used instead.
data = GetData(); // returns ILookup<Tuple<string, string, string>, string>
foreach (var v0 in data)
{
if (v0.Key.Item3 == string.Empty)
{
//Get all related data
var tr_line = data[v0.Key];
sb.AppendLine(tr_line.First());
foreach (var v1 in data)
{
if (v1.Key.Item2 == string.Empty && v1.Key.Item1 == v0.Key.Item1)
{
var hh_line = data[v1.Key];
sb.AppendLine(hh_line.First());
foreach (var v2 in data)
{
if (v2.Key.Item1 == v0.Key.Item1 && v2.Key.Item2 != string.Empty && v2.Key.Item3 != string.Empty)
{
var hl_sl_lines = data[v2.Key].OrderByDescending(r => r);
foreach (var v3 in hl_sl_lines)
{
sb.AppendLine(v3);
}
}
}
}
}
}
}
Neater, more linq:
var data = GetData();
foreach (var v0 in data)
{
if (v0.Key.Item3 != string.Empty) continue;
//Get all related data
var tr_line = data[v0.Key];
sb.AppendLine(tr_line.First());
var hhLines = from v1 in data
where v1.Key.Item2 == string.Empty &&
v1.Key.Item1 == v0.Key.Item1
select data[v1.Key];
foreach (var hh_line in hhLines)
{
sb.AppendLine(hh_line.First());
var grouping = v0;
var enumerable = from v2 in data
where v2.Key.Item1 == grouping.Key.Item1 &&
v2.Key.Item2 != string.Empty &&
v2.Key.Item3 != string.Empty
select data[v2.Key].OrderByDescending(r => r)
into hl_sl_lines from v3 in hl_sl_lines select v3;
foreach (var v3 in enumerable)
{
sb.AppendLine(v3);
}
}
}
First of all, try to avoid using Tuple for this kind of code, because, even to you, a few months from now, this code will be incomprehensible. Make a class, or even better, an immutable struct with the correct property names. Even the fastest code is worthless if it is not maintainable.
That said, you have three nested loops that iterate the same collection. It would be plausible that a sorted collection will perform faster, as you will need to compare only with adjacent items.
Please try to explain what you are trying to accomplish, so someone would try to offer more specific help.
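As a sketch of the first suggestion, a named immutable key could look like the following. The property names Group, Header and Detail are pure placeholders, since the question never says what the three strings mean.

```csharp
using System;

// Hypothetical replacement for Tuple<string, string, string>.
// The three property names are assumptions made for illustration only.
public readonly struct LineKey : IEquatable<LineKey>
{
    public LineKey(string group, string header, string detail)
    {
        Group = group;
        Header = header;
        Detail = detail;
    }

    public string Group { get; }
    public string Header { get; }
    public string Detail { get; }

    // Value equality, so the struct works as a Lookup or Dictionary key.
    public bool Equals(LineKey other)
        => Group == other.Group && Header == other.Header && Detail == other.Detail;

    public override bool Equals(object obj) => obj is LineKey other && Equals(other);

    public override int GetHashCode() => (Group, Header, Detail).GetHashCode();
}
```

With a key like this, a condition such as v0.Key.Item3 == string.Empty becomes v0.Key.Detail == string.Empty, which says what is being tested.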

How do I compare items from a list to all others without repetition?

I have a collection of objects (let's call them MyItem) and each MyItem has a method called IsCompatibleWith which returns a boolean saying whether it's compatible with another MyItem.
public class MyItem
{
...
public bool IsCompatibleWith(MyItem other) { ... }
...
}
A.IsCompatibleWith(B) will always be the same as B.IsCompatibleWith(A). If for example I have a collection containing 4 of these, I am trying to find a LINQ query that will run the method on each distinct pair of items in the same collection. So if my collection contains A, B, C and D I wish to do the equivalent of:
A.IsCompatibleWith(B); // A & B
A.IsCompatibleWith(C); // A & C
A.IsCompatibleWith(D); // A & D
B.IsCompatibleWith(C); // B & C
B.IsCompatibleWith(D); // B & D
C.IsCompatibleWith(D); // C & D
The code initially used was:
var result = from item in myItems
from other in myItems
where item != other &&
item.IsCompatibleWith(other)
select item;
but of course this will still do both A & B and B & A (which is not required and not efficient). Also it's probably worth noting that in reality these lists will be a lot bigger than 4 items, hence the desire for an optimal solution.
Hopefully this makes sense... any ideas?
Edit:
One possible query -
MyItem[] items = myItems.ToArray();
bool compatible = (from item in items
from other in items
where
Array.IndexOf(items, item) < Array.IndexOf(items, other) &&
!item.IsCompatibleWith(other)
select item).FirstOrDefault() == null;
Edit2: In the end switched to using the custom solution from LukeH as it was more efficient for bigger lists.
public bool AreAllCompatible()
{
using (var e = myItems.GetEnumerator())
{
var buffer = new List<MyItem>();
while (e.MoveNext())
{
if (buffer.Any(item => !item.IsCompatibleWith(e.Current)))
return false;
buffer.Add(e.Current);
}
}
return true;
}
Edit...
Judging by the "final query" added to your question, you need a method to determine if all the items in the collection are compatible with each other. Here's how to do it reasonably efficiently:
bool compatible = myItems.AreAllItemsCompatible();
// ...
public static bool AreAllItemsCompatible(this IEnumerable<MyItem> source)
{
using (var e = source.GetEnumerator())
{
var buffer = new List<MyItem>();
while (e.MoveNext())
{
foreach (MyItem item in buffer)
{
if (!item.IsCompatibleWith(e.Current))
return false;
}
buffer.Add(e.Current);
}
}
return true;
}
Original Answer...
I don't think there's an efficient way to do this using only the built-in LINQ methods.
It's easy enough to build your own though. Here's an example of the sort of code you'll need. I'm not sure exactly what results you're trying to return so I'm just writing a message to the console for each compatible pair. It should be easy enough to change it to yield the results that you need.
using (var e = myItems.GetEnumerator())
{
var buffer = new List<MyItem>();
while (e.MoveNext())
{
foreach (MyItem item in buffer)
{
if (item.IsCompatibleWith(e.Current))
{
Console.WriteLine(item + " is compatible with " + e.Current);
}
}
buffer.Add(e.Current);
}
}
(Note that although this is reasonably efficient, it does not preserve the original ordering of the collection. Is that an issue in your situation?)
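If the collection can be materialised into an array or list, the same each-pair-once traversal can also be written with two indices. This is a plain alternative to the enumerator-and-buffer version, not a rewrite of it; UniquePairs is a hypothetical helper name, and strings stand in for MyItem in the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairExtensions
{
    // Yields each unordered pair exactly once, in index order:
    // (0,1), (0,2), ..., (n-2,n-1). That is n(n-1)/2 pairs for n items.
    public static IEnumerable<(T First, T Second)> UniquePairs<T>(this IReadOnlyList<T> items)
    {
        for (int i = 0; i < items.Count; i++)
            for (int j = i + 1; j < items.Count; j++)
                yield return (items[i], items[j]);
    }
}

public static class PairDemo
{
    public static void Main()
    {
        var items = new[] { "A", "B", "C", "D" };
        // Prints: AB AC AD BC BD CD
        Console.WriteLine(string.Join(" ", items.UniquePairs().Select(p => p.First + p.Second)));
    }
}
```

With the question's types, the filter would read myItems.ToList().UniquePairs().Where(p => p.First.IsCompatibleWith(p.Second)).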
this should do it:
var result = from item in myItems
from other in myItems
where item != other &&
myItems.IndexOf(item) < myItems.IndexOf(other) &&
item.IsCompatibleWith(other)
select item;
But I don't know whether it makes things faster, because the query has to look up the indices of both items for every pair.
Edit:
If MyItem already carries an index, use that instead of IndexOf. You can also remove "item != other" from the where clause, since it is now redundant.
Here's an idea:
Implement IComparable so that your MyItem becomes sortable, then run this linq-query:
var result = from item in myItems
from other in myItems
where item.CompareTo(other) < 0 &&
item.IsCompatibleWith(other)
select item;
If your MyItem collection is small enough, you can store the results of item.IsCompatibleWith(otherItem) in a boolean array:
var itemCount = myItems.Count();
var compatibilityTable = new bool[itemCount, itemCount];
var itemsToCompare = new List<MyItem>();
var i = 0;
var j = 0;
foreach (var item in myItems)
{
j = 0;
foreach (var other in itemsToCompare)
{
compatibilityTable[i,j] = item.IsCompatibleWith(other);
compatibilityTable[j,i] = compatibilityTable[i,j];
j++;
}
itemsToCompare.Add(item);
i++;
}
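var result = myItems.Where((item, index) =>
{
// Keep an item only if it is compatible with every other item.
// The diagonal is skipped: an item is never compared with itself,
// so compatibilityTable[index, index] is never filled in.
for (var k = 0; k < itemCount; k++)
{
if (k != index && !compatibilityTable[index, k])
return false;
}
return true;
});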
So, we have
IEnumerable<MyItem> MyItems;
To get all the combinations we could use a function like this.
//returns all the k sized combinations from a list
public static IEnumerable<IEnumerable<T>> Combinations<T>(IEnumerable<T> list,
int k)
{
if (k == 0) return new[] {new T[0]};
return list.SelectMany((l, i) =>
Combinations(list.Skip(i + 1), k - 1).Select(c => (new[] {l}).Concat(c))
);
}
We can then apply this function to our problem like this.
var combinations = Combinations(MyItems, 2).Select(c => c.ToList<MyItem>());
var result = combinations.Where(c => c[0].IsCompatibleWith(c[1]));
This will call IsCompatibleWith on every combination, without repetition.
You could of course perform the check inside the Combinations function. As further work, you could turn Combinations into an extension method that takes a delegate with a variable number of parameters for several values of k.
EDIT: As I suggested above, if you wrote these extension methods
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> list, int k)
{
if (k == 0) return new[] { new T[0] };
return list.SelectMany((l, i) =>
list.Skip(i + 1).Combinations<T>(k - 1)
.Select(c => (new[] { l }).Concat(c)));
}
public static IEnumerable<Tuple<T, T>> Combinations<T>(this IEnumerable<T> list,
Func<T, T, bool> filter)
{
return list.Combinations(2).Where(c =>
filter(c.First(), c.Last())).Select(c =>
Tuple.Create<T, T>(c.First(), c.Last()));
}
}
Then in your code you could do the rather more elegant (IMO)
var compatibleTuples = myItems.Combinations((a, b) => a.IsCompatibleWith(b));
then get at the compatible items with
foreach(var t in compatibleTuples)
{
var first = t.Item1; // or t.Item2
}
