IEnumerable<T>.Union(IEnumerable<T>) overwrites contents instead of unioning

I've got a collection of items (ADO.NET Entity Framework), and need to return a subset as search results based on a couple different criteria. Unfortunately, the criteria overlap in such a way that I can't just take the collection Where the criteria are met (or drop Where the criteria are not met), since this would leave out or duplicate valid items that should be returned.
I decided I would do each check individually, and combine the results. I considered using AddRange, but that would result in duplicates in the results list (and my understanding is it would enumerate the collection every time - am I correct/mistaken here?). I realized Union does not insert duplicates, and defers enumeration until necessary (again, is this understanding correct?).
The search is written as follows:
IEnumerable<MyClass> Results = Enumerable.Empty<MyClass>();
IEnumerable<MyClass> Potential = db.MyClasses.Where(x => x.Y); //Precondition
int parsed_id;
//For each searchable value
foreach(var selected in SelectedValues1)
{
    IEnumerable<MyClass> matched = Potential.Where(x => x.Value1 == selected);
    Results = Results.Union(matched); //This is where the problem is
}
//Elided....
foreach(var selected in SelectedValuesN) //Happens to be integer
{
    if(!int.TryParse(selected, out parsed_id))
        continue;
    IEnumerable<MyClass> matched = Potential.Where(x => x.ValueN == parsed_id);
    Results = Results.Union(matched); //This is where the problem is
}
It seems, however, that Results = Results.Union(matched) is working more like Results = matched. I've stepped through with some test data and a test search. The search asks for results where the first field is -1, 0, 1, or 3. This should return 4 results (two 0s, a 1 and a 3). The first iteration of the loops works as expected, with Results still being empty. The second iteration also works as expected, with Results containing two items. After the third iteration, however, Results contains only one item.
Have I just misunderstood how .Union works, or is there something else going on here?

Because of deferred execution, by the time you eventually consume Results, it is the union of many Where queries all of which are based on the last value of selected.
So you have
Results = Potential.Where(selected)
    .Union(Potential.Where(selected))
    .Union(Potential.Where(selected))...
and all the selected values are the same.
You need to create a var currentSelected = selected inside your loop and pass that to the query. That way each value of selected will be captured individually and you won't have this problem.
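To make the capture behaviour concrete, here is a minimal sketch using plain ints instead of entities (the data and the names Data and Wanted are made up for illustration). Since C# 5 the foreach loop variable is a fresh variable on each iteration, so the loop in the question only misbehaves on older compilers; the sketch therefore declares the variable outside a for loop, which reproduces the effect on any compiler version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    static readonly int[] Data = { 0, 0, 1, 3 };      // hypothetical test data
    static readonly int[] Wanted = { -1, 0, 1, 3 };   // hypothetical search values

    // One variable shared by every lambda: when the query is finally
    // enumerated, all the Where clauses see the variable's last value.
    public static List<int> SharedVariable()
    {
        IEnumerable<int> results = Enumerable.Empty<int>();
        int selected = 0;                              // declared once, outside the loop
        for (int i = 0; i < Wanted.Length; i++)
        {
            selected = Wanted[i];
            results = results.Union(Data.Where(x => x == selected));
        }
        return results.ToList();                       // enumeration happens here
    }

    // A fresh variable per iteration: each lambda keeps its own value.
    public static List<int> CopyPerIteration()
    {
        IEnumerable<int> results = Enumerable.Empty<int>();
        for (int i = 0; i < Wanted.Length; i++)
        {
            int current = Wanted[i];
            results = results.Union(Data.Where(x => x == current));
        }
        return results.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", SharedVariable()));   // 3
        Console.WriteLine(string.Join(",", CopyPerIteration())); // 0,1,3
    }
}
```

Because the values here are ints, Union collapses the two 0s into one; with distinct entity objects both would be kept.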

You can do this much more simply:
Results = SelectedValues.SelectMany(s => Potential.Where(x => x.Value == s));
(this may return duplicates)
Or
Results = Potential.Where(x => SelectedValues.Contains(x.Value));
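A small sketch of the difference between the two one-liners, using made-up int data in place of the entity set (the field names simply mirror the snippet above). It assumes the duplicates come from a repeated search value, which is one way SelectMany can return the same item twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsDemo
{
    static readonly int[] Potential = { 0, 1, 3 };        // hypothetical data
    static readonly int[] SelectedValues = { 0, 0, 1 };   // note the repeated 0

    // One inner query per selected value: a repeated value repeats its matches.
    public static List<int> WithSelectMany()
    {
        return SelectedValues.SelectMany(s => Potential.Where(x => x == s)).ToList();
    }

    // One pass over Potential: each item appears at most once.
    public static List<int> WithContains()
    {
        return Potential.Where(x => SelectedValues.Contains(x)).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", WithSelectMany())); // 0,0,1
        Console.WriteLine(string.Join(",", WithContains()));   // 0,1
    }
}
```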

As pointed out by others, the lambda in your LINQ expression is a closure. This means the variable selected itself (not its current value) is captured by the lambda in each iteration of your foreach loop. The same variable is used in every iteration, so by the time the query actually runs, every lambda sees whatever the last value was. To get around this, declare a local variable inside the foreach loop, like so:
//For each searchable value
foreach(var selected in SelectedValues1)
{
var localSelected = selected;
Results = Results.Union(Potential.Where(x => x.Value1 == localSelected));
}
It is much shorter to just use .Contains():
Results = Results.Union(Potential.Where(x => SelectedValues1.Contains(x.Value1)));
Since you need to query multiple SelectedValues collections, you could put them all inside their own collection and iterate over that as well, although you'd need some way of matching the correct field/property on your objects.
You could possibly do this by storing your lists of selected values in a Dictionary with the name of the field/property as the key. You would use Reflection to look up the correct field and perform your check. You could then shorten the code to the following:
// Store each of your searchable lists here, keyed by property/field name
Dictionary<string, IEnumerable<object>> DictionaryOfSelectedValues = ...;
Type t = typeof(MyClass);
// For each list of searchable values
foreach(var selectedValues in DictionaryOfSelectedValues) // Yields KeyValuePair<TKey, TValue>
{
    // Try to get a property for this key
    PropertyInfo prop = t.GetProperty(selectedValues.Key);
    IEnumerable<object> localSelected = selectedValues.Value;
    if( prop != null )
    {
        Results = Results.Union(Potential.Where(x =>
            localSelected.Contains(prop.GetValue(x, null))));
    }
    else // If it's not a property, check if the entry is for a field
    {
        FieldInfo field = t.GetField(selectedValues.Key);
        if( field != null )
        {
            // FieldInfo.GetValue takes only the target object
            Results = Results.Union(Potential.Where(x =>
                localSelected.Contains(field.GetValue(x))));
        }
    }
}

No, your use of Union is absolutely correct.
The only thing to keep in mind is that it excludes duplicates based on the default equality comparer (Equals and GetHashCode). Do you have sample data?

Okay, I think you are having a problem because Union uses deferred execution.
What happens if you do,
var unionResults = Results.Union(matched).ToList();
Results = unionResults;

Related

Iterating through two Lists and checking the item value of the first list to the item values of the second list with a lambda expression

In the program I'm working on I receive two lists: one of objects that contain an id in string format (something like "bb_b1203322"), and one that holds the ids (written there as just "b1203322", for reasons unknown) together with a description of what each id actually means.
var forms = await _tRepository.GetAllFormsAsync(lastUpdate);
var formDefinitions = await _deRepository.GetAllFormDefintionsAsync();
foreach (var form in forms)
{
foreach (var def in formDefinitions)
{
if (form.SetupFormName.Contains(def.BLKID))
form.SetupFormName = def.DESCR;
}
}
return forms;
Now this piece of code does exactly what I want it to, but I'd rather have it as a lambda expression because ... reasons :)
Now I've tried several different things but with my current knowledge of lambda expressions I can't get it to work.

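Try this code. Note that it only works if a matching form definition always exists (otherwise FirstOrDefault returns null and reading .DESCR throws), and that forms must be a List<T> for ForEach to be available.
forms.ForEach(f => f.SetupFormName = formDefinitions.FirstOrDefault(fd =>
    f.SetupFormName.Contains(fd.BLKID)).DESCR);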

This code uses a bit of LINQ to find the definition:
foreach(var form in forms)
{
    var def = formDefinitions.FirstOrDefault(x => form.SetupFormName.Contains(x.BLKID));
    if(def != null)
        form.SetupFormName = def.DESCR;
}
As you can see, it's not really saving all that much code.
Please note:
As Jon correctly comments, the behavior of this code is a bit different from your original one. This code uses the first occurrence if there are multiple and your code uses the last occurrence.
If this is actually a use case for your code, replace FirstOrDefault with LastOrDefault.
Extending the code above, you can do something like this:
foreach(var tuple in forms.Select(x => new { Form = x,
                                             Definition =
                                                 formDefinitions.FirstOrDefault(y =>
                                                     x.SetupFormName.Contains(y.BLKID)) })
                          .Where(x => x.Definition != null))
{
    tuple.Form.SetupFormName = tuple.Definition.DESCR;
}
But as you can see, this gets messy real quick.

Linq to Entities, match where value contains one or more strings

I have a data set I need to filter on a string value. If a property contains a specific string, the item is selected. That works fine.
I need to change this to allow for a LIST of strings to test against.
I could iterate over the objects and loop through the selected values, storing matches in a separate list, but I feel that there should be a better way.
I hope someone has a good example of how to accomplish this.
//Gets a set of addresses; the objects have several properties, one of them being (example):
// o.ZipCity ="1000 Copenhagen"
List<AddresObjectType> result = getAllAddresses();
// Example : 1000,2000
var listOfZip = context.Request["zip"].Split(',');
//Current code, just one value
result = result.Where(t => t.ZipCity.Contains(context.Request["zip"])).ToList();
//Code I need...
//IF any of the passed values are matched then include
result = result.Where(t => t.ZipCity.Contains(listOfZip)).ToList();
So the desired effect is:
- Requested values "1000,2000,3000" (one to many values)
- Result set includes all that has a ZipCity value that contains at least one of the values

You could try this one:
result = result.Where(t => listOfZip.Contains(t.ZipCity)).ToList();
The listOfZip will contain the values of 1000, 2000, 3000 and you try to get all the cities whose zip is one of them.
Update
result = result.Where(t => listOfZip.Any(zip => t.ZipCity.Contains(zip))).ToList();
The Any extension method returns true if there is any element in listOfZip that satisfies the predicate:
zip => t.ZipCity.Contains(zip)
or false if there isn't any.
What does the predicate check?
It checks whether the current zip is contained in ZipCity. If so, it returns true; otherwise, it returns false.
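As a runnable sketch of the Any/Contains combination, the following uses plain strings in place of the address objects and a hypothetical Filter helper; neither is from the question. It also drops empty tokens when splitting, on the assumption that a trailing comma in the request should not match every address (Contains("") is true for any string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipFilterDemo
{
    // Keeps every entry that contains at least one of the requested values.
    public static List<string> Filter(IEnumerable<string> zipCities, string request)
    {
        // Skip empty tokens such as the one produced by "1000,2000,".
        string[] wanted = request.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        return zipCities.Where(zc => wanted.Any(zip => zc.Contains(zip))).ToList();
    }

    public static void Main()
    {
        var addresses = new[] { "1000 Copenhagen", "2000 Frederiksberg", "8000 Aarhus" };
        foreach (var hit in Filter(addresses, "1000,2000,"))
            Console.WriteLine(hit);   // prints the Copenhagen and Frederiksberg entries
    }
}
```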

How do I make a streaming LINQ expression that delivers the filtered out items as well as the filtered items?

I am transforming an Excel spreadsheet into a list of "Elements" (this is a domain term). During this transformation, I need to skip the header rows and throw out malformed rows that cannot be transformed.
Now comes the fun part. I need to capture those malformed records so that I can report on them. I constructed a crazy LINQ statement (below). These are extension methods hiding the messy LINQ operations on the types from the OpenXml library.
var elements = sheet
    .Rows()                          // BEGIN sheet data transform
    .SkipColumnHeaders()
    .ToRowLookup()
    .ToCellLookup()
    .SkipEmptyRows()                 // END sheet data transform
    .ToElements(strings)             // BEGIN domain transform
    .RemoveBadRecords(out discard)
    .OrderByCompositeKey();
The interesting part starts at ToElements, where I transform the row lookup to my domain object list (details: it's called an ElementRow, which is later transformed into an Element). Bad records are created with just a key (the Excel row index) and are uniquely identifiable vs. a real element.
public static IEnumerable<ElementRow> ToElements(this IEnumerable<KeyValuePair<UInt32Value, Cell[]>> map)
{
return map.Select(pair =>
{
try
{
return ElementRow.FromCells(pair.Key, pair.Value);
}
catch (Exception)
{
return ElementRow.BadRecord(pair.Key);
}
});
}
Then, I want to remove those bad records (it's easier to collect all of them before filtering). That method is RemoveBadRecords, which started like this...
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements)
{
return elements.Where(el => el.FormatId != 0);
}
However, I need to report the discarded elements! And I don't want to muddy my transform extension method with reporting. So, I went to the out parameter (taking into account the difficulties of using an out param in an anonymous block)
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements, out List<ElementRow> discard)
{
var temp = new List<ElementRow>();
var filtered = elements.Where(el =>
{
if (el.FormatId == 0) temp.Add(el);
return el.FormatId != 0;
});
discard = temp;
return filtered;
}
And, lo! I thought I was hardcore and would have this working in one shot...
var discard = new List<ElementRow>();
var elements = data
/* snipped long LINQ statement */
.RemoveBadRecords(out discard)
/* snipped long LINQ statement */
discard.ForEach(el => failures.Add(el));
foreach(var el in elements)
{
/* do more work, maybe add more failures */
}
return new Result(elements, failures);
But, nothing was in my discard list at the time I looped through it! I stepped through the code and realized that I successfully created a fully-streaming LINQ statement.
The temp list was created
The Where filter was assigned (but not yet run)
And the discard list was assigned
Then the streaming thing was returned
When discard was iterated, it contained no elements, because the elements weren't iterated over yet.
Is there a way to fix this problem using the thing I constructed? Do I have to force an iteration of the data before or during the bad record filter? Is there another construction that I've missed?
Some Commentary
Jon mentioned that the assignment /was/ happening. I simply wasn't waiting for it. If I check the contents of discard after the iteration of elements, it is, in fact, full! So, I don't actually have an assignment problem. Unless I take Jon's advice on what's good/bad to have in a LINQ statement.

"When the statement was actually iterated, the Where clause ran and temp filled up, but discard was never assigned again!"
It doesn't need to be assigned again - the existing list which will have been assigned to discard in the calling code will be populated.
However, I'd strongly recommend against this approach. Using an out parameter here is really against the spirit of LINQ. (If you iterate over your results twice, you'll end up with a list which contains all the bad elements twice. Ick!)
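The double-counting can be reproduced with a stripped-down version of the same pattern. This is only a sketch: ints stand in for ElementRow, 0 plays the role of a bad record, and the names RemoveBad and Counts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectDemo
{
    // Same shape as RemoveBadRecords: the filter records rejects as a side effect.
    public static IEnumerable<int> RemoveBad(IEnumerable<int> items, out List<int> discard)
    {
        var temp = new List<int>();
        var filtered = items.Where(i =>
        {
            if (i == 0) temp.Add(i);   // runs every time the query is enumerated
            return i != 0;
        });
        discard = temp;                // same list instance the lambda fills later
        return filtered;
    }

    // Discard count before enumeration, after one pass, and after two passes.
    public static int[] Counts()
    {
        List<int> discard;
        var good = RemoveBad(new[] { 1, 0, 2, 0 }, out discard);
        int before = discard.Count;
        good.ToList();                 // first enumeration
        int afterOne = discard.Count;
        good.ToList();                 // second enumeration adds the rejects again
        int afterTwo = discard.Count;
        return new[] { before, afterOne, afterTwo };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Counts())); // 0,2,4
    }
}
```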
I'd suggest materializing the query before removing the bad records - and then you can run separate queries:
var allElements = sheet
.Rows()
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows()
.ToElements(strings)
.ToList();
var goodElements = allElements.Where(el => el.FormatId != 0)
.OrderByCompositeKey();
var badElements = allElements.Where(el => el.FormatId == 0);
By materializing the query in a List<>, you only process each row once in terms of ToRowLookup, ToCellLookup etc. It does mean you need to have enough memory to keep all the elements at a time, of course. There are alternative approaches (such as taking an action on each bad element while filtering it) but they're still likely to end up being fairly fragile.
EDIT: Another option as mentioned by Servy is to use ToLookup, which will materialize and group in one go:
var lookup = sheet
.Rows()
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows()
.ToElements(strings)
.OrderByCompositeKey()
.ToLookup(el => el.FormatId == 0);
Then you can use:
foreach (var goodElement in lookup[false])
{
...
}
and
foreach (var badElement in lookup[true])
{
...
}
Note that this performs the ordering on all elements, good and bad. An alternative is to remove the ordering from the original query and use:
foreach (var goodElement in lookup[false].OrderByCompositeKey())
{
...
}
I'm not personally wild about grouping by true/false - it feels like a bit of an abuse of what's normally meant to be a key-based lookup - but it would certainly work.
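For reference, a minimal sketch of how a true/false ToLookup partitions a sequence in one pass. The ints below stand in for FormatId values and are made up; asking a lookup for a key it does not contain yields an empty sequence rather than throwing, so both loops are safe even when there are no bad (or no good) elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    static readonly int[] FormatIds = { 7, 0, 3, 0, 5 };   // hypothetical; 0 marks a bad record

    // Materializes and groups in one go; the key is "is this a bad record?".
    public static ILookup<bool, int> Partition()
    {
        return FormatIds.ToLookup(id => id == 0);
    }

    public static void Main()
    {
        var lookup = Partition();
        Console.WriteLine(string.Join(",", lookup[false])); // 7,3,5 (source order is kept)
        Console.WriteLine(lookup[true].Count());            // 2
    }
}
```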

Search in a List<DataRow>?

I have a List which I create from a DataTable which only has one column in it. Let's say the column is called MyColumn. Each element in the list is an object array containing my columns, in this case only one (MyColumn). What's the most elegant way to check if that object array contains a certain value?

var searchValue = SOME_VALUE;
var result = list.Where(row => row["MyColumn"].Equals(searchValue)); // returns collection of DataRows containing needed value
var resultBool = list.Any(row => row["MyColumn"].Equals(searchValue)); // checks, if any DataRows containing needed value exists

If you need to run this search often, it isn't convenient to write the LINQ expression each time. I'd write an extension method like this:
public static bool ContainsValue(this List<DataRow> list, object value)
{
return list.Any(dataRow => dataRow["MyColumn"].Equals(value));
}
After that, the search becomes:
if (list.ContainsValue("Value"))

http://dotnetperls.com/list-find-methods has something about exists & find.

Well, it depends on what version of C# and .NET you are on. For 3.5 you could do it with LINQ:
var qualifyingRows =
    from row in rows
    where Equals(row["MyColumn"], value)
    select row;
// We can see if we found any at all, though.
bool valueFound = qualifyingRows.FirstOrDefault() != null;
That will give you both the rows that match and a bool that tells you if you found any at all.
However if you don't have LINQ or the extension methods that come with it you will have to search the list "old skool":
DataRow matchingRow = null;
foreach (DataRow row in rows)
{
if (Equals(row["MyColumn"], value))
{
matchingRow = row;
break;
}
}
bool valueFound = matchingRow != null;
This will give you the first row that matches. It can obviously be altered to find all the rows that match, which would make the two examples more or less equal.
The LINQ version has a major difference though: the IEnumerable you get from it is deferred, so the computation will not be done until you actually enumerate its members. I do not know enough about DataRow or your application to know if this can be a problem or not, but it was a problem in a piece of my code that dealt with NHibernate. Basically I was enumerating a sequence whose members were no longer valid.
You can create your own deferred IEnumerables easily through the iterators in C# 2.0 and higher.
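A minimal sketch of such an iterator, assuming ints instead of DataRows (the Matching method and the Log list are hypothetical and exist only to make the deferral visible). Nothing in the method body runs until the returned sequence is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IteratorDemo
{
    // Records when the iterator body actually starts executing.
    public static readonly List<string> Log = new List<string>();

    // yield return turns this method into a deferred sequence.
    public static IEnumerable<int> Matching(IEnumerable<int> source, int value)
    {
        Log.Add("started");
        foreach (var item in source)
        {
            if (item == value)
                yield return item;
        }
    }

    public static void Main()
    {
        var query = Matching(new[] { 1, 2, 2 }, 2);
        Console.WriteLine(Log.Count);     // 0: calling the method ran none of its body
        Console.WriteLine(query.Count()); // 2: enumeration runs the body now
        Console.WriteLine(Log.Count);     // 1
    }
}
```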

I may have misread this, but it seems like the data is currently in a List<object[]> and not in a DataTable, so to get the items that match a certain criterion you could do something like:
var matched = items.Where(objArray => objArray.Contains(value));
items would be your list of object[]s, and matched would be an IEnumerable<object[]> holding the object[]s that contain the value.

Union two List in C#

I want to union (merge) two lists into a single List that contains the references from both. This is my code; how can I define a list suitable for this purpose?
if (e.CommandName == "AddtoSelected")
{
List<DetalleCita> lstAux = new List<DetalleCita>();
foreach (GridViewRow row in this.dgvEstudios.Rows)
{
var GridData = GetValues(row);
var GridData2 = GetValues(row);
IList AftList2 = GridData2.Values.Where(r => r != null).ToList();
AftList2.Cast<DetalleCita>();
chkEstudio = dgvEstudios.Rows[index].FindControl("ChkAsignar") as CheckBox;
if (chkEstudio.Checked)
{
IList AftList = GridData.Values.Where(r => r != null).ToList();
lstAux.Add(
new DetalleCita
{
codigoclase = Convert.ToInt32(AftList[0]),
nombreestudio = AftList[1].ToString(),
precioestudio = Convert.ToDouble(AftList[2]),
horacita = dt,
codigoestudio = AftList[4].ToString()
});
}
index++;
//this line to merge
lstAux.ToList().AddRange(AftList2);
}
dgvEstudios.DataSource = lstAux;
dgvEstudios.DataBind();
}
This is inside a RowCommand event handler.

If you want to add all entries from AftList2 to lstAux you should define AftList2 as IEnumerable<> with elements of type DetalleCita (being IEnumerable<DetalleCita> is enough to be used as parameter of AddRange() on List<DetalleCita>). For example like this:
var AftList2 = GridData2.Values.Where(r => r != null).Cast<DetalleCita>();
And then you can add all its elements to lstAux:
lstAux.AddRange(AftList2);
Clarification:
I think you are misunderstanding what the ToList() extension method does. It creates a new list from an IEnumerable<T>, and the result is not connected to the original IEnumerable<T> it was applied to.
That is why list.ToList().AddRange(...) does nothing useful: you copy list into another list (newly created by ToList()), update that copy, and then throw it away, because you never store it anywhere (as in list2 = var1.ToList()). The original var1 stays unchanged! If you call ToList(), you most likely want to keep its result.
Also, you don't usually need to convert one list to another list. ToList() is useful when you need a List<T> but have an IEnumerable<T>, which is not indexable (and you may need fast access by index) or is lazily evaluated (and you need all results computed right now). Both situations can arise with the result of a LINQ to Objects query, for example: IEnumerable<int> ints = from i in anotherInts where i > 20 select i; Even if anotherInts was a List<int>, the query result ints cannot be cast to List<int>, because it is not a list but some other implementation of IEnumerable<int>. In this case you can use ToList() to get a list anyway: List<int> ints = (from i in anotherInts where i > 20 select i).ToList();
UPDATE:
If you really mean union semantics (e.g. for { 1, 2 } and { 1, 3 } the union would be { 1, 2, 3 }, with no duplication of equal elements from the two collections), consider switching to HashSet<T> (most likely available to you, since you are using C# 3.0 and I suppose you have a recent .NET Framework), or use the Union() extension method instead of AddRange. I don't think the latter is better than the first solution, and be careful, because it works more like ToList(): a.Union(b) returns a new sequence and does NOT update either a or b.
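The differences between these options can be seen side by side in a short sketch. It uses ints rather than DetalleCita, and the Run helper is hypothetical; with a custom class, Union and HashSet<T> would only treat two objects as equal if the class overrides Equals and GetHashCode or a comparer is supplied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionDemo
{
    static string Join(IEnumerable<int> xs) { return string.Join(",", xs); }

    // Returns a description of each step, in order.
    public static string[] Run()
    {
        var a = new List<int> { 1, 2 };
        var b = new List<int> { 1, 3 };

        // Union builds a new sequence; a and b are left untouched.
        var union = a.Union(b).ToList();
        string s1 = Join(union) + " | " + Join(a);   // "1,2,3 | 1,2"

        // ToList() makes a copy, AddRange changes the copy, the copy is discarded.
        a.ToList().AddRange(b);
        string s2 = Join(a);                         // "1,2"

        // HashSet<T>.UnionWith modifies the set in place and ignores duplicates.
        var set = new HashSet<int>(a);
        set.UnionWith(b);
        string s3 = Join(set.OrderBy(x => x));       // "1,2,3"

        // AddRange on the list itself appends everything, duplicates included.
        a.AddRange(b);
        string s4 = Join(a);                         // "1,2,1,3"

        return new[] { s1, s2, s3, s4 };
    }

    public static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```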
