Search in a List<DataRow>? - c#

I have a List which I create from a DataTable that has only one column in it. Let's say the column is called MyColumn. Each element in the list is an object array containing my columns, in this case only one (MyColumn). What's the most elegant way to check whether that object array contains a certain value?

var searchValue = SOME_VALUE;
var result = list.Where(row => row["MyColumn"].Equals(searchValue)); // returns the DataRows that contain the value
var resultBool = list.Any(row => row["MyColumn"].Equals(searchValue)); // true if any DataRow contains the value

If you run this search often, writing the LINQ expression each time is inconvenient. I'd write an extension method like this (it has to be declared inside a static class):
public static bool ContainsValue(this List<DataRow> list, object value)
{
    return list.Any(dataRow => dataRow["MyColumn"].Equals(value));
}
Then search like this:
if (list.ContainsValue("Value"))

http://dotnetperls.com/list-find-methods has something about exists & find.

Well, it depends on what version C# and .NET you are on, for 3.5 you could do it with LINQ:
var qualifyingRows =
    from row in rows
    where Equals(row["MyColumn"], value)
    select row;
// We can see if we found any at all, though.
bool valueFound = qualifyingRows.FirstOrDefault() != null;
That will give you both the rows that match and a bool that tells you if you found any at all.
However if you don't have LINQ or the extension methods that come with it you will have to search the list "old skool":
DataRow matchingRow = null;
foreach (DataRow row in rows)
{
if (Equals(row["MyColumn"], value))
{
matchingRow = row;
break;
}
}
bool valueFound = matchingRow != null;
This gives you the first row that matches. It can obviously be altered to find all the rows that match, which would make the two examples more or less equal.
The LINQ version has a major difference though: the IEnumerable you get from it is deferred, so the computation will not be done until you actually enumerate its members. I do not know enough about DataRow or your application to know if this can be a problem or not, but it was a problem in a piece of my code that dealt with NHibernate. Basically I was enumerating a sequence whose members were no longer valid.
You can create your own deferred IEnumerables easily through the iterators in C# 2.0 and higher.
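To illustrate the deferred behaviour described above, here is a minimal sketch using a C# iterator. The `Matching` helper and the sample data are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Hypothetical helper: yields the items equal to 'value'.
    // Nothing in the body runs until the caller starts enumerating.
    public static IEnumerable<string> Matching(IEnumerable<string> source, string value)
    {
        foreach (var item in source)
        {
            if (Equals(item, value))
                yield return item;
        }
    }

    public static int CountAfterMutation()
    {
        var items = new List<string> { "a", "b" };
        var query = Matching(items, "a");   // no work is done yet
        items.Add("a");                     // the source changes after the query is built
        int count = 0;
        foreach (var item in query)         // enumeration happens here and sees both "a" entries
            count++;
        return count;
    }
}
```

If you need a snapshot instead, call ToList() (or ToArray()) on the query at the point where the source is still in the state you want.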

I may have misread this, but it seems like the data is currently in a List<object[]> and not in a DataTable, so to get the items that match a certain criteria you could do something like:
var matched = items.Where(objArray => objArray.Contains(value));
items would be your list of object[]:s and matched would be an IEnumerable<object[]> with the object[]:s that have the value in them.

Related

Optimum way to validate DataTable for duplicate or invalid fields in a specific column with LINQ

I am trying to find the best way to determine if a DataTable
Contains duplicate data in a specific column
or
If the fields within said column are not found in an external Dictionary<string, string> and the resulting value matches a string literal.
This is what I've come up with:
List<string> dtSKUsColumn = _dataTable.Select()
.Select(x => x.Field<string>("skuColumn"))
.ToList();
bool hasError = dtSKUsColumn.Distinct().Count() != dtSKUsColumn.Count() ||
!_dataTable.AsEnumerable()
.All(r => allSkuTypes
.Any(s => s.Value == "normalSKU" &&
s.Key == r.Field<string>("skuColumn")));
allSkuTypes is a Dictionary<string, string> where the key is the SKU itself, and the value is the SKU type.
I cannot just operate on a 'distinct' _dataTable, because there is a column that must contain identical fields (Said column cannot be removed and inferred, since I need to preserve the state of _dataTable).
So my question:
Am I handling this in the best possible way, or is there a simpler and faster method?
UPDATE:
The DataTable is not obtained via an SQL query; rather, it is generated by a set of rules from a spreadsheet or CSV. I have to make do with the allSkuTypes and _dataTable objects as my only 'outside information.'
Your solution is not optimal.
Let N = _dataTable.Rows.Count and M = allSkuTypes.Count. Your algorithm has O(2 * N) space complexity (the memory allocated by the ToList and Distinct calls) and O(N * M) time complexity (due to the linear search in allSkuTypes for each _dataTable record).
Here is IMO the optimal solution. It uses single pass over the _dataTable records, a HashSet<string> for detecting the duplicates and TryGetValue method of the Dictionary for checking the second rule, thus ending up with O(N) space and time complexity:
var dtSkus = new HashSet<string>();
bool hasError = false;
foreach (var row in _dataTable.AsEnumerable())
{
var sku = row.Field<string>("skuColumn");
string type;
if (!dtSkus.Add(sku) || !allSkuTypes.TryGetValue(sku, out type) || type != "normalSKU")
{
hasError = true;
break;
}
}
An additional benefit is that you have the row that broke the rule, and the code can easily be modified to take different actions depending on which rule is broken, collect/count only the first or all invalid records, etc.
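As a rough sketch of the "collect all invalid records" variation mentioned above: "skuColumn" and "normalSKU" come from the question, while the method name, the reason strings, and the Tuple return shape are assumptions made for this example. It also treats a null SKU as invalid, which the original loop does not handle.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SkuValidation
{
    // Returns one (row, reason) pair per offending row, in a single pass.
    // Only the first broken rule is reported for each row.
    public static List<Tuple<DataRow, string>> FindInvalidRows(
        DataTable table, Dictionary<string, string> allSkuTypes)
    {
        var seen = new HashSet<string>();
        var invalid = new List<Tuple<DataRow, string>>();
        foreach (DataRow row in table.Rows)
        {
            var sku = row["skuColumn"] as string;   // DBNull becomes null here
            string type;
            if (sku == null)
                invalid.Add(Tuple.Create(row, "missing SKU"));
            else if (!seen.Add(sku))                // Add returns false for a repeat
                invalid.Add(Tuple.Create(row, "duplicate SKU"));
            else if (!allSkuTypes.TryGetValue(sku, out type) || type != "normalSKU")
                invalid.Add(Tuple.Create(row, "unknown or non-normal SKU"));
        }
        return invalid;
    }
}
```

hasError from the original answer is then simply FindInvalidRows(_dataTable, allSkuTypes).Count > 0, at the cost of no longer stopping at the first failure.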

Loop through table of datacontext

What I would like to do is, loop through a datacontext and for each table found, select two different rows and compare the individual columns and see if the rows are equal.
So far I've made a method to compare the values of two rows and return true if all values of the rows are equal.
Now I would like to put this method into a foreach loop along the lines outlined below:
using (DataClassesDataContext db = new DataClassesDataContext(Utillities.dbconnection))
{
foreach (Table t in db)
{
var row1 = from r1 in t where r1.id == constraint1 select r1;
var row2 = from r2 in t where r2.id == constraint2 select r2;
bool compResult = CompareRows(row1, row2);
}
}
But I don't know how to construct the foreach loop, so I can make the above selections :(
I've tried db.Mapping.GetTables(), but I can't see how this gets me closer: I can only get the table names in the datacontext, not the tables themselves. Is there a way to get a table entity from a string containing the table name? Or am I missing something (likely something obvious)?
Any help or hints with the above foreach loop will be much appreciated.
This will not work unless you implement CompareRows for every combination of the possible types available. You cannot pass an anonymous type.
You can use this approach to get all the tables/columns:
http://blogs.msdn.com/b/jomo_fisher/archive/2007/07/30/linq-to-sql-trick-get-all-table-names.aspx
I would create dynamic SQL statements and use db.ExecuteQuery instead.

Filtering a string list with logic?

I have a list of strings. I need to be able to filter them in a similar way to a Google query.
Ex: NOT water OR (ice AND "fruit juice")
Meaning return strings that do not have the word water or return strings that can have water if they have ice and "fruit juice".
Is there a mechanism in .NET that can allow the user to write queries in this form (say in a textbox) and, given a List or IEnumerable of string, return the ones that match the query?
Can LINQ maybe do something like this?
I am aware that I can do this with LINQ, I'm more concerned with parsing an arbitrary string into an executable expression.
There is nothing built in.
You will need to parse such a string yourself and possibly use the Expression classes to build up an executable expression from which to filter.
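As one possible shape for such a parser, here is a small recursive-descent sketch that turns a query into a Func<string, bool> directly rather than going through the Expression classes. The class name, the operator precedence (NOT binds tighter than AND, which binds tighter than OR), and the case-insensitive substring matching are all assumptions; the question does not specify them.

```csharp
using System;
using System.Collections.Generic;

public static class QueryParser
{
    public static Func<string, bool> Parse(string query)
    {
        var tokens = Tokenize(query);
        int pos = 0;
        var predicate = ParseOr(tokens, ref pos);
        if (pos != tokens.Count)
            throw new FormatException("Unexpected token: " + tokens[pos]);
        return predicate;
    }

    // Splits the query into "(", ")", bare words, and quoted phrases.
    // A quoted phrase keeps its leading quote so it is never read as an operator.
    private static List<string> Tokenize(string query)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < query.Length)
        {
            char c = query[i];
            if (char.IsWhiteSpace(c)) { i++; }
            else if (c == '(' || c == ')') { tokens.Add(c.ToString()); i++; }
            else if (c == '"')
            {
                int end = query.IndexOf('"', i + 1);
                if (end < 0) throw new FormatException("Unterminated quote");
                tokens.Add("\"" + query.Substring(i + 1, end - i - 1));
                i = end + 1;
            }
            else
            {
                int start = i;
                while (i < query.Length && !char.IsWhiteSpace(query[i])
                       && query[i] != '(' && query[i] != ')' && query[i] != '"')
                    i++;
                tokens.Add(query.Substring(start, i - start));
            }
        }
        return tokens;
    }

    // or := and ("OR" and)*
    private static Func<string, bool> ParseOr(List<string> t, ref int pos)
    {
        var left = ParseAnd(t, ref pos);
        while (pos < t.Count && t[pos] == "OR")
        {
            pos++;
            var l = left;
            var r = ParseAnd(t, ref pos);
            left = s => l(s) || r(s);
        }
        return left;
    }

    // and := factor ("AND" factor)*
    private static Func<string, bool> ParseAnd(List<string> t, ref int pos)
    {
        var left = ParseFactor(t, ref pos);
        while (pos < t.Count && t[pos] == "AND")
        {
            pos++;
            var l = left;
            var r = ParseFactor(t, ref pos);
            left = s => l(s) && r(s);
        }
        return left;
    }

    // factor := "NOT" factor | "(" or ")" | word | quoted phrase
    private static Func<string, bool> ParseFactor(List<string> t, ref int pos)
    {
        if (pos >= t.Count) throw new FormatException("Unexpected end of query");
        string token = t[pos++];
        if (token == "NOT")
        {
            var inner = ParseFactor(t, ref pos);
            return s => !inner(s);
        }
        if (token == "(")
        {
            var inner = ParseOr(t, ref pos);
            if (pos >= t.Count || t[pos] != ")") throw new FormatException("Missing )");
            pos++;
            return inner;
        }
        if (token == ")" || token == "AND" || token == "OR")
            throw new FormatException("Unexpected token: " + token);
        string text = token.TrimStart('"');   // drop the phrase marker added by Tokenize
        return s => s.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Usage would look like strings.Where(QueryParser.Parse("NOT water OR (ice AND \"fruit juice\")")). Because terms are matched as substrings, "ice" also matches inside "juice"; matching whole words would need a different term test.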
For this query: Meaning return strings that do not have the word water or return strings that can have water if they have ice and "fruit juice".
Try something like this if you are going to use LINQ
yourList.Where(i => !i.Contains("water") ||
(i.Contains("water") &&
i.Contains("ice") &&
i.Contains("fruit juice")));
I don't think we can answer this with a code sample since, as you said, this is more about the logic.
What I would do in such a scenario is keep a set of predefined conditions (saved somewhere) containing all the operators you need along with their "coding translations", like:
and
or
and not
and or
etc...
Then at runtime you translate these conditions/criteria into SQL, LINQ, or whatever language you need to pass them to.
In LINQ: list.Where(item => !item.Contains("water") || (item.Contains("ice") && item.Contains("fruit juice")))
You can try to use the DataTable.Select method:
public static class ExpressionExtensions {
public static IEnumerable<T> Select<T>(this IEnumerable<T> self, string expression) {
var table = new DataTable();
table.Columns.Add("Value", typeof(T));
foreach (var item in self) {
var row = table.NewRow();
row["Value"] = item;
table.Rows.Add(row);
}
return table.Select(expression).Select(row => (T)row["Value"]);
}
}
But you have to follow its format to create your expression:
var filtered = strings.Select("NOT Value LIKE '*water*' OR (Value LIKE '*ice*' AND Value LIKE '*fruit juice*')");
Also note that in this case, since the string fruit juice already contains the string ice, the second condition is redundant. You'd have to find a way to express "words" and not "substrings".
In the end, you might be better off implementing the parsing logic yourself.
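On the "words, not substrings" point, one way to test for a whole word or phrase is a regular expression with word boundaries. This is only a sketch: the helper name is made up, and it assumes the search term starts and ends with a letter or digit, since \b behaves differently next to punctuation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordMatch
{
    // True when 'phrase' occurs in 'text' as a whole word or phrase.
    // Regex.Escape keeps characters in the phrase from being read as regex syntax.
    public static bool ContainsWord(string text, string phrase)
    {
        return Regex.IsMatch(text, @"\b" + Regex.Escape(phrase) + @"\b",
                             RegexOptions.IgnoreCase);
    }
}
```

With this, ContainsWord("fruit juice", "ice") is false while ContainsWord("ice cold water", "ice") is true.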

IEnumerable<T>.Union(IEnumerable<T>) overwrites contents instead of unioning

I've got a collection of items (ADO.NET Entity Framework), and need to return a subset as search results based on a couple different criteria. Unfortunately, the criteria overlap in such a way that I can't just take the collection Where the criteria are met (or drop Where the criteria are not met), since this would leave out or duplicate valid items that should be returned.
I decided I would do each check individually, and combine the results. I considered using AddRange, but that would result in duplicates in the results list (and my understanding is it would enumerate the collection every time - am I correct/mistaken here?). I realized Union does not insert duplicates, and defers enumeration until necessary (again, is this understanding correct?).
The search is written as follows:
IEnumerable<MyClass> Results = Enumerable.Empty<MyClass>();
IEnumerable<MyClass> Potential = db.MyClasses.Where(x => x.Y); //Precondition
int parsed_id;
//For each searchable value
foreach(var selected in SelectedValues1)
{
IEnumerable<MyClass> matched = Potential.Where(x => x.Value1 == selected);
Results = Results.Union(matched); //This is where the problem is
}
//Ellipsed....
foreach(var selected in SelectedValuesN) //Happens to be integer
{
if(!int.TryParse(selected, out parsed_id))
continue;
IEnumerable<MyClass> matched = Potential.Where(x => x.ValueN == parsed_id);
Results = Results.Union(matched); //This is where the problem is
}
It seems, however, that Results = Results.Union(matched) is working more like Results = matched. I've stepped through with some test data and a test search. The search asks for results where the first field is -1, 0, 1, or 3. This should return 4 results (two 0s, a 1 and a 3). The first iteration of the loops works as expected, with Results still being empty. The second iteration also works as expected, with Results containing two items. After the third iteration, however, Results contains only one item.
Have I just misunderstood how .Union works, or is there something else going on here?
Because of deferred execution, by the time you eventually consume Results, it is the union of many Where queries all of which are based on the last value of selected.
So you have
Results = Potential.Where(selected)
.Union(Potential.Where(selected))
.Union(Potential.Where(selected))...
and all the selected values are the same.
You need to create a var currentSelected = selected inside your loop and pass that to the query. That way each value of selected will be captured individually and you won't have this problem.
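The effect can be reproduced in a few lines. This sketch uses plain ints instead of the question's entities and declares the shared variable outside the loop, which is how a foreach variable behaved before C# 5; from C# 5 on, each foreach iteration gets its own variable, so the original loop only misbehaves on older compilers. All names here are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    static readonly int[] SelectedValues = { -1, 0, 1, 3 };
    static readonly int[] Data = { 0, 0, 1, 3 };

    // Every lambda captures the same 'selected' variable, so when the query
    // is finally enumerated they all compare against its last value (3).
    public static int[] SharedVariable()
    {
        IEnumerable<int> results = Enumerable.Empty<int>();
        int selected = 0;
        foreach (var value in SelectedValues)
        {
            selected = value;
            results = results.Union(Data.Where(x => x == selected));
        }
        return results.ToArray();
    }

    // A fresh local per iteration gives each lambda its own captured value.
    public static int[] LocalCopy()
    {
        IEnumerable<int> results = Enumerable.Empty<int>();
        foreach (var value in SelectedValues)
        {
            var current = value;
            results = results.Union(Data.Where(x => x == current));
        }
        return results.ToArray();
    }
}
```

Note that with ints Union collapses the two 0 entries into one; in the question the two rows are distinct objects, so both would be kept.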
You can do this much more simply:
Results = SelectedValues.SelectMany(s => Potential.Where(x => x.Value == s));
(this may return duplicates)
Or
Results = Potential.Where(x => SelectedValues.Contains(x.Value));
As pointed out by others, your LINQ expression is a closure. This means your variable selected is captured by the LINQ expression in each iteration of your foreach-loop. The same variable is used in each iteration of the foreach, so it will end up having whatever the last value was. To get around this, you will need to declare a local variable within the foreach-loop, like so:
//For each searchable value
foreach(var selected in SelectedValues1)
{
var localSelected = selected;
Results = Results.Union(Potential.Where(x => x.Value1 == localSelected));
}
It is much shorter to just use .Contains():
Results = Results.Union(Potential.Where(x => SelectedValues1.Contains(x.Value1)));
Since you need to query multiple SelectedValues collections, you could put them all inside their own collection and iterate over that as well, although you'd need some way of matching the correct field/property on your objects.
You could possibly do this by storing your lists of selected values in a Dictionary with the name of the field/property as the key. You would use Reflection to look up the correct field and perform your check. You could then shorten the code to the following:
// Store each of your searchable lists here
Dictionary<string, IEnumerable<object>> DictionaryOfSelectedValues = ...;
Type t = typeof(MyClass);
// For each list of searchable values
foreach(var selectedValues in DictionaryOfSelectedValues) // Returns KeyValuePair<TKey, TValue>
{
// Try to get a property for this key
PropertyInfo prop = t.GetProperty(selectedValues.Key);
IEnumerable<object> localSelected = selectedValues.Value;
if( prop != null )
{
Results = Results.Union(Potential.Where(x =>
localSelected.Contains(prop.GetValue(x, null))));
}
else // If it's not a property, check if the entry is for a field
{
FieldInfo field = t.GetField(selectedValues.Key);
if( field != null )
{
Results = Results.Union(Potential.Where(x =>
localSelected.Contains(field.GetValue(x))));
}
}
}
No, your use of Union is absolutely correct.
The only thing to keep in mind is that it excludes duplicates as determined by the default equality comparer. Do you have sample data?
Okay, I think you are having a problem because Union uses deferred execution.
What happens if you do the following?
var unionResults = Results.Union(matched).ToList();
Results = unionResults;

C# return generic list of objects using linq

i got a generic list that looks like this:
List<PicInfo> pi = new List<PicInfo>();
PicInfo is a class that looks like this:
[ProtoContract]
public class PicInfo
{
[ProtoMember(1)]
public string fileName { get; set; }
[ProtoMember(2)]
public string completeFileName { get; set; }
[ProtoMember(3)]
public string filePath { get; set; }
[ProtoMember(4)]
public byte[] hashValue { get; set; }
public PicInfo() { }
}
what i'm trying to do is:
first, filter the list for duplicate file names and return the duplicate objects;
then, filter the returned list for duplicate hash values;
i can only find examples on how to do this which return anonymous types. but i need it to be a generic list.
if someone can help me out, I'd appreciate it. also please explain your code. it's a learning process for me.
thanks in advance!
[EDIT]
the generic list contains a list of objects. these objects are pictures. every picture has a file name, hash value (and some more data which is irrelevant at this point). some pictures have the same name (duplicate file names). and i want to get a list of the duplicate file names from this generic list 'pi'.
But those pictures also have a hash value. from the file names that are identical, i want another list of those identical files names that also have identical hash values.
[/EDIT]
Something like this should work. Whether it is the best method I am not sure. It is not very efficient because for each element you are iterating through the list again to get the count.
List<PicInfo> pi = new List<PicInfo>();
IEnumerable<PicInfo> filt = pi.Where(x=>pi.Count(z=>z.fileName==x.fileName)>1);
I hope the code isn't too complicated to need explaining. I always think it's best to work it out on your own anyway, but if anything is confusing then just ask and I'll explain.
If you want the second filter to be filtering for the same filename and same hash being a duplicate then you just need to extend the lambda in the Count to check against hash too.
Obviously if you just want filenames at the end then it is easy enough to do a Select to get just an enumerable list of those filenames, possibly with a Distinct if you only want them to appear once.
NB. Code written by hand so do forgive typos. May not compile first time, etc. ;-)
Edit to explain code - spoilers! ;-)
In English, what we want to do is the following:
for each item in the list we want to select it if and only if there is more than one item in the list with the same filename.
Breaking this down to iterate over the list and select things based on a criteria we use the Where method. The condition of our where method is
there is more than one item in the list with the same filename
for this we clearly need to count the list so we use pi.Count. However we have a condition that we are only counting if the filename matches so we pass in an expression to tell it only to count those things.
The expression will work on each item of the list and return true if we want to count it and false if we don't want to.
The filename we are interested in is on x, the item we are filtering. So we want to count how many items have the same filename as x.fileName. Thus our expression is z=>z.fileName==x.fileName. Here z is the variable in this expression, and x.fileName is unchanging as we iterate over z.
We then of course put our criteria in of >1 to get the boolean value we want.
If you wanted those that are duplicates when considering both the filename and the hash value, you would expand the part in the Count to compare hashValue as well. Since hashValue is an array, == would only compare references, so use the SequenceEqual method to compare the arrays value by value.
So your final code to get the duplicates on both values would be:
List<PicInfo> pi = new List<PicInfo>();
List<PicInfo> filt = pi.Where(x=>pi.Count(z=>z.fileName==x.fileName && z.hashValue.SequenceEqual(x.hashValue))>1).ToList();
Note that I didn't create the intermediary list and just went straight from the original list. You could go from the intermediate list but the code would be much the same if going from the original as from a filtered list.
I think you have to use the SequenceEqual method for finding duplicates
(http://msdn.microsoft.com/ru-ru/library/bb348567.aspx).
For filter use
var p = pi.GroupBy(rs => rs.fileName) // group by name
.Where(rs => rs.Count() > 1) // keep names that occur more than once
.SelectMany(rs => rs) // keep every item in those groups
.GroupBy(rs => new { rs.fileName, Hash = Convert.ToBase64String(rs.hashValue) }) // byte[] keys compare by reference, so group on a string form of the hash
.Where(rs => rs.Count() > 1) // keep name/hash pairs that occur more than once
.SelectMany(rs => rs) // flatten back to the duplicate items
.ToList(); // List<PicInfo> of the result
