Validation in DataTable C# - c#

I have a DataTable with 20 columns and 25000 Rows. There is a column called URL and a column Language.
I need to ensure that all same URLs have the same Language.
Presently I achieve this with the following steps:
Get all distinct (unique) URLs.
Loop over the URLs with foreach and create a DataView filtered on each URL.
In the DataView, check whether all values in the Language column are the same.
List<string> all_Distinct_Urls = helperFunction.DataTableToList(master_table, "URL");
foreach (var url in all_Distinct_Urls)
{
if (!string.IsNullOrEmpty(url))
{
DataView dv = new DataView(master_table);
dv.RowFilter = "[URL] = '" + url + "'";
DataTable temp_MasterTable = dv.ToTable();
List<string> all_languages = helperFunction.DataTableToList(temp_MasterTable, "Language");
if (all_languages.Count > 1)
{
Assert.Fail();
}
}
}
public List<string> DataTableToList(DataTable masterDataTable, string columnName, bool isDistinct = true)
{
List<string> list = new List<string>();
foreach (DataRow dataRow in masterDataTable.Rows)
{
string ID = dataRow[columnName].ToString().Trim();
list.Add(ID);
}
if (isDistinct)
{
list = list.Distinct().ToList();
}
return list;
}
But the problem is that this takes a lot of time, given the number of rows and columns. Is there a faster way to achieve this?

I would use LINQ. I'm sure this approach will be a lot faster:
var invalidUrlLanguageGroups = master_table.AsEnumerable()
.GroupBy(r => r.Field<string>("Url"))
.Where(g => g.Select(r => r.Field<string>("Language")).Distinct().Skip(1).Any())
.ToList();
It groups by the URL, selects the distinct languages in each group, and checks whether there is more than one.
Testcase:
var master_table = new DataTable();
master_table.Columns.Add("Url");
master_table.Columns.Add("Language");
master_table.Rows.Add("/en-us/sample-page1", "english");
master_table.Rows.Add("/en-us/sample-page1", "german"); // fail
master_table.Rows.Add("/de-de/sample-page2", "german");
master_table.Rows.Add("/en-de/sample-page2", "english");
Note that the query collects all invalid URLs and their DataRows. If you want an even more efficient query that only determines whether there is at least one (to make the test fail), use:
bool anyInvalidUrlLanguageGroups = master_table.AsEnumerable()
.GroupBy(r => r.Field<string>("Url"))
.Any(g => g.Select(r => r.Field<string>("Language")).Distinct().Skip(1).Any());
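If the test should also report which URLs conflict, the same grouping can be projected into a dictionary. The following is only a sketch under assumptions: the column names come from the test case above, `UrlLanguageCheck` and `FindConflicts` are hypothetical names, and empty URLs are skipped as in the question's loop.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class UrlLanguageCheck
{
    // Maps each non-empty URL that has more than one distinct language
    // to the list of languages found for it.
    public static Dictionary<string, List<string>> FindConflicts(DataTable table)
    {
        return table.AsEnumerable()
            // skip null/empty URLs, mirroring the check in the question
            .Where(r => !string.IsNullOrEmpty(r.Field<string>("Url")))
            .GroupBy(r => r.Field<string>("Url"))
            .Select(g => new
            {
                Url = g.Key,
                Languages = g.Select(r => r.Field<string>("Language")).Distinct().ToList()
            })
            .Where(x => x.Languages.Count > 1)
            .ToDictionary(x => x.Url, x => x.Languages);
    }
}
```

An empty result means the table is valid; otherwise the keys can be included in the assertion message.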
How about if I want to validate that all columns are the same, not just the Language column? So if the URL is the same, then all column values should be the same.
Well, then this method would be helpful to check whether all columns (for each URL group) are equal. You can use it in many other cases too, so it would be a good candidate for an extension method:
public static bool AllItemsEqual<T>(IEnumerable<IEnumerable<T>> allSequences, IEqualityComparer<T> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<T>.Default;
IEnumerable<T> first = null;
foreach(IEnumerable<T> items in allSequences)
{
if (first == null)
first = items;
else
{
if (!items.SequenceEqual(first, comparer))
return false;
}
}
return true;
}
You will use it then in this way:
List<string> columnsExceptUrl = master_table.Columns.Cast<DataColumn>()
.Select(c => c.ColumnName)
.Where(n => n != "Url")
.ToList();
var urlRowsWithDifferentColumns = master_table.AsEnumerable()
.GroupBy(r => r.Field<string>("Url"))
.Where(g => !AllItemsEqual(g.Select(r => columnsExceptUrl.Select(c => r[c].ToString()))))
.ToList();
Again, if you just want to know whether it fails, you can make it more efficient:
bool anyUrlRowsWithDifferentColumns = master_table.AsEnumerable()
.GroupBy(r => r.Field<string>("Url"))
.Any(g => !AllItemsEqual(g.Select(r => columnsExceptUrl.Select(c => r[c].ToString()))));

Related

C# linq query - Where with multiple ANDs

I am trying to query using EF. The user can use up to 3 search words but they are not required. How do I write an EF query that will work as an AND for all of the search words that are used but be able to remove an AND for any search word that is empty?
Example, I want the following to return the first two elements in the array for s1='mobile', s2='', and s3='laptop'. It's not returning any. It should return the first two if s2 is changed to s2='burke'.
Example:
using System;
using System.Linq;
public class Simple {
public static void Main() {
string[] names = { "Burke laptop mobile", "laptop burke mobile", "Computer Laptop",
"Mobile", "Ahemed", "Sania",
"Kungada", "David","United","Sinshia" };
//search words
string s1 = "mobile";
string s2 = "";
string s3 = "laptop";
var query = from s in names
where (!string.IsNullOrEmpty(s1) && s.ToLower().Contains(s1))
&& (!string.IsNullOrEmpty(s2) && s.ToLower().Contains(s2))
&& (!string.IsNullOrEmpty(s3) && s.ToLower().Contains(s3))
orderby s
select s.ToUpper();
foreach (string item in query)
Console.WriteLine(item);
}
}
Instead of trying to write a single where statement, you can optionally extend an existing IQueryable<T> (or IEnumerable<T> like our code sample would be using) by chaining Where statements.
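IEnumerable<string> query = names; // declared as IEnumerable<string> so the filtered result can be reassigned
if (!string.IsNullOrEmpty(s1))
{
query = query.Where(x => x.ToLower().Contains(s1));
}
if (!string.IsNullOrEmpty(s2))
{
query = query.Where(x => x.ToLower().Contains(s2));
}
// ... (or use a foreach loop if s1, s2, s3 were in an array)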
var results = query.OrderBy(x => x).Select(x => x.ToUpper());
Chaining Where like this is equivalent to "anding" all where predicates together.
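The foreach variant mentioned in the comment could look roughly like this. It is a sketch, not part of the original answer: `SearchFilter` and `Filter` are hypothetical names, and the search words are lower-cased here on the assumption that matching should be case-insensitive, as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchFilter
{
    // Applies one Where per non-empty search word; empty words add no condition.
    public static List<string> Filter(IEnumerable<string> names, IEnumerable<string> searchWords)
    {
        IEnumerable<string> query = names;
        foreach (string word in searchWords)
        {
            if (string.IsNullOrEmpty(word))
                continue;

            // a fresh local per iteration, captured by the lambda below
            string lowered = word.ToLower();
            query = query.Where(x => x.ToLower().Contains(lowered));
        }
        return query.OrderBy(x => x).Select(x => x.ToUpper()).ToList();
    }
}
```

With the names array from the question and the words "mobile", "" and "laptop", this keeps only the first two entries.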
EDIT:
Your specific implementation doesn't work because the && operators are incorrect for this use case. The condition should be:
(string.IsNullOrEmpty(s1) || s.ToLower().Contains(s1)) &&
(string.IsNullOrEmpty(s2) || s.ToLower().Contains(s2)) &&
(string.IsNullOrEmpty(s3) || s.ToLower().Contains(s3))
Remember that && requires both the left and right operands to be true, so in your case !string.IsNullOrEmpty(s2) && s.ToLower().Contains(s2) says that s2 must never be null or empty. With ||, an empty search word makes its clause true, which effectively drops it from the filter.
Please consider this:
using System;
using System.Linq;
public class Simple {
public static void Main() {
string[] names = { "Burke laptop mobile", "laptop burke mobile", "Computer Laptop",
"Mobile", "Ahemed", "Sania",
"Kungada", "David","United","Sinshia" };
//search words
string s1 = "mobile";
string s2 = "";
string s3 = "laptop";
var query = from s in names
where (s1 == null || s.ToLower().Contains(s1)) // Contains("") is always true, so empty words are skipped too
&& (s2 == null || s.ToLower().Contains(s2))
&& (s3 == null || s.ToLower().Contains(s3))
orderby s
select s.ToUpper();
foreach (string item in query)
Console.WriteLine(item);
}
}
If you use an array or list instead of multiple strings, you could do something like
List<string> searchWords = new List<string>
{
"mobile",
"",
"laptop"
};
var query = names
.Where(n => searchWords
.Where(s => !string.IsNullOrEmpty(s))
.All(s => n.ToLower().Contains(s)))
.Select(n => n.ToUpper())
.OrderBy(n => n);
This also is more flexible, as you can have any number of search words, without changing the query.

Inserting value in List of values of a Key in Dictionary

I have a rowsDictionary that its keys point to a list of EmployeeSummary classes.
In those EmployeeSummary classes we also have a string property of Delivery_System
I am looping through it as shown below, but I am stuck on the next part: I want a deliverySystemFinder dictionary whose keys are the combinedKey values below, and whose value for each key is a list of distinct delivery_system values.
//rowsDictionary is a Dictionary<string, List<EmployeeSummary>>
Dictionary<string, List<string>> deliverySystemFinder = new Dictionary<string, List<string>>();
foreach (string key in rowsDictionary.Keys)
{
List<EmployeeSummary> empList = rowsDictionary[key];
foreach (EmployeeSummary emp in empList)
{
string combinedKey = emp.LastName.Trim().ToUpper() + emp.FirstName.Trim().ToUpper();
string delivery_system = emp.Delivery_System;
// so now I should go and
//A) does deliverySystemFinder have this combinedKey? if not add it.
//B) Does combinedKey in the list of its values already have the value for delivery_system? if it does not then add it
}
}
This would work, for a start:
foreach (string key in rowsDictionary.Keys)
{
List<EmployeeSummary> empList = rowsDictionary[key];
foreach (EmployeeSummary emp in empList)
{
string combinedKey = emp.LastName.Trim().ToUpper() +
emp.FirstName.Trim().ToUpper();
string delivery_system = emp.Delivery_System;
List<string> systems = null;
// check if the dictionary contains the list
if (!deliverySystemFinder.TryGetValue(combinedKey, out systems))
{
// if not, create it and add it
systems = new List<string>();
deliverySystemFinder[combinedKey] = systems;
}
// check if the list contains the value and add it
if (!systems.Contains(delivery_system))
systems.Add(delivery_system);
}
}
Now, a couple of remarks:
It doesn't make sense to iterate through Keys, and then do a lookup in each iteration. You can directly iterate KeyValuePairs using a foreach loop.
Using concatenated strings as unique keys often fails. In this case, what happens if you have users { LastName="Some", FirstName="Body" } and { LastName="So", FirstName="Mebody" } in your list?
Checking if a List contains a value is an O(n) operation. You would greatly improve performance if you used a HashSet<string> instead.
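Applied to the loop above, the first and third remarks might look like the sketch below. The `EmployeeSummary` class here is a stand-in with only the members used, `DeliverySystems.Build` is a hypothetical name, and the "|" separator in the key is an assumption meant to reduce the collision described in the second remark (it does not rule out collisions if names can contain "|").

```csharp
using System.Collections.Generic;

// Stand-in for the real class; only the members used below.
public class EmployeeSummary
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Delivery_System { get; set; }
}

public static class DeliverySystems
{
    public static Dictionary<string, HashSet<string>> Build(
        Dictionary<string, List<EmployeeSummary>> rowsDictionary)
    {
        var finder = new Dictionary<string, HashSet<string>>();

        // Iterate the key/value pairs directly instead of looking up each key.
        foreach (KeyValuePair<string, List<EmployeeSummary>> pair in rowsDictionary)
        {
            foreach (EmployeeSummary emp in pair.Value)
            {
                // Assumed separator so that "Some"+"Body" and "So"+"Mebody" differ.
                string combinedKey = emp.LastName.Trim().ToUpper() + "|" +
                                     emp.FirstName.Trim().ToUpper();

                HashSet<string> systems;
                if (!finder.TryGetValue(combinedKey, out systems))
                {
                    systems = new HashSet<string>();
                    finder[combinedKey] = systems;
                }

                // HashSet.Add ignores values that are already present.
                systems.Add(emp.Delivery_System);
            }
        }
        return finder;
    }
}
```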
Finally, the simplest way to achieve what you're trying to do is to ditch those loops and simply use:
// returns a Dictionary<EmployeeSummary, List<string>>
// which maps each distinct EmployeeSummary into a list of
// distinct delivery systems
var groupByEmployee = rowsDictionary
.SelectMany(kvp => kvp.Value)
.GroupBy(s => s, new EmployeeSummaryEqualityComparer())
.ToDictionary(
s => s.Key,
s => s.Select(x => x.Delivery_System).Distinct().ToList());
With EmployeeSummaryEqualityComparer defined something like:
class EmployeeSummaryEqualityComparer : IEqualityComparer<EmployeeSummary>
{
public bool Equals(EmployeeSummary x, EmployeeSummary y)
{
if (object.ReferenceEquals(x, null))
return object.ReferenceEquals(y, null);
if (object.ReferenceEquals(y, null))
return false;
return
x.FirstName == y.FirstName &&
x.LastName == y.LastName;
// && ... (add other properties, depending on what constitutes 'equal' for you)
}
public int GetHashCode(EmployeeSummary x)
{
unchecked
{
var h = 31; // null checks might not be necessary?
h = h * 7 + (x.FirstName != null ? x.FirstName.GetHashCode() : 0);
h = h * 7 + (x.LastName != null ? x.LastName.GetHashCode() : 0);
// ... other properties similarly ...
return h;
}
}
}
If you really think that using the string key will work in all your cases, you can do it without the custom equality comparer:
// returns a Dictionary<string, List<string>>
var groupByEmployee = rowsDictionary
.SelectMany(kvp => kvp.Value)
.GroupBy(s => s.LastName.ToUpper() + s.FirstName.ToUpper())
.ToDictionary(
s => s.Key,
s => s.Select(x => x.Delivery_System).Distinct().ToList());

Sorting strings in C#

I have a delimited string that I need sorted. First I need to check if 'Francais' is in the string, if so, it goes first, then 'Anglais' is next, if it exists. After that, everything else is alphabetical. Can anyone help me? Here's what I have so far, without the sorting
private string SortFrench(string langs)
{
string _frenchLangs = String.Empty;
string retval = String.Empty;
_frenchLangs = string.Join(" ; ",langs.Split(';').Select(s => s.Trim()).ToArray());
if (_frenchLangs.Contains("Francais"))
retval += "Francais";
if (_frenchLangs.Contains("Anglais"))
{
if (retval.Length > 0)
retval += " ; ";
retval += "Anglais";
}
//sort the rest
return retval;
}
Someone liked my comment, so figured I'd go ahead and convert that into your code:
private string SortFrench(string langs)
{
var sorted = langs.Split(';')
.Select(s => s.Trim())
.OrderByDescending( s => s == "Francais" )
.ThenByDescending( s => s == "Anglais" )
.ThenBy ( s => s )
.ToArray();
return string.Join(" ; ",sorted);
}
My syntax may be off slightly as I've been in the Unix world for awhile now and haven't used much LINQ lately, but hope it helps.
Here's what I came up with. You could replace the .Sort() with an OrderBy(lang => lang) after the Select, but I find it cleaner this way.
public string SortLanguages(string langs)
{
List<string> languages = langs.Split(';').Select(s => s.Trim()).ToList();
languages.Sort();
// Contains and Remove are case-sensitive, so match the spelling used in the question
PlaceAtFirstPositionIfExists(languages, "Anglais");
PlaceAtFirstPositionIfExists(languages, "Francais");
return string.Join(" ; ", languages);
}
private void PlaceAtFirstPositionIfExists(IList<string> languages, string language)
{
if (languages.Contains(language))
{
languages.Remove(language);
languages.Insert(0, language);
}
}
You should use a custom comparer class. It will allow you to use the built-in collection sorting functions, or the LINQ OrderBy, with your own criteria.
Try this:
private string SortFrench(string langs)
{
string _frenchLangs = String.Empty;
List<string> languages = langs
.Split(';')
.Select(s => s.Trim())
.OrderBy(s => s)
.ToList();
int insertAt = 0;
if (languages.Contains("Francais"))
{
languages.Remove("Francais");
languages.Insert(insertAt, "Francais");
insertAt++;
}
if(languages.Contains("Anglais"))
{
languages.Remove("Anglais");
languages.Insert(insertAt, "Anglais");
}
_frenchLangs = string.Join(" ; ", languages);
return _frenchLangs;
}
It can all be done in a single statement:
private string SortFrench(string langs)
{
return string.Join(" ; ", langs.Split(';').Select(s => s.Trim())
.OrderBy(x => x != "Francais")
.ThenBy(x => x != "Anglais")
.ThenBy(x=>x));
}
Sorting alphabetically is simple; adding .OrderBy(s => s) before that .ToArray() will do that. Sorting based on the presence of keywords is trickier.
The quick and dirty way is to split into three:
Strings containing "Francais": .Where(s => s.Contains("Francais"))
Strings containing "Anglais": .Where(s => s.Contains("Anglais"))
The rest: .Where(s => !francaisList.Contains(s) && !anglaisList.Contains(s))
Then you can sort each of these alphabetically, and concatenate them.
Alternatively, you can implement IComparer using the logic you described:
For strings A and B:
If A contains "Francais"
    If B contains "Francais", order alphabetically
    Else, A goes first
Else
    If B contains "Francais", B goes first
    Else
        If A contains "Anglais"
            If B contains "Anglais", order alphabetically
            Else, A goes first
        Else
            If B contains "Anglais", B goes first
            Else, order alphabetically
There may be room for logical re-arrangement to simplify that. With all that logic wrapped up in a class that implements IComparer, you can specify that class for use by .OrderBy() to have it order your query results based on your custom logic.
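One way to simplify that logic is to give each string a rank and compare ranks before falling back to alphabetical order. The sketch below is an illustration only: `PreferredFirstComparer` and `LanguageSorter` are hypothetical names, it matches whole entries exactly rather than using "contains", and it uses ordinal comparison for the alphabetical part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PreferredFirstComparer : IComparer<string>
{
    private readonly string[] preferred;

    public PreferredFirstComparer(params string[] preferred)
    {
        this.preferred = preferred;
    }

    // Position in the preferred list, or preferred.Length for everything else.
    private int Rank(string s)
    {
        int index = Array.IndexOf(preferred, s);
        return index >= 0 ? index : preferred.Length;
    }

    public int Compare(string a, string b)
    {
        int byRank = Rank(a).CompareTo(Rank(b));
        // Same rank: fall back to alphabetical (ordinal) order.
        return byRank != 0 ? byRank : string.Compare(a, b, StringComparison.Ordinal);
    }
}

public static class LanguageSorter
{
    public static string Sort(string langs)
    {
        var comparer = new PreferredFirstComparer("Francais", "Anglais");
        return string.Join(" ; ",
            langs.Split(';').Select(s => s.Trim()).OrderBy(s => s, comparer));
    }
}
```

For example, `LanguageSorter.Sort("foo;Anglais;bar;Francais")` returns "Francais ; Anglais ; bar ; foo".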
You can also use Array.Sort(yourStringArray)
This code creates a list of the languages, sorts them using a custom comparer, and then puts the sorted list back together:
const string langs = "foo;bar;Anglais;Francais;barby;fooby";
var langsList = langs.Split(';').ToList();
langsList.Sort((s1, s2) =>
{
if (s1 == s2)
return 0;
if (s1 == "Francais")
return -1;
if (s2 == "Francais")
return 1;
if (s1 == "Anglais")
return -1;
if (s2 == "Anglais")
return 1;
return s1.CompareTo(s2);
});
var sortedList = string.Join(";", langsList);
Console.WriteLine(sortedList);
This way you can set any list of words in front:
private static string SortFrench(string langs, string[] setStartList)
{
string _frenchLangs = String.Empty;
List<string> list = langs.Split(';').Select(s => s.Trim()).ToList();
list.Sort();
List<string> tempList = new List<string>();
foreach (var item in setStartList)
{
// move each preferred item that is actually present to the front, in the given order
if (list.Remove(item))
{
tempList.Add(item);
}
}
tempList.AddRange(list);
list = tempList;
_frenchLangs = string.Join(" ; ", list);
return _frenchLangs;
}

Elegant way to check if a list contains an object where one property is the same, and replace only if the date of another property is later

I have a class as follows :
Object1{
int id;
DateTime time;
}
I have a list of Object1. I want to cycle through another list of Object1, search for an Object1 with the same ID and replace it in the first list if the time value is later than the time value in the list. If the item is not in the first list, then add it.
I'm sure there is an elegant way to do this, perhaps using linq? :
List<Object1> listOfNewestItems = new List<Object1>();
List<Object1> listToCycleThrough = MethodToReturnList();
foreach(Object1 object in listToCycleThrough){
if(listOfNewestItems.Contains(//object1 with same id as object))
{
//check date, replace if time property is > existing time property
} else {
listOfNewestItems.Add(object)
}
Obviously this is very messy (and that's without even doing the check of properties which is messier again...), is there a cleaner way to do this?
var finalList = list1.Concat(list2)
.GroupBy(x => x.id)
.Select(x => x.OrderByDescending(y=>y.time).First())
.ToList();
Here is the full code to test:
public class Object1
{
public int id;
public DateTime time;
}
List<Object1> list1 = new List<Object1>()
{
new Object1(){id=1,time=new DateTime(1991,1,1)},
new Object1(){id=2,time=new DateTime(1992,1,1)}
};
List<Object1> list2 = new List<Object1>()
{
new Object1(){id=1,time=new DateTime(2001,1,1)},
new Object1(){id=3,time=new DateTime(1993,1,1)}
};
and OUTPUT:
1 01.01.2001
2 01.01.1992
3 01.01.1993
This is how to check:
// "object" is a C# keyword, so the loop variable is named item;
// member names match the fields of Object1 (id, time)
foreach (var item in listToCycleThrough)
{
var currentObject = listOfNewestItems
.SingleOrDefault(obj => obj.id == item.id);
if (currentObject != null)
{
if (currentObject.time < item.time)
currentObject.time = item.time;
}
else
listOfNewestItems.Add(item);
}
But if you have a lot of data, I would suggest using a Dictionary for the newest items: each lookup is then O(1) instead of O(n).
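A sketch of that Dictionary-based variant follows. `NewestItems.Merge` is a hypothetical name, and `Object1` is repeated only to keep the example self-contained. Unlike the loop above, it keeps the newer object itself rather than copying its time, and the order of the returned list is not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Object1
{
    public int id;
    public DateTime time;
}

public static class NewestItems
{
    // Keeps, for each id, the item with the latest time across both lists.
    public static List<Object1> Merge(IEnumerable<Object1> current, IEnumerable<Object1> incoming)
    {
        var newestById = new Dictionary<int, Object1>();
        foreach (Object1 item in current.Concat(incoming))
        {
            Object1 existing;
            // Add unseen ids; replace an existing entry only if this one is later.
            if (!newestById.TryGetValue(item.id, out existing) || item.time > existing.time)
                newestById[item.id] = item;
        }
        return newestById.Values.ToList();
    }
}
```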
You can use LINQ: Enumerable.Except to get the set difference (the newest), and a join to find the newer objects.
var listOfNewestIDs = listOfNewestItems.Select(o => o.id);
var listToCycleIDs = listToCycleThrough.Select(o => o.id);
var newestIDs = listOfNewestIDs.Except(listToCycleIDs);
var newestObjects = from obj in listOfNewestItems
join objID in newestIDs on obj.id equals objID
select obj;
var updateObjects = from newObj in listOfNewestItems
join oldObj in listToCycleThrough on newObj.id equals oldObj.id
where newObj.time > oldObj.time
select new { oldObj, newObj };
foreach (var updObject in updateObjects)
updObject.oldObj.time = updObject.newObj.time;
listToCycleThrough.AddRange(newestObjects);
Note that you need to add using System.Linq;.
Here's a demo: http://ideone.com/2ASli
I'd create a Dictionary to lookup the index for an Id and use that
var newItems = new List<Object1> { ...
IList<Object1> itemsToUpdate = ...
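// Select's indexed overload passes (element, index), in that order
var lookup = itemsToUpdate.
Select((o, i) => new { Key = o.id, Value = i }).
ToDictionary(x => x.Key, x => x.Value);
foreach (var newItem in newItems)
{
int i;
if (lookup.TryGetValue(newItem.id, out i))
{
if (newItem.time > itemsToUpdate[i].time)
{
itemsToUpdate[i] = newItem;
}
}
else
{
itemsToUpdate.Add(newItem);
// remember the new index so that repeated ids in newItems are handled
lookup[newItem.id] = itemsToUpdate.Count - 1;
}
}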
That way, you wouldn't need to re-enumerate the list for each new item, and you'd benefit from the hash lookup performance.
This should work however many times an Id is repeated in the list of new items.

Find the count of duplicate items in a C# List

I am using List in C#. Code is as mentioned below:
TestCase.cs
public class TestCase
{
private string scenarioID;
private string error;
public string ScenarioID
{
get
{
return this.scenarioID;
}
set
{
this.scenarioID = value;
}
}
public string Error
{
get
{
return this.error;
}
set
{
this.error = value;
}
}
public TestCase(string arg_scenarioName, string arg_error)
{
this.ScenarioID = arg_scenarioName;
this.Error = arg_error;
}
}
The list I am creating is:
private List<TestCase> GetTestCases()
{
List<TestCase> scenarios = new List<TestCase>();
TestCase scenario1 = new TestCase("Scenario1", string.Empty);
TestCase scenario2 = new TestCase("Scenario2", string.Empty);
TestCase scenario3 = new TestCase("Scenario1", string.Empty);
TestCase scenario4 = new TestCase("Scenario4", string.Empty);
TestCase scenario5 = new TestCase("Scenario1", string.Empty);
TestCase scenario6 = new TestCase("Scenario6", string.Empty);
TestCase scenario7 = new TestCase("Scenario7", string.Empty);
scenarios.Add(scenario1);
scenarios.Add(scenario2);
scenarios.Add(scenario3);
scenarios.Add(scenario4);
scenarios.Add(scenario5);
scenarios.Add(scenario6);
scenarios.Add(scenario7);
return scenarios;
}
Now I am iterating through the list. I want to find how many duplicate test cases there are in the list with the same ScenarioID. Is there any way to solve this using LINQ or a built-in method of List?
Regards,
Priyank
Try this:
var numberOfTestcasesWithDuplicates =
scenarios.GroupBy(x => x.ScenarioID).Count(x => x.Count() > 1);
As a first idea:
int dupes = list.Count() - list.Distinct(aTestCaseComparer).Count();
To just get the duplicate count:
int duplicateCount = scenarios.GroupBy(x => x.ScenarioID)
.Sum(g => g.Count()-1);
var groups = scenarios.GroupBy(test => test.ScenarioID)
.Where(group => group.Skip(1).Any());
That will give you a group for each ScenarioID that has more than one item. The count of the groups is the number of duplicated ScenarioIDs, and the count of each group is the number of occurrences of that ScenarioID.
Additional note, the .Skip(1).Any() is there because a .Count() in the Where clause would need to iterate every single item just to find out that there is more than one.
Something like this maybe
var result= GetTestCases()
.GroupBy (x =>x.ScenarioID)
.Select (x =>new{x.Key,nbrof=x.Count ()} );
To get total number of duplicates, yet another:
var set = new HashSet<string>();
var result = scenarios.Count(x => !set.Add(x.ScenarioID));
To get distinct duplicates:
var result = scenarios.GroupBy(x => x.ScenarioID).Count(x => x.Skip(1).Any());
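The answers above count two different things, which is easy to miss. As a sketch (the class and method names are hypothetical, and plain ScenarioID strings stand in for TestCase objects), the two quantities can be put side by side:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCounts
{
    // Number of distinct ids that occur more than once.
    public static int GroupsWithDuplicates(IEnumerable<string> ids)
    {
        return ids.GroupBy(id => id).Count(g => g.Skip(1).Any());
    }

    // Number of surplus copies, i.e. occurrences beyond the first of each id.
    public static int ExtraCopies(IEnumerable<string> ids)
    {
        var seen = new HashSet<string>();
        // HashSet.Add returns false when the id was already seen.
        return ids.Count(id => !seen.Add(id));
    }
}
```

For the list in the question (Scenario1 three times, every other id once), `GroupsWithDuplicates` returns 1 and `ExtraCopies` returns 2.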
