C# lambda - Differences (Except) of inner lists inside two lists

I have two lists of Table objects, where each Table has many Columns (1:N).
The first list holds the default schema that must be achieved.
The second list holds the schema defined by the user.
I need to compare the second list against the first one, retrieving the tables where the schema mismatches, also the list of columns missing/unknown.
Consider the following example:
public class Table
{
public string Name {get;set;}
public IList<Column> Columns {get;set;}
public Table()
{
Columns = new List<Column>();
}
}
public class Column
{
public string Name {get;set;}
}
//...
var Default1 = new Table() { Name = "Table1" };
Default1.Columns.Add(new Column() { Name = "X1" });
Default1.Columns.Add(new Column() { Name = "X2" });
var Default2 = new Table() { Name = "Table2" };
Default2.Columns.Add(new Column() { Name = "Y1" });
Default2.Columns.Add(new Column() { Name = "Y2" });
var DefaultSchema = new List<Table>() { Default1, Default2 };
var T1 = new Table() { Name = "Table1" };
T1.Columns.Add(new Column() { Name = "X1" });
var T2 = new Table() { Name = "Table2" };
T2.Columns.Add(new Column() { Name = "Y2" });
var MyTables = new List<Table>() { T1, T2};
/*
var DiffTables = DefaultSchema.Join(??).Select(x => x.Columns).Except(?? MyTables.Select(y => y.Columns) ...
*/
Expected result:
var DiffTables =
{
{
Name = "Table1",
Columns =
{
Name = "X2" //Missing from DefaultSchema.Table1
}
},
{
Name = "Table2",
Columns =
{
Name = "Y1" //Missing from DefaultSchema.Table2
}
}
}
Is there any way of doing this with a lambda expression, or just with an outer and a nested foreach?
Thanks!

For comparing just two tables, it would be:
Default1.Columns
.Select(x => x.Name)
.Except(T1.Columns.Select(x => x.Name));
For comparing two schemas, it would be:
DefaultSchema
.Zip(MyTables, (x, y) => new
{ Name = x.Name,
MissingColumns = x.Columns.Select(x1 => x1.Name)
.Except(y.Columns.Select(y1 => y1.Name)) });
Zip combines any two sequences, so that item 1 gets matched with item 1, item 2 gets matched with item 2, etc. (in other words, like a zipper).
Except removes all items of one sequence from another sequence.
As @MetroSmurf pointed out, my original version had an error that was causing Except to fail. The reason is that it was comparing the columns based on whether they refer to the same object. I added the inner Select statements so that the columns are compared by Name instead.
Note also that this answer assumes the two schemas being compared have the same tables in the same order.
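To make the reference-versus-name point concrete, here is a minimal sketch. The ExceptDemo class and its two method names are made up for illustration; Column is the class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Column
{
    public string Name { get; set; }
}

// Hypothetical helper, only here to contrast the two comparisons.
public static class ExceptDemo
{
    // Compares Column objects directly. Column does not override Equals,
    // so two columns with the same Name still count as different items.
    public static List<string> ByReference(IEnumerable<Column> expected, IEnumerable<Column> actual)
    {
        return expected.Except(actual).Select(c => c.Name).ToList();
    }

    // Projects to Name first, so Except compares strings by value.
    public static List<string> ByName(IEnumerable<Column> expected, IEnumerable<Column> actual)
    {
        return expected.Select(c => c.Name)
                       .Except(actual.Select(c => c.Name))
                       .ToList();
    }
}
```

With the columns X1 and X2 as expected and a separately created X1 as actual, ByReference still reports both X1 and X2, while ByName reports only X2.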
Another way to go (inspired by @MetroSmurf's use of IEquatable) is to create a custom IEqualityComparer, like this:
public class ColumnComparer : IEqualityComparer<Column>
{
public bool Equals(Column x, Column y)
{
if (Object.ReferenceEquals(x, y)) return true;
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) return false;
return x.Name == y.Name;
}
public int GetHashCode(Column column)
{
if (Object.ReferenceEquals(column, null)) return 0;
// Guard against a null Name, which Equals above already tolerates.
return column.Name == null ? 0 : column.Name.GetHashCode();
}
}
Then the LINQ query reduces down to just this:
DefaultSchema.Zip(MyTables, (x, y) => new
{
Name = x.Name,
MissingColumns = x.Columns.Except(y.Columns, new ColumnComparer())
});
Again, the same assumption that the tables are equivalent for the two schemas applies.
If this assumption doesn't apply (i.e., MyTables are not in order or may be missing tables), you can use a "left outer join" instead:
var result =
from table in DefaultSchema
join myTable in MyTables on table.Name equals myTable.Name into matchingTables
from matchingTable in matchingTables.DefaultIfEmpty()
select new
{
Name = table.Name,
MissingColumns = matchingTable == null
? null
: table.Columns.Except(matchingTable.Columns, new ColumnComparer())
};
With this query, a result is generated for every table in DefaultSchema. If one or more of the MyTables has the same name, the missing columns get reported. If the table is missing from MyTables, the value of MissingColumns is null. Note that this will not report on any extra tables in MyTables that don't exist in DefaultSchema.

Here is how you can do it:
var result =
DefaultSchema
.Select(
table =>
new
{
Table = table,
UserTable = MyTables.FirstOrDefault(utable => utable.Name == table.Name)
})
.Select(item => new
{
Name = item.Table.Name,
MissingColumns =
item.UserTable == null
? item.Table.Columns.Select(x => x.Name).ToArray()
: item.Table.Columns.Select(x => x.Name)
.Except(item.UserTable.Columns.Select(x => x.Name))
.ToArray()
}).ToList();
This code handles the case where the two lists are not guaranteed to have the same number of tables or to have the tables in the correct order.
It starts by selecting each default schema table together with its corresponding user table (or null if the user table is not found).
Then, for each such object, it creates a new object that contains the default schema table name and the list of missing columns.
If a corresponding user table is not found, the list of missing columns consists of all the columns in the default schema table.
If a corresponding user table is found, Except is used to subtract the list of columns defined in the user table from the columns defined in the default schema table.
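The question also asks for unknown columns (present in the user schema but not in the default one). Here is a rough sketch of one way to report both directions per table. TableDiff and SchemaDiff are hypothetical names that do not appear in the question, and the sketch assumes table names are unique within MyTables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Column
{
    public string Name { get; set; }
}

public class Table
{
    public string Name { get; set; }
    public IList<Column> Columns { get; set; }
    public Table() { Columns = new List<Column>(); }
}

// Hypothetical result type, not part of the original question.
public class TableDiff
{
    public string Name { get; set; }
    public List<string> Missing { get; set; }  // in the default schema only
    public List<string> Unknown { get; set; }  // in the user schema only
}

public static class SchemaDiff
{
    public static List<TableDiff> Compare(IEnumerable<Table> defaults, IEnumerable<Table> users)
    {
        // Index user tables by name; assumes table names are unique.
        var userByName = users.ToDictionary(t => t.Name);

        return defaults
            .Select(d =>
            {
                Table u;
                userByName.TryGetValue(d.Name, out u);  // u stays null when not found
                var expected = d.Columns.Select(c => c.Name).ToList();
                var actual = u == null
                    ? new List<string>()
                    : u.Columns.Select(c => c.Name).ToList();
                return new TableDiff
                {
                    Name = d.Name,
                    Missing = expected.Except(actual).ToList(),
                    Unknown = actual.Except(expected).ToList()
                };
            })
            // Keep only the tables whose columns actually differ.
            .Where(diff => diff.Missing.Count > 0 || diff.Unknown.Count > 0)
            .ToList();
    }
}
```

Calling SchemaDiff.Compare(DefaultSchema, MyTables) with the data from the question should give Table1 with X2 missing and Table2 with Y1 missing. Like the query above, it does not report user tables that have no counterpart in the default schema.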

Using a complex Linq query is going to over-complicate what would be an otherwise easier to understand and maintainable loop. An alternative to devuxer's suggestion (which works great assuming everything in both lists can be zipped):
First, I'd implement IEquatable for an easy comparison:
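public class Column : IEquatable<Column>
{
public string Name { get; set; }
public bool Equals( Column other )
{
// consider case insensitive comparison if needed.
return other != null && Name == other.Name;
}
// Except hashes items before it calls Equals, so GetHashCode must agree with Equals( Column ).
public override bool Equals( object obj ) { return Equals( obj as Column ); }
public override int GetHashCode() { return Name == null ? 0 : Name.GetHashCode(); }
}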
The loop then becomes:
var diffs = new List<Table>();
foreach( Table table in MyTables )
{
Table schema = DefaultSchema
// consider case insensitive comparison if needed.
.FirstOrDefault( x => x.Name == table.Name );
if( schema == null )
{
// no matching schema, everything should be evaluated.
diffs.Add( table );
continue;
}
// use IEquatable to pull out the differences
List<Column> columns = table.Columns.Except( schema.Columns ).ToList();
if( columns.Any() )
{
diffs.Add( new Table { Name = table.Name, Columns = columns } );
}
}

Related

Multiple condition on same column inside Linq where

How can I write a linq query to match two condition on same column in the table?
Here one person can be assigned to multiple types of works and it is store in PersonWorkTypes table containing the details of persons and their worktypes.
So I need to get the list of persons who have both fulltime and freelance works.
I have tried
people.Where(w => w.worktype == "freelance" && w.worktype == "fulltime")
But it returns an empty result.
You can try this
public class Person {
public string Name {get;set;}
public List<PersonWorkType> PersonWorkTypes {get;set;}
}
public class PersonWorkType {
public string Type {get;set;}
}
public static void Main()
{
var people = new List<Person>();
var person = new Person { Name = "Toño", PersonWorkTypes = new List<PersonWorkType>() { new PersonWorkType { Type = "freelance" } } };
var person2 = new Person { Name = "Aldo", PersonWorkTypes = new List<PersonWorkType>() { new PersonWorkType { Type = "freelance" }, new PersonWorkType { Type = "fulltime" } } };
var person3 = new Person { Name = "John", PersonWorkTypes = new List<PersonWorkType>() { new PersonWorkType { Type = "freelance" }, new PersonWorkType { Type = "fulltime" } } };
people.Add(person);
people.Add(person2);
people.Add(person3);
var filter = people.Where(p => p.PersonWorkTypes.Any(t => t.Type == "freelance") && p.PersonWorkTypes.Any(t => t.Type == "fulltime"));
foreach(var item in filter) {
Console.WriteLine(item.Name);
}
}
This returns the people whose PersonWorkTypes contain both types.
As already said, the && operator means that BOTH conditions have to be met. Your condition therefore requires worktype to be freelance and fulltime at the same time, which is not possible :)
Most probably you want employees that have work type freelance OR fulltime, thus your condition should be:
people.Where(w=>w.worktype=="freelance" || w.worktype =="fulltime")
Or, if a person can appear more than once in this table, you could do:
people
.Where(w=>w.worktype=="freelance" || w.worktype =="fulltime")
// here I assume that you have name of a person,
// Basically, here I group by person
.GroupBy(p => p.Name)
// Here we check if any person has two entries,
// but you have to be careful here, as if person has two entries
// with worktype freelance or two entries with fulltime, it
// will pass condition as well.
.Where(grp => grp.Count() == 2)
.Select(grp => grp.FirstOrDefault());
w.worktype=="freelance"
w.worktype=="fulltime"
These are mutually exclusive to each other, and therefore cannot both be true to ever satisfy your AND(&&) operator.
I am inferring that you have two (or more) different rows in your table per person, one for each type of work they do. If so, the Where() method checks your list one row at a time and cannot look at two different elements to see whether Alice (for example) has both an entry for "freelance" and an entry for "fulltime". Unfortunately, I can't think of an easy way to do this in a single query, but something like this might work:
var fulltimeWorkers = people.Where(w=>w.worktype=="fulltime");
var freelanceWorkers = people.Where(w=>w.worktype=="freelance");
List<Person> peopleWhoDoBoth = new List<Person>();
foreach (var worker in fulltimeWorkers)
{
if (freelanceWorkers.Contains(worker))
peopleWhoDoBoth.Add(worker);
}
This is probably not the most efficient way possible of doing it, but for small data sets, it shouldn't matter.
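If the table really has one row per person and work type, grouping by person first is another option, and it avoids the duplicate-row pitfall mentioned in the GroupBy answer above. The sketch below is only an illustration: PersonWorkTypeRow, WorkTypeQueries and NamesWithAll are assumed names, and it runs against an in-memory list rather than a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: one row per (person, work type) pair.
public class PersonWorkTypeRow
{
    public string Name { get; set; }
    public string WorkType { get; set; }
}

public static class WorkTypeQueries
{
    // Returns the names of people who have a row for every required work type.
    public static List<string> NamesWithAll(IEnumerable<PersonWorkTypeRow> rows, params string[] required)
    {
        var requiredSet = new HashSet<string>(required);

        return rows
            .Where(r => requiredSet.Contains(r.WorkType))
            .GroupBy(r => r.Name)
            // Count distinct types so duplicate rows do not produce false matches.
            .Where(g => g.Select(r => r.WorkType).Distinct().Count() == requiredSet.Count)
            .Select(g => g.Key)
            .ToList();
    }
}
```

For example, WorkTypeQueries.NamesWithAll(rows, "freelance", "fulltime") keeps a person with one freelance row and one fulltime row, but not a person who merely has two freelance rows.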

Creating LINQ with where clause

I have this LINQ:
private static object getObjectModels(DbContext context, IQueryable<int> contractsID)
{
return (from objectModel in context.Set<ObjectModel>()
where "column named conId contains contractsID "
select new ContractHelper
{
Id = contract.Id,
ClientId = contract.ClientId,
});
}
I need to select the records from the table whose conId column value appears in contractsID.
The contractsID is int array.
The conID is int value column.
What do I have to write on this line:
where "column named conId contains contractsID"
to get all records whose conId is equal to one of the items in the contractsID array?
You might be able to invert the where clause and use a 'contains', such as:
private static object getObjectModels(DbContext context, IQueryable<int> contractsID)
{
return (from objectModel in context.Set<ObjectModel>()
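where contractsID.Contains(objectModel.conId) // conId is a plain int, so no HasValue check is needed
select new ContractHelper
{
Id = objectModel.Id, // the range variable is objectModel; "contract" is not defined here
ClientId = objectModel.ClientId,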
});
}
You might need to convert the IQueryable to a list however.
var myIds = contractsID.ToList();
...
where myIds.Contains(objectModel.conId)
...
You can use an int array so that LINQ translates the Contains call into a SQL IN clause:
private static object getObjectModels(DbContext context, IQueryable<int> contractsID)
{
// Necessary to translate Contains to SQL IN CLAUSE
int [] contractIdsArray = contractsID.ToArray() ;
return (from objectModel in context.Set<ObjectModel>()
where contractIdsArray.Contains(objectModel.conId)
select new ContractHelper
{
Id = objectModel.Id, // the range variable is objectModel; "contract" is not defined here
ClientId = objectModel.ClientId,
});
}
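For reference, here is the same filter written against in-memory collections. This is only a sketch: the property types on ObjectModel and ContractHelper are assumptions, and ContractQueries.GetObjectModels is a LINQ to Objects stand-in, so it says nothing about the SQL a database provider would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes; the question does not show these classes.
public class ObjectModel
{
    public int Id { get; set; }
    public int ClientId { get; set; }
    public int conId { get; set; }
}

public class ContractHelper
{
    public int Id { get; set; }
    public int ClientId { get; set; }
}

public static class ContractQueries
{
    public static List<ContractHelper> GetObjectModels(IEnumerable<ObjectModel> models, IEnumerable<int> contractsID)
    {
        // Materialize the ids once; a HashSet gives fast in-memory lookups.
        var ids = new HashSet<int>(contractsID);

        return models
            .Where(m => ids.Contains(m.conId))
            .Select(m => new ContractHelper { Id = m.Id, ClientId = m.ClientId })
            .ToList();
    }
}
```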

Linq - complex query - list in a list

I have this class:
public class RecipeLine
{
public List<string> PossibleNames { get; set; }
public string Name { get; set; }
public int Index { get; set; }
}
I have a list of multiple RecipeLine objects. For example, one of them looks like this:
Name: apple
PossibleNames: {red delicious, yellow delicious, ... }
Index = 3
I also have a table in my db called tblFruit, which has 2 columns: name and id. The id isn't the same as the Index in the class.
What I want to do is this:
For the whole list of RecipeLine objects, find all the records in tblFruit whose name is in PossibleNames, and return the Index of the RecipeLine together with the id from the table. So we have a list inside a list (a list of RecipeLine objects, each of which has a list of strings). How can I do this with LINQ in C#?
I'm pretty sure there isn't going to be a LINQ statement that you can construct for this that will create a SQL query to get the data exactly how you want. Assuming tblFruit doesn't have too much data, pull down the whole table and process it in memory with something like...
var result = tblFruitList.Select((f) => new {Id = f.id, Index = recipeLineList.Where((r) => r.PossibleNames.Contains(f.name)).Select((r) => r.Index).FirstOrDefault()});
Keep in mind that Index will be 0 if there isn't a recipeLine with the tblFruit's name in its PossibleNames list.
A more readable method that doesn't one-line it into a nasty linq statement is...
class ResultItem {
public int Index {get;set;}
public int Id {get;set;}
}
IEnumerable<ResultItem> GetRecipeFruitList(IEnumerable<FruitItem> tblFruitList, IEnumerable<RecipeLine> recipeLineList) {
var result = new List<ResultItem>();
foreach (FruitItem fruitItem in tblFruitList) {
var match = recipeLineList.FirstOrDefault((r) => r.PossibleNames.Contains(fruitItem.Name));
if (match != null) {
result.Add(new ResultItem() {Index = match.Index, Id = fruitItem.Id});
}
}
return result;
}
If tblFruit has a lot of data, you can try to pull down only those items whose name appears in one of the RecipeLine PossibleNames lists, with something like...
var allNames = recipeLineList.SelectMany((r) => r.PossibleNames).Distinct();
var tblFruitList = DbContext.tblFruit.Where((f) => allNames.Contains(f.Name));
To get all the fruits within your table whose Name is in PossibleNames use the following:
var query = myData.Where(x => myRecipeLines.SelectMany(y => y.PossibleNames).Contains(x.Name));
I don't think you can do this in a single step.
I would first create a map of the possible names to indexes:
var possibleNameToIndexMap = recipes
.SelectMany(r => r.PossibleNames.Select(possibleName => new { Index = r.Index, PossibleName = possibleName }))
.ToDictionary(x => x.PossibleName, x => x.Index);
Then, I would retrieve the matching names from the table:
var matchingNamesFromTable = TblFruits
.Where(fruit => possibleNameToIndexMap.Keys.Contains(fruit.Name))
.Select(fruit => fruit.Name);
Then you can use the names retrieved from the tables as keys into your original map:
var result = matchingNamesFromTable
.Select(name => new { Name = name, Index = possibleNameToIndexMap[name]});
Not fancy, but it should be easy to read and maintain.
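The question asks for the recipe Index paired with the fruit id, so here is a small in-memory sketch of the same map-then-lookup idea that returns both. Fruit, Match and RecipeMatcher are hypothetical names, and the fruit rows are assumed to be already loaded from tblFruit. When the same possible name appears in more than one RecipeLine, this sketch simply keeps the first one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RecipeLine
{
    public List<string> PossibleNames { get; set; }
    public string Name { get; set; }
    public int Index { get; set; }
}

// Assumed shape of a tblFruit row.
public class Fruit
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical result type pairing a recipe Index with a fruit id.
public class Match
{
    public int Index { get; set; }
    public int FruitId { get; set; }
}

public static class RecipeMatcher
{
    public static List<Match> Find(IEnumerable<RecipeLine> recipes, IEnumerable<Fruit> fruits)
    {
        // Map every possible name to the Index of the first recipe that lists it.
        var indexByName = new Dictionary<string, int>();
        foreach (var recipe in recipes)
        {
            foreach (var name in recipe.PossibleNames)
            {
                if (!indexByName.ContainsKey(name))
                {
                    indexByName[name] = recipe.Index;
                }
            }
        }

        // Look each fruit up by name and keep the ones that match.
        var matches = new List<Match>();
        foreach (var fruit in fruits)
        {
            int index;
            if (indexByName.TryGetValue(fruit.Name, out index))
            {
                matches.Add(new Match { Index = index, FruitId = fruit.Id });
            }
        }
        return matches;
    }
}
```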

Is there any way to reduce duplication in these two linq queries

Building a bunch of reports, have to do the same thing over and over with different fields
public List<ReportSummary> ListProducer()
{
return (from p in Context.stdReports
group p by new { p.txt_company, p.int_agencyId }
into g
select new ReportSummary
{
PKi = g.Key.int_agencyId,
Name = g.Key.txt_company,
Sum = g.Sum(foo => foo.lng_premium),
Count = g.Count()
}).OrderBy(q => q.Name).ToList();
}
public List<ReportSummary> ListCarrier()
{
return (from p in Context.stdReports
group p by new { p.txt_carrier, p.int_carrierId }
into g
select new ReportSummary
{
PKi = g.Key.int_carrierId,
Name = g.Key.txt_carrier,
Sum = g.Sum(foo => foo.lng_premium),
Count = g.Count()
}).OrderBy(q => q.Name).ToList();
}
My mind is drawing a blank on how I might be able to bring these two together.
It looks like the only thing that changes are the names of the grouping parameters. Could you write a wrapper function that accepts lambdas specifying the grouping parameters? Or even a wrapper function that accepts two strings and then builds raw T-SQL, instead of using LINQ?
Or, and I don't know if this would compile, can you alias the fields in the group statement so that the grouping construct can always be referenced the same way, such as g.Key.id1 and g.Key.id2? You could then pass the grouping construct into the ReportSummary constructor and do the left-hand/right-hand assignment in one place. (You'd need to pass it as dynamic though, since it's an anonymous object at the call site.)
You could do something like this:
public List<ReportSummary> GetList(Func<Record, Tuple<string, int>> fieldSelector)
{
return (from p in Context.stdReports
group p by fieldSelector(p)
into g
select new ReportSummary
{
PKi = g.Key.Item2,
Name = g.Key.Item1,
Sum = g.Sum(foo => foo.lng_premium),
Count = g.Count()
}).OrderBy(q => q.Name).ToList();
}
And then you could call it like this:
var summary = GetList(rec => Tuple.Create(rec.txt_company, rec.int_agencyId));
or:
var summary = GetList(rec => Tuple.Create(rec.txt_carrier, rec.int_carrierId));
Of course, you'll want to replace Record with whatever type Context.stdReports is actually returning.
I haven't checked to see if that will compile, but you get the idea.
Since all that changes between the two queries is the group key, parameterize it. Since it's a composite key (has more than one value within), you'll need to create a simple class which can hold those values (with generic names).
In this case, to parameterize it, make the key selector a parameter to your function. It would have to be an expression and the method syntax to get this to work. You could then generalize it into a function:
public class GroupKey
{
public int Id { get; set; }
public string Name { get; set; }
}
private IQueryable<ReportSummary> GetReport(
Expression<Func<stdReport, GroupKey>> groupKeySelector)
{
return Context.stdReports
.GroupBy(groupKeySelector)
.Select(g => new ReportSummary
{
PKi = g.Key.Id,
Name = g.Key.Name,
Sum = g.Sum(report => report.lng_premium),
Count = g.Count(),
})
.OrderBy(summary => summary.Name);
}
Then just make use of this function in your queries using the appropriate key selectors.
public List<ReportSummary> ListProducer()
{
return GetReport(r =>
new GroupKey
{
Id = r.int_agencyId,
Name = r.txt_company,
})
.ToList();
}
public List<ReportSummary> ListCarrier()
{
return GetReport(r =>
new GroupKey
{
Id = r.int_carrierId,
Name = r.txt_carrier,
})
.ToList();
}
I don't know what types you have mapped for your entities so I made some assumptions. Use whatever is appropriate in your case.

Querying 2 Sets of Complex-Objects Using Linq

I have two lists comprised of different complex-objects, and each one is from 2 separate data-sources. One list may-or-may-not contain records. When any records exist in the "optional" list I need the "normal" list to be further-filtered.
Unfortunately, I can only find very simple examples here and online, which is why I am asking this question.
The Pseudo-Logic Goes Like This:
When QuickFindMaterial records exist, get all DataSource records where query.Name is in the QuickFindMaterial.Material collection. If no QuickFindMaterial records exist, do not affect the final result. Lastly, select all distinct DataSource records.
The Classes Look Like:
public class QuickFindMaterial
{
public string SiteId { get; set; }
public string Material { get; set; }
}
The Code Looks Like:
I have commented-out my failed WHERE logic below
var dataSource = DocumentCollectionService.ListQuickFind();
var quickFindMaterial = ListMaterialBySiteID(customerSiteId);
var distinct = (from query in dataSource
select new
{
ID = query.DocumentID,
Library = query.DocumentLibrary,
ModifiedDate = query.DocumentModifiedDate,
Name = query.DocumentName,
Title = query.DocumentTitle,
Type = query.DocumentType,
Url = query.DocumentUrl,
})
//.Where(x => x.Name.Contains(quickFindMaterial.SelectMany(q => q.Material)))
//.Where(x => quickFindMaterial.Contains(x.Name))
.Distinct();
I think this is what you want:
.Where(x => !quickFindMaterial.Any() || quickFindMaterial.Any(y => x.Name == y.Material))
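The same "filter only when the optional list has records" rule can also be written as a small helper that checks the optional list once. This is just a sketch for in-memory sequences; MaterialFilter, Apply and the nameOf selector are made-up names, not anything from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterialFilter
{
    // Filters documents by name only when at least one material is supplied,
    // then removes duplicates using the default equality of T.
    public static List<T> Apply<T>(IEnumerable<T> documents, Func<T, string> nameOf, IEnumerable<string> materials)
    {
        var wanted = new HashSet<string>(materials);

        IEnumerable<T> filtered = wanted.Count == 0
            ? documents                                      // optional list is empty: no filtering
            : documents.Where(d => wanted.Contains(nameOf(d)));

        return filtered.Distinct().ToList();
    }
}
```

With the projection from the question, this could be called as MaterialFilter.Apply(projected, x => x.Name, quickFindMaterial.Select(q => q.Material)), where projected stands for the select new { ... } sequence.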
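You could join on Name -> Material. Note that an inner join returns no rows when quickFindMaterial is empty, so this only fits if the optional list always has records.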
Example:
var distinct = (from query in dataSource
join foo in quickFindMaterial on query.DocumentName equals foo.Material
select new
{
ID = query.DocumentID,
Library = query.DocumentLibrary,
ModifiedDate = query.DocumentModifiedDate,
Name = query.DocumentName,
Title = query.DocumentTitle,
Type = query.DocumentType,
Url = query.DocumentUrl,
}).Distinct();
