Something like a VLOOKUP - c#

I'm attempting to merge two lists of different objects where a specific field (employeeID) is equal to a specific element ([0,0]) of the items in another list. My code looks like this:
int i = Users.Count() - 1;
int i2 = oracleQuery.Count() - 1;
for (int c = 0; c <= i; c++)
{
    for (int d = 0; d <= i2; d++)
    {
        if (Users[c].getEmployeeID().ToString() == oracleQuery[d][0,0].ToString())
        {
            Users[c].setIDMStatus(oracleQuery[d][0,1].ToString());
        }
    }
}
This works... but it doesn't seem efficient. Any suggestions for more efficient code that will ultimately lead to the Users list containing the new information from the oracleQuery list?

You could use a join with Enumerable.Join:
var matches = Users.Join(oracleQuery,
    u => u.getEmployeeID().ToString(),
    oq => oq[0,0].ToString(),
    (u, oq) => new { User = u, Status = oq[0,1].ToString() });
foreach (var match in matches)
    match.User.setIDMStatus(match.Status);
Note that you could eliminate the ToString() calls if getEmployeeID() and the oracleQuery's [0,0] element are of the same type.
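Enumerable.Join in LINQ to Objects builds a hash-based lookup of the inner sequence once and then probes it for each outer element, which is why it scales roughly as O(N + M) instead of the O(N * M) of the nested for loops. The same idea can be written by hand with a Dictionary. A minimal sketch, assuming the IDs compare as strings (as in the ToString() calls in the question) and using hypothetical stand-ins: users is an array of (EmployeeId, Status) tuples, and each Oracle row is a 1x2 string[,] holding the ID at [0,0] and the status at [0,1]:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for Users and oracleQuery.
var users = new (int EmployeeId, string Status)[]
{
    (101, ""), (102, ""), (103, "")
};
var oracleRows = new List<string[,]>
{
    new string[,] { { "102", "Active" } },
    new string[,] { { "101", "Disabled" } },
};

// Build the lookup once: O(M). The indexer overwrites earlier rows with the same ID,
// so the last matching row wins, as in the original nested loop.
var statusById = new Dictionary<string, string>();
foreach (var row in oracleRows)
{
    statusById[row[0, 0]] = row[0, 1];
}

// Probe the lookup once per user: O(N) instead of O(N * M).
for (int i = 0; i < users.Length; i++)
{
    if (statusById.TryGetValue(users[i].EmployeeId.ToString(), out var status))
    {
        users[i].Status = status; // array elements can be modified in place
    }
}

foreach (var u in users)
    Console.WriteLine($"{u.EmployeeId}: {u.Status}");
```

Users without a matching row (103 here) keep their previous status.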

The only thing I notice as far as efficiency is that you use the Enumerable.Count() method, which can enumerate the results before you loop through them again explicitly in your for loops. (If Users and oracleQuery are arrays or lists, Count() uses their Count property and skips that extra pass.) I think the LINQ implementation will get rid of the pass through the results to count the elements.
I don't know how you feel about using LINQ query expressions, but this is what I like best:
var matched = from user in Users
              join item in oracleQuery on user.getEmployeeID().ToString() equals item[0,0].ToString()
              select new { user = user, IDMStatus = item[0,1].ToString() };
foreach (var pair in matched)
{
    pair.user.setIDMStatus(pair.IDMStatus);
}
You could also use nested foreach loops (if there are multiple matches, setIDMStatus is called multiple times):
foreach (var user in Users)
{
    foreach (var match in oracleQuery.Where(item => user.getEmployeeID().ToString() == item[0,0].ToString()))
    {
        user.setIDMStatus(match[0,1].ToString());
    }
}
Or if there will only be one match for sure (SingleOrDefault throws if it finds more than one):
foreach (var user in Users)
{
    var match = oracleQuery.SingleOrDefault(item => user.getEmployeeID().ToString() == item[0,0].ToString());
    if (match != null)
    {
        user.setIDMStatus(match[0,1].ToString());
    }
}
I don't think there is any real efficiency problem in what you've written, but you can benchmark it against the LINQ implementation. Using foreach or a LINQ query expression might make the code easier to read. You can also write the LINQ query expression using LINQ method syntax, as was done in another answer.

If the data comes from a database, you could do the join there. Otherwise, you could sort the two lists and do a merge join, which would be faster than what you have now.
However, since C# introduced LINQ there are a lot of ways to do this in code. Just look up using LINQ to join/merge lists.
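The sort-then-merge idea can be sketched as follows: sort both sides by the key, then walk them with two indexes, advancing whichever side has the smaller key. That costs O(N log N + M log M) for the sorts plus one linear pass. A minimal sketch, assuming integer keys that are unique on the Oracle side, with hypothetical (EmployeeId, Status) tuples standing in for both lists:

```csharp
using System;

// Hypothetical stand-ins for Users and the Oracle rows.
var users = new (int EmployeeId, string Status)[] { (103, ""), (101, ""), (102, "") };
var oracleRows = new (int EmployeeId, string Status)[] { (102, "Active"), (101, "Disabled"), (104, "Active") };

// Sort both sides by the join key.
Array.Sort(users, (a, b) => a.EmployeeId.CompareTo(b.EmployeeId));
Array.Sort(oracleRows, (a, b) => a.EmployeeId.CompareTo(b.EmployeeId));

int u = 0, o = 0;
while (u < users.Length && o < oracleRows.Length)
{
    int cmp = users[u].EmployeeId.CompareTo(oracleRows[o].EmployeeId);
    if (cmp < 0)
    {
        u++; // no Oracle row can match this user any more
    }
    else if (cmp > 0)
    {
        o++; // this Oracle row has no user
    }
    else
    {
        users[u].Status = oracleRows[o].Status;
        u++; // advance only the user side, so several users with the same ID can match this row
    }
}

foreach (var user in users)
    Console.WriteLine($"{user.EmployeeId}: {user.Status}");
```

In practice the Dictionary-based hash join is usually simpler; the merge join mainly pays off when both inputs already arrive sorted by the key.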

Related

Convert nested for each loop to LINQ

I have a foreach loop that gets data and is very time-consuming. Any suggestions for converting this to LINQ? Thanks in advance.
iListReport = obj.GetClosedReports();
string sRepType = "";
foreach (ReportStatisticsInfo item in reportStatistic)
{
    sRepType = item.ReportName.Trim();
    IList<string> lastClosedReport = new List<string>();
    foreach (TaskListInfo taskInfo in iListReport)
    {
        string reportName = taskInfo.DocumentName.Trim();
        if (string.Compare(sRepType, reportName, true) == 0)
        {
            if (taskInfo.ActionID == Convert.ToInt16(ReportAction.Close) && !lastClosedReport.Contains(taskInfo.DocumentID))
            {
                iClosedreportCount += 1;
                lastClosedReport.Add(taskInfo.DocumentID);
            }
        }
    }
}
Here you go. I've done a pretty literal translation of your code into LINQ which will hopefully help you to see how I've converted it.
Note the use of the let keyword which allows you to declare a range variable (which allows you to perform your trim once and then use the result in multiple places).
Also note the use of group by at the bottom of the LINQ query to ensure we only take the first occurrence of each documentID.
var iListReport = obj.GetClosedReports();
var query = from item in reportStatistic
            let sRepType = item.ReportName.Trim()
            // the explicit TaskListInfo type also lets this run over a non-generic IList
            from TaskListInfo taskInfo in iListReport
            let reportName = taskInfo.DocumentName.Trim()
            where string.Compare(sRepType, reportName, true) == 0
                  && taskInfo.ActionID == Convert.ToInt16(ReportAction.Close)
            // here's how we make sure we don't get the same documentID twice:
            // we group by the id and then take the first
            group taskInfo by taskInfo.DocumentID into grouping
            select grouping.First().DocumentID;
var lastClosedReport = query.ToList();
iClosedreportCount = lastClosedReport.Count;
How to convert a foreach loop to LINQ
Here are some comparisons of your code against the LINQ version to help you out if you've got to do a conversion again sometime. Hopefully this will help anyone else out there who has to convert a foreach loop to LINQ.
1) foreach and from
You can perform a straight swap of the foreach clause for a LINQ from clause. You can see that this:
foreach (ReportStatisticsInfo item in reportStatistic)
has become this:
from item in reportStatistic
2) Variable declaration and the let keyword
When you declare variables within your foreach, you can swap them out for the LINQ let statement. You can see that this declaration:
sRepType = item.ReportName.Trim();
has become:
let sRepType = item.ReportName.Trim()
3) if statements and the where clause
Your if statements can go inside the where clause. You can see that the following two if statements:
if (string.Compare(sRepType, reportName, true) == 0)
if (taskInfo.ActionID == Convert.ToInt16(ReportAction.Close) && ...)
have become this where clause:
where string.Compare(sRepType, reportName, true) == 0
&& taskInfo.ActionID == Convert.ToInt16(ReportAction.Close)
4) Using group by to remove duplicates.
It's all been quite simple so far because everything has just been a straight swap. The trickiest part is the bit of code where you prevent duplicates from appearing in your result list.
if (taskInfo.ActionID == Convert.ToInt16(ReportAction.Close)
&& !lastClosedReport.Contains(taskInfo.DocumentID))
{
iClosedreportCount += 1;
lastClosedReport.Add(taskInfo.DocumentID);
}
This is tricky because it's the only part that we have to do a bit differently in LINQ.
Firstly we group the 'taskInfo' by the 'DocumentID'.
group taskInfo by taskInfo.DocumentID into grouping
Then we take the first taskInfo from each grouping and get its ID.
select grouping.First().DocumentID;
A note about Distinct
A lot of people try to use Distinct to get rid of duplicates. This is fine when we're using primitive types, but it can fail when you're using a collection of objects. Distinct uses the default equality comparer, so unless the class overrides Equals and GetHashCode (or you pass an IEqualityComparer<T>), it compares object references. This will fail to match objects that are different instances but happen to have the same ID.
If you need to remove duplicates based upon a specific property within an object, then the best approach is to use a group by (or, on .NET 6 and later, DistinctBy).
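A minimal sketch of removing duplicates by a single property, using hypothetical (DocumentID, DocumentName) tuples as stand-ins for TaskListInfo. Distinct compares whole elements, so two entries with the same ID but different names both survive; grouping by the ID keeps one entry per ID, and DistinctBy (.NET 6 and later) does the same more directly:

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for TaskListInfo: same ID, slightly different names.
var tasks = new[]
{
    (DocumentID: "D1", DocumentName: "Sales "),
    (DocumentID: "D1", DocumentName: "sales"),
    (DocumentID: "D2", DocumentName: "Sales"),
};

// Distinct compares whole tuples: the two D1 entries differ in DocumentName, so all 3 survive.
int distinctCount = tasks.Distinct().Count();

// Group by the property, then keep the first element of each group.
var firstPerId = tasks
    .GroupBy(t => t.DocumentID)
    .Select(g => g.First())
    .ToList();

// .NET 6+: DistinctBy keeps the first element seen for each key.
var byId = tasks.DistinctBy(t => t.DocumentID).ToList();

Console.WriteLine($"{distinctCount} {firstPerId.Count} {byId.Count}"); // 3 2 2
```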
With LINQ you'll get a single IEnumerable<string> with duplicates:
from item in reportStatistic
from taskInfo in iListReport
where string.Compare(item.ReportName.Trim(), taskInfo.DocumentName.Trim(), true) == 0
      && taskInfo.ActionID == Convert.ToInt16(ReportAction.Close)
select taskInfo.DocumentID
You can then call Distinct() on the result to remove the duplicate IDs (they are strings, so the default comparison works) and Count() to get the number of closed reports.
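A minimal sketch of finishing that query, with hypothetical stand-ins: reportNames plays the role of the ReportName values in reportStatistic, tasks replaces iListReport, and closeAction replaces Convert.ToInt16(ReportAction.Close). Note that the original loop resets lastClosedReport for every report, so its literal translation counts distinct IDs per report and sums them; that can differ from a single global Distinct() when the same document ID matches more than one entry in reportStatistic.

```csharp
using System;
using System.Linq;

const short closeAction = 2; // hypothetical stand-in for Convert.ToInt16(ReportAction.Close)

var reportNames = new[] { "Sales ", "HR" }; // stands in for reportStatistic[i].ReportName
var tasks = new[]
{
    (DocumentID: "D1", DocumentName: "sales", ActionID: closeAction),
    (DocumentID: "D1", DocumentName: "SALES", ActionID: closeAction),
    (DocumentID: "D2", DocumentName: "hr",    ActionID: closeAction),
    (DocumentID: "D3", DocumentName: "hr",    ActionID: (short)1),
};

// Flattened query: every matching closed document ID, duplicates included.
var closedIds =
    from name in reportNames
    from t in tasks
    where string.Compare(name.Trim(), t.DocumentName.Trim(), true) == 0
          && t.ActionID == closeAction
    select t.DocumentID;

int globalCount = closedIds.Distinct().Count();

// Literal translation of the original loop: distinct IDs per report, then summed.
int perReportCount = reportNames.Sum(name =>
    tasks.Where(t => string.Compare(name.Trim(), t.DocumentName.Trim(), true) == 0
                     && t.ActionID == closeAction)
         .Select(t => t.DocumentID)
         .Distinct()
         .Count());

Console.WriteLine($"{globalCount} {perReportCount}"); // 2 2
```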

How best to remove list items in a loop in C#

Given the code:
var AllItems = new List<CartItem>();
using (var db = new MainContext())
{
    foreach (var item in AllItems)
    {
        if (!db.tblStoreItems.Where(i => i.ID == item.ItemID).Any())
        {
            AllItems.Remove(item);
        }
    }
}
Is this the best way to remove an item from the List object in a loop?
I don't think so. If you remove an item from the list you are iterating over, the results will surely be wrong.
It's best to use an old-fashioned for loop in reverse order:
using (var db = new MainContext())
{
    for (int x = AllItems.Count - 1; x >= 0; x--)
    {
        var item = AllItems[x];
        if (!db.tblStoreItems.Where(i => i.ID == item.ItemID).Any())
        {
            AllItems.RemoveAt(x);
        }
    }
}
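Iterating backwards works because RemoveAt(x) only shifts the elements after x, and those have already been checked; a forward loop with the same body would skip the second of two adjacent matches. A minimal in-memory sketch, with a hypothetical IsStale predicate standing in for the tblStoreItems check:

```csharp
using System;
using System.Collections.Generic;

var items = new List<int> { 1, 2, 2, 3, 4 };
bool IsStale(int id) => id % 2 == 0; // hypothetical stand-in for the database lookup

for (int x = items.Count - 1; x >= 0; x--)
{
    if (IsStale(items[x]))
    {
        items.RemoveAt(x); // shifts only indices > x, which were already visited
    }
}

Console.WriteLine(string.Join(", ", items)); // 1, 3
```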
There are several things wrong with the loop approach, the main one being that you cannot remove items from a collection while iterating over it with foreach: you will get an InvalidOperationException.
Since your main collection is a List<T>, you should use the RemoveAll method that takes in a predicate. You should also simplify your query like this:
AllItems.RemoveAll(item => !db.tblStoreItems.Any(i => i.ID == item.ItemID));
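As written, the predicate still issues one Any() query per cart item. If the valid IDs fit in memory, one option is to fetch them once into a HashSet<int> and let RemoveAll test against that. A minimal sketch with hypothetical stand-ins: (ItemID, Name) tuples replace CartItem, and validIds replaces the IDs you would load once from tblStoreItems:

```csharp
using System;
using System.Collections.Generic;

var allItems = new List<(int ItemID, string Name)>
{
    (1, "Pen"), (2, "Ink"), (3, "Pad"),
};
int[] validIds = { 1, 3 }; // hypothetical: IDs loaded once from the store-items table

var valid = new HashSet<int>(validIds);

// One pass over the list; each membership test is O(1) on average.
int removed = allItems.RemoveAll(item => !valid.Contains(item.ItemID));

Console.WriteLine($"{removed} removed, {allItems.Count} left"); // 1 removed, 2 left
```

RemoveAll returns the number of elements it removed, which is handy for logging.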
The OP's approach is wrong, as Steve rightly pointed out (Steve's way is probably the best in terms of performance).
I prefer to store the items to be removed in a separate list; then you can do, e.g.:
AllItems = AllItems.Except(Items2Remove).ToList();
That is not the best approach performance-wise, but for me it makes things cleaner. Note that Except returns distinct elements, so duplicate entries in AllItems are collapsed as well. You could also combine this with LINQ enumeration, e.g. build an IEnumerable from the list of records.
Hope this helps.
EDIT: just to clarify as per Steve's response

Linq optimization of query and foreach

I return a List from a LINQ query, and after that I have to fill in the values of each item with a for loop.
The problem is that it is too slow.
var formentries = (from f in db.bNetFormEntries
                   join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
                   join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
                   where f.FormID == formID
                   orderby f.FormEntryID descending
                   select new FormEntry
                   {
                       FormEntryID = f.FormEntryID,
                       FormID = f.FormID,
                       IPAddress = f.IpAddress,
                       UserAgent = f.UserAgent,
                       CreatedBy = f.CreatedBy,
                       CreatedDate = f.CreatedDate,
                       UpdatedBy = f.UpdatedBy,
                       UpdatedDate = f.UpdatedDate,
                       StatusID = f.StatusID,
                       StatusText = entryStatus.FirstOrDefault().Status,
                       ExternalStatusID = f.ExternalStatusID,
                       ExternalStatusText = entryStatus2.FirstOrDefault().Status
                   }).ToList();
and then I use a for loop like this:
for (var x = 0; x < formentries.Count(); x++)
{
    var values = (from e in entryvalues
                  where e.FormEntryID.Equals(formentries.ElementAt(x).FormEntryID)
                  select e).ToList<FormEntryValue>();
    formentries.ElementAt(x).Values = values;
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
But it is definitely too slow.
Is there a way to make it faster?
it is definitely too slow. Is there a way to make it faster?
Maybe. Maybe not. But that's not the right question to ask. The right question is:
Why is it so slow?
It is a lot easier to figure out the answer to the first question if you have an answer to the second question! If the answer to the second question is "because the database is in Tokyo and I'm in Rome, and the fact that the packets move no faster than speed of light is the cause of my unacceptable slowdown", then the way you make it faster is you move to Japan; no amount of fixing the query is going to change the speed of light.
To figure out why it is so slow, get a profiler. Run the code through the profiler and use that to identify where you are spending most of your time. Then see if you can speed up that part.
From what I see, you are iterating through formentries twice more without reason: when you populate the values, and when you convert to a dictionary.
If entryvalues is database-driven, i.e. you get the values from the database, then put the value field population in the first query.
If it's not, then you do not need to call ToList() on the first query, then loop, then create the Dictionary; a single loop can populate the values and build the Dictionary:
var formentries = from f in db.bNetFormEntries
                  join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
                  join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
                  where f.FormID == formID
                  orderby f.FormEntryID descending
                  select new FormEntry
                  {
                      FormEntryID = f.FormEntryID,
                      FormID = f.FormID,
                      IPAddress = f.IpAddress,
                      UserAgent = f.UserAgent,
                      CreatedBy = f.CreatedBy,
                      CreatedDate = f.CreatedDate,
                      UpdatedBy = f.UpdatedBy,
                      UpdatedDate = f.UpdatedDate,
                      StatusID = f.StatusID,
                      StatusText = entryStatus.FirstOrDefault().Status,
                      ExternalStatusID = f.ExternalStatusID,
                      ExternalStatusText = entryStatus2.FirstOrDefault().Status
                  };
var formEntryDictionary = new Dictionary<int, FormEntry>();
foreach (var formEntry in formentries)
{
    formEntry.Values = GetValuesForFormEntry(formEntry, entryvalues);
    formEntryDictionary.Add(formEntry.FormEntryID, formEntry);
}
return formEntryDictionary;
And the values preparation:
private IList<FormEntryValue> GetValuesForFormEntry(FormEntry formEntry, IEnumerable<FormEntryValue> entryValues)
{
    return (from e in entryValues
            where e.FormEntryID.Equals(formEntry.FormEntryID)
            select e).ToList<FormEntryValue>();
}
You can change the private method to accept only the entry ID instead of the whole formEntry if you wish.
It's slow because it's O(N*M), where N is formentries.Count and M is entryvalues.Count. Even with a simple test I was getting more than 20 times slower with only 1000 elements, and my type only had an int id field; with 10000 elements in the list it was over 1600 times slower than the code below!
Assuming your entryvalues is a local list and not hitting a database (if it is, just .ToList() it into a new variable somewhere first), and assuming your FormEntryID is unique within formentries (which it seems to be from the .ToDictionary call), then try this instead:
// ToLookup, unlike ToDictionary, allows several values per FormEntryID
var entryvaluesLookup = entryvalues.ToLookup(value => value.FormEntryID);
for (var x = 0; x < formentries.Count; x++)
{
    // a FormEntryID with no values yields an empty sequence, not an exception
    formentries[x].Values = entryvaluesLookup[formentries[x].FormEntryID].ToList();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
It should go a long way to making it at least scale better.
Changes: .Count instead of .Count(), just because it's better not to call an extension method when you don't need to. Using a lookup to find the values, rather than running a where for every x in the for loop, effectively removes the M from the big O: the lookup is built once in O(M), and each access is O(1) on average.
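A lookup suits this case because each form entry can have several values. A minimal sketch of building the index once and reading it per entry, with hypothetical int IDs standing in for the loaded FormEntry objects and (FormEntryID, Value) tuples standing in for FormEntryValue:

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins: IDs of the loaded form entries and their value rows.
int[] formEntryIds = { 10, 11, 12 };
var entryValues = new[]
{
    (FormEntryID: 10, Value: "a"),
    (FormEntryID: 12, Value: "b"),
    (FormEntryID: 10, Value: "c"),
};

// Built once in O(M); unlike ToDictionary, a lookup allows several values per key.
var valuesByEntry = entryValues.ToLookup(v => v.FormEntryID);

// One O(1) average probe per entry; a missing key yields an empty sequence, not an exception.
var valuesPerEntry = formEntryIds.ToDictionary(
    id => id,
    id => valuesByEntry[id].Select(v => v.Value).ToList());

foreach (var entryId in formEntryIds)
    Console.WriteLine($"{entryId}: [{string.Join(", ", valuesPerEntry[entryId])}]");
// 10: [a, c]
// 11: []
// 12: [b]
```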
If this isn't entirely correct, I'm sure you can change whatever is missing to suit your case. But as an aside, you should really consider using camel case for your variable names: formEntries is just that little bit easier to read than formentries.
There are some reasons why this might be slow regarding the way you use formentries.
The formentries List<T> from above has a Count property, but you are calling the enumerable Count() extension method instead. LINQ to Objects does detect ICollection<T> and defer to its Count property, so on a List<T> this costs little, but calling the property directly avoids relying on that optimization.
Similarly, the formentries.ElementAt(x) expression is used twice. ElementAt uses the indexer when the source implements IList<T>, but on a sequence that doesn't, LINQ has to walk the sequence every time to get to the xth item.
The above evaluation may miss the real problem, which you'll only really know if you profile. However, you can avoid the above while making your code significantly easier to read if you switch how you iterate the collection of formentries as follows:
foreach (var fe in formentries)
{
    fe.Values = entryvalues
        .Where(e => e.FormEntryID.Equals(fe.FormEntryID))
        .ToList<FormEntryValue>();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
You may have resorted to the for (var x = ...) ... ElementAt(x) approach because you thought you could not modify properties on the object referenced by the foreach loop variable fe.
That said, another point that could be an issue is if formentries has multiple items with the same FormEntryID. This would result in the same work being done multiple times inside the loop. While the top query appears to be against a database, you can still do joins with data in linq-to-object land. Happy optimizing/profiling/coding - let us know what works for you.

Detect entities which have the same children

I have two entities, Class and Student, linked in a many-to-many relationship.
When data is imported from an external application, unfortunately some classes are created in duplicate. The 'duplicate' classes have different names, but the same subject and the same students.
For example:
{ Id = 341, Title = '10rs/PE1a', SubjectId = 60, Students = { Jack, Bill, Sarah } }
{ Id = 429, Title = '10rs/PE1b', SubjectId = 60, Students = { Jack, Bill, Sarah } }
There is no general rule for matching the names of these duplicate classes, so the only way to identify that two classes are duplicates is that they have the same SubjectId and Students.
I'd like to use LINQ to detect all duplicates (and ultimately merge them). So far I have tried:
var sb = new StringBuilder();
using (var ctx = new Ctx()) {
    ctx.CommandTimeout = 10000; // Because the next line takes so long!
    var allClasses = ctx.Classes.Include("Students").OrderBy(o => o.Id);
    foreach (var c in allClasses) {
        var duplicates = allClasses.Where(o => o.SubjectId == c.SubjectId && o.Id != c.Id && o.Students.Equals(c.Students));
        foreach (var d in duplicates)
            sb.Append(d.LongName).Append(" is a duplicate of ").Append(c.LongName).Append("<br />");
    }
}
lblResult.Text = sb.ToString();
This is no good because I get the error:
NotSupportedException: Unable to create a constant value of type 'TeachEDM.Student'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.
Evidently it doesn't like me trying to match o.SubjectId == c.SubjectId in LINQ.
Also, this seems a horrible method in general and is very slow. The call to the database takes more than 5 minutes.
I'd really appreciate some advice.
The comparison of the SubjectId is not the problem because c.SubjectId is a value of a primitive type (int, I guess). The exception complains about Equals(c.Students). c.Students is a constant (with respect to the query duplicates) but not a primitive type.
I would also try to do the comparison in memory and not in the database. You are loading the whole data into memory anyway when you start your first foreach loop: It executes the query allClasses. Then inside of the loop you extend the IQueryable allClasses to the IQueryable duplicates which gets executed then in the inner foreach loop. This is one database query per element of your outer loop! This could explain the poor performance of the code.
So I would try to perform the content of the first foreach in memory. For the comparison of the Students list it is necessary to compare element by element, not the references to the Students collections because they are for sure different.
var sb = new StringBuilder();
using (var ctx = new Ctx())
{
    ctx.CommandTimeout = 10000; // Perhaps not necessary anymore
    var allClasses = ctx.Classes.Include("Students").OrderBy(o => o.Id)
        .ToList(); // executes query, allClasses is now a List, not an IQueryable
    // everything from here runs in memory
    foreach (var c in allClasses)
    {
        var duplicates = allClasses.Where(
            o => o.SubjectId == c.SubjectId &&
                 o.Id != c.Id &&
                 o.Students.OrderBy(s => s.Name).Select(s => s.Name)
                     .SequenceEqual(c.Students.OrderBy(s => s.Name).Select(s => s.Name)));
        // duplicates is an IEnumerable, not an IQueryable
        foreach (var d in duplicates)
            sb.Append(d.LongName)
              .Append(" is a duplicate of ")
              .Append(c.LongName)
              .Append("<br />");
    }
}
lblResult.Text = sb.ToString();
Ordering the sequences by name is necessary because SequenceEqual compares the two sequences element by element in order (element 0 with element 0, then element 1 with element 1, and so on), so the same students listed in a different order would not be considered equal.
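The pairwise comparison is still O(N^2) in the number of classes. An alternative, swapped in here rather than taken from the answer above, is to compute a key from SubjectId plus the sorted student identifiers and group on it, so duplicates land in the same group in a single pass over the in-memory list. A minimal sketch, assuming student names are unique identifiers and contain no ',' (with real entities, sorted student IDs would be a safer key); the tuple data mirrors the example in the question:

```csharp
using System;
using System.Linq;

var classes = new[]
{
    (Id: 341, Title: "10rs/PE1a", SubjectId: 60, Students: new[] { "Jack", "Bill", "Sarah" }),
    (Id: 429, Title: "10rs/PE1b", SubjectId: 60, Students: new[] { "Sarah", "Jack", "Bill" }),
    (Id: 512, Title: "10rs/MA1",  SubjectId: 61, Students: new[] { "Jack", "Bill", "Sarah" }),
};

// Key = subject plus the students in a canonical (sorted) order.
string KeyOf(int subjectId, string[] students) =>
    subjectId + "|" + string.Join(",", students.OrderBy(s => s, StringComparer.Ordinal));

var duplicateGroups = classes
    .GroupBy(c => KeyOf(c.SubjectId, c.Students))
    .Where(g => g.Count() > 1)
    .ToList();

foreach (var g in duplicateGroups)
{
    var keep = g.First();
    foreach (var d in g.Skip(1))
        Console.WriteLine($"{d.Title} is a duplicate of {keep.Title}");
}
// 10rs/PE1b is a duplicate of 10rs/PE1a
```

Class 512 has the same students but a different SubjectId, so it is not reported.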
Edit: regarding your comment that the first query is still slow:
If you have 1300 classes with 30 students each, the performance of eager loading (Include) could suffer from the multiplication of data transferred between database and client. This is explained here: How many Include I can use on ObjectSet in EntityFramework to retain performance? . The query is complex because it needs a JOIN between classes and students, and object materialization is complex as well because EF must filter out the duplicated data when the objects are created.
An alternative approach is to load only the classes without the students in the first query and then load the students one by one inside a loop explicitly. It would look like this:
var sb = new StringBuilder();
using (var ctx = new Ctx())
{
    ctx.CommandTimeout = 10000; // Perhaps not necessary anymore
    var allClasses = ctx.Classes.OrderBy(o => o.Id).ToList(); // <- No Include!
    foreach (var c in allClasses)
    {
        // "Explicit loading": this is a new roundtrip to the DB
        ctx.LoadProperty(c, "Students");
    }
    foreach (var c in allClasses)
    {
        // ... same code as above
    }
}
lblResult.Text = sb.ToString();
You would have 1 + 1300 database queries in this example instead of only one, but you won't have the data multiplication which occurs with eager loading and the queries are simpler (no JOIN between classes and students).
Explicit loading is explained here:
http://msdn.microsoft.com/en-us/library/bb896272.aspx
For POCOs (works also for EntityObject derived entities): http://msdn.microsoft.com/en-us/library/dd456855.aspx
For EntityObject derived entities you can also use the Load method of EntityCollection: http://msdn.microsoft.com/en-us/library/bb896370.aspx
If you work with lazy loading, the first foreach with LoadProperty would not be necessary, as each Students collection will be loaded the first time you access it. It should result in the same 1300 additional queries as explicit loading.

Replacing nested foreach with LINQ; modify and update a property deep within

Consider the requirement to change a data member on one or more properties of an object that is 5 or 6 levels deep.
There are sub-collections that need to be iterated through to get to the property that needs inspection & modification.
Here we're calling a method that cleans the street address of an Employee. Since we're changing data within the loops, the current implementation needs a for loop to prevent the exception:
Cannot assign to "someVariable" because it is a 'foreach iteration variable'
Here's the current algorithm (obfuscated) with nested foreach and a for.
foreach (var emp in company.internalData.Emps)
{
    foreach (var addr in emp.privateData.Addresses)
    {
        int numberAddresses = addr.Items.Length;
        for (int i = 0; i < numberAddresses; i++)
        {
            //transform this street address via a static method
            if (addr.Items[i].Type == "StreetAddress")
                addr.Items[i].Text = CleanStreetAddressLine(addr.Items[i].Text);
        }
    }
}
Question:
Can this algorithm be reimplemented using LINQ? The requirement is for the original collection to have its data changed by that static method call.
Update: I was thinking/leaning in the direction of a jQuery/selector type solution. I didn't specifically word this question in that way. I realize that I was over-reaching on that idea (no side-effects). Thanks to everyone! If there is such a way to perform a jQuery-like selector, please let's see it!
foreach (var item in company.internalData.Emps
    .SelectMany(emp => emp.privateData.Addresses)
    .SelectMany(addr => addr.Items)
    .Where(addrItem => addrItem.Type == "StreetAddress"))
    item.Text = CleanStreetAddressLine(item.Text);
var dirtyAddresses = company.internalData.Emps.SelectMany(x => x.privateData.Addresses)
    .SelectMany(y => y.Items)
    .Where(z => z.Type == "StreetAddress");
foreach (var addr in dirtyAddresses)
    addr.Text = CleanStreetAddressLine(addr.Text);
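A self-contained sketch of the SelectMany approach, using hypothetical minimal Employee, Address and AddressItem classes (the internalData/privateData layers are flattened away) and a hypothetical CleanStreetAddressLine that only trims. Assigning item.Text inside the foreach is allowed because the loop variable holds a reference to the item; only reassigning the variable itself is forbidden. This relies on AddressItem being a class: if it were a struct, SelectMany would yield copies and the changes would be lost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var emps = new List<Employee>
{
    new Employee { Addresses = { new Address { Items = {
        new AddressItem { Type = "StreetAddress", Text = "  1 Main St " },
        new AddressItem { Type = "City", Text = " Springfield " },
    } } } },
};

// Hypothetical cleaner; the real one is not shown in the question.
static string CleanStreetAddressLine(string text) => text.Trim();

foreach (var item in emps
    .SelectMany(emp => emp.Addresses)
    .SelectMany(addr => addr.Items)
    .Where(addrItem => addrItem.Type == "StreetAddress"))
{
    item.Text = CleanStreetAddressLine(item.Text); // mutates the object stored in emps
}

Console.WriteLine($"[{emps[0].Addresses[0].Items[0].Text}]"); // [1 Main St]
Console.WriteLine($"[{emps[0].Addresses[0].Items[1].Text}]"); // [ Springfield ]

// Hypothetical minimal model classes.
class Employee { public List<Address> Addresses { get; } = new(); }
class Address { public List<AddressItem> Items { get; } = new(); }
class AddressItem
{
    public string Type { get; set; } = "";
    public string Text { get; set; } = "";
}
```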
LINQ is not intended to modify sets of objects. You wouldn't expect a SELECT sql statement to modify the values of the rows being selected, would you? It helps to remember what LINQ stands for - Language INtegrated Query. Modifying objects within a linq query is, IMHO, an anti-pattern.
Stan R.'s answer would be a better solution using a foreach loop, I think.
I don't like mixing "query comprehension" syntax and dotted-method-call syntax in the same statement.
I do like the idea of separating the query from the action. These are semantically distinct, so separating them in code often makes sense.
var addrItemQuery = from emp in company.internalData.Emps
                    from addr in emp.privateData.Addresses
                    from addrItem in addr.Items
                    where addrItem.Type == "StreetAddress"
                    select addrItem;
foreach (var addrItem in addrItemQuery)
{
    addrItem.Text = CleanStreetAddressLine(addrItem.Text);
}
A few style notes about your code; these are personal, so you may not agree:
In general, I avoid abbreviations (Emps, emp, addr)
Inconsistent names are more confusing (addr vs. Addresses): pick one and stick with it
The word "number" is ambiguous. It can either be an identity ("Prisoner number 378 please step forward.") or a count ("the number of sheep in that field is 12."). Since we use both concepts in code a lot, it is valuable to get this clear. I often use "index" for the first one and "count" for the second.
Having the type field be a string is a code smell. If you can make it an enum your code will probably be better off.
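A minimal sketch of that suggestion: parse the string once where it enters the program and compare enum values everywhere else. AddressItemType and its members are hypothetical, not taken from the question; Enum.TryParse with ignoreCase is the standard .NET way to do the conversion.

```csharp
using System;
using System.Collections.Generic;

var results = new List<string>();
foreach (var raw in new[] { "StreetAddress", "streetaddress", "Street" })
{
    // Parse once at the boundary; unknown strings are rejected instead of silently not matching.
    results.Add(Enum.TryParse<AddressItemType>(raw, ignoreCase: true, out var type)
        ? $"{raw} -> {type}"
        : $"{raw} -> not recognized");
}
results.ForEach(Console.WriteLine);

// Hypothetical members; a typo such as AddressItemType.StreetAdress no longer compiles,
// whereas a typo in the string "StreetAdress" would just never match.
enum AddressItemType { StreetAddress, City, PostalCode }
```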
Dirty one-liner (Select is lazy, so the ToList() at the end is what actually runs the updates):
company.internalData.Emps.SelectMany(x => x.privateData.Addresses)
    .SelectMany(x => x.Items)
    .Where(x => x.Type == "StreetAddress")
    .Select(x => { x.Text = CleanStreetAddressLine(x.Text); return x; })
    .ToList();
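Because Select is deferred, a side-effecting lambda runs only when the result is enumerated; if the result of such a chain is discarded without a ToList() or a foreach, nothing happens, and enumerating it twice runs the side effects twice. A minimal sketch with a counter standing in for the side effect:

```csharp
using System;
using System.Linq;

int calls = 0;
var texts = new[] { " a ", " b " };

// Building the query runs nothing yet.
var query = texts.Select(t => { calls++; return t.Trim(); });
Console.WriteLine(calls); // 0

// Enumerating it runs the lambda once per element.
var cleaned = query.ToList();
Console.WriteLine(calls); // 2

// Enumerating again runs the side effects again.
query.ToList();
Console.WriteLine(calls); // 4
```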
LINQ does not provide the option of having side effects. However, you could do:
company.internalData.Emps.SelectMany(emp => emp.privateData.Addresses).SelectMany(addr => addr.Items).ToList().ForEach(/*either make an anonymous method or refactor your side effect code out to a method of its own*/);
You can do this, but you don't really want to. Several bloggers have talked about the functional nature of Linq, and if you look at all the MS supplied Linq methods, you will find that they don't produce side effects. They produce return values, but they don't change anything else. Search for the arguments over a Linq ForEach method, and you'll get a good explanation of this concept.
With that in mind, what you probably want is something like this:
var addressItems = company.internalData.Emps.SelectMany(
    emp => emp.privateData.Addresses.SelectMany(
        addr => addr.Items
    )
);
foreach (var item in addressItems)
{
    ...
}
However, if you do want to do exactly what you asked, then this is the direction you'll need to go:
var addressItems = company.internalData.Emps.SelectMany(
    emp => emp.privateData.Addresses.SelectMany(
        addr => addr.Items.Select(item =>
        {
            // Do the stuff (this runs only when addressItems is enumerated)
            return item;
        })
    )
);
To update LINQ results with a foreach loop, I first copy them into a local list variable and then perform the update with a foreach loop; the values are updated this way. Read more here:
How to update value of LINQ results using FOREACH loop
I cloned the list, and it worked on .NET 4.7.2:
List<TrendWords> ListCopy = new List<TrendWords>(sorted);
foreach (var words in stopWords)
{
    foreach (var item in ListCopy.Where(w => w.word == words))
    {
        item.disabled = true;
    }
}
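Why the copy helps depends on what sorted is, which the answer does not show. One common cause, sketched here under that assumption: if sorted is a deferred query that projects new objects (for example, a Select that creates new TrendWords objects), every enumeration builds fresh objects, so flags set in one foreach are gone by the next; copying into a List<T> materializes the objects once. The sketch also swaps the nested loop for a HashSet<string> lookup. TrendWords here is a hypothetical minimal class with the word and disabled fields the snippet uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] stopWords = { "the", "a" };
var raw = new[] { "the", "cat", "a" };

// Deferred query that creates NEW TrendWords objects every time it is enumerated.
IEnumerable<TrendWords> sorted = raw.OrderBy(w => w).Select(w => new TrendWords { word = w });

foreach (var item in sorted.Where(w => stopWords.Contains(w.word)))
    item.disabled = true;                            // sets the flag on short-lived objects
int disabledInQuery = sorted.Count(w => w.disabled); // re-enumeration built fresh objects

// Materialize once; the list keeps the same objects between loops.
var listCopy = new List<TrendWords>(sorted);
var stop = new HashSet<string>(stopWords);           // one pass, O(1) average lookups
foreach (var item in listCopy)
{
    if (stop.Contains(item.word))
        item.disabled = true;
}
int disabledInList = listCopy.Count(w => w.disabled);

Console.WriteLine($"{disabledInQuery} {disabledInList}"); // 0 2

// Hypothetical minimal shape of TrendWords, matching the fields used above.
class TrendWords
{
    public string word = "";
    public bool disabled;
}
```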
