I allow a user to download some data to CSV. They can then edit some columns and upload it back. I need a speed-efficient way to compare certain columns between like objects to see what changed.
Currently I pull the original data from the DB into a list so it's all in memory. There are about 100k items, so it's not that bad; that part takes less than a second. Then I load the CSV file into a list as well. Both lists hold the same class type.
Then I loop over the CSV data (they may have removed some rows they didn't change, but they could still have changed a lot of rows). For each row in the CSV list I search the list that came from the DB for the matching object, so I end up with the CSV object and the database object in the same structure. Then I run them through a custom compare function that looks only at certain columns to see if anything changed.
If something did change, I have to validate that the value they entered is valid by querying another reference list for that column. If it's not valid, I write it out to an exceptions list. At the end, if there are no exceptions, I save to the DB. If there are exceptions, I don't save anything and I show them the list of errors.
The detailed compare provides a list of columns along with the old and new values that changed. I need this to query the reference list and make sure the new value is valid before I make the change. It's fairly inefficient, but it gives the user great detail about what may be wrong with an upload, which is very valuable.
This is very slow. I'm looking for ways to speed it up while still being able to give the user detailed information about why the upload may have failed so they can correct it.
// get all the new records from the csv
var newData = csv.GetRecords<MyTable>().ToArray();

// select all data from database to list
var origData = ctx.MyTable.ToList();

// look for any changes in the new data and update the database. note we are looping over
// the new data, so rows removed from the csv file simply won't be visited and won't change
foreach (var d in newData)
{
    // find the matching record so we can compare between new (csv) and current (from db)
    var oData = origData.FirstOrDefault(o => o.id == d.id);

    // only the columns in the updatableColumns list are compared
    var diff = d.DetailedCompare(oData, comparableColumns.ToList());
    if (diff.Count > 0)
    {
        // differences between the csv record and db record don't mean the user input is
        // valid. only existing ref data is valid and must be checked before a change is made
        bool changed = false;

        // make a copy of the original data so we can check afterwards whether we actually
        // made a change to it (i.e. the value provided was valid)
        var data = CopyRecord(oData);

        // update this record's data fields that have changed with the new data
        foreach (var v in diff)
        {
            // special case: setting a value to NULL is always valid, but null wouldn't
            // show up in the ref data checked below
            if (v.valA == null)
            {
                oData.GetType().GetProperty(v.Prop).SetValue(oData, v.valA);
                oData.UpdatedBy = user;
                oData.UpdatedDate = DateTime.Now;
                changed = true;
            }
            // validate that the value for this column is in the ref table before allowing
            // an update. note an exception if not so we can tell the user
            else if (refData[v.Prop].Any(a => a.value == v.valA.ToString()))
            {
                // the value changed and is valid per the ref data defined for this column,
                // so copy it onto the current object
                oData.GetType().GetProperty(v.Prop).SetValue(oData, v.valA);
                oData.UpdatedBy = user;
                oData.UpdatedDate = DateTime.Now;
                changed = true;
            }
            else
            {
                // the value provided isn't valid for this column, so note it for the user
                exceptions.Add(string.Format("Error: ID: {0}, Value: '{1}' is not valid for column [{2}]. Add the reference data if needed and re-import.", d.id, v.valA, v.Prop));
            }
        }

        // we only reattach and save changes IF we actually changed something to a valid
        // ref value and there were no exceptions for this record
        if (changed && exceptions.Count == 0)
        {
            // the current object was loaded into memory, so reattach it to EF and mark it
            // as changed so SaveChanges() writes it back to the DB
            ctx.MyTable.Attach(oData);
            ctx.Entry(oData).State = EntityState.Modified;

            // add a history record for the change to this product
            CreateHistoryRecord(data, user);
        }
    }
}

// wait until the very end before making DB changes. we don't save anything if there are
// exceptions or nothing changed
if (exceptions.Count == 0)
{
    ctx.SaveChanges();
}
The first big win would be to put your data in a dictionary so you can get to the desired record quickly by ID, without searching through thousands of objects each time. I'm pretty sure it'll be faster.
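As a rough sketch (reusing the origData and newData names from your code), the lookup could become:

// build the lookup once: O(n), then each lookup is O(1) instead of a linear scan
var origById = origData.ToDictionary(o => o.id);
foreach (var d in newData)
{
    // TryGetValue also handles ids that are in the csv but no longer in the db
    if (!origById.TryGetValue(d.id, out var oData))
        continue;
    var diff = d.DetailedCompare(oData, comparableColumns.ToList());
    // ... rest of the loop unchanged
}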
Beyond that, I suggest you run your code through a profiler to determine exactly which parts are the slowest. It's entirely possible that DetailedCompare() does something terribly slow that isn't obvious.
One thing to consider is making the compares asynchronous, or at least the work inside if (diff.Count > 0). Assuming there are only a few scattered changes, why wait on all the copying and reflection serially? Put it in a separate function and run it in parallel.
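A minimal sketch of that idea, reusing the origById dictionary from above and assuming DetailedCompare is safe to call concurrently (Difference is a stand-in name for whatever element type it actually returns; ConcurrentBag comes from System.Collections.Concurrent and Parallel from System.Threading.Tasks):

var cols = comparableColumns.ToList();
// Difference is a stand-in for the element type DetailedCompare returns
var changedRows = new ConcurrentBag<(MyTable CsvRow, MyTable DbRow, List<Difference> Diff)>();
Parallel.ForEach(newData, d =>
{
    // no matching db record for this csv row, nothing to compare
    if (!origById.TryGetValue(d.id, out var oData))
        return;
    var diff = d.DetailedCompare(oData, cols);
    if (diff.Count > 0)
        changedRows.Add((d, oData, diff));
});
// validation, Attach and SaveChanges stay single-threaded: DbContext is not thread-safe
foreach (var (csvRow, dbRow, diff) in changedRows)
{
    // ... existing ref-data validation and update logic ...
}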
Related
I have a situation wherein a List object is built off of values pulled from an MSSQL database. However, this particular table is mysteriously getting an errant record or two tossed in. Removing the records causes trouble even though they have no referential links to any other tables, and they still get recreated without any known user action. This causes some trouble, as it puts unwanted values on display that add a little bit of confusion. The specific issue is that this is a platform that allows users to run a search for quotes, and the filtering allows for sales rep selection. The select/dropdown field is showing these errant values, and they need to be removed.
Given that deleting the offending table rows does not produce a desirable result, I was thinking that maybe the best course of action was to modify the code where the List object is created and either filter the values out or remove them after the object is populated. I'd like to do this in a clean, scalable fashion by providing some kind of appendable data object where I could just add a new string value if something else cropped up, as opposed to doing something clunky that adds new code to find and remove the value each time.
My thought was to create a string array and somehow loop through it to remove bad List values, but I wasn't entirely certain that was the best way to approach this, and I could not for the life of me think of a clean approach. I would think the best way would be to add a filter within the Find arguments, but I don't know how to add an array or list that way. Otherwise I figured I'd loop through the values either before or after sorting the List and remove any matches, but I wasn't sure that was the best choice either.
I have attached the current code, and would appreciate any suggestions.
int licenseeID = Helper.GetLicenseeIdByLicenseeShortName(Membership.ApplicationName);
List<User> listUsers;
if (Roles.IsUserInRole("Admin"))
{
    // get all users
    listUsers = User.Find(x => x.LicenseeID == licenseeID).ToList();
}
else
{
    // get only the current user
    listUsers = User.Find(x => x.LicenseeID == licenseeID && x.EmailAddress == Membership.GetUser().Email).ToList();
}
listUsers.Sort((x, y) => string.Compare(x.FirstName, y.FirstName));
-- EDIT --
I neglected to mention that I did not develop this; I merely inherited its maintenance after the original developer(s) disappeared, and the coworker who was assigned to it left the company. I'm not really skilled at handling ASP.NET sites. Many object sources are hidden and unavailable for edit, I assume because they are defined in a DLL somewhere. So, for any of these objects that are sourced from database tables, altering the tables will not help, since I would not be able to get the new data anyway.
However, I did try the following to filter out the undesirable data:
List<String> exclude = new List<String>(new String[] { "value1", "value2" });
listUsers = User.Find(x => x.LicenseeID == licenseeID && !exclude.Contains(x.FirstName)).ToList();
Unfortunately it only resulted in an error being displayed on the page.
-- EDIT #2 --
I got the server set up to accept a new event viewer source so I could write info to the Application log to see what was happening. It looks like this installation of ASP.NET does not accept "Contains" as an action on a List object: an error gets kicked out stating that the method is not available.
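If the Find provider can't translate Contains, one workaround (just a sketch, reusing the exclude list from above) is to filter after materializing the list, where plain LINQ-to-Objects applies:

List<String> exclude = new List<String>(new String[] { "value1", "value2" });
listUsers = User.Find(x => x.LicenseeID == licenseeID).ToList();
// RemoveAll runs in memory, so no expression translation is involved
listUsers.RemoveAll(u => exclude.Contains(u.FirstName));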
I will probably add a column to the table to flag errant rows and then skip them when I query the table, something like
&& !ErrantData
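In context that would look something like this (assuming a hypothetical Boolean ErrantData property mapped to the new column):

listUsers = User.Find(x => x.LicenseeID == licenseeID && !x.ErrantData).ToList();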
The other way, which requires a bit more upkeep but doesn't require a DB change, would be to keep a text file that gets periodically updated; read it and remove users from the list based on its contents.
The bigger issue is unknown rows creeping into your database. Changing user credentials and adding creation timestamps may help you narrow down the search scope.
I am writing a .NET/Entity Framework code snippet that's supposed to update/delete a bunch of MS SQL rows with the latest data passed from the UI.
Say the table originally has 20 rows and the latest collection contains 15 records. Out of the 15, 9 have changes and 6 remain the same. So 9 rows will be updated, and the 5 rows that are not in the latest collection will be deleted from the table.
So my question is, what's the best way of doing this? If I iterate over all 20 rows and try to find each of them, it would be O(mn). Deleting all table rows and re-inserting them may be faster, but I am not sure.
All help appreciated!
So you have a user interface element filled with items of some class. You also have a database with a table filled with items of the same class.
IEnumerable<MyClass> userInterfaceElements = ...
IQueryable<MyClass> databaseElements = ...
Note: the query is not executed yet!
You want to update the database such that, after your update, your database contains the items from your user interface elements.
User interface elements that are not in the database yet will be added
Database elements that are not in the user interface need to be removed
User interface elements that are also in the database need to be updated.
You didn't write how you decide whether a user interface element is also in the database.
Let's assume you don't invent primary keys. This means that elements with a default value (zero) for your primary key are elements that are not in the database.
var itemsToAdd = userInterfaceElements.Where(row => row.Id == 0);
var itemsToUpdate = userInterfaceElements.Where(row => row.Id != 0);
var idsItemsToKeep = itemsToUpdate.Select(row => row.Id);
var itemsToRemove = databaseElements.Where(row => !idsItemsToKeep.Contains(row.Id));
The last one removes all items whose Id no longer appears among your user interface elements.
Note: we still have not executed any query!
Adding the items to your database will change databaseElements, so before you make any changes you need to materialize the items:
var addList = itemsToAdd.ToList();
var updateList = itemsToUpdate.ToList();
var removeList = itemsToRemove.ToList();
By now you've queried your database exactly once: you fetched all items to remove. You can't tell Entity Framework to remove items without fetching them first.
dbContext.MyClasses.RemoveRange(removeList);
dbContext.MyClasses.AddRange(addList);
To update in Entity Framework, the proper method is to fetch the data and then change the properties.
Some people prefer to attach the items to the dbContext's change tracker and mark them as modified. This can be dangerous, however: if someone else has changed some properties of these items, especially ones you don't show in your user interface elements, those changes will be silently overwritten. So only do this if you really have a long list of items to update.
Proper way:
foreach (var itemToUpdate in updateList)
{
    var fetchedItem = dbContext.MyClasses.Find(itemToUpdate.Id);
    // TODO: update changed properties of the fetchedItem with values from itemToUpdate
}
Dangerous method:
foreach (var itemToUpdate in updateList)
{
    dbContext.Entry(itemToUpdate).State = EntityState.Modified;
}
Finally:
dbContext.SaveChanges();
Improved delete method
You've got a problem when you filled your user interface element with database values and some other process removed one of those values from the database in the meantime.
When your code looks at the primary key, it will think the item is in the database; however, it isn't there anymore. What to do with this element? Add it again? Act as if the user also wanted it to be deleted?
To solve this kind of problem, quite often people don't delete items from their database, but declare them obsolete instead. They add a Boolean column to the table that indicates whether the item is to be deleted in the near future. This solves the problem of people wanting to update items while others want them removed.
Regularly, every month or so, a process is started to remove all obsolete objects. The chance that you want to update an obsolete object is much lower.
If this needs to be fully safe: don't store a Boolean obsolete flag, but the obsolete date. Periodically remove all items that have been obsolete for a longer time.
The nice thing about the obsolete flag is that if someone declared an item obsolete by accident, there is still some time to repair this.
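A minimal sketch of the obsolete-date variant (ObsoleteDate is an assumed nullable DateTime column, not part of the original model):

// normal queries only see items that are not obsolete
var activeItems = dbContext.MyClasses.Where(x => x.ObsoleteDate == null);
// "deleting" now means stamping the obsolete date instead of removing the row
foreach (var item in removeList)
{
    item.ObsoleteDate = DateTime.UtcNow;
}
dbContext.SaveChanges();
// periodic cleanup job: purge items that have been obsolete for over a month
var cutoff = DateTime.UtcNow.AddMonths(-1);
var toPurge = dbContext.MyClasses
    .Where(x => x.ObsoleteDate != null && x.ObsoleteDate < cutoff)
    .ToList();
dbContext.MyClasses.RemoveRange(toPurge);
dbContext.SaveChanges();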
I initially wanted to quickly select certain records using a LINQ query, matching a default date condition (01/01/0001):
var defDates = (from rec in myRecords where rec.myDate == System.DateTime.MinValue select rec.myDate);
and then update those records to a null value. However, the returned records only have a get, so they are read-only (which makes sense, since I'm using LINQ).
So the best I could come up with is a C# foreach that catches default dates and sets them to null (I'm exporting to Excel via Aspose.Cells, so I need those cell values to just be blank):
foreach (myExportClass rec in myRecords)
{
    if (rec.myDate == System.DateTime.MinValue)
    {
        rec.myDate = null;
    }
}
Okay, this works, and exports the required null values to Excel (a blank cell, actually).
However, the question is: is there a more efficient way of handling this default-date scenario? It seems to me that a foreach may waste resources when several thousand records come back.
FYI: the records come back from the backend this way, so I need to deal with this in C#.
Thanks in advance
I am trying to read all new rows that are added to the database on a timer.
First I read the entire database and save it to a local data table, but I want to read all new rows that are added to the database. Here is how I'm trying to read new rows:
string accessDB1 = string.Format("SELECT * FROM {0} ORDER BY ID DESC", tableName);
setupaccessDB(accessDB1);

int dTRows = localDataTable.Rows.Count + 1;
localDataTable.Rows.Add();

using (readNext = command.ExecuteReader())
{
    while (readNext.Read())
    {
        for (int xyz = 0; xyz < localDataTable.Columns.Count; xyz++)
        {
            // Code
        }
        break;
    }
}
If only one row is added within a timer interval this works fine, but when multiple rows are added it only reads the latest row.
So, is there any way I can read all of the added rows?
I am using OleDbDataReader.
Thanks in advance
For most tables the primary key is based on an incremental value. This can be a very simple integer that is incremented by one, but it could also be a datetime-based GUID.
Anyway, if you know the id of the last record, you can simply ask for all records that have a higher id. That way you get the new records. But what about updated records? If you also want those, you might want to use a column that contains a datetime value.
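A sketch of the id-based approach against the question's OleDb setup (connection and tableName are assumed to come from your existing code; in real code lastSeenId would be a field that survives between timer ticks):

// remember the highest ID processed so far; 0 before the first poll
long lastSeenId = 0;
string sql = string.Format("SELECT * FROM {0} WHERE ID > ? ORDER BY ID", tableName);
using (var cmd = new OleDbCommand(sql, connection))
{
    cmd.Parameters.AddWithValue("?", lastSeenId); // OleDb uses positional ? parameters
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            lastSeenId = Convert.ToInt64(reader["ID"]);
            // copy this row into localDataTable here
        }
    }
}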
A little trickier are records that are deleted from the database. You can't retrieve those with a basic query. You could solve that by setting a TTL for each record you retrieve from the database, much like a cache: when the record has 'expired', you try to retrieve it again.
Some databases, like Microsoft SQL Server, also provide more advanced options in this regard. You can use query notifications via the broker services, or enable change tracking on your database. The latter can even indicate the last action per record (insert, update or delete).
Your immediate problem lies here:
while (readNext.Read())
{
    doSomething();
    break;
}
This is what your loop basically boils down to. That break exits the loop after processing the first item, regardless of how many items there are.
The first item, in this case, will probably be the last one added (as you state it is), since you're sorting by descending ID.
In terms of reading only newly added rows, there are a variety of ways to do it, some of which depend on the DBMS you're using.
Perhaps the simplest and most portable would be to add an extra processed column which is set to false when a row is first added.
That way, you can simply have a query that looks for those records and, for each one, processes it and sets the column to true.
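A rough sketch of that polling loop over OleDb (processed is the assumed yes/no column; connection and tableName come from your existing setup):

var newIds = new List<long>();
string selectSql = string.Format("SELECT * FROM {0} WHERE processed = false", tableName);
using (var selectCmd = new OleDbCommand(selectSql, connection))
using (var reader = selectCmd.ExecuteReader())
{
    while (reader.Read())
    {
        // copy the row into localDataTable, then remember its id
        newIds.Add(Convert.ToInt64(reader["ID"]));
    }
}
// mark the rows as handled once the reader is closed and the connection is free again
string updateSql = string.Format("UPDATE {0} SET processed = true WHERE ID = ?", tableName);
foreach (long id in newIds)
{
    using (var updateCmd = new OleDbCommand(updateSql, connection))
    {
        updateCmd.Parameters.AddWithValue("?", id);
        updateCmd.ExecuteNonQuery();
    }
}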
In fact, you could use triggers to do this (force the flag to false on insertion), which opens up the possibility of handling updates as well.
Tracking deletions is a little more difficult but still achievable: you could have a trigger that writes the record to a separate table before deleting it, so that your processing code has access to those details as well.
The following works
using (readNext = command.ExecuteReader())
{
    while (readNext.Read())
    {
        abc = readNext.FieldCount;
        for (int s = 1; s < abc; s++)
        {
            var nextValue = readNext.GetValue(s);
        }
    }
}
The for loop reads the columns of the current row, and the while loop then moves on to the next row.
I've currently got a bit of an issue when trying to check values that have been posted as part of an update against what is currently held in the database.
Currently what I'm doing is reading the existing record into a new variable alongside the one passed in from the post, and checking values on that variable. However, I've just noticed that as soon as I read the record from the database, the values passed in from the post get reset to their previous values.
I have a feeling the reason is that the posted record and the retrieved record both have the same primary key value, so the code overwrites the new values because there can't be two different objects with the same primary key in memory. Though that is just a guess.
Can someone help me with this issue, and possibly help me find a way to get around this?
EDIT:
My code is below. This is the main code; as soon as I retrieve the "original record", the values in "faultrecord" get reverted to what they were before:
[HttpPost]
public ActionResult Edit(fault faultrecord)
{
    fault originalRecord = _faultRepository.Read(faultrecord.ID); /* here is where it gets reverted */
    if (faultrecord != originalRecord)
    {
        /* perform actions and update record */
    }
}
The below code is what I use to read the record from the database:
public fault Read(int id)
{
    var result = ITJobEntities.faults.Where(t => t.ID == id);
    if (result.Any())
    {
        return result.FirstOrDefault();
    }
    return null;
}
My only reason for believing it has to do with the primary keys is that when I added the "originalRecord" retrieval, my update statement to the database started failing because there were multiple objects with the same ID (the actual error was a bit more descriptive, but I can't remember it fully).
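For what it's worth, one way to read the record without it colliding with the posted entity in the change tracker is to fetch it detached. A sketch only, assuming Entity Framework's AsNoTracking is available on the query:

public fault Read(int id)
{
    // AsNoTracking returns a detached copy, so it won't fight with the posted
    // "faultrecord" over the same primary key in the change tracker
    return ITJobEntities.faults
        .AsNoTracking()
        .FirstOrDefault(t => t.ID == id);
}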