Iterating through two identical data sources - c#

I have data with the same schema in a pipe delimited text file and in a database table, including the primary key column.
I have to check whether each row in the file is present in the table and, if it is not, generate an INSERT statement for that row.
The table has 30 columns, but here I've simplified for this example:
ID Name Address1 Address2 City State Zip
ID is the running identity column, so if a particular ID value from the file is found in the table, no INSERT statement should be generated for that row.
Here's my attempt, which doesn't feel correct:
foreach (var item in RecipientsInFile)
{
    if (!RecipientsInDB.Any(u => u.ID == item.ID))
    {
        Console.WriteLine(GetInsertSql(item));
    }
}
Console.ReadLine();
EDIT: Sorry, I forgot to ask the actual question: how should I do this?
Thank you very much for all the help.
EDIT: The table has a million plus rows, while the file has 50K rows. This is a one-time thing, not a permanent project.

I would add all the RecipientsInDB Ids in a HashSet and then test if the set contains the item Id.
// HashSet<int> assumes ID is an int; use the actual type of the ID column
var recipientsInDBIds = new HashSet<int>(RecipientsInDB.Select(u => u.ID));
foreach (var item in RecipientsInFile)
{
    if (!recipientsInDBIds.Contains(item.ID))
    {
        Console.WriteLine(GetInsertSql(item));
    }
}
Console.ReadLine();

Try comparing the ID lists using .Except()
List<int> dbIDs = Recipients.Select(x=>x.ID).ToList();
List<int> fileIDs = RecipientsFile.Select(x=>x.ID).ToList();
List<int> toBeInserted = fileIDs.Except(dbIDs).ToList();
toBeInserted.ForEach(x=>GetInsertSqlStatementForID(x));
For the pedantic and trollish among us in the comments, please remember the above code (like any source code you find on the interwebs) shouldn't be copy/pasted into your production code. Try this refactoring:
foreach (var item in RecipientsFile.Select(x => x.ID)
    .Except(DatabaseRecipients.Select(x => x.ID)))
{
    GetInsertSqlStatementForID(item);
}

Lots of ways of accomplishing this. Yours is one way.
Another would be to always generate SQL, but generate it in the following manner:
if not exists (select 1 from Recipients where ID = 1234)
    insert Recipients (...) values (...)
if not exists (select 1 from Recipients where ID = 1235)
    insert Recipients (...) values (...)
Another would be to retrieve the entire contents of the database into memory beforehand, loading the database IDs into a HashSet, then only checking that HashSet to see if it exists - would take a little longer to get started, but would be faster for each record.
Any of these three techniques would work - it all depends on how big your database table is, and how big your file is. If they're both relatively small (maybe 10,000 records or so), then any of these should work fine.
EDIT
And there's always option D: Insert all records from the file into a temporary table (could be a real table or a SQL temp table, doesn't really matter) in the database, then use SQL to join the two tables together and retrieve the differences (using not exists or in or whatever technique you want), and insert the missing records that way.
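The "always generate SQL" option above could be produced from C# along these lines. This is only a sketch: the Recipients table name comes from the answer, while the column list, the sample rows and the BuildGuardedInsert helper are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: wraps an INSERT in an IF NOT EXISTS check on the ID,
// so the existence test runs in the database instead of in C#.
string BuildGuardedInsert(int id, string name, string city)
{
    // Doubling single quotes is the minimum escaping for a SQL string literal.
    string Quote(string s) => "'" + s.Replace("'", "''") + "'";

    return $"IF NOT EXISTS (SELECT 1 FROM Recipients WHERE ID = {id})\n" +
           $"    INSERT Recipients (ID, Name, City) VALUES ({id}, {Quote(name)}, {Quote(city)});";
}

// Made-up sample rows standing in for the parsed file (assumed columns).
var rowsInFile = new List<(int Id, string Name, string City)>
{
    (1234, "Ann O'Hara", "Boston"),
    (1235, "Bob Lee", "Austin"),
};

foreach (var row in rowsInFile)
{
    Console.WriteLine(BuildGuardedInsert(row.Id, row.Name, row.City));
}
```

Because ID is an identity column, running such a script on SQL Server would presumably also need SET IDENTITY_INSERT Recipients ON around the inserts. For a one-time job building literals like this may be acceptable; anything long-lived should use parameters instead.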

Related

How to return a column of the record deleted when deleting a record in EF?

I'm working on reducing managed stored procedures and have come across this not knowing how (or if) it can be successfully ported to Entity Framework:
-- select out filename so the caller can delete the file from the filesystem
SELECT SystemFileName
FROM File
WHERE FileId = @fileId
-- now delete
DELETE
FROM File
WHERE FileId = @fileId
Basically, the proc takes an id, returns the file name of the record it will be deleting and then deletes the row. The calling code then has the file name to perform any file system clear up reducing the potential for orphan files.
Now, using Entity Framework, I could just find each file and perform the delete, but if I were doing this in a loop it would be terribly inefficient.
I could try to document my attempts but it's more pseudocode than anything that will compile.
// Select all the File objects from the EF DbSet<File> that you want to delete and store them in a List<File>.
var toDelete = context.Files.Where(predicate).ToList();
// Loop through the list and remove each one, or use RemoveRange to mark them all at once.
var deleted = context.Files.RemoveRange(toDelete);
// SaveChanges on your DbContext
if (context.SaveChanges() == toDelete.Count)
{
    // and then you can loop through your list and delete the files from the file system
    deleted.ToList().ForEach(file =>
    {
        var fileInfo = new FileInfo(file.SystemFileName);
        if (fileInfo.Exists)
        {
            fileInfo.Delete();
        }
    });
}
Assuming that you have fewer than 4000 records to delete, why not:
1) Select the ID and File Name you want to delete
2) Store these results in an array / list
3) Delete all records using the IN clause
// get all of the records to be deleted
var records = context.Files.Where(x => x.Name.Contains("delete-me")).Select(x => new { x.Id, x.Name }).ToList();
// mark all of those files for deletion (untested, written without a compiler, but it gives a good enough idea)
var ids = records.Select(r => r.Id).ToList();
context.Files.RemoveRange(context.Files.Where(x => ids.Contains(x.Id)));
// execute our save, which will delete our records
context.SaveChanges();
// return the list of records that have been deleted back to the caller
return records;
To be honest, I would consider this a bulk operation and not a great candidate for EF (or other ORMs for that matter). I'm not sure which is more important to you, simplifying the query, or using EF. But if you want to simplify the query, you can execute something like this and read the affected deleted values at the same time
DELETE
FROM File
OUTPUT deleted.SystemFileName -- or OUTPUT deleted.*
WHERE FileId = @fileId

Which is faster between Linq to Sql And SQl Query

I have a list of objects like this:
List<Product> _products;
Then I take a productId as input and search the list like this:
var target = _products.Where(o => o.productid == input).FirstOrDefault();
My question is: if this list has 100 products (productId from 1 to 100, with ORDER BY productId ASC in the query) and the input is productId = 100, does this method have to loop 100 times?
And which is faster: this method, or a query on the database with a WHERE clause like WHERE productId = @param?
Thank you.
Not on the database side. If there is an index with key productId, the query finds the correct row in O(log n) operations.
Just implement both methods and measure the time (hint: use the Stopwatch class).
Edit
To get the full performance you should not create an intermediate (unsorted) List<T> but put all your logic in a LINQ query which operates on the SQL Server.
This might help answer your question:
https://www.linqpad.net/WhyLINQBeatsSQL.aspx
If you execute that Where on a List<Product>, then:
you got all 100 rows from the database
and then looped through all products in memory until you found the one that matches or until you went through the entire list and found nothing.
If, on the other hand, you used an IQueryable<Product> that was connected to the database table, then:
You wouldn't have read anything from the database yet
When you apply the Where, you still wouldn't read anything
When you apply the FirstOrDefault a sql query is constructed to find just the one row you need. Given correct indexes on the table, this would be quite fast.
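The in-memory half of this can be checked with a small experiment. The sketch below uses LINQ to Objects only (no database), with made-up tuples standing in for the Product class; CountComparisons is a hypothetical helper that counts how often the Where predicate runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Products 1..100 as (ProductId, Name) tuples; a stand-in for the Product class.
var products = Enumerable.Range(1, 100)
    .Select(id => (ProductId: id, Name: "Product " + id))
    .ToList();

// Counts how many elements Where(...).FirstOrDefault() inspects for a given id.
int CountComparisons(int input)
{
    int comparisons = 0;
    _ = products
        .Where(p => { comparisons++; return p.ProductId == input; })
        .FirstOrDefault();
    return comparisons;
}

Console.WriteLine(CountComparisons(1));    // 1: the scan stops at the first match
Console.WriteLine(CountComparisons(100));  // 100: the match is the last element

// An in-memory counterpart of an index: one hash lookup instead of a scan.
var byId = products.ToDictionary(p => p.ProductId);
Console.WriteLine(byId[100].Name);
```

So the list scan stops at the first match but still needs up to n comparisons, which is the cost a database index (or a Dictionary, in memory) avoids.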

Appending and deleting linked tables in access using c# issue

I have a piece of code that goes through all the tables (local and linked) in an Access database, and for every table that matches a certain criterion (all of them linked, in this case) it should add a new linked table and delete the old one. The new table is on a SQL Server database and the old one is on Oracle; however, this is irrelevant. The code is:
var dbe = new DBEngine();
Database db = dbe.OpenDatabase(@"C:\Users\x339\Documents\Test.accdb");
foreach (TableDef tbd in db.TableDefs)
{
    if (tbd.Name.Contains("CLOASEUCDBA_T_"))
    {
        useddatabases[i] = tbd.Name;
        string tablename = CLOASTableDictionary[tbd.Name];
        string tablesourcename = CLOASTableDictionary[tbd.Name].Substring(6);
        var newtable = db.CreateTableDef(tablename.Trim());
        newtable.Connect = "ODBC;DSN=sql server copycloas;Trusted_Connection=Yes;APP=Microsoft Office 2010;DATABASE=ILFSView;";
        newtable.SourceTableName = tablesourcename;
        db.TableDefs.Append(newtable);
        db.TableDefs.Delete(tbd.Name);
        i++;
    }
}
foreach (TableDef tbd in db.TableDefs)
{
    Console.WriteLine("After loop " + tbd.Name);
}
There are 3 linked tables in this database: 'CLOASEUCDBA_T_AGENT', 'CLOASEUCDBA_T_CLIENT' and 'CLOASEUCDBA_T_BASIC_POLICY'. The issue is that the code updates the first two tables perfectly, but for some unknown reason it never finds the third. Then, in the second loop, it prints it out, so it seems to just skip over 'CLOASEUCDBA_T_BASIC_POLICY'. I really don't know why. The weird thing is that if I run the code again, it does change 'CLOASEUCDBA_T_BASIC_POLICY'. Any help would be greatly appreciated.
Modifying a collection while you are iterating over it can sometimes mess things up. Try using a slightly different approach:
1) Iterate over the TableDefs collection and build a List (or perhaps a Dictionary) of the items you need to change.
2) Then iterate over that List and update the items in the TableDefs collection.
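The two-pass idea can be sketched without DAO. In the snippet below a plain List<string> stands in for db.TableDefs, the table names are the ones from the question, and the "SQL_" prefix is a made-up placeholder for the lookup in CLOASTableDictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A plain list standing in for db.TableDefs (names taken from the question).
var tableNames = new List<string>
{
    "CLOASEUCDBA_T_AGENT",
    "CLOASEUCDBA_T_CLIENT",
    "CLOASEUCDBA_T_BASIC_POLICY",
    "SomeLocalTable",
};

// Pass 1: snapshot the matching names. ToList() copies them out,
// so the loop below never enumerates the collection it modifies.
var toReplace = tableNames
    .Where(name => name.Contains("CLOASEUCDBA_T_"))
    .ToList();

// Pass 2: modify the real collection while iterating over the snapshot.
foreach (var oldName in toReplace)
{
    // Hypothetical new name; the question looks it up in CLOASTableDictionary.
    tableNames.Add("SQL_" + oldName.Substring("CLOASEUCDBA_T_".Length));
    tableNames.Remove(oldName);
}

Console.WriteLine(string.Join(", ", tableNames));
// prints: SomeLocalTable, SQL_AGENT, SQL_CLIENT, SQL_BASIC_POLICY
```

With the real TableDefs collection, the second loop would call Append and Delete exactly as in the question; only the source of the names being looped over changes.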

Possibilities and nested foreach loops C#

I am writing a program that can calculate all possible places for people sitting at a random amount of tables (can go up to 1000+):
As you see on the image, the black dots represent people (but in the computer, there are different types of people.)
There are two types of tables : blue and pink ones. The blue ones can contain 3 people and the pink 2 people.
To get all possible places for any person to sit, I could use foreach loops (8 of them) and then run some extra code...
But what happens if I add 200 tables? Do I then need 200 foreach loops?
Is there any way to code this that is faster and takes less code?
What I tried? =>
switch (listoftables.Count)
{
    case 1:
        foreach (Table table in listoftables) { /* code to add people to this table */ }
        break;
    case 2:
        foreach (Table table1 in listoftables)
            foreach (Table table2 in listoftables) { /* code to add people to this table */ }
        break;
}
INPUT : array with editable Table class objects (its a class created by myself)
PROCESS : the above List is edited and is added to another List object, where after the whole foreach process has ended, the OUTPUT will write all possible configurations (who are in the other List object) to the screen.
Example part of output :
// List<Table> listofalltables was processed
List<listofalltables> output
=> contains as [0] in array : List first
=> contains as [0] in array : Table.attachedpeople (is list)
Try a recursive method. A small example :
public List<Table> assignToTable(List<Person> invited, List<Table> tables)
{
    // stop when no table has a free seat left (assumes Table exposes a HasRoom property)
    if (!tables.Any(t => t.HasRoom))
        return tables;

    assign(tables, invited); // code to add a person to a table
    return assignToTable(invited, tables);
}
If I were you, I would create an object that represents your tables, with a property that tells you whether there is still some room available. This will assign every person to a table without any foreach.
Then in your main you could have a method that rearranges the tables in every possible order:
Table 1
Table 2
Table 3
Then
Table 1
Table 3
Table 2
...
Table 3
Table 2
Table 1
and call the recursive method on those lists, and you will have all the possibilities for where people can sit...
I took @Guigui's answer and changed it to match how I interpret the question. This will try to seat everyone everywhere (except when there are more people than chairs; is that a case? I assumed more chairs than people) by recursion and loops. As you can see, the complexity will be of the form O(Math.Pow(nrSeats, nrPeople)), which is a lot (if I'm not mistaken).
Person[] peopleInvited = ....; // Avoid copying this, we are not modifying it
public void AssignToTable(int invited, SeatingArrangements tables)
{
    if (!tables.HasRoom || invited == peopleInvited.Length)
    {
        // Do what? Print the seating?
    }
    else
    {
        var personToSeat = peopleInvited[invited];
        foreach (var possibleSeating in tables.GetEmptyChairs())
        {
            // Add one person to a table somewhere, but don't modify tables
            var newArrangements = Assign(possibleSeating, personToSeat, tables);
            AssignToTable(invited + 1, newArrangements);
        }
    }
}
Well, it's more a question about math than programming. What you are trying to do is create permutations of people. Basically you have N tables and 2N+2 seats. You can assign a number to each seat. Then the result will be the set of K-permutations of the 2N+2 seats, where K is the number of people invited and N is the number of tables.
You can do this using loops, but you can also do it recursively. There are algorithms ready to use out there. For example:
Algorithm to generate all possible permutations of a list?
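As a rough illustration of the recursive route, here is a minimal sketch that enumerates every way to put K people on numbered seats. SeatPeople and Place are hypothetical names, seats are plain integers rather than Table objects, and the table colours are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Recursively assigns person 0, 1, ... to any seat that is still free.
// Yields one int[] per arrangement: result[p] is the seat number of person p.
IEnumerable<int[]> SeatPeople(int peopleCount, int seatCount)
{
    var assignment = new int[peopleCount];
    var taken = new bool[seatCount];

    IEnumerable<int[]> Place(int person)
    {
        if (person == peopleCount)
        {
            yield return (int[])assignment.Clone(); // copy: the buffer is reused
            yield break;
        }
        for (int seat = 0; seat < seatCount; seat++)
        {
            if (taken[seat]) continue;
            taken[seat] = true;
            assignment[person] = seat;
            foreach (var arrangement in Place(person + 1))
                yield return arrangement;
            taken[seat] = false; // backtrack and try the next seat
        }
    }

    return Place(0);
}

// 2 people on 3 seats: 3 * 2 = 6 arrangements.
foreach (var arrangement in SeatPeople(2, 3))
    Console.WriteLine(string.Join(",", arrangement));
```

The recursion depth equals the number of people, so no nesting of foreach loops per table is needed. The number of results still grows factorially, so enumerating every arrangement for hundreds of tables is not realistic whichever way it is coded.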

Using Linq for comparison of two Datatables

I am developing a C# ASP.NET web application. I have data being pulled from two databases. One is the database that holds all of our actual data, the second is to be used so that users of the site can save "favorites" and easily find this data later. The databases have the following columns:
Table1:
itemid, itemdept, itemdescription
Table2:
userid, itemid, itemdept, itemdescription
If the item is present in table2 (the user has already added it), I want to mark the item as removable if it comes up again in a search, and addable if it is not yet in their favorites.
I've got data from both pulled into datatables so I can compare them, but I feel that using nested foreach loops will be too tedious, as the query is set to return a max of 300 results. Also, to do that, I have to put a bool value in one of the tables to mark that it was found, so this seems messy.
I have read up a little on Linq, but can't find anything exactly like this scenario. Could I use Linq to accomplish such a thing? Below is an (admittedly crude) image of the search results page that may help get a better grasp on this. In the real deal, the Add and Remove links will be imagebuttons.
Forgot to ever post the solution to this one, but I went with the HashSet setup, with one loop to compare. Thank you everyone for your comments.
if (User.Identity.IsAuthenticated)
{
    DataColumn dc = new DataColumn("isMarked", typeof(int));
    ds.Tables[0].Columns.Add(dc);
    // NOTE: as posted, the IDs come from the same table that is scanned below, so every row gets marked;
    // the favorites table is presumably the intended source for the HashSet.
    string[] strArray = ds.Tables[0].AsEnumerable().Select(s => s.Field<string>("itemid")).ToArray();
    HashSet<string> hset = new HashSet<string>(strArray);
    foreach (DataRow dr in ds.Tables[0].Rows)
    {
        // 1 = already a favorite, 0 = not yet
        dr["isMarked"] = hset.Contains(dr["itemid"].ToString().Trim()) ? 1 : 0;
    }
}
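For reference, a self-contained sketch of the same HashSet-and-one-loop idea with two separate tables. The table contents and the searchResults / favorites names are made up for illustration; only the itemid, userid and isMarked column names come from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Made-up stand-ins for the two result sets in the question.
var searchResults = new DataTable();
searchResults.Columns.Add("itemid", typeof(string));
searchResults.Columns.Add("itemdescription", typeof(string));
searchResults.Rows.Add("A1", "Hammer");
searchResults.Rows.Add("B2", "Wrench");
searchResults.Rows.Add("C3", "Pliers");

var favorites = new DataTable();
favorites.Columns.Add("userid", typeof(string));
favorites.Columns.Add("itemid", typeof(string));
favorites.Rows.Add("user1", "B2 ");   // trailing space on purpose

// Build the lookup from the favorites table, trimming the same way as below.
var favoriteIds = new HashSet<string>(
    favorites.Rows.Cast<DataRow>().Select(r => r["itemid"].ToString().Trim()));

// One pass over the search results: 1 = already a favorite (removable), 0 = addable.
searchResults.Columns.Add("isMarked", typeof(int));
foreach (DataRow row in searchResults.Rows)
{
    row["isMarked"] = favoriteIds.Contains(row["itemid"].ToString().Trim()) ? 1 : 0;
}

foreach (DataRow row in searchResults.Rows)
    Console.WriteLine(row["itemid"] + ": " + row["isMarked"]);
```

Trimming on both sides keeps the HashSet keys and the probed values comparable, and addressing the flag column by name avoids depending on its position.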
