Seeking a less costly solution for matching Ids of two tables - c#

The application I am building allows a user to upload a .csv file containing multiple rows and columns of data. Each row contains a unique varchar Id. This will ultimately fill in fields of an existing SQL table where there is a matching Id.
Step 1: I am using LinqToCsv and a foreach loop to import the .csv fully into a temporary table.
Step 2: Then I have another foreach loop where I try to copy the rows from the temporary table into an existing table, but only where the Ids match.
Controller Action to complete this process:
[HttpPost]
public ActionResult UploadValidationTable(HttpPostedFileBase csvFile)
{
    var inputFileDescription = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = true
    };
    var cc = new CsvContext();
    var filePath = uploadFile(csvFile.InputStream);
    var model = cc.Read<Credit>(filePath, inputFileDescription);
    try
    {
        var entity = new TestEntities();
        var tc = new TemporaryCsvUpload();
        foreach (var item in model)
        {
            tc.Id = item.Id;
            tc.CreditInvoiceAmount = item.CreditInvoiceAmount;
            tc.CreditInvoiceDate = item.CreditInvoiceDate;
            tc.CreditInvoiceNumber = item.CreditInvoiceNumber;
            tc.CreditDeniedDate = item.CreditDeniedDate;
            tc.CreditDeniedReasonId = item.CreditDeniedReasonId;
            tc.CreditDeniedNotes = item.CreditDeniedNotes;
            entity.TemporaryCsvUploads.Add(tc);
        }
        var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);
        foreach (var number in idMatches)
        {
            number.CreditInvoiceDate = tc.CreditInvoiceDate;
            number.CreditInvoiceNumber = tc.CreditInvoiceNumber;
            number.CreditInvoiceAmount = tc.CreditInvoiceAmount;
            number.CreditDeniedDate = tc.CreditDeniedDate;
            number.CreditDeniedReasonId = tc.CreditDeniedReasonId;
            number.CreditDeniedNotes = tc.CreditDeniedNotes;
        }
        entity.SaveChanges();
        entity.Database.ExecuteSqlCommand("TRUNCATE TABLE TemporaryCsvUpload");
        TempData["Success"] = "Updated Successfully";
    }
    catch (LINQtoCSVException)
    {
        TempData["Error"] = "Upload Error: Ensure you have the correct header fields and that the file is of .csv format.";
    }
    return View("Upload");
}
The issue with the above code is that tc is assigned inside the first loop, but the matches are looked up after the loop with var idMatches = entity.PreexistingTable.Where(x => x.Id == tc.Id);, so I am only getting the last item of the first loop.
If I nest the second loop inside the first, it is way too slow (I stopped it after 10 minutes), because there are roughly 1,000 rows in the .csv and 7,000 in the preexisting table.
Finding a better way to do this is plaguing me. Pretend that the temporary table didn't even come from a .csv and just think about the most efficient way to fill in rows in table 2 from table 1 where the id of that row matches. Thanks for your help!

As your code is written now, much of the work is being done by the application that could much more efficiently be done by SQL Server. You are making hundreds of unnecessary roundtrip calls to the database. When you are mass importing data you want a solution like this:
Bulk import the data. See this answer for helpful guidance on bulk import efficiency with EF.
Join and update destination table.
Processing the import should only require a single mass update query:
update PT set
CreditInvoiceDate = CSV.CreditInvoiceDate
,CreditInvoiceNumber = CSV.CreditInvoiceNumber
,CreditInvoiceAmount = CSV.CreditInvoiceAmount
,CreditDeniedDate = CSV.CreditDeniedDate
,CreditDeniedReasonId = CSV.CreditDeniedReasonId
,CreditDeniedNotes = CSV.CreditDeniedNotes
from PreexistingTable PT
join TemporaryCsvUploads CSV on PT.Id = CSV.Id
This query would replace your entire nested loop and apply the same update in a single database call. As long as your tables are indexed properly (ideally on the Id columns being joined), this should run very fast.
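For completeness, here is a hedged sketch of what the two steps could look like from the application side: SqlBulkCopy for the bulk import into the staging table, then the single join update above. Table and column names come from the question; the column types and the nullability handling are assumptions.

// requires System.Data and System.Data.SqlClient
using (var entity = new TestEntities())
{
    // 1. Stage the parsed CSV rows in a DataTable and bulk-copy them in one shot.
    var table = new DataTable();
    table.Columns.Add("Id", typeof(string));
    table.Columns.Add("CreditInvoiceAmount", typeof(decimal));   // assumed type
    table.Columns.Add("CreditInvoiceDate", typeof(DateTime));    // assumed type
    table.Columns.Add("CreditInvoiceNumber", typeof(string));    // assumed type
    table.Columns.Add("CreditDeniedDate", typeof(DateTime));     // assumed type
    table.Columns.Add("CreditDeniedReasonId", typeof(int));      // assumed type
    table.Columns.Add("CreditDeniedNotes", typeof(string));      // assumed type

    foreach (var item in model)   // model = cc.Read<Credit>(...) from the question
    {
        // nullable CSV values may need to be mapped to DBNull.Value explicitly
        table.Rows.Add(item.Id, item.CreditInvoiceAmount, item.CreditInvoiceDate,
                       item.CreditInvoiceNumber, item.CreditDeniedDate,
                       item.CreditDeniedReasonId, item.CreditDeniedNotes);
    }

    using (var bulk = new SqlBulkCopy(entity.Database.Connection.ConnectionString))
    {
        bulk.DestinationTableName = "dbo.TemporaryCsvUpload";
        foreach (DataColumn col in table.Columns)
            bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
        bulk.WriteToServer(table);
    }

    // 2. Apply the set-based update in a single database call, then clean up.
    entity.Database.ExecuteSqlCommand(@"
        update PT set
             CreditInvoiceDate = CSV.CreditInvoiceDate
            ,CreditInvoiceNumber = CSV.CreditInvoiceNumber
            ,CreditInvoiceAmount = CSV.CreditInvoiceAmount
            ,CreditDeniedDate = CSV.CreditDeniedDate
            ,CreditDeniedReasonId = CSV.CreditDeniedReasonId
            ,CreditDeniedNotes = CSV.CreditDeniedNotes
        from PreexistingTable PT
        join TemporaryCsvUpload CSV on PT.Id = CSV.Id");

    entity.Database.ExecuteSqlCommand("TRUNCATE TABLE TemporaryCsvUpload");
}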

After saving the CSV records into a second table that has the same fields as your primary table, execute the following procedure in SQL Server:
create proc [dbo].[excel_updation]
as
set xact_abort on
begin transaction

-- First update existing records
update first_table
set [ExamDate]    = source.[ExamDate],
    [marks]       = source.[marks],
    [result]      = source.[result],
    [dob]         = source.[dob],
    [spdate]      = source.[spdate],
    [agentName]   = source.[agentName],
    [companycode] = source.[companycode],
    [dp]          = source.[dp],
    [state]       = source.[state],
    [district]    = source.[district],
    [phone]       = source.[phone],
    [examcentre]  = source.[examcentre],
    [examtime]    = source.[examtime],
    [dateGiven]   = source.[dateGiven],
    [smName]      = source.[smName],
    [smNo]        = source.[smNo],
    [bmName]      = source.[bmName],
    [bmNo]        = source.[bmNo]
from first_table
inner join second_table source
    on first_table.[UserId] = source.[UserId]

-- And then insert the rows that do not exist yet
insert into first_table ([UserId], [ExamDate], [marks], [result], [dob], [spdate], [agentName], [companycode], [dp], [state], [district], [phone], [examcentre], [examtime], [dateGiven], [smName], [smNo], [bmName], [bmNo])
select [UserId], [ExamDate], [marks], [result], [dob], [spdate], [agentName], [companycode], [dp], [state], [district], [phone], [examcentre], [examtime], [dateGiven], [smName], [smNo], [bmName], [bmNo]
from second_table source
where not exists
(
    select *
    from first_table
    where first_table.[UserId] = source.[UserId]
)

commit transaction

delete from second_table
The only condition for this code is that both tables must share the same Id column. For every Id that matches in both tables, the data of that row is updated in the first table; rows whose Id is not yet in the first table are inserted.
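If you go this route, the whole procedure can then be run from the application in a single call once the CSV rows have been saved to the second table. A minimal sketch, assuming a plain ADO.NET connection string is available:

// requires System.Data and System.Data.SqlClient
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.excel_updation", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    conn.Open();
    cmd.ExecuteNonQuery();   // runs the update, the insert, and the cleanup delete
}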

As long as the probability of a match is high, you can simply attempt an UPDATE for every row from your CSV, with a condition that the Id matches:
UPDATE table SET ... WHERE id = #id
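A sketch of what that per-row approach could look like with plain ADO.NET; the column names are taken from the original question, while connectionString and the null handling are assumptions:

// requires System.Data and System.Data.SqlClient
const string sql = @"
    UPDATE PreexistingTable SET
         CreditInvoiceDate = @invDate
        ,CreditInvoiceNumber = @invNumber
        ,CreditInvoiceAmount = @invAmount
        ,CreditDeniedDate = @deniedDate
        ,CreditDeniedReasonId = @deniedReasonId
        ,CreditDeniedNotes = @deniedNotes
    WHERE Id = @id";

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    foreach (var item in model)   // rows parsed from the CSV
    {
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@invDate", (object)item.CreditInvoiceDate ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@invNumber", (object)item.CreditInvoiceNumber ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@invAmount", (object)item.CreditInvoiceAmount ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@deniedDate", (object)item.CreditDeniedDate ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@deniedReasonId", (object)item.CreditDeniedReasonId ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@deniedNotes", (object)item.CreditDeniedNotes ?? DBNull.Value);
            cmd.Parameters.AddWithValue("@id", item.Id);
            cmd.ExecuteNonQuery();   // updates 0 rows when no Id matches
        }
    }
}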

Related

Select SQL Query on Controller - ASP.NET MVC 5

What I'm trying to do is a report that brings in values from a table, but only if this table has a specific selected client and other specific properties.
It is an activation report; below is a part of the code that is already working:
var sql = "SELECT * FROM ativacao WHERE id_cliente = " + id_cliente;
var itemAtivacao = db.ativacao.SqlQuery(sql).ToList();
foreach (var item in itemAtivacao)
{
dt.Rows.Add(item.id, item.codigo, item.cliente.nome);
}
Up to here it's OK, easy. But I need to bring in elements of the cliente table, from another table, comparing on a column. Damn, it is boring.
Apart from the specific client, I need to filter on:
var id_executivo = Convert.ToInt32(Request.Form["id_executivo"]);
var id_prospector = Convert.ToInt32(Request.Form["id_prospector"]);
var id_cluster = Convert.ToInt32(Request.Form["id_cluster"]);
var id_status = Convert.ToInt32(Request.Form["id_status"]);
var id_cliente = Convert.ToInt32(Request.Form["id_cliente"]);
These are columns of the cliente table. I'm thinking about an INNER JOIN, is that it?
Summing up: I need to bring the ativacoes of a specific cliente if this cliente contains ......
Can someone help me with how to do this, please? Thanks so much.
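Not an authoritative answer, but a hedged sketch of the INNER JOIN being considered, using parameterized SQL so the filter values are not concatenated into the query string. The cliente column names and the join column (c.id) are assumptions based on the form fields above:

// requires System.Data.SqlClient for SqlParameter
// Assumed: table cliente has an id column plus the id_executivo / id_prospector /
// id_cluster / id_status columns read from the form; adjust names to the real schema.
var sql = @"SELECT a.*
            FROM ativacao a
            INNER JOIN cliente c ON c.id = a.id_cliente
            WHERE a.id_cliente = @id_cliente
              AND c.id_executivo = @id_executivo
              AND c.id_prospector = @id_prospector
              AND c.id_cluster = @id_cluster
              AND c.id_status = @id_status";

var itemAtivacao = db.ativacao.SqlQuery(sql,
    new SqlParameter("@id_cliente", id_cliente),
    new SqlParameter("@id_executivo", id_executivo),
    new SqlParameter("@id_prospector", id_prospector),
    new SqlParameter("@id_cluster", id_cluster),
    new SqlParameter("@id_status", id_status)).ToList();

foreach (var item in itemAtivacao)
{
    dt.Rows.Add(item.id, item.codigo, item.cliente.nome);
}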

How do I use LINQ to update a datatable with a SqlDataReader?

I am trying to merge data from two separate queries using C#. The data is located on separate servers or I would just combine the queries. I want to update the data in one of the columns of the first data set with the data in one of the columns of the second data set, joining on a different column.
Here is what I have so far:
ds.Tables[3].Columns[2].ReadOnly = false;
List<object> table = new List<object>();
table = ds.Tables[3].AsEnumerable().Select(r => r[2] = reader.AsEnumerable().Where(s => r[3] == s[0])).ToList();
The ToList() is just for debugging. To summarize, column 2 of ds.Tables[3] is the column I want to update, and column 3 contains the key I want to join on.
In the reader, the first column contains the key that matches column 3 of ds.Tables[3], and the second column contains the data with which I want to update column 2.
The error I keep getting is
Unable to cast object of type 'WhereEnumerableIterator`1[System.Data.IDataRecord]' to type 'System.IConvertible'. Couldn't store <System.Linq.Enumerable+WhereEnumerableIterator`1[System.Data.IDataRecord]> in Quoting Dealers Column. Expected type is Int32.
Where am I going wrong with my LINQ?
EDIT:
I updated the line where the updating is happening
table = ds.Tables[3].AsEnumerable().Select(r => r[2] = reader.AsEnumerable().First(s => r[3] == s[0])[1]).ToList();
but now I keep getting
Sequence contains no matching element
For the record, the sequence does contain a matching element.
You can use the following sample to achieve the join-and-update operation. Suppose there are two DataTables: tbl1, with columns ID, name1, and age, and tbl2, with columns ID and name2. The sample joins the two tables and updates the value of column "name1" of tbl1 from column "name2" of tbl2:
public DataTable JoinAndUpdate(DataTable tbl1, DataTable tbl2)
{
    // for demo purpose I have created a clone of tbl1.
    // you can define a custom schema, if needed.
    DataTable dtResult = tbl1.Clone();

    var result = from dataRows1 in tbl1.AsEnumerable()
                 join dataRows2 in tbl2.AsEnumerable()
                     on dataRows1.Field<int>("ID") equals dataRows2.Field<int>("ID") into lj
                 from reader in lj
                 select new object[]
                 {
                     dataRows1.Field<int>("ID"),      // ID from table 1
                     reader.Field<string>("name2"),   // Updated column value from table 2
                     dataRows1.Field<int>("age")
                     // .. here comes the rest of the fields from table 1.
                 };

    // Load the results in the table
    result.ToList().ForEach(row => dtResult.LoadDataRow(row, false));

    return dtResult;
}
Here's the result: dtResult contains the joined rows, with the name values taken from tbl2.
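For illustration, a minimal usage sketch of the helper above; the sample rows are made up, and the column layout follows what the query implies (ID, name1, age in tbl1 and ID, name2 in tbl2):

var tbl1 = new DataTable();
tbl1.Columns.Add("ID", typeof(int));
tbl1.Columns.Add("name1", typeof(string));
tbl1.Columns.Add("age", typeof(int));
tbl1.Rows.Add(1, "old name", 30);

var tbl2 = new DataTable();
tbl2.Columns.Add("ID", typeof(int));
tbl2.Columns.Add("name2", typeof(string));
tbl2.Rows.Add(1, "new name");

DataTable dtResult = JoinAndUpdate(tbl1, tbl2);
// dtResult now holds one row: 1, "new name", 30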
After considering what @DStanley said about LINQ, I abandoned it and went with a foreach statement. See code below:
ds.Tables[3].Columns[2].ReadOnly = false;
while (reader.Read())
{
    foreach (DataRow item in ds.Tables[3].Rows)
    {
        if ((Guid)item[3] == reader.GetGuid(0))
        {
            item[2] = reader.GetInt32(1);
        }
    }
}
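An optional variant, not part of the original answer: if the key values in column 3 are unique, the DataTable rows can be indexed by their Guid once so that each reader record becomes a dictionary lookup instead of a scan over every row:

// requires System.Linq, System.Collections.Generic, and the
// System.Data.DataSetExtensions reference for AsEnumerable()
ds.Tables[3].Columns[2].ReadOnly = false;
var rowsByKey = ds.Tables[3].AsEnumerable().ToDictionary(r => (Guid)r[3]);

while (reader.Read())
{
    DataRow row;
    if (rowsByKey.TryGetValue(reader.GetGuid(0), out row))
    {
        row[2] = reader.GetInt32(1);
    }
}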

Linq to Sql Update on extracted list not working

I am having issues updating my Database using linq to sql.
I have a master query that retrieves all records in the database (16,000 records)
PostDataContext ctxPost = new PostDataContext();
int n = 0;
var d = (from c in ctxPost.PWC_Gs
         where c.status == 1
         select c);
I then take the first 1000 and pass it to another object after modification using the following query:
var cr = d.Skip(n).Take(1000);
I loop through the records using a foreach loop:
foreach (var _d in cr)
{
    // Some stuffs here
    _d.status = 0;
}
I then call SubmitChanges:
ctxPost.SubmitChanges();
No record gets updated.
Thanks to you all. I was missing the primary key on the ID field in the dbml file.
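For anyone hitting the same thing: without a primary key, LINQ to SQL treats the entity as read-only and SubmitChanges() has nothing to write, so no UPDATE is generated. In the .dbml designer the fix is setting "Primary Key" to True on the ID column, which produces a mapping roughly like the sketch below (class, property, and column names here are illustrative):

using System.Data.Linq.Mapping;

[Table(Name = "PWC_G")]              // illustrative table name
public partial class PWC_G
{
    [Column(Name = "ID", IsPrimaryKey = true)]   // the missing piece
    public int ID { get; set; }

    [Column(Name = "status")]
    public int status { get; set; }
}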

Linq merging DataTable with dynamically added primary keys

I'm stumped on this one.
I'm trying to merge two DataTables into one. Preferably I would use LINQ to perform this task, but the problem is that I need to add the conditions for the join dynamically. The data for each table comes from two different calls to stored procedures, and which calls are used can be switched. The results can therefore vary in the number of columns and in which primary keys are available.
The goal is to replace regular strings in the first result set with a second database that can contain unicode (but only if it contains a value for that specific combination of primary keys).
My linq query would look like this:
var joined = (from DataRow reg in dt1.Rows
              join DataRow uni in dt2.Rows
                  on new { prim1 = reg.ItemArray[0], prim2 = reg.ItemArray[1] }
                  equals new { prim1 = uni.ItemArray[0], prim2 = uni.ItemArray[1] }
              select new
              {
                  prim1 = reg.ItemArray[0],
                  prim2 = reg.ItemArray[1],
                  value1 = reg.ItemArray[4],
                  value2 = uni.ItemArray[3] ?? reg.ItemArray[3]
              }
);
This works perfectly for what I want, but as I said I need to be able to define which columns in each table are primary keys, so this:
join DataRow uni in dt2.Rows
on new { prim1 = reg.ItemArray[0], prim2 = reg.ItemArray[1] }
equals new { prim1 = uni.ItemArray[0], prim2 = uni.ItemArray[1] }
needs to be replaced by something like creating a DataRelation between the tables, or adding the primary keys dynamically before performing the LINQ query.
ALSO, I need to make the select something like SQL's * instead of specifying each column, as I do not know the number of columns in the first result set.
I've also tried joining the tables by adding primary keys and doing a merge, but how do I then choose which column in dt2 to overwrite which one in dt1?
DataTable join = new DataTable("joined");
join = dt1.Copy();
join.Merge(dt2, false, MissingSchemaAction.Add);
join.AcceptChanges();
I'm using VS2012.
I ended up using a very simple approach, which doesn't involve creating primary key relations or joins at all. I'm sure there are more elegant or more performant ways of solving the problem.
Basically I've adapted the solution in Linq dynamically adding where conditions, where instead of joining I dynamically add .Where-clauses.
That way I can loop through the rows and compare for each dynamically added primary key:
foreach (DataRow regRow in dt1.Rows)
{
    //Select all rows in second result set
    var uniRows = (from DataRow uniRow in dt2.Rows select uniRow);

    //Add where clauses as needed (conditions and values are examples)
    if (firstCondition) { uniRows = uniRows.Where(x => (string)x["SalesChannel"] == "001"); }
    else if (secondCondition) { uniRows = uniRows.Where(x => (string)x["Language"] == "SV"); }
    else if (thirdCondition) { uniRows = uniRows.Where(x => (string)x["ArticleNo"] == "242356"); }
    // etc...

    // ... then compare regRow against the remaining uniRows and copy the
    // unicode value over when a match is found.
}
Each row gets compared to a diminishing list of rows in the second result set.

How to save data retrieved from a query

I previously asked a question and got an answer (Best approach to write query), but the problem is that if you have to save the result in a list, there is duplication of records. For example, the resultant table of the join (the EXAMPLE referenced in the question) has duplicate rows. How can you filter them out and yet keep the order number data?
Of course there may be some ways, but I am looking for a good one. How can we store the data in a list without creating duplicate rows?
My current code for my tables is:
int lastUserId = 0;
sql_cmd = new SqlCommand();
sql_cmd.Connection = sql_con;
sql_cmd.CommandText = "SELECT * FROM AccountsUsers LEFT JOIN Accounts ON AccountsUsers.Id = Accounts.userId ORDER BY AccountsUsers.accFirstName";
SqlDataReader reader = sql_cmd.ExecuteReader();
if (reader.HasRows == true)
{
    Users userToAdd = new Users();
    while (reader.Read())
    {
        userToAdd = new Users();
        userToAdd.userId = int.Parse(reader["Id"].ToString());
        userToAdd.firstName = reader["accFirstName"].ToString();
        userToAdd.lastName = reader["accLastName"].ToString();
        lastUserId = userToAdd.userId;

        Websites domainData = new Websites();
        domainData.domainName = reader["accDomainName"].ToString();
        domainData.userName = reader["accUserName"].ToString();
        domainData.password = reader["accPass"].ToString();
        domainData.URL = reader["accDomain"].ToString();
        userToAdd.DomainData.Add(domainData);

        allUsers.Add(userToAdd);
    }
}
For the second table I have a custom list that will hold all of the data from the second table.
The table returned is a joined table and has multiple rows for the same user.
Besides using the Dictionary idea as answered by Antonio Bakula...
If you persist the dictionary of users and call the code in your sample multiple times, you should consider that a user account is either new, modified, or deleted.
The algorithm to use when executing your SQL query is the following (a sketch follows the list):
If a row in the query result is not in the dictionary, create a new user and add it to the dictionary.
If a row in the query result is in the dictionary, update the user's information.
If a dictionary item is not in the query result, delete the user from the dictionary.
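A rough sketch of that new/modified/deleted handling, assuming the Dictionary<int, Users> from Antonio Bakula's answer below and a reader over the same joined query; only the user-level fields are shown:

// requires System.Linq and System.Collections.Generic
var seenIds = new HashSet<int>();
while (reader.Read())
{
    int userId = int.Parse(reader["Id"].ToString());
    seenIds.Add(userId);

    Users currUser;
    if (!allUsers.TryGetValue(userId, out currUser))
    {
        // new user: create it and add it to the dictionary
        currUser = new Users();
        currUser.userId = userId;
        allUsers.Add(userId, currUser);
    }

    // new or existing user: refresh the fields from the query
    currUser.firstName = reader["accFirstName"].ToString();
    currUser.lastName = reader["accLastName"].ToString();
}

// deleted: remove dictionary entries the query no longer returned
foreach (int staleId in allUsers.Keys.Except(seenIds).ToList())
{
    allUsers.Remove(staleId);
}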
I'd also recommend not using SELECT *.
Use only the table columns your code needs; this improves the performance of your code and prevents a potential security issue where private user information is returned.
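As a hedged illustration, this is what an explicit column list could look like, limited to the columns the code actually reads; which table each acc* column belongs to is an assumption about the schema:

sql_cmd.CommandText =
    @"SELECT AccountsUsers.Id, AccountsUsers.accFirstName, AccountsUsers.accLastName,
             Accounts.accDomainName, Accounts.accUserName, Accounts.accPass, Accounts.accDomain
      FROM AccountsUsers
      LEFT JOIN Accounts ON AccountsUsers.Id = Accounts.userId
      ORDER BY AccountsUsers.accFirstName";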
I am not sure why you are not using a DISTINCT clause in your SQL to fetch unique results; that will also be faster. Did you look at using hashtables?
I would put the users into a Dictionary and check if the user already exists, something like this:
Dictionary<int, Users> allUsers = new Dictionary<int, Users>();
and then in the reader while loop:
int userId = int.Parse(reader["Id"].ToString());

// look the user up first; only create and add it when it is not in the dictionary yet
Users currUser;
if (!allUsers.TryGetValue(userId, out currUser))
{
    currUser = new Users();
    currUser.userId = userId;
    currUser.firstName = reader["accFirstName"].ToString();
    currUser.lastName = reader["accLastName"].ToString();
    allUsers.Add(userId, currUser);
}

Websites domainData = new Websites();
domainData.domainName = reader["accDomainName"].ToString();
domainData.userName = reader["accUserName"].ToString();
domainData.password = reader["accPass"].ToString();
domainData.URL = reader["accDomain"].ToString();

currUser.DomainData.Add(domainData);
It seems like the root of your problem is in your database table.
When you say duplicate data rows, are you saying you get duplicate entries in the list, or that you have duplicate data in your table?
Please give two rows that are duplicates.
Two options:
First, prevent pulling duplicate data from SQL by using a DISTINCT clause, like:
SELECT DISTINCT <columns> FROM <table> WHERE <condition>
The second option, as Antonio mentioned, is to check if the list already has the item.
The first option is recommended unless there are other reasons.
