I want to display the missing (non-matching) records - c#

Is there a way to program the following SQL query
SELECT dbo.Assets_Master.Serial_Number, dbo.Assets_Master.Account_Ident, dbo.Assets_Master.Disposition_Ident
FROM dbo.Assets_Master LEFT OUTER JOIN
dbo.Assets ON dbo.Assets_Master.Serial_Number = dbo.Assets.Serial_Number
WHERE (dbo.Assets.Serial_Number IS NULL)
in C# .NET code using DataViews, a DataRelation, or something else?
I have a spreadsheet of about 4k rows and a data table that should have the same records, but if not I want to display the missing (non-matching) records from the table.
Thanks,
Eric

If you've already got that query, you can just pass that text as a SQL command and pull back the results as a dataset. Better might be setting up your query as a stored procedure and then following the same steps (calling a stored proc is cleaner than writing the SQL by hand).
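For example, a minimal sketch with SqlClient that runs that exact query and fills a DataSet (only the connection string is assumed):

using System.Data;
using System.Data.SqlClient;

const string sql = @"
    SELECT m.Serial_Number, m.Account_Ident, m.Disposition_Ident
    FROM dbo.Assets_Master m
    LEFT OUTER JOIN dbo.Assets a ON m.Serial_Number = a.Serial_Number
    WHERE a.Serial_Number IS NULL;";

var missing = new DataSet();
using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter(sql, conn))
{
    adapter.Fill(missing);   // missing.Tables[0] now holds the non-matching rows
}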
If you want a way to do it without SQL at all, you could use LINQ to grab an IEnumerable of your Assets_Master serial numbers and another IEnumerable of your Assets records. Then something like:
foreach (var asset in assets)
{
    if (!assetsMasterSerialNos.Contains(asset.SerialNumber))
    {
        // do whatever
    }
}
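For instance, a sketch of pulling those two sequences out of DataTables with LINQ to DataSet, assuming the tables are already loaded as assetsMaster and assets (a HashSet keeps the Contains check fast for ~4k rows):

using System.Collections.Generic;
using System.Data;
using System.Linq;

var masterSerialNos = new HashSet<string>(
    assetsMaster.AsEnumerable().Select(r => r.Field<string>("Serial_Number")));

foreach (var asset in assets.AsEnumerable())
{
    if (!masterSerialNos.Contains(asset.Field<string>("Serial_Number")))
    {
        // this row has no match - display / collect it
    }
}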

How to use FETCH in OleDb query?

I have an xlsx table like this:
Name         SubDatasetCount  Parameter1  Parameter2  ParameterX ...
Dataset1
    SubDataset1
    SubDataset2
    SubDatasetX
Dataset2
    SubDataset1
    SubDataset2
    SubDatasetX
...
My goal is to load any Dataset's parameters and all of its SubDatasets.
The xlsx format and the reading method are given. At the moment I read Dataset1's SubDatasetCount and then try to run the following SQL query with an OleDbDataReader:
SELECT *
FROM ["SheetName"$]
WHERE Name LIKE '%DatasetName%'
FETCH NEXT [SubDatasetCount] ROWS ONLY
It causes an OleDbException: 'IErrorInfo.GetDescription failed with E_FAIL(0x80004005).'. Before I added FETCH, the query worked fine. I have no SQL knowledge; I copied it from here: How to select next rows from database in C#?
The linked answer states that ORDER BY is a must, but obviously I cannot do that here.
And even when I tested the following query, the error is the same:
SELECT *
FROM ["SheetName"$]
WHERE Name LIKE '%DatasetName%'
ORDER BY Name
FETCH NEXT 10 ROWS ONLY
It works when I remove FETCH and leave ORDER BY. A quick study of that specific error always yields the same explanation: a reserved keyword is being used in the query. But I don't see anything like that in the FETCH part of the query.
How do I make FETCH work?
In case FETCH can be fixed somehow, how do I satisfy the ORDER BY requirement? ORDER BY (SELECT NULL) causes an exception.

SQL Server - Best practice to circumvent large IN (...) clause (>40000 items)

I'm developing an ASP.NET app that analyzes Excel files uploaded by the user. The files contain various data about customers (one row = one customer); the key field is CustomerCode. Basically the data comes in the form of a DataTable object.
At some point I need to get information about the specified customers from SQL and compare it to what the user uploaded. I'm doing it the following way:
Make a comma-separated list of customers from the CustomerCode column: 'Customer1','Customer2',...,'CustomerN'.
Pass this string into the SQL query's IN (...) clause and execute it.
This was working okay until I ran into a "The query processor ran out of internal resources and could not produce a query plan" exception when trying to pass ~40,000 items inside the IN (...) clause.
The trivial workaround seems to be:
Replace IN (...) with = 'SomeCustomerCode' in query template.
Execute this query 40000 times for each CustomerCode.
Do DataTable.Merge 40000 times.
Is there any better way to work around this problem?
Note: I can't do IN (SELECT CustomerCode FROM ... WHERE SomeConditions) because the data comes from Excel files and thus cannot be queried from the DB.
"Table-valued parameters" would be worth investigating: they let you pass in multiple rows at once (usually via a DataTable on the C# side). The downside is that you need to formally declare and name the data shape on the SQL Server side first.
Alternatively, though: you could use SqlBulkCopy to throw the rows into a staging table, and then just JOIN to that table. If you have parallel callers, you will need some kind of session identifier on the row to distinguish between concurrent uses (and: don't forget to remove your session's data afterwards).
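And a sketch of the SqlBulkCopy staging variant; a table dbo.CustomerCodeStaging with SessionId and CustomerCode columns is assumed to exist:

using System;
using System.Data;
using System.Data.SqlClient;

var sessionId = Guid.NewGuid();
var staging = new DataTable();
staging.Columns.Add("SessionId", typeof(Guid));
staging.Columns.Add("CustomerCode", typeof(string));
foreach (DataRow row in uploadedData.Rows)
    staging.Rows.Add(sessionId, row["CustomerCode"]);

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.CustomerCodeStaging" })
    {
        bulk.ColumnMappings.Add("SessionId", "SessionId");
        bulk.ColumnMappings.Add("CustomerCode", "CustomerCode");
        bulk.WriteToServer(staging);
    }

    var result = new DataTable();
    using (var cmd = new SqlCommand(@"
        SELECT c.*
        FROM dbo.Customers c
        JOIN dbo.CustomerCodeStaging s ON s.CustomerCode = c.CustomerCode
        WHERE s.SessionId = @sessionId;", conn))
    {
        cmd.Parameters.AddWithValue("@sessionId", sessionId);
        new SqlDataAdapter(cmd).Fill(result);
    }

    // clean up this session's rows afterwards
    using (var cleanup = new SqlCommand(
        "DELETE FROM dbo.CustomerCodeStaging WHERE SessionId = @sessionId;", conn))
    {
        cleanup.Parameters.AddWithValue("@sessionId", sessionId);
        cleanup.ExecuteNonQuery();
    }
}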
You shouldn't process too many records at once, because of errors like the one you mentioned, and because such a big batch takes too long to run and leaves nothing to do in parallel. You shouldn't process only one record at a time either, because then the overhead of the communication with SQL Server becomes too big. Choose something in the middle and process, for example, 10,000 records at a time. You can even parallelize the processing: start running the SQL for the next 10,000 in the background while you are processing the previous batch of 10,000.
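A sketch of that batching idea, reusing the quoted-literal IN (...) list from the question but limited to 10,000 codes per round trip (table and column names are made up):

using System.Data;
using System.Data.SqlClient;
using System.Linq;

const int batchSize = 10000;
var allCodes = uploadedData.AsEnumerable()
    .Select(r => r.Field<string>("CustomerCode"))
    .ToList();

var result = new DataTable();
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    for (int i = 0; i < allCodes.Count; i += batchSize)
    {
        var inList = string.Join(",",
            allCodes.Skip(i).Take(batchSize)
                    .Select(c => "'" + c.Replace("'", "''") + "'"));   // escape embedded quotes

        using (var cmd = new SqlCommand(
            "SELECT * FROM dbo.Customers WHERE CustomerCode IN (" + inList + ")", conn))
        {
            new SqlDataAdapter(cmd).Fill(result);   // each batch's rows are appended to result
        }
    }
}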

Reduce number of database calls

I have a stored procedure which accepts five parameters and performs an update on a table:
Update Table
Set field = @Field
Where col1 = @Para1 and Col2 = @Para2 and Col3 = @Para3 and col4 = @Para4
From the user's perspective you can select multiple values for each of the condition parameters.
For example, you can select two options which need to match Col1 in the database table (and which need to be passed as @Para1).
So I am storing all the selected values in separate lists.
At the moment I am using nested foreach loops to do the update:
foreach (var g in _list1)
{
    foreach (var o in _list2)
    {
        foreach (var l in _list3)
        {
            foreach (var a in _list4)
            {
                UpdateData(g, o, l, a);
            }
        }
    }
}
I am sure this is not a good way of doing it, since it results in a large number of database calls. Is there any way I can avoid the loops and achieve the same result with a minimal number of DB calls?
Update
I am looking for an approach other than Table-Valued Parameters.
You can bring the query to this form:
Update Table Set field = @Field Where col1 IN (...) and Col2 IN (...) and Col3 IN (...) and col4 IN (...)
and pass the parameters this way: https://stackoverflow.com/a/337792/580053
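A rough sketch of that approach, following the linked answer's idea of adding one SqlParameter per value (the AddInClause helper below is made up, and the lists are assumed to hold strings):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;

static string AddInClause(SqlCommand cmd, string column, string prefix, IEnumerable<string> values)
{
    // adds @prefix0, @prefix1, ... and returns "column IN (@prefix0, @prefix1, ...)"
    var names = values.Select((value, i) =>
    {
        var name = prefix + i;
        cmd.Parameters.AddWithValue(name, value);
        return name;
    }).ToList();
    return column + " IN (" + string.Join(", ", names) + ")";
}

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand { Connection = conn })
{
    cmd.Parameters.AddWithValue("@Field", fieldValue);

    var sql = new StringBuilder("UPDATE [Table] SET field = @Field WHERE ");
    sql.Append(AddInClause(cmd, "col1", "@p1_", _list1)).Append(" AND ");
    sql.Append(AddInClause(cmd, "Col2", "@p2_", _list2)).Append(" AND ");
    sql.Append(AddInClause(cmd, "Col3", "@p3_", _list3)).Append(" AND ");
    sql.Append(AddInClause(cmd, "col4", "@p4_", _list4));

    cmd.CommandText = sql.ToString();
    conn.Open();
    cmd.ExecuteNonQuery();   // one round trip instead of one per combination
}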
One possible way would be to use Table-Valued Parameters to pass the multiple values per condition to the stored procedure. This would reduce the loops in your code and should still provide the functionality that you are looking for.
If I am not mistaken they were introduced in SQL Server 2008, so as long as you don't have to support 2005 or earlier they should be fine to use.
Consider using the MS Data Access Application Block from the Enterprise Library for the UpdateDataSet command.
Essentially, you would build a datatable where each row is a parameter set, then you execute the "batch" of parameter sets against the open connection.
You can do the same without that of course, by building a string that has several update commands in it and executing it against the DB.
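A rough sketch of that second option with plain ADO.NET (no Enterprise Library): all the UPDATE statements are concatenated into one command text so the whole set of combinations goes to the server in a single call. _list1.._list4 and the table/columns come from the question; everything else is made up:

using System.Data.SqlClient;
using System.Text;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand { Connection = conn })
{
    cmd.Parameters.AddWithValue("@Field", fieldValue);

    var sql = new StringBuilder();
    int n = 0;
    foreach (var g in _list1)
    foreach (var o in _list2)
    foreach (var l in _list3)
    foreach (var a in _list4)
    {
        sql.AppendLine(
            "UPDATE [Table] SET field = @Field " +
            $"WHERE col1 = @g{n} AND Col2 = @o{n} AND Col3 = @l{n} AND col4 = @a{n};");
        cmd.Parameters.AddWithValue("@g" + n, g);
        cmd.Parameters.AddWithValue("@o" + n, o);
        cmd.Parameters.AddWithValue("@l" + n, l);
        cmd.Parameters.AddWithValue("@a" + n, a);
        n++;
    }

    cmd.CommandText = sql.ToString();
    conn.Open();
    cmd.ExecuteNonQuery();   // one round trip for all the UPDATEs
    // note: SQL Server caps a command at roughly 2100 parameters, so split the work
    // into several commands if the number of combinations is large
}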
Since table-valued parameters are off limits to you, you may consider an XML-based approach:
Build an XML document containing the four columns that you would like to pass.
Change the signature of your stored procedure to accept a single XML-valued parameter instead of four scalar parameters
Change the code of your stored procedure to perform the updates based on the XML that you get
Call your new stored procedure once with the XML that you constructed in memory using the four nested loops.
This should reduce the number of round-trips, and speed up the overall execution time. Here is a link to an article explaining how inserting many rows can be done at once using XML; your situation is somewhat similar, so you should be able to use the approach outlined in that article.
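A sketch of how that could look, loosely following the linked article; the procedure name, the XML shape and the column types are all assumptions:

// T-SQL side (sketch):
//   CREATE PROCEDURE dbo.UpdateFromXml @Field nvarchar(100), @Criteria xml
//   AS
//   UPDATE t
//   SET    t.field = @Field
//   FROM   [Table] t
//   JOIN   @Criteria.nodes('/rows/row') AS x(r)
//     ON   t.col1 = x.r.value('@col1', 'nvarchar(50)')
//    AND   t.Col2 = x.r.value('@col2', 'nvarchar(50)')
//    AND   t.Col3 = x.r.value('@col3', 'nvarchar(50)')
//    AND   t.col4 = x.r.value('@col4', 'nvarchar(50)');

using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Xml.Linq;

var xml = new XElement("rows",
    from g in _list1
    from o in _list2
    from l in _list3
    from a in _list4
    select new XElement("row",
        new XAttribute("col1", g),
        new XAttribute("col2", o),
        new XAttribute("col3", l),
        new XAttribute("col4", a)));

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.UpdateFromXml", conn) { CommandType = CommandType.StoredProcedure })
{
    cmd.Parameters.AddWithValue("@Field", fieldValue);
    cmd.Parameters.Add("@Criteria", SqlDbType.Xml).Value = xml.ToString();
    conn.Open();
    cmd.ExecuteNonQuery();   // one call instead of one call per combination
}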
So long as you have the freedom to update the structure of the stored procedure, the method I would suggest is to use a table-valued parameter instead of the multiple parameters.
A good example which goes into both server and database code for this can be found at: http://www.codeproject.com/Articles/39161/C-and-Table-Value-Parameters
Why are you using a stored procedure for this? In my opinion you shouldn't use an SP for simple CRUD operations. The real power of stored procedures is in heavy calculations and things like that.
Table-valued parameters would be my choice, but since you are looking for another approach, why not go the simpler way and just dynamically construct a bulk/mass update query in your server-side code and run it against the DB?

How to manage a million records?

I really need an expert's help to answer my query.
Here is the scenario:
I'm using a SQL SELECT query to retrieve a million records.
I need to perform sorting and grouping on the resulting records, which I'm storing in a DataTable (in one execution) and looping through for grouping and sorting.
I know this is childish and not the right way to process it.
How can I manage the million records effectively and apply the grouping and sorting to them?
I really need help here. I've heard of executing the SELECT query batch-wise, but how do I implement the grouping and sorting when I don't have the entire data in hand?
I cannot use SQL ORDER BY and GROUP BY directly; that's against my requirements.
Here is what I'm doing right now:
I have the following objects, i.e. the column names for grouping and sorting:
List<Group> groupList;
List<Sort> sortList;
DataTable reportData; // this holds the entire set of records from the DB
I'm looping through reportData row by row and comparing the current and previous rows for the custom grouping and sorting. I'd like to know how the same can be done with batch-wise execution, or whether there is an alternative solution.
I need to perform sorting and grouping on the resulting records which
I'm storing in a DataTable (in one execution) and looping through it
for grouping and sorting.
What for?
Seriously.
Do not pull the data and then try playing smart with a dumb object model on top of it (and DataSets are not particularly smart, sorry).
Group and sort in your SELECT statement, pull the data already grouped and sorted, and be done with it.
A million records was a small amount of data for SQL Server when the original version was released (4.2 it was, a port of Sybase SQL Server) 17 years or so ago. These days it is something that likely fits into the processor's third-level cache and is nothing a proper SQL Server even notices it has just processed.
SQL is particularly good at doing projections, and ever since they introduced MARS you can even run multiple queries over one connection, which comes in handy here.
So, go back, throw away the DataSet and the "I'll try to program a sort algorithm" approach, and write proper SQL statements to pull the data as you need it.
Sounds like you should implement Partition Pruning. Partitioning will allow for a separation of content like you are requesting in order to have faster queries.
If I understood correctly, in your case I would create a temporary database table with a structure designed specifically to cover my grouping.
Then I would select the records from the main tables and insert them into the temporary one, applying all modifications including the grouping.
A specific index matching the way you want them sorted should also be applied.
After that, just select from this table, do what you have to do, and finally, if the data is not needed any more, delete the temporary table.
I would choose the above solution because a million records in memory smells like trouble to me...
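A rough sketch of those steps (every table, column and index name here is made up):

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Build the staging table with the grouping already applied
    using (var build = new SqlCommand(@"
        SELECT DocumentTypeID, DocumentName, COUNT(*) AS DocCount
        INTO dbo.ReportStaging
        FROM dbo.Documents
        GROUP BY DocumentTypeID, DocumentName;

        CREATE INDEX IX_ReportStaging_Sort ON dbo.ReportStaging (DocumentName);", conn))
    {
        build.ExecuteNonQuery();
    }

    // 2. Read it back in the indexed order, one row at a time
    using (var read = new SqlCommand(
        "SELECT * FROM dbo.ReportStaging ORDER BY DocumentName;", conn))
    using (var reader = read.ExecuteReader())
    {
        while (reader.Read())
        {
            // process one row at a time instead of holding a million in memory
        }
    }

    // 3. Drop the table once the data is no longer needed
    using (var drop = new SqlCommand("DROP TABLE dbo.ReportStaging;", conn))
    {
        drop.ExecuteNonQuery();
    }
}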
For example, using LINQ to DataSet over reportData:
1. Let's assume that you would like to group them by their DocumentTypeID
var groupByType = reportData.AsEnumerable()
    .GroupBy(r => r.Field<int>("DocumentTypeID"));
2. Sorting alphabetically
var sortAlphabetically = reportData.AsEnumerable()
    .OrderBy(r => r.Field<string>("DocumentName"));
3. Grouping and sorting (sorting the rows inside each group)
var groupAndSort = reportData.AsEnumerable()
    .GroupBy(r => r.Field<int>("DocumentTypeID"))
    .Select(g => g.OrderBy(r => r.Field<string>("DocumentName")));
4. Sort and group (the sort order is preserved within each group)
var sortAndGroup = reportData.AsEnumerable()
    .OrderBy(r => r.Field<string>("DocumentName"))
    .GroupBy(r => r.Field<int>("DocumentTypeID"));
5. Multiple grouping and sorting
var multipleGroupAndSort = reportData.AsEnumerable()
    .GroupBy(r => new { Type = r.Field<int>("DocumentTypeID"),
                        Month = r.Field<DateTime>("CreatedOnDate").Month })
    .Select(g => g.OrderBy(r => r.Field<string>("DocumentName")));
so on and so forth...
But I would still discourage bringing a million rows into the application; it will cost memory. There are of course ways to manage it through stored procedures etc.

getting multiple sets of data in one request?

I am working on a site in which, as the user logs in (first database request), a stored procedure verifies the password and user id and then returns the user record, which I put in session to use later.
After this I do a second DB request; it returns the user's addresses, which I put in the cache.
Can you please tell me whether there is some way I can get both sets of data (the user record and the addresses from the second table) in one database request?
Please guide me on this. I am using the DAAB (Enterprise Library) for data access.
Thanks
Modify your SP so that it has multiple SELECT statements, which in your case is two. Two SELECT statements in one SP will return two record sets. Verify this in SQL Server Management Studio: when you run the SP, it should show you multiple grids in the bottom panel.
Once your SP is done, call the SP from C# code and load the result into a DataSet. The DataSet will have two tables, and you can get the data from each table.
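A minimal sketch with plain ADO.NET (the DAAB's ExecuteDataSet call works the same way); the procedure, parameter and column names are made up:

// Stored procedure body (sketch):
//   SELECT * FROM dbo.Users     WHERE UserId = @UserId AND PasswordHash = @PasswordHash;
//   SELECT * FROM dbo.Addresses WHERE UserId = @UserId;

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetUserWithAddresses", conn) { CommandType = CommandType.StoredProcedure })
{
    cmd.Parameters.AddWithValue("@UserId", userId);
    cmd.Parameters.AddWithValue("@PasswordHash", passwordHash);

    var ds = new DataSet();
    new SqlDataAdapter(cmd).Fill(ds);

    DataTable userRecord = ds.Tables[0];   // first SELECT
    DataTable addresses  = ds.Tables[1];   // second SELECT
}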
You can write two SELECT queries in a stored procedure, or
execute two queries one after another. With a single call you can execute them and receive the data in a DataSet:
ExecuteDataset()
So two tables will be returned inside the DataSet. You can get the values like
dataSet.Tables[0]
dataSet.Tables[1]
Thanks
You would gain nothing from retrieving two result sets in one go, but the code would become more incoherent. Why do you think you need to merge two logically separate operations into one? Instead of using such questionable methods you could use a join to get one result set that contains all the data in one go, but even that seems wrong. I cannot see a clean way of doing what you are asking for, or any benefit that might be gained.
