I have to fetch a very large number of records (around 1,000,000) from my database to build a report from this data. My database is on a remote system. I have a different SQL statement for each report. These SQL statements are sent to a service; the service fills a DataSet and returns it to my application, and I can then bind the DataSet to my reports.
The problem is that DataSets with this number of records have an enormous memory consumption. If I load the data, memory rises to about 1 GB for a single load.
Is there an alternative to load data without this memory consumption?
I already use an ORM (NHibernate), but the problem is that I don't know in advance what data will be loaded; there are hundreds of reports with different SQL statements that can change, so I cannot create hundreds of classes to map them...
Edit :
Here is my example code that I am using:
DataSet dataSet = new DataSet();
try
{
    using (FbConnection connection = new FbConnection(strConnString))
    {
        connection.Open();
        using (FbCommand cmd = new FbCommand("SELECT * FROM CUSTOMERS;", connection))
        {
            FbDataAdapter fbd = new FbDataAdapter(cmd);
            fbd.Fill(dataSet);

            // This is what the default ADO.NET provider can do..
            //SqlCommand command = new SqlCommand(queryString, connection);
            //System.Xml.XmlReader reader = command.ExecuteXmlReader();
        }
    }
}
catch (Exception ex)
{
}
The question you should ask is: How much DATA is it and what is the memory OVERHEAD. If the OVERHEAD is large, you need to find a better data structure. If the DATA itself is too large for memory, you need to explore ways to only bring part of it into memory at a time.
In either case, using NHibernate for reporting on such large volumes is dubious - it can be useful to use existing mapped classes to construct queries, but you must include a projection to a simple unmapped DTO class or to an object[] or similar, to avoid NHibernate instantiating mapped classes for all the results - the latter would be bad for both performance and memory consumption.
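For illustration, a minimal sketch of such a projection with a native SQL query, assuming an open ISession named session; the column names are placeholders, not your actual report schema. Each result row comes back as a plain object[] instead of a mapped entity:

// requires System.Collections and the NHibernate namespaces
IList rows = session.CreateSQLQuery("SELECT CUSTOMER_ID, NAME, CITY FROM CUSTOMERS")
                    .List();

foreach (object[] row in rows)
{
    // row[0], row[1], row[2] correspond to the selected columns;
    // copy them into a lightweight DTO or write them straight to the report.
}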
Oh, and did you mean that you have a web service that returns a DataSet? That is considered bad style in general, because the DataSet class is Microsoft-specific, and for various other reasons (http://www.hanselman.com/blog/ReturningDataSetsFromWebServicesIsTheSpawnOfSatanAndRepresentsAllThatIsTrulyEvilInTheWorld.aspx).
Related
I have a function like the one below that I use to return a GridView (id: dgMenu) to end users based on their role. Note that I am not allowed to apply pagination to the GridView; all items must be shown on one page.
protected DataTable MenuForUserRole(string userRole)
{
    DataTable dtMenus = new DataTable();
    string connectionString = constr;
    try
    {
        using (SqlConnection cnn = new SqlConnection(connectionString))
        {
            cnn.Open();
            string query = @"Select mycolumn1, mycolumn2, mycolumn3, mycolumn4, mycolumn5
                             From mytable
                             Where mykey = (select thekey from anothertable where role = @role)
                             order by myOrderColumn;";
            SqlCommand oCmd = new SqlCommand(query, cnn);
            oCmd.Parameters.AddWithValue("@role", userRole);
            using (SqlDataAdapter a = new SqlDataAdapter(oCmd))
            {
                a.Fill(dtMenus);
            }
            cnn.Close();
        }
    }
    catch (Exception)
    {
        throw;
    }
    return dtMenus;
}
Usage:
dgMenu.DataSource = MenuForUserRole(ddlUserRoles.SelectedItem.Value.ToString());
dgMenu.DataBind();
My issue is performance-related: some of the GridViews returned have more than 1000 items, so it takes 5-6 seconds to load the complete GridView for those users, which is unacceptable. When I searched online, I couldn't find more efficient code for loading a GridView from a SQL Server database. Any help or advice that might improve the load speed when binding a large amount of data to the GridView would be appreciated.
Used -> Visual Studio 2017 & SQL Server 2017
The most efficient way would be to realize that it is a bad idea.
1000 records is far too much for any user to deal with, by one or two orders of magnitude. No human on this planet can work with that much data at once. The data needs to be filtered, grouped or paginated much more before it gets in front of a user.
And those are all operations you should not be doing after the query; they belong in the query itself. Retrieving data only to filter it later just adds a ton of network load, introduces race conditions and database locks, and is probably slower anyway (a DBMS is really good at its job!). Worse, with ASP.NET and its shared memory, it can quickly lead to memory issues.
Profile your code. Understand where most of the time is spent. It could be SQL Server, it could be transmission over the network, it could be binding the data to the control(s). If it is SQL Server, we would need to see your schema to tell you how performance could be improved: for example, do you have an index on mykey? By the way, don't call it a key; a key is something that uniquely identifies a record, which is obviously not the case here.
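As a rough starting point, a sketch that times the query/fill separately from the grid binding, reusing the method and control names from the question (System.Diagnostics assumed):

var sw = System.Diagnostics.Stopwatch.StartNew();
DataTable menus = MenuForUserRole(ddlUserRoles.SelectedItem.Value);
long queryMs = sw.ElapsedMilliseconds;          // time spent in SQL Server + network + Fill

sw.Restart();
dgMenu.DataSource = menus;
dgMenu.DataBind();
long bindMs = sw.ElapsedMilliseconds;           // time spent building the GridView

System.Diagnostics.Debug.WriteLine($"query+fill: {queryMs} ms, bind: {bindMs} ms");

Whichever number dominates tells you whether to look at the query and indexing side or at the amount of markup the GridView has to render.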
Use reporting (e.g. Reporting Services) and create a link to export the data into an excel spreadsheet.
Basically I have a website I am working on where there will be more than 8 list boxes filled with information from databases. I currently use SqlDataSource because of its ease of use, and it is currently data-bound to the list boxes.
Does SqlDataSource leave the connection open the whole time? From an architectural standpoint, I want to eliminate any unnecessary, continuously open connections, for security as well as performance reasons.
Directly in answer to your question: No. The SqlDataSource control ensures that the connection is closed as soon as the operation it is required to perform has been completed.
I used to use SqlDataAdapter + SqlCommand, but now I mostly use
using (SqlDataReader rdr = <YourSQLCommandVariable>.ExecuteReader())
{
    <YourDataTableVariable>.Load(rdr);
}
Reason being I was unsure what data adapter did on top of the data reader to allow it to do batches of updates, reads and deletes. If you think about it, it would be extremely difficult to write a class like the data adapter that can do all that, without introducing any overhead. The overhead may not be significant, but unless I'm reading multiple tables out of a query into a DataSet object I don't run the risk of using it.
All that being said, I doubt any overhead on these operations is worth even considering if you locally cache all of the resulting data into the local machine. In other words, the biggest improvement you can make to your SQL queries is to not make them if the data is not likely to change over some time frame. If the data is updated once a day, cache it for 24 hours or less. Caching can be done either via the Session if it is end-user-dependent or via the HttpContext.Current.Cache object.
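As a rough sketch of the caching idea, assuming a hypothetical LoadListBoxData() helper that contains your existing query and a web ListBox named ListBox1; the 24-hour expiration is only an example:

DataTable data = HttpContext.Current.Cache["ListBox1Data"] as DataTable;
if (data == null)
{
    data = LoadListBoxData();                       // hypothetical: your existing query code
    HttpContext.Current.Cache.Insert(
        "ListBox1Data",
        data,
        null,                                       // no cache dependency
        DateTime.UtcNow.AddHours(24),               // absolute expiration
        System.Web.Caching.Cache.NoSlidingExpiration);
}
ListBox1.DataSource = data;
ListBox1.DataBind();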
It sounds like you might want some tier separation in your application. The Web project is ideally ignorant of the database; ideally there is some middle-tier assembly that handles communicating with it. Then from your .aspx.cs or Controller, depending on whether or not you're using MVC, you would make 8 calls to the middle tier (one for each listbox, assuming they have distinct information). The middle tier would return something like List<MyObject>, which you would then bind to the listbox.
My typical pattern for data access looks like this
using (SqlConnection conn = new SqlConnection("conn string"))
{
    conn.Open();
    SqlCommand command = new SqlCommand()
    {
        CommandText = "command text",
        Connection = conn,
        CommandType = CommandType.StoredProcedure // could be a non-stored proc, but would recommend a stored proc assuming SQL Server
    };
    command.Parameters.Add(new SqlParameter("MyParam", "param1"));
    command.Parameters.Add(new SqlParameter("MyParam2", "param2"));

    using (IDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            //magic here
        }
    }
    conn.Close();
}
I'm creating something like a small cashier application that keeps records for clients, employees, services, sales, and appointments. I'm using Windows Forms, and within that, DataGrids. I've created the database that I'm going to be using for the application. I want to know whether I should use SqlCommand + SqlDataReader or SqlDataAdapter + DataSet. Which approach is better?
This depends heavily on the type of operation you want to perform. My suggestion:
If you want to read data as fast as possible, go for SqlDataReader, but that speed comes at the cost of the steps you have to take yourself: open the connection, read the data, and close the connection. If you forget to close it, performance will suffer.
Go for SqlDataAdapter if you want fast reads plus the benefit of the disconnected architecture of ADO.NET. It opens and closes the connection automatically, and it also lets you push changes made to the DataSet back to the database via SqlCommandBuilder (see the sketch below).
Use SqlCommand for inserts and updates (you use it anyway when reading with a SqlDataReader); it will give you better performance for inserts and updates.
If you are using .NET Framework 3.5 SP1 or later, I would suggest LINQ to SQL or Entity Framework, which would also serve your purpose.
Thanks.
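To make the SqlDataAdapter + SqlCommandBuilder point above concrete, a minimal sketch; the table name, the column being edited and connectionString are placeholders, and the SELECT must include a primary key for the generated commands to work:

var table = new DataTable();
using (var connection = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter("SELECT * FROM Clients", connection))
using (var builder = new SqlCommandBuilder(adapter))    // generates INSERT/UPDATE/DELETE for you
{
    adapter.Fill(table);                    // opens and closes the connection itself

    table.Rows[0]["Name"] = "New name";     // edit in memory while disconnected

    adapter.Update(table);                  // pushes the changes back to the database
}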
SqlDataAdapter stores data on your client and updates the database as necessary, so it consumes more memory. On the other hand, you wouldn't need to be connected to your database for insert/delete/update/select commands; it manages connections internally, so you wouldn't have to worry about that.
All the good stuff from SqlDataAdapter comes at the cost of more memory consumption. It's usually used for systems that need multiple users connected to the database.
So I'd say if that's not your situation, go for SqlCommand and the connected model.
If you are just reading data and not doing updates/inserts/deletes, then SqlDataReader will be faster. You can also combine it with a DataSet. If you wrap the data access objects with using statements, the runtime will handle the connection cleanup logic for you.
A pattern I often use for synchronous access is something like this:
DataTable result = new DataTable();
using (SqlConnection conn = new SqlConnection(MyConnectionString))
{
using (SqlCommand cmd = new SqlCommand(MyQueryText, conn))
{
// set CommandType, parameters and SqlDependency here if needed
conn.Open();
using (SqlDataReader reader = cmd.ExecuteReader())
{
result.Load(reader);
}
}
}
For updates/deletes/inserts, a SqlDataAdapter might be worth considering, but usually only if you already have your data in a DataSet. Otherwise, there are faster/better ways of doing things.
If you are already familiar with the core ADO.NET components (Command, Connection, DataAdapter), then I'd suggest the Entity Data Model or LINQ to SQL.
SqlDataAdapter is a helper class that implicitly uses SqlCommand, SqlConnection and SqlDataReader.
DataReader – The DataReader is a forward-only, read-only stream of data from the database. This makes the DataReader a very efficient means of retrieving data, as only one record is brought into memory at a time. The disadvantage: a connection object can only contain one DataReader at a time, so we must explicitly close the DataReader when we are done with it. This will free the connection for other uses. The data adapter objects will manage opening and closing a connection for the command to execute.
DataAdapter – Represents a set of SQL commands and a database connection that are used to fill the DataSet and update the data source. It serves as a bridge between a DataSet and a data source for retrieving and saving data. The DataAdapter provides this bridge by mapping Fill, which changes the data in the DataSet to match the data in the data source, and Update, which changes the data in the data source to match the data in the DataSet. The DataAdapter also automatically opens and closes the connection as and when required.
SqlCommand is easier but not automated; SqlDataAdapter is less easy but automated. (Automated means it manages opening and closing the connection, etc., automatically.) Both of them offer the same data functionality.
I need to copy a large result set from one database and save it to another database.
Stored procedures are used for both fetching and storing due to the fact that there is some logic involved during saving.
I'm trying to find an efficient solution; there is no way I can hold the whole dataset in memory, and I would like to minimize the number of round trips.
Data is read from source table with
var reader = fetchCommand.ExecuteReader();
while (reader.Read()){...}
Is there a way to insert this data into another SqlCommand without loading the whole dataset into a DataTable, but also without inserting rows one by one?
Both source and target databases are MS SQL Server 2008, and they are on different servers. Use of SSIS or linked servers is not an option.
EDIT:
It appears it's possible to stream rows into a stored procedure using table-valued parameters. I will investigate this approach as well.
UPDATE:
Yes, it's possible to stream data from command.ExecuteReader to another command like this:
var reader = selectCommand.ExecuteReader();

insertCommand.Parameters.Add(
    new SqlParameter("@data", reader) { SqlDbType = SqlDbType.Structured });

insertCommand.ExecuteNonQuery();
where insertCommand is a stored procedure with a table-valued parameter @data.
You need SqlBulkCopy. You can just use it like this:
using (var reader = fetchCommand.ExecuteReader())
using (var bulkCopy = new SqlBulkCopy(myOtherDatabaseConnection))
{
    bulkCopy.DestinationTableName = "...";
    bulkCopy.ColumnMappings.Add(...);   // ColumnMappings is a collection; add the mappings you need
    bulkCopy.WriteToServer(reader);
}
There is also a property to set the batch size. Something like 1000 rows might give you the best trade-off between memory usage and speed.
Although this doesn't let you pipe it into a stored procedure, the best approach might be to copy the data to a temporary table and then run a set-based command on the server to move the data into its final location. This is usually far faster than executing lots of separate statements for each row.
You can use SqlBulkCopy with a data-reader, which does roughly what you are asking (non-buffered etc) - however, this won't be calling stored procedures to insert. If you want that, perhaps use SqlBulkCopy to push the data into a second table (same structure), then at the DB server, loop over the rows calling the sproc locally. That way, latency etc ceases to be an issue (as the loop is all at the DB server).
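For illustration, a rough sketch of that staging-table route; the StagingRows table, the MoveStagedRows procedure (which would contain the saving logic) and targetConnection are assumptions, not part of the original setup:

using (var reader = fetchCommand.ExecuteReader())
using (var bulkCopy = new SqlBulkCopy(targetConnection))
{
    bulkCopy.DestinationTableName = "StagingRows";
    bulkCopy.BatchSize = 1000;              // trade-off between memory use and round trips
    bulkCopy.WriteToServer(reader);         // streams rows; nothing is buffered client-side
}

using (var move = new SqlCommand("MoveStagedRows", targetConnection))
{
    move.CommandType = CommandType.StoredProcedure;
    move.CommandTimeout = 0;                // the set-based copy may take a while
    move.ExecuteNonQuery();                 // applies the saving logic on the server
}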
I'm building an offline C# application that will import data from spreadsheets and store it in a SQL database that I have created (inside the project). Through some research I have been able to use some code that can import a static table into a database whose layout exactly matches the columns in the worksheet.
What I'm looking to do is have specific columns go to their correct tables based on name. This way the database is designed correctly and I don't just have one giant table to store everything.
Below is the code I'm using to import a few static fields into one table; I want to be able to split the imported data across more than one.
What is the best way to do this?
public partial class Form1 : Form
{
    string strConnection = ConfigurationManager.ConnectionStrings
        ["Test3.Properties.Settings.Test3ConnectionString"].ConnectionString;

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        //Create connection string to Excel work book
        string excelConnectionString =
            @"Provider=Microsoft.Jet.OLEDB.4.0;
            Data Source=C:\Test.xls;
            Extended Properties=""Excel 8.0;HDR=YES;""";
        //Create Connection to Excel work book
        OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);
        //Create OleDbCommand to fetch data from Excel
        OleDbCommand cmd = new OleDbCommand
            ("Select [Failure_ID], [Failure_Name], [Failure_Date], [File_Name], [Report_Name], [Report_Description], [Error] from [Failures$]", excelConnection);

        excelConnection.Open();
        OleDbDataReader dReader = cmd.ExecuteReader();

        SqlBulkCopy sqlBulk = new SqlBulkCopy(strConnection);
        sqlBulk.DestinationTableName = "Failures";
        sqlBulk.WriteToServer(dReader);
    }
}
You can try an ETL (extract-transform-load) architecture:
Extract: One class will open the file and get all the data in chunks you know how to work with (usually you take a single row from the file and parse its data into a POCO object containing fields that hold pertinent data), and put those into a Queue that other work processes can take from. In this case, maybe the first thing you do is have Excel open the file and re-save it as a CSV, so you can reopen it as basic text in your process and chop it up efficiently. You can also read the column names and build a "mapping dictionary"; this column is named that, so it goes to this property of the data object. This process should happen as fast as possible, and the only reason it should fail is because the format of a row doesn't match what you're looking for given the structure of the file.
Transform: Once the file's contents have been extracted into an instance of a basic row, perform any validation, calculations or other business rules necessary to turn a row from the file into a set of domain objects that conform to your domain model. This process can be as complex as you need it to be, but again it should be as straightforward as you can make it while obeying all the business rules given in your requirements.
Load: Now you've got an object graph in your own domain objects, you can use the same persistence framework you'd call to handle domain objects created any other way. This could be basic ADO, an ORM like NHibernate or MSEF, or an Active Record pattern where objects know how to persist themselves. It's no bulk load, but it saves you having to implement a completely different persistence model just to get file-based data into the DB.
An ETL workflow can help you separate the repetitive tasks into simple units of work, and from there you can identify the tasks that take a lot of time and consider parallel processes.
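As an illustration only, a compressed sketch of that extract/transform/load split; the CSV layout, the FailureRow type and the SaveFailure call are assumptions, not part of the question:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class FailureEtl
{
    // Raw row as parsed from the file (the extract stage's output).
    private sealed class FailureRow { public string Name; public string Date; }

    public static void Run(string csvPath)
    {
        var queue = new BlockingCollection<FailureRow>(boundedCapacity: 1000);

        // Extract: parse the file on a background task and feed the queue.
        Task extract = Task.Run(() =>
        {
            foreach (string line in File.ReadLines(csvPath).Skip(1)) // skip header row
            {
                string[] parts = line.Split(',');
                queue.Add(new FailureRow { Name = parts[0], Date = parts[1] });
            }
            queue.CompleteAdding();
        });

        // Transform + load: validate each row and hand it to persistence.
        foreach (FailureRow row in queue.GetConsumingEnumerable())
        {
            DateTime date = DateTime.Parse(row.Date);   // business rules go here
            SaveFailure(row.Name.Trim(), date);         // hypothetical persistence call
        }

        extract.Wait();
    }

    // Placeholder for whatever persistence framework the project already uses.
    private static void SaveFailure(string name, DateTime date) { /* ADO.NET, ORM, ... */ }
}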
Alternately, you can take the file and massage its format by detecting columns you want to work with, and arranging them into a format that matches your bulk input spec, before calling a bulk insert routine to process the data. This file processor routine can do anything you want it to, including separating data into several files. However, it's one big process that works on a whole file at a time and has limited opportunities for optimization or parallel processing. However, if your loading mechanism is slow, or you've got a LOT of data that is simple to digest, it may end up faster than even a well-designed ETL.
In any case, I would get away from an Office format and into a plain-text (or XML) format as soon as I possibly could, and I would DEFINITELY avoid having to install Office on a server. If there is ANY way you can require the files be in some easily-parseable format like CSV BEFORE they're loaded, so much the better. Having an Office installation on a server is a Really Bad Thing in general, and OLE operations in a server app is not much better. The app will be very brittle, and anything Office wants to tell you will cause the app to hang until you log onto the server and clear the dialog box.
If you were looking for a more code-related answer, you could use the following to modify your code to work with different column names / different tables:
private void button1_Click(object sender, EventArgs e)
{
    //Create connection string to Excel work book
    string excelConnectionString =
        @"Provider=Microsoft.Jet.OLEDB.4.0;
        Data Source=C:\Test.xls;
        Extended Properties=""Excel 8.0;HDR=YES;""";
    //Create Connection to Excel work book
    OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);
    //Create OleDbCommand to fetch data from Excel
    OleDbCommand cmd = new OleDbCommand
        ("Select [Failure_ID], [Failure_Name], [Failure_Date], [File_Name], [Report_Name], [Report_Description], [Error] from [Failures$]", excelConnection);

    excelConnection.Open();

    DataTable dataTable = new DataTable();
    dataTable.Columns.Add("Id", typeof(System.Int32));
    dataTable.Columns.Add("Name", typeof(System.String));
    // TODO: Complete other table columns

    using (OleDbDataReader dReader = cmd.ExecuteReader())
    {
        while (dReader.Read())
        {
            DataRow dataRow = dataTable.NewRow();
            dataRow["Id"] = dReader.GetInt32(0);
            dataRow["Name"] = dReader.GetString(1);
            // TODO: Complete other table columns
            dataTable.Rows.Add(dataRow);
        }
    }

    SqlBulkCopy sqlBulk = new SqlBulkCopy(strConnection);
    sqlBulk.DestinationTableName = "Failures";
    sqlBulk.WriteToServer(dataTable);
}
Now you can control the names of the columns and which tables the data gets imported into. SqlBulkCopy is good for inserting large amounts of data. If you only have a small number of rows, you might be better off creating a standard data access layer to insert your records.
If you are only interested in the text (not the formatting etc.), you can alternatively save the Excel file as a CSV file and parse the CSV file instead; it's simple.
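For example, a small sketch of that CSV route, assuming the sheet was saved as C:\Test.csv with a header row and reusing strConnection from the question; the column positions are illustrative, and System.IO plus System.Linq are needed for ReadLines/Skip:

DataTable failures = new DataTable();
failures.Columns.Add("Failure_ID", typeof(int));
failures.Columns.Add("Failure_Name", typeof(string));

foreach (string line in File.ReadLines(@"C:\Test.csv").Skip(1))   // skip the header row
{
    string[] fields = line.Split(',');      // naive split; quoted fields need a real CSV parser
    failures.Rows.Add(int.Parse(fields[0]), fields[1]);
}

using (var sqlBulk = new SqlBulkCopy(strConnection))
{
    sqlBulk.DestinationTableName = "Failures";
    sqlBulk.WriteToServer(failures);
}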
Depending on the lifetime of the program, I would recommend one of two options.
If the program is to be short lived in use, or generally a "throw away" project, I would recommend a series of routines which parse and input data into another set of tables using standard SQL with some string processing as needed.
If the program will stick around longer and/or find more use on a day-to-day basis, I would recommend implementing a solution similar to the one recommended by @KeithS. With a set of well-defined steps for working with the data, much flexibility is gained. More specifically, the .NET Entity Framework would probably be a great fit.
As a bonus, if you're not already well versed in this area, you might find you learn a great deal about working with data between boundaries (xls -> sql -> etc.) during your first stint with an ORM such as EF.