We are using C#.NET and an Access database, with an import specification to import a text file into an Access table.
Is there any Access database limit for this operation? We may have more than 5 lac (500,000) records; will this process work for that many records?
If not, how can we handle inserting that many records into the Access database?
Thanks
The import process doesn't have any specific limit on the number of records you can import or store in a table; however, you are limited by the maximum database file size, roughly 1 gigabyte for Access 97 and 2 gigabytes for Access 2000 and later versions.
A huge number of small records is OK, and a small number of huge records is OK. But a huge number of huge records will probably hit the limit.
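For a rough sense of scale: 500,000 rows at around 2 KB each already comes to about 1 GB, so whether you fit under the limit depends as much on row size as on row count.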
P.S. You shouldn't use lac (lakh) on international forums because it is only understood in India and nearby countries. 1 lac = 100,000
Would you consider:
Load up your data in C# (StreamReader, etc.),
Start an OleDbTransaction,
Run an INSERT query 500k times using an OleDbCommand,
Commit your transaction.
This will take away your dependency on the Access import specification too, so it might port more easily to other database types in the future.
The speed should be comparable to the Access import, but it requires you to code up the equivalent of your import specification (i.e., 'CREATE TABLE' SQL, 'INSERT INTO' SQL).
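A rough sketch of those four steps, assuming an .accdb target, a tab-delimited source file, and made-up table/column names (adjust the provider, paths, delimiter, and fields to match your own import specification):

    using System.Data.OleDb;
    using System.IO;

    class AccessImport
    {
        static void Main()
        {
            // ACE provider for .accdb; for an .mdb you would use the Jet 4.0 provider instead.
            string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\target.accdb";

            using (var conn = new OleDbConnection(connStr))
            {
                conn.Open();
                using (OleDbTransaction tx = conn.BeginTransaction())
                using (var cmd = new OleDbCommand(
                    "INSERT INTO ImportTable (Field1, Field2) VALUES (?, ?)", conn, tx))
                {
                    // OLE DB parameters are positional; the names are only labels.
                    cmd.Parameters.Add("@f1", OleDbType.VarWChar, 255);
                    cmd.Parameters.Add("@f2", OleDbType.VarWChar, 255);

                    using (var reader = new StreamReader(@"C:\data\source.txt"))
                    {
                        string line;
                        while ((line = reader.ReadLine()) != null)
                        {
                            string[] fields = line.Split('\t');   // delimiter per your spec
                            cmd.Parameters[0].Value = fields[0];
                            cmd.Parameters[1].Value = fields[1];
                            cmd.ExecuteNonQuery();
                        }
                    }

                    tx.Commit();   // one commit for all ~500k inserts
                }
            }
        }
    }

Keeping everything inside a single transaction is what makes the repeated inserts fast; committing per row would be far slower.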
I'm quite new to coding in general and I'm looking to copy 47 columns with roughly 300,000 rows of data, from an Oracle database to a SQL Server database, on a daily basis. The code will be stored as a Windows Service, running at the same time every day (or more likely night).
The data from the Oracle DB table (let's call this the Oracle_Source) will be used to both append to a history table (call this SQL_History) and also to append new/update matching/delete missing rows from a live table (call this SQL_Live). The two types of databases are housed on different servers, but the two SQL tables are on the same DB.
I have a few questions around the best way to approach this.
Using VB/C#, is it faster to loop through rows (either 1 by 1 or batches of 100/1000/etc.) of Oracle_Source and insert/update SQL_History/SQL_Live OR copy the entire table of Oracle_Source in one go and insert into the SQL tables? Previously I have used the loop to download data into a .csv.
Using the more efficient of the above methods, would it be faster to work on both SQL tables simultaneously OR copy the data into the SQL_History table and then use that to APPEND/UPDATE/DELETE from the SQL_Live table?
Am I approaching this completely wrong?
Any other advice available is also much appreciated.
The correct question is “What is the fastest way to copy the table?”
In your specific case, with two different servers and a “big” table to copy, you are probably limited by network I/O.
So the first point is to touch only the rows that must change (UPDATE / INSERT / DELETE), so fewer bytes have to move.
To answer your first point, you should use transactions to improve the write speed on SQL Server. The right transaction size depends on different factors (database, machine, ...), but I usually batch 500/1000 simple commands per transaction. In my personal experience, a multi-row INSERT of around 500 rows also works without performance issues.
In my experience, a bulk copy is faster than efficient INSERT, UPDATE and DELETE statements because the database does not recalculate keys and does not check for duplicate rows.
Better explained:
you TRUNCATE all data
DISABLE keys
massive INSERT of all rows and
re-ENABLE keys.
This is the fastest way to copy a table, but if the servers are connected over a slow network it may not be the best choice.
Obviously, the best choice depends on your infrastructure and the size of the table.
For example:
If you have one server on your LAN and the second server in the cloud, the bottleneck is the speed of the internet connection, and you must pay more attention to efficient communication (fewer bytes).
If both servers are on your LAN with gigabit connections, the usable throughput is probably around 100 MB/s, and you can simply move all the table rows without headache.
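If moving the whole of Oracle_Source in one go does make sense for you, a minimal sketch of streaming it into SQL_History with SqlBulkCopy (the connection strings are placeholders, and I'm assuming the Oracle managed data provider is referenced):

    using System.Data.SqlClient;
    using Oracle.ManagedDataAccess.Client;   // assumes the ODP.NET managed provider

    class NightlyCopy
    {
        static void Main()
        {
            using (var oracle = new OracleConnection("<oracle connection string>"))
            using (var sql = new SqlConnection("<sql server connection string>"))
            {
                oracle.Open();
                sql.Open();

                using (var cmd = new OracleCommand("SELECT * FROM ORACLE_SOURCE", oracle))
                using (var reader = cmd.ExecuteReader())
                using (var bulk = new SqlBulkCopy(sql))
                {
                    bulk.DestinationTableName = "dbo.SQL_History";
                    bulk.BatchSize = 5000;        // commit to the server every 5000 rows
                    bulk.BulkCopyTimeout = 0;     // disable the 30-second default timeout
                    bulk.WriteToServer(reader);   // streams rows; nothing is buffered in a DataTable
                }
            }
        }
    }

Because WriteToServer takes the data reader directly, the rows stream across without being materialized in memory or written out to a .csv first.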
I'm receiving and parsing a large text file.
In that file I have a numerical ID identifying a row in a table, and another field that I need to update.
ID    Current Location
=========================
1     Boston
2     Cambridge
3     Idaho
I was thinking of creating a single SQL command string and firing that off using ADO.NET, but some of these files I'm going to receive have thousands of lines. Is this doable or is there a limit I'm not seeing?
If you may have thousands of lines, then composing a SQL statement is definitely NOT the way to go. Better code-based alternatives include:
Use SQLBulkCopy to insert the change data to a staging table and then UPDATE your target table using the staging table as the source. It also has excellent batching options (unlike the other choices)
Write a stored procedure to do the Update that accepts an XML parameter that contains the UPDATE data.
Write a stored procedure to do the Update that accepts a table-valued parameter that contains the UPDATE data.
I have not compared them myself but it is my understanding that #3 is generally the fastest (though #1 is plenty fast for almost any need).
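A hedged sketch of option #1, assuming a staging table dbo.LocationStaging (ID int, CurrentLocation nvarchar) and a target table with matching columns; all of these names are made up for illustration:

    using System.Data;
    using System.Data.SqlClient;

    static class LocationUpdater
    {
        public static void Apply(DataTable changes, string connStr)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();

                // 1. Bulk-load the parsed (ID, CurrentLocation) pairs into the staging table.
                using (var bulk = new SqlBulkCopy(conn))
                {
                    bulk.DestinationTableName = "dbo.LocationStaging";
                    bulk.BatchSize = 10000;
                    bulk.WriteToServer(changes);
                }

                // 2. One set-based UPDATE from staging to target, then clear staging.
                const string sql = @"
                    UPDATE t
                    SET    t.CurrentLocation = s.CurrentLocation
                    FROM   dbo.TargetTable t
                    JOIN   dbo.LocationStaging s ON s.ID = t.ID;
                    TRUNCATE TABLE dbo.LocationStaging;";

                using (var cmd = new SqlCommand(sql, conn))
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }

The point is that only two round trips hit the server regardless of how many lines the file contains.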
Writing one huge INSERT statement will be very slow. You also don't want to parse the whole massive file at once. What you need to do is something along the lines of:
Figure out a good chunk size. Let's call it chunk_size. This will be the number of records you'll read from the file at a time.
Load chunk_size number of records from the file into a DataTable.
Use SQLBulkCopy to insert the DataTable into the DB.
Repeat 2 & 3 until the file is done.
You'll have to experiment to find an optimal size for chunk_size so start small and work your way up.
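A rough sketch of that loop; chunk_size, the column layout, and the destination table are all arbitrary here:

    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    static class ChunkedLoader
    {
        public static void Load(string path, string connStr, int chunkSize)
        {
            var chunk = new DataTable();
            chunk.Columns.Add("ID", typeof(int));
            chunk.Columns.Add("CurrentLocation", typeof(string));

            using (var reader = new StreamReader(path))
            using (var bulk = new SqlBulkCopy(connStr))
            {
                bulk.DestinationTableName = "dbo.LocationStaging";

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split('\t');
                    chunk.Rows.Add(int.Parse(fields[0]), fields[1]);

                    if (chunk.Rows.Count >= chunkSize)
                    {
                        bulk.WriteToServer(chunk);   // push this chunk to the server
                        chunk.Clear();               // start filling the next one
                    }
                }

                if (chunk.Rows.Count > 0)
                    bulk.WriteToServer(chunk);       // whatever is left at the end
            }
        }
    }

Only one chunk is ever held in memory, which is what keeps this workable for very large files.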
I'm not sure of an actual limit, if one exists, but why not take "bite sized" chunks of the file that you feel comfortable with and break it into several commands? You can always wrap it in a single transaction if it's important that they all fail or succeed.
Say grab 250 lines at a time, or whatever.
I have a requirement where I need to update thousands of records in live database table, and although there are many columns in this table, I only need to update 2-3 columns.
Further, I can't hit the database thousands of times just for updating, which can be done in a batch update using a SQL Server table-valued parameter. But again, for better error handling I shouldn't update all thousand records in one go; instead I want to update records in batches of x*100.
So, below is my approach, please give your valuable inputs for any other alternatives or any change in the proposed process -
1. Fetch the required records from the database into List<T> MainCollection.
2. Save this collection to an XML file with each element's Status = Pending.
3. Take the first 'n' elements from the XML file with Status = Pending and add them to a new List<T> SubsetCollection.
4. Loop over List<T> SubsetCollection - make the required changes to T.
5. Convert List<T> SubsetCollection to a DataTable.
6. Call the update stored procedure and pass the above DataTable as a TVP (a sketch follows below).
7. Update Status = Processed for the XML elements corresponding to List<T> SubsetCollection.
8. If more records with Pending status exist in the XML file, go to step 3.
Please guide for a better approach or any enhancement in above process.
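For step 6, a minimal sketch of the TVP call; the procedure name usp_UpdateRecords and the table type dbo.RecordTableType are assumptions, and the user-defined table type must already exist on the server:

    using System.Data;
    using System.Data.SqlClient;

    static class BatchUpdater
    {
        public static void UpdateBatch(DataTable subset, string connStr)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand("dbo.usp_UpdateRecords", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;

                SqlParameter tvp = cmd.Parameters.AddWithValue("@Records", subset);
                tvp.SqlDbType = SqlDbType.Structured;
                tvp.TypeName = "dbo.RecordTableType";   // user-defined table type on the server

                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

Each call sends one batch of x*100 rows, so error handling stays per-batch as you intended.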
I would do a database-only approach if possible and if not possible, eliminate the parts that will be the slowest. If you are unable to do all the work in a stored procedure, then retrieve all the records and make changes.
The next step is to write the changes to a staging table with SQL Bulk Copy. This is a fast bulk loader that will copy thousands of records in seconds. You will store the primary key and the columns to be updated, as well as a batch number. The batch number is assigned to each batch of records, therefore allowing another batch to be loaded without conflicting with the first batch.
Use a stored procedure on the server to process the records in batches of 100 or 1000 depending on performance. Pass the batch number to the stored procedure.
We use such a method to load and update millions of records in batches. The best speed is obtained by eliminating the network and allowing the database server to handle the bulk of the work.
I hope this might provide you with an alternate solution to evaluate.
It may not be best practice, but you could embed some logic inside a SQL Server CLR function. This function could be called by a query, a stored procedure, or a schedule to run at a certain time.
The only issue I can see is getting step 4 to make the required changes on T. Embedding that logic into the database could be detrimental to maintenance, but this is no different from people who embed massive amounts of business logic into stored procedures.
Either way, SQL Server CLR functions may be the way to go. You can create them in Visual Studio 2008 or 2010 (check the new database project types).
Tutorial : http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx
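A bare-bones sketch of what such a CLR stored procedure can look like; the table, column names, and the update inside are placeholders, and the linked tutorial covers compiling it and registering it with CREATE ASSEMBLY / CREATE PROCEDURE ... EXTERNAL NAME:

    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public class BatchLogic
    {
        [SqlProcedure]
        public static void ApplyPendingChanges()
        {
            // "context connection=true" runs on the caller's own connection inside SQL Server.
            using (var conn = new SqlConnection("context connection=true"))
            {
                conn.Open();
                using (var cmd = new SqlCommand(
                    "UPDATE dbo.SomeTable SET Status = 'Processed' WHERE Status = 'Pending'", conn))
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }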
My Windows app reads a text file and inserts it into the database. The problem is that the text file is extremely big (at least for our low-end machines). It has 100 thousand rows and it takes time to write it into the database.
Can you guys suggest how I should read and write the data efficiently so that it does not hog the machine's memory?
FYI...
Column delimiter : '|'
Row delimiter : NewLine
It has approximately 10 columns (client information: first name, last name, address, phones, emails, etc.).
CONSIDER THAT...I AM RESTRICTED FROM USING BULK CMD.
You don't say what kind of database you're using, but if it is SQL Server, then you should look into the BULK INSERT command or the BCP utility.
Given that there is absolutely no chance of getting help from your security folks and using BULK commands, here is the approach I would take:
Make sure you are reading the entire text file first before inserting into the database. Thus reducing the I/O.
Check what indexes you have on the destination table. Can you insert into a temporary table with no indexes or dependencies so that the individual inserts are fast?
Does this data need to be visible immediately after insert? If not then you can have a scheduled job to read from the temp table in step 2 above and insert into the destination table (that has indexes, foreign keys etc.).
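If BULK commands really are off the table, a rough sketch of inserting into such an index-free staging table with batched, parameterized inserts; the table, column names, and batch size are invented, and the '|' delimiter comes from the question:

    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    static class ClientFileLoader
    {
        public static void Load(string path, string connStr)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();

                SqlTransaction tx = conn.BeginTransaction();
                var cmd = new SqlCommand(
                    "INSERT INTO dbo.ClientStaging (FirstName, LastName, Email) VALUES (@fn, @ln, @em)",
                    conn, tx);
                cmd.Parameters.Add("@fn", SqlDbType.NVarChar, 100);
                cmd.Parameters.Add("@ln", SqlDbType.NVarChar, 100);
                cmd.Parameters.Add("@em", SqlDbType.NVarChar, 255);

                int rowCount = 0;
                using (var reader = new StreamReader(path))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        string[] fields = line.Split('|');    // '|' column delimiter
                        cmd.Parameters["@fn"].Value = fields[0];
                        cmd.Parameters["@ln"].Value = fields[1];
                        cmd.Parameters["@em"].Value = fields[2];
                        cmd.ExecuteNonQuery();

                        if (++rowCount % 1000 == 0)           // commit in batches of 1000 rows
                        {
                            tx.Commit();
                            tx = conn.BeginTransaction();
                            cmd.Transaction = tx;
                        }
                    }
                }
                tx.Commit();
            }
        }
    }

Committing every 1000 rows keeps transactions small while still avoiding the cost of one commit per insert.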
Is it possible for you to register your custom assembly in SQL Server? (I'm assuming it's SQL Server because you've already said you used bulk insert earlier.)
Then you can call your assembly to do (mostly) whatever you need, like getting a file from some service (or whatever your option is), parsing it, and inserting directly into tables.
This is not an option I like, but it can be a lifesaver sometimes.
I'm currently working on a project where we have a large data warehouse which imports several GB of data on a daily basis from a number of different sources. We have a lot of files with different formats and structures all being imported into a couple of base tables which we then transpose/pivot through stored procs. This part works fine. The initial import however, is awfully slow.
We can't use SSIS File Connection Managers as the columns can be totally different from file to file so we have a custom object model in C# which transposes rows and columns of data into two base tables; one for column names, and another for the actual data in each cell, which is related to a record in the attribute table.
Example - Data Files:
Example - DB tables:
The SQL insert is performed currently by looping through all the data rows and appending the values to a SQL string. This constructs a large dynamic string which is then executed at the end via SqlCommand.
The problem is, even loading a 1 MB file takes about a minute, so when it comes to large files (200 MB etc.) it takes hours to process a single file. I'm looking for suggestions as to other ways to approach the insert that will improve performance and speed up the process.
There are a few things I can do with the structure of the loop to cut down on the string size and number of SQL commands present in the string but ideally I'm looking for a cleaner, more robust approach. Apologies if I haven't explained myself well, I'll try and provide more detail if required.
Any ideas on how to speed up this process?
The dynamic string is going to be SLOW. Each SQLCommand is a separate call to the database. You are much better off streaming the output as a bulk insertion operation.
I understand that all your files are different formats, so you are having to parse and unpivot in code to get it into your EAV database form.
However, because the output is in a consistent schema, you would be better off either using separate connection managers and the built-in unpivot operator, or adding multiple rows to the data flow from a script task in the common output (just like you are currently doing when building your SQL INSERT...INSERT...INSERT for each input row) and then letting it all stream into a destination.
i.e. Read your data and in the script source, assign the FileID, RowId, AttributeName and Value to multiple rows (so this is doing the unpivot in code, but instead of generating a varying number of inserts, you are just inserting a varying number of rows into the dataflow based on the input row).
Then pass that through a lookup to get from AttributeName to AttributeID (erroring the rows with invalid attributes).
Stream straight into an OLEDB destination, and it should be a lot quicker.
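If you would rather stay in plain ADO.NET than move to an SSIS data flow, the same "stream it as one bulk operation" idea can be sketched like this, replacing the dynamic INSERT string; the table and column names are guesses at your schema:

    using System.Data;
    using System.Data.SqlClient;

    static class CellWriter
    {
        // Build the unpivoted EAV rows in memory instead of concatenating INSERT statements.
        public static DataTable NewCellTable()
        {
            var t = new DataTable();
            t.Columns.Add("FileID", typeof(int));
            t.Columns.Add("RowID", typeof(int));
            t.Columns.Add("AttributeID", typeof(int));
            t.Columns.Add("Value", typeof(string));
            return t;
        }

        public static void Flush(DataTable cells, string connStr)
        {
            // One streamed bulk operation instead of thousands of individual INSERTs.
            using (var bulk = new SqlBulkCopy(connStr))
            {
                bulk.DestinationTableName = "dbo.CellData";
                bulk.BatchSize = 10000;
                bulk.WriteToServer(cells);
            }
            cells.Clear();
        }
    }

Wherever the parser currently appends another INSERT to the SQL string, call cells.Rows.Add(fileId, rowId, attributeId, value) instead, and Flush() every few thousand rows.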
One thought - are you repeatedly going back to the database to find the appropriate attribute value? If so, switching the repeated queries to a query against a recordset that you keep on the client side will speed things up enormously.
This is something I have done before - 4 reference tables involved. Creating a local recordset and filtering that as appropriate caused a speed up of a process from 2.5 hours to about 3 minutes.
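A small sketch of that client-side cache; the reference table and column names are assumptions:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    static class AttributeCache
    {
        public static Dictionary<string, int> Load(string connStr)
        {
            var lookup = new Dictionary<string, int>();
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "SELECT AttributeName, AttributeID FROM dbo.Attributes", conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        lookup[reader.GetString(0)] = reader.GetInt32(1);
                }
            }
            return lookup;   // then: int id = lookup[attributeName]; no round trip per cell
        }
    }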
Why not store whatever reference tables are needed within each database and perform all lookups on the database end? Or it may even be better to pass a table type into each database where keys are needed, store all reference data in one central database and then perform your lookups there.