C# upload CSV file to Netezza

So my team is looking into connecting to Netezza with C#, and we plan on loading data into Netezza, pulling data from Netezza, and writing update queries, all in C#.
From my research I see that it's possible to connect to Netezza using C#, and I'm wondering whether you can do all three of the operations listed above from C#, so that we can decide whether we can do just about anything with Netezza using C#. We'd like to know before we commit to anything. The data we would be loading comes as CSV files.
Are there any good resources on this? I haven't been able to find any.
We also have the Aginity client tools, so maybe it's possible to incorporate Aginity into this (not that I would want to, but if it's easier I'd like to know about it)?

Retrieving data is straightforward and can be done through the usual channels (loop over a cursor to get results) but loading can take a bit longer.
Netezza is not a fan of multiple INSERT queries; it doesn't support multi-row inserts, so loading a large number of records with individual INSERT statements will take a very long time.
When loading multiple records, most people write their data out to a .csv file and use the external table syntax to perform the insert (a sketch of that approach follows below).
Within an application, we prefer to load/unload our data via a named pipe so that we don't have to write/read the data to disk first.
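As a rough illustration of the external-table approach from C#, here is a minimal sketch assuming the Netezza ODBC driver is installed; the connection string, file path, options, and target table are all placeholders:

using System;
using System.Data.Odbc;

class NetezzaCsvLoad
{
    static void Main()
    {
        // Placeholder connection string for the Netezza ODBC driver.
        string connStr = "Driver={NetezzaSQL};servername=nzhost;port=5480;" +
                         "database=MYDB;username=user;password=pass;";

        using (var conn = new OdbcConnection(connStr))
        {
            conn.Open();

            // Transient external table: Netezza streams the client-side CSV over
            // the ODBC connection (REMOTESOURCE 'ODBC') instead of per-row INSERTs.
            string sql = @"INSERT INTO MY_TARGET_TABLE
                           SELECT * FROM EXTERNAL 'C:\data\load_me.csv'
                           USING (DELIMITER ',' REMOTESOURCE 'ODBC' SKIPROWS 1)";

            using (var cmd = new OdbcCommand(sql, conn))
            {
                cmd.CommandTimeout = 0;   // large loads can run for a while
                int rows = cmd.ExecuteNonQuery();
                Console.WriteLine("Loaded " + rows + " rows.");
            }
        }
    }
}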

Related

Export SDF database contents as JSON file

I have a C# Windows Forms project which uses a SQL Server Compact (.SDF) database only for retrieving data. My application does not update the database, so the database is static.
I recently read somewhere that for this kind of static work it is best to use XML or JSON, as they reduce the I/O time spent on connecting to the database, retrieving data, and closing the database. Is this true?
If so, is there a way I can directly convert my database contents into a JSON file? It has 7 tables (relations) and a total of 850 rows (tuples) of data. The data is in Kannada, not English (if that makes any difference).
Yes, with 850 read-only records that you don't join, reading from a file rather than a database is way faster.
What you want to do is save your data in files (one per table will come in handy) and read them once at program start into a list of classes that look like your respective table structure. You can then operate on those lists a lot faster than on a database. You can use JSON as a file format, or XML or CSV if your data is simple, or anything else you can come up with.
How you do this in detail is way too broad a question for this site. Read a few tutorials and ask a more detailed question once you run into problems.
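For example, a minimal sketch of that approach, assuming the Json.NET (Newtonsoft.Json) package and a hypothetical class mirroring one of your tables:

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;   // Json.NET NuGet package

// Hypothetical class matching one of the 7 tables.
public class WordEntry
{
    public int Id { get; set; }
    public string Kannada { get; set; }   // non-English text is fine; JSON files are UTF-8
    public string Meaning { get; set; }
}

public static class StaticData
{
    public static List<WordEntry> Words { get; private set; }

    // Call once at program start; afterwards query the in-memory list.
    public static void Load(string path)
    {
        Words = JsonConvert.DeserializeObject<List<WordEntry>>(
            File.ReadAllText(path));
    }
}

You would export each table to its own .json file once (for example by reading the SDF tables and serializing the lists with JsonConvert.SerializeObject), then ship only the JSON files with the application.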

Programmatically saving a SQL Server database to xml files and restoring it again

I want to save a whole MS SQL 2008 database into XML files... using ASP.NET.
Now I am a bit lost here.. what would be the best method to achieve this? DataSets?
And I need to be able to restore the database again later from these XML files. I am thinking about using DataSets for reading the tables and writing the XML, and using the SqlBulkCopy class to restore the database again. But I am not sure whether this is the right approach..
Any clues and tips for me?
If you will need to restore it on the same server type (I mean SQL Server 2008 or higher) and don't care about the ability to see the actual data inside the XML, do the following:
Programmatically back up the DB using the "BACKUP DATABASE" T-SQL command.
Compress the backup.
Convert the backup to Base64.
Place the backup as the content of the XML file (like: <database name="..." compressionmethod="..." compressionlevel="...">the Base64 content here</database>).
On the server where you need to restore it, download the XML, extract the Base64 content, and use the attributes to know what compression was used. Decompress and restore using the T-SQL "RESTORE" command.
Would that approach work?
For sure, if you need to see the content of the database, you would need to develop an XML schema and go through each table, etc. But you won't have SPs/views and other items backed up.
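As a rough sketch of the compress/Base64/wrap-in-XML part of that idea (the file names, attribute values, and envelope layout are just placeholders):

using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

class BackupToXml
{
    // Wraps an existing .bak file (produced by BACKUP DATABASE) in an XML envelope.
    static void WrapBackup(string bakPath, string xmlPath, string dbName)
    {
        byte[] raw = File.ReadAllBytes(bakPath);

        // Compress with GZip before encoding; Base64 alone inflates the size by roughly a third.
        byte[] compressed;
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
                gz.Write(raw, 0, raw.Length);
            compressed = ms.ToArray();
        }

        var doc = new XDocument(
            new XElement("database",
                new XAttribute("name", dbName),
                new XAttribute("compressionmethod", "gzip"),
                Convert.ToBase64String(compressed)));
        doc.Save(xmlPath);
    }
}

Restoring is the reverse: read the element text, Convert.FromBase64String, decompress, write the .bak back out, and run RESTORE DATABASE against it.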
Because you are talking about a CMS, I'm going to assume you are deploying into hosted environments where you might not have command line access.
Now, before I give you the link, I want to state that this is a BAD idea. XML is way too verbose to transfer large amounts of data. Further, although it is relatively easy to pull the data out, putting it back in will be difficult and a very time-consuming development project in itself.
Next alert: as Denis suggested, you are going to miss all of your stored procedures, functions, etc. Your best bet is to use the normal SQL Server backup/restore process. (Incidentally, I upvoted his answer.)
Finally, the last time I dealt with XML and SQL Server we noticed interesting issues that cropped up when data exceeded a 64KB boundary. Basically, at 63.5KB, the queries ran very quickly (200ms). At 64KB, the query times jumped to over a minute and sometimes quite a bit longer. We didn't bother testing anything over 100KB as that was taking 5 minutes on a fast/dedicated server with zero load.
http://msdn.microsoft.com/en-us/library/ms188273.aspx
See this for putting it back in:
How to insert FOR AUTO XML result into table?
For kicks, here is a link talking about pulling the data out as json objects: http://weblogs.asp.net/thiagosantos/archive/2008/11/17/get-json-from-sql-server.aspx
You should also read (not for the faint of heart): http://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
Of course, the commenters all recommend building something using a CLR approach, but that's probably not available to you in a shared database hosting environment.
At the end of the day, if you are truly insistent on this madness, you might be better served by simply iterating through your table list and exporting all the data to standard CSV files, then iterating the CSV files to load the data back in, as in C# - is there a way to stream a csv file into database? A rough sketch of the export half is shown below.
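Something along these lines (table discovery via INFORMATION_SCHEMA; note it makes no attempt to quote commas or newlines embedded in the data):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Text;

class CsvExporter
{
    // Dumps every base table in the database to <schema.table>.csv in outDir.
    static void ExportAllTables(string connStr, string outDir)
    {
        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            var tables = new List<string>();
            using (var cmd = new SqlCommand(
                "SELECT TABLE_SCHEMA + '.' + TABLE_NAME FROM INFORMATION_SCHEMA.TABLES " +
                "WHERE TABLE_TYPE = 'BASE TABLE'", conn))
            using (var rdr = cmd.ExecuteReader())
                while (rdr.Read()) tables.Add(rdr.GetString(0));

            foreach (var table in tables)
            {
                using (var cmd = new SqlCommand("SELECT * FROM " + table, conn))
                using (var rdr = cmd.ExecuteReader())
                using (var writer = new StreamWriter(
                    Path.Combine(outDir, table + ".csv"), false, Encoding.UTF8))
                {
                    while (rdr.Read())
                    {
                        var fields = new string[rdr.FieldCount];
                        for (int i = 0; i < rdr.FieldCount; i++)
                            fields[i] = rdr.IsDBNull(i) ? "" : rdr.GetValue(i).ToString();
                        writer.WriteLine(string.Join(",", fields));
                    }
                }
            }
        }
    }
}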
Bear in mind that ALL of the above methods suffer from
long processing times due to the data overhead; which leads to
a high potential for failure due to the various timeouts (page processing, command, connection, etc.); and,
if your data model changes between the time it was exported and reimported then you're back to writing custom translation code and ultimately screwed anyway.
So, only do this if you really really have to and are at least somewhat of a masochist at heart. If the purpose is simply to transfer some data from one installation to another, you might consider using one of the tools like SQL Compare and SQL Data Compare from RedGate to handle the transfer.
I don't care how much (or little) you make, the $1500 investment in their developer bundle is much cheaper than the months of time you are going to spend doing this, fixing it, redoing it, fixing it again, etc. (for the record I do NOT work for them. Their products are just top notch.)
Red Gate's SQL Packager lets you package a database into an exe or to a VS project, so you might want to take a look at that. You can specify which tables you want to consider for data.
Is there any specific reason you want to do this using xml?

What's the best way to load a large amount of SQL data in .NET and generate a CSV file?

I have a SQL proc that returns 20,000+ records and want to get this data into a CSV for a SQL 2005 bulk load operation.
I think using a DataSet is overkill since I only need forward-only read access to the data.
I have a data reader now, but I don't think iterating the data reader is a good idea, because it will hold locks on the Oracle DB I am getting the 20,000 records from for as long as it is running.
Logically I am thinking of creating a disconnected snapshot of the data, maybe in a DataTable, and using that to generate my CSV file.
I don't often develop this kind of ETL app, so I wanted to know what the gold standard is for this type of operation.
Thoughts?
Also, allow me to mention that this needs to be a console app, since corporate rules won't allow linked servers or anything cool - so that means SSIS is out.
Since you are worried about iterating the data reader yourself, I would recommend the SqlBulkCopy class.
It lets you load data into a SQL Server database from any source that can be read with an IDataReader instance.
That might solve your potential locking issue; a minimal sketch follows.
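Something like this, assuming an ODBC connection to Oracle and a proc that returns a result set; the connection strings, proc name, and target table are placeholders:

using System.Data;
using System.Data.Odbc;          // or whichever Oracle provider you already use
using System.Data.SqlClient;

class ProcToSqlServer
{
    // Streams the proc's 20,000+ rows straight from Oracle into SQL Server
    // without writing an intermediate CSV.
    static void CopyProcResults(string oracleConnStr, string sqlConnStr)
    {
        using (var src = new OdbcConnection(oracleConnStr))
        using (var cmd = new OdbcCommand("{CALL MY_SCHEMA.MY_PROC}", src))
        {
            src.Open();
            using (IDataReader reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(sqlConnStr))
            {
                bulk.DestinationTableName = "dbo.TargetTable";
                bulk.BulkCopyTimeout = 0;     // no timeout for the large load
                bulk.BatchSize = 5000;        // commit in batches
                bulk.WriteToServer(reader);   // forward-only streaming, row by row
            }
        }
    }
}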

Efficient way to analyze large amounts of data?

I need to analyze tens of thousands of lines of data. The data is imported from a text file. Each line of data has eight variables. Currently I use a class to define the data structure, and as I read through the text file I store each line object in a generic list, List<T>.
I am wondering if I should switch to using a relational database (SQL), as I will need to analyze the data in each line of text, trying to relate it to definition terms which I also currently store in generic lists (List<T>).
The goal is to translate a large amount of data using definitions. I want the defined data to be filterable, searchable, etc. Using a database makes more sense the more I think about it, but I would like to confirm with more experienced developers before I make the changes, yet again (I was using structs and arraylists at first).
The only drawback I can think of, is that the data does not need to be retained after it has been translated and viewed by the user. There is no need for permanent storage of data, therefore using a database might be a little overkill.
It is not absolutely necessary to go to a database. It depends on the actual size of the data and the processing you need to do. If you are loading the data into a List with a custom class, why not use LINQ to do your querying and filtering? Something like (where fooList is your List<Foo>):
var query = from foo in fooList
            where foo.Prop == criteriaVar
            select foo;
The real question is whether the data is so large that it cannot be loaded into memory comfortably. If that is the case, then yes, a database would be much simpler.
This is not a large amount of data. I don't see any reason to involve a database in your analysis.
There IS a query language built into C# -- LINQ. The original poster currently uses a list of objects, so there is really nothing left to do. It seems to me that a database in this situation would add far more heat than light.
It sounds like what you want is a database. Sqlite supports in-memory databases (use ":memory:" as the filename). I suspect others may have an in-memory mode as well.
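For example, a minimal in-memory sketch assuming the Microsoft.Data.Sqlite package (System.Data.SQLite works much the same way); the table layout here is purely illustrative:

using System;
using Microsoft.Data.Sqlite;

class InMemoryAnalysis
{
    static void Main()
    {
        // The database lives only for the lifetime of this connection,
        // which matches the "no permanent storage needed" requirement.
        using (var conn = new SqliteConnection("Data Source=:memory:"))
        {
            conn.Open();

            var create = conn.CreateCommand();
            create.CommandText = "CREATE TABLE Lines (Id INTEGER, Term TEXT, Value REAL)";
            create.ExecuteNonQuery();

            // Illustrative insert; in practice, loop over your parsed lines here.
            var insert = conn.CreateCommand();
            insert.CommandText =
                "INSERT INTO Lines (Id, Term, Value) VALUES ($id, $term, $value)";
            insert.Parameters.AddWithValue("$id", 1);
            insert.Parameters.AddWithValue("$term", "example");
            insert.Parameters.AddWithValue("$value", 3.14);
            insert.ExecuteNonQuery();

            var query = conn.CreateCommand();
            query.CommandText = "SELECT Term, COUNT(*) FROM Lines GROUP BY Term";
            using (var reader = query.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine(reader.GetString(0) + ": " + reader.GetInt32(1));
        }
    }
}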
I faced the same problem you are facing now while I was working at my previous company. I was looking for a good, concrete solution for handling a lot of barcode-generated files; the barcode reader produces a text file with thousands of records in a single file, and manipulating and presenting the data was difficult for me at first. What I ended up writing was a class that reads the file, loads the data into a DataTable, and saves it to a database (SQL Server 2005 in my case). From there I could easily manage the saved data and present it any way I liked. The main point is to read the data from the file and save it to the database; once you do that, you have a lot of options for manipulating and presenting it the way you want.
If you do not mind using Access, here is what you can do:
Attach a blank Access db as a resource.
When needed, write the db out to a file.
Run a CREATE TABLE statement that creates the columns for your data.
Import the data into the new table.
Use SQL to run your calculations.
On close, delete that Access db.
You can use a program like Resourcer to load the db into a .resx file. Then use the following code to pull the resource out of the project, take the byte array, and save it to a temp location with a temp filename:
ResourceManager res = new ResourceManager( "MyProject.blank_db", this.GetType().Assembly );
byte[] b = (byte[])res.GetObject( "access.blank" );
"MyProject.blank_db" is the location and name of the resource file.
"access.blank" is the name given to the resource you saved.
If the only thing you need to do is search and replace, you may consider using sed and awk, and you can do searches using grep - on a Unix platform, of course.
From your description, I think Linux command-line tools can handle your data very well. Using a database may unnecessarily complicate your work. If you are using Windows, these tools are also available in various ways; I would recommend Cygwin. The following tools may cover your task: sort, grep, cut, awk, sed, join, paste.
These Unix/Linux command-line tools may look scary to a Windows person, but there are reasons people love them. These are mine:
They allow your skills to accumulate - your knowledge of a particular tool will be helpful in different future tasks.
They allow your effort to accumulate - the command line (or scripts) you used to finish the task can be repeated as many times as needed with different data, without human interaction.
They usually outperform anything you could write yourself. If you don't believe that, try to beat sort with your own version on terabyte files.

.Net Data Handling Suggestions

I am just beginning to write an application. Part of what it needs to do is to run queries on a database of nutritional information. What I have is the USDA's SR21 Datasets in the form of flat delimited ASCII files.
What I need is advice. I am looking for the best way to import this data into the app and have it easily and quickly queryable at run time. I'll be using it for all the standard things. Populating controls dynamically, Datagrids, calculations, etc. I will also need to do user specific persistent data storage as well. This will not be a commercial app, so hopefully that opens up the possibilities. I am fine with .Net Framework 3.5 so Linq is a possibility when accessing the data (just don't know if it would be the best solution or not). So, what are some suggestions for persistent storage in this scenario? What sort of gotchas should I be watching for? Links to examples are always appreciated of course.
It looks pretty small, so I'd work out an appropriate object model, load the whole lot into memory, and then use LINQ to Objects.
I'm not quite sure what you're asking about in terms of "persistent storage" - aren't you just reading the data? Don't you already have that in the text files? I'm not sure why you'd want to introduce anything else.
I would import the flat files into SQL Server and access via standard ADO.NET functionality. Not only is DB access always better (more robust and powerful) than file I/O as far as data querying and manipulation goes, but you can also take advantage of SQL Server's caching capabilities, especially since this nutritional data won't be changing too often.
If you need to download updated flat files periodically, then look into developing a service that polls for these files and imports into SQL Server automatically.
EDIT: I refer to SQL Server, but feel free to use any DBMS.
My temptation would be to import the data into SQL Server (Express if you aren't looking to deploy the app) as it's a familiar source for me. Alternatively you can probably create an ODBC data source using the text file handler to get you a database-like connection.
I agree that you would benefit from a database, especially for rapid querying, and even more so if you are saving user changes to the data. In order to load the flat file data into a SQL Server (including Express), you can use SSIS.
Use LINQ, or the text-data-to-list method:
1. Create a list.
2. Read the text file line by line (or all lines at once).
3. Process each line - extract the required data and add it to the list.
4. Process the list for any further use.
Persistent storage will be the files themselves; the List is volatile (in memory only).
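A rough sketch of that approach; the caret/tilde delimiters follow the SR flat-file convention, but the specific fields shown here are just an assumption:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical record type for one line of a USDA flat file.
public class FoodItem
{
    public string Code { get; set; }
    public string Description { get; set; }
    public double Calories { get; set; }
}

public static class FoodLoader
{
    public static List<FoodItem> Load(string path)
    {
        return File.ReadLines(path)
                   .Select(line => line.Split('^'))          // caret-delimited fields
                   .Select(f => new FoodItem
                   {
                       Code = f[0].Trim('~'),                // text fields wrapped in tildes
                       Description = f[1].Trim('~'),
                       Calories = double.Parse(f[2])         // sketch only: no empty-field handling
                   })
                   .ToList();
    }
}

// Usage: load once, then filter in memory with LINQ instead of a database.
// var foods = FoodLoader.Load("FOOD_DES.txt");
// var highCal = foods.Where(f => f.Calories > 400).ToList();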
