How can I save large amounts of data in C#?

How can I save large amounts of data in C#? - c#

I'm writing a program in C# that will save lots of data points and then later make a graph. What is the best way to save these points?
Can I just use a really long array or should I use a text file or excel file or something like that?
Additional information: It probably wont be more than a couple thousand. And it would be good if I could access it from a windows mobile app. Basically a user will be able to save times that something happens at, and then the app will use the data to find a cross correlation.

If it's millions or even thousands of records, I would probably look at using a database. You can get SQL Server 2008 Express for free, or use MySQL, or something like that.
If you go that route, LINQ to SQL makes database access a piece of cake in .NET. Entity Framework is also available, but LINQ to SQL probably has a quicker time-to-implement.

If you use a text file or excel file, etc. You'll still need to load them back into memory to plot the graph.
So if you're collecting data over a long period of time, or you want to plot the graph some time in the future, write them to a plain text file. When you're ready to plot the graph, load the file up and plot the graph.
If the data collection is within a short period of time, don't bother writing to a file - it'll just add steps to the process for nothing.

A really easy way of doing this would be to serialize your object list into a BinaryWriter or XMLWriter, which automatically format your data into a readable and writable format so that, when your program needs to load the data, all you have to do is deserialize it (1 line of code).
Alternatively, if you have very many records, I suggest trying to use a database. It's quite easy to interface C# with SQL Server (there's a free version called Express Edition) or MySQL, and storing and retrieving huge amounts of data is not a pain. This would be the most efficient way to accomplish your task.
Depending on how much data you have and whether you want to accomplish something like this with 1 line of code (serialization) or interface with a seperate product (the database approach), you can choose either one of the above. Of course, if you wanted to, you could just manually write the contents of your data to a text file or CSV file, as you suggested, but, from personal experience, I recommend the methods I explained above.

It probably wont be more than a couple thousand. And it would be good if I could access it from a windows mobile app. Basically a user will be able to save times that something happens at, and then the app will use the data to find a cross correlation.

Is there any need for interoperability with other processes? If so, time to swat-up on file formats.
However, from the sound of it, you're asking on a matter of "style", with no real requirement to open the file anywhere but your own app. I'd suggest using a BinaryWriter for the task.
If debugging is an issue, a human-readable format might be preferable, but would be considerably larger than the binary equivalent.
Probably the quickest way to do it would be using binary serialization.

Related

Write binary file database C#

my concern is that, im trying to make a program that will show up let's say images of cars to create a collection. I'm using windows forms and i have made the grapchical interface i want.
The thing is i have a xml file (like database) with 300 cars(elements) and each car has a some children elements one of them being the "carname.png" so i can use it to find the file image to show also. My concern here is if there is a way to convert this xml file into a binary one that my program will be able to read much faster. This is for efficiency and self-educational-practice purposes as I am young in programming. Thanx in advance for your time and thoughts.

Simply import your XML file into a suitable database and query that instead. If it naturally fits into relational form, use Sqlite as first choice. In the Windows environment as a learning experience you might prefer to use SqlServer but it's overkill.
If the data does not fit naturally into the relational form then use a no-sql database like MongoDB and store your XML records (or JSON or whatever) in it. You get fast access and XML flexibility together.

C# Winform database

I am building a Winform application that need a database.
The database needs to save an array of items of a custom class:
Name
Date
Duration
Artist
Genre
If I should build the database using a file that every time, when I increase the array, I will save. Is there wait time to save an array of 300 or so items?
And the second database is to use SQL.
What is the difference between them? And what should I use?

As someone mentioned in a comment, SQLite should work very well for this type of scenario.
If you think your data set will remain fairly small, you might consider XML, or a file, or something else if you think that would be quicker/easier.
In any case, I would strongly recommend that you hide your storage-logic behind an interface, and call only that from the winforms part of your application. This way you will be able to replace your storage-solution later if you should need to.
Update in response to comment: The reason for using SQLite instead of another DB System is that SQLite can be integrated directly into your application. Other DBMS`s will typically be external systems, that you just connect to from within your app.
A quick google search will provide you lots of info, such as this short article about using SQLite within a C# application.

I think you have to think about the futured size of your data.
If you know that i future the data will grow up exponentially, i think you have to use a database System like SQL.
Otherwise if it is only for a few records, you can use a XML File instead.
If you are using a MS SQL Database, you can open a Connection while saving your data, and write it with a sqladapter into the database.
If you are using a XML file instead, you can use the XMLSerializer class for serialization of your own Business object.

File vs database? - it is easy. What is database - it is a file. Only it has an engine that knows how to manipulate that file.
If you use file, you suddenly need to think, "what if?". What if file gets corrupted during write. Or what if computer shuts down in the middle of write? DBMS takes care of this issues by issuing all sorts of mechanisms such as uncommitted data files, etc. Now you will need to provide this mechanism yourself.
This is why you should write to file only non-critical data. For example, some user settings. Because if you lost that file, user can re-size controls again but no data will be at loss. Or log file is another good use of file. Because if you lose a log, you can live without. But if you lose months of worth of data...
In your case, I don't know, how user history is important. 300 items is not a large array. You can use XML by creating an object (class) and mark its properties with XML attributes and then use XML serializer to serialize your history into XML
http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx
But if it is going to grow and you not planning to age some of it and delete, look into RDBMS.

The best way to save huge amount of financial tick data of forex

I have lot of Forex Tick Data to be saved. My question is what is the best way?
Here is am example: I collect only 1 month data from the EURUSD pair. It is originally in CSV file which is 136MB large and has 2465671 rows. I use a library written by : http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader and it took around 30 seconds to read all the ticks and saved it in 2465671 objects. first of all, whether it is fast enough?
Secondly, is there any way better than CSV? For example, the binary file which might be faster and whether you have any recommendation about any database which is best? I tried the db4o but it is not very impressive. I think here are some overhead to save data as properties of object and when we have to save 2465671 objects in Yap file of db4o.

I've thought about this before, and if I was collecting this data, I would break up the process:
collect data from the feed, form a line (I'd use fixed width), and append it to a text file.
I would create a new text file every minute and name it something like rawdata.yymmddhhmm.txt
Then I would have another process working in the background reading these files and pushing then into a database via a parameterized insert query.
I would probably use text over a binary file because I know that would append without any problems, but I'd also look into opening a binary file for append as well. This might actually be a bit better.
Also, you want to open the file in append mode since that's the fastest way to write to a file. This will obviously need to be super fast.

Perhaps look at this product:
http://kx.com/kdb+.php
it seems to made for that purpose.

One way to save data space (and hopefully time) is to save numbers as numbers and not as text, which is what CSV does.
You can perhaps make an object out of each row, and the make the reading and writing each object a serialization problem, which there is good support for in C#.

Kx's kdb database would be a great of-the-shelf package if you had a few million to spare. However you could easily write your own column-orientated database to store and analyse high-frequency data for optimal performance.

I save terabytes as compressed binary files (GZIP) that I dynamically uncompress using C#/.NETs built-in gzip compression/decompression readers.

HDF5 is widely used for big data, including by some financial firms. Unlike KDB it's free to use, and there are plenty of libraries to go on top of it, such as the .NET wrapper
This SO question might help you get started.
HDF5 homepage

Programmatically saving a SQL Server database to xml files and restoring it again

I want to save a whole MS SQL 2008 Database into XML files... using asp.net.
Now I am bit lost here.. what would be the best method to achieve this? Datasets?
And I need to restore the database later again.. using these XML files. I am thinking about using datasets for reading the tables and writing to xml and using the SQLBulkCopy class to restore the database again. But I am not sure whether this would be the right approach..
Any clues and tips for me?

If you will need to restore it on the same server type (I mean SQL Server 2008 or higher) and don't care about ability to see actual data inside the XML do the following:
Programmatically backup the DB using "BACKUP DATABASE" T-SQL
Compress the backup
Convert the backup to Base64
Place the backup as the content of the XML file (like: <database name="..." compressionmethod="..." compressionlevel="...">the Base64 content here</database>
On the server where you need to restore it, download the XML, extract the Base64 content, use the attributes to know what compression was used. Decompress and restore using T-SQL "RESTORE" command.
Would that approach work?
For sure, if you need to see the content of the database, you would need to develop the XML scheme, go through each table etc. But, you won't have SPs/Views and other items backed up.

Because you are talking about a CMS, I'm going to assume you are deploying into hosted environments where you might not have command line access.
Now, before I give you the link I want to state that this is a BAD idea. XML is way too verbose to transfer large amounts of data. Further, although it is relatively easy to pull data out, putting it back in will be difficult and a very time consuming development project in itself.
Next alert: as Denis suggested, you are going to miss all of your stored procedures, functions, etc. Your best bet is to use the normal sql server backup / restore process. (Incidentally, I upvoted his answer).
Finally, the last time I dealt with XML and SQL Server we noticed interesting issues that cropped up when data exceeded a 64KB boundary. Basically, at 63.5KB, the queries ran very quickly (200ms). At 64KB, the query times jumped to over a minute and sometimes quite a bit longer. We didn't bother testing anything over 100KB as that was taking 5 minutes on a fast/dedicated server with zero load.
http://msdn.microsoft.com/en-us/library/ms188273.aspx
See this for putting it back in:
How to insert FOR AUTO XML result into table?
For kicks, here is a link talking about pulling the data out as json objects: http://weblogs.asp.net/thiagosantos/archive/2008/11/17/get-json-from-sql-server.aspx
you should also read (not for the faint of heart): http://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
Of course, the commentors all recommend building something using a CLR approach, but that's probably not available to you in a shared database hosting environment.
At the end of the day, if you are truly insistent on this madness, you might be better served by simply iterating through your table list and exporting all the data to standard CSV files. Then, iterating the CSV files to load the data back in ala C# - is there a way to stream a csv file into database?
Bear in mind that ALL of the above methods suffer from
long processing times due to the data overhead; which leads to
a high potential for failure due to the various time outs (page processing, command, connection, etc); and,
if your data model changes between the time it was exported and reimported then you're back to writing custom translation code and ultimately screwed anyway.
So, only do this if you really really have to and are at least somewhat of a masochist at heart. If the purpose is simply to transfer some data from one installation to another, you might consider using one of the tools like SQL Compare and SQL Data Compare from RedGate to handle the transfer.
I don't care how much (or little) you make, the $1500 investment in their developer bundle is much cheaper than the months of time you are going to spend doing this, fixing it, redoing it, fixing it again, etc. (for the record I do NOT work for them. Their products are just top notch.)

Red Gate's SQL Packager lets you package a database into an exe or to a VS project, so you might want to take a look at that. You can specify which tables you want to consider for data.
Is there any specific reason you want to do this using xml?

Efficient way to analyze large amounts of data?

I need to analyze tens of thousands of lines of data. The data is imported from a text file. Each line of data has eight variables. Currently, I use a class to define the data structure. As I read through the text file, I store each line object in a generic list, List.
I am wondering if I should switch to using a relational database (SQL) as I will need to analyze the data in each line of text, trying to relate it to definition terms which I also currently store in generic lists (List).
The goal is to translate a large amount of data using definitions. I want the defined data to be filterable, searchable, etc. Using a database makes more sense the more I think about it, but I would like to confirm with more experienced developers before I make the changes, yet again (I was using structs and arraylists at first).
The only drawback I can think of, is that the data does not need to be retained after it has been translated and viewed by the user. There is no need for permanent storage of data, therefore using a database might be a little overkill.

It is not absolutely necessary to go a database. It depends on the actual size of the data and the process you need to do. If you are loading the data into a List with a custom class, why not use Linq to do your querying and filtering? Something like:
var query = from foo in List<Foo>
where foo.Prop = criteriaVar
select foo;
The real question is whether the data is so large that it cannot be loaded up into memory confortably. If that is the case, then yes, a database would be much simpler.

This is not a large amount of data. I don't see any reason to involve a database in your analysis.
There IS a query language built into C# -- LINQ. The original poster currently uses a list of objects, so there is really nothing left to do. It seems to me that a database in this situation would add far more heat than light.

It sounds like what you want is a database. Sqlite supports in-memory databases (use ":memory:" as the filename). I suspect others may have an in-memory mode as well.

I was facing the same problem that you faced now while I was working on my previous company.The thing is I was looking a concrete and good solution for a lot of bar code generated files.The bar code generates a text file with thousands of records with in a single file.Manipulating and presenting the data was so difficult for me at first.Based on the records what I programmed was, I create a class that read the file and loads the data to the data table and able to save it in database. The database what I used was SQL server 2005.Then I able to manage the saved data easily and present it which way I like it.The main point is read the data from the file and save to it to the data base.If you do so you will have a lot of options to manipulate and present as the way you like it.

If you do not mind using access, here is what you can do
Attach a blank Access db as a resource
When needed, write the db out to file.
Run a CREATE TABLE statement that handles the columns of your data
Import the data into the new table
Use sql to run your calculations
OnClose, delete that access db.
You can use a program like Resourcer to load the db into a resx file
ResourceManager res = new ResourceManager( "MyProject.blank_db", this.GetType().Assembly );
byte[] b = (byte[])res.GetObject( "access.blank" );
Then use the following code to pull the resource out of the project. Take the byte array and save it to the temp location with the temp filename
"MyProject.blank_db" is the location and name of the resource file
"access.blank" is the tab given to the resource to save

If the only thing you need to do is search and replace, you may consider using sed and awk and you can do searches using grep. Of course on a Unix platform.

From your description, I think linux command line tools can handle your data very well. Using a database may unnecessarily complicate your work. If you are using windows, these tools are also available by different ways. I would recommend cygwin. The following tools may cover your task: sort, grep, cut, awk, sed, join, paste.
These unix/linux command line tools may look scary to a windows person but there are reasons for people who love them. The following are my reasons for loving them:
They allow your skill to accumulate - your knowledge to a partially tool can be helpful in different future tasks.
They allow your efforts to accumulate - the command line (or scripts) you used to finish the task can be repeated as many times as needed with different data, without human interaction.
They usually outperform the same tool you can write. If you don't believe, try to beat sort with your version for terabyte files.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.