Is File Reference for Deserialization acceptable in a database?

I have a situation where I need to store some data that just won't really fit into a database table. It is a little too abstract, and I don't have enough knowledge to break it down into tables and columns. The object in question is a System.Linq.Expressions.Expression<T>.
I have found a way to serialize it to XML using MetaLinq, and it works pretty well, although the XML it generates is excessively large; I somewhat expected that from something as complicated as an Expression. A modest expression comes out at around 19 KB.
So my thought was to use gzip compression on the file. This works well; it brings it down to about 2 KB.
So my actual question is this: is it bad or 'dangerous' practice to use a table column to reference a filename for deserializing an object? I would have a table for expressions with a filename column; when an expression was requested, the code would perform the gzip decompression, deserialize the file, and return the object.
This seems like the ideal solution, but it requires a lot of file I/O and a lot of compression and serialization work. I'm wondering if I could get the opinion of more experienced database admins out there. I am using Fluent NHibernate as my ORM.
MetaLinq on CodePlex

Not an experienced DBA, but I would store the serialized data in a BLOB field in the database. Database backups do no good if the files your data is depending on go away or vice versa. I think it would simplify things to just keep it all together. And the blob works fine since the data you are storing does not need to be queried.
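To make the BLOB suggestion concrete, here is a minimal sketch of turning the serialized XML into a gzip-compressed byte array and back, so the bytes can live in a binary column instead of a file. The class and method names (ExpressionBlob, Compress, Decompress) are made up for illustration, and the XML string is assumed to come from whatever serializer you already use.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: names are illustrative, not part of MetaLinq or NHibernate.
public static class ExpressionBlob
{
    // Compress serialized XML into a byte array suitable for a BLOB column.
    public static byte[] Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (var output = new MemoryStream())
        {
            // The GZipStream has to be closed before the buffer is read,
            // otherwise the gzip footer has not been written yet.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decompress the bytes read back from the BLOB column into the original XML.
    public static string Decompress(byte[] blob)
    {
        using (var input = new MemoryStream(blob))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The entity would then expose a byte[] property for the compressed data, and the deserializer is fed the result of Decompress, so no file I/O is involved at all.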

Depends on the size of the data.
SQL Server has an XML data type for table columns now. So you could serialize the object and then insert the whole document into the column, again depending on size.
But if you must use the file system, I would store a relative path and the file name in the column.
In your program's app.config, keep the root of the drive, like \\MyDrive or d:\
That way, if the information moves, you just change the app.config, as long as the folder/file structure stays the same.
Edit:
Along with NerdFury's suggestion, you could use a binary serializer if you do not need to "see" the data in the database. XML serialization at least makes it readable.

Related

Making XML in SQL Server faster - convert to tables?

I have a field in the database that is XML because it represents a class that is used in C#/VB.Net. The problem is that after the initial manipulation, most, but not all, of the manipulation is done in SQL Server. This means that the XML field is converted on the fly.
As we get more fields and more records, this operation is getting slow. I think the slowdown comes from converting all of those fields to other data types.
So to speed it up I was thinking of a couple of ways:
Have a set of tables that represent the different pieces of the XML data. I would make these tables read-only using a trigger on Insert/Update that would reject any changes. When my 'main' table with the XML in it updates the XML, it would turn off the triggers, update the tables with the new values, then turn the triggers back on.
The only real reason we use the XML is because it's really easy to convert it to the class in C#/VB.Net. But I'm getting to the point where I may end up writing a routine that will take all the bits and pieces and convert them to a class, and also a function to go the other way (class -> tables).
Can anybody give any ideas on a better way to do this? I'm not tied to the idea of using the XML structure. My concern is if we have separate tables to speed up SQL processing and somebody changes the value of a field in that table we have to make sure the XML is updated. Or don't allow the person to update it.
TIA - Jeff.

What is the purpose of the objects you are saving? If anything other than persistence of state, you are not doing yourself any favors and you are not properly separating concerns. If they are persistence of state, then at minimum, make columns out of the properties and fields (can include private as long as you leave an internal method to set the values when you reconstitute).

Disregarding the wisdom of what you're doing, you might look into creating an XML index. This should help you get started: http://msdn.microsoft.com/en-us/library/ms345121%28v=sql.90%29.aspx
The basic idea is that the right index can 'pre-shred' your XML and automatically build the sort of tables you are thinking of building 'manually'. A downside is that this can really explode your storage requirements if you are storing lots of XML.

Best way to save/load large amout of data in a .Net application?

What is the best way to save large amount of data for a .Net 4.0 application?
Right now I am using Lists and serializing to a file in the "User Data" folder, and it's working OK, but I want to know if there is a better/faster way of saving/loading a large amount of data.
The data that I am saving contains only lots of words, like documents.
The size of the data is almost 1 mb.

That really depends on the type of your application. I wouldn't use a SQL database of any sort just to load and save data that I do not need to query or transform. The time it takes to map your object graph to a relational model is just not worth it.
Also, I don't believe it will ever be faster than simple serialization, due to the overhead associated with databases (connection management and mapping).
My recent experience was with BinaryFormatter, which gave excellent results (files ~15 MB). If worse comes to worst, you can always write your own formatter.

Kinda depends on your data and how you have it stored in your app.
But all these NoSQL storage systems are a possibility or just plain binary data into a file.

When you say "large amout [sic] of data", what exactly do you mean by that? A megabyte? a terabyte?
And what exactly is the data?
If it's a set of account records, it might well belong in a database of some sort; if it's a set of images or word processing documents, perhaps not.

If you want fast access, one approach would be to serialize to a hashtable and cache it in between reads and writes.
The problem here is, of course, versioning, changing of namespaces (then you won't be able to deserialize easily), deadlocks, concurrency, etc.
It is better to save the file as XML/JSON, and when you read it into memory, load it into a hashtable for fast access.
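A rough sketch of that last suggestion, assuming a made-up SavedDocument class with Title and Text properties and using the built-in XmlSerializer (the names here are illustrative only): the file stays plain XML on disk, and the in-memory cache is a Dictionary keyed by title.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical data class; replace with whatever your application stores.
public class SavedDocument
{
    public string Title { get; set; }
    public string Text { get; set; }
}

public static class DocumentStore
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<SavedDocument>));

    // Write the whole list to a single XML file, replacing any previous content.
    public static void Save(string path, List<SavedDocument> documents)
    {
        using (var stream = File.Create(path))
        {
            Serializer.Serialize(stream, documents);
        }
    }

    // Read the file once and index it by title for fast lookups.
    // ToDictionary throws if two documents share a title.
    public static Dictionary<string, SavedDocument> Load(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            var documents = (List<SavedDocument>)Serializer.Deserialize(stream);
            return documents.ToDictionary(d => d.Title);
        }
    }
}
```

Because the on-disk format is plain XML rather than a serialized hashtable, renaming a namespace in the application does not break reading old files.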

Efficient way to analyze large amounts of data?

I need to analyze tens of thousands of lines of data. The data is imported from a text file. Each line of data has eight variables. Currently, I use a class to define the data structure. As I read through the text file, I store each line object in a generic list, List<T>.
I am wondering if I should switch to using a relational database (SQL), as I will need to analyze the data in each line of text, trying to relate it to definition terms which I also currently store in generic lists (List<T>).
The goal is to translate a large amount of data using definitions. I want the defined data to be filterable, searchable, etc. Using a database makes more sense the more I think about it, but I would like to confirm with more experienced developers before I make the changes, yet again (I was using structs and arraylists at first).
The only drawback I can think of, is that the data does not need to be retained after it has been translated and viewed by the user. There is no need for permanent storage of data, therefore using a database might be a little overkill.

It is not absolutely necessary to go to a database. It depends on the actual size of the data and the processing you need to do. If you are loading the data into a List with a custom class, why not use LINQ to do your querying and filtering? Something like:
// fooList is the List<Foo> you already populate from the text file
var query = from foo in fooList
            where foo.Prop == criteriaVar
            select foo;
The real question is whether the data is so large that it cannot be loaded into memory comfortably. If that is the case, then yes, a database would be much simpler.
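To relate each line to its definition terms, as the question describes, a LINQ join over the two lists may be enough. A minimal sketch, assuming hypothetical DataLine and Definition classes that share a Code field (the real class would carry all eight variables):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record types, reduced to two fields for brevity.
public class DataLine
{
    public string Code { get; set; }
    public int Value { get; set; }
}

public class Definition
{
    public string Code { get; set; }
    public string Meaning { get; set; }
}

public static class Translator
{
    // Pair each line with its definition and keep only lines at or above minValue.
    // This is an inner join: lines whose Code has no definition are dropped.
    public static List<string> Translate(
        List<DataLine> lines, List<Definition> definitions, int minValue)
    {
        var query = from line in lines
                    join def in definitions on line.Code equals def.Code
                    where line.Value >= minValue
                    orderby line.Value descending
                    select def.Meaning + ": " + line.Value;
        return query.ToList();
    }
}
```

Filtering, sorting, and searching are then just extra where and orderby clauses, and nothing has to be persisted once the user is done with the data.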

This is not a large amount of data. I don't see any reason to involve a database in your analysis.
There IS a query language built into C# -- LINQ. The original poster currently uses a list of objects, so there is really nothing left to do. It seems to me that a database in this situation would add far more heat than light.

It sounds like what you want is a database. SQLite supports in-memory databases (use ":memory:" as the filename). I suspect others may have an in-memory mode as well.

I faced the same problem at my previous company. I was looking for a solid solution for a lot of bar-code-generated files. The bar code system generates a text file with thousands of records in a single file, and manipulating and presenting the data was difficult for me at first. What I ended up programming was a class that reads the file, loads the data into a data table, and saves it to a database; the database I used was SQL Server 2005. After that I was able to manage the saved data easily and present it any way I liked. The main point is to read the data from the file and save it to the database. If you do so, you will have a lot of options to manipulate and present it the way you like.

If you do not mind using Access, here is what you can do:
Attach a blank Access db as a resource
When needed, write the db out to file.
Run a CREATE TABLE statement that handles the columns of your data
Import the data into the new table
Use SQL to run your calculations
OnClose, delete that Access db.
You can use a program like Resourcer to load the db into a resx file.
Then use the following code to pull the resource out of the project:
// requires: using System.Resources;
ResourceManager res = new ResourceManager( "MyProject.blank_db", this.GetType().Assembly );
byte[] b = (byte[])res.GetObject( "access.blank" );
"MyProject.blank_db" is the location and name of the resource file
"access.blank" is the tab given to the resource to save
Take the byte array and save it to the temp location with the temp filename.

If the only thing you need to do is search and replace, you may consider using sed and awk, and you can do searches using grep. That is on a Unix platform, of course.

From your description, I think Linux command-line tools can handle your data very well. Using a database may unnecessarily complicate your work. If you are using Windows, these tools are also available in different ways; I would recommend Cygwin. The following tools may cover your task: sort, grep, cut, awk, sed, join, paste.
These Unix/Linux command-line tools may look scary to a Windows person, but there are reasons why people love them. The following are my reasons for loving them:
They allow your skill to accumulate - your knowledge of a particular tool can be helpful in different future tasks.
They allow your efforts to accumulate - the command line (or scripts) you used to finish the task can be repeated as many times as needed with different data, without human interaction.
They usually outperform the equivalent tool you would write yourself. If you don't believe it, try to beat sort with your own version on terabyte files.

XML to LINQ Question/s

Just before I begin, here's a small overview of what I'm trying to achieve, and then we'll get down to the gory details. At present I'm developing an application which will monitor a user's registry for changes to specific keys which relate to user preferences. We're currently using mandatory profiles (not my choice); anyway, the whole idea is to record the changes to a location where they can be written back to the user's registry the next time they log on.
At the moment I have the system monitoring registry changes and firing events that return the key, value name and value that have changed. I was entering these into a list, building a single string containing all the data, then writing that list to a text file every so often. This has all been fine, but I need to change the way the data is held, as breaking the strings down into key, value name and value again for the write-back to the registry requires too much overhead, and there are also problems breaking the strings up in a uniquely identifiable fashion.
So it was suggested that I look at XML, which I haven't used before. I've begun investigating it and it all looks simple enough; I've also used LINQ before to connect to embedded databases. What I'm currently struggling to get my head around is how LINQ is able to retrieve and manipulate the data in memory from XML, as I don't want to be constantly accessing the XML file, because the application needs to stay as quick as possible. At present all changes in the registry are cached in a List(String) and then written to a text file every minute or so.
At the moment the system returns the key, value name and value as different strings and merges these into a single List(String) entry, whereas what I'm going to need is a table or equivalent representing a key, which contains multiple value names, with each value name containing a single value and finally a type (this will be a number representing what kind of registry value this is: REG_SZ, REG_BINARY, etc.). This applies both in the XML file and in the program itself.
Also, what I don't quite get is that, unlike a database, the tables and their schemas won't exist until the program first runs, as it will create a new XML file rather than use one that already exists. This is because the information is written back to the user's personal drive, so the file has to be created when the program first runs on the user's machine.
I've tried a few links and tutorials, but nothing has clicked just yet, so if you have an example or could explain it to me a little better, it would be appreciated.
One final thing I want to add is that my current idea for storing the data in the program is to create a list of values embedded in a list of value names, and a list of value names embedded in a list of keys. Does that sound OK?
Now, I know this is long and kind of all over the place, so if someone could help it would be appreciated, and if you require further information or clarification please let me know and I'll try my best.
Thanks

From what I understand, you are just thinking in the wrong direction. Your application does not want to manipulate XML in memory. You just want to work with some data structure in memory and would like to have an easy way to store it to disk and to read it back? If I understand that right:
Don't worry about LINQ to XML. Just have a look at the built-in XML serialization infrastructure. Then build an internal data structure which fits your application's needs and use an XmlSerializer to write it to disk and to read it back. No need to touch any XML by hand!
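A minimal sketch of what that could look like for the key / value name / value / type structure described in the question. The class and member names (RegistryKeyEntry, RegistryValueEntry, RegistryChangeStore) are invented for illustration, and the value data is assumed to be held as a string:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// One registry value: its name, its data, and the numeric registry type.
public class RegistryValueEntry
{
    public string Name { get; set; }
    public string Data { get; set; }
    public int Kind { get; set; }
}

// One registry key with all the values recorded under it.
public class RegistryKeyEntry
{
    public string KeyPath { get; set; }
    public List<RegistryValueEntry> Values { get; set; } = new List<RegistryValueEntry>();
}

public static class RegistryChangeStore
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<RegistryKeyEntry>));

    // Write the whole in-memory structure to the XML file.
    public static void Save(string file, List<RegistryKeyEntry> keys)
    {
        using (var stream = File.Create(file))
        {
            Serializer.Serialize(stream, keys);
        }
    }

    // Read the structure back; on first run the file does not exist yet,
    // so start with an empty list instead of failing.
    public static List<RegistryKeyEntry> Load(string file)
    {
        if (!File.Exists(file))
        {
            return new List<RegistryKeyEntry>();
        }
        using (var stream = File.OpenRead(file))
        {
            return (List<RegistryKeyEntry>)Serializer.Deserialize(stream);
        }
    }
}
```

The application keeps working on the List<RegistryKeyEntry> in memory and only calls Save on its usual timer, so the XML file is never touched in between; there is also no schema to create up front, since the classes themselves define the file layout.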

From what you describe it does seem like a good idea to use XML here.
As for accessing the XML data in memory, I found the MSDN documentation quite helpful:
http://msdn.microsoft.com/en-us/library/bb387098.aspx
The basic idea is, LINQ-to-XML is just LINQ-to-Objects, working with objects that represent the XML elements.
I'm afraid I don't quite get your second question.
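To illustrate the LINQ-to-XML point, here is a small sketch that queries an XDocument held in memory. The element and attribute names (Key, Path, Value, Name) are assumptions about how the file might be laid out, not anything prescribed by the framework:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RegistryXmlQuery
{
    // Return the names of all values recorded under the given key path.
    // The document is an ordinary object tree, so no file access happens here.
    public static List<string> ValueNames(XDocument doc, string keyPath)
    {
        return (from key in doc.Descendants("Key")
                where (string)key.Attribute("Path") == keyPath
                from value in key.Elements("Value")
                select (string)value.Attribute("Name")).ToList();
    }
}
```

You would call XDocument.Load once at startup (or build a new XDocument if the file does not exist yet), run queries like this against the in-memory tree, and call doc.Save only when you want to flush changes to disk.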

Storing settings: XML vs. SQLite?

I am currently writing an IRC client and I've been trying to figure out a good way to store the server settings. Basically a big list of networks and their servers as most IRC clients have.
I had decided on using SQLite but then I wanted to make the list freely available online in XML format (and perhaps definitive), for other IRC apps to use. So now I may just store the settings locally in the same format.
I have very little experience with either ADO.NET or XML so I'm not sure how they would compare in a situation like this.
Is one easier to work with programmatically? Is one faster? Does it matter?

It's a vaguer question than you realize. "Settings" can encompass an awful lot of things.
There's a good .NET infrastructure for handling application settings in configuration files. These, generally, are exposed to your program as properties of a global Settings object; the classes in the System.Configuration namespace take care of reading and persisting them, and there are tools built into Visual Studio to auto-generate the code for dealing with them. One of the data types that this infrastructure supports is StringCollection, so you could use that to store a list of servers.
But for a large list of servers, this wouldn't be my first choice, for a couple of reasons. I'd expect that the elements in your list are actually tuples (e.g. host name, port, description), not simple strings, in which case you'll end up having to format and parse the data to get it into a StringCollection, and that is generally a sign that you should be doing something else. Also, application settings are read-only (under Vista, at least), and while you can give a setting user scope to make it persistable, that leads you down a path that you probably want to understand before committing to.
So, another thing I'd consider: Is your list of servers simply a list, or do you have an internal object model representing it? In the latter case, I might consider using XML serialization to store and retrieve the objects. (The only thing I'd keep in the application configuration file would be the path to the serialized object file.) I'd do this because serializing and deserializing simple objects into XML is really easy; you don't have to be concerned with designing and testing a proper serialization format because the tools do it for you.
The primary reason I look at using a database is if my program performs a bunch of operations whose results need to be atomic and durable, or if for some reason I don't want all of my data in memory at once. If every time X happens, I want a permanent record of it, that's leading me in the direction of using a database. You don't want to use XML serialization for something like that, generally, because you can't realistically serialize just one object if you're saving all of your objects to a single physical file. (Though it's certainly not crazy to simply serialize your whole object model to save one change. In fact, that's exactly what my company's product does, and it points to another circumstance in which I wouldn't use a database: if the data's schema is changing frequently.)

I would personally use XML for settings - .NET is already built to do this and as such has many built-in facilities for storing your settings in XML configuration files.
If you want to use a custom schema (be it XML or DB) for storing settings then I would say that either XML or SQLite will work just as well since you ought to be using a decent API around the data store.

Every tool has its own right
There is plenty of hype around XML, I know. But you should see that XML is basically an exchange format, not a storage format (unless you use a native XML database, which gives you more options but might also add some headaches).
When your configuration is rather small (say fewer than 10,000 records), you might use XML and be fine. You will load the whole thing into memory and access the entries there. Done.
But when your configuration is so big that you don't want to load it completely, then you rethink your decision and stay with SQLite, which gives you the option to dynamically load those parts of the configuration you need.
You could also provide a little tool to create an XML file from the DB content; creating XML from a DB is a rather simple task.
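As a sketch of how simple that export can be, the following builds the XML from rows that have already been read out of the database. The ServerRecord class and the networks/network/server element names are invented for this example; the actual database query is left out.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical row shape, one row per server as it might come out of the DB.
public class ServerRecord
{
    public string Network { get; set; }
    public string Host { get; set; }
    public int Port { get; set; }
}

public static class ServerListExport
{
    // Group the flat rows by network and nest the servers under each network.
    public static XElement ToXml(IEnumerable<ServerRecord> rows)
    {
        return new XElement("networks",
            from row in rows
            group row by row.Network into network
            select new XElement("network",
                new XAttribute("name", network.Key),
                from server in network
                select new XElement("server",
                    new XAttribute("host", server.Host),
                    new XAttribute("port", server.Port))));
    }
}
```

Calling Save on the returned XElement writes the file that can then be published for other IRC applications.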

Looks like you have two separate applications here: a web server and a desktop client (because that is traditionally where these things run), each with its own storage needs.
On the server side: go with a relational data store, not XML. Basically, at some point you need to keep one user's data separate from other users' data on the server. XML is not a good store for that.
On the client: it doesn't really matter. XML will probably be easier for you to manipulate. And don't think that because you are using one technology in one setting, you have to use it in the other.
