Is there a standard data structure for a "database"-like object? - c#

I'm designing an upgrade to an older C++ assembly and want to (re-)do this right.
The situation is, I've got an object that is basically a "database" (it's various bits of data taken over lots of time-steps). The issue is I don't know what each "column" will hold until run-time.
There are about 30 column name possibilities. Speed is a major concern.
The old structure uses an array to store the column names (they're enums), and an array of arrays to hold the data. You'd find the column name in the header array, then use that column index to find the right bit of data in the current data "row". It's a perfectly viable idea, but it seems a bit...out of date.
If I included each possible column as a property in a "row" object, it seems this would massively inflate the memory usage and creation time (which isn't acceptable).
Is there a standard structure for this kind of thing?

Yes, take a look at the DataSet class in the System.Data namespace.
The DataSet is the top-level container for DataTables, Relations and other "database-like" objects. In the DataTables you can define your columns and populate your rows as you wish. You can also easily persist the whole thing via serialization.
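For example, a minimal sketch (the table, column names and values here are made up to match the question's scenario, not taken from it):

using System;
using System.Data;

class Demo
{
    static void Main()
    {
        // Columns are only known at run-time, so they are defined dynamically.
        var table = new DataTable("Samples");
        table.Columns.Add("TimeStep", typeof(int));
        table.Columns.Add("Temperature", typeof(double));

        // Rows are added as object arrays in column order.
        table.Rows.Add(0, 21.5);
        table.Rows.Add(1, 21.7);

        var dataSet = new DataSet("History");
        dataSet.Tables.Add(table);

        // Values can be looked up by row index and column name.
        Console.WriteLine(table.Rows[1]["Temperature"]);
    }
}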

Related

Is it possible to have an object in the header of a DataTable?

The question:
Do you guys know if there is any way that I can put an object in the header of a DataTable column, instead of an integer or a string?
Further explanation:
I'm writing a library that, at some point, will read data from different meteorological stations. The data I'll read will be, for example, temperature, wind speed, atmospheric pressure, etc. These values can be read in different units (km/h, mph, Celsius, Fahrenheit) and the information about these units will be in a separate source, not together with the data itself. I'll be reading an XML file that contains all the information about this data file, and what I wanted to do is create an object with different attributes and use this object as the header of each column of the DataTable. A bit of a complicated explanation, but I think I was clear enough.
Do you think it is possible using native .NET types, or would I have to create my own table class to do it exactly this way?
Thank you all!
There is a DataColumn.ExtendedProperties collection, which works like a dictionary and can hold any objects.
So every DataTable column can have an object associated with it, which can describe the type, units and any other info.
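A quick sketch of how that could look; the ColumnMeta class and its members are hypothetical, just to illustrate attaching unit information to a column:

using System;
using System.Data;

// Hypothetical metadata object describing a column.
class ColumnMeta
{
    public string Quantity { get; set; }
    public string Unit { get; set; }
}

class Demo
{
    static void Main()
    {
        var table = new DataTable("Readings");
        var column = table.Columns.Add("WindSpeed", typeof(double));

        // Attach the metadata object to the column via ExtendedProperties.
        column.ExtendedProperties["Meta"] = new ColumnMeta { Quantity = "Wind speed", Unit = "km/h" };

        // Read it back later when interpreting the data.
        var meta = (ColumnMeta)table.Columns["WindSpeed"].ExtendedProperties["Meta"];
        Console.WriteLine(meta.Quantity + " in " + meta.Unit);
    }
}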

Strategies for modeling large (50~) number of properties

Scenario
I'm parsing emails and inserting them into a database using an ORM (NHibernate, to be exact). While my current approach does technically work, I'm not very fond of it, but I can't think of a better solution. The email contains around 50 fields, is sent from a third party, and looks like this (obviously a very short dummy sample).
Field #1: Value 1 Field #2: Value 2
Field #3: Value 3 Field #4: Value 4 Field #5: Value 5
Problem
My problem is that with this many fields the database table is an absolute monster. AFAIK I can't create proper models employing any kind of relationships either, because each email sent is all static data and doesn't rely on any other sources.
The only idea I have is to find commonalities between the fields and split them into more manageable chunks, say 10 or so fields per entity, so 5 entities total. However, I'm not terribly in love with that idea either, seeing as all I'd be doing is creating one-to-one relationships.
What is a good way of managing a large number of properties that are out of your control?
Any thoughts?
Create 2 tables: one for the main object, and the other for the fields. That way you can programmatically access each field as necessary, and the object model doesn't look too nasty.
But this is just off the top of my head; you have a weird problem.
If the data is coming back in a file that you can parse easily, then you might be able to get away with creating a command-line application that produces scripts and C# that you can then execute and copy/paste into your program. I've done that when creating properties out of tables from HTML pages (like one I had to do recently).
If the 50 properties are actually unique and discrete pieces of data regarding this one entity, I don't see a problem with having those 50 properties (even though that sounds like a lot) on one object. For example, the Type class has a large number of boolean properties relating to its data (IsPublic, etc).
Alternatives:
Well, one option that comes to mind immediately is using a dynamic object and overriding TryGetMember to look up the 'property' name as a key in a dictionary of key/value pairs (where your real set of 50 key/value pairs lives). Of course, figuring out how to map that from your ORM into your entity is the other problem, and you'd lose IntelliSense support.
However, just throwing the idea out there.
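A rough sketch of that idea, assuming the fields have already been parsed into a dictionary (the EmailFields class is illustrative, not an existing type):

using System;
using System.Collections.Generic;
using System.Dynamic;

// Exposes dictionary entries as if they were properties.
class EmailFields : DynamicObject
{
    private readonly Dictionary<string, object> _values;

    public EmailFields(Dictionary<string, object> values)
    {
        _values = values;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // The 'property' name is looked up as a dictionary key at run-time.
        return _values.TryGetValue(binder.Name, out result);
    }
}

class Demo
{
    static void Main()
    {
        var values = new Dictionary<string, object>();
        values["Subject"] = "Weekly report";
        values["Field1"] = "Value 1";

        dynamic email = new EmailFields(values);

        // Resolved via TryGetMember; no IntelliSense, as noted above.
        Console.WriteLine(email.Subject);
    }
}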
Use a dictionary instead of separate fields. In the database, you just have a table for the field name and its value (and what object it belongs to).
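In code that might look something like the sketch below; the class and member names are illustrative, and the Fields dictionary would map to a separate (EmailId, FieldName, FieldValue) table in the database:

using System.Collections.Generic;

// One object per email; the parsed fields live in a name/value collection
// rather than in 50 separate properties.
class Email
{
    public int Id { get; set; }

    public IDictionary<string, string> Fields { get; private set; }

    public Email()
    {
        Fields = new Dictionary<string, string>();
    }
}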

Best Practice to get data for DataGrid from Web Service

I have a WPF DataGrid which gets its data from a Web Service. The end user has the ability to customize the visible columns in the DataGrid.
1st approach:
I get the data as XML, convert the XML to a DataTable, and set it as the ItemsSource of the DataGrid.
2nd approach:
I can also get the data from the service as an array of typed objects (for example Customer[]).
Problem:
I use the 1st approach, with extra steps, so that I don't get redundant data from the service.
In the 2nd approach, if the user sees only two columns in the DataGrid (one column per property in the class), he still gets the whole class with all its properties filled (redundant data). In the 1st approach he gets only the XML data that will actually be visible in the DataGrid in the UI.
But I use the MVVM approach in my project and I don't want to use the XML and DataTable approach. I think I have to use the 2nd approach, but in that case I get redundant data.
In the 2nd approach, if the user sees only two columns in the DataGrid (one column per property in the class), he still gets the whole class with all its properties filled (redundant data)
If the above is the only thing that is stopping you from taking your second approach, then C# 4.0 has the Named and Optional Arguments feature, which works like this:
Console.WriteLine(Calculate(weight: 123, height: 64));
even if the actual Calculate() has 99 parameters, in any order.
Please note, I assume that by redundant you mean unwanted data.
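For illustration, a minimal compilable sketch of that feature; the Calculate method and its body are placeholders, not part of the original answer:

using System;

class Demo
{
    // Optional parameters: callers only pass what they need.
    static int Calculate(int weight = 0, int height = 0, int age = 0)
    {
        // Placeholder logic, only here so the sample compiles.
        return weight + height + age;
    }

    static void Main()
    {
        // Named arguments can be supplied in any order and any subset.
        Console.WriteLine(Calculate(weight: 123, height: 64));
    }
}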
I would take the second approach even though this may transport a little bit more data. If you really want control over which fields are fetched, this will probably make your application more complex than necessary.
Have you verified you have performance problems with the second approach?
It is just another trade-off we have always faced when developing software.
In your specific case,
The first approach has a performance advantage, by transferring much less (not sure if really much) data over the network, and flexibility, by not using a strongly typed data approach.
The second approach looks better for manageability and ease of development in the long term.
To choose the right approach you should consider and weigh non-functional requirements such as performance, extensibility, manageability, etc.

Storing Data from Forms without creating 100's of tables: ASP.NET and SQL Server

Let me first describe the situation. We host many Alumni events over the course of each year and provide online registration forms for each event. There is a large chunk of data that is common for each event:
An Event with dates, times, managers, internal billing info, etc.
A Registration record with info about the payment and total amount charged per form submission
Bio/Demographic and alumni data about the 1 or more attendees (name, address, degree, etc.)
We store all of the above data within columns in tables as you would expect.
The trouble comes with the 'extra' fields we are asked to put on the forms. Maybe it is a dinner and there is a Veggie or Carnivore option, perhaps there is lodging and there are bed or smoking options, or perhaps there is an optional transportation option. There are tons of weird little "can you add this to the form?" types of requests we receive.
Currently, we JSONify any non-standard data and store it all in one column (per attendee) called 'extras'. We can read this data out in code but it is not well suited to querying. Our internal staff would like to generate a quick report on Veggie dinners needed for instance.
Other than creating a separate table for each form that holds the specific 'extra' data items, are there any other approaches that could make my life (and reporting) easier? Anyone working in a similar environment?
This is actually one of the toughest problems to solve efficiently. The SQL Server Customer Advisory Team has dedicated a white paper to the topic which I highly recommend you read: Best Practices for Semantic Data Modeling for Performance and Scalability.
You basically have 3 options:
semantic database (entity-attribute-value)
XML column
sparse columns
Each solution comes with ups and downs. Off the top of my head I'd say XML is probably the one that gives you the best balance of power and flexibility, but the optimal solution really depends on lots of factors like data set sizes, the frequency at which new attributes are created, the actual process (human operators) that creates, populates and uses these attributes, and not least your team's skill set (some might fare better with an EAV solution, some might fare better with an XML solution). If the attributes are created/managed under a central authority and adding new attributes is a reasonably rare event, then the sparse columns may be the better answer.
Well you could also have the following db structure:
Have a table to store custom attributes
AttributeID
AttributeName
Have a mapping table between events and attributes with:
AttributeID
EventID
AttributeValue
This means you will be able to store custom information per event, and you will be able to reuse your attributes. You can include some metadata, such as
AttributeType
AllowBlankValue
on the attribute to make it easier to handle afterwards (see the sketch below).
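Expressed as plain C# entities (the class and property names are only illustrative; map them however your data layer expects), the two tables could look like:

// One row per custom attribute definition.
class CustomAttribute
{
    public int AttributeID { get; set; }
    public string AttributeName { get; set; }

    // Optional metadata for handling the value later.
    public string AttributeType { get; set; }
    public bool AllowBlankValue { get; set; }
}

// Mapping table: one row per (event, attribute) pair, carrying the value.
class EventAttribute
{
    public int EventID { get; set; }
    public int AttributeID { get; set; }
    public string AttributeValue { get; set; }
}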
Have you considered using XML instead of JSON? Difference: XML is supported (special data type) and has query integration ;)
Quick and dirty, but actually nice for querying: simply add new columns. It's not like the empty entries in the previous table should cost a lot.
A more database-like solution: you'll have something like an event ID in your table. You can link this to an n:m table connecting events to additional fields, and then store the additional field data in a table with additional_field_id, record_id (from the original table) and the actual value. Probably creates ugly queries, but seems politically correct in terms of database design.
I understand "NoSQL" (not only SQL ;)) databases like CouchDB let you store arbitrary fields per record, but since you're already on SQL Server, I guess that's not an option.
This is the solution that we first proposed in ASP.NET Forums (that later became Community Server), and that the ASP.NET team built a similar version of in the ASP.NET 2.0 Membership when they released it:
Property Bags on your domain objects
For example:
Event.Profile() or in your case, Event.Extras().
Basically, a property bag is a serialized collection of data stored in a name/value pair in a column (or columns). The ASP.NET 2.0 Membership went the route of storing names in a semi-colon delimited list, and values in the same:
Table: aspnet_Profile
Column: PropertyNames (separated by semi-colons, and has start index and end index)
Column: PropertyValues (separated by semi-colons, and only stores the string value)
The downside to that approach is that it is all strings and has to be parsed manually (even though the membership system does it for you automatically).
More recently, my current method is this: I've built FormCollection and NameValueCollection C# extension methods that automatically serialize the collections to an XML result, and I store that XML in the table in its own column associated with that entity. I also have a deserializer C# extension on XElement that deserializes that data back to the collection at runtime.
This gives you the power of actually querying those properties in XML via SQL (though that can be slow, so always flatten out your read-only data).
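A rough sketch of what such extensions might look like; the method names and the XML shape here are assumptions, not the original code:

using System.Collections.Specialized;
using System.Xml.Linq;

static class PropertyBagExtensions
{
    // Serialize a NameValueCollection to an XML fragment for storage in a column.
    public static XElement ToXml(this NameValueCollection collection, string rootName)
    {
        var root = new XElement(rootName);
        foreach (string key in collection)
        {
            root.Add(new XElement("property",
                new XAttribute("name", key),
                collection[key]));
        }
        return root;
    }

    // Deserialize the stored XML back into a NameValueCollection at runtime.
    public static NameValueCollection ToCollection(this XElement root)
    {
        var collection = new NameValueCollection();
        foreach (var property in root.Elements("property"))
        {
            collection.Add((string)property.Attribute("name"), property.Value);
        }
        return collection;
    }
}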
The final note is runtime querying: the general rule we follow is, if you are going to query a property of an entity in normal application logic, then you move that property to an actual column on the table and create the appropriate indexes. If that data will never be queried directly (through LINQ to SQL or EF, for example), then leave it in the XML property bag.
Property Bags gives you the power of extending your domain models however you like, without having to modify the db schema.

How can I create a simple class that is similar to a datatable, but without the overhead?

I want to create a simple class that is similar to a datatable, but without the overhead.
So I'd load the object with a SqlDataReader, and then return this custom DataTable-like object that gives me access to the rows and columns like:
myObject[rowID]["columnname"]
How would you go about creating such an object?
I don't want any built in methods/behavior for this object except for accessing the rows and columns of the data.
Update:
I don't want a DataTable; I want something much leaner (plus I want to learn how to create such an object).
This type of structure can be easily created with a type signature of:
List<Dictionary<string, object>>
This will allow access as you specify and should be pretty easy to populate.
You can always create a class that inherits from List<Dictionary<string, object>> and implements a constructor that takes a SqlDataReader. This constructor should create a new dictionary for each row, and insert an entry into the dictionary for each column, using the column name as the key.
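Something along these lines: a minimal sketch, with error handling and DBNull handling omitted:

using System.Collections.Generic;
using System.Data.SqlClient;

// Lightweight "table": one dictionary per row, keyed by column name.
class SimpleTable : List<Dictionary<string, object>>
{
    public SimpleTable(SqlDataReader reader)
    {
        while (reader.Read())
        {
            // A new dictionary per row, one entry per column.
            var row = new Dictionary<string, object>(reader.FieldCount);
            for (int i = 0; i < reader.FieldCount; i++)
            {
                row[reader.GetName(i)] = reader.GetValue(i);
            }
            Add(row);
        }
    }
}

// Usage, matching the syntax in the question: myObject[rowID]["columnname"]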
I think you're missing something about how .Net works. The extra overhead involved in a DataTable is not significant. Can you point to a specific performance problem in existing code that you believe is caused by a datatable? Perhaps we can help correct that in a more elegant way.
Perhaps the specific thing you're asking about is how to use the convenient ["whatever"] indexing syntax in your own table object.
If so, I suggest you refer to this MSDN page on indexers.
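For example, a row type with a string indexer might look like this (a sketch; the column lookup is simply delegated to a dictionary):

using System.Collections.Generic;

class Row
{
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

    // The indexer is what enables the row["columnname"] syntax.
    public object this[string columnName]
    {
        get { return _values[columnName]; }
        set { _values[columnName] = value; }
    }
}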
Dictionary<int,object[]> would be better than List<Dictionary<string, object>>. You don't really need a dictionary for each row, since column names are the same for all rows. And if you want to have it lightweight, you should use column indexes instead of names.
So if you have a column "Name" that is the 3rd column, to get its value from row ID 10, the code would be:
object val = table[10][2];
Another option is SortedList<int,object[]>... depending on the way you access the data (forward only or random access).
You could also use MultiDictionary<int,object> from PowerCollections.
From the memory usage perspective, I think the best option would be to use a single-dimension array with some slack capacity. So after every, say, 100 rows, you would create a new array, copy the old contents into it, and leave 100 empty rows at the end. But you would also have to keep some sort of index for when you delete a row, so that it is marked as deleted without resizing the array.
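A bare-bones sketch of that growth strategy, with deletion reduced to a simple set of tombstoned row IDs (all names here are made up):

using System;
using System.Collections.Generic;

class RowStore
{
    private object[][] _rows = new object[100][];          // grown in chunks of 100
    private int _count;
    private readonly HashSet<int> _deleted = new HashSet<int>();

    public void Add(object[] row)
    {
        if (_count == _rows.Length)
        {
            // Copy into a bigger array, leaving 100 empty slots at the end.
            var bigger = new object[_rows.Length + 100][];
            Array.Copy(_rows, bigger, _rows.Length);
            _rows = bigger;
        }
        _rows[_count++] = row;
    }

    public void Delete(int rowId)
    {
        // Mark as deleted without resizing or shifting the array.
        _deleted.Add(rowId);
    }

    public object[] this[int rowId]
    {
        get { return _deleted.Contains(rowId) ? null : _rows[rowId]; }
    }
}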
Isn't this a DataSet/DataTable? Maybe I didn't get the question.
Also, what is the programming language?
