Is it bad practice to have a string like
"name=Gina;postion= HouseMatriarch;id=1234" to hold state data in an application.I know that I could just as well have a struct , class or hashtable to hold this info.
Is it acceptable practice to hold delimited key/value pairs in a database field– just for use in where the data type is not known at design time.
Thanks...I am just trying to insure that I maintain good design practices
Yes, holding your data in a string like "name=Gina;postion= HouseMatriarch;id=1234" is very bad practice. This data structure should be stored in structs or objects, because it is hard to access, validate and process the data in a string. It will take you much more time to write the code to parse your string to get at your data than just using the language structures of C#.
I would also advise against storing key/value pairs in database fields if the other option is just adding columns for those fields. If you don't know the type of your data at design time, you are probably not doing the design right. How will you be able to build an application when you don't know what data types your fields will have to hold? Or perhaps you should elaborate on the context of the application to make the intent clearer. It is not all black and white :-)
Well, if your application only passes this data between other systems, I don't see any problems in treating it as a string. However, if you need to use the data, I would definitely introduce the necessary types to handle that.
I think you will find your application easier to maintain if you make a struct or class to hold the data and then add a custom property to return (and set) the string you been using. This method will take the fields and format it in the string that you are already using and do the reverse (take the string and fill the fields) This way you maintain maximum compatibility with your old algorithms.
Well one immediate problem with that approach is embedded escape chars. Given your example what would happen if the user entered their name as follows:
Pet;er
or
Pe=;ter
or
pe;Name=Yeoi;
I am not sure what state data it is you are trying to hold, and without any context it's hard to make valid suggestions. Perhaps a first step would be to replace this with a key value pair, at least that negates the problem mentioned above and means you don't have to parse strings regularly.
I try to not keep data in any string based formats. But I encountered several situations, in which it was not possible to know in advance how the structure of the data will be (e.g. it was possible for the customer/end-user to dynamically add fields).
In contrast to your approach, we decided to store the data in XML, e.g. in your case this would be something similar like this:
<user id="1234">
<name>Gina</name>
<postion>HouseMatriarch</position>
</user>
This gives you the following advantages:
The classes to work with the data (read/write) are already available in the framework (e.g. XmlDocument or XML serialization)
you can easily exchange the data with other systems (if/when required)
You can store the data in a file
you can store the data in a database column (xml data type). You can even query that column when using SQL Server (although I'd try to avoid storing data in XML, that has to be queried)
using XML allows to add additional fields to your data at any time
Update: I'm not sure why my answer was downvoted that much - maybe it is because of the bad example. Therefore I'd like to make it clear: I would not use XML for properties such as an ID/primary key of a user, or for standard properties like "name", "email", etc. But for "extended/dynamic" properties (as described above) I still think this is an easy and elegant solution.
If you want to store structured data in a string I think you should use a standard notation such as JSON.
It's bad practice because of the amount of effort you have to go to, to construct the strings and parse them later. There are other more robust ways of serialising data for passing between systems.
For core business data, suitably designed classes will be far simpler to maintain, and with all the properties strongly typed, you'll know early on when you mis-type a property name.
As for key-value pairs, I'd say they're sometimes Ok, sometimes not. If there are a lot of possible values, but not a lot of actually owned values, then it can be perfectly all right to use KVPs. Sebastian Dietz's alternative of having a separate column for each field would result in a lot of empty fields in that case. It would also mean extra work altering the table every time you needed a new one.
None of the answers has mentioned normalization yet, so I thought I would. When database fields are involved, one of the key principles of normalization is that each field in a table only represents one thing. Delimited fields violate that principle.
One of the guys at Red Gate Software posted this article along those lines that you may find useful.
Well it just means that it is less searchable or indexable as a hashtable would be. It would also require a bit of processing to get into a state where it could be easily used by other bits of code. For example a bit of code that queries the id in that data would be something horrible such as:
if(dataStringThing.Substring(26, 4) == SomeIdInStringFormat)
So yes in most cases it is bad practice. However in other cases where this might be a default format that you need to retain the data in or performance means that you only should parse it as and when required. So it may not be a bad thing.
I would suggest myself if you have reasons to keep it in that format that it might be best to transform it into a class that separates the fields but also create a ToString() implementation on that class that restores it to the original format if you also need this. If the performance of it is a concern then modify this object to only parse the source into the fields in the class the first time those fields are accessed.
To re-iterate nothing in isolation is necessarily a bad practise. Bad practises are context dependant.
It (hopefully) obviously shouldn't be a normal choice. But there are cases where it's useful or necessary.
I can't think of any cases that wouldn't involve it being part of a communications protocol with some external service (e.g. a database connection string), so you're probably stuck with the format.
If you have a choice in the format (perhaps you are writing both sides of a system which can only communicate using strings), then at least choose something structured and well known. Examples of such have been given elsewhere, but the prime ones are naturally going to be XML or JSON. CSV, or some other delimited format may be useful in very simple cases (such as the database connection string) - but pay special attention to escaping delimiter characters (as the "Bobby Tables" joke (already referenced in another comment) nicely illustrated - google for him if you are not familiar with that one).
Your mention of a database suggests that this may be where the focus is. Are you trying to serialise application objects? (there are other ways of doing that). As another poster said, this may be a sign of a design that needs rethinking. But if you do need to store unknown datatypes in a DB, then XML may be an appropriate choice - especially if your DB supports XML fields. It's a bit of a minefield, though, so make sure you are familiar with how they work first.
I think it is not that bad when you are using a StringList for manging your string.
Especially when the structure of a e.g configuration-string (or configuration-database field) must be flexibel.
But in normally you should not do this, because of this disadvantages.
It all depends on what you're trying to accomplish.
If you need a heirarcical format of data or lots of fields that preserve data type, then no... a parsed string is a bad idea.
However, if you just need to transmit a string across a service and byte-conservation is important, then a Tag-Data pair may be exactly what you need.
If you do use a parsed string, it's important to be able to get at the data inside and quickly manage it. If you want an example TDP class, I posted one today to my website:
http://www.jerryandcheryl.net/jspot/2009/01/tag-data-pairs.html
I hope that helps.
I suggest considering these usage factors.
If you are processing the data within your own code, then you can use whatever data structures you wish. However, you may have issues developing your own implementation of a complex data structure, so consider using a pre-built one instead. Many come with whatever programming platform you may be using, while many more are documented in various books, articles, and discussions both printed and online. If you properly isolate your work from others, then you can safely do whatever you want.
On the other hand, if you need to share that data with others, then most careful consideration should be given. If you must share the data with an API, or via a storage mechanism (database, file, etc.), or via some transport (sockets, HTTP, etc.), then you should be thinking of others first and foremost. If you wish success and respect from your efforts, then you need to pay attention to standards and conventions and cost. Thankfully, practically any such use that you can imagine has been done before, so you can leverage others' efforts.
In a database, consider how others (and yourself) will be inserting, updating, deleting, and selecting the data. For example, using XML in a database makes all these steps unnecessarily hard and expensive compared to the alternatives. Pay attention to database normalization--learn it if you are not familiar already.
If you are dealing with text, pay attention to character encodings and make them explicit.
If there is an existing standard or convention for what you are doing, honor it. If there is a compelling reason to deviate, then accept the burden of justifying it, explaining it, and making it easy for others to accommodate your choices.
If you control both sides of a communication/transport medium, feel free to optimize. If you don't, err on the side of interoperability. Remember that a primary difference between the two scenarios is the level of self-description embedded with the data: interoperability has lots, optimization drops it based on shared assumptions. Text-rich data is more understandable, but binary is faster.
Think about your audience.
Related
I was just wondering about XML Serialization. If i understand correctly, the main reason for using it is that it lets you transport your object data more easily, am I right? Also, i tried serializing data using a constructor but it says that that you can only serialize data that are "parameterless". The thing is I like constructors because it allows me to have for example a Player class, and adding a new player with all properties is much more productive than having to set all properties one by one.
So the big question here is, what's the BIG purpose of XML serialization, what are the ways to use it? the way I see it is that it adds another level of complexity to my code, because i now need a class to serialize my data. Can someone shed some light?!
If you're talking about the overall purpose of serialization, strictly speaking, serialization (note that I said "serialization," not "XML Serialization" - more on that in a second) doesn't just make transporting objects easier, it's the only way you could transport an object.
As indicated in Pablo Santa Cruz's answer, XML is one of many ways you can serialize data. If you're going to save or send data somewhere, by definition you must first have some way to represent it. Serialization basically means that you represent your object state in some specified format. Deserialization is the opposite - given some representation of an object state, reconstruct what the original object state was.
In that sense, XML serialization, saving an object state to a database somehow, saving it as JSON, saving it in some binary format, and saving in some XML format are all examples of serialization (because you're representing the object state in a pre-defined format for later use).
While any defined format can technically be serialization, there are several standard ways of doing that. XML and JSON are by far the most common formats because they're standardized, easy to parse, easy to constrain (e.g. with XML Schema), are widely supported by libraries, can be relatively human-readable (which makes debugging easier), and they're widely used.
In case the last point sounds a little odd (they're widely used because they're widely used), standards by their very nature tend to have a strong network effect. In other words, the more people adapt them the more useful they are; for example, it's only useful to have email if you can actually use it to contact other people - it wouldn't be even slightly useful to have email if you were the only one using it.
A lot of standards and technologies will win out over competitors more because they have more early adapters than because they're necessarily technically superior. For example, even if someone could clearly prove that OS X is a "better" operating system than Windows, it wouldn't matter because there's vastly more software developed for Windows and it would be prohibitively expensive for people to try to switch to OS X. (You could make a similar argument for Token Ring vs. Ethernet).
Serialization is for storing object representation somehow (on a disk file, on the wire {network transportation}, on a HTTP session, on a database). XML Serialization is just one type of serialization.
The reason you need a parameter-less constructor to support serialization, is that the AUTO DESERIALIZER needs to create an EMPTY (with no o little data) class before start populating it with the corresponding data.
You don't need to use ONE WAY or THE OTHER, because you can have a class with multiple constructor (the parameter-less one will be used on deserialization, and you can use the other one wherever you need in your code).
Is there a 'correct' or preferred manner for sending data over a web socket connection?
In my case, I am sending the information from a C# application to a python (tornado) web server, and I am simply sending a string consisting of several elements separated by commas. In python, I use rudimentary techniques to split the string and then structure the elements into an object.
e.g:
'foo,0,bar,1'
becomes:
object = {
'foo': 0,
'bar': 1
}
In the other direction, I am sending the information as a JSON string which I then deserialise using Json.NET
I imagine there is no strictly right or wrong way of doing this, but are there significant advantages and disadvantages that I should be thinking of? And, somewhat related, is there a consensus for using string vs. binary formats?
Writing a custom encoding (eg, as "k,v,..") is different than 'using binary'.
It is still text, just a rigid under-defined one-off hand-rolled format that must be manually replicated. (What happens if a key or value contains a comma? What happens if the data needs to contain nested objects? How can null be interpreted differently than '' or 'null'?)
While JSON is definitely the most ubiquitous format for WebSockets one shouldn't (for interchange purposes) write JSON by hand - one uses an existing serialization library on both ends. (There are many reasons why JSON is ubiquitous which are covered in other answers - this doesn't mean it is always the 'best' format, however.)
To this end a binary serializer can also be used (BSON being a trivial example as it is effectively JSON-like in structure and operation). Just replace JSON.parse with FORMATX.parse as appropriate.
The only requirements are then:
There is a suitable serializer/deserializer for the all the clients and servers. JSON works well here because it is so popular and there is no shortage of implementations.
There are various binary serialization libraries with both Python and C# libraries, but it will require finding a 'happy intersection'.
The serialization format can represent the data. JSON usually works sufficiently and it has a very nice 1-1 correspondence with basic object graphs and simple values. It is also inherently schema-less.
Some formats are better are certain tasks and have different characteristics, features, or tool-chains. However most concepts (and arguably most DTOs) can be mapped onto JSON easily which makes it a good 'default' choice.
The other differences between different kinds of binary and text serializations is most mostly dressing - but if you'd like to start talking about schema vs. schema-less, extensibility, external tooling, metadata, non-compressed encoded sizes (or size after transport compression), compliance with a specific existing protocol, etc..
.. but the point to take away is don't create a 'new' one-off format. Unless of course, you just like making wheels or there is a very specific use-case to fit.
First advice would be to use the same format for both ways, not plain text in one direction and JSON in the other.
I personally think {'foo':0,'bar':1} is better than foo,0,bar,1 because everybody understands JSON but for your custom format they might not without some explanations. The idea is you are inventing a data interchange format when JSON is already one and #jfriend00 is right, pretty much every language now understands JSON, Python included.
Regarding text vs binary, there isn't any consensus. As # user2864740 mentions in the comments to my answer as long as the two sides understand each other, it doesn't really matter. This only becomes relevant if one of the sides has a preference for a format (consider for example opening the connection from the browser, using JavaScript - for that people might prefer JSON instead of binary).
My advice is to go with something simple as JSON and design your app so that you can change the wire format by swapping in another implementation without affecting the logic of your application.
In my application we have multi-lingual language strings which are stored in custom tables, as the user can edit, delete, import new languages etc... via a UI
Currently, what I'm doing is at the beginning of each request is. I'm going off and getting all the language strings (From our database) for the currently selected language and sticking them in a dictionary.
I then have a Html Helper extension method which I use in the razor views (See below), which fishes in the dictionary I got at the beginning of the request to pull out the correct language based on the key supplied in the helper.
Html.LanguageString("MyLanguage.KeyHere")
Now this works fine. However, as the application is getting bigger. We are getting more and more language strings. It's not an issue right now, as its still very fast as there are only around 200 strings to get.
But this also means I'm getting all of them, even if a page has say one on it. I'd ideally like a way of processing the LanguageString("")'s before hand and doing a query to just get those that are needed at the beginning of the request? Or maybe my own linq based language that can be processed and product a more efficient call.
I'm looking for some advice on how to do this. As I'd like the application to be as efficient as possible. Any advice, help, tips are greatly received. Thanks.
I'd suggest caching language strings on the application basis rather than fetching them for every request. For example, this can be done by maintaining a static dictionary and invalidating the cache only when the user makes changes to these strings. This will make your application more responsive as well as save you from implementing (imho) rather more complex and not necessarily efficient technique of loading this data on-demand.
As a side note I'd add the following: it's usually a good practice to address these kinds of problems when they arise (rather than fixing something that is not broken) and focus on more important things. I totally agree that performance implications of a given solution must always be taken into consideration, I'm just saying that premature optimizations are not always a good idea.
I'm looking for some advice, it may be that there is no hard and fast answer but any thoughts are appreciated. The application is very typical. Its a c# code, currently using VS2010. This code is used amongst other things for providing data from a model to end users. The model itself is based on some known data at fixed points (a "cube") various parameters and settings. In standard programming practice the user accesses the data via public "get" functions which in turn rely on private member variables such as the known data and the settings. So far so standard. Now I want to save the class providing this data to the users into an sql database - primarily so all users can share the same data (or more precisely model generated data).
Of course I could just take each member variable of the class and write these into the db using sql database and reinstantiate the class from these. But I dont want to miss out on all the goodies .net & c# has to offer. So what I'm thinking of doing is serializing the object and using linq to sql to squirt this into the db. The linq to sql section is straightforward, but I'm a total newbie when it comes to serialization and I'm a bit concerned/confused about it. It seems the obvious thing is to format the object into xml and write this into the database as a column in the table with sql datatype "xml". But that leaves me with a few questions.
To my understanding the standard XMLserializer will only write the public members of the class into the xml. That looks like a non-starter to me since my class design is predicated on keeping much of the class private (writing classes with all public members is outside of my experience - who does that ?). This has led me to consider the DataContractSerializer in which you can opt-in variables for serialization. However this seems to have some WCF dependencies and I'm wondering what are the potential drawbacks of using it. Additionally there is the SoapFormatter, which seems to be more prevalent for web applications and also JSON. I'm not really considering these, but maybe I should ? Maybe there are better ways of solving the problem ? Perhaps a bit verbose but hopefully all the context can help so people can shoot me some advice.
I have had requirements similar to yours and have done quite a bit of research in this area. I did a number of proof-of-concept projects using XMLSerialization, JSON, BinraryFormatter and not to forget some home grown hacks. I had almost decided to go with JSON (JSON.NET), until I found protobuf-net! It is fast, small in size, version independent, supports inheritance and easy to implement without much changes to your code. Recommend heavily.
If you store an object as XML, it will be very hard to use from the database. For example, if you store customer objects as XML, how would you write the following?
select * from Customers where LastName = 'Obama'
It can be done, but it's not easy.
How to map objects to a database is a subject of some controversy. Frameworks that are easy to get started with can become overly complex in the application's later life. Since most applications spend more time in maintenance than in initial development, I'd use the simplest model that works.
Plain ADO.NET or Dapper are good contenders. You'll write a bit more boilerplate code, but the decrease in complexity more than makes up for that.
I need to translate result strings from a library into exceptions. Each string has a numeric result code, followed by a pipe char, and then additional, code specific data. I'm thinking of using a custom exception with a ResultCode property, and storing a lookup table of message strings keyed by result code, which I will format with an array of message infos before throwing the exception. What is a good way to store this table of int-string values?
Where the data used to make the decision on which error to raise is never used outside of a single class, the optimal approach is to actually hard code the decision. That is what application programming is all about, vs. systems or frameworks programming. I suffer enough narcissism without imagining myself the divine provider of a perfect business programming framework to recognise that sometimes, my code will only be used in the one place I develop it in.
Therefore, I developed two candidate solutions to this problem:
1. The suggested dictionary of strings (codes) as keys to various exception types, initialised in a constructor; or
2. A medium sized 'switch' statement, with the same hard-coding as the constructor above.
In a single use scenario that is very likely to remain single use for the lifespan of the application, either of the above is a more than just acceptable solution; it is a cost effective, YAGNI based solution that allows me to devote more energy to problems that actually require attention.
If you need to store these strings which you already have, you should use a simple Dictionary<int,string>, where the errorcode is the key and the string is the value.
If you type the strings yourself, you should consider using a Resources file.