Storing a Dictionary<int,string> or KeyValuePair in a database - c#

I wanted to see what others have experienced when working with types like List<> or Dictionary<> and, in turn, storing and retrieving that data.
Here's an example scenario: users will be creating their own "templates", where each template is essentially a Dictionary<int,string>. For user1 the values might be (1, Account), (2, Bank), (3, Code), (4, Savings), and for user2 the (unrelated) values could be (1, Name), (2, Grade), (3, Class), and so on. These templates/lists can be of varying length, but they will always have an index and a value. Also, each list/template is linked to one and only one user.
What types did you choose on the database side?
Any pain-points and/or advice I should be aware of?

As far as the types within the collection go, there is a fairly 1-to-1 mapping between .Net types and SQL types: SQL Server Data Type Mappings. You mostly need to worry about string fields:
Will they always be single-byte code-page characters (values 0 - 255)? Then use VARCHAR. If they might contain non-ASCII / UCS-2 characters, then use NVARCHAR.
What is their likely max length?
Of course, sometimes you might want a slightly different numeric type in the database. The main reason would be if an int was chosen on the app side because it is "easier" (or so I have been told) to deal with than Int16 or byte, but the values will never exceed 32,767 or 255; in that case you should most likely use SMALLINT or TINYINT respectively. The difference between int and byte in terms of memory in the app layer might be minimal, but it does have an impact in terms of physical storage, especially as row counts increase. And if that is not clear, "impact" means slower queries and sometimes needing to buy more SAN space. The reason I said "most likely", though, is that if you have Enterprise Edition with Row Compression or Page Compression enabled, the values will be stored in the smallest datatype they fit in.
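To make the mapping concrete, here is a small sketch pairing app-side fields with sensible SQL Server column types; the class and field names are illustrative, not from the question:

// Illustrative only: each C# field paired with a matching SQL Server column type.
public class TemplateEntry
{
    public int UserId { get; set; }            // INT
    public short TemplateIndex { get; set; }   // SMALLINT, if indexes stay below 32,768
    public byte StatusCode { get; set; }       // TINYINT, if values stay within 0 - 255
    public string TemplateValue { get; set; }  // VARCHAR(100), or NVARCHAR(100) if non-ASCII is possible
}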
As far as retrieving the data from the database, that is just a simple SELECT.
As far as storing that data (at least in terms of doing it efficiently), well, that is more interesting :). A nice way to transport a list of fields to SQL Server is to use Table-Valued Parameters (TVPs), which were introduced in SQL Server 2008. I have posted a code sample (C# and T-SQL) in my answer to a very similar question here: Pass Dictionary<string,int> to Stored Procedure T-SQL. There is another TVP example on that question (the accepted answer), but instead of using IEnumerable<SqlDataRecord>, it uses a DataTable, which makes an unnecessary copy of the collection.
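For illustration, a minimal sketch of the IEnumerable<SqlDataRecord> approach; the table type dbo.TemplateEntryList and the procedure dbo.SaveTemplate are assumptions for this example, not taken from the linked answer:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public static class TemplateRepository
{
    // Streams the dictionary entries as rows; no intermediate DataTable copy.
    private static IEnumerable<SqlDataRecord> ToRecords(Dictionary<int, string> template)
    {
        var meta = new[]
        {
            new SqlMetaData("TemplateIndex", SqlDbType.Int),
            new SqlMetaData("TemplateValue", SqlDbType.VarChar, 100)
        };
        foreach (var entry in template)
        {
            var record = new SqlDataRecord(meta);
            record.SetInt32(0, entry.Key);
            record.SetString(1, entry.Value);
            yield return record;
        }
    }

    public static void SaveTemplate(string connectionString, int userId, Dictionary<int, string> template)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.SaveTemplate", conn) { CommandType = CommandType.StoredProcedure })
        {
            cmd.Parameters.Add("@UserID", SqlDbType.Int).Value = userId;
            var tvp = cmd.Parameters.Add("@ImportTable", SqlDbType.Structured);
            tvp.TypeName = "dbo.TemplateEntryList";  // must match the user-defined table type
            tvp.Value = ToRecords(template);         // enumerated only when the command executes
            // Note: passing an empty enumeration throws; omit the parameter entirely
            // when the collection is empty (an omitted TVP defaults to an empty table).
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}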
EDIT:
With regards to the recent update of the question that specifies the actual data being persisted, that should be stored in a table similar to:
CREATE TABLE SchemaName.TableName
(
    UserID        INT NOT NULL,
    TemplateIndex INT NOT NULL,
    TemplateValue VARCHAR(100) NOT NULL
);
The PRIMARY KEY should be (UserID, TemplateIndex) as that is a unique combination. There is no need (at least not with the given information) for an IDENTITY field.
The TemplateIndex and TemplateValue fields would get passed in the TVP as shown in my answer to the question that I linked above. The UserID would be sent by itself as a second SqlParameter. In the stored procedure, you would do something similar to:
INSERT INTO SchemaName.TableName (UserID, TemplateIndex, TemplateValue)
    SELECT @UserID,
           tmp.TemplateIndex,
           tmp.TemplateValue
    FROM   @ImportTable tmp;
And just to state it explicitly: unless there is a very specific reason for doing so (one that would include never, ever needing to use this data in any queries, such that the data is really just a document, no more queryable than a PDF or an image), you shouldn't serialize it to any format. If you were inclined to do so anyway, XML is a better choice than JSON, at least for SQL Server, as there is built-in support for interacting with XML data in SQL Server but not so much for JSON.

A List or any other collection is represented in a database as a table. Always think of it as a collection and relate it to what a database offers.
Though you can always serialize a collection, I do not suggest it: when updating or inserting, you would have to rewrite the whole serialized record each time, whereas with a table you only have to query by the key, which a Dictionary already gives you.

Related

Passing Record type from Managed ODP.NET to Oracle Procedure

If I have a record defined in a PL/SQL package and a procedure defined in the same package, is it possible to create the "record" type on the .NET (C#) side and pass it to the procedure using the type t_my_rec? I'm sure I could do this using UDTs (Oracle user-defined data types), but since I am using the managed driver, those aren't yet supported.
TYPE t_my_rec IS RECORD
(
  item_id   items.item_id%type,
  item_name items.item_name%type
);
TYPE t_arr_my_rec IS TABLE OF t_my_rec INDEX BY PLS_INTEGER;
PROCEDURE insert_my_rec
(
  p_my_rec in t_my_rec
);
PROCEDURE bulk_insert_my_rec
(
  p_my_recs in t_arr_my_rec
);
Ideally I'd like to avoid defining an array type for every single field in the table just to do bulk FORALL insert statements.
I really appreciate the help!
I don't think you can deal with Oracle type declarations in ODP.NET outside of a UDT, and even then I've only done so with type declarations made in the database rather than in a package.
You could also consider passing the collection across as an XML document and parsing it out on both sides. That ensures you can define the structures in play, although you will incur the overhead of creating / validating / parsing the string, and the data overhead of passing numbers as strings rather than as a couple of bytes.
Heck, in the old days before any decent UDT or XML support I remember stuffing a bunch of data into a CLOB to pass across and parse out, once both sides agreed on the format. Works OK if you never ever EVER expect to change the data object. A flipping maintenance nightmare otherwise. But do-able.
No, it is not possible. You'll need to use some other technique, such as flattening the record out into multiple stored procedure parameters, using a temp table, etc.
Here is a relevant thread over on the OTN forums.
https://community.oracle.com/thread/3620578
I had a similar problem. I solved it by using an associative array for each field in the record: instead of having a single parameter of a PL/SQL table-of-records type, have as many parameters as there are columns. In the package I defined two basic associative array types, one of varchar2 and one of number.
CREATE OR REPLACE PACKAGE xxx AS
  type t_tbl_alfa is table of varchar2(50) index by binary_integer;
  type t_tbl_num  is table of number index by binary_integer;
  ...
END xxx;
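For illustration, a hedged C# sketch of binding such associative arrays with the managed driver. It assumes the package above also exposes a procedure like bulk_insert_my_rec(p_ids in t_tbl_num, p_names in t_tbl_alfa), which is not shown in the original snippet:

using System.Data;
using System.Linq;
using Oracle.ManagedDataAccess.Client;

static void BulkInsert(string connStr, int[] ids, string[] names)
{
    using (var conn = new OracleConnection(connStr))
    using (var cmd = new OracleCommand("xxx.bulk_insert_my_rec", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;

        var pIds = new OracleParameter("p_ids", OracleDbType.Int32)
        {
            CollectionType = OracleCollectionType.PLSQLAssociativeArray,
            Value = ids,
            Size = ids.Length                 // number of array elements being bound
        };
        var pNames = new OracleParameter("p_names", OracleDbType.Varchar2)
        {
            CollectionType = OracleCollectionType.PLSQLAssociativeArray,
            Value = names,
            Size = names.Length,
            // each varchar2 element needs a declared maximum length
            ArrayBindSize = Enumerable.Repeat(50, names.Length).ToArray()
        };
        cmd.Parameters.Add(pIds);
        cmd.Parameters.Add(pNames);

        conn.Open();
        cmd.ExecuteNonQuery();
    }
}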

C# and SQL Server: normalizing large sets of URLs

I have many tables in the database that have at least one column containing a URL, and these are repeated a lot throughout the database. So I normalize them into a dedicated table and just use numeric IDs everywhere I need them. I often need to join on them, so numeric IDs are much better than full strings.
In MySQL + C++, to insert a lot of URLs in one strike, I used to use multi-row INSERT IGNORE statements or mysql_set_local_infile_handler(), then a batched SELECT with IN () to pull the IDs back from the database.
In C# + SQL Server I noticed there's a SqlBulkCopy class that's very useful and fast for mass insertion. But I also need mass selection to resolve the URL IDs after I insert them. Is there any such helper class that would work the same as SELECT WHERE IN (many, urls, here)?
Or do you have a better idea for turning URLs into numbers in a consistent manner in C#? I thought about crc32'ing or crc64'ing the URLs, but I worry about collisions. I wouldn't care about a few collisions, but many would be an issue.
PS: We're talking about tens of millions of URLs, to give an idea of scale.
PS: For basic large inserts, SqlBulkCopy is faster than SqlDbType.Structured. Plus, it has the SqlRowsCopied event for a status-tracking callback.
There is an even better way than SqlBulkCopy.
It's called structured parameters, and it allows you to pass a table-valued parameter to a stored procedure or query through ADO.NET.
There are code examples in the article, so I will only highlight what you need to do to get it up and working:
Create a user-defined table type in the database. You can call it UrlTable.
Set up an SP or query which does the SELECT by joining with a table-valued parameter of type UrlTable.
In your backing code (C#), create a DataTable with the same structure as UrlTable, populate it with URLs, and pass it to a SqlCommand as a structured parameter (see the sketch below). Note that column-order correspondence between the DataTable and the table type is critical.
What ADO.NET does behind the scenes (you can see this if you profile the query) is that, before the query runs, it declares a variable of type UrlTable and populates it (with INSERT statements) from what you pass in the structured parameter.
Other than that, query-wise, you can do pretty much everything with table-valued parameters in SQL (join, select, etc).
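A minimal sketch of that lookup step; the type dbo.UrlTable, the table dbo.Urls, and the column names are assumptions for this example:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Resolves URL strings to their numeric IDs in one round trip.
// Assumes conn is already open and dbo.UrlTable has a single Url column.
static Dictionary<string, int> ResolveUrlIds(SqlConnection conn, IEnumerable<string> urls)
{
    var table = new DataTable();
    table.Columns.Add("Url", typeof(string));  // column order must match dbo.UrlTable
    foreach (var url in urls)
        table.Rows.Add(url);

    const string sql = @"SELECT u.Url, u.UrlId
                         FROM dbo.Urls AS u
                         JOIN @Urls AS t ON t.Url = u.Url;";

    var ids = new Dictionary<string, int>();
    using (var cmd = new SqlCommand(sql, conn))
    {
        var p = cmd.Parameters.Add("@Urls", SqlDbType.Structured);
        p.TypeName = "dbo.UrlTable";
        p.Value = table;
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                ids[reader.GetString(0)] = reader.GetInt32(1);
    }
    return ids;
}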
I think you could use the IGNORE_DUP_KEY option on your index. If you set IGNORE_DUP_KEY = ON on a unique index over the URL column, duplicate values are simply ignored and the rest are inserted appropriately.

Creating a Timeline and SQL Storage

Language: C#
Compiler: Visual Studio 2012
O/S: Windows 7 Home Premium
Here is a question that has come up many times and been through a few debates.
I know there are currently provisional .NET controls for a functional timeline, as well as hints and tips on how such a process would be done, but I have not found (so far) a complete tutorial on a well-maintained SQL-storage timeline system.
I need to document almost every change that my site will have: from additions to user reputation, to the joining/creation and eventual submissions of members, clans, games, etc.
As far as I know, DateTime in a SQL database should be avoided, especially in large quantities.
What would be the implementation, process, and eventual output of a Timeline?
What you're describing is sometimes known as "audit history", and it's often implemented using a single, denormalized table; however, many DB purists will argue against it, as you lose strong typing.
The table looks like this:
AuditTable
(
    EventId     bigint,
    [DateTime]  datetime,
    Subject     nvarchar,
    [Table]     varchar,
    [Column]    varchar,
    TablePK     bigint,
    OldValueInt bigint   NULL,
    OldValueStr nvarchar NULL
    -- add more nullable columns for more types, if necessary
)
Each time a value is changed, such as a user's reputation being increased, you would add a row to this table, such as this:
INSERT INTO AuditTable ([DateTime], Subject, [Table], [Column], TablePK, OldValueInt)
VALUES (GETDATE(), N'User reputation increased', 'Users', 'Reputation', @userId, 100);
-- assuming EventId is generated automatically
You only need to store the old value (the value before the change) because the new (i.e. current) value will be in the actual table row.
Adding to the Audit table can be done entirely automatically with SQL Server table triggers.
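As a hedged sketch of such a trigger (the Users table with UserId and Reputation columns is assumed from the example above, not prescribed by it), deployed once from C# or any SQL client:

// One-time deployment of an audit trigger; the DDL string is the interesting part.
const string createTrigger = @"
CREATE TRIGGER trg_Users_ReputationAudit ON Users
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO AuditTable ([DateTime], Subject, [Table], [Column], TablePK, OldValueInt)
    SELECT GETDATE(), N'User reputation changed', 'Users', 'Reputation', d.UserId, d.Reputation
    FROM deleted AS d
    JOIN inserted AS i ON i.UserId = d.UserId
    WHERE i.Reputation <> d.Reputation;
END";

using (var conn = new System.Data.SqlClient.SqlConnection(connectionString))
using (var cmd = new System.Data.SqlClient.SqlCommand(createTrigger, conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
}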
To view a user's reputation history, you would do this:
SELECT * FROM AuditTable WHERE [Table] = 'Users' AND [Column] = 'Reputation' AND TablePK = @userId;
Now, as I said, this design is more for auditing than for maintaining an easily user-accessible history. These are the disadvantages:
You cannot semantically index the table, so lookups and lists will always be slow
You're storing database metadata as strings, so there's a lot of overhead
There's no referential integrity (this can be a good thing in that the data will remain if you re-architecture the original tables, e.g. removing the Reputation field from the Users table)
If you want to be more "pure" then you really have to design a table structure that directly supports the history-tracking you want to build. You don't need to create a history table for every field; even Stack Overflow doesn't store a history of everything. For example:
UserReputationHistory ( UserId bigint, ReputationChange int, [When] datetime, Subject nvarchar )
Of course, it does complicate your code to have to maintain these disparate FooHistory tables.
Other things you mention in your original question, such as a member's join date, don't need a history table at all; you can get that from a DateJoined field in the member's own DB row.

How to get the primary key from a table without making a second trip?

How would I get the primary key ID number from a Table without making a second trip to the database in LINQ To SQL?
Right now, I submit the data to a table and then make another trip to figure out what ID was assigned to the new row (an auto-increment identity field). I want to do this in LINQ to SQL and not in raw SQL (I no longer use raw SQL).
Also, the second part of my question: I am always careful to keep the ID of the user who's online, because I'd rather look up their information in various tables by ID than by a GUID or a username, which are long strings. I do this because I think a numeric compare in SQL Server is much (?) more efficient than comparing a username (string) or a GUID (a very long string). My question is: am I more concerned than I should be? Is the difference worth always keeping the user ID (Int32) in, say, session state?
@RedFilter provided some interesting/promising leads for the first question, but I am at this stage unable to try them. Can anyone confirm the changes he recommended in the comments section of his answer?
If you have a reference to the object, you can just use that reference and read the primary key after you call db.SubmitChanges(). The LINQ object will automatically update its (identity) primary key field to reflect the new value assigned to it by SQL Server.
Example (vb.net):
Dim db As New NorthwindDataContext
Dim prod As New Product
prod.ProductName = "cheese!"
db.Products.InsertOnSubmit(prod)
db.SubmitChanges()
MessageBox.Show(prod.ProductID.ToString())
You could probably include the above code in a function and return the ProductID (or equivalent primary key) and use it somewhere else.
EDIT: If you are not doing atomic updates, you could add each new product to a separate Collection and iterate through it after you call SubmitChanges. I wish LINQ provided a 'database sneak peek' like a dataset would.
Unless you are doing something out of the ordinary, you should not need to do anything extra to retrieve the primary key that is generated.
When you call SubmitChanges on your Linq-to-SQL datacontext, it automatically updates the primary key values for your objects.
Regarding your second question: there may be a small performance improvement in comparing a numeric field as opposed to something like a varchar(), but you will see much better performance either way by ensuring that you have the correct columns indexed. And with SQL Server, if you create a primary key using an identity column, it will by default have a clustered index over it.
LINQ to SQL automatically sets the identity value of your class to the ID generated when you insert a new record; just access the property. I don't know whether it uses a separate query for this, having never used it, but it is not unusual for ORMs to require another query to get back the last inserted ID.
Two ways you can do this independent of Linq To SQL (that may work with it):
1) If you are using SQL Server 2005 or higher, you can use the OUTPUT clause:
Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, or DELETE statement. These results can be returned to the processing application for use in such things as confirmation messages, archiving, and other such application requirements. Alternatively, results can be inserted into a table or table variable.
2) Alternatively, you can construct a batch INSERT statement like this:
insert into MyTable
(field1)
values
('xxx');
select scope_identity();
which works at least as far back as SQL Server 2000.
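From plain ADO.NET, the batch in (2) looks like this (the table and column names are just the ones from the snippet above):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"insert into MyTable (field1) values (@value);
      select cast(scope_identity() as int);", conn))
{
    cmd.Parameters.AddWithValue("@value", "xxx");
    conn.Open();
    // scope_identity() returns numeric(38,0), hence the cast before unboxing
    int newId = (int)cmd.ExecuteScalar();
}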
In T-SQL, you could use the OUTPUT clause, saying:
INSERT table (columns...)
OUTPUT inserted.ID
SELECT columns...
So if you can configure LINQ to use that construct for doing inserts, then you can probably get it back easily. But whether LINQ can get a value back from an insert, I'll let someone else answer that.
Calling a stored procedure from LINQ that returns the ID as an output parameter is probably the easiest approach.

Is the usage of identity insert good with metadata tables

I have several tables within my database that contains nothing but "metadata".
For example, we have different grouptypes, contentItemTypes, languages, etc.
The problem is, if you use automatic numbering then it is possible to create gaps.
The IDs are used within our code, so the numbers are very important.
Now I wonder whether it wouldn't be better to avoid autonumbering within these tables.
As it stands, we have to create the row in the database first, before we can write our code, and in my opinion this should not be the case.
What do you guys think?
I would use an identity column, as you suggest, to be your primary key (surrogate key), and then assign your candidate key (the identifier from your system) to a standard column with a unique constraint applied to it. This way you can ensure you do not insert duplicate records.
Make sense?
If these are FK tables used just to expand codes into a description or to hold other attributes, then I would NOT use an IDENTITY. Identity columns are good for ever-growing user data; metadata tables are usually static. When you deploy an update to your code, you don't want to be surprised by an IDENTITY value different from what you expect.
For example, you add a new value to the "Languages" table and expect the ID to be 6, but for some reason (development is out of sync, another person has not implemented their next language type, etc.) the next identity you get is different, say 7. You then insert or convert a bunch of rows using Language ID = 6, which all fail because it does not exist (it is 7 in the metadata table). Worse yet, they might all actually insert or update, because the value 6 you thought was yours was already in the metadata table; you now have two items sharing the same value 6, and your new value 7 is left unused.
I would pick the proper data type based on how many codes you need and how often you will need to look at the raw data (CHARs are nice to look at for a few values, and they help you remember what the codes mean).
For example, if you only have a few groups and you'll often look at the raw data, then a char(1) may be good:
GroupTypes table
-----------------
GroupType char(1) --'M'=manufacturing, 'P'=purchasing, 'S'=sales
GroupTypeDescription varchar(100)
However, if there are many different values, then some form of int (tinyint, smallint, int, bigint) may do it:
EmailTypes table
----------------
EmailType smallint --2 bytes, up to 32k different positive values
EmailTypeDescription varchar(100)
If the numbers are hardcoded in your code, don't use identity fields. Hardcode them in the database as well; they'll be less prone to changing because someone scripted the database badly.
I would use an identity column as the primary key too, just for the simplicity of inserting records, and then use a column for the type of metadata, which I call LookUpType (int), as well as columns for LookUpId (the int value used in code, or the value in select lists) and LookUpName (string). If those values require additional settings, so to speak, use extra columns. I personally use two extras: LookUpKey for hierarchical relations, and LookUpValue for abbreviations or alternate values of LookUpName.
Well, if those numbers are important to you because they'll be in code, I would probably not use an IDENTITY.
Instead, just make sure you use an INT column and make it the primary key; in that case, you will have to provide the IDs yourself, and they'll have to be unique.
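One common way to keep those hardcoded numbers honest is to mirror the metadata table in an enum and use it everywhere in code. A sketch; the Languages table and its values here are illustrative, not from the question:

// The enum values must mirror the explicit (non-identity) primary keys
// in the Languages metadata table exactly; names and values are examples.
public enum Language
{
    English = 1,
    Dutch   = 2,
    French  = 3
}

// Usage: no magic numbers scattered through the code base, e.g.
// cmd.Parameters.AddWithValue("@LanguageId", (int)Language.Dutch);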
