MySQL Field Space Allocation - c#

I first create a model in my application, and Entity Framework then generates the SQL that creates the table.
In the example below, the first property generates a column of type varchar(20); the second generates longtext.
Example
[StringLength(20)]
public string Code { get; set; }
public string CodeTwo { get; set; }
Questions
Is there any difference between these two declarations in terms of space allocation?
(Even if they store the same value, like "test", which has 4 characters.)
If I know that a field's length varies between, say, 10 and 15 characters, is it better to limit the column to the maximum length or to leave it "unlimited" (again in terms of space allocation)?
Thanks in advance.
Sorry for my poor English.

Translated answer by user Marconcílio Souza to the same question asked in another language.
When Entity Framework generates the tables in your database, it checks the type of each field. For a string, if you specify a size, it applies the same size to the corresponding type in the database.
In the case of
[StringLength(20)]
public string Code { get; set; }
the corresponding MySQL type is varchar(20). When a string is declared without a fixed size, Entity Framework allocates the largest type available for it in the database, which in MySQL is longtext.
BLOB-type columns such as LONGTEXT are inherently variable length and take up almost no storage when unused. The space they require is not affected by the declared maximum, even for NULL values; when a value such as 'test' is stored, the space allocated follows the size of the string passed.
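For the 10 to 15 character case in the question, one way to declare the bound is sketched below. This is only an illustration: the `Product` class and the `ModelInspector` helper are hypothetical names, and the mapping to varchar(15) is inferred from the varchar(20) behaviour described above rather than checked against a specific provider version.

```csharp
using System.ComponentModel.DataAnnotations;
using System.Reflection;

// Hypothetical model; the class name is an assumption for illustration.
public class Product
{
    // Bounded: per the answer above, a sized string maps to varchar(n).
    [StringLength(15)]
    public string Code { get; set; }

    // Unbounded: no size is given, so the column falls back to longtext.
    public string CodeTwo { get; set; }
}

public static class ModelInspector
{
    // Returns the declared maximum length of a property, or null if none is declared.
    public static int? DeclaredMaxLength<T>(string propertyName)
    {
        PropertyInfo property = typeof(T).GetProperty(propertyName);
        StringLengthAttribute attribute = property?.GetCustomAttribute<StringLengthAttribute>();
        return attribute?.MaximumLength;
    }
}
```

Calling `ModelInspector.DeclaredMaxLength<Product>("Code")` reads back the bound that the migration would be expected to use, while `CodeTwo` yields null.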
* Advantages / disadvantages of BLOBs vs. VARCHARs *
All comments in this section that refer to the VARCHAR type also apply to the CHAR type.
Each comment ends with a BLOB + or VARCHAR + mark to indicate which data type is better.
Do you know the maximum length of your data?
With VARCHARs you need to declare the maximum length of the string.
With blobs you do not have to worry about it.
BLOB +
Do you need to store very long strings?
A single VARCHAR is limited to 32K bytes (i.e., about 10 thousand Unicode characters).
The maximum blob size depends on the page size (according to the Service Guide):
  - Page size 1 KB => 64 MB
  - Page size 2 KB => 512 MB
  - Page size 4 KB => 4 GB
  - Page size 8 KB => 32 GB
BLOB +
Do you need to store many long text columns in a single table?
The total row length (uncompressed) is restricted to 64K. VARCHARs are stored inline in the row, so you cannot store many long strings in one row.
Blobs are represented by their blob-id and use only 8 bytes of the 64K maximum.
BLOB +
Do you want to minimize the number of round trips between client and server?
VARCHAR data is fetched along with the other row data in a fetch operation, and usually several rows are sent over the network at the same time.
Every single blob needs an extra open/fetch operation.
VARCHAR +
Do you want to minimize the amount of data transferred between client and server?
The advantage of blobs is that fetching a row returns only the blob-id, so you can decide whether or not to fetch the BLOB data.
In older versions of InterBase there was a problem that VARCHARs were sent over the network at their full declared length. This problem was fixed in Firebird 1.5 and InterBase 6.5.
Draw (BLOB + for older versions of the server)
Do you want to minimize the space used?
VARCHARs are RLE compressed (in fact, entire rows are compressed, except blobs). At most 128 bytes can be compressed into 2 bytes. This means that even an empty varchar(32000) will occupy 500 + 2 bytes.
Blobs are not compressed, but an empty (i.e. null) blob will occupy only the 8 bytes of its blob-id (which will later be RLE compressed). A non-empty blob may be stored on the same page as the other data of the row (if it fits) or on a separate page. A small blob that fits on the data page has an overhead of 40 bytes (or a little more). A big blob has the same 40-byte overhead on the data page, plus 28 bytes of overhead on each blob page (30 bytes on the first). A blob page cannot contain more than one blob (i.e. blob pages are not shared the way data pages are). For example, with a 4K page size, if you store a 5K blob, two blob pages will be allocated, which means that you lose 3K of space! In other words, the larger the page size, the higher the probability that small blobs will fit on a data page, but also the more space is wasted if separate blob pages are needed for large blobs.
VARCHAR + (except VARCHARs with an extremely large declared length, or tables with lots of NULL blobs)
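The two figures quoted above (500 + 2 bytes for an empty varchar(32000), and 3K lost for a 5K blob on 4K pages) can be reproduced with a little arithmetic. The sketch below only restates the numbers from the text; the class and method names are made up for illustration, and the per-page blob overhead is ignored, as it is in the text's own example.

```csharp
public static class StorageEstimate
{
    // RLE estimate from the text: each run of up to 128 bytes compresses to 2 bytes,
    // plus the extra 2 bytes the text adds on top (the "+ 2" in "500 + 2").
    public static int EmptyVarcharBytes(int declaredLength)
    {
        int runs = (declaredLength + 127) / 128; // ceiling division
        return runs * 2 + 2;
    }

    // Space lost when a blob is stored on dedicated, unshared blob pages.
    // Per-page overhead is ignored here, as in the 5K-on-4K example.
    public static int WastedBlobBytes(int blobBytes, int pageBytes)
    {
        int pages = (blobBytes + pageBytes - 1) / pageBytes; // ceiling division
        return pages * pageBytes - blobBytes;
    }
}
```

With these definitions, `EmptyVarcharBytes(32000)` gives 502 and `WastedBlobBytes(5 * 1024, 4 * 1024)` gives 3072, matching the figures in the text.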
Do you need a table with an extremely large number of rows?
Each row is identified by a DB_KEY, which is a 64-bit value in which 32 bits represent the table ID and 32 bits are used to locate the row. The theoretical maximum number of rows in a table is therefore 2^32 (but for various reasons the true maximum is even lower). Blob-ids are allocated from the same address space as DB_KEYs, which means that the more blobs there are in a table, the fewer DB_KEYs remain for addressing rows. On the other hand, when the stored rows are wide (e.g. they contain long VARCHARs), fewer rows fit on a data page and many DB_KEY values remain unassigned anyway.
VARCHAR +?
Do you want good performance?
Because large blobs are stored outside the data pages, they increase the "density" of rows on data pages and thus cache efficiency (reducing the number of I/O operations during a search).
BLOB +
Do you need to search the contents of text columns?
On a VARCHAR you can use operators such as '=', '>', BETWEEN, IN(), the case-sensitive LIKE and STARTING, and the case-insensitive CONTAINING. In most cases an index can be used to speed up the search.
Blobs cannot be indexed, and you are restricted to the LIKE, STARTING, and CONTAINING operators. You cannot directly compare blobs with the operators '=', '>', etc. (unless you use a UDF), so you cannot, for example, join tables on blob fields.
VARCHAR +
Do you want to search the content of these texts with CONTAINING?
CONTAINING can be used to perform a case-insensitive search of the contents of a VARCHAR field. (No index is used.)
Because you cannot set a collation order for BLOB columns, you cannot use fully case-insensitive search with national characters in BLOB columns (only the lower half of the character set is case insensitive). (Alternatively, you can use a UDF.)
Firebird 2 already allows you to set a collation for text (and binary) blob columns.
VARCHAR +
Do you need to uppercase the contents of a text column?
You can use the built-in UPPER() function on a VARCHAR, but not on a blob. (CAST, MIN, and MAX cannot be used with blobs either.)
VARCHAR +
You cannot sort by a blob column. (The same applies to GROUP BY, DISTINCT, UNION, and JOIN ON.)
You cannot concatenate blob columns.
VARCHAR +
There is no built-in conversion function (CAST) for converting a blob to VARCHAR or VARCHAR to a blob.
(But you can write a UDF for this purpose.)
Since Firebird 1.5 you can use the built-in SUBSTRING function to convert a blob to VARCHAR (but the FROM and FOR clauses cannot exceed 32K).
Draw
You cannot assign a value to a blob directly in an SQL command,
e.g. INSERT INTO tab (MyBlob) VALUES ('abc'); (but you can use a UDF to convert a string to a blob).
VARCHAR +
Firebird 0.9.4 already has this functionality.
Draw
Do you need good security on these text columns?
To retrieve table data, you must be granted the SELECT privilege.
To retrieve a blob, you only need to know its blob-id (stored in the table), and Firebird/InterBase will not check whether you have any rights to the table the blob belongs to. This means that anyone who knows or guesses the right blob-id can read the blob without any rights to the table. (You can try it with ISQL and the BLOBDUMP command.)
VARCHAR +

Related

Storing files to byte array

I have a database object that has a column to store files as varbinary. I have tried storing a single file using C# and byte arrays, and it worked. How can I add multiple files to this column? Please help.
I suppose you'd need to concatenate the byte arrays from each file into one giant byte array and then insert that byte array into the field, but then how would you know where one file begins and the next ends?
You could try to put a magic set of bytes between each file's byte array, but then what happens when one of those files happens to contain that magic set of bytes?
If the files are exactly the same type, say images, you could look for the magic bytes that certain image file types always start with to separate them out, but again, there's still the chance you might find those magic bytes in the middle of one of the files.
There are also memory concerns, both when saving and when retrieving, if the combined files are too large.
This idea also violates database design / normalization.
I would do what Jeremy Lakeman recommends: create a child table.
IE,
Files Table Columns:
ParentID (foreign key to parent table)
FileID (Autonumber / primary key)
File (varbinary)
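In C#, the child-table layout above might be modelled roughly as follows. This is a sketch only: `FileRecord` and `FileRecords.ForParent` are hypothetical names, the property names simply mirror the column list, and the database is assumed to assign `FileID` on insert.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row of the child table; property names follow the column list above.
public class FileRecord
{
    public int FileID { get; set; }    // primary key, assumed to be assigned by the database
    public int ParentID { get; set; }  // foreign key to the parent table
    public byte[] File { get; set; }   // exactly one file per row (varbinary)
}

public static class FileRecords
{
    // Builds one child row per file instead of concatenating the byte arrays.
    public static List<FileRecord> ForParent(int parentId, IEnumerable<byte[]> files)
    {
        return files
            .Select(bytes => new FileRecord { ParentID = parentId, File = bytes })
            .ToList();
    }
}
```

Because every file lives in its own row, no delimiter is needed and each file can be loaded separately.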

Optimizing SDF filesize

I recently started learning Linq and SQL. As a small project I'm writing a dictionary application for Windows Phone. The project is split into two applications. One application (which currently runs on my PC) generates an SDF file. The second app runs on my Windows Phone and searches the database. However, I would like to optimize the data usage. The raw entries of the dictionary are written in a TXT file with a file size of around 39MB. The file has the following layout
germanWord \tab englishWord \tab group
germanWord \tab englishWord \tab group
The file is parsed into a SDF database with the following tables.
Table Word with columns _version (rowversion), Id (int IDENTITY), Word (nvarchar(250)), Language (int)
This table contains every single word in the file. The language is a flag from my code that I used in case I want to add more languages later. A word-language pair is unique.
Table Group with columns _version (rowversion), GroupId (int IDENTITY), Caption (nvarchar(250))
This table contains the different groups. Every group is present one time.
Table Entry with columns _version (rowversion), EntryId (int IDENTITY), WordOneId (int), WordTwoId(int), GroupId(int)
This table links translations together. WordOneId and WordTwoId are foreign keys to a row in the Word Table, they contain the id of a row. GroupId defines the group the words belong to.
I chose this layout to reduce the data footprint. The raw text file contains some German (or English) words multiple times. There are around 60 groups that repeat themselves. Programmatically I reduce the word count from around 1.800.000 to around 1.100.000. There are around 50 rows in the Group table. Despite the reduced number of words, the SDF is around 80MB in file size. That's more than twice the size of the raw data. Another thing is that, in order to speed up the search for translations, I plan to index the Word column of the Word table. Adding this index makes the file grow to over 130MB.
How can it be that the SDF with ~60% of the original data is twice as large?
Is there a way to optimize the filesize?
The database file must contain all of the data from your raw file, in addition to row metadata -- it also will contain the strings based on the datatypes specified -- I believe your option here is NVARCHAR which uses two bytes per letter. Combining these considerations, it would not surprise me that a database file is over twice as large as a text file of the same data using the ISO-Latin-1 character set.
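The two-bytes-per-letter point can be checked by comparing byte counts for the same string in both encodings. The sketch below assumes the raw file is ISO-8859-1, as the answer does; the class and method names are illustrative only.

```csharp
using System.Text;

public static class EncodingSize
{
    // Bytes needed for the text in a single-byte ISO-8859-1 (Latin-1) file.
    public static int Latin1Bytes(string text)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetByteCount(text);
    }

    // Bytes needed for the same text as UTF-16, the two-bytes-per-letter case the answer describes.
    public static int Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }
}
```

For a ten-letter word such as "W\u00F6rterbuch", the Latin-1 count is 10 bytes and the UTF-16 count is 20, before any row metadata or index overhead is added.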

C#: Storing Filesize in Database

I'm storing objects in a database as varbinary(MAX) and want to know their file size. Without getting into the pros and cons of using the varbinary(MAX) datatype, what is the best way to read the file size of an object stored in the database?
Is it:
A. Better to just read the column from the DB and call the .Length property of System.Data.Linq.Binary.
OR
B. Better to determine the file size of the object before it is added to the DB and create another column called Size.
The files I'm dealing with are generally between 0 and 3 MB with a skew towards the smaller size. It doesn't necessarily make sense to hit the DB again for the file size, but it also doesn't really make sense to read through the entire item to determine its length.
Why not add a calculated column in your database that would be DATALENGTH([your_col])?

C#: Is it possible to store a Decimal Array in an SQL database?

I'm working on an application for a lab project and I'm making it in C#. It's supposed to import results from a text file that is exported from the application we use to run the tests and so far, I've hit a road block.
I've gotten the program to save around 250 decimal values as a single-dimension array, but now I'm trying to get the array itself saved in an SQL database so that I can later retrieve the array and use the decimal values to construct a plot of the points.
I need the entire array to be imported into the database as one single value though because the lab project has several specimens each with their own set of 250 or so Decimal points (which will be stored as arrays, too)
Thanks for your help.
EDIT: Thanks for the quick replies, guys, but the problem is that it's not just results from a specimen with only 1 test run. Each specimen has the same test performed on it at different decibel levels over 15 times. Each test has its own set of 250 results, and we have many specimens.
Also, the specimens already have a unique ID assigned to them and it'd be stored as a String not an Int. What I'm planning on doing is having a separate table in the DB for each specimen and have each row include info on the decibel level of the test and store the array serialized...
I think this would work because we will NOT need to access individual points in the data straight from the database; I'm just using the database to store the data out of memory since there's so much of it. I'm going to query the database for the array and other info and then use zedgraph to plot the points in the array and compare multiple specimens simultaneously.
Short answer is absolutely not. These are two completely different data structures. There are work arounds like putting it in a blob or comma separating a text column. But, I really hate those. It doesn't allow you to do math at the SQL Server level.
IMO, the best option includes having more than one column in your table. Add an identifier so you know which array the data point belongs to.
For example:
AutoId  Specimen  Measurement
1       A         42
2       A         45.001
3       B         47.92
Then, to get your results:
select
measurement
from
mytable
where
specimen = 'A'
order by
autoid asc
Edit: You're planning on doing a separate 250 row table for each specimen? That's absolutely overkill. Just use one table, have the specimen identifier as a column (as shown), and index that column. SQL Server can handle millions upon millions of rows markedly well. Databases are really good at that. Why not play to their strengths instead of trying to recreate C# data structures?
"I need the entire array to be imported into the database as one single value though because the lab project has several specimens each with their own set of 250 or so Decimal points (which will be stored as arrays, too)"
So you're trying to pound a nail, should you use an old shoe or a glass bottle?
The answer here isn't "serialize the array into XML and store it in a record". You really want to strive for correct database design, and in your case the simplest design is:
Specimens
---------
specimenID (pk int not null)
SpecimenData
------------
dataID (pk int not null)
specimenID (fk int not null, points to Specimens table)
awesomeValue (decimal not null)
Querying for data is very straightforward:
SELECT * FROM SpecimenData WHERE specimenID = @specimenID
As long as you don't need to access the individual values in your queries, you can serialize the array and store it as a blob in the database.
Presumably you could serialize the decimal array in C# to a byte array, and save that in a binary field on a table. Your table would have two fields: SpecimenID, DecimalArrayBytes
Alternatively, you could have a one-to-many type table and not store the array in one piece, with the fields SpecimenID and DecimalValue, and use SQL like
SELECT DecimalValue FROM Table WHERE SpecimenID = X
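The byte-array route mentioned above could look roughly like this. It is a sketch, not a recommended storage format: `DecimalArrayCodec` is a made-up name, and the layout (a 4-byte count followed by 16 bytes per decimal) is simply what `BinaryWriter` produces, chosen here for illustration.

```csharp
using System.IO;

public static class DecimalArrayCodec
{
    // Writes the element count followed by each decimal (16 bytes apiece).
    public static byte[] ToBytes(decimal[] values)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(values.Length);
            foreach (decimal value in values)
            {
                writer.Write(value);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the count, then that many decimals, restoring the original order.
    public static decimal[] FromBytes(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new BinaryReader(stream))
        {
            var values = new decimal[reader.ReadInt32()];
            for (int i = 0; i < values.Length; i++)
            {
                values[i] = reader.ReadDecimal();
            }
            return values;
        }
    }
}
```

The result of `ToBytes` is what would go into the DecimalArrayBytes field; an array of 250 values comes to 4 + 250 * 16 = 4004 bytes.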
You can serialize the array and store it as a single chunk of xml/binary/json. Here is an example of serializing it as xml.
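// Requires: using System.Runtime.Serialization; using System.Text; using System.Xml;
public static string Serialize<T>(T obj)
{
    StringBuilder sb = new StringBuilder();
    DataContractSerializer ser = new DataContractSerializer(typeof(T));
    // Dispose the writer so the buffered XML is flushed into the StringBuilder before it is read.
    using (XmlWriter writer = XmlWriter.Create(sb))
    {
        ser.WriteObject(writer, obj);
    }
    return sb.ToString();
}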
You want two tables. One to store an index, the other to store the decimal values. Something like this:
create table arrayKey (
arrayId int identity(1,1) not null
)
create table arrayValue (
arrayID int not null,
sequence int identity(1,1) not null,
storedDecimal decimal(12,2) not null
)
Insert into arrayKey to get an ID to use. All of the decimal values would get stored into arrayValue using the ID and the decimal value to store. Insert them one at a time.
When you retrieve them, you can group them by arrayID so that they all come out together. If you need to retrieve them in the same order you stored them, sort by sequence.
Although any given example might be impractical, via programming you can engineer any shape of peg into any shape of hole.
You could serialize your data for storage in a varbinary, XML-ize it for storage into a SQL Server XML type, etc.
Pick a route to analyze and carefully consider. You can create custom CLR libraries for SQL as well so the virtual sky is the limit.

VARCHAR collation versus VARBINARY ordering in SQL Server 2000

I need to do some in-memory merging in C# of two sorted streams of strings coming from one or more SQL Server 2000 databases into a single sorted stream. These streams of data can be huge, so I don't want to pull both streams into memory. Instead, I need to keep one item at a time from each stream in memory and at each step, compare the current item from each stream, push the minimum onto the final stream, and pull the next item from the appropriate source stream. To do this correctly, though, the in-memory comparison has to match the collation of the database (consider the streams [A,B,C] and [A,B,C]: the correct merged sequence is [A,A,B,B,C,C], but if your in-memory comparison thinks C < B, your in-memory merge will yield A,A,B, at which point it will be looking at a B and a C, and will yield the C, resulting in an incorrectly sorted stream.)
So, my question is: is there any way to mimic any of the collations in SQL Server 2000 with a System.StringComparison enum in C#, or vice versa? The closest I've come is to use System.StringComparison.Ordinal with the results of the database strings converted to VARBINARY with the standard VARBINARY ordering, which works, but I'd rather just add an "order by name collate X" clause to my SQL queries, where X is some collation that works exactly like the VARBINARY ordering, rather than converting all strings to VARBINARY as they leave the database and then back to strings as they come into memory.
Have a look at the StringComparer class. This provides for more robust character and string comparisons than you'll find with String.Compare. There are three sets of static instances (CurrentCulture, InvariantCulture, Ordinal) and case-insensitive versions of each. For more specialized cultures, you can use the StringComparer.Create() function to create a comparer tied to a particular culture.
With sql 2005 I know that the db engine does not make OS calls to do the sorting, the ordering rules are statically shipped with the db (may update with a service pack, but doesn't change with the OS). So I don't think you can safely say that a given set of application code can order the same way unless you have the same code as the db server, unless you use a binary collation.
But if you use a binary collation in the db and client code you should have no problem at all.
EDIT - any collation that ends in _BIN will give you binary sorting. The rest of the collation name will determine what code page is used for storing CHAR data, but will not affect the ordering. The _BIN means strictly binary sorting. See http://msdn.microsoft.com/en-us/library/ms143515(SQL.90).aspx
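The merge step described in the question can be sketched as a lazy iterator that takes the comparer as a parameter. `SortedMerge` is a hypothetical name, and whether `StringComparer.Ordinal` really matches a given _BIN collation is an assumption that should be verified against your data; the question reports that it matched the VARBINARY ordering.

```csharp
using System.Collections.Generic;

public static class SortedMerge
{
    // Lazily merges two already-sorted sequences, holding one item from each in memory.
    // The comparer must order strings the same way the database sorted them.
    public static IEnumerable<string> Merge(
        IEnumerable<string> left, IEnumerable<string> right, IComparer<string> comparer)
    {
        using (IEnumerator<string> l = left.GetEnumerator())
        using (IEnumerator<string> r = right.GetEnumerator())
        {
            bool hasLeft = l.MoveNext();
            bool hasRight = r.MoveNext();
            while (hasLeft || hasRight)
            {
                // Take from the left when the right is exhausted or the left item is not greater.
                if (hasLeft && (!hasRight || comparer.Compare(l.Current, r.Current) <= 0))
                {
                    yield return l.Current;
                    hasLeft = l.MoveNext();
                }
                else
                {
                    yield return r.Current;
                    hasRight = r.MoveNext();
                }
            }
        }
    }
}
```

A call such as `SortedMerge.Merge(first, second, StringComparer.Ordinal)` then streams the combined result; if the comparer disagrees with the order the database used, the output is no longer sorted, which is the failure the question describes.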
