I'm working with a binary file format. I'll try to make this as simple as possible because it's quite hard to explain.
The data structures that get written to the file may contain 'pointers' (i.e. a pointer to a string that is in another location in the file, or a pointer to another structure within the file). A better word for 'pointer' would be 'offset': a structure contains the OFFSET of the string within the file.
A quick example:
struct ExampleStruct {
public string Text;
public int Number;
};
The 'Text' string member will be written at the beginning of the file, and NOT be included in the serialized struct.
So, essentially, the struct will look like this in the file:
struct ExampleStruct {
public uint TextLocationOffset;
public int Number;
};
...'TextLocationOffset' is an offset to where the string 'Text' is located within the file.
So, after I have that, I then need a "relocation table" - essentially a list of double pointers that point to data pointers within the file. (does that make sense?)
So, since I wrote that ExampleStruct to my file, and it contains a 'pointer' (TextLocationOffset), my "relocation table" would consist of:
public uint TextLocationOffset_LocationOffset;
...'TextLocationOffset_LocationOffset' contains the OFFSET of 'TextLocationOffset' within the file.
Does that all make sense? I tried to simplify it as much as possible.
My problem is, how would I keep track of all the pointers/double pointers/relocations in C#? Data is constantly being added to the byte[] array that I have, so offsets will be changing constantly.
This is easy in C++, because I can get a double pointer of whatever is being 'relocated', and then I can change the original 'pointer' (in my example, 'TextLocationOffset') to the correct offset, and I can then find the location of the 'TextLocationOffset' value and add that to my relocation table.
Sorry if that makes no sense. I tried asking this a few weeks ago but got no replies, I might be making it sound confusing.
I just need a way to keep track of all of these in my code... Any tips?
P.S. If you need more thorough examples I'll be happy to provide. :)
Use a database table - all this work has been done.
You may want to look at other ways to serialize and deserialize your data. Keeping track of relocations and offsets in a managed application is doing the unnecessary unless you have extremely exceptional scenarios. SO users could better guide you if you let us know more about what you are trying to achieve in terms of functionality.
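If the format really does need the relocation table, one way to keep track of it in C# is to stop chasing live pointers and instead record a "fixup" every time an offset field is written, then patch all of them once at the end. The sketch below is only an illustration under that assumption: RelocatingWriter, its method names, the string keys, and the choice to append the relocation table at the end of the buffer are all hypothetical and not part of the format described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: remembers where each offset field lives so it can be
// patched once the final position of its target is known.
public sealed class RelocatingWriter
{
    private readonly MemoryStream _stream = new MemoryStream();
    private readonly BinaryWriter _writer;
    // (position of the offset field, key of the data it must point to)
    private readonly List<(long FieldPos, string Key)> _fixups = new List<(long FieldPos, string Key)>();
    // key -> offset at which that data was written
    private readonly Dictionary<string, uint> _targets = new Dictionary<string, uint>();

    public RelocatingWriter() { _writer = new BinaryWriter(_stream); }

    public void WriteInt32(int value) => _writer.Write(value);

    // Writes a 4-byte placeholder and records that it must later point at 'key'.
    public void WritePointerTo(string key)
    {
        _fixups.Add((_stream.Position, key));
        _writer.Write(0u);
    }

    // Writes a zero-terminated UTF-8 string (an assumed encoding) and records its offset.
    public void WriteString(string key, string text)
    {
        _targets[key] = (uint)_stream.Position;
        _writer.Write(Encoding.UTF8.GetBytes(text));
        _writer.Write((byte)0);
    }

    // Patches every placeholder, then appends the relocation table:
    // one uint per fixup holding the offset of the offset field itself.
    public byte[] Finish(out uint relocationTableOffset)
    {
        foreach (var (fieldPos, key) in _fixups)
        {
            _stream.Position = fieldPos;
            _writer.Write(_targets[key]);
        }
        _stream.Position = _stream.Length;
        relocationTableOffset = (uint)_stream.Position;
        foreach (var (fieldPos, _) in _fixups)
            _writer.Write((uint)fieldPos);
        _writer.Flush();
        return _stream.ToArray();
    }
}
```

Because nothing is patched until Finish is called, it does not matter how often the buffer grows in the meantime; the recorded positions stay valid as long as data is only appended.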
Related
I need guidance, someone to point me in the right direction. As the title says, I need to save information to a file: Date, string, integer and an array of integers. And I also need to be able to access that information later, when a user wants to review it.
Optional: File is plain text and I can directly check it and it is understandable.
Bonus points if chosen method can be "easily" converted to working with a database in the future instead of individual files.
I'm pretty new to C# and what I've found so far is that I should turn the array into a string with separators.
So, what'd you guys suggest?
// JSON.Net
string json = JsonConvert.SerializeObject(objOrArray);
File.WriteAllText(path, json);
// (note: can also use File.Create etc if don't need the string in memory)
or...
using(var file = File.Create(path)) { // protobuf-net
Serializer.Serialize(file, objOrArray);
}
The first is readable; the second will be smaller. Both will cope fine with "Date, string, integer and an array of integers", or an array of such objects. Protobuf-net would require adding some attributes to help it, but really simple.
As for working with a database as columns... the array of integers is the glitch there, because most databases don't support "array of integers" as a column type. I'd say "separation of concerns" - have a separate model for DB persistence. If you are using the database purely to store documents, then: pretty much every DB will support CLOB and BLOB data, so either is usable. Many databases now have inbuilt JSON support (helper methods, etc), which might make JSON as a CLOB more tempting.
I would probably serialize this to json and save it somewhere. Json.Net is a very popular way.
The advantage of this is also creating a class that can be later used to work with an Object-Relational Mapper.
var userInfo = new UserInfoModel();
// write the data (overwrites)
using (var stream = new StreamWriter(@"path/to/your/file.json", append: false))
{
stream.Write(JsonConvert.SerializeObject(userInfo));
}
//read the data
using (var stream = new StreamReader(@"path/to/your/file.json"))
{
userInfo = JsonConvert.DeserializeObject<UserInfoModel>(stream.ReadToEnd());
}
public class UserInfoModel
{
public DateTime Date { get; set; }
// etc.
}
For the plain-text file, you're right.
Use one line for each entry:
Date
string
Integer
Array of Integer
When you read the file in your code, you can easily separate the entries by reading it line by line.
Turn the array into a string with a specific separator:
[1,2,3] -> "1,2,3"
When you read that line back, split the string on "," to get an array of strings, then parse each entry to int into an int array of the same length.
For how to read and write the file, have a look at "Easiest way to read from and write to files".
If you really want to switch to a database at some point, try a JSON format for your file. It is easy to handle and there are some good libraries to work with.
Kind regards,
Henne
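To make the line-per-entry idea above concrete, here is a minimal sketch. The class name PlainTextRecord and its Save/Load methods are made up for illustration, and it assumes the string never contains a line break (otherwise the four-line layout falls apart).

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper: stores one record as four lines of plain text.
public static class PlainTextRecord
{
    public static void Save(string path, DateTime date, string text, int number, int[] values)
    {
        File.WriteAllLines(path, new[]
        {
            date.ToString("o", CultureInfo.InvariantCulture),        // line 1: round-trippable date
            text,                                                     // line 2: the string (no line breaks assumed)
            number.ToString(CultureInfo.InvariantCulture),            // line 3: the integer
            string.Join(",", values.Select(v => v.ToString(CultureInfo.InvariantCulture))) // line 4: "1,2,3"
        });
    }

    public static (DateTime Date, string Text, int Number, int[] Values) Load(string path)
    {
        string[] lines = File.ReadAllLines(path);
        // An empty fourth line means an empty array; otherwise split and parse each entry.
        int[] values = lines[3].Length == 0
            ? new int[0]
            : lines[3].Split(',').Select(s => int.Parse(s, CultureInfo.InvariantCulture)).ToArray();
        return (
            DateTime.Parse(lines[0], CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
            lines[1],
            int.Parse(lines[2], CultureInfo.InvariantCulture),
            values);
    }
}
```

Using the invariant culture on both sides keeps the file readable on machines with different regional settings.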
The way I got started with C# is via the game Space Engineers on Steam. The mods need to save a file locally (%AppData%\Local\Temp\SpaceEngineers\ or %AppData%\Roaming\SpaceEngineers\Storage\) for various settings, and their logging is similar to what @H. Sandberg mentioned (line by line, perhaps with a separator to parse on later). The upside is that it's easy to retrieve, easy to append, and easy to overwrite. I'm also pretty sure it's possible to retrieve the file size, which, combined with file deletion and file creation, lets you set an upper limit to check against and so prevent runaway file sizes; that allows you to run it on a server with minimal impact. It's probably best to include a minimum date filter as well (make sure X is at least Y days old before deleting it for being over Z bytes) to avoid losing debugging data ("Why was it over that limit?").
As far as the actual code behind the idea goes, I'm at approximately the same skill level as the OP, which is to say, rookie. I would advise looking at the code in the Space Engineers mods for some samples (plus it's not half bad for a beta game), as they are almost all written in C#. The Programmable Blocks compile C# as well, so you'll be able to use them both to assist in learning C# and to reinforce what you already know. Certain C# commands aren't allowed there for security reasons, but with the Mod API you'll have more flexibility to do things such as creating and maintaining log files or retrieving and modifying object properties. You can even print text to various in-game text monitors.
I apologise if my Syntax needs some work, and I'm sorry I am not currently capable of just whipping up some Code to solve your issue, but I do know
using System;
Console.WriteLine("Hello World");
so at least it's not a total loss, but my example Code likely won't compile, since it's likely missing things like: an Output Location, perhaps an API reference or two, and probably a few other settings. Like I said, I'm New, but that is a valid C# Command, I know I got that part correct.
Edit: here's a better attempt:
using System;
class Test
{
static void Main()
{
string a = "Hello Hal, ";
string b = "Please open the Airlock Doors.";
string c = "I'm sorry Dave, ";
string d = "I'm afraid I can't do that.";
Console.WriteLine(a + b);
Console.WriteLine(c + d);
Console.Read();
}
}
This:
"Hello Hal, Please open the Airlock Doors."
"I'm sorry Dave, I'm afraid I can't do that."
Should be the result. (the "Quotation Marks" shouldn't appear in the readout {the last Code Block}, that's simply to improve readability)
Implementing an ISO creator (mostly as an intellectual exercise), I need to store CD data structure values.
For example, from page 47 of the specification https://www.cs.cmu.edu/~varun/cs315p/iso9660.pdf , in order to store a directory in ISO9660 format I need to store information about various byte fields.
There are other tables, such as Path Tables, that store fields, and the total number of fields would be in the hundreds.
So I could either
1 - Store these in .cs files, 'hard coding' essentially
public byte length=1;
public byte ExtendedAttributeLength = 0;
//and so on
2 - I could store these as constants in .cs files
3 - I could store these in XML files
4 - I could even store these in a database table
Considering that it is unlikely for these values to change, but not impossible, I'm not sure which way I should store the values.
Thank you
I think you can define all those specs as structs with explicit layout. This way you can specify the offset for each field:
[StructLayout(LayoutKind.Explicit, Size=16, CharSet=CharSet.Ansi)]
public struct DirectoryRecord
{
[FieldOffset(0)]
byte RecordLength;
[FieldOffset(1)]
byte ExtendedAttrRecordLength;
...
}
You can then serialize these to byte arrays and presumably save them as parts of the ISO image header or whatever their place is in there.
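For the "serialize these to byte arrays" step, one common approach is to go through the Marshal class. The sketch below assumes a cut-down, hypothetical two-field struct (RecordHeader) rather than the full directory record, and the helper name StructBytes.ToBytes is made up for the example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical two-field header used only for this example;
// it is not the complete ISO 9660 directory record.
[StructLayout(LayoutKind.Explicit, Size = 2)]
public struct RecordHeader
{
    [FieldOffset(0)] public byte RecordLength;
    [FieldOffset(1)] public byte ExtendedAttrRecordLength;
}

public static class StructBytes
{
    // Copies the marshalled layout of a struct into a new byte array.
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf<T>();
        byte[] buffer = new byte[size];
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, ptr, false);   // lay the struct out in unmanaged memory
            Marshal.Copy(ptr, buffer, 0, size);          // copy those bytes back into managed memory
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
        return buffer;
    }
}
```

Multi-byte fields come out in the machine's native byte order this way, so fields that the specification stores big-endian or in both byte orders would still need handling by hand.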
However, if you look at one of the existing C# ISO9660 implementations such as .NET DiscUtils on CodePlex you will see that they do things a bit differently.
For DirectoryRecord they have defined a class with ReadFrom and WriteTo methods and it takes care of reading from appropriate offsets in an input stream. This is one option. On top of that they have some other component that reads the file and delegates reading to sub-components such as this one.
So, you could do it like them. Or you could do it with structs as I mentioned before and have them behave like POCOs only, with no extra reading and writing logic. You'll have to do that somewhere else.
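To illustrate the ReadFrom/WriteTo style, here is a rough sketch. It is not the DiscUtils code; the class name DirectoryRecordSketch and its members are invented for the example, it works on a byte[] rather than a stream, and it covers only the first three fields of a directory record (the two length bytes, then the extent location stored in both byte orders), so check the offsets against the specification before relying on it.

```csharp
using System;

// Hypothetical, partial directory record: only the first 10 bytes are handled.
public sealed class DirectoryRecordSketch
{
    public const int Size = 10;

    public byte RecordLength;
    public byte ExtendedAttrRecordLength;
    public uint LocationOfExtent;

    // Writes the fields at their fixed offsets relative to 'offset'.
    public void WriteTo(byte[] buffer, int offset)
    {
        buffer[offset] = RecordLength;
        buffer[offset + 1] = ExtendedAttrRecordLength;
        for (int i = 0; i < 4; i++)
        {
            byte b = (byte)(LocationOfExtent >> (8 * i));
            buffer[offset + 2 + i] = b;   // bytes 2..5: little-endian copy
            buffer[offset + 9 - i] = b;   // bytes 6..9: big-endian copy
        }
    }

    // Reads the same fields back, using the little-endian copy of the extent.
    public static DirectoryRecordSketch ReadFrom(byte[] buffer, int offset)
    {
        uint extent = 0;
        for (int i = 0; i < 4; i++)
            extent |= (uint)buffer[offset + 2 + i] << (8 * i);

        return new DirectoryRecordSketch
        {
            RecordLength = buffer[offset],
            ExtendedAttrRecordLength = buffer[offset + 1],
            LocationOfExtent = extent
        };
    }
}
```

The advantage over the explicit-layout struct is that byte order and odd field encodings are handled in one obvious place, at the cost of writing the offsets out by hand.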
I'm working on high-performance code in which this construct is part of the performance-critical section.
This is what happens in some section:
A string is 'scanned' and metadata is stored efficiently.
Based upon this metadata chunks of the main string are separated into a char[][].
That char[][] should be transferred into a string[].
Now, I know you can just call new string(char[]) but then the result would have to be copied.
To avoid this extra copy step, I guess it must be possible to write directly to the string's internal buffer, even though this would be an unsafe operation (and I know this brings lots of implications like overflow and forward compatibility).
I've seen several ways of achieving this, but none I'm really satisfied with.
Does anyone have true suggestions as to how to achieve this?
Extra information:
The actual process doesn't include converting to char[] necessarily, it's practically a 'multi-substring' operation. Like 3 indexes and their lengths appended.
The StringBuilder has too much overhead for the small number of concats.
EDIT:
Due to some vague aspects of what it is exactly that I'm asking, let me reformulate it.
This is what happens:
Main string is indexed.
Parts of the main string are copied to a char[].
The char[] is converted to a string.
What I'd like to do is merge step 2 and 3, resulting in:
Main string is indexed.
Parts of the main string are copied to a string (and the GC can keep its hands off of it during the process by proper use of the fixed keyword?).
And a note is that I cannot change the output type from string[], since this is an external library, and projects depend on it (backward compatibility).
I think that what you are asking to do is to 'carve up' an existing string in-place into multiple smaller strings without re-allocating character arrays for the smaller strings. This won't work in the managed world.
For one reason why, consider what happens when the garbage collector comes by and collects or moves the original string during a compaction: all of those other strings 'inside' of it are now pointing at some arbitrary other memory, not the original string you carved them out of.
EDIT: In contrast to the character-poking involved in Ben's answer (which is clever but IMHO a bit scary), you can allocate a StringBuilder with a pre-defined capacity, which eliminates the need to re-allocate the internal arrays. See http://msdn.microsoft.com/en-us/library/h1h0a5sy.aspx.
What happens if you do:
string s = GetBuffer();
fixed (char* pch = s) {
pch[0] = 'R';
pch[1] = 'e';
pch[2] = 's';
pch[3] = 'u';
pch[4] = 'l';
pch[5] = 't';
}
I think the world will come to an end (Or at least the .NET managed portion of it), but that's very close to what StringBuilder does.
Do you have profiler data to show that StringBuilder isn't fast enough for your purposes, or is that an assumption?
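For what it's worth, newer runtimes have a supported way to do this kind of character-poking: string.Create allocates the string and hands you its buffer as a span to fill, so no intermediate char[] is needed. It is available in .NET Core 2.1 and later but not in the classic .NET Framework, so this only helps if the library can target that. A minimal sketch of the "multi-substring" case, where SubstringJoiner and its Join method are names invented for the example:

```csharp
using System;

public static class SubstringJoiner
{
    // Concatenates several (index, length) slices of 'source' into one new string,
    // copying the characters straight into the new string's buffer.
    public static string Join(string source, (int Index, int Length)[] parts)
    {
        int total = 0;
        foreach (var p in parts) total += p.Length;

        // The callback receives the writable buffer of the string being created.
        return string.Create(total, (source, parts), (span, state) =>
        {
            int pos = 0;
            foreach (var (index, length) in state.parts)
            {
                state.source.AsSpan(index, length).CopyTo(span.Slice(pos));
                pos += length;
            }
        });
    }
}
```

The characters are still copied once (from the main string into the result), but the second copy from char[] to string goes away, and the result is an ordinary string that fits into a string[].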
Just create your own addressing system instead of trying to use unsafe code to map to an internal data structure.
Mapping a string (which is also readable as a char[]) to an array of smaller strings is no different from building a list of address information (index & length of each substring). So make a new List<Tuple<int,int>> instead of a string[] and use that data to return the correct string from your original, unaltered data structure. This could easily be encapsulated into something that exposed string[].
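A rough sketch of that addressing idea, using value tuples instead of Tuple<int,int> for brevity. SubstringIndex is a made-up name; the point is only that substrings are created when somebody actually asks for them, while the string[] that the external contract requires can still be produced on demand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: keeps (index, length) pairs into one unaltered source string.
public sealed class SubstringIndex
{
    private readonly string _source;
    private readonly List<(int Index, int Length)> _ranges = new List<(int Index, int Length)>();

    public SubstringIndex(string source) { _source = source; }

    public int Count => _ranges.Count;

    // Records where a substring lives; nothing is copied here.
    public void Add(int index, int length) => _ranges.Add((index, length));

    // Materializes one substring only when it is requested.
    public string this[int i] => _source.Substring(_ranges[i].Index, _ranges[i].Length);

    // Produces the string[] expected by existing callers.
    public string[] ToArray()
    {
        var result = new string[_ranges.Count];
        for (int i = 0; i < result.Length; i++) result[i] = this[i];
        return result;
    }
}
```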
In .NET, there is no way to create an instance of String which shares data with another string. Some discussion on why that is appears in this comment from Eric Lippert.
I am trying to write a C# application based on an existing application that was developed in Delphi.
It has been very tough, but I have managed somehow until now. Now I have come across a problem...
The Delphi code contains the following:
type
TFruit = record
name : string[20];
case isRound : Boolean of // Choose how to map the next section
True :
(diameter : Single); // Maps to same storage as length
False :
(length : Single; // Maps to same storage as diameter
width : Single);
end;
i.e. a variant record (with a case statement inside); the record is constructed, and its size determined, accordingly.
On the other hand, I am trying to do the same with a C# struct and haven't succeeded yet. I hope someone can help me here.
So just let me know if there's any way I can implement this in C#.
Thanks in advance....
You could use an explicit struct layout to replicate this Delphi variant record. However, I would not bother since it seems pretty unlikely that you really want assignment to diameter to assign also to length, and vice versa. That Delphi record declaration looks like it dates from mid-1990s style of Delphi coding. Modern Delphi code would seldom be written that way.
I would just do it like this:
struct Fruit
{
string name;
bool isRound;
float diameter; // only valid when isRound is true
float length; // only valid when isRound is false
float width; // only valid when isRound is false
}
A more elegant option would be a class with properties for each struct field. And you would arrange that the property getters and setters for the 3 floats raised exceptions if they were accessed for an invalid value of isRound.
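A minimal sketch of that class-based option might look like the following. The name GuardedFruit and the choice of InvalidOperationException are just assumptions for illustration.

```csharp
using System;

// Hypothetical class version: the three floats can only be touched
// when IsRound has the matching value.
public sealed class GuardedFruit
{
    private float _diameter, _length, _width;

    public string Name { get; set; }
    public bool IsRound { get; set; }

    public float Diameter
    {
        get { Require(true, nameof(Diameter)); return _diameter; }
        set { Require(true, nameof(Diameter)); _diameter = value; }
    }

    public float Length
    {
        get { Require(false, nameof(Length)); return _length; }
        set { Require(false, nameof(Length)); _length = value; }
    }

    public float Width
    {
        get { Require(false, nameof(Width)); return _width; }
        set { Require(false, nameof(Width)); _width = value; }
    }

    // Throws when a member is accessed for the wrong kind of fruit.
    private void Require(bool expectedIsRound, string member)
    {
        if (IsRound != expectedIsRound)
            throw new InvalidOperationException(
                member + " is not valid when IsRound is " + IsRound + ".");
    }
}
```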
Perhaps this will do the trick?
This is NOT a copy-and-paste solution, note that the offsets and data sizes may need to be changed depending on how the Delphi structure is declared and/or aligned.
[StructLayout(LayoutKind.Explicit)]
unsafe struct Fruit
{
[FieldOffset(0)] public fixed char name[20];
[FieldOffset(20)] public bool IsRound;
[FieldOffset(21)] public float Diameter;
[FieldOffset(21)] public float Length;
[FieldOffset(25)] public float Width;
}
It depends on what you are trying to do.
If you're simply trying to make a corresponding structure then look at David Heffernan's answer. These days there is little justification for mapping two fields on top of each other unless they truly represent the same thing. (Say, individual items or the same items in an array.)
If you're actually trying to share files you need to look along the lines of ananthonline's answer but there's a problem with it that's big enough I couldn't put it in a comment:
Not only is there the Unicode issue but a Delphi shortstring has no corresponding structure in C# and thus it's impossible to simply map a field on top of it.
That string[20] actually comprises 21 bytes, a one-byte length code and then 20 characters worth of data. You have to honor the length code as there is no guarantee of what lies beyond the specified length--you're likely to find garbage there. (Hint: If the record is going to be written to disk always zap the field before putting new data in it. It makes it much easier to examine the file on disk when debugging.)
Thus you need to declare two fields and write code to process it on both ends.
Since you have to do that anyway I would go further and write code to handle the rest of it so as to eliminate the need for unsafe code at all.
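As a sketch of what such hand-written code could look like, here is a reader/writer built on BinaryReader and BinaryWriter. It rests on several assumptions that you would have to verify against the real files: that the Delphi record is declared packed (so 21 + 1 + 4 + 4 = 30 bytes with no padding), that names are plain ASCII, and that the class and member names (FruitRecord, LengthOrDiameter) are mine, not anything from the Delphi side.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical managed counterpart of TFruit, assuming a packed 30-byte record.
public sealed class FruitRecord
{
    public string Name = "";
    public bool IsRound;
    public float LengthOrDiameter;   // diameter when IsRound, otherwise length
    public float Width;              // only meaningful when IsRound is false

    public static FruitRecord ReadFrom(BinaryReader reader)
    {
        byte nameLength = reader.ReadByte();          // shortstring length byte
        byte[] nameBytes = reader.ReadBytes(20);      // always 20 bytes of storage
        int used = Math.Min((int)nameLength, 20);     // ignore whatever lies beyond the length

        var record = new FruitRecord();
        record.Name = Encoding.ASCII.GetString(nameBytes, 0, used);
        record.IsRound = reader.ReadByte() != 0;
        record.LengthOrDiameter = reader.ReadSingle();
        record.Width = reader.ReadSingle();
        return record;
    }

    public void WriteTo(BinaryWriter writer)
    {
        byte[] nameBytes = new byte[20];              // zero-filled, so no garbage past the length
        int used = Encoding.ASCII.GetBytes(Name, 0, Math.Min(Name.Length, 20), nameBytes, 0);
        writer.Write((byte)used);
        writer.Write(nameBytes);
        writer.Write(IsRound);                        // one byte, 0 or 1
        writer.Write(LengthOrDiameter);
        writer.Write(IsRound ? 0f : Width);           // unused slot written as zero
    }
}
```

If the record is not packed, Delphi will normally insert alignment padding before the Single fields, and the reader would need to skip those bytes.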
I've been looking to do some binary serialization to file and protobuf-net seems like a well-performing alternative. I'm a bit stuck in getting started though. Since I want to decouple the definition of the classes from the actual serialization, I'm not using attributes but opting to go with .proto files. I've got the structure for the object down (I think):
message Post {
required uint64 id = 1;
required int32 userid = 2;
required string status= 3;
required datetime created = 4;
optional string source= 5;
}
(is datetime valid or should I use ticks as int64?)
but I'm stuck on how to use protogen and then serialize an IEnumerable of Post to a file and read it back. Any help would be appreciated.
Another related question: are there any best practices for detecting corrupted binary files, e.g. if the computer is shut down while serializing?
Re DateTime... this isn't a standard proto type; I have added a BCL.DateTime (or similar) to my own library, which is intended to match the internal serialization that protobuf-net uses for DateTime, but I'm fairly certain I haven't (yet) updated the code-generator to detect this as a special-case. It would be fairly easy to add if you want me to try... If you want maximum portability, a "ticks" style approach might be pragmatic. Let me know...
Re serializing to a file - it should be about the same as the Getting Started example, but note that protobuf-net wants to work with data it can reconstruct; just IEnumerable<T> might cause problems - IList<T> should be fine, though (it'll default to List<T> as a concrete type when reconstructing).
Re corruption - perhaps use SerializeWithLengthPrefix - it can then detect issues even at a message boundary (where they are otherwise undetectable as an EOF). This (as the name suggests) writes the length first, so it knows whether it has enough data (via DeserializeWithLengthPrefix). Alternatively, reserve the first [n] bytes in your file for a hash / checksum. Write this blank spacer, then the data, calculate the hash / checksum and overwrite the start. Verify during deserialization. Much more work.
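The hash / checksum variant can be sketched with nothing but the base class library. This is only an illustration: ChecksummedFile and its methods are invented names, SHA-256 is an arbitrary choice of hash, and the payload is assumed to be a byte[] you already have (for instance the serializer's output written to a MemoryStream first).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChecksummedFile
{
    private const int HashSize = 32;   // SHA-256 produces 32 bytes

    public static void Write(string path, byte[] payload)
    {
        using (var file = File.Create(path))
        using (var sha = SHA256.Create())
        {
            file.Write(new byte[HashSize], 0, HashSize);   // blank spacer first
            file.Write(payload, 0, payload.Length);        // then the data
            byte[] hash = sha.ComputeHash(payload);
            file.Position = 0;                             // finally overwrite the start
            file.Write(hash, 0, HashSize);
        }
    }

    // Returns the payload, or throws InvalidDataException if the file is damaged.
    public static byte[] ReadVerified(string path)
    {
        byte[] all = File.ReadAllBytes(path);
        if (all.Length < HashSize) throw new InvalidDataException("File too short.");

        using (var sha = SHA256.Create())
        {
            byte[] actual = sha.ComputeHash(all, HashSize, all.Length - HashSize);
            for (int i = 0; i < HashSize; i++)
                if (actual[i] != all[i]) throw new InvalidDataException("Checksum mismatch.");
        }

        byte[] payload = new byte[all.Length - HashSize];
        Buffer.BlockCopy(all, HashSize, payload, 0, payload.Length);
        return payload;
    }
}
```

If the machine dies before the final overwrite, the header is still the blank spacer and verification fails, which is exactly the signal you want.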