Let's say we have one structure :
[StructLayout(LayoutKind.Explicit, Size=8)] // using System.Runtime.InteropServices;
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
}
What I want: direct access to both a string and an int value for the field Ident in this structure, without breaking the 8-byte size of the structure and without having to compute the string from the int each time.
Keeping Ident as an int is useful because I can quickly compare it with other idents. Those other idents may come from data unrelated to this structure, but they use the same int format.
Question: is there a way to define a field that is not part of the structure layout? Like:
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader {
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
[NoOffset()] // <- is there something I can do the like of this
string _identStr;
public string IdentStr {
get { // EDIT ! missed the getter on this property
if (string.IsNullOrEmpty(_identStr)) _identStr =
System.Text.Encoding.ASCII.GetString(Ident.GetBytes());
// do the above only once. May use an extra private bool field to go faster.
return _identStr;
}
}
}
PS: I use pointers ('*' and '&', unsafe) because I need to deal with endianness (local system, binary files/file format, network), fast type conversions and fast array filling. I also use many flavours of Marshal methods (fixing structures on byte arrays), and a little P/Invoke and COM interop. Too bad some assemblies I'm dealing with don't have a .NET counterpart yet.
TL;DR; For details only
The question is all it is about, I just don't know the answer. The following should answer most questions like "other approaches", or "why not do this instead", but could be ignored as the answer would be straightforward. Anyway, I preemptively put everything so it's clear from the start what am I trying to do. :)
Options/Workaround I'm currently using (or thinking of using) :
Create a getter (not a field) that computes the string value each time :
public string IdentStr {
get { return System.Text.Encoding.ASCII.GetString(Ident.GetBytes()); }
// where GetBytes() is an extension method that converts an int to byte[]
}
This approach does the job but performs poorly: the GUI displays aircraft from a database of default flights and injects other flights from the network with a refresh rate of one second (I should increase that to 5 seconds). I have around 1200 flights within an area, relating to 2400 airports (departure and arrival), meaning I have 2400 calls to the above code each second to display the ident in a DataGrid.
Create another struct (or class), which only purpose is to manage
data on GUI side, when not reading/writing to a stream or file. That means, read
the data with the explicit layout struct. Create another struct with
the string version of the field. Work with GUI. That will perform
better on an overall point of view, but, in the process of defining
structures for the game binaries, I'm already at 143 structures of
the kind (just with older versions of the game datas; there are a bunch I didn't write yet, and I plan to add structures for the newest datas types). ATM, more than half of them require one or more extra
fields to be of meaningful use. It's okay if I were the only one to use the assembly, but
other users will probably get lost with AirportHeader,
AirportHeaderEx, AirportEntry, AirportEntryEx,
AirportCoords, AirportCoordsEx.... I would avoid doing that.
Optimize option 1 to make computations perform faster (thanks to SO,
there are a bunch of ideas to look for - currently working on the idea). For the Ident field, I
guess I could use pointers (and I will). Already doing it for fields I must display in little endian and read/write in big
endian. There are other values, like 4x4 grid informations that are
packed in a single Int64 (ulong), that needs bit shifting to
expose the actual values. Same for GUIDs or objects pitch/bank/yaw.
Try to take advantage of overlapping fields (under study). That would work for GUIDs. It might work for the Ident example if MarshalAs can constrain the
value to an ASCII string. Then I just need to specify the same
FieldOffset, '0' in this case. But I'm unsure whether setting the field
value (entry.FieldStr = "FMEP";) actually applies the Marshal constraint on the managed code side. My understanding is that the string is stored as Unicode on the managed side (?).
Furthermore, that wouldn't work for packed bits (bytes that contain
several values, or consecutive bytes hosting values that have to be
bit shifted). I believe it is impossible to specify value position, length and format
at bit level.
Why bother ? context :
I'm defining a bunch of structures to parse binary datas from array of bytes (IO.File.ReadAllBytes) or streams, and write them back, datas related to a game. Application logic should use the structures to quickly access and manipulate the datas on demand. Assembly expected capabilities is read, validate, edit, create and write, outside the scope of the game (addon building, control) and inside the scope of the game (API, live modding or monitoring). Other purpose is to understand the content of binaries (hex) and make use of that understanding to build what's missing in the game.
The purpose of the assembly is to provide a ready to use basis components for a c# addon contributor (I don't plan to make the code portable). Creating applications for the game or processing addon from source to compilation into game binaries. It's nice to have a class that loads the entire content of a file in memory, but some context require you to not do that, and only retrieve from the file what is necessary, hence the choice of the struct pattern.
I need to figure out the trust and legal issues (copyrighted data) but that's outside the scope of the main concern. If that matter, Microsoft did provide over the years public freely accessible SDKs exposing binaries structures on previous versions of the game, for the purpose of what I'm doing (I'm not the first and probably not the last to do so). Though, I wouldn't dare to expose undocumented binaries (for the latest game datas for instance), nor facilitate a copyright breach on copyrighted materials/binaries.
I'm just asking for confirmation whether there is a way to have private fields that are not part of the structure layout. My naive belief ATM is "that's impossible, but there are workarounds". My C# experience is pretty sparse, so maybe I'm wrong, which is why I ask. Thanks!
As suggested, there are several ways to get the job done. Here are the getters/setters I came up with within the structure. I'll measure how each performs in various scenarios later. The dict approach is very appealing because in many scenarios I would need a directly accessible global database of (59000) airports with runways and parking spots (not just the Ident), but a fast check between struct fields is also interesting.
public string IdentStr_Marshal {
get {
var output = "";
GCHandle pinnedHandle = default(GCHandle); // initialized up front to avoid CS0165 (use of unassigned local)
try { // Fast if no exception, (very) slow if exception thrown
pinnedHandle = GCHandle.Alloc(this, GCHandleType.Pinned);
IntPtr structPtr = pinnedHandle.AddrOfPinnedObject();
output = Marshal.PtrToStringAnsi(structPtr, 4);
// Cannot use UTF8 because the assembly should work in Framework v4.5
} finally { if (pinnedHandle.IsAllocated) pinnedHandle.Free(); }
return output;
}
set {
value = (value ?? "").PadRight(4); // Must fill the blanks; PadRight returns a new string, so assign it back (Charlieface's)
IntPtr intValuePtr = IntPtr.Zero;
// Cannot use UTF8 because some users are on Win7 with FlightSim 2004
try { // Put a try as a matter of habit, but not convinced it's gonna throw.
intValuePtr = Marshal.StringToHGlobalAnsi(value);
Ident = Marshal.ReadInt32(intValuePtr, 0).BinaryConvertToUInt32(); // Extension method to convert type.
} finally { Marshal.FreeHGlobal(intValuePtr); } // freeing the right pointer
}
}
public unsafe string IdentStr_Pointer {
get {
string output = "";
fixed (UInt32* ident = &Ident) { // Fixing the field
sbyte* bytes = (sbyte*)ident;
output = new string(bytes, 0, 4, System.Text.Encoding.ASCII); // Encoding added (#Charlieface)
}
return output;
}
set {
// value must not exceed a length of 4 and must be in Ansi [A-Z,0-9,whitespace 0x20].
// value validation at this point occurs outside the structure.
fixed (UInt32* ident = &Ident) { // Fixing the field
byte* bytes = (byte*)ident;
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value);
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
}
}
}
static Dictionary<UInt32, string> ps_dict = new Dictionary<UInt32, string>();
public string IdentStr_StaticDict {
get {
string output; // logic update with TryGetValue (#Charlieface)
if (ps_dict.TryGetValue(Ident, out output)) return output;
output = System.Text.Encoding.ASCII.GetString(Ident.ToBytes(EndiannessType.LittleEndian));
ps_dict.Add(Ident, output);
return output;
}
set { // input can be "FMEE", "DME" or "DK". length of 2 characters is the minimum.
var bytes = new byte[4]; // Need to convert value to a 4 byte array
byte[] asciiArr = System.Text.Encoding.ASCII.GetBytes(value); // should be 4 bytes or less
// Put the valid ASCII codes in the array.
if (asciiArr.Length >= 4) // (asciiArr.Length == 4) would also work
for (Int32 i = 0; i < 4; i++) bytes[i] = asciiArr[i];
else {
for (Int32 i = 0; i < asciiArr.Length; i++) bytes[i] = asciiArr[i];
for (Int32 i = asciiArr.Length; i < 4; i++) bytes[i] = 0x20;
}
Ident = BitConverter.ToUInt32(bytes, 0); // Set structure int value
if (!ps_dict.ContainsKey(Ident)) // Add if missing
ps_dict.Add(Ident, System.Text.Encoding.ASCII.GetString(bytes));
}
}
As mentioned by others, it is not possible to exclude a field from a struct for marshalling.
You also cannot use a pointer as a string in most places.
If the number of different possible strings is relatively small (and it probably will be, given it's only 4 characters), then you could use a static Dictionary<int, string> as a kind of string-interning mechanism.
Then you write a property to add/retrieve the real string.
Note that dictionary access is O(1), and hashing an int just returns itself, so it will be very, very fast, but will take up some memory.
[StructLayout(LayoutKind.Explicit, Size=8)]
public struct AirportHeader
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.I4)]
public int Ident; // a 4 bytes ASCII : "FIMP" { 0x46, 0x49, 0x4D, 0x50 }
[FieldOffset(4)]
[MarshalAs(UnmanagedType.I4)]
public int Offset;
static Dictionary<int, string> _identStrings = new Dictionary<int, string>();
public string IdentStr =>
_identStrings.TryGetValue(Ident, out var ret) ? ret :
(_identStrings[Ident] = Encoding.ASCII.GetString(Ident.GetBytes())); // GetBytes() is the int-to-byte[] extension method from the question
}
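The caching idea itself is language-agnostic. As a rough illustration only, here is the same "decode once, then look up by integer key" pattern sketched in C++ rather than C#. The names `decode_ident` and `ident_string` are hypothetical, and the sketch assumes the ident is stored least-significant byte first and that the cache is used from a single thread.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Decode a 4-byte ident stored least-significant byte first,
// e.g. 0x504D4946 holds the bytes 'F','I','M','P'.
std::string decode_ident(std::uint32_t ident) {
    std::string s(4, ' ');
    for (int i = 0; i < 4; ++i)
        s[i] = static_cast<char>((ident >> (8 * i)) & 0xFF);
    return s;
}

// Return the cached string for an ident, decoding it only on first use.
// Not thread-safe: the cache is a plain function-local static map.
const std::string& ident_string(std::uint32_t ident) {
    static std::unordered_map<std::uint32_t, std::string> cache;
    auto it = cache.find(ident);
    if (it == cache.end())
        it = cache.emplace(ident, decode_ident(ident)).first;
    return it->second;
}
```

Repeated calls with the same ident return a reference to the same cached string, so the decoding cost is paid once per distinct ident rather than once per display refresh.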
This is not possible because a structure must contain all of its values in a specific order. Usually this order is controlled by the CLR itself. If you want to change the order of the data, you can use StructLayout. However, you cannot exclude a field, or that data would simply not exist in memory.
Instead of a string (which is a reference type) you can use a pointer to point directly to that string and use that in your structure in combination with the StructLayout. To get this string value, you can use a get-only property that reads directly from unmanaged memory.
I know that strings are immutable and any changes to a string simply creates a new string in memory (and marks the old one as free). However, I'm wondering if my logic below is sound in that you actually can, in a round-a-bout fashion, modify the contents of a string.
const string baseString = "The quick brown fox jumps over the lazy dog!";
//initialize a new string
string candidateString = new string('\0', baseString.Length);
//Pin the string
GCHandle gcHandle = GCHandle.Alloc(candidateString, GCHandleType.Pinned);
//Copy the contents of the base string to the candidate string
unsafe
{
char* cCandidateString = (char*) gcHandle.AddrOfPinnedObject();
for (int i = 0; i < baseString.Length; i++)
{
cCandidateString[i] = baseString[i];
}
}
Does this approach indeed change the contents of candidateString (without creating a new candidateString in memory), or does the runtime see through my tricks and treat it as a normal string?
Your example works just fine, thanks to several elements:
candidateString lives in the managed heap, so it's safe to modify. Compare this with baseString, which is interned. If you try to modify the interned string, unexpected things may happen. There's no guarantee that string won't live in write-protected memory at some point, although it seems to work today. That would be pretty similar to assigning a constant string to a char* variable in C and then modifying it. In C, that's undefined behavior.
You preallocate enough space in candidateString - so you're not overflowing the buffer.
Character data is not stored at offset 0 of the String class. It's stored at an offset equal to RuntimeHelpers.OffsetToStringData.
public static int OffsetToStringData
{
// This offset is baked in by string indexer intrinsic, so there is no harm
// in getting it baked in here as well.
[System.Runtime.Versioning.NonVersionable]
get {
// Number of bytes from the address pointed to by a reference to
// a String to the first 16-bit character in the String. Skip
// over the MethodTable pointer, & String
// length. Of course, the String reference points to the memory
// after the sync block, so don't count that.
// This property allows C#'s fixed statement to work on Strings.
// On 64 bit platforms, this should be 12 (8+4) and on 32 bit 8 (4+4).
#if WIN32
return 8;
#else
return 12;
#endif // WIN32
}
}
Except...
GCHandle.AddrOfPinnedObject is special cased for two types: string and array types. Instead of returning the address of the object itself, it lies and returns the offset to the data. See the source code in CoreCLR.
// Get the address of a pinned object referenced by the supplied pinned
// handle. This routine assumes the handle is pinned and does not check.
FCIMPL1(LPVOID, MarshalNative::GCHandleInternalAddrOfPinnedObject, OBJECTHANDLE handle)
{
FCALL_CONTRACT;
LPVOID p;
OBJECTREF objRef = ObjectFromHandle(handle);
if (objRef == NULL)
{
p = NULL;
}
else
{
// Get the interior pointer for the supported pinned types.
if (objRef->GetMethodTable() == g_pStringClass)
p = ((*(StringObject **)&objRef))->GetBuffer();
else if (objRef->GetMethodTable()->IsArray())
p = (*((ArrayBase**)&objRef))->GetDataPtr();
else
p = objRef->GetData();
}
return p;
}
FCIMPLEND
In summary, the runtime lets you play with its data and doesn't complain. You're using unsafe code after all. I've seen worse runtime messing than that, including creating reference types on the stack ;-)
Just remember to add one additional \0 after all the characters (at offset Length) if your final string is shorter than what's allocated. This won't overflow, each string has an implicit null character at the end to ease interop scenarios.
Now take a look at how StringBuilder creates a string, here's StringBuilder.ToString:
[System.Security.SecuritySafeCritical] // auto-generated
public override String ToString() {
Contract.Ensures(Contract.Result<String>() != null);
VerifyClassInvariant();
if (Length == 0)
return String.Empty;
string ret = string.FastAllocateString(Length);
StringBuilder chunk = this;
unsafe {
fixed (char* destinationPtr = ret)
{
do
{
if (chunk.m_ChunkLength > 0)
{
// Copy these into local variables so that they are stable even in the presence of race conditions
char[] sourceArray = chunk.m_ChunkChars;
int chunkOffset = chunk.m_ChunkOffset;
int chunkLength = chunk.m_ChunkLength;
// Check that we will not overrun our boundaries.
if ((uint)(chunkLength + chunkOffset) <= ret.Length && (uint)chunkLength <= (uint)sourceArray.Length)
{
fixed (char* sourcePtr = sourceArray)
string.wstrcpy(destinationPtr + chunkOffset, sourcePtr, chunkLength);
}
else
{
throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
}
}
chunk = chunk.m_ChunkPrevious;
} while (chunk != null);
}
}
return ret;
}
Yes, it uses unsafe code, and yes, you can optimize yours by using fixed, as this type of pinning is much more lightweight than allocating a GC handle:
const string baseString = "The quick brown fox jumps over the lazy dog!";
//initialize a new string
string candidateString = new string('\0', baseString.Length);
//Copy the contents of the base string to the candidate string
unsafe
{
fixed (char* cCandidateString = candidateString)
{
for (int i = 0; i < baseString.Length; i++)
cCandidateString[i] = baseString[i];
}
}
When you use fixed, the GC only discovers an object needs to be pinned when it stumbles upon it during a collection. If there's no collection going on, the GC isn't even involved. When you use GCHandle, a handle is registered in the GC each time.
As others have pointed out, mutating the String objects is useful in some rare cases. I give an example with a useful code snippet below.
Use-case/background
Although everyone should be a huge fan of the really excellent character Encoding support that .NET has always offered, sometimes it might be preferable to cut down that overhead, especially if doing a lot of roundtripping between 8-bit (legacy) characters and managed strings (i.e. typically interop scenarios).
As I hinted, .NET is particularly emphatic that you must explicitly specify a text Encoding for any/all conversions of non-Unicode character data to/from managed String objects. This rigorous control at the periphery is really commendable, since it ensures that once you have the string inside the managed runtime you never have to worry; everything is just Unicode (technically, UTF-16 code units).
For contrast, consider a certain other popular scripting language that famously botched this whole area, resulting in an ongoing saga of parallel 2.x and 3.x versions, all due to adding support for Unicode in the latter.
By enforcing Unicode once you're inside, .NET pushes all that mess to the interop boundary where it is done "once-and-for-all", but this philosophy entails that that encoding/decoding work is eager and exhaustive, as opposed to "lazy" and more under your program's control. Because of this, the .NET Encoding/Encoder classes can be a performance bottleneck. If you're moving lots of text from wide (Unicode) to simple fixed 7- or 8-bit narrow ANSI, ASCII, etc. (note I'm not talking about MBCS or UTF-8, where you'll want to use the Encoders!), the .NET encoding paradigm might seem like overkill.
Furthermore, it could be the case that you don't know, or don't care to, specify an Encoding. Maybe all you care about is fast and accurate round-tripping for that low-byte of a 16-bit Char. If you look at the .NET source code, even the System.Text.ASCIIEncoding might be too bulky in some situations.
The code snippet...
Thin String: 8-bit characters directly stored in a managed
String, one 'thin char' per wide Unicode character, without
bothering with character encoding/decoding during round-tripping.
All of these methods just ignore/strip the upper byte of each 16-bit Unicode character, transmitting only each low byte exactly as-is. Obviously, successful recovery of the Unicode text after a round-trip will only be possible if those upper bits aren't relevant.
/// <summary> Convert byte array to "thin string" </summary>
public static unsafe String ToThinString(this byte[] src)
{
int c;
var ret = String.Empty;
if ((c = src.Length) > 0)
fixed (char* dst = (ret = new String('\0', c)))
do
dst[--c] = (char)src[c]; // fill new String by in-situ mutation
while (c > 0);
return ret;
}
In the direction just shown, which is typically bringing native data in to managed, you often don't have the managed byte array, so rather than allocate a temporary one just for the purpose of calling this function, you can process the raw native bytes directly into a managed string. As before, this bypasses all character encoding.
The (obvious) range checks that would be needed in this unsafe function are elided for clarity:
public static unsafe String ToThinString(byte* pSrc, int c)
{
var ret = String.Empty;
if (c > 0)
fixed (char* dst = (ret = new String('\0', c)))
do
dst[--c] = (char)pSrc[c]; // fill new String by in-situ mutation
while (c > 0);
return ret;
}
The advantage of String mutation here is that you avoid temporary allocations by writing directly to the final allocation. Even if you were to avoid the extra allocation by using stackalloc, there would be an unnecessary re-copying of the whole thing when you eventually call the String(Char*, int, int) constructor: clearly there's no way to associate data you just laboriously prepared with a String object that didn't exist until you were finished!
For completeness...
Here's the mirror-code which reverses operation to get back a byte array (even though this direction doesn't happen to illustrate the string-mutation technique). This is the direction you'd typically use to send Unicode text out of the managed .NET runtime, for use by a legacy app.
/// <summary> Convert "thin string" to byte array </summary>
public static unsafe byte[] ToByteArr(this String src)
{
int c;
byte[] ret = null;
if ((c = src.Length) > 0)
fixed (byte* dst = (ret = new byte[c]))
do
dst[--c] = (byte)src[c];
while (c > 0);
return ret ?? new byte[0];
}
I just tried changing a string literal, and the results are very scary:
var f1 = "paul";
var f2 = "paul";
fixed (char* bla = f1)
{
bla[0] = 'f';
}
This changes both f1 and f2 to "faul".
So I would never recommend messing with the immutability of strings unless you
just created a new string instance and know exactly what you're doing.
Building off of my marshalling helloworld question, I'm running into issues marshalling an array allocated in C to C#. I've spent hours researching where I might be going wrong, but everything I've tried ends up with errors such as AccessViolationException.
The function that handles creating an array in C is below.
__declspec(dllexport) int __cdecl import_csv(char *path, struct human ***persons, int *numPersons)
{
int res;
FILE *csv;
char line[1024];
struct human **humans;
csv = fopen(path, "r");
if (csv == NULL) {
return errno;
}
*numPersons = 0; // init to sane value
/*
* All I'm trying to do for now is get more than one working.
* Starting with 2 seems reasonable. My test CSV file only has 2 lines.
*/
humans = calloc(2, sizeof(struct human *));
if (humans == NULL)
return ENOMEM;
while (fgets(line, 1024, csv)) {
char *tmp = strdup(line);
struct human *person;
humans[*numPersons] = calloc(1, sizeof(*person));
person = humans[*numPersons]; // easier to work with
if (person == NULL) {
return ENOMEM;
}
person->contact = calloc(1, sizeof(*(person->contact)));
if (person->contact == NULL) {
return ENOMEM;
}
res = parse_human(line, person);
if (res != 0) {
return res;
}
(*numPersons)++;
}
(*persons) = humans;
fclose(csv);
return 0;
}
The C# code:
IntPtr humansPtr = IntPtr.Zero;
int numHumans = 0;
HelloLibrary.import_csv(args[0], ref humansPtr, ref numHumans);
HelloLibrary.human[] humans = new HelloLibrary.human[numHumans];
IntPtr[] ptrs = new IntPtr[numHumans];
IntPtr aIndex = (IntPtr)Marshal.PtrToStructure(humansPtr, typeof(IntPtr));
// Populate the array of IntPtr
for (int i = 0; i < numHumans; i++)
{
ptrs[i] = new IntPtr(aIndex.ToInt64() +
(Marshal.SizeOf(typeof(IntPtr)) * i));
}
// Marshal the array of human structs
for (int i = 0; i < numHumans; i++)
{
humans[i] = (HelloLibrary.human)Marshal.PtrToStructure(
ptrs[i],
typeof(HelloLibrary.human));
}
// Use the marshalled data
foreach (HelloLibrary.human human in humans)
{
Console.WriteLine("first:'{0}'", human.first);
Console.WriteLine("last:'{0}'", human.last);
HelloLibrary.contact_info contact = (HelloLibrary.contact_info)Marshal.
PtrToStructure(human.contact, typeof(HelloLibrary.contact_info));
Console.WriteLine("cell:'{0}'", contact.cell);
Console.WriteLine("home:'{0}'", contact.home);
}
The first human struct gets marshalled fine. I get the access violation exceptions after the first one. I feel like I'm missing something with marshalling structs with struct pointers inside them. I hope I have some simple mistake I'm overlooking. Do you see anything wrong with this code?
See this GitHub gist for full source.
// Populate the array of IntPtr
This is where you went wrong. You are getting back a pointer to an array of pointers. You got the first one correct, actually reading the pointer value from the array. But then your for() loop got it wrong: it just adds 4 (or 8) to the first pointer value instead of reading each pointer from the array. Fix:
IntPtr[] ptrs = new IntPtr[numHumans];
// Populate the array of IntPtr
for (int i = 0; i < numHumans; i++)
{
ptrs[i] = (IntPtr)Marshal.PtrToStructure(humansPtr, typeof(IntPtr));
humansPtr = new IntPtr(humansPtr.ToInt64() + IntPtr.Size);
}
Or much more cleanly since marshaling arrays of simple types is already supported:
IntPtr[] ptrs = new IntPtr[numHumans];
Marshal.Copy(humansPtr, ptrs, 0, numHumans);
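The difference between the two loops may be easier to see on the native side. The following C++ sketch is only an illustration: `Human` is a hypothetical stand-in for struct human, and `read_pointers` and `wrong_address` are made-up names. The first function reads element i of the pointer array, which is what the fix does; the second reproduces the arithmetic of the original loop, which offsets the first element's value and therefore points into (or past) the first struct.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Human { int id; char name[60]; };  // hypothetical stand-in for struct human

// Correct: read each pointer out of the array of pointers.
std::vector<const Human*> read_pointers(const Human* const* array, std::size_t count) {
    std::vector<const Human*> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(array[i]);  // element i of the pointer array
    return out;
}

// Wrong (the original loop's arithmetic): take the FIRST pointer's value and
// add i * pointer-size to it. For i > 0 this is an address inside or just past
// the first struct, not the address of struct i.
std::uintptr_t wrong_address(const Human* const* array, std::size_t i) {
    return reinterpret_cast<std::uintptr_t>(array[0]) + i * sizeof(void*);
}
```

For i == 0 both approaches agree, which matches the observation that only the first struct marshalled correctly.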
I found the bug by using Debug + Windows + Memory + Memory 1. I put humansPtr in the Address field, switched to 4-byte integer view and observed that the C code was doing it correctly. I then quickly found out that ptrs[] did not contain the values I saw in the Memory window.
Not sure why you are writing code like this, other than as a mental exercise. It is not the correct way to go about it, you are for example completely ignoring the need to release the memory again. Which is very nontrivial. Parsing CSV files in C# is quite simple and just as fast as doing it in C, it is I/O bound, not execute-bound. You'll easily avoid these almost impossible to debug bugs and get lots of help from the .NET Framework.
I have just switched from C++ to C#. I have already done some tasks in C++ and now have to translate them to C#.
I am running into some problems.
I have to find the frequency of symbols in a binary file (the file is the sole argument, so its size/length is not known in advance). These frequencies will then be used to build a Huffman tree.
My code to do that in c++ is below :
My structure is like this:
struct Node
{
unsigned int symbol;
int freq;
struct Node * next, * left, * right;
};
Node * tree;
And how i read the file is like this :
FILE * fp;
fp = fopen(argv, "rb");
ch = fgetc(fp);
while (fread( & ch, sizeof(ch), 1, fp)) {
create_frequency(ch);
}
fclose(fp);
Could any one please help me in translating the same in c# (specially this binary file read procedure to create frequency of symbols and storing in linked list)? Thanks for the help
Edit: Tried to write the code according to what Henk Holterman explained below but still there is error and the error is :
error CS1501: No overload for method 'Open' takes '1' arguments
/usr/lib/mono/2.0/mscorlib.dll (Location of the symbol related to previous error)
shekhar_c#.cs(22,32): error CS0825: The contextual keyword 'var' may only appear within a local variable declaration
Compilation failed: 2 error(s), 0 warnings
And my code to do this is:
static void Main(string[] args)
{
// using provides exception-safe closing
using (var fp = System.IO.File.Open(args))
{
int b; // note: not a byte
while ((b = fp.Readbyte()) >= 0)
{
byte ch = (byte) b;
// now use the byte in 'ch'
//create_frequency(ch);
}
}
}
And the line corresponding to the two errors is :
using (var fp = System.IO.File.Open(args))
Could someone please help me? I am a beginner in C#.
string fileName = ...
using (var fp = System.IO.File.OpenRead(fileName)) // using provides exception-safe closing
{
int b; // note: not a byte
while ((b = fp.ReadByte()) >= 0)
{
byte ch = (byte) b;
// now use the byte in 'ch'
create_frequency(ch);
}
}
Compiling the following code gives the error message: type illegal.
int main()
{
// Compilation error - switch expression of type illegal
switch(std::string("raj"))
{
case"sda":
}
}
You cannot use string in either switch or case. Why? Is there any solution that works nicely to support logic similar to switch on strings?
The reason why has to do with the type system. C/C++ doesn't really support strings as a type. It does support the idea of a constant char array but it doesn't really fully understand the notion of a string.
In order to generate the code for a switch statement the compiler must understand what it means for two values to be equal. For items like ints and enums, this is a trivial bit comparison. But how should the compiler compare 2 string values? Case sensitive, insensitive, culture aware, etc ... Without a full awareness of a string this cannot be accurately answered.
Additionally, C/C++ switch statements are typically generated as branch tables. It's not nearly as easy to generate a branch table for a string style switch.
As mentioned previously, compilers like to build lookup tables that optimize switch statements to near O(1) timing whenever possible. Combine this with the fact that the C++ Language doesn't have a string type - std::string is part of the Standard Library which is not part of the Language per se.
I will offer an alternative that you might want to consider, I've used it in the past to good effect. Instead of switching over the string itself, switch over the result of a hash function that uses the string as input. Your code will be almost as clear as switching over the string if you are using a predetermined set of strings:
enum string_code {
eFred,
eBarney,
eWilma,
eBetty,
...
};
string_code hashit (std::string const& inString) {
if (inString == "Fred") return eFred;
if (inString == "Barney") return eBarney;
...
}
void foo() {
switch (hashit(stringValue)) {
case eFred:
...
case eBarney:
...
}
}
There are a bunch of obvious optimizations that pretty much follow what the C compiler would do with a switch statement... funny how that happens.
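A compilable version of the outline above might look like the sketch below. The `Unknown` enumerator, the function names `to_code` and `seat_number`, and the returned numbers are assumptions added for illustration; the lookup uses plain comparisons, so it is a mapping rather than a real hash.

```cpp
#include <string>

// Codes for a fixed set of names; Unknown covers everything else.
enum class NameCode { Fred, Barney, Wilma, Betty, Unknown };

// Map a string to its code with plain comparisons.
NameCode to_code(const std::string& s) {
    if (s == "Fred")   return NameCode::Fred;
    if (s == "Barney") return NameCode::Barney;
    if (s == "Wilma")  return NameCode::Wilma;
    if (s == "Betty")  return NameCode::Betty;
    return NameCode::Unknown;
}

// Switch on the code instead of on the string itself.
int seat_number(const std::string& name) {
    switch (to_code(name)) {
        case NameCode::Fred:    return 1;
        case NameCode::Barney:  return 2;
        case NameCode::Wilma:   return 3;
        case NameCode::Betty:   return 4;
        case NameCode::Unknown: break;
    }
    return -1;  // name not in the predetermined set
}
```

Having an explicit fallback enumerator means an unexpected string reaches a well-defined branch instead of falling off the end of the lookup function.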
C++
constexpr hash function:
constexpr unsigned int hash(const char *s, int off = 0) {
return !s[off] ? 5381 : (hash(s, off+1)*33) ^ s[off];
}
switch( hash(str) ){
case hash("one") : // do something
case hash("two") : // do something
}
Update:
The example above is C++11, where a constexpr function must consist of a single return statement. This restriction was relaxed in later C++ versions.
In C++14 and C++17 you can use following hash function:
constexpr uint32_t hash(const char* data, size_t const size) noexcept{
uint32_t hash = 5381;
for(const char *c = data; c < data + size; ++c)
hash = ((hash << 5) + hash) + (unsigned char) *c;
return hash;
}
Also, C++17 has std::string_view, so you can use it instead of const char *.
In C++20, you can try using consteval.
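Putting the pieces together, a C++17 variant with std::string_view could be sketched as follows. The names `hash_sv` and `parse_number` are hypothetical. Because two different strings can hash to the same value, the sketch re-checks the string inside each case; that extra comparison is an addition here, not part of the snippets above.

```cpp
#include <cstdint>
#include <string_view>

// djb2-style hash, usable in constant expressions in C++17.
constexpr std::uint32_t hash_sv(std::string_view s) noexcept {
    std::uint32_t h = 5381;
    for (char c : s)
        h = ((h << 5) + h) + static_cast<unsigned char>(c);
    return h;
}

// Returns 1 for "one", 2 for "two", 0 for anything else.
int parse_number(std::string_view s) {
    switch (hash_sv(s)) {
        // Case labels are computed at compile time; the comparison guards
        // against a different runtime string with the same hash.
        case hash_sv("one"): return s == "one" ? 1 : 0;
        case hash_sv("two"): return s == "two" ? 2 : 0;
        default:             return 0;
    }
}
```

If two case labels ever hashed to the same value, the compiler would reject the switch as having duplicate cases, so collisions among the known strings are caught at build time.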
A C++11 update of the approach that apparently comes not from @MarmouCorp above but from http://www.codeguru.com/cpp/cpp/cpp_mfc/article.php/c4067/Switch-on-Strings-in-C.htm
Uses two maps to convert between the strings and the class enum (better than plain enum because its values are scoped inside it, and reverse lookup for nice error messages).
The use of static in the codeguru code is possible with compiler support for initializer lists which means VS 2013 plus. gcc 4.8.1 was ok with it, not sure how much farther back it would be compatible.
/// <summary>
/// Enum for String values we want to switch on
/// </summary>
enum class TestType
{
SetType,
GetType
};
/// <summary>
/// Map from strings to enum values
/// </summary>
std::map<std::string, TestType> MnCTest::s_mapStringToTestType =
{
{ "setType", TestType::SetType },
{ "getType", TestType::GetType }
};
/// <summary>
/// Map from enum values to strings
/// </summary>
std::map<TestType, std::string> MnCTest::s_mapTestTypeToString
{
{TestType::SetType, "setType"},
{TestType::GetType, "getType"},
};
...
std::string someString = "setType";
TestType testType = s_mapStringToTestType[someString];
switch (testType)
{
case TestType::SetType:
break;
case TestType::GetType:
break;
default:
LogError("Unknown TestType ", s_mapTestTypeToString[testType]);
}
The problem is that, for reasons of optimization, the switch statement in C++ works only on integral and enumeration types, and the case labels must be compile-time constants.
Presumably the reason for the restriction is that the compiler can then compile the code down to a single cmp instruction and a goto whose target address is computed from the value of the argument at runtime. Since branching and loops don't play nicely with modern CPUs, this can be an important optimization.
To work around this, I am afraid you will have to resort to if statements.
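A minimal sketch of such an if statement chain, with a made-up function name (to_number) and made-up strings:

```cpp
#include <cassert>
#include <string>

// Hypothetical example: an if/else ladder standing in for a switch on a string.
int to_number(const std::string& s) {
    if (s == "one") {
        return 1;
    } else if (s == "two") {
        return 2;
    } else if (s == "three") {
        return 3;
    }
    return -1;  // plays the role of `default:`
}
```

Each comparison is a full string comparison, so the cost grows with the number of branches; for a handful of cases that is usually fine.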
std::map + C++11 lambdas pattern without enums
unordered_map for the potential amortized O(1): What is the best way to use a HashMap in C++?
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
int main() {
int result;
const std::unordered_map<std::string,std::function<void()>> m{
{"one", [&](){ result = 1; }},
{"two", [&](){ result = 2; }},
{"three", [&](){ result = 3; }},
};
const auto end = m.end();
std::vector<std::string> strings{"one", "two", "three", "foobar"};
for (const auto& s : strings) {
auto it = m.find(s);
if (it != end) {
it->second();
} else {
result = -1;
}
std::cout << s << " " << result << std::endl;
}
}
Output:
one 1
two 2
three 3
foobar -1
Usage inside methods with static
To use this pattern efficiently inside classes, initialize the lambda map statically, or else you pay O(n) every time to build it from scratch.
Here we can get away with the {} initialization of a static method variable (see: Static variables in class methods), but we could also use the methods described at: static constructors in C++? I need to initialize private static objects
It was necessary to transform the lambda context capture [&] into an argument, or that would have been undefined: const static auto lambda used with capture by reference
Example that produces the same output as above:
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
class RangeSwitch {
public:
void method(std::string key, int &result) {
static const std::unordered_map<std::string,std::function<void(int&)>> m{
{"one", [](int& result){ result = 1; }},
{"two", [](int& result){ result = 2; }},
{"three", [](int& result){ result = 3; }},
};
static const auto end = m.end();
auto it = m.find(key);
if (it != end) {
it->second(result);
} else {
result = -1;
}
}
};
int main() {
RangeSwitch rangeSwitch;
int result;
std::vector<std::string> strings{"one", "two", "three", "foobar"};
for (const auto& s : strings) {
rangeSwitch.method(s, result);
std::cout << s << " " << result << std::endl;
}
}
To add a variation using the simplest container possible (no need for an ordered map)... I wouldn't bother with an enum--just put the container definition immediately before the switch so it'll be easy to see which number represents which case.
This does a hashed lookup in the unordered_map and uses the associated int to drive the switch statement. Should be quite fast. Note that at is used instead of [], as I've made that container const. Using [] can be dangerous--if the string isn't in the map, you'll create a new mapping and may end up with undefined results or a continuously growing map.
Note that the at() function will throw an exception if the string isn't in the map. So you may want to test first using count().
const static std::unordered_map<std::string,int> string_to_case{
{"raj",1},
{"ben",2}
};
switch(string_to_case.at("raj")) {
case 1: // this is the "raj" case
break;
case 2: // this is the "ben" case
break;
}
The version with a test for an undefined string follows:
const static std::unordered_map<std::string,int> string_to_case{
{"raj",1},
{"ben",2}
};
// in C++20, you can replace .count with .contains
switch(string_to_case.count("raj") ? string_to_case.at("raj") : 0) {
case 1: // this is the "raj" case
break;
case 2: // this is the "ben" case
break;
case 0: // this is for the undefined case
break;
}
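The count()-then-at() test hashes the key twice. A possible single-lookup variant uses find() instead; the helper names case_for and describe are hypothetical and only serve this sketch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Returns the case number for key, or 0 when the key is not in the map.
int case_for(const std::string& key) {
    static const std::unordered_map<std::string, int> string_to_case{
        {"raj", 1},
        {"ben", 2}
    };
    const auto it = string_to_case.find(key);  // single hashed lookup
    return it != string_to_case.end() ? it->second : 0;
}

// Hypothetical caller showing the switch itself.
const char* describe(const std::string& key) {
    switch (case_for(key)) {
        case 1:  return "raj case";
        case 2:  return "ben case";
        default: return "undefined case";
    }
}
```

Because find() never inserts, the map can stay const and an unknown string simply falls into the default branch.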
In C++ and C, switches only work on integer types. Use an if-else ladder instead. C++ could obviously have implemented some sort of switch statement for strings; I guess nobody thought it worthwhile, and I agree with them.
Why not? You can use a switch implementation with equivalent syntax and the same semantics.
The C language has no objects or string objects at all; strings in C are null-terminated character arrays referenced by pointer.
The C++ language lets you overload functions for comparing objects or checking them for equality.
Both C and C++ are flexible enough to support such a switch: for strings in C, and for objects of any type that supports comparison or equality checks in C++. Modern C++11 allows this switch implementation to be reasonably efficient.
Your code will be like this:
std::string name = "Alice";
std::string gender = "boy";
std::string role;
SWITCH(name)
CASE("Alice") FALL
CASE("Carol") gender = "girl"; FALL
CASE("Bob") FALL
CASE("Dave") role = "participant"; BREAK
CASE("Mallory") FALL
CASE("Trudy") role = "attacker"; BREAK
CASE("Peggy") gender = "girl"; FALL
CASE("Victor") role = "verifier"; BREAK
DEFAULT role = "other";
END
// the role will be: "participant"
// the gender will be: "girl"
It is possible to use more complicated types, for example std::pair or any struct or class that supports equality operations (or comparisons for quick mode).
Features
any type of data that supports comparison or equality checks
possibility to build cascading nested switch statements
possibility to break or fall through case statements
possibility to use non-constant case expressions
possibility to enable a quick static/dynamic mode with tree searching (for C++11)
Syntax differences from the language switch are
uppercase keywords
need parentheses for CASE statement
semicolon ';' at end of statements is not allowed
colon ':' at CASE statement is not allowed
need one of BREAK or FALL keyword at end of CASE statement
For pre-C++11 compilers, linear search is used.
For C++11 and later, a quick mode with tree search is available; in that mode a return statement inside a CASE is not allowed.
A C language implementation also exists; it uses the char* type and zero-terminated string comparisons.
Read more about this switch implementation.
I think the reason is that in C strings are not primitive types. As tomjen said, think of a string as a char array, so you cannot do things like:
switch (char[]) { // ...
switch (int[]) { // ...
In C++, strings are not first-class citizens; string operations are done through the standard library. I think that is the reason. Also, C++ compilers use branch tables to optimize switch statements. Have a look at the link.
http://en.wikipedia.org/wiki/Switch_statement
Late to the party, but here's a solution I came up with some time ago, which completely abides by the requested syntax.
#include <uberswitch/uberswitch.hpp>
int main()
{
uswitch (std::string("raj"))
{
ucase ("sda"): /* ... */ break; // notice the parentheses around the value.
}
}
Here's the code: https://github.com/falemagn/uberswitch
You could put the strings in an array and use a constexpr to convert them to indices at compile time.
constexpr const char* arr[] = { "bar", "foo" };
constexpr int index(const char* str) { /*...*/ }
void do_something(const std::string& str)
{
switch(quick_index(str))
{
case index("bar"):
// ...
break;
case index("foo"):
// ...
break;
case -1:
default:
// ...
break;
}
}
For quick_index, which doesn't have to be constexpr, you could e.g. use an unordered_map to do it O(1) at runtime. (Or sort the array and use binary search, see here for an example.)
Here's a full example for C++11, with a simple custom constexpr string comparer. Duplicate cases and cases not in the array (index gives -1) will be detected at compile time. Missing cases are obviously not detected. Later C++ versions have more flexible constexpr expressions, allowing for simpler code.
#include <iostream>
#include <string>
#include <unordered_map>
constexpr const char* arr[] = { "bar", "foo", "foobar" };
constexpr int cmp(const char* str1, const char* str2)
{
return *str1 == *str2 && (!*str1 || cmp(str1+1, str2+1));
}
constexpr int index(const char* str, int pos=0)
{
return pos == sizeof(arr)/sizeof(arr[0]) ? -1 : cmp(str, arr[pos]) ? pos : index(str,pos+1);
}
int main()
{
// initialize hash table once
std::unordered_map<std::string,int> lookup;
int i = 0;
for(auto s : arr) lookup[s] = i++;
auto quick_index = [&](std::string& s)
{ auto it = lookup.find(s); return it == lookup.end() ? -1 : it->second; };
// usage in code
std::string str = "bar";
switch(quick_index(str))
{
case index("bar"):
std::cout << "bartender" << std::endl;
break;
case index("foo"):
std::cout << "fighter" << std::endl;
break;
case index("foobar"):
std::cout << "fighter bartender" << std::endl;
break;
case -1:
default:
std::cout << "moo" << std::endl;
break;
}
}
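For comparison, here is a sketch of how the compile-time lookup might look with the relaxed constexpr rules of C++14, using loops instead of recursion. The names same and index_of are chosen for this sketch only; the array is the one from the example.

```cpp
#include <cstddef>

constexpr const char* arr[] = { "bar", "foo", "foobar" };

// C++14: loops are allowed inside constexpr functions.
constexpr bool same(const char* a, const char* b) {
    while (*a && *a == *b) { ++a; ++b; }
    return *a == *b;  // both strings ended at the same position
}

// Position of str in arr, or -1 if it is not present.
constexpr int index_of(const char* str) {
    for (std::size_t i = 0; i < sizeof(arr) / sizeof(arr[0]); ++i)
        if (same(str, arr[i])) return static_cast<int>(i);
    return -1;
}

static_assert(index_of("foo") == 1, "found at compile time");
static_assert(index_of("nope") == -1, "missing strings give -1");
```

Since index_of is constexpr, it can appear in case labels exactly like index in the C++11 version.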
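hare's comment on Nick's solution is really cool. Here is the complete code example (C++14 or later, because the hash uses a loop and must be evaluated at compile time for the case labels):
#include <cstddef>
#include <cstdint>
#include <string>
constexpr uint32_t hash(const char* s, size_t n) noexcept
{
uint32_t h = 5381;
for (size_t i = 0; i < n; ++i)
h = ((h << 5) + h) + (unsigned char)s[i];
return h;
}
// runtime overload for std::string (std::string is not usable in constexpr before C++20)
inline uint32_t hash(const std::string& s) noexcept { return hash(s.data(), s.size()); }
constexpr uint32_t operator""_hash(char const* p, size_t n) { return hash(p, n); }
std::string s = "raj";
switch (hash(s)) {
case "sda"_hash:
// do_something();
break;
default:
break;
}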
In C++ you can only use a switch statement on integral types such as int and char (and on enums), so map the string to an int first:
cout << "\nEnter word to select your choice\n";
cout << "ex to exit program (0)\n";
cout << "m to set month(1)\n";
cout << "y to set year(2)\n";
cout << "rm to return the month(3)\n";
cout << "ry to return year(4)\n";
cout << "pc to print the calendar for a month(5)\n";
cout << "fdc to print the first day of the month(6)\n";
cin >> c;
cout << endl;
a = c.compare("ex") ?c.compare("m") ?c.compare("y") ? c.compare("rm")?c.compare("ry") ? c.compare("pc") ? c.compare("fdc") ? 7 : 6 : 5 : 4 : 3 : 2 : 1 : 0;
switch (a)
{
case 0:
return 1;
case 1: ///m
{
cout << "enter month\n";
cin >> c;
cout << endl;
myCalendar.setMonth(c);
break;
}
case 2:
cout << "Enter year(yyyy)\n";
cin >> y;
cout << endl;
myCalendar.setYear(y);
break;
case 3:
myCalendar.getMonth();
break;
case 4:
myCalendar.getYear();
break;
case 5:
cout << "Enter month and year\n";
cin >> c >> y;
cout << endl;
myCalendar.almanaq(c,y);
break;
case 6:
break;
}
More functional workaround to the switch problem:
class APIHandlerImpl
{
// define map of "cases"
std::map<string, std::function<void(server*, websocketpp::connection_hdl, string)>> in_events;
public:
APIHandlerImpl()
{
// bind handler method in constructor
in_events["/hello"] = std::bind(&APIHandlerImpl::handleHello, this, _1, _2, _3);
in_events["/bye"] = std::bind(&APIHandlerImpl::handleBye, this, _1, _2, _3);
}
void onEvent(string event = "/hello", string data = "{}")
{
// execute the handler registered for the incoming event;
// s and hdl are assumed to be the server and connection available here
auto it = in_events.find(event);
if (it != in_events.end()) it->second(s, hdl, data);
}
void handleHello(server* s, websocketpp::connection_hdl hdl, string data)
{
// ...
}
void handleBye(server* s, websocketpp::connection_hdl hdl, string data)
{
// ...
}
};
You can use switch on strings.
What you need is a table of strings: check the input against every entry and switch on the resulting index.
#include <cstring>
const char* strings[4] = {"Banana", "Watermelon", "Apple", "Orange"};
// Returns the 1-based position of str in _strings, or 0 if it is not found.
unsigned get_case_string(const char* str, const char** _strings, unsigned n)
{
while(n)
{
n--;
if(strcmp(str, _strings[n]) == 0) return n + 1;
}
return 0;
}
unsigned index = get_case_string("Banana", strings, 4);
switch(index)
{
case 1: break;/*Found string `Banana`*/
default: break;/*No string*/
}
You can't use a string in a switch case; only integral types such as int and char (and enums) are allowed. Instead, you can define an enum that represents the strings and use it in the switch block, like
enum MyString { raj, taj, aaj };
Use it in the switch statement.
That's because C++ compilers can turn switches into jump tables. The generated code performs a trivial operation on the input data and jumps to the proper address without comparing. Since a string is not a number but an array of numbers, the compiler cannot create a jump table from it.
movf INDEX,W ; move the index value into the W (working) register from memory
addwf PCL,F ; add it to the program counter. each PIC instruction is one byte
; so there is no need to perform any multiplication.
; Most architectures will transform the index in some way before
; adding it to the program counter
table ; the branch table begins here with this label
goto index_zero ; each of these goto instructions is an unconditional branch
goto index_one ; of code
goto index_two
goto index_three
index_zero
; code is added here to perform whatever action is required when INDEX = zero
return
index_one
...
(code from wikipedia https://en.wikipedia.org/wiki/Branch_table)
In many cases you can avoid extra work by pulling the first char from the string and switching on that. You may end up having to do a nested switch on the second character (str[1]) if several cases start with the same character. Anyone reading your code would appreciate a hint, though, because most would probably just write an if-else-if chain.
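A small sketch of that idea, assuming a hypothetical parse_command function and a few command names borrowed from the calendar example above ("ex", "m", "rm", "ry"):

```cpp
#include <cassert>
#include <string>

// Hypothetical command parser: switch on the first character, then
// disambiguate commands sharing it with a nested switch on the second one.
int parse_command(const std::string& cmd) {
    if (cmd.empty()) return -1;
    switch (cmd[0]) {
        case 'e': return cmd == "ex" ? 0 : -1;
        case 'm': return cmd == "m" ? 1 : -1;
        case 'r':
            if (cmd.size() != 2) return -1;
            switch (cmd[1]) {
                case 'm': return 3;   // "rm"
                case 'y': return 4;   // "ry"
                default:  return -1;
            }
        default: return -1;
    }
}
```

The first-character switch only narrows the candidates, so the rest of the string still has to be checked (by length or by a full comparison) before a match is accepted.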
Switches only work with integral types (int, char, bool, etc.). Why not use a map to pair a string with a number and then use that number with the switch?