In C# you'd have a string, and to append to it I'd do the following:
//C#
string str="";
str += "Hello";
str += " world!";
//So str is now 'Hello world!'
But in C++ for AVR I use a const char *. How could I append to it?
const char * str="";
str += "Hello world!"; //This doesn't work, I get some weird data.
str = str + "Hello world!"; //This doesn't work either
NOTE: I'm working in Atmel Studio 6, programming an AVR, so I think the functionality most people use in C++ is unavailable: I get build failures as soon as I try examples I've seen online. I don't have the String data type either.
You really should work through a C tutorial or book and read the chapter about strings.
const char * str=""; creates a pointer to an empty string in the (constant) data segment.
str += "Hello world!":
String processing does not work like this in C.
The memory the pointer points to is constant, so you should not be able to modify it.
Adding something to a pointer changes the location the pointer points to (and not the data).
Since you are on an AVR, you should avoid dynamic memory.
Defining an empty string constant does not make sense.
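To make the pointer-arithmetic point concrete, here is a minimal sketch in plain C. The helper name skip_chars is made up for illustration; it only shows that adding to a char pointer moves where it points and never touches the characters themselves.

```c
#include <string.h>   /* for size_t (and strcmp, if you want to check the result) */

/* Hypothetical helper: returns s advanced by n characters.
   Nothing is copied or modified; only the address changes. */
const char *skip_chars(const char *s, size_t n)
{
    return s + n;   /* pointer arithmetic: same data, later starting position */
}
```

With const char *s = "Hello world!";, skip_chars(s, 6) points at the 'w', so it reads as "world!", while s itself still reads as "Hello world!".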
A small example:
#include <string.h>
#define MAX_LEN 100
char someBuf[MAX_LEN] = ""; // create a buffer of length 100, preinitialized with an empty string
const char c_helloWorld[] = "Hello world!"; // define a string constant
strcat(someBuf, c_helloWorld); // appends the content of c_helloWorld to the end of someBuf
strcat(someBuf, c_helloWorld); // appends it a second time
// someBuf now contains "Hello world!Hello world!"
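Note that strcat does not know how big the destination buffer is, so appending past MAX_LEN would silently overwrite other memory. A minimal sketch of a bounds-checked append, assuming you always know the size of the destination array (append_checked is a made-up name, not a library function):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the NUL-terminated string in dst,
   where dst_size is the total size of the dst array in bytes.
   Returns false and leaves dst unchanged if the result would not fit. */
bool append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (used >= dst_size || need >= dst_size - used) {
        return false;                   /* no room for src plus the terminating NUL */
    }
    memcpy(dst + used, src, need + 1);  /* +1 copies the terminator too */
    return true;
}
```

Usage would look like append_checked(someBuf, sizeof someBuf, c_helloWorld); the return value tells you whether the text was actually appended.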
Additional explanation:
Since the AVR has a Harvard architecture, it cannot read program memory with ordinary data accesses (only through special instructions). So string literals (like "Hello world!") take up twice the space by default: one instance lives in flash memory, and the startup code copies it to SRAM. Depending on your AVR, this may matter! You can work around this and store them only in program memory by declaring them with the PROGMEM attribute (or something similar), but then you need to read them from flash explicitly at runtime.
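A rough sketch of what the PROGMEM variant could look like with avr-libc, which provides PROGMEM and the strcat_P function in <avr/pgmspace.h>. The #else branch is only my assumption for compiling the same code on a PC, where there is a single address space; append_hello is a made-up helper name.

```c
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>          /* avr-libc: PROGMEM, strcat_P */
#else
/* Host fallback (assumption): on a PC plain strcat does the same job. */
#define PROGMEM
#define strcat_P strcat
#endif

/* On AVR this lives only in flash; no SRAM copy is made at startup. */
static const char c_helloWorld[] PROGMEM = "Hello world!";

/* Hypothetical helper: append the flash-resident constant to a RAM buffer.
   The caller must make sure buf has room for 12 more characters plus the NUL. */
void append_hello(char *buf)
{
    strcat_P(buf, c_helloWorld);   /* on AVR, reads the source from program memory */
}
```

The destination buffer still has to live in SRAM; only the constant source text stays in flash.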
From what I know, strings in C# are immutable, so the line
str += " world!";
actually creates a new string whose value is that of the original string, with " world!" appended, and then makes str refer to that new string. There are no longer any references to the old string, so it gets garbage collected eventually.
But C-style strings are mutable, and are meant to be modified in place unless you explicitly copy them. So in fact if you have a const char*, you cannot modify the string at all, since const T* means the T data pointed to by the pointer can't be modified. Instead, you have to make a new string:
// In C, omit the static_cast<char*>; this is only necessary in C++.
char* new_str = static_cast<char*>(malloc(strlen(str)
+ strlen("Hello world!")
+ 1));
strcpy(new_str, str);
strcat(new_str, "Hello world!");
str = new_str;
// remember to free(str) at some point!
This is cumbersome and not very expressive, so if you are using C++ the obvious solution is to use std::string instead. Unlike the C# string, the C++ string has value semantics and is not immutable, but it can be appended to in a straightforward fashion, unlike the C string:
std::string str = "";
str += "Hello world!";
Again, if you mark the original string const, you won't be able to append to it without creating a new string.
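If you do stay with plain C and the malloc route shown earlier in this answer, it may help to wrap it in a function that also checks whether the allocation succeeded. This is only a sketch; concat_new is a made-up name, and on a small AVR you would probably still prefer a fixed buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding a followed by b,
   or NULL if the allocation fails. The caller owns the result and must free() it. */
char *concat_new(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *result = malloc(len_a + len_b + 1);   /* +1 for the terminating NUL */

    if (result == NULL) {
        return NULL;
    }
    memcpy(result, a, len_a);
    memcpy(result + len_a, b, len_b + 1);       /* copies b's terminator too */
    return result;
}
```

Neither input is modified, which mirrors what the C# += does behind the scenes, except that here nobody cleans up the old string for you.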
Related
I have to convert a project from old VB6 to C#. To save time, the aim is to preserve as much of the old code as possible.
A function of the old project loads a binary file into a string variable, and then the individual character values of this variable are analyzed with the Asc function:
OLD VB Code:
Public Function LoadText(ByVal DirIn As String) As String
Dim FileBuffer As String
Dim LenghtFile As Long
Dim ContIN As Long
ContIN = FreeFile
Open DirIn For Binary Access Read As #ContIN
LenghtFile = LOF(ContIN)
FileBuffer = Space(LenghtFile)
Get #ContIN, , FileBuffer
Close #ContIN
LoadText = FileBuffer
'following line for test purpose
debug.print(asc(mid(filebuffer,1,1)))
debug.print(asc(mid(filebuffer,2,1)))
debug.print(asc(mid(filebuffer,3,1)))
End Function
SUB Main
dim testSTring as String
teststring=loadtext("e:\testme.bin")
end sub
Result in immediate window:
1
10
133
C# code:
public static string LoadText(string dirIn)
{
string myString, myString2;
FileStream fs = new FileStream(dirIn, FileMode.Open);
BinaryReader br = new BinaryReader(fs);
byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
//myString = Convert.ToBase64String(bin);
myString = Encoding.Default.GetString(bin);
string m1 = Encoding.Default.GetString(bin);
//string m1 = Encoding.ASCII.GetString(bin);
//string m1 = Encoding.BigEndianUnicode.GetString(bin);
//string m1 = Encoding.UTF32.GetString(bin);
//string m1 = Encoding.UTF7.GetString(bin);
//string m1 = Encoding.UTF8.GetString(bin);
//string m1 = Encoding.Unicode.GetString(bin);
//string m1 = Encoding.Unicode.GetString(bin);
Console.WriteLine(General.Asc(m1.Substring(0, 1)));
Console.WriteLine(General.Asc(m1.Substring(1, 1)));
Console.WriteLine(General.Asc(m1.Substring(2, 1)));
br.Close();
fs.Close();
return myString;
}
General class:
public static int Asc(string stringToEValuate)
{
return (int)stringToEValuate[0];
}
Result in output window:
1
10
8230 <--fail!
The string in VB6 has a length of 174848, identical to the size of the test file.
In C# it is the same size for the Default and ASCII encodings, while all the others give a different size, and I cannot use them unless I change everything in the whole project.
The problem is that I can't find the encoding that yields a string for which the Asc function returns the same numbers as in VB6.
The problem is all there: if the string is not identical I have to change a lot of lines of code, because the whole program is based on ASCII values and their positions in the string.
Maybe loading a binary file into a string is the wrong approach, or the problem is the Asc function.
If you want to try the example file you can download it from here:
http:// www.snokie.org / testme.bin
8230 is correct. It is the UTF-16 code unit for the Unicode code point U+2026 (which needs only one UTF-16 code unit). You expected 133. 133 as one byte is the encoding for the same character in at least one other character set: Windows-1252.
There is no text but encoded text.
When you read a text file you have to know the encoding that was used to write it. Once you read into a .NET String or Char, you have it in Unicode's UTF-16 encoding. Because Unicode is a superset of any character set you would be using, it is not incorrect.
If you don't want to compare characters as characters, read them as binary to keep them in the same encoding as the file. You can then compare the byte sequences.
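To show where 133 and 8230 meet, here is a small language-neutral sketch, written in C, of what a Windows-1252 decoder does with each byte. It assumes the file bytes are interpreted as Windows-1252; the table covers the only range (0x80 to 0x9F) where that code page differs from the same-numbered Unicode code point, and cp1252_to_utf16 is a made-up name.

```c
#include <stdint.h>

/* Windows-1252 code points for bytes 0x80..0x9F. 0 marks bytes the code page
   leaves undefined (real decoders differ in how they treat those). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Hypothetical helper: byte as stored in the file -> UTF-16 code unit. */
uint16_t cp1252_to_utf16(uint8_t byte)
{
    if (byte >= 0x80 && byte <= 0x9F) {
        return cp1252_high[byte - 0x80];
    }
    return byte;    /* every other byte maps to the same-numbered code point */
}
```

Bytes 1 and 10 come out unchanged, while byte 133 (0x85) comes out as 0x2026, which is 8230. That is why the first two values matched and the third did not.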
The problem is that the VB6 code, rather than using Unicode for character codes as it should have, used the "default ANSI" character set, whose meaning changes from system to system and user to user.
The problem is this: "old project loads a binary file into a string variable". Yes, this was a common—but bad—VB6 practice. String datatypes are for text. Strings in VB6 are UTF-16 code unit sequences, just like in .NET (and Java, JavaScript, HTML, XML, …).
Get #ContIN, , FileBuffer converts from the system's default ANSI code page to UTF-16 and Asc converts it back again. So, you just have to do that in your .NET code, too.
Note: Just like in the VB6, Encoding.Default is hazardous because it can vary from system to system and user to user.
Reference Microsoft.VisualBasic.dll and
using static Microsoft.VisualBasic.Strings;
Then
var fileBuffer = File.ReadAllText(path, Encoding.Default);
Debug.WriteLine(Asc(Mid(fileBuffer, 3, 1)));
If you'd rather not bring Microsoft.VisualBasic.dll into a C# project, you can write your own versions:
using System;
using System.Linq;
using System.Text;
static class VB6StringReplacements
{
// Returns the byte value of the first character in the default ANSI code page
static public Byte Asc(String source) =>
Encoding.Default.GetBytes(source.Substring(0,1)).FirstOrDefault();
// VB6's Mid is 1-based, while Substring is 0-based
static public String Mid(String source, Int32 offset, Int32 length) =>
source.Substring(offset - 1, length);
}
and change your using directive to
using static VB6StringReplacements;
I'm a Java dev finding my way into C#, and I'm confused. I've read about the string type being immutable and such, not much different from Java except that it doesn't seem to be an object like it is there, but I'm getting weird behavior regardless. I have the following ToString method on a class:
public override string ToString()
{
StringBuilder builder = new StringBuilder();
builder.Append("BlockType: ");
builder.Append(BlockType + "\n");
//builder.Append(System.Text.ASCIIEncoding.ASCII.GetChars(Convert.FromBase64String("dHh0AA==")));
//builder.Append("\n");
builder.Append("BlockName: ");
builder.Append(BlockName + "\n");
//builder.Append(System.Text.ASCIIEncoding.ASCII.GetChars(Convert.FromBase64String(this.BlockName)));
//builder.Append("\n");
builder.Append("BlockLength: " + this.BlockLength + "\n");
builder.Append("pBlockData: " + this.pBlockData + "\n");
return builder.ToString();
}
When I fill it with data (taking into account that BlockType and BlockName will contain a Base64 string), I get the following result:
FileVersionNo: 0
nx: 1024
ny: 512
TileSize: 256
HorizScale: 10
Precis: 0,01
ExtHeaderLength: 35
nExtHeaderBlocks: 1
pExtHeaderBlocks: System.Collections.Generic.LinkedList`1[LibFhz.HfzExtHeaderBlock]
BlockType: dHh0AA==
BlockName: YXBwLW5hbWUAAAAAAAAAAA==
BlockLength: 11
pBlockData: System.Byte[]
Which is perfect, exactly what I want. However, when I try to get the ASCII value of those Base64 strings (or UTF-8, I tried both), I get the following result:
FileVersionNo: 0
nx: 1024
ny: 512
TileSize: 256
HorizScale: 10
Precis: 0,01
ExtHeaderLength: 35
nExtHeaderBlocks: 1
pExtHeaderBlocks: System.Collections.Generic.LinkedList`1[LibFhz.HfzExtHeaderBlock]
BlockType: txt
The code just seems to stop, without an error or stack trace. I have no idea what is going on. At first I thought a \0 was missing, so I added one to the string; then I thought I needed a \r\n ... again not the solution. I started googling and found people just wanting to know how to do a Base64 to UTF-8 conversion ... but that part seems easy ... this code stop isn't.
Any insights or links to decent articles about string handling in .net would be appreciated
I've had a look at what you get from this:
var test = Convert.FromBase64String("YXBwLW5hbWUAAAAAAAAAAA==");
var builder = new StringBuilder();
builder.Append(System.Text.Encoding.ASCII.GetChars(test));
The answer is the string "app-name" with a load of null (0) characters at the end.
You could try removing all the null characters by adding this line just before you return builder.ToString():
builder.Replace("\0", null);
That may or may not help, depending on what you're doing with the returned string.
First,
builder.Append("pBlockData: " + this.pBlockData + "\n");
doesn't do what you think it does. Specifically, if pBlockData is a byte array, you will get something like this (output from scriptcs):
> byte[] data = new byte[11];
> StringBuilder sb = new StringBuilder();
> sb.Append("data = ")
{Capacity:16,MaxCapacity:2147483647,Length:7}
> sb.Append(data);
{Capacity:32,MaxCapacity:2147483647,Length:20}
> sb.ToString()
data = System.Byte[]
Second, C# strings (.NET strings in general) are UTF-16, so the runtime doesn't really know how to display raw bytes. It doesn't matter whether they are Base64 encoded or ASCII or French pickles ;-) the runtime just treats them as binary. Also, null termination is not required; the length of the string is kept as a property of the string object.
So you need to turn the byte array you have into a UTF-16 character array or string before you output it. If the byte array contains valid ASCII you can look into the 'System.Text.ASCIIEncoding.ASCII.GetDecoder().Convert' method as one way to accomplish this.
How do I set a new value in a string by index?
I tried:
string a = "abc";
a[0] = "A";
This doesn't work for strings, but it does for chars. Why?
Strings in C# (and other .NET languages that use System.String from the base class library) are immutable. That is, you can't modify a string character by character that way (nor, for that matter, can you modify a string at all).
If you want to modify a string based on the index, you have to convert it to an array using System.String.ToCharArray() first. You convert it back to a string using System.String's constructor, passing in the modified array.
Your example would have to be changed to look like:
string a = "abc";
char[] array = a.ToCharArray();
array[0] = 'A'; //Note single quotes, not double quotes
a = new string(array);
The System.String type does not permit writing by index (or via any means -- to change the content of a String variable, one must replace it with a reference to an entirely new String). The System.Text.StringBuilder type does, however, permit writing by index. One may create a new System.Text.StringBuilder object (optionally passing a string to the constructor), manipulate it, and then use its ToString method to convert it back to a string.
A replacement would be this:
string a = "abc";
a = a.Remove(0, 1);
a = a.Insert(0, "A");
or, to replace the c, say:
string a = "abc";
a = a.Remove(2, 1);
a = a.Insert(2, "C");
Also using a stringbuilder may work as per http://msdn.microsoft.com/en-us/library/362314fe.aspx
StringBuilder sb = new StringBuilder("abc");
sb[0] = 'A';
sb[2] = 'C';
string str = sb.ToString();
Use StringBuilder if you need a mutable string.
Also: a[0] represents a single character, while "A" is a string, so the assignment is illegal.
For a char array, a[0] is a location in memory to which you can assign a value.
string, on the other hand, is a class, and in this case a[0] is actually a call to its indexer, which is read-only. You can't assign a value through it.
Assume I have the following string constants:
const string constString1 = "Const String 1";
const string constString2 = "Const String 2";
const string constString3 = "Const String 3";
const string constString4 = "Const String 4";
Now I can append the strings in two ways:
Option1:
string resultString = constString1 + constString2 + constString3 + constString4;
Option2:
string resultString = string.Format("{0}{1}{2}{3}",constString1,constString2,constString3,constString4);
Internally string.Format uses StringBuilder.AppendFormat. Now given the fact that I am appending constant strings, which of the options (option1 or option 2) is better with respect to performance and/or memory?
The first one will be done by the compiler (at least the Microsoft C# Compiler) (in the same way that the compiler does 1+2), the second one must be done at runtime. So clearly the first one is faster.
As an added benefit, in the first one the string is interned; in the second one it isn't.
And String.Format is quite slow :-) (read this
http://msmvps.com/blogs/jon_skeet/archive/2008/10/06/formatting-strings.aspx). NOT "slow enough to be a problem", UNLESS all your program does all day is format strings (MILLIONS of them, not TENS). Then you could probably do it faster by appending them to a StringBuilder.
The first variant will be best, but only when you are using constant strings.
There are two compiler optimizations (from the C# compiler, not the JIT compiler) that are in effect here. Let's take one example of a program:
const string A = "Hello ";
const string B = "World";
...
string test = A + B;
First optimization is constant propagation that will change your code basically into this:
string test = "Hello " + "World";
Then a second optimization, concatenation of literal strings (which they now are, due to the first optimization), kicks in and changes it to
string test = "Hello World";
So if you write any variants of the program shown above, the actual IL will be the same (or at least very similar) due to the optimizations done by the C# compiler.
I have the following intentionally trivial function:
void ReplaceSome(ref string text)
{
StringBuilder sb = new StringBuilder(text);
sb[5] = 'a';
text = sb.ToString();
}
It appears to be inefficient to convert this to a StringBuilder to index into and replace some of the characters only to copy it back to the ref'd param. Is it possible to index directly into the text param as an L-Value?
Or how else can I improve this?
C# strings are "immutable," which means that they can't be modified. If you have a string, and you want a similar but different string, you must create a new string. Using a StringBuilder as you do above is probably as easy a method as any.
Armed with Reflector and the decompiled IL: on a pure LOC basis, the StringBuilder approach is definitely the most efficient, e.g. tracing the IL calls that StringBuilder makes internally vs. the IL calls for String::Remove, String::Insert, etc.
I couldn't be bothered to test the memory overhead of each approach, but I would imagine it would be in line with the Reflector results: the StringBuilder approach would be the best.
I think the fact the StringBuilder has a set memory size using the constructor
StringBuilder sb = new StringBuilder(text);
would help overall too.
Like others have mentioned, it would come down to readability vs efficiency...
text = text.Substring(0, 5) + "a" + text.Substring(6); // replaces the character at index 5
Not dramatically different than your StringBuilder solution, but slightly more concise than the Remove(), Insert() answer.
I don't know if this is more efficient, but it works. Either way you'll have to recreate the string after each change since they're immutable.
string test = "hello world";
Console.WriteLine(test);
test = test.Remove(5, 1);
test = test.Insert(5, "z");
Console.WriteLine(test);
Or if you want it more concise:
string test = "hello world".Remove(5, 1).Insert(5, "z");