Why Empty Text File Contains 3 bytes? - c#

I'm using a text file inside my C# project in VS2010. I added it to the solution and set its "Copy to Output Directory" property to "Copy always". When I read it with the following code, the returned text starts with three extra characters (or one extra character if I decode it as UTF-8). Windows Explorer's file properties also show the file's size as 3 bytes, even though the file is empty.
public static string ReadFile(string fileName)
{
FileStream fs = null;
try
{
fs = new FileStream(fileName, FileMode.Open);
FileInfo fi = new FileInfo(fileName);
byte[] data = new byte[fi.Length];
fs.Read(data, 0, data.Length);
fs.Close();
fs.Dispose();
string text = Encoding.ASCII.GetString(data);
return text;
}
catch (Exception)
{
if(fs != null)
{
fs.Close();
fs.Dispose();
}
return string.Empty;
}
}
Why does this happen? How can I read text files without the StreamReader class?
Any help or code would be much appreciated.

My guess is that the three bytes you are seeing are the byte order mark (BOM) of a Unicode file. For UTF-8, the BOM is three bytes long (EF BB BF).
You can avoid them by saving the file as UTF-8 without signature.
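If the file cannot be re-saved, the BOM can also be skipped while reading. The following is a minimal sketch of reading a file without StreamReader, assuming the file is UTF-8; the class name BomAwareReader is made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class BomAwareReader
{
    // Reads a whole file as UTF-8 text, dropping a leading UTF-8 BOM if present.
    // Assumption: the file content is UTF-8 (with or without signature).
    public static string ReadFile(string fileName)
    {
        byte[] data = File.ReadAllBytes(fileName);
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF

        bool hasBom = data.Length >= bom.Length
            && data[0] == bom[0] && data[1] == bom[1] && data[2] == bom[2];
        int offset = hasBom ? bom.Length : 0;

        return Encoding.UTF8.GetString(data, offset, data.Length - offset);
    }
}
```

With this, a file that contains only the three BOM bytes reads back as an empty string.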

Related

Writing the Assembly version to a string and saving this into a textfile C#

Hi, I am trying to write the assembly version to a text file for use with an auto-updater program.
This is what I have for getting the current assembly version as a string:
// Get assembly info to string
string assemblyVersion = AssemblyName.GetAssemblyName("MainApplication.exe").Version.ToString();
This is then written to a text file using a FileStream:
private void SaveVersion()
{
// creating filestream that can write a file
FileStream fs = new FileStream("Version.txt", FileMode.Create, FileAccess.Write);
// if we don't have permission to write we exit function
if (!fs.CanWrite)
return;
byte[] buffer = Encoding.ASCII.GetBytes(assemblyVersion);
// writing whole buffer array
fs.Write(buffer, 0, buffer.Length);
// closing filestream
fs.Flush();
fs.Close();
}
However, for some reason the Version.txt file is never populated.
What am I missing here? Thanks.
You can do this with a single line of code:
System.IO.File.WriteAllText("Version.txt", assemblyVersion);

how to append data to a file [duplicate]

I would like to append a byte array to an already existing file (C:\test.exe). Assume the following byte array:
byte[] appendMe = new byte[ 1000 ] ;
File.AppendAllBytes(@"C:\test.exe", appendMe); // Something like this - Yes, I know this method does not really exist.
I would do this using File.WriteAllBytes, but I am going to be using an ENORMOUS byte array, and a System.OutOfMemoryException is constantly being thrown. So I will most likely have to split the large array into pieces and append each piece to the end of the file.
Thank you,
Evan
One way would be to create a FileStream with the FileMode.Append creation mode. The documentation describes it as:
"Opens the file if it exists and seeks to the end of the file, or creates a new file."
This would look something like:
public static void AppendAllBytes(string path, byte[] bytes)
{
//argument-checking here.
using (var stream = new FileStream(path, FileMode.Append))
{
stream.Write(bytes, 0, bytes.Length);
}
}
Create a new FileStream.
Seek() to the end.
Write() the bytes.
Close() the stream.
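The four steps above can be sketched as follows. This is only an illustration; the class and method names are hypothetical, and the using block takes care of closing the stream.

```csharp
using System;
using System.IO;

static class SeekAppend
{
    // Appends bytes to a file by explicitly seeking to its end.
    public static void AppendBySeeking(string path, byte[] bytes)
    {
        // Create a new FileStream (the file is created if it does not exist).
        using (var stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            // Seek() to the end.
            stream.Seek(0, SeekOrigin.End);
            // Write() the bytes.
            stream.Write(bytes, 0, bytes.Length);
        } // Leaving the using block closes the stream.
    }
}
```

Calling AppendBySeeking once per chunk avoids holding the whole payload in memory at the same time.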
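You can also use the built-in FileSystem.WriteAllBytes method (String, Byte[], Boolean) from the Microsoft.VisualBasic.FileIO namespace. From C#, this requires a reference to the Microsoft.VisualBasic assembly.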
public static void WriteAllBytes(
string file,
byte[] data,
bool append
)
Set append to True to append to the file contents; False to overwrite the file contents. Default is False.
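I'm not exactly sure what the question is, but BinaryWriter has a Write method that takes an array of bytes.
BinaryWriter.Write(Byte[])
bool writeFinished = false;
string fileName = "C:\\test.exe";
// FileMode.Append opens the file and positions the stream at its end.
using (FileStream fs = new FileStream(fileName, FileMode.Append, FileAccess.Write))
using (BinaryWriter bw = new BinaryWriter(fs))
{
while (!writeFinished)
{
byte[] data = GetData();
// The stream position advances automatically after each write.
bw.Write(data);
// Set writeFinished = true once GetData() has no more data to return.
}
}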
Where writeFinished is true when all the data has been appended, and GetData() returns an array of data to be appended.
You can simply create a function to do this. FileMode.Append already creates the file when it does not exist, so no File.Exists check is needed, and the using block closes the stream:
public static void AppendToFile(string fileToWrite, byte[] DT)
{
using (FileStream FS = new FileStream(fileToWrite, FileMode.Append, FileAccess.Write))
{
FS.Write(DT, 0, DT.Length);
}
}

Zip file with utf-8 file names

On my website I have an option to download all images uploaded by users. The problem is with images that have Hebrew names (I need the original name of the file). I tried to decode the file names, but this does not help. Here is the code:
using ICSharpCode.SharpZipLib.Zip;
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(file.Name);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string name = iso.GetString(isoBytes);
var entry = new ZipEntry(name + ".jpg");
zipStream.PutNextEntry(entry);
using (var reader = new System.IO.FileStream(file.Name, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
byte[] buffer = new byte[ChunkSize];
int bytesRead;
while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
{
byte[] actual = new byte[bytesRead];
Buffer.BlockCopy(buffer, 0, actual, 0, bytesRead);
zipStream.Write(actual, 0, actual.Length);
}
}
After the UTF-8 encoding I get Hebrew file names like this: ??????.jpg
Where is my mistake?
Unicode (of which UTF-8 is one binary encoding) can represent far more characters than any 8-bit encoding such as ISO-8859-1. Converting Hebrew text to ISO-8859-1 is therefore lossy: characters that the target encoding cannot represent are replaced with '?', which is why you get garbage for your filenames. You should really read Joel's article on Unicode.
...
Now that you've read the article, you should know that a C# string can store Unicode data, so you probably don't need to do any conversion of file.Name and can pass it directly to the ZipEntry constructor, provided the library does not contain encoding-handling bugs (this is always possible).
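To see why the conversion in the question cannot work, here is a small sketch that repeats it on an arbitrary name. The class and method names are invented for illustration; the encoding calls are the same ones used in the question.

```csharp
using System;
using System.Text;

static class Latin1Demo
{
    // Repeats the question's UTF-8 -> ISO-8859-1 conversion for a given name.
    public static string ToLatin1AndBack(string name)
    {
        Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        byte[] utfBytes = Encoding.UTF8.GetBytes(name);
        // Characters that ISO-8859-1 cannot represent are replaced here.
        byte[] isoBytes = Encoding.Convert(Encoding.UTF8, iso, utfBytes);
        return iso.GetString(isoBytes);
    }
}
```

A plain ASCII name survives this round trip, while a Hebrew name such as "\u05E9\u05DC\u05D5\u05DD" should come back as question marks, matching the ??????.jpg seen in the question.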
Try using
ZipStrings.UseUnicode = true;
It should be a part of the ICSharpCode.SharpZipLib.Zip namespace.
After that you can use something like
var newZipEntry = new ZipEntry($"My ünicödë string.pdf");
and add the entry as normal to the stream. You shouldn't need to do any conversion of the string before that in C#.
You are doing the wrong conversion, since strings in C# are already Unicode.
What tool do you use to check the file names in the archive?
By default, Windows ZIP implementations use the system DOS encoding for file names, while other implementations may use other encodings.

string serialization and deserialization problem

I'm trying to serialize and deserialize a string, using this code:
private byte[] StrToBytes(string str)
{
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, str);
ms.Seek(0, 0);
return ms.ToArray();
}
private string BytesToStr(byte[] bytes)
{
BinaryFormatter bfx = new BinaryFormatter();
MemoryStream msx = new MemoryStream();
msx.Write(bytes, 0, bytes.Length);
msx.Seek(0, 0);
return Convert.ToString(bfx.Deserialize(msx));
}
These two methods work fine if I only work with string variables.
But if I serialize a string and save it to a file, then read it back and deserialize it, I end up with only the first portion of the string.
So I believe I have a problem with my file save/read operations. Here is the code for my save/read:
private byte[] ReadWhole(string fileName)
{
try
{
using (BinaryReader br = new BinaryReader(new FileStream(fileName, FileMode.Open)))
{
return br.ReadBytes((int)br.BaseStream.Length);
}
}
catch (Exception)
{
return null;
}
}
private void WriteWhole(byte[] wrt,string fileName,bool append)
{
FileMode fm = FileMode.OpenOrCreate;
if (append)
fm = FileMode.Append;
using (BinaryWriter bw = new BinaryWriter(new FileStream(fileName, fm)))
{
bw.Write(wrt);
}
return;
}
Any help will be appreciated.
Many thanks
Sample Problematic Run:
WriteWhole(StrToBytes("First portion of text"),"filename",true);
WriteWhole(StrToBytes("Second portion of text"),"filename",true);
byte[] readBytes = ReadWhole("filename");
string deserializedStr = BytesToStr(readBytes); // here deserializedStr becomes "First portion of text"
Just use
Encoding.UTF8.GetBytes(string s)
Encoding.UTF8.GetString(byte[] b)
and don't forget to add System.Text to your using directives.
BTW, why do you need to serialize a string and save it that way?
You can just use File.WriteAllText() or File.WriteAllBytes(). You can read the data back the same way, with File.ReadAllText() or File.ReadAllBytes().
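For the sample run in the question, which appends two portions of text to the same file, a plain-text version might look like the sketch below. It uses File.AppendAllText, the appending counterpart of File.WriteAllText; the wrapper class and method names are hypothetical.

```csharp
using System;
using System.IO;

static class PlainTextStore
{
    // Appends text to the file, creating the file if it does not exist.
    public static void Append(string fileName, string text)
    {
        File.AppendAllText(fileName, text);
    }

    // Reads everything that has been appended so far.
    public static string ReadAll(string fileName)
    {
        return File.ReadAllText(fileName);
    }
}
```

Because nothing but the text itself is stored, reading the file back yields both portions concatenated.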
The problem is that you are writing two strings to the file, but only reading one back.
If you want to read back multiple strings, then you must deserialize multiple strings. If there are always two strings, then you can just deserialize two strings. If you want to store any number of strings, then you must first store how many strings there are, so that you can control the deserialization process.
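One way to store the number of strings first is sketched below. Note that it swaps BinaryFormatter for BinaryWriter and BinaryReader, which is an assumption of this sketch rather than something the question uses; the class name StringListCodec is also made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StringListCodec
{
    // Writes the number of strings first, then each length-prefixed string.
    public static byte[] Serialize(IList<string> items)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms))
            {
                bw.Write(items.Count);
                foreach (string item in items)
                {
                    bw.Write(item);
                }
            }
            // ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    // Reads the count, then exactly that many strings.
    public static List<string> Deserialize(byte[] bytes)
    {
        using (var br = new BinaryReader(new MemoryStream(bytes)))
        {
            int count = br.ReadInt32();
            var items = new List<string>(count);
            for (int i = 0; i < count; i++)
            {
                items.Add(br.ReadString());
            }
            return items;
        }
    }
}
```

Since the count sits at the start of the data, this layout suits writing all strings at once; appending later would require rewriting the count.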
If you are trying to hide data (as indicated by your comment on another answer), then this is not a reliable way to accomplish that goal. On the other hand, if you are storing data on a user's hard drive, and the user is running your program on their local machine, then there is no way to hide the data from them, so this is as good as anything else.

Why does text from Assembly.GetManifestResourceStream() start with three junk characters?

I have a SQL file added to my VS.NET 2008 project as an embedded resource. Whenever I use the following code to read the file's content, the string returned always starts with three junk characters and then the text I expect. I assume this has something to do with the Encoding.Default I am using, but that is just a guess. Why does this text keep showing up? Should I just trim off the first three characters or is there a more informed approach?
public string GetUpdateRestoreSchemaScript()
{
var type = GetType();
var a = Assembly.GetAssembly(type);
var script = "UpdateRestoreSchema.sql";
var resourceName = String.Concat(type.Namespace, ".", script);
using(Stream stream = a.GetManifestResourceStream(resourceName))
{
byte[] buffer = new byte[stream.Length];
stream.Read(buffer, 0, buffer.Length);
// UPDATE: Should be Encoding.UTF8
return Encoding.Default.GetString(buffer);
}
}
Update:
I now know that my code works as expected if I simply change the last line to decode the buffer as UTF-8. That holds for this embedded file, but will it hold in general? Is there a way to test any buffer to determine its encoding?
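Probably the file is UTF-8 encoded, while Encoding.Default on the .NET Framework is the system's ANSI code page, so the three BOM bytes are decoded as three junk characters. Why don't you use a specific encoding?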
Edit to answer a comment:
To guess the file's encoding, you could look for a BOM at the start of the stream. If it exists, it tells you the encoding; if not, you can only guess or ask the user.
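A rough sketch of such a BOM check follows. It only knows the UTF-8, UTF-16 and little-endian UTF-32 signatures and returns null when none matches; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or null if none is found.
    public static Encoding DetectFromBom(byte[] data)
    {
        // UTF-32 LE is tested before UTF-16 LE because its BOM (FF FE 00 00)
        // starts with the UTF-16 LE BOM (FF FE).
        Encoding[] candidates =
        {
            Encoding.UTF8, Encoding.UTF32, Encoding.Unicode, Encoding.BigEndianUnicode
        };

        foreach (Encoding candidate in candidates)
        {
            if (StartsWith(data, candidate.GetPreamble()))
            {
                return candidate;
            }
        }
        return null; // No BOM: the caller has to guess or ask the user.
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (prefix.Length == 0 || data.Length < prefix.Length)
        {
            return false;
        }
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

One known ambiguity: UTF-16 LE text that begins with a U+0000 character has the same first four bytes as the UTF-32 LE BOM, so a null result is not the only case where the guess can be wrong.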
If you try to load XML from an assembly, you actually need to inspect and skip the byte order mark bytes (this drove me nuts):
....
byte[] data;
using (var stream = assembly.GetManifestResourceStream(filename))
{
var length = stream.Length;
data = new byte[length];
stream.Read(data, 0, (int) length);
}
if (!HasUtf8ByteOrderMark(data))
{
throw new InvalidOperationException("Expected UTF8 byte order mark EF BB BF");
}
return Encoding.UTF8.GetChars(data.Skip(3).ToArray());
And
static bool HasUtf8ByteOrderMark(byte[] data)
{
var bom = new byte[] { 0xEF, 0xBB, 0xBF };
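// Check the length first so that buffers shorter than the BOM do not throw.
return data.Length >= bom.Length && data[0] == bom[0] && data[1] == bom[1] && data[2] == bom[2];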
}
I had the same problem in .NET Core.
You can let StreamReader handle the encoding; by default it detects a byte order mark and skips it:
using (var stream = a.GetManifestResourceStream(resourceName))
using (var reader = new StreamReader(stream))
return reader.ReadToEnd();
