Extra bytes using FileStream (or so it seems) - c#

I am writing some JSON directly to a file, which is then read at a later date. The JSON is then deserialized into an object using Newtonsoft.Json.
I was initially doing the writing like so, and it was working fine:
using (var sw = File.CreateText(jsonFile))
{
sw.Write(content);
}
I then ran into a race condition and was told to go this route instead:
using (var fs = new FileStream(jsonFile, FileMode.Open, FileAccess.Write,
FileShare.ReadWrite))
{
using (var sr = new StreamWriter(fs))
{
sr.Write(content);
}
}
But when deserializing the JSON, I got this message:
Newtonsoft.Json.JsonReaderException: Additional text encountered after finished reading JSON content: u. Path ''
When I added encoding to the StreamWriter (UTF8), I got the same message but a different character.
Following a suggestion from a comment, I then tried this encoding for the StreamWriter to remove the BOM:
var utf8WithoutBom = new System.Text.UTF8Encoding(false);
I passed it to the StreamWriter. The same error was returned, but this time the character was blank.
Using jsonlint the JSON (content) that was written validates correctly using both methods above.
Does anyone know why the second method (using FileStream) doesn't work? And if so, do you have a suggestion on what I should do instead?
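One likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: FileMode.Open does not truncate an existing file, so when the new content is shorter than the old one, the tail of the old content stays on disk and the JSON reader sees it as "additional text". File.CreateText did not show this because it creates or overwrites the file. The example below assumes a temporary file and a hypothetical helper named Overwrite; it uses FileMode.Create, which truncates. Calling fs.SetLength(0) on a stream opened with FileMode.Open would be another way to get the same effect.

```csharp
using System;
using System.IO;
using System.Text;

static class TruncationDemo
{
    // Hypothetical helper (not from the question): replaces the whole file.
    public static void Overwrite(string path, string content)
    {
        // FileMode.Create truncates an existing file or creates a new one.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.ReadWrite))
        using (var sw = new StreamWriter(fs, new UTF8Encoding(false)))
        {
            sw.Write(content);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "{\"name\":\"a much longer value\"}");

        // FileMode.Open keeps the old length, so the old tail survives.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
        using (var sw = new StreamWriter(fs))
        {
            sw.Write("{\"n\":1}");
        }
        // The new JSON is followed by leftover bytes of the old content.
        Console.WriteLine(File.ReadAllText(path));

        // After a truncating write, only the new JSON remains.
        Overwrite(path, "{\"n\":1}");
        Console.WriteLine(File.ReadAllText(path));

        File.Delete(path);
    }
}
```

This would also fit the observation that changing the encoding only changed which stray character was reported: the leftover bytes come from the previous file content, not from the writer.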

Related

Encode a string to UTF-8 with BOM in C# [duplicate]

I'm having an issue with StreamWriter and Byte Order Marks. The documentation seems to state that the Encoding.UTF8 encoding has byte order marks enabled, but when files are being written, some have the marks while others don't.
I'm creating the stream writer in the following way:
this.Writer = new StreamWriter(this.Stream, System.Text.Encoding.UTF8);
Any ideas on what could be happening would be appreciated.
As someone already pointed out, calling the constructor without the encoding argument does the trick.
However, if you want to be explicit, try this:
using (var sw = new StreamWriter(this.Stream, new UTF8Encoding(false)))
To disable the BOM, the key is to construct the writer with a new UTF8Encoding(false) instead of just Encoding.UTF8. This is the same as calling StreamWriter without the encoding argument; internally it does the same thing.
To enable BOM, use new UTF8Encoding(true) instead.
Update: Since Windows 10 v1903, when saving as UTF-8 in notepad.exe, BOM byte is now an opt-in feature instead.
The issue is due to the fact that you are using the static UTF8 property on the Encoding class.
When the GetPreamble method is called on the Encoding instance returned by the UTF8 property, it returns the byte order mark (a three-byte array), which is written to the stream before any other content (assuming a new stream).
You can avoid this by creating the instance of the UTF8Encoding class yourself, like so:
// As before.
this.Writer = new StreamWriter(this.Stream,
    // Create the encoding yourself; the parameterless constructor does not emit a BOM.
    new System.Text.UTF8Encoding());
As per the documentation for the default parameterless constructor (emphasis mine):
This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an invalid encoding is detected.
This means that the call to GetPreamble will return an empty array, and therefore no BOM will be written to the underlying stream.
My answer is based on HelloSam's one which contains all the necessary information.
Only I believe what OP is asking for is how to make sure that BOM is emitted into the file.
So instead of passing false to UTF8Encoding ctor you need to pass true.
using (var sw = new StreamWriter("text.txt", new UTF8Encoding(true)))
Try the code below, open the resulting files in a hex editor and see which one contains BOM and which doesn't.
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Written with UTF8Encoding(false): no BOM at the start of the file.
        const string nobomtxt = "nobom.txt";
        File.Delete(nobomtxt);
        using (Stream stream = File.OpenWrite(nobomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
        {
            writer.WriteLine("HelloПривет");
        }

        // Written with UTF8Encoding(true): the file starts with the BOM bytes EF BB BF.
        const string bomtxt = "bom.txt";
        File.Delete(bomtxt);
        using (Stream stream = File.OpenWrite(bomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
        {
            writer.WriteLine("HelloПривет");
        }
    }
}
The only time I've seen that constructor not add the UTF-8 BOM is if the stream is not at position 0 when you call it. For example, in the code below, the BOM isn't written:
using (var s = File.Create("test2.txt"))
{
s.WriteByte(32);
using (var sw = new StreamWriter(s, Encoding.UTF8))
{
sw.WriteLine("hello, world");
}
}
As others have said, if you're using the StreamWriter(stream) constructor, without specifying the encoding, then you won't see the BOM.
Do you use the same constructor of the StreamWriter for every file? Because the documentation says:
To create a StreamWriter using UTF-8 encoding and a BOM, consider using a constructor that specifies encoding, such as StreamWriter(String, Boolean, Encoding).
I was in a similar situation a while ago. I ended up using the Stream.Write method instead of the StreamWriter, and wrote the result of Encoding.GetPreamble() before writing the result of Encoding.GetBytes(stringToWrite).
I found this answer useful (thanks to @Philipp Grathwohl and @Nik), but in my case I'm using FileStream to accomplish the task, so the code that generates the BOM goes like this:
using (FileStream vStream = File.Create(pfilePath))
{
// Creates the UTF-8 encoding with parameter "encoderShouldEmitUTF8Identifier" set to true
Encoding vUTF8Encoding = new UTF8Encoding(true);
// Gets the preamble in order to attach the BOM
var vPreambleByte = vUTF8Encoding.GetPreamble();
// Writes the preamble first
vStream.Write(vPreambleByte, 0, vPreambleByte.Length);
// Gets the bytes from text
byte[] vByteData = vUTF8Encoding.GetBytes(pTextToSaveToFile);
vStream.Write(vByteData, 0, vByteData.Length);
vStream.Close();
}
It seems that if the file already existed and didn't contain a BOM, it won't contain one when overwritten; in other words, StreamWriter preserves the BOM (or its absence) when overwriting a file.
Could you please show a situation where it doesn't produce it? The only case I can find where the preamble isn't present is when nothing is ever written to the writer (Jim Mischel seems to have found another, logical one that is more likely to be your problem; see his answer).
My test code:
var stream = new MemoryStream();
using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
writer.Write('a');
}
Console.WriteLine(stream.ToArray()
.Select(b => b.ToString("X2"))
.Aggregate((i, a) => i + " " + a)
);
After reading the source code of StreamWriter: you need to make sure you are creating a new file (or writing from position 0); only then is the byte order mark added to the file.
https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L267
Code in the Flush method:
if (!_haveWrittenPreamble)
{
_haveWrittenPreamble = true;
ReadOnlySpan<byte> preamble = _encoding.Preamble;
if (preamble.Length > 0)
{
_stream.Write(preamble);
}
}
https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L129
Code that sets the value of _haveWrittenPreamble:
// If we're appending to a Stream that already has data, don't write
// the preamble.
if (_stream.CanSeek && _stream.Position > 0)
{
_haveWrittenPreamble = true;
}
Using Encoding.Default instead of Encoding.UTF8 solved my problem.

Isolated Storage adding characters at the end of a Stream

I'm having problems converting long into string.
What I'm trying to do is save the DateTime.Now.Ticks property in isolated storage, then retrieve it afterwards. This is what I did to save it:
IsolatedStorageFile appStorage = IsolatedStorageFile.GetUserStoreForApplication();
using (var file = appStorage.CreateFile("appState"))
{
using (var sw = new StreamWriter(file))
{
sw.Write(DateTime.Now.Ticks);
}
}
When I retrieve the file, I do it like this:
if (appStorage.FileExists("appState"))
{
using (var file = appStorage.OpenFile("appState", FileMode.Open))
{
using (StreamReader sr = new StreamReader(file))
{
string s = sr.ReadToEnd();
}
}
appStorage.DeleteFile("appState");
}
Up to here I have no problem, but when I try to convert the string I retrieved, a FormatException is thrown. These are the two ways I tried to do it:
long time = long.Parse(s);
long time = (long)Convert.ToDouble(s);
So is there any other way to do this?
EDIT:
The problem is not in the conversion but rather in the StreamWriter adding extra characters.
I suspect you are seeing some other data at the end. Something else may have written other data to the stream.
I think you should use StreamWriter.WriteLine() instead of StreamWriter.Write() to write the data and then call StreamReader.ReadLine() instead of StreamReader.ReadToEnd() to read it back in.
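The WriteLine/ReadLine suggestion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the asker's code: it uses an ordinary temporary file as a stand-in for the isolated storage file, the class and method names (TicksRoundTrip, Save, Load) are hypothetical, and the thread does not establish where the extra characters actually came from. Reading only the first line means anything that follows it in the file is ignored when parsing.

```csharp
using System;
using System.Globalization;
using System.IO;

static class TicksRoundTrip
{
    // Hypothetical helper: writes the tick count on a line of its own.
    public static void Save(string path, long ticks)
    {
        // FileMode.Create replaces any previous content of the file.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var sw = new StreamWriter(fs))
        {
            sw.WriteLine(ticks.ToString(CultureInfo.InvariantCulture));
        }
    }

    // Hypothetical helper: reads back only the first line and parses it.
    public static long Load(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var sr = new StreamReader(fs))
        {
            var line = sr.ReadLine();
            return long.Parse(line, CultureInfo.InvariantCulture);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        long ticks = DateTime.Now.Ticks;

        Save(path, ticks);
        long restored = Load(path);

        Console.WriteLine(restored == ticks); // the value survives the round trip
        File.Delete(path);
    }
}
```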

Why won't XmlReader read from a string? (But it will from a .txt of the same data)

This is the code in question:
using (var file = MemoryMappedFile.OpenExisting("AIDA64_SensorValues"))
{
using (var readerz = file.CreateViewAccessor(0, 0))
{
var bytes = new byte[567];
var encoding = Encoding.ASCII;
readerz.ReadArray<byte>(0, bytes, 0, bytes.Length);
File.WriteAllText("C:\\myFile.txt", encoding.GetString(bytes));
var readerSettings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
using (var reader = XmlReader.Create("C:\\myFile.txt", readerSettings))
{
This is what myfile.txt looks like:
<sys><id>SCPUCLK</id><label>CPU Clock</label><value>1598</value></sys><sys><id>SCPUFSB</id><label>CPU FSB</label><value>266</value></sys><sys><id>SMEMSPEED</id><label>Memory Speed</label><value>DDR2-667</value></sys><sys><id>SFREEMEM</id><label>Free Memory</label><value>415</value></sys><sys><id>SGPU1CLK</id><label>GPU Clock</label><value>562</value></sys><sys><id>SFREELVMEM</id><label>Free Local Video Memory</label><value>229</value></sys><temp><id>TCPU</id><label>CPU</label><value>42</value></temp><temp><id>TGPU1</id><label>GPU</label><value>58</value></temp>
If I write the data to a text file on the hard drive with:
File.WriteAllText("C:\\myFile.txt", encoding.GetString(bytes));
and then read that same text file with the fragment XmlReader:
XmlReader.Create("C:\\myFile.txt");
it reads it just fine; the program runs and completes as it is supposed to. But if I read directly with the fragment XmlReader, like:
XmlReader.Create(encoding.GetString(bytes));
I get an "illegal characters in path" exception at run time on the XmlReader.Create line.
I've tried writing it to a separate string first and reading that with XmlReader. I assumed it wouldn't help to print it to CMD to see what it looks like, because CMD wouldn't show the invalid characters I'm dealing with, right?
But I did Console.WriteLine(encoding.GetString(bytes)); anyway, and it precisely matched the text file.
So is writing it to the text file somehow removing some "illegal characters"? What do you think?
XmlReader.Create(encoding.GetString(bytes));
XmlReader.Create() interprets your string as the URI where it should read a file from. Instead encapsulate your bytes in a StringReader:
StringReader sr = new StringReader(encoding.GetString(bytes));
XmlReader.Create(sr);
Here:
XmlReader.Create(encoding.GetString(bytes));
you are simply invoking the following method which takes a string representing a filename. However you are passing the actual XML string to it which obviously is an invalid filename.
If you want to load the reader from a buffer you could use a stream:
byte[] bytes = ...; // represents the XML bytes
using (var stream = new MemoryStream(bytes))
using (var reader = XmlReader.Create(stream))
{
...
}
The method XmlReader.Create() with a single string as argument needs a URI passed and not the XML document as string, please refer to the MSDN. It tries to open a file named "<..." which is an invalid URI. You can pass a Stream instead.
You are passing the xml content in the place where it is expecting a path, as evidenced by the error - illegal characters in path
Use an appropriate overload, and pass a stream - http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.create.aspx
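Putting the answers above together, a minimal sketch of reading an XML fragment held in a string might look like this. It relies on the XmlReader.Create(TextReader, XmlReaderSettings) overload together with the ConformanceLevel.Fragment setting from the question; the class and method names (FragmentReader, ReadIds) and the choice to collect only the id elements are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class FragmentReader
{
    // Hypothetical helper: returns the text of every <id> element in the fragment.
    public static List<string> ReadIds(string xml)
    {
        var ids = new List<string>();
        // Fragment conformance allows several top-level elements, as in the question's data.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

        // Wrapping the string in a StringReader makes it content, not a file path or URI.
        using (var text = new StringReader(xml))
        using (var reader = XmlReader.Create(text, settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "id")
                {
                    // Reads the element's text and moves past its end tag.
                    ids.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return ids;
    }

    public static void Main()
    {
        string xml = "<sys><id>SCPUCLK</id><value>1598</value></sys>"
                   + "<temp><id>TCPU</id><value>42</value></temp>";
        foreach (string id in ReadIds(xml))
        {
            Console.WriteLine(id);
        }
    }
}
```

The loop only calls Read() when it has not just consumed an element, because ReadElementContentAsString already advances the reader to the node after the closing tag.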

How to append data to a binary file?

I have a binary file to which I want to append a chunk of data at the end of the file, how can I achieve this using C# and .net? Also is there anything to consider when writing to the end of a binary file? Thanks a lot for your help.
private static void AppendData(string filename, int intData, string stringData, byte[] lotsOfData)
{
using (var fileStream = new FileStream(filename, FileMode.Append, FileAccess.Write, FileShare.None))
using (var bw = new BinaryWriter(fileStream))
{
bw.Write(intData);
bw.Write(stringData);
bw.Write(lotsOfData);
}
}
You should be able to do this via the Stream:
using (FileStream data = new FileStream(path, FileMode.Append))
{
data.Write(...);
}
As for considerations - the main one would be: does the underlying data format support append? Many don't, unless it is your own raw data, or text etc. A well-formed xml document doesn't support append (without considering the final end-element), for example. Nor will something like a Word document. Some do, however. So; is your data OK with this...
Using StreamWriter (with reference to DotNetPerls), make sure to pass true as the append argument to the StreamWriter constructor; if it is omitted, the file is overwritten as usual:
using System.IO;
class Program
{
static void Main()
{
// 1: Write single line to new file
using (StreamWriter writer = new StreamWriter("C:\\log.txt", true))
{
writer.WriteLine("Important data line 1");
}
// 2: Append line to the file
using (StreamWriter writer = new StreamWriter("C:\\log.txt", true))
{
writer.WriteLine("Line 2");
}
}
}
Output
(File "log.txt" contains these lines.)
Important data line 1
Line 2
This is the solution I was actually looking for when I got here from Google, although my file wasn't binary. I hope it helps someone else.
