We've noticed that UTF-8 characters don't come out correctly when using UIDevice.CurrentDevice.Name in MonoTouch.
The name comes out as "iPad 2 ??" if it contains some of the special characters you get by holding down the apostrophe key on the iPad keyboard. (Sorry, I don't know the equivalent way to show these characters in Windows.)
Is there a recommended workaround to get the correct text? We don't mind converting to UTF-8 ourselves. I also tried simulating this from a UITextField and it worked fine, with no UTF-8 problems.
The reason this is causing problems is we are sending this text off to a web service, and it's causing XML parsing issues.
Here is a snippet of the XmlWriter code (_parser.WriteRequest):
using (XmlWriter xmlWriter = XmlWriter.Create(textWriter, new XmlWriterSettings
{
#if DEBUG
Indent = true,
#else
Indent = false, NewLineHandling = NewLineHandling.None,
#endif
OmitXmlDeclaration = true
}))
{
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("REQUEST");
xmlWriter.WriteAttributeString("TYPE", "EXAMPLE");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
}
The TextWriter is passed in from:
public Response MakeRequest(Request request)
{
var httpRequest = CreateRequest(request);
WriteRequest(httpRequest.GetRequestStream(), request);
using (var httpResponse = httpRequest.GetResponse() as HttpWebResponse)
{
using (var responseStream = httpResponse.GetResponseStream())
{
var response = new Response();
ReadResponse(response, responseStream);
return response;
}
}
}
private void WriteRequest(Stream requestStream, Request request)
{
if (request.Type == null)
{
throw new InvalidOperationException("Request Type was null!");
}
if (_logger.Enabled)
{
var builder = new StringBuilder();
using (var writer = new StringWriter(builder, CultureInfo.InvariantCulture))
{
_parser.WriteRequest(writer, request);
}
_logger.Log("REQUEST: " + builder.ToString());
using (requestStream)
{
using (StreamWriter writer = new StreamWriter(requestStream))
{
writer.Write(builder.ToString());
}
}
}
else
{
using (requestStream)
{
using (StreamWriter writer = new StreamWriter(requestStream))
{
_parser.WriteRequest(writer, request);
}
}
}
}
_logger writes to Console.WriteLine and is enabled in #if DEBUG mode. Request is just a storage class with properties (sorry, it is easy to confuse with HttpWebRequest).
I'm seeing ?? in both XCode's console and MonoDevelop's console. I'm also assuming the server is receiving them strangely as well, as I get an error. Using UITextField.Text with the same strange characters instead of the device description works fine with no issues. It makes me think the device description is the culprit.
EDIT: this fixed it -
Encoding.UTF8.GetString (Encoding.ASCII.GetBytes(UIDevice.CurrentDevice.Name));
Okay, I think I know the problem. You're creating a StringWriter, which always reports its encoding as UTF-16 (unless you override the Encoding property). You're then taking the string from that StringWriter (which will start with <?xml version="1.0" encoding="UTF-16" ?>) and writing it to a StreamWriter which will default to UTF-8. That mixture of encodings is causing the problem.
The simplest approach would be to change your code to pass a Stream directly to the XmlWriter - a MemoryStream if you really want, or just requestStream. That way the XmlWriter can declare that it's using the exact encoding that it's actually writing the binary data in - you haven't got an intermediate step to mess things up.
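As a rough illustration of the stream-based approach, here is a minimal sketch. The class name, the WriteRequest helper and the DEVICE attribute are hypothetical, and a MemoryStream stands in for requestStream; the point is only that the encoding set on XmlWriterSettings is the one the writer both declares and uses for the bytes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlToStreamSketch
{
    // Hypothetical helper: writes a small request document straight to a stream.
    public static byte[] WriteRequest(string deviceName)
    {
        var settings = new XmlWriterSettings
        {
            // UTF-8 without a byte order mark; the writer declares the same encoding it emits.
            Encoding = new UTF8Encoding(false)
        };

        using (var stream = new MemoryStream()) // stands in for requestStream
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(stream, settings))
            {
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteStartElement("REQUEST");
                xmlWriter.WriteAttributeString("DEVICE", deviceName); // assumed attribute name
                xmlWriter.WriteEndElement();
                xmlWriter.WriteEndDocument();
            }
            return stream.ToArray();
        }
    }

    static void Main()
    {
        byte[] bytes = WriteRequest("iPad 2 \u2019");
        // Decode with the same encoding to inspect what would go over the wire.
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

For logging, the same bytes can be decoded with Encoding.UTF8.GetString, so no second serialization pass through a StringWriter is needed.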
Alternatively, you could create a subclass of StringWriter which allows you to specify the encoding. See this answer for some sample code.
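A minimal sketch of such a subclass, assuming the name Utf8StringWriter (hypothetical) and relying on the fact that StringWriter.Encoding is virtual. XmlWriter reads that property when it writes the XML declaration, so the declaration then says utf-8 instead of utf-16.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical name: a StringWriter that reports UTF-8 instead of UTF-16.
sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

static class StringWriterSketch
{
    public static string BuildRequest()
    {
        using (var writer = new Utf8StringWriter())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(writer))
            {
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteStartElement("REQUEST");
                xmlWriter.WriteAttributeString("TYPE", "EXAMPLE");
                xmlWriter.WriteEndElement();
                xmlWriter.WriteEndDocument();
            }
            // The string itself is still UTF-16 in memory; only the declared encoding changes.
            return writer.ToString();
        }
    }

    static void Main()
    {
        Console.WriteLine(BuildRequest());
    }
}
```

The resulting string still has to be written out with a UTF-8 StreamWriter (the default) for the declaration and the actual bytes to agree.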
MonoTouch simply calls NSString.FromHandle on the value it receives from the call to UIDevice.CurrentDevice.Name. That is how most strings are created from NSString in all of the bindings.
That should give you a string that displays correctly in MonoDevelop (no ?), so I can't rule out a bug.
Can you tell us exactly how the device is named? If so, please open a bug report and we'll check this possibility.
Related
I am writing some JSON directly to a file, which is then read at a later date. The JSON is then deserialized using newton into an object.
I was initially doing the writing like so, and it was working fine:
using (var sw = File.CreateText(jsonFile))
{
sw.Write(content);
}
I then ran into a race condition and was told to go this route instead:
using (var fs = new FileStream(jsonFile, FileMode.Open, FileAccess.Write,
FileShare.ReadWrite))
{
using (var sr = new StreamWriter(fs))
{
sr.Write(content);
}
}
But when deserializing the JSON, I got this message:
Newtonsoft.Json.JsonReaderException: Additional text encountered after finished reading JSON content: u. Path ''
When I added encoding to the StreamWriter (UTF8), I got the same message but a different character.
So, following a suggestion from a comment, I tried this encoding for the StreamWriter to remove the BOM:
var utf8WithoutBom = new System.Text.UTF8Encoding(false);
I then passed it to the StreamWriter. The same error was returned, but this time the character was blank.
Using jsonlint the JSON (content) that was written validates correctly using both methods above.
Does anyone know why the second method (using FileStream) doesn't work? And if so, do you have a suggestion on what I should do instead?
I'm reading the content from a page using DownloadString from the WebClient class and then writing the contents of that to a static HTML file using the StreamWriter class. On the page that I'm reading in, there's an inline javascript method that just sets an anchor element's OnClick attribute to set the window.location = history.go(-1); I'm finding when I view the static HTML page, there's an odd looking letter showing up that isn't present on the dynamic web page.
WebClient & StreamWriter Code
using (var client = new WebClient())
{
var html = client.DownloadString(url);
//This constructor prepares a StreamWriter (UTF-8) to write to the specified file or will create it if it doesn't already exist
using (var stream = new StreamWriter(file, false, Encoding.UTF8))
{
stream.Write(html);
stream.Close();
}
}
The dynamic page's HTML snippet in question
<span>Sorry, but something went wrong on our end. Click here to go back to the previous page.</span>
The static page's HTML snippet
<span>Sorry, but something went wrong on our end. Â Click here to go back to the previous page.</span>
I was thinking that adding the Encoding.UTF8 parameter would solve this issue but it didn't seem to help. Is there some sort of extra encoding or decoding that I need to do? Or did I completely miss something else that's needed for this type of operation?
I set the WebClient's Encoding to UTF-8 so that it decodes the downloaded resource as UTF-8 when converting it into a string; that seems to have taken care of the issue.
using (var client = new WebClient())
{
client.Encoding = System.Text.Encoding.UTF8;
var html = client.DownloadString(url);
//This constructor prepares a StreamWriter (UTF-8) to write to the specified file or will create it if it doesn't already exist
using (var stream = new StreamWriter(file, false, Encoding.UTF8))
{
stream.Write(html);
stream.Close();
}
}
I'm having an issue with StreamWriter and byte order marks. The documentation seems to state that the Encoding.UTF8 encoding has byte order marks enabled, but when files are being written some have the marks while others don't.
I'm creating the stream writer in the following way:
this.Writer = new StreamWriter(this.Stream, System.Text.Encoding.UTF8);
Any ideas on what could be happening would be appreciated.
As someone has already pointed out, calling the constructor without the encoding argument does the trick.
However, if you want to be explicit, try this:
using (var sw = new StreamWriter(this.Stream, new UTF8Encoding(false)))
To disable the BOM, the key is to construct the writer with a new UTF8Encoding(false) instead of just Encoding.UTF8. This is the same as calling the StreamWriter constructor without the encoding argument; internally it does the same thing.
To enable BOM, use new UTF8Encoding(true) instead.
Update: Since Windows 10 v1903, when saving as UTF-8 in notepad.exe, BOM byte is now an opt-in feature instead.
The issue is due to the fact that you are using the static UTF8 property on the Encoding class.
When the GetPreamble method is called on the Encoding instance returned by the UTF8 property, it returns the byte order mark (a three-byte array), which is written to the stream before any other content (assuming a new stream).
You can avoid this by creating the instance of the UTF8Encoding class yourself, like so:
// As before.
this.Writer = new StreamWriter(this.Stream,
    // Create the encoding yourself; the parameterless constructor does not emit a BOM.
    new System.Text.UTF8Encoding());
As per the documentation for the default parameterless constructor (emphasis mine):
This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an invalid encoding is detected.
This means that the call to GetPreamble will return an empty array, and therefore no BOM will be written to the underlying stream.
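The difference can be checked directly by inspecting the preambles. The following is a small sketch; the class and helper names are made up for illustration.

```csharp
using System;
using System.Text;

static class PreambleSketch
{
    // Hypothetical helper: formats a byte array as hex, e.g. "EF-BB-BF".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        // The static property returns an encoding that provides the UTF-8 BOM.
        Console.WriteLine("Encoding.UTF8:           " + Hex(Encoding.UTF8.GetPreamble()));

        // The parameterless constructor and UTF8Encoding(false) provide no preamble.
        Console.WriteLine("new UTF8Encoding():      " + new UTF8Encoding().GetPreamble().Length + " bytes");
        Console.WriteLine("new UTF8Encoding(false): " + new UTF8Encoding(false).GetPreamble().Length + " bytes");

        // Passing true opts back in to the BOM.
        Console.WriteLine("new UTF8Encoding(true):  " + Hex(new UTF8Encoding(true).GetPreamble()));
    }
}
```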
My answer is based on HelloSam's one which contains all the necessary information.
However, I believe what the OP is asking for is how to make sure that the BOM is emitted into the file.
So instead of passing false to the UTF8Encoding constructor, you need to pass true.
using (var sw = new StreamWriter("text.txt", false, new UTF8Encoding(true)))
Try the code below, open the resulting files in a hex editor and see which one contains BOM and which doesn't.
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Written with UTF8Encoding(false): no BOM at the start of the file.
        const string nobomtxt = "nobom.txt";
        File.Delete(nobomtxt);
        using (Stream stream = File.OpenWrite(nobomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
        {
            writer.WriteLine("HelloПривет");
        }

        // Written with UTF8Encoding(true): the file starts with EF BB BF.
        const string bomtxt = "bom.txt";
        File.Delete(bomtxt);
        using (Stream stream = File.OpenWrite(bomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
        {
            writer.WriteLine("HelloПривет");
        }
    }
}
The only time I've seen that constructor not add the UTF-8 BOM is if the stream is not at position 0 when you call it. For example, in the code below, the BOM isn't written:
using (var s = File.Create("test2.txt"))
{
s.WriteByte(32);
using (var sw = new StreamWriter(s, Encoding.UTF8))
{
sw.WriteLine("hello, world");
}
}
As others have said, if you're using the StreamWriter(stream) constructor, without specifying the encoding, then you won't see the BOM.
Do you use the same constructor of the StreamWriter for every file? Because the documentation says:
To create a StreamWriter using UTF-8 encoding and a BOM, consider using a constructor that specifies encoding, such as StreamWriter(String, Boolean, Encoding).
I was in a similar situation a while ago. I ended up using the Stream.Write method instead of the StreamWriter and wrote the result of Encoding.GetPreamble() before writing the Encoding.GetBytes(stringToWrite)
I found this answer useful (thanks to @Philipp Grathwohl and @Nik), but in my case I'm using FileStream to accomplish the task, so the code that generates the BOM goes like this:
using (FileStream vStream = File.Create(pfilePath))
{
// Creates the UTF-8 encoding with parameter "encoderShouldEmitUTF8Identifier" set to true
Encoding vUTF8Encoding = new UTF8Encoding(true);
// Gets the preamble in order to attach the BOM
var vPreambleByte = vUTF8Encoding.GetPreamble();
// Writes the preamble first
vStream.Write(vPreambleByte, 0, vPreambleByte.Length);
// Gets the bytes from text
byte[] vByteData = vUTF8Encoding.GetBytes(pTextToSaveToFile);
vStream.Write(vByteData, 0, vByteData.Length);
vStream.Close();
}
It seems that if the file already existed and didn't contain a BOM, it won't contain one when overwritten; in other words, StreamWriter preserves the BOM (or its absence) when overwriting a file.
Could you please show a situation where it doesn't produce it? The only case I can find where the preamble isn't present is when nothing is ever written to the writer (Jim Mischel seems to have found another, logical one that is more likely to be your problem; see his answer).
My test code:
var stream = new MemoryStream();
using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
writer.Write('a');
}
Console.WriteLine(stream.ToArray()
.Select(b => b.ToString("X2"))
.Aggregate((i, a) => i + " " + a)
);
Having read the source code of StreamWriter, you need to make sure you are creating a new file; only then will the byte order mark be added to the file.
https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L267
Code in the Flush method:
if (!_haveWrittenPreamble)
{
_haveWrittenPreamble = true;
ReadOnlySpan<byte> preamble = _encoding.Preamble;
if (preamble.Length > 0)
{
_stream.Write(preamble);
}
}
https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L129
Code that sets the value of _haveWrittenPreamble:
// If we're appending to a Stream that already has data, don't write
// the preamble.
if (_stream.CanSeek && _stream.Position > 0)
{
_haveWrittenPreamble = true;
}
Using Encoding.Default instead of Encoding.UTF8 solved my problem.
It appears that JSON.NET is writing invalid JSON, although I wouldn't be surprised if it was due to my misuse.
It appears that it is repeating the last few characters of JSON:
/* ... */ "Teaser":"\nfoo.\n","Title":"bar","ImageSrc":null,"Nid":44462,"Vid":17}]}4462,"Vid":17}]}
The repeating string is:
4462,"Vid":17}]}
I printed it out to the console, so I don't think this is a bug in Visual Studio's text visualizer.
The serialization code:
static IDictionary<int, ObservableCollection<Story>> _sectionStories;
private static void writeToFile()
{
IsolatedStorageFile storage = IsolatedStorageFile.GetUserStoreForApplication();
using (IsolatedStorageFileStream stream = storage.OpenFile(STORIES_FILE, FileMode.OpenOrCreate))
{
using (StreamWriter writer = new StreamWriter(stream))
{
writer.Write(JsonConvert.SerializeObject(_sectionStories));
}
}
#if DEBUG
StreamReader reader = new StreamReader(storage.OpenFile(STORIES_FILE, FileMode.Open));
string contents = reader.ReadToEnd();
JObject data = JObject.Parse(contents);
string result = "";
foreach (char c in contents.Skip(contents.Length - 20))
{
result += c;
}
Debug.WriteLine(result);
// crashes here with ArgumentException
// perhaps because JSON is invalid?
var foo = JsonConvert.DeserializeObject<Dictionary<int, List<Story>>>(contents);
#endif
}
Am I doing something wrong here? Or is this a bug? Are there any known workarounds?
Curiously, JObject.Parse() doesn't throw any errors.
I'm building a Silverlight app for Windows Phone 7.
When writing the file you specify
FileMode.OpenOrCreate
If the file already exists and is 16 bytes longer than the data you intend to write to it (left over from an older version of your data that just happens to end with the exact same text), then those trailing bytes will still be present when you're done writing your new data, because OpenOrCreate does not truncate an existing file.
Solution:
FileMode.Create
From:
http://msdn.microsoft.com/en-us/library/system.io.filemode.aspx
FileMode.Create: Specifies that the operating system should create a new file. If the file already exists, it will be overwritten
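To see the effect in isolation, here is a small sketch that overwrites a longer file with shorter content using each mode. The class name, helper name and sample strings are made up for illustration.

```csharp
using System;
using System.IO;

static class FileModeSketch
{
    // Hypothetical helper: writes 16 characters of "old" content, then overwrites the
    // file with shorter JSON using the given FileMode, and returns what is left on disk.
    public static string WriteTwice(string path, FileMode secondMode)
    {
        File.WriteAllText(path, "0123456789ABCDEF"); // older, longer content

        using (var stream = new FileStream(path, secondMode, FileAccess.Write))
        using (var writer = new StreamWriter(stream))
        {
            writer.Write("{\"a\":1}"); // newer, shorter content (7 characters)
        }

        return File.ReadAllText(path);
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // OpenOrCreate keeps the old tail: {"a":1}789ABCDEF
            Console.WriteLine(WriteTwice(path, FileMode.OpenOrCreate));
            // Create truncates first: {"a":1}
            Console.WriteLine(WriteTwice(path, FileMode.Create));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The same leftover-tail effect applies to FileMode.Open, which is why a parser can report additional text after otherwise valid JSON.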