I have an XML string (UTF-8). I need to store it in a database (MS SQL Server), and the string must be encoded as UTF-16.
This code does not work; utf16Xml is empty:
XDocument xDoc = XDocument.Parse(utf8Xml);
xDoc.Declaration.Encoding = "utf-16";
StringWriter writer = new StringWriter();
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings()
{ Encoding = writer.Encoding, Indent = true });
xDoc.WriteTo(xml);
string utf16Xml = writer.ToString();
utf8Xml is a string containing a serialized object (UTF-8 encoded).
How do I convert an XML string from UTF-8 to UTF-16?
This might help you:
MemoryStream ms = new MemoryStream();
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
// UTF-8 without a byte order mark: with the default Encoding.UTF8 the BOM would
// survive Encoding.Convert and utf16Xml would start with an extra U+FEFF character.
xws.Encoding = new UTF8Encoding(false);
XDocument xDoc = XDocument.Parse(utf8Xml);
xDoc.Declaration.Encoding = "utf-16";
using (XmlWriter xw = XmlWriter.Create(ms, xws))
{
    xDoc.WriteTo(xw);
}
Encoding utf8 = Encoding.UTF8;
Encoding utf16 = Encoding.Unicode;
byte[] utf16XmlArray = Encoding.Convert(utf8, utf16, ms.ToArray());
var utf16Xml = Encoding.Unicode.GetString(utf16XmlArray);
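For comparison, the question's own StringWriter approach should also produce UTF-16 text once the XmlWriter is flushed. As far as I can tell, the empty result in the question comes from reading writer.ToString() while the XmlWriter still holds its output in an internal buffer. In this sketch (the input string is a hypothetical stand-in) disposing the writer flushes it, and because the target is a TextWriter, the declaration takes its encoding from StringWriter.Encoding, which is UTF-16, so changing Declaration.Encoding should not be needed.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical input standing in for the serialized object from the question.
string utf8Xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><Person><Name>Ana</Name></Person>";

XDocument xDoc = XDocument.Parse(utf8Xml);

var writer = new StringWriter();
// XmlWriter keeps output in an internal buffer; disposing it at the end of the
// using block flushes that buffer into the StringWriter.
using (XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings { Indent = true }))
{
    xDoc.WriteTo(xml);
}

// For a TextWriter target the declaration's encoding comes from writer.Encoding,
// which is UTF-16 for StringWriter.
string utf16Xml = writer.ToString();
Console.WriteLine(utf16Xml);
```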
Given that XDocument.Parse only accepts a string, and that string in .NET is always UTF-16 Little Endian, it looks like you are going through a lot of steps to effectively do nothing. Either:
The string – utf8Xml – is already UTF-16 LE and can be inserted into SQL Server as is (i.e. do nothing) as SqlDbType.Xml or SqlDbType.NVarChar,
or
utf8Xml somehow contains UTF-8 byte sequences, which would be invalid UTF-16 LE (i.e. "Unicode" in Microsoft-land) byte sequences. If this is the case, then you might be able to simply:
add the XML Declaration, stating that the encoding is UTF-8:
xDoc.Declaration.Encoding = "utf-8";
do not omit the XML declaration:
OmitXmlDeclaration = false;
pass utf8Xml into SQL Server as SqlDbType.VarChar
For further explanation, please see my answer to the related question (here on S.O.):
How to solve “unable to switch the encoding” error when inserting XML into SQL Server
Related
I am trying to return an XML string as a CLOB from an Oracle stored procedure into a C# string.
Then I write this string to a file using the XmlWriter class.
My code looks like the following:
string myString= ((Oracle.ManagedDataAccess.Types.OracleClob)(cmd.Parameters["paramName"].Value)).Value.ToString();
string fileName = DateTime.Now.ToString("yyyyMMddHHmmss");
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteRaw(myString);
stream.Position = 0;
var fileStreamResult = File(stream, "application/octet-stream", "ABCD"+fileName+".xml");
return fileStreamResult;
When I check the CLOB output, the complete value is returned in myString.
When I check the end result, the XML file is truncated at the end.
My string can be huge, e.g. a length of 3382563 or more.
Is there any setting for XmlWriter to write the complete string to the file?
Thanks in advance.
Sounds like all you want to do is grab a string value out of your database and write that string value to a text file. The string being XML does not force you to use an XML-specific class or method unless you want to do XML-specific operations, which I do not see in your snippet. Therefore, I suggest you simply grab the string value and write it to a file in the easiest way:
string myString = " blah blah blah keep my spaces ";
using (StreamWriter sw = new StreamWriter(@"M:\StackOverflowQuestionsAndAnswers\XMLWriterTrimmingString_45380476\bin\Debug\outputfile.xml"))
{
    sw.Write(myString);
}
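Regarding the truncation itself: a likely cause is that the question's XmlWriter is never flushed or disposed before stream.Position = 0, so whatever it still buffers never reaches the MemoryStream. Any buffered writer has the same requirement. A minimal sketch, with a hypothetical stand-in for the CLOB value, that flushes before rewinding:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the CLOB value: a few million characters.
string myString = "<root>" + new string('x', 3_382_563) + "</root>";

var stream = new MemoryStream();
// leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
using (var writer = new StreamWriter(stream, new UTF8Encoding(false), bufferSize: 81920, leaveOpen: true))
{
    writer.Write(myString);
} // Dispose flushes the writer's remaining buffered characters into the stream.

// Rewind only after everything has been flushed.
stream.Position = 0;
Console.WriteLine($"{stream.Length} bytes ready to return");
```

In the question's code, calling writer.Flush() (or disposing the XmlWriter) before stream.Position = 0 should likewise keep the end of the document; XmlWriterSettings.CloseOutput defaults to false, so disposing the XmlWriter leaves the MemoryStream open.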
I have an XML string.
How can I change the header from:
string xml = "<?xml version='1.0' encoding='ISO-8859-8'?>";
to
string xml = "<?xml version='1.0' encoding='UTF-8'?>";
using C#?
UPDATE
I tried to deserialize the XML into a User object:
XmlSerializer serializer = new XmlSerializer(typeof(User));
MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(xml));
User user = (User)serializer.Deserialize(memStream);
but the strings in the User object are not decoded correctly.
Because of the XML's encoding, I need to change the encoding.
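Instead of Encoding.UTF8.GetBytes, use Encoding.GetEncoding("ISO-8859-8").GetBytes. On .NET Core and .NET 5+, ISO-8859-8 is not available by default; register the code-page provider first with Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).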
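A different approach (not the one this answer describes) avoids choosing a byte encoding at all: the XML is already a .NET string, so it can be handed to the parser as text. When an XmlReader reads from a TextReader, no bytes are decoded, so the encoding named in the declaration should not affect the result; the same idea applies to XmlSerializer.Deserialize(TextReader), e.g. serializer.Deserialize(new StringReader(xml)). The sketch uses XDocument and a hypothetical User/Name document instead of the question's User class.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical document; the declaration still says ISO-8859-8, but the text is
// already a .NET string, so there is nothing left to decode.
string xml = "<?xml version='1.0' encoding='ISO-8859-8'?><User><Name>שלום</Name></User>";

XDocument doc;
// Reading from a TextReader: the declared encoding is not used to decode anything.
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    doc = XDocument.Load(reader);
}

string name = (string)doc.Root.Element("Name");
Console.WriteLine(name);
```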
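If the XML is stored in a string variable and you only need to replace the value of the encoding attribute, you can do a simple string replacement as follows:
const string searchEncoding = "ISO-8859-8";
const string newEncoding = "UTF-8";
string xml = @"<?xml version='1.0' encoding='ISO-8859-8'?><abc></abc>";
int encodingPos = xml.IndexOf(searchEncoding, StringComparison.Ordinal);
// Only replace the match if it lies inside the leading XML declaration (before "?>"),
// not somewhere in the document content.
int declarationEnd = xml.IndexOf("?>", StringComparison.Ordinal);
if (xml.StartsWith("<?xml", StringComparison.Ordinal) && encodingPos >= 0 && encodingPos < declarationEnd)
{
    xml = xml.Substring(0, encodingPos) + newEncoding + xml.Substring(encodingPos + searchEncoding.Length);
}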
However, a different process is necessary if the XML is stored in another datatype and/or you need to re-encode the XML content.
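A slightly more general version of the string replacement, sketched with a regular expression: it rewrites only the encoding pseudo-attribute of a leading XML declaration, for either quote style, and leaves the rest of the text alone. ReplaceDeclaredEncoding is a hypothetical helper name. As noted above, this only changes the label; it does not re-encode anything.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: matches "<?xml ... encoding=" at the very start of the string,
// then the quoted value, and swaps only the value (keeping the original quote style).
string ReplaceDeclaredEncoding(string xml, string newEncoding)
{
    var declarationEncoding = new Regex(@"^(<\?xml[^?]*?\bencoding\s*=\s*)(['""])[^'""]*\2");
    return declarationEncoding.Replace(
        xml,
        m => m.Groups[1].Value + m.Groups[2].Value + newEncoding + m.Groups[2].Value,
        1);
}

string before = "<?xml version='1.0' encoding='ISO-8859-8'?><abc>ISO-8859-8</abc>";
string after = ReplaceDeclaredEncoding(before, "UTF-8");
Console.WriteLine(after); // <?xml version='1.0' encoding='UTF-8'?><abc>ISO-8859-8</abc>
```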
In .NET, I'm trying to use the Encoding.UTF8.GetString method, which takes a byte array and converts it to a string.
It looks like this method ignores the BOM (Byte Order Mark), which might be a part of a legitimate binary representation of a UTF8 string, and takes it as a character.
I know I can use a TextReader to digest the BOM as needed, but I thought that the GetString method should be some kind of a macro that makes our code shorter.
Am I missing something? Is this intentional?
Here's code that reproduces it:
using System;
using System.IO;
using System.Linq;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string s1 = "abc";
        byte[] abcWithBom;
        using (var ms = new MemoryStream())
        using (var sw = new StreamWriter(ms, new UTF8Encoding(true)))
        {
            sw.Write(s1);
            sw.Flush();
            abcWithBom = ms.ToArray();
            Console.WriteLine(FormatArray(abcWithBom)); // ef, bb, bf, 61, 62, 63
        }
        byte[] abcWithoutBom;
        using (var ms = new MemoryStream())
        using (var sw = new StreamWriter(ms, new UTF8Encoding(false)))
        {
            sw.Write(s1);
            sw.Flush();
            abcWithoutBom = ms.ToArray();
            Console.WriteLine(FormatArray(abcWithoutBom)); // 61, 62, 63
        }
        var restore1 = Encoding.UTF8.GetString(abcWithoutBom);
        Console.WriteLine(restore1.Length); // 3
        Console.WriteLine(restore1); // abc
        var restore2 = Encoding.UTF8.GetString(abcWithBom);
        Console.WriteLine(restore2.Length); // 4 (!)
        Console.WriteLine(restore2); // ?abc
    }

    private static string FormatArray(byte[] bytes1)
    {
        return string.Join(", ", from b in bytes1 select b.ToString("x"));
    }
}
It looks like this method ignores the BOM (Byte Order Mark), which might be a part of a legitimate binary representation of a UTF8 string, and takes it as a character.
It doesn't look like it "ignores" it at all - it faithfully converts it to the BOM character. That's what it is, after all.
If you want to make your code ignore the BOM in any string it converts, that's up to you to do... or use StreamReader.
Note that if you either use Encoding.GetBytes followed by Encoding.GetString or use StreamWriter followed by StreamReader, both forms will either produce then swallow or not produce the BOM. It's only when you mix using a StreamWriter (which uses Encoding.GetPreamble) with a direct Encoding.GetString call that you end up with the "extra" character.
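The asymmetry can be shown without any streams. In this sketch the BOM is prepended by hand from Encoding.UTF8.GetPreamble(), standing in for what StreamWriter writes first; GetBytes followed by GetString never involves a BOM, while decoding the preamble bytes with GetString yields the character U+FEFF.

```csharp
using System;
using System.Linq;
using System.Text;

byte[] preamble = Encoding.UTF8.GetPreamble();     // EF BB BF, what StreamWriter writes first
byte[] body = Encoding.UTF8.GetBytes("abc");       // GetBytes never adds a preamble
byte[] withBom = preamble.Concat(body).ToArray();  // same bytes as the question's abcWithBom

string matched = Encoding.UTF8.GetString(body);    // GetBytes then GetString: "abc"
string mixed = Encoding.UTF8.GetString(withBom);   // the preamble decodes to U+FEFF

Console.WriteLine($"{matched.Length} {mixed.Length} {(int)mixed[0]:X4}"); // 3 4 FEFF
```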
Based on the answer by Jon Skeet (thanks!), this is how I just did it:
string s;
using (var memoryStream = new MemoryStream(byteArray))
using (var reader = new StreamReader(memoryStream))
{
    s = reader.ReadToEnd();
}
Note that StreamReader only detects the encoding from a BOM; if there is none, it falls back to UTF-8. If the byte array may be in another encoding without a BOM, use the StreamReader constructor overload that takes an Encoding parameter so you can tell it what the byte array contains.
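A small sketch of that overload, StreamReader(Stream, Encoding, bool): with detectEncodingFromByteOrderMarks set to true, a BOM (if present) selects the encoding and is not returned as a character, and the Encoding argument is used when there is no BOM. DecodeWithBomDetection is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Text;

// Decodes bytes through StreamReader: a UTF-8/UTF-16/UTF-32 BOM, if present,
// selects the encoding and is consumed; otherwise `fallback` is used.
string DecodeWithBomDetection(byte[] bytes, Encoding fallback)
{
    using var ms = new MemoryStream(bytes);
    using var reader = new StreamReader(ms, fallback, detectEncodingFromByteOrderMarks: true);
    return reader.ReadToEnd();
}

byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x61, 0x62, 0x63 };
byte[] withoutBom = { 0x61, 0x62, 0x63 };

Console.WriteLine(DecodeWithBomDetection(withBom, Encoding.UTF8).Length);    // 3
Console.WriteLine(DecodeWithBomDetection(withoutBom, Encoding.UTF8).Length); // 3
```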
For those who do not want to use streams, I found a fairly simple solution using LINQ:
using System.Linq;
using System.Text;

public static class EncodingExtensions
{
    // Decodes bytes, skipping the encoding's preamble (BOM) if the array starts with it.
    public static string GetStringExcludeBOMPreamble(this Encoding encoding, byte[] bytes)
    {
        var preamble = encoding.GetPreamble();
        if (preamble?.Length > 0 && bytes.Length >= preamble.Length && bytes.Take(preamble.Length).SequenceEqual(preamble))
        {
            return encoding.GetString(bytes, preamble.Length, bytes.Length - preamble.Length);
        }
        else
        {
            return encoding.GetString(bytes);
        }
    }
}
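On .NET Core 2.1 or later, roughly the same check can be written with spans instead of LINQ. This is an alternative sketch, not the answer's code; it relies on Encoding.Preamble, MemoryExtensions.StartsWith and the Encoding.GetString(ReadOnlySpan<byte>) overload, and GetStringWithoutPreamble is a hypothetical name written as a plain local function rather than an extension method.

```csharp
using System;
using System.Text;

// Alternative to the LINQ version: compare and slice with spans.
string GetStringWithoutPreamble(Encoding encoding, byte[] bytes)
{
    ReadOnlySpan<byte> data = bytes;
    ReadOnlySpan<byte> preamble = encoding.Preamble;   // empty for encodings without a BOM
    if (!preamble.IsEmpty && data.StartsWith(preamble))
    {
        data = data.Slice(preamble.Length);            // drop the BOM bytes
    }
    return encoding.GetString(data);
}

byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x61, 0x62, 0x63 };
Console.WriteLine(GetStringWithoutPreamble(Encoding.UTF8, withBom).Length); // 3
```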
I know I'm a bit late to the party, but here's the VB.NET code I use, if you need it (feel free to adapt it to C#):
Public Function Serialize(Of YourXMLClass)(ByVal obj As YourXMLClass,
Optional ByVal omitXMLDeclaration As Boolean = True,
Optional ByVal omitXMLNamespace As Boolean = True) As String
Dim serializer As New XmlSerializer(obj.GetType)
Using memStream As New MemoryStream()
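' New UTF8Encoding(False) writes no BOM; with Encoding.UTF8 the returned string would start with U+FEFF.
Dim settings As New XmlWriterSettings() With {
.Encoding = New UTF8Encoding(False),
.Indent = True,
.OmitXmlDeclaration = omitXMLDeclaration}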
Using writer As XmlWriter = XmlWriter.Create(memStream, settings)
Dim xns As New XmlSerializerNamespaces
If (omitXMLNamespace) Then xns.Add("", "")
serializer.Serialize(writer, obj, xns)
End Using
Return Encoding.UTF8.GetString(memStream.ToArray())
End Using
End Function
Public Function Deserialize(Of YourXMLClass)(ByVal obj As YourXMLClass, ByVal xml As String) As YourXMLClass
Dim result As YourXMLClass
Dim serializer As New XmlSerializer(GetType(YourXMLClass))
Using memStream As New MemoryStream(Encoding.UTF8.GetBytes(xml))
Using reader As XmlReader = XmlReader.Create(memStream)
result = DirectCast(serializer.Deserialize(reader), YourXMLClass)
End Using
End Using
Return result
End Function
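A C# adaptation of the two helpers, as a sketch rather than a line-by-line translation: it keeps the MemoryStream and XmlWriter approach for serializing but uses new UTF8Encoding(false) so no BOM ends up in the returned string, and it deserializes through a StringReader instead of re-encoding the string to bytes. The List<int> in the usage lines is only a stand-in for YourXMLClass.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

string Serialize<T>(T obj, bool omitXmlDeclaration = true, bool omitXmlNamespace = true)
{
    var serializer = new XmlSerializer(typeof(T));
    var settings = new XmlWriterSettings
    {
        Encoding = new UTF8Encoding(false), // no BOM in the output bytes
        Indent = true,
        OmitXmlDeclaration = omitXmlDeclaration
    };
    using var memStream = new MemoryStream();
    using (var writer = XmlWriter.Create(memStream, settings))
    {
        var xns = new XmlSerializerNamespaces();
        if (omitXmlNamespace) xns.Add("", ""); // suppress xmlns:xsi / xmlns:xsd
        serializer.Serialize(writer, obj, xns);
    } // disposing the XmlWriter flushes it into memStream
    return Encoding.UTF8.GetString(memStream.ToArray());
}

// Reads from the string directly, so no byte encoding is involved.
T Deserialize<T>(string xml)
{
    var serializer = new XmlSerializer(typeof(T));
    using var reader = new StringReader(xml);
    return (T)serializer.Deserialize(reader);
}

string xml = Serialize(new List<int> { 1, 2 });
List<int> back = Deserialize<List<int>>(xml);
Console.WriteLine(xml);
```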
I have an issue with serializing XML. I have an object with a DateTime property whose millisecond value is 990. However, when I view the output string, it shows this:
<ReadingsDateTime>2016-07-04T10:10:00.99Z</ReadingsDateTime>
The code used to convert this to XML is below. What is going on? I cannot find a reason why this is happening.
string xml;
try
{
var serializer = new XmlSerializerFactory().CreateSerializer(typeof(T), xmlNamespace);
using (var memoryStream = new MemoryStream())
{
var settings = new XmlWriterSettings
{
Indent = false,
NamespaceHandling = NamespaceHandling.OmitDuplicates,
CloseOutput = false,
WriteEndDocumentOnClose = true,
};
using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
{
serializer?.Serialize(xmlWriter, obj);
}
memoryStream.Seek(0, SeekOrigin.Begin);
using (var streamReader = new StreamReader(memoryStream))
{
xml = streamReader.ReadToEnd();
}
}
}
catch (Exception ex)
{
throw new ApplicationException("Unable to convert to XML from an object", ex);
}
return xml;
.990 is the same as .99: in a fractional part, trailing zeros do not change the value. For example:
1.0000 is the same value as 1
2.94 is the same value as 2.940 or 2.9400 or 2.94000.
The serializer just removes the trailing zeros. If you always want to keep them (not sure why you would), you can add a custom string property that produces the exact text you want, serialize and read that property instead, and ignore the DateTime property; see this previous SO post for an example.
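A quick check that both forms describe the same instant, and what a fixed three-digit format looks like if a custom string property is used. The format string here is an assumption about the desired output, not something XmlSerializer produces by itself.

```csharp
using System;
using System.Globalization;
using System.Xml;

var value = new DateTime(2016, 7, 4, 10, 10, 0, 990, DateTimeKind.Utc);

// What the question observed: trailing zeros of the fraction are dropped.
string compact = "2016-07-04T10:10:00.99Z";
// A fixed three-digit format keeps the trailing zero (e.g. for a custom string property).
string fixedMs = value.ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture);

// Both strings parse back to the same instant.
DateTime a = XmlConvert.ToDateTime(compact, XmlDateTimeSerializationMode.Utc);
DateTime b = XmlConvert.ToDateTime(fixedMs, XmlDateTimeSerializationMode.Utc);

Console.WriteLine(fixedMs);                         // 2016-07-04T10:10:00.990Z
Console.WriteLine(a == b && a.Millisecond == 990);  // True
```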
I'm trying to create a spreadsheet in XML Spreadsheet 2003 format (so Excel can read it). I'm writing out the document using the XDocument class, and I need to get a newline in the body of one of the <Cell> tags. Excel, when it reads and writes, requires the files to have the literal string &#10; embedded in the string to correctly show the newline in the spreadsheet. It also writes it out as such.
The problem is that XDocument is writing CR-LF (\r\n) when I have newlines in my data, and it automatically escapes ampersands for me when I try to do a .Replace() on the input string, so I end up with &amp;#10; in my file, which Excel just happily writes out as a string literal.
Is there any way to make XDocument write out the literal &#10; as part of the XML stream? I know I can do it by deriving from XmlTextWriter, or literally just writing out the file with a TextWriter, but I'd prefer not to if possible.
I wonder if it might be better to use XmlWriter directly, and WriteRaw?
A quick check shows that XmlDocument does a slightly better job of it, but XML and whitespace get tricky very quickly...
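A minimal sketch of the WriteRaw idea, assuming the cell text arrives with ordinary newlines: each line goes through WriteString (so & and < are still escaped), and the separators are written with WriteRaw, which emits &#10; verbatim instead of escaping its ampersand. The element is a plain Data element for brevity; a real XML Spreadsheet 2003 cell would use the spreadsheet namespace and the ss:Type attribute, and WriteCellData is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Xml;

// Writes each line of `text` via WriteString (escaped as usual) and separates
// lines with WriteRaw("&#10;"), which is emitted verbatim, not re-escaped.
string WriteCellData(string text)
{
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using var sw = new StringWriter();
    using (var xw = XmlWriter.Create(sw, settings))
    {
        xw.WriteStartElement("Data");
        string[] lines = text.Replace("\r\n", "\n").Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0) xw.WriteRaw("&#10;"); // the character reference Excel expects
            xw.WriteString(lines[i]);
        }
        xw.WriteEndElement();
    } // disposing the XmlWriter flushes it into the StringWriter
    return sw.ToString();
}

string cell = WriteCellData("First line\r\nSecond & third");
Console.WriteLine(cell); // <Data>First line&#10;Second &amp; third</Data>
```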
I battled with this problem for a couple of days and finally came up with this solution. I used the XmlDocument.Save(Stream) method, then read the formatted XML string back from the stream. Then I replaced the occurrences with &#10; and used a TextWriter to write the string to a file.
string xml = "<?xml version=\"1.0\"?><?mso-application progid='Excel.Sheet'?><Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:x=\"urn:schemas-microsoft-com:office:excel\" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:html=\"http://www.w3.org/TR/REC-html40\">";
xml += "<Styles><Style ss:ID=\"s1\"><Alignment ss:Vertical=\"Center\" ss:WrapText=\"1\"/></Style></Styles>";
xml += "<Worksheet ss:Name=\"Default\"><Table><Column ss:Index=\"1\" ss:AutoFitWidth=\"0\" ss:Width=\"75\" /><Row><Cell ss:StyleID=\"s1\"><Data ss:Type=\"String\">Hello World</Data></Cell></Row></Table></Worksheet></Workbook>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.LoadXml(xml); //load the xml string
System.IO.MemoryStream stream = new System.IO.MemoryStream();
doc.Save(stream); //save the xml as a formatted string
stream.Position = 0; //reset the stream position since it will be at the end from the Save method
System.IO.StreamReader reader = new System.IO.StreamReader(stream);
string formattedXML = reader.ReadToEnd(); //fetch the formatted XML into a string
formattedXML = formattedXML.Replace(" ", "&#10;"); //Replace the unhelpful 's with the wanted endline entity
using (System.IO.TextWriter writer = new System.IO.StreamWriter("C:\\Temp\\test1.xls")) // "\t" would be a tab, so escape both backslashes
{
    writer.Write(formattedXML); //write the XML to a file
}