Replace invalid Unicode characters with blanks in an XML file - C#

How can I remove specific Unicode characters that are not valid in XML from an XML file at deserialization time? I tried the regex below, but without success.
string strXML = File.ReadAllText("Xml File Path", Encoding.UTF8);
System.Text.RegularExpressions.Regex _invalidXMLChars = new System.Text.RegularExpressions.Regex(@"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF\00EF\00BB\00BF]", System.Text.RegularExpressions.RegexOptions.Compiled);
strXML = _invalidXMLChars.Replace(strXML, "");
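One thing worth checking in the pattern above: inside a character class, \00EF is not a byte escape. As far as I can tell, .NET reads it as \00 followed by the literal letters E and F, so ordinary letters may be getting stripped. A minimal sketch of an alternative without a regex, assuming the goal is to drop every character that falls outside the XML 1.0 character range, could use XmlConvert.IsXmlChar. The class and method names here are hypothetical.

```csharp
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: keeps only characters allowed by XML 1.0.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // A high surrogate followed by a low surrogate encodes a code point
            // in U+10000..U+10FFFF, which XML allows, so keep the pair intact.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Lone surrogates and other invalid characters are skipped.
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Usage would be strXML = XmlSanitizer.StripInvalidXmlChars(strXML); right after File.ReadAllText. Note that this removes characters rather than replacing them with a blank; append a space in the skipped branches if a placeholder is wanted.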

Related

xml string change the header encoding using c#

I have a string of XML.
How can I change the header from:
string xml = "<?xml version='1.0' encoding='ISO-8859-8'?>";
to
string xml = "<?xml version='1.0' encoding='UTF-8'?>";
using C#?
UPDATE
I tried to deserialize the XML into a User object:
XmlSerializer serializer = new XmlSerializer(typeof(User));
MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(xml));
User user = (User)serializer.Deserialize(memStream);
but the strings in the User object are not decoded correctly.
Because of the encoding declared in the XML, I need to change the encoding.
Instead of Encoding.UTF8.GetBytes use Encoding.GetEncoding("ISO-8859-8").GetBytes.
If the XML is stored in a string variable and you only need to replace the value of the encoding attribute, you can perform a replace as follows:
const string searchEncoding = "ISO-8859-8";
const string newEncoding = "UTF-8";
string xml = @"<?xml version='1.0' encoding='ISO-8859-8'?><abc></abc>";
int encodingPos = xml.IndexOf(searchEncoding);
// 30 is the offset of the encoding value in this exact header, so only the declaration is touched
if (encodingPos == 30)
{
    xml = xml.Substring(0, encodingPos) + newEncoding + xml.Substring(encodingPos + searchEncoding.Length);
}
However, a different process is necessary if the XML is stored in another datatype and/or you need to re-encode the XML content.
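If a fixed character offset feels too fragile, one alternative is to let LINQ to XML parse the string and rewrite the declaration. This is only a sketch: the XmlHeader class and WithDeclaredEncoding method are hypothetical names, and it changes just the label in the header, since a .NET string has no byte encoding of its own until it is written out.

```csharp
using System.Xml.Linq;

public static class XmlHeader
{
    // Hypothetical helper: returns the XML with its declaration set to encodingName.
    public static string WithDeclaredEncoding(string xml, string encodingName)
    {
        XDocument doc = XDocument.Parse(xml);
        if (doc.Declaration == null)
        {
            // No header at all: add one.
            doc.Declaration = new XDeclaration("1.0", encodingName, null);
        }
        else
        {
            doc.Declaration.Encoding = encodingName;
        }
        // XDocument.ToString() leaves the declaration out, so prepend it explicitly.
        return doc.Declaration.ToString() + doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, xml = XmlHeader.WithDeclaredEncoding(xml, "UTF-8"); rewrites the header regardless of quoting or spacing in the original declaration.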

Replace a string in a text read from a csv and save it

I managed to load the csv and now want to change a few strings inside and then save it again.
First problem: the text is not changed to '0. Replacing only "4" with "0" works, but never when the search string has more than one character.
Second problem: the last replace, where I replace every " with an empty string. When I open the csv in an editor, it shows some weird Asian characters instead of nothing.
(䈀攀稀甀最猀瀀爀)
There are no spaces in my csv. The csv looks like
.....";"++49 then more random numbers and so on.
This is just the part where ++49 is to be found.
Relevant code:
Encoding ansi = Encoding.GetEncoding(1252);
foreach (string file in Directory.EnumerateFiles(@"path comes here", "*.csv"))
{
    string text = File.ReadAllText(file, ansi);
    text = text.Replace(@"++49", "'0");
    text = text.Replace("+49", "'0");
    text = text.Replace(@"""", "");
    File.WriteAllText(file, text, ansi);
}
Am I doing something fundamentally wrong?
edit: What it looks like: ";"++49<morenumbers>";; What it should look like: ;0<morenumbers>;;
As people mentioned in the comments, the problem is with how the file's encoding is decoded. In this case you can try this:
foreach (string file in Directory.EnumerateFiles(@"path comes here", "*.csv"))
{
    Encoding ansi;
    using (var reader = new System.IO.StreamReader(file, true))
    {
        reader.Peek(); // the encoding is only detected on the first read
        ansi = reader.CurrentEncoding; // please tell what you have here ! :)
    }
    string text = File.ReadAllText(file, ansi);
    text = text.Replace(@"++49", "'0");
    text = text.Replace(@"+49", "'0");
    text = text.Replace(@"""", "");
    File.WriteAllText(file, text, ansi);
}
For me this works with every encoding I tried, and you no longer have to hard-code the encoding.
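To see that the Replace calls themselves handle multi-character strings and that the encoding is the likely culprit, the two concerns can be separated. The following is a sketch only: CsvPhoneFixer, FixNumbers and FixFile are hypothetical names, and the fallback encoding passed in is an assumption about what the files use when they carry no byte order mark.

```csharp
using System.IO;
using System.Text;

public static class CsvPhoneFixer
{
    // The same three replacements as in the question, applied to a plain string.
    public static string FixNumbers(string text)
    {
        return text.Replace("++49", "'0")
                   .Replace("+49", "'0")
                   .Replace("\"", "");
    }

    // Reads the file with byte-order-mark detection, falling back to the given
    // encoding, and writes it back with the encoding that was actually used.
    public static void FixFile(string path, Encoding fallback)
    {
        string text;
        Encoding used;
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            text = reader.ReadToEnd();     // detection happens on the first read
            used = reader.CurrentEncoding; // so query the encoding afterwards
        }
        File.WriteAllText(path, FixNumbers(text), used);
    }
}
```

With the sample from the question, FixNumbers("\";\"++49123\";;") returns ;'0123;; so the string logic is fine once the text has been decoded correctly.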

How to read and output the XML within an SPFile?

I have this line of code that retrieves an XML file and saves it to an SPFile:
SPFile XMLFile = SPContext.Current.Web.GetFile("C:\\Users\\maleem\\Documents\\XMLTest.xml");
I want to get the XML/text within it and output it to a Literal control. I tried
StreamReader reader = new StreamReader(XMLFile.OpenBinaryStream());
and a few variants, but it's not working.
If you use the OpenBinary method of SPFile, it returns a byte array that you can then convert into a string.
Depending on the encoding you can try this:
For default encoding:
string str = System.Text.Encoding.Default.GetString(XMLFile.OpenBinary());
For UTF8:
string str = System.Text.Encoding.UTF8.GetString(XMLFile.OpenBinary());
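If the encoding is not known up front, a possible variation is to wrap the byte array in a MemoryStream and let StreamReader look for a byte order mark. This sketch does not need SharePoint to run; ByteText and Decode are hypothetical names. Unlike Encoding.GetString, this approach also consumes the byte order mark instead of leaving it as the first character of the string.

```csharp
using System.IO;
using System.Text;

public static class ByteText
{
    // Hypothetical helper: decodes bytes using the byte order mark if one is
    // present, otherwise using the fallback encoding supplied by the caller.
    public static string Decode(byte[] bytes, Encoding fallback)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, fallback, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the SharePoint case this would be called as string str = ByteText.Decode(XMLFile.OpenBinary(), Encoding.UTF8);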

Reading a CSV file containing greek characters

I am trying to read the data from a CSV file using the following:
var lines = File.ReadAllLines(@"c:\test.csv").Select(a => a.Split(';'));
It works, but the fields that contain words are written in Greek characters, and they are displayed as symbols.
How can I set the Encoding correctly in order to read those greek characters?
ReadAllLines has an overload that takes an Encoding along with the file path:
var lines = File.ReadAllLines(@"c:\test.csv", Encoding.Unicode)
    .Select(line => line.Split(';'));
Testing:
File.WriteAllText(@"c:\test.csv", "ϗϡϢϣϤ", Encoding.Unicode);
Console.WriteLine(File.ReadAllText(@"c:\test.csv", Encoding.Unicode));
will print:
ϗϡϢϣϤ
To find out which encoding the file was actually written in, use the following snippet:
using (var r = new StreamReader(@"c:\test.csv", detectEncodingFromByteOrderMarks: true))
{
    r.Peek(); // the encoding is only detected on the first read
    Console.WriteLine(r.CurrentEncoding.BodyName);
}
For my scenario it prints:
utf-8

Using XDocument to write raw XML

I'm trying to create a spreadsheet in XML Spreadsheet 2003 format (so Excel can read it). I'm writing out the document using the XDocument class, and I need to get a newline in the body of one of the <Cell> tags. Excel, when it reads and writes, requires the files to have the literal string &#10; embedded in the string to correctly show the newline in the spreadsheet. It also writes it out as such.
The problem is that XDocument is writing CR-LF (\r\n) when I have newlines in my data, and it automatically escapes ampersands for me when I try to do a .Replace() on the input string, so I end up with &amp;#10; in my file, which Excel just happily writes out as a string literal.
Is there any way to make XDocument write out the literal &#10; as part of the XML stream? I know I can do it by deriving from XmlTextWriter, or literally just writing out the file with a TextWriter, but I'd prefer not to if possible.
I wonder if it might be better to use XmlWriter directly, and WriteRaw?
A quick check shows that XmlDocument makes a slightly better job of it, but xml and whitespace gets tricky very quickly...
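A minimal sketch of the XmlWriter idea, assuming only the cell contents need the raw entity: WriteString escapes its argument, while WriteRaw passes the text through untouched, so the character reference survives as written. RawNewlineDemo and BuildCell are hypothetical names, and the single <Data> element stands in for the full spreadsheet markup.

```csharp
using System.IO;
using System.Xml;

public static class RawNewlineDemo
{
    // Hypothetical helper: writes two lines of text separated by a raw &#10;.
    public static string BuildCell(string firstLine, string secondLine)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Data");
            writer.WriteString(firstLine);  // escaped as normal text
            writer.WriteRaw("&#10;");       // written verbatim, not escaped
            writer.WriteString(secondLine);
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

RawNewlineDemo.BuildCell("Hello", "World") returns <Data>Hello&#10;World</Data>. The caller is responsible for making sure anything passed to WriteRaw is well-formed, since the writer does not check it.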
I battled with this problem for a couple of days and finally came up with this solution. I used the XmlDocument.Save(Stream) method, then got the formatted XML string from the stream. Then I replaced the &amp;#10; occurrences with &#10; and used a TextWriter to write the string to a file.
string xml = "<?xml version=\"1.0\"?><?mso-application progid='Excel.Sheet'?><Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:x=\"urn:schemas-microsoft-com:office:excel\" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\" xmlns:html=\"http://www.w3.org/TR/REC-html40\">";
xml += "<Styles><Style ss:ID=\"s1\"><Alignment ss:Vertical=\"Center\" ss:WrapText=\"1\"/></Style></Styles>";
xml += "<Worksheet ss:Name=\"Default\"><Table><Column ss:Index=\"1\" ss:AutoFitWidth=\"0\" ss:Width=\"75\" /><Row><Cell ss:StyleID=\"s1\"><Data ss:Type=\"String\">Hello&amp;#10;&amp;#10;World</Data></Cell></Row></Table></Worksheet></Workbook>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.LoadXml(xml); //load the xml string
System.IO.MemoryStream stream = new System.IO.MemoryStream();
doc.Save(stream); //save the xml as a formatted string
stream.Position = 0; //reset the stream position since it will be at the end from the Save method
System.IO.StreamReader reader = new System.IO.StreamReader(stream);
string formattedXML = reader.ReadToEnd(); //fetch the formatted XML into a string
formattedXML = formattedXML.Replace("&amp;#10;", "&#10;"); //Replace the unhelpful &amp;#10;'s with the wanted endline entity
System.IO.TextWriter writer = new System.IO.StreamWriter("C:\\Temp\\test1.xls");
writer.Write(formattedXML); //write the XML to a file
writer.Close();
