Regex to replace &quot; with " only if it's not in another " - c#

I'm converting an encoded XML document to its original format
string myXml = oldXml.Replace("&lt;", "<").Replace("&amp;", "&")
.Replace("&gt;", ">")
.Replace("&quot;", "\"")
.Replace("&apos;", "'");
It works fine. However, I want to leave a &quot; alone if it sits inside another pair of quotes.
Example:
Original XML
//Note the title value
<v:shape id="_x0000_i1025" title="a&quot; title &quot;b"> </v:shape>
Encoded XML
&lt;v:shape id=&quot;_x0000_i1025&quot; title=&quot;a&amp;quot; title &amp;quot;b&quot;&gt; &lt;/v:shape&gt;
Recovered XML after replace
//Note the title value
<v:shape id="_x0000_i1025" title="a" title "b"> </v:shape>
As you can see, the inner &quot; shouldn't be converted to ". How can I do the replacement with a Regex so that it leaves the inner &quot; untouched?
Thank you

It turned out I don't need to recover the encoded XML at all.
nodElement.SetAttribute("myXmlAttribute", myXml); is what encodes the original XML in the first place, so when I read the attribute value back and assign it to a string, the XML API decodes it and returns the original XML automatically.
string content = theNode.Attributes["myXmlAttribute"].Value;
I don't need to do any replacing.
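The round trip described above can be checked with a small sketch. It assumes the System.Xml DOM (XmlDocument) is in use; the class name AttributeRoundTrip and the element name "node" are hypothetical and only serve the illustration.

```csharp
using System.Xml;

public static class AttributeRoundTrip
{
    // Stores 'xmlFragment' in an attribute and returns the serialized element.
    // The serialized text is where the entity-encoded form appears.
    public static string Serialize(string xmlFragment)
    {
        var doc = new XmlDocument();
        XmlElement node = doc.CreateElement("node"); // hypothetical element name
        node.SetAttribute("myXmlAttribute", xmlFragment);
        return node.OuterXml;
    }

    // Parses the serialized element and reads the attribute back.
    // The DOM decodes the entities, so the original text is returned.
    public static string ReadBack(string xmlFragment)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Serialize(xmlFragment));
        return doc.DocumentElement.Attributes["myXmlAttribute"].Value;
    }
}
```

Because the writer and the reader apply matching encode and decode steps, an inner &quot; in the fragment survives unchanged, which is the case the chained Replace calls get wrong.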

Related

String formatting for rich text box with constant and dynamic text

Currently I am attempting to create a dictionary which maps a selected item from a list view to a corresponding string output in a rich text box. I would like to bold specific text in the string, which will always be the same (constant), and also add dynamic text to the string that will change.
Something like this:
ID: 8494903282
Where ID is the constant text I need bolded and the numbers would be a dynamic ID that changes. I will need to have multiple lines with different data in this format which will be changing:
ID: 8494903282
Name: Some Name
Date: 3/15/2018
Currently I have a rich text box to output to and I am trying to use some string formatting to do what I want but this is not working correctly. Essentially I need a string value I can store in a dictionary so when an item gets selected I can just set the rtf property of the text box to the value of that dictionary item.
Below I have my current format string I am attempting to set the rtf property to:
string s1 = string.Format(@"{{\rtf1\ansi \b Commit ID: \b0 {0}\line}}", entry.ID);
string s2 = string.Format(@"{{\b Author: \b0 {0}\line}}", entry.Author);
string s3 = string.Format(@"{{\b Date: \b0 {0}\line}}", entry.Date.ToString("d"));
string s4 = Environment.NewLine + Environment.NewLine + entry.Message;
contents = (s1 + s2 + s3 + s4);
Then setting the rtf property of my rich text box:
LogContentsTB.Rtf = Logs[LogNamesLV.SelectedItems[0].Name];
Here Logs is a Dictionary<string, string> that holds the formatted string for each item.
However, I get the following output rather than my expected output:
This is the correct form of output for the first item but nothing else appears. If there are any other ways to do this I am open to suggestion.
After doing some light reading on the rtf syntax I noticed that I was trying to close off each string with curly braces. Curly braces are used for RTF groups. For some reason the rich text box in windows forms did not play well with that.
Another thing to note is that string.Format was probably the main cause of the problems with this kind of formatting, because it treats curly braces as placeholders and they have to be doubled. In my answer I do not use it; I concatenate the variable directly into the RTF string instead, i.e. < format >< variable string >< ending format >
If you look at NetMage's response, you will notice he only puts an opening brace on the very first string, s1. This is to group the whole string. But we need to add a closing brace on the final string, s4, to finish the grouping. Below is the final code and screenshot that worked for my application.
string s1 = @"{\rtf1\ansi\b ID: \b0 " + entry.ID + @" \line\line";
string s2 = @"\b Author: \b0 " + entry.Author + @" \line\line";
string s3 = @"\b Date: \b0 " + entry.Date.ToString("d") + @" \line\line ";
string s4 = entry.Message + @"}";
contents = s1 + s2 + s3 + s4;
Thanks for pointing me in the right direction!
I think your RTF formatting is wrong. You could try:
string s1 = string.Format(@"{{\rtf1\ansi\r\b Commit ID:\b0 {0}\line\r", entry.ID);
string s2 = string.Format(@"\b Author: \b0 {0}\line\r", entry.Author);
string s3 = string.Format(@"\b Date: \b0 {0}\line\r", entry.Date.ToString("d"));
string s4 = Environment.NewLine + Environment.NewLine + entry.Message + "}";
contents = (s1 + s2 + s3 + s4);
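Both answers come down to the same structure: one opening brace before \rtf1, one closing brace at the very end, and no braces around the individual lines. A minimal sketch of that idea follows. The class RtfLogFormatter and its method names are hypothetical, and the escaping step is an addition not discussed in the answers: it assumes the dynamic text may itself contain a backslash or brace, which would otherwise be read as RTF syntax.

```csharp
using System.Text;

public static class RtfLogFormatter
{
    // Escapes the three characters that have special meaning in RTF.
    // The backslash is handled first so the escapes added for braces
    // are not escaped a second time.
    public static string Escape(string text)
    {
        return text.Replace(@"\", @"\\").Replace("{", @"\{").Replace("}", @"\}");
    }

    // Builds one RTF document: a single group that opens before \rtf1
    // and closes after the message.
    public static string Build(string id, string author, string date, string message)
    {
        var sb = new StringBuilder();
        sb.Append(@"{\rtf1\ansi ");
        sb.Append(@"\b ID: \b0 ").Append(Escape(id)).Append(@"\line ");
        sb.Append(@"\b Author: \b0 ").Append(Escape(author)).Append(@"\line ");
        sb.Append(@"\b Date: \b0 ").Append(Escape(date)).Append(@"\line\line ");
        sb.Append(Escape(message));
        sb.Append("}");
        return sb.ToString();
    }
}
```

The returned string could then be stored in the dictionary and assigned to the Rtf property as in the question. The sketch does not convert line breaks inside the message or handle non-ASCII characters; both would need extra handling.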

How to add mixed Text and XElements on XElement value

I am trying to set mixed text and inline elements as the value of an XElement.
For example, after setting the string "this is a mixed text <foo>and</foo> inline element.", XElement.Nodes should return the text parts as XmlNodeType.Text and the element as XmlNodeType.Element.
Thanks in advance.
Use e.g. new XElement("parent", "this is a mixed text ", new XElement("foo", "and"), " inline element.") when constructing the element, or element.Add("this is a mixed text ", new XElement("foo", "and"), " inline element.") for an existing one.
If you have a plain string then use e.g.
element.Add(XElement.Parse("<root>" + "this is a mixed text <foo>and</foo> inline element." + "</root>").Nodes());
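The wrap-and-parse approach can be sketched as a small helper that also reports the resulting node types. The class MixedContent and its method names are hypothetical; LoadOptions.PreserveWhitespace is an optional addition so that whitespace-only text between two inline elements is kept.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class MixedContent
{
    // Parses 'markup' inside a temporary root element and copies the
    // resulting nodes into a new element called 'name'.
    public static XElement FromString(string name, string markup)
    {
        var element = new XElement(name);
        XElement wrapper = XElement.Parse("<root>" + markup + "</root>",
                                          LoadOptions.PreserveWhitespace);
        element.Add(wrapper.Nodes());
        return element;
    }

    // Lists the node types of the element's direct children.
    public static List<XmlNodeType> NodeTypes(XElement element)
    {
        return element.Nodes().Select(n => n.NodeType).ToList();
    }
}
```

For the string from the question, NodeTypes should report a text node, an element node and another text node, in that order. The markup has to be well-formed once wrapped, otherwise XElement.Parse throws an XmlException.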

How to format the given xml into single line (without spaces)

Using C#, how can I format a given XML file into a single line (without spaces)?
My output shows stray symbols wherever the XML contains spaces and new lines.
Use this:
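// requires: using System.Text.RegularExpressions;
public static string StripXmlWhitespace(string xml)
{
    // Remove any whitespace between a closing '>' and the next '<'.
    Regex parser = new Regex(@">\s*<");
    xml = parser.Replace(xml, "><");
    return xml.Trim();
}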
You can use string's Replace method to format xmlString and then save it to output:
string singleLineXml = xml.Replace(System.Environment.NewLine, " ");
or
string singleLineXml = xml.Replace("\r\n", " ");
After removing line breaks, remove spaces:
singleLineXml.Remove(' ');
Yes @Steve Wellens, Remove(' ') is a bad idea. Let's try this instead (strings are immutable, so the result has to be assigned):
singleLineXml = singleLineXml.Replace("> <", "><");
I also found a related thread that may help: Writing string to XML file without formatting (C#)
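As an alternative to string and regex replacement, the XML can be parsed and written back without formatting. This is a different technique from the answers above, sketched here with LINQ to XML; the class XmlSingleLine and its method name are hypothetical. It assumes the input is well-formed XML.

```csharp
using System.Xml.Linq;

public static class XmlSingleLine
{
    // Parses the XML and serializes it again without indentation.
    // With the default LoadOptions, whitespace-only text between
    // elements is dropped while parsing; text inside elements is kept.
    public static string ToSingleLine(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Unlike a plain Replace, this leaves spaces inside text content untouched. Note that XDocument.ToString does not emit the XML declaration, so it has to be added separately if the output needs one.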

special chars in XML

I want to parse the following XML
XmlElement costCenterElement2 = doc.CreateElement("CostCenter");
costCenterElement2.InnerXml =
"<CostCenterNumber>2</CostCenterNumber> <CostCenter>" +
"G&A: Fin & Acctng" +
"</CostCenter>";
but I found XML Exception
An error occurred while parsing EntityName.
Yeah, a bare & is not valid in XML and needs to be escaped as &amp;.
The other invalid characters and their escapes:
< - &lt;
> - &gt;
" - &quot;
' - &apos;
The following should work:
XmlElement costCenterElement2 = doc.CreateElement("CostCenter");
costCenterElement2.InnerXml =
"<CostCenterNumber>2</CostCenterNumber> <CostCenter>" +
"G&amp;A: Fin &amp; Acctng" +
"</CostCenter>";
However, you really should be creating the CostCenterNumber and CostCenter as elements and not as InnerXml.
// requires: using System.Text;
private string SanitizeXml(string source)
{
    if (string.IsNullOrEmpty(source))
    {
        return source;
    }
    if (source.IndexOf('&') < 0)
    {
        return source;
    }
    StringBuilder result = new StringBuilder(source);
    // Protect entities that are already escaped with temporary "<>" placeholders.
    result = result.Replace("&lt;", "<>lt;")
        .Replace("&gt;", "<>gt;")
        .Replace("&amp;", "<>amp;")
        .Replace("&apos;", "<>apos;")
        .Replace("&quot;", "<>quot;");
    // Escape every remaining bare ampersand.
    result = result.Replace("&", "&amp;");
    // Restore the protected entities.
    result = result.Replace("<>lt;", "&lt;")
        .Replace("<>gt;", "&gt;")
        .Replace("<>amp;", "&amp;")
        .Replace("<>apos;", "&apos;")
        .Replace("<>quot;", "&quot;");
    return result.ToString();
}
Updated:
@thabet, if the string "<CostCenterNumber>...G&A: Fin & Acctng</CostCenter>" is coming in as a parameter, and it's supposed to represent XML to be parsed, then it has to be well-formed XML to start with. In the example you gave, it isn't. & signals the start of an entity reference, is followed by an entity name, and is terminated by ;, which never appears in the string above.
If you are given that whole string as a parameter, some of which is markup that must be parsed (i.e. the start/end tags), and some of which may contain markup that should not be parsed (i.e. the &), there is no clean and reliable way to "escape" the latter and not escape the former. You could replace all & characters with &amp;, but in doing so you might accidentally turn &#160; into &amp;#160; and your resulting content would be wrong. If this is your situation, that you are receiving input "XML" where markup is mixed with unparseable text, the best recourse is to tell the person from whom you are getting the XML that it's not well-formed and they need to fix their output. There are ways for them to do that that are not difficult with standard XML tools.
If on the other hand you have
<CostCenterNumber>2</CostCenterNumber>
<CostCenter>...</CostCenter>
separately from the passed string, and you need to plug in the passed string as the text content of the child <CostCenter>, and you know it is not to be parsed (does not contain elements), then you can do this:
create <CostCenterNumber> and <CostCenter> as elements
make them children of the parent <CostCenter>
set CostCenterNumber's text content using InnerXml, assuming there is no risk of markup in there: eltCCN.InnerXml = "2";
create for the child CostCenter element a Text node child whose value is the passed string: textCC = doc.CreateTextNode(argStr);
assign that text node as a child of the child CostCenter element: eltCC.AppendChild(textCC);
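The steps above can be sketched as follows. The class CostCenterBuilder and its parameter names are hypothetical; the sketch uses a text node for the number as well, which avoids InnerXml entirely.

```csharp
using System.Xml;

public static class CostCenterBuilder
{
    // Builds <CostCenter><CostCenterNumber>..</CostCenterNumber><CostCenter>..</CostCenter></CostCenter>
    // using DOM nodes, so the text values are escaped by the serializer.
    public static XmlElement Build(XmlDocument doc, string number, string name)
    {
        XmlElement parent = doc.CreateElement("CostCenter");

        XmlElement eltCCN = doc.CreateElement("CostCenterNumber");
        eltCCN.AppendChild(doc.CreateTextNode(number));
        parent.AppendChild(eltCCN);

        XmlElement eltCC = doc.CreateElement("CostCenter");
        // The passed string is treated as plain text, never parsed as markup.
        eltCC.AppendChild(doc.CreateTextNode(name));
        parent.AppendChild(eltCC);

        return parent;
    }
}
```

Calling Build(doc, "2", "G&A: Fin & Acctng") does not throw, and the & characters appear as &amp; only when the element is serialized, for example through OuterXml.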

why Request.QueryString replace + with empty char in some cases?

I have a problem: if I pass a string that contains + in a query string and then read it, I get the same string with each + replaced by a space.
For example, if I request ../Page.aspx?data=sdf1+sdf and read the value in Page_Load with data = Request.QueryString["data"], I get data = "sdf1 sdf".
I worked around the problem by replacing every space with +.
But what causes this? And is replacing spaces with + the best solution in all cases?
Because + is the URL-encoded representation of a space " ". If you want to preserve the plus sign in your value, you need to URL-encode it:
"/Page.aspx?data=" + HttpUtility.UrlEncode("sdf1+sdf")
which will produce:
/Page.aspx?data=sdf1%2bsdf
Now when you read Request.QueryString["data"] you will get what you expect.
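The behaviour can be reproduced outside ASP.NET with a short sketch. It swaps in System.Net.WebUtility for HttpUtility so that it does not depend on System.Web; the class QueryStringPlus and its method names are hypothetical. WebUtility writes the escape with upper-case hex digits (%2B), which is equivalent to %2b.

```csharp
using System.Net;

public static class QueryStringPlus
{
    // Encodes the value before it is placed in the query string,
    // so a literal '+' is sent as an escape sequence.
    public static string BuildUrl(string value)
    {
        return "/Page.aspx?data=" + WebUtility.UrlEncode(value);
    }

    // Decodes a raw query-string value the way a form-encoded value is read:
    // '+' becomes a space and percent escapes become their characters.
    public static string DecodeValue(string raw)
    {
        return WebUtility.UrlDecode(raw);
    }
}
```

Decoding the raw value sdf1+sdf gives a space, while decoding the encoded form gives back the plus sign. Replacing spaces with + after reading the value is therefore not safe in general, because it also rewrites spaces that were meant to be spaces.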
