Saving an XML that has invalid characters

Saving an XML that has invalid characters - c#

there are code snippets that strip the invalid characters inside a string before we save it as an XML ... but I have one more problem: Let's say my user wants to have a column name like "[MyColumnOne] ...so now I do not want to strip these "[","] well because these are the ones that user has defined and wants to see them so if I use some codes that are stripping the invalid characters they are also removing "[" and "[" but in this case I still need them to be saved... what can I do?

Never mind, I changed my RegEx format to use XML 1.1 instead of XML 1.0 and now it is working good :
string pattern = String.Empty;
//pattern = #"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])"; //XML 1.0
pattern = #"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])"; // XML 1.1
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
if (regex.IsMatch(sString))
{
sString = regex.Replace(sString, String.Empty);
File.WriteAllText(sString, sString, Encoding.UTF8);
}
return sString;

This worked for me, and it was fast.
private object NormalizeString(object p) {
object result = p;
if (p is string || p is long) {
string s = string.Format("{0}", p);
string resultString = s.Trim();
if (string.IsNullOrWhiteSpace(resultString)) return "";
Regex rxInvalidChars = new Regex("[\r\n\t]+", RegexOptions.IgnoreCase);
if (rxInvalidChars.IsMatch(resultString)) {
resultString = rxInvalidChars.Replace(resultString, " ");
}
//string pattern = String.Empty;
//pattern = #"";
////pattern = #"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])"; //XML 1.0
////pattern = #"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])"; // XML 1.1
//Regex rxInvalidXMLChars = new Regex(pattern, RegexOptions.IgnoreCase);
//if (rxInvalidXMLChars.IsMatch(resultString)) {
// resultString = rxInvalidXMLChars.Replace(resultString, "");
//}
result = string.Join("", resultString.Where(c => c >= ' '));
}
return result;
}

Related

replace links in string with my link

i want to replace every link(s) in a string with the link i want to provide. What i have tried is-
StreamReader reader = new StreamReader(dd1.SelectedItem.Value);
string readFile = reader.ReadToEnd();
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*([a-zA-Z0-9\\?\\#\\=\\/]){1})?", RegexOptions.IgnoreCase);
string output=regx.ToString();
output = readFile;
MatchCollection matches = regx.Matches(output);
foreach (Match match in matches)
{
output = output.Replace(#"match.Value", #"http://localhost:61187/two?" + "sender=" + Server.UrlEncode(this.txtUsername.Text) + "&reciever=" + output);
}
Here, i have a string output which contains some links. So, i have used regex to parse the links in the string. But, the string named "output" is not read and its neither showing an error nor an output.

It seems to me that you should be using regx.Replace(...) instead:
StreamReader reader = new StreamReader(dd1.SelectedItem.Value);
string readFile = reader.ReadToEnd();
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*([a-zA-Z0-9\\?\\#\\=\\/]){1})?", RegexOptions.IgnoreCase);
string output = regx.ToString();
output = readFile;
string username = Server.UrlEncode(this.txtUsername.Text);
output = regx.Replace(output, new MatchEvaluator((match) =>
{
var url = Uri.EscapeDataString(match.Value);
return $"http://localhost:61187/two?sender={username}&receiver={url}";
}));
This will replace every match with the URL returned by the anonymous function.

finding a variable from string input and extracting it with regex c#

~I have a client which is sending the a message to my server and I am trying to get substrings in order to extract them into variables. I want to use regex for this. Although I have no syntax problems, it will not match. This is the message I am sending and my code.
" PUT /John\r\n\r\n
London "
private StreamReader sReader = null;
private StreamWriter sWriter = null;
public SocketClass(Socket s)
{
socket = s;
NetworkStream nStream = new NetworkStream(s);
sReader = new StreamReader(nStream);
sWriter = new StreamWriter(nStream);
startSocket();
}
String txt = "";
while (sReader.Peek() >= 0)
{
txt += sReader.ReadLine() + "\r\n";
}
else if (txt.Contains("PUT"))
{
Console.WriteLine("triggered");
Regex pattern = new Regex(#"PUT /(?<Name>\d+)\r\n\r\n(?<Location>\d+)\r\n");
Match match = pattern.Match(txt);
if (match.Success)
{
String Name = match.Groups["Name"].Value;
String Location = match.Groups["Location"].Value;
Console.WriteLine(Name);
Console.WriteLine(Location);
}
}

The problem seems to be that while your input has alphanumeric characters your regex is looking for \d which are numeric digits. The regex can be easily changed to this to make it work:
Regex pattern = new Regex(#"PUT /(?<Name>.+)\r\n\r\n(?<Location>.+)\r\n");
. represents any character. It may be that you could narrow it down more to say the match has to be alphabetic characters or something else but the above will certainly work for your given input.

Replacing Special Character with their codes

I am passing XML data to a server from a text Box, now issue is XML is giving issues with symbols like & < |. So i want to replace these symbols with their equivalent codes.
if i use string.replace function it will replace the characters recently replaced as well.
.Replace("&", "&")
.Replace("<", "<")
.Replace("|", "|")
.Replace("!", "!")
.Replace("#", "#")
As it go through complete string again and again.
So &<# will become "&#38;&#60;"
I also tried Dictionary method:
var replacements = new Dictionary<string, string>
{
{"&", "&"},
{"<", "<"},
{"|", "|"},
{"!", "!"},
{"#", "#"}
}
var output = replacements.Aggregate(input, (current, replacement) => current.Replace(replacement.Key, replacement.Value));
return output;
But same issue here as well. I also tried string builder method, but same repeating replacement issue. Any Advise?

You shouldn't be trying to escape characters manually. There are libraries and methods that are already built to do this such the SecurityElement.Escape(). It specifically escapes invalid XML characters into a known safe format that can be unescaped later.

I strongly advise using proper XML handling to build XML:
var id = 3;
var message = "&'<crazyMessage&&";
var xmlDoc = new XmlDocument();
using(var writer = xmlDoc.CreateNavigator().AppendChild())
{
writer.WriteStartElement("ROOT");
writer.WriteElementString("ID", id.ToString());
writer.WriteStartElement("INPUT");
writer.WriteElementString("ENGMSG", message);
writer.WriteEndElement(); // INPUT
writer.WriteEndElement(); // ROOT
}
var xmlString = xmlDoc.InnerXml;
Console.WriteLine(xmlString);
Ideone example
If you are using .NET 3.5 or higher, you can use Linq2Xml to build the XML, which is a bit cleaner:
var id = 3;
var message = "&'<crazyMessage&&";
var xml = new XElement("ROOT",
new XElement("ID", id),
new XElement("INPUT",
new XElement("ENGMSG", message)
)
);
var xmlString = xml.ToString();
Console.WriteLine(xmlString);

public static string Transform(string input, Dictionary<string, string> replacements)
{
string finalString = string.Empty;
for (int i = 0; i < input.Length; i++)
{
if (replacements.ContainsKey(input[i].ToString()))
{
finalString = finalString + replacements[input[i].ToString()];
}
else
{
finalString = finalString + input[i].ToString();
}
}
return finalString;
}

Extract text from <1></1> (HTML/XML-Like but with Number Tag)

So I have a long string containing pointy brackets that I wish to extract text parts from.
string exampleString = "<1>text1</1><27>text27</27><3>text3</3>";
I want to be able to get this
1 = "text1"
27 = "text27"
3 = "text3"
How would I obtain this easily? I haven't been able to come up with a non-hacky way to do it.
Thanks.

Using basic XmlReader and some other tricks to do wrapper to create XML-like data, I would do something like this
string xmlString = "<1>text1</1><27>text27</27><3>text3</3>";
xmlString = "<Root>" + xmlString.Replace("<", "<o").Replace("<o/", "</o") + "</Root>";
string key = "";
List<KeyValuePair<string,string>> kvpList = new List<KeyValuePair<string,string>>(); //assuming the result is in the KVP format
using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlString))){
bool firstElement = true;
while (xmlReader.Read()) {
if (firstElement) { //throwing away root
firstElement = false;
continue;
}
if (xmlReader.NodeType == XmlNodeType.Element) {
key = xmlReader.Name.Substring(1); //cut of "o"
} else if (xmlReader.NodeType == XmlNodeType.Text) {
kvpList.Add(new KeyValuePair<string,string>(key, xmlReader.Value));
}
}
}
Edit:
The main trick is this line:
xmlString = "<Root>" + xmlString.Replace("<", "<o").Replace("<o/", "</o") + "</Root>"; //wrap to make this having single root, o is put to force the tagName started with known letter (comment edit suggested by Mr. chwarr)
Where you first replace all opening pointy brackets with itself + char, i.e.
<1>text1</1> -> <o1>text1<o/1> //first replacement, fix the number issue
and then reverse the sequence of all the opening point brackets + char + forward slash to opening point brackets + forward slash + char
<o1>text1<o/1> -> <o1>text1</o1> //second replacement, fix the ending tag issue
Using simple WinForm with RichTextBox to print out the result,
for (int i = 0; i < kvpList.Count; ++i) {
richTextBox1.AppendText(kvpList[i].Key + " = " + kvpList[i].Value + "\n");
}
Here is the result I get:

This is far from bulletproof, but you could use a combination of split and Regex matching:
string exampleString = "<1>text1</1><27>text27</27><3>text3</3>";
string[] results = exampleString.Split(new string[] { "><" }, StringSplitOptions.None);
Regex r = new Regex(#"^<?(\d+)>([^<]+)<");
foreach (string result in results)
{
Match m = r.Match(result);
if (m.Success)
{
string index = m.Groups[1].Value;
string value = m.Groups[2].Value;
}
}
The most non-bulletproof example I can think of is if your text contains a "<", that would pretty much break this.

Facebook feed - remove extra Facebook JS from anchor

Please help me to replace all the additional Facebook information from here using C# .net Regex Replace method.
Example
http://on.fb.me/OE6gnBsomehtml
Output
somehtml on.fb.me/OE6gnB somehtml
I tried following regex but they didn't work for me
searchPattern = "<a([.]*)?/l.php([.]*)?(\">)?([.]*)?(</a>)?";
replacePattern = "$3";
Thanks

I manage to do this using regex with following code
searchPattern = "<a(.*?)href=\"/l.php...(.*?)&?(.*?)>(.*?)</a>";
string html1 = Regex.Replace(html, searchPattern, delegate(Match oMatch)
{
return string.Format("{1}", HttpUtility.UrlDecode(oMatch.Groups[2].Value), oMatch.Groups[4].Value);
});

You can try this (System.Web has to be added to use System.Web.HttpUtility):
string input = #"http://on.fb.me/OE6gnBsomehtml";
string rootedInput = String.Format("<root>{0}</root>", input);
XDocument doc = XDocument.Parse(rootedInput, LoadOptions.PreserveWhitespace);
string href;
var anchors = doc.Descendants("a").ToArray();
for (int i = anchors.Count() - 1; i >= 0; i--)
{
href = HttpUtility.ParseQueryString(anchors[i].Attribute("href").Value)[0];
XElement newAnchor = new XElement("a");
newAnchor.SetAttributeValue("href", href);
newAnchor.SetValue(href.Replace(#"http://", String.Empty));
anchors[i].ReplaceWith(newAnchor);
}
string output = doc.Root.ToString(SaveOptions.DisableFormatting)
.Replace("<root>", String.Empty)
.Replace("</root>", String.Empty);

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Saving an XML that has invalid characters - c#

Related

replace links in string with my link

finding a variable from string input and extracting it with regex c#

Replacing Special Character with their codes

Extract text from <1></1> (HTML/XML-Like but with Number Tag)

Facebook feed - remove extra Facebook JS from anchor

Categories

Resources