Facebook feed - remove extra Facebook JS from anchor - c#

Please help me strip the additional Facebook information from the anchors below using the C# .NET Regex.Replace method.
Example
http://on.fb.me/OE6gnBsomehtml
Output
somehtml on.fb.me/OE6gnB somehtml
I tried the following regex, but it didn't work for me:
searchPattern = "<a([.]*)?/l.php([.]*)?(\">)?([.]*)?(</a>)?";
replacePattern = "$3";
Thanks

I managed to do this using a regex with the following code:
searchPattern = "<a(.*?)href=\"/l.php...(.*?)&?(.*?)>(.*?)</a>";
string html1 = Regex.Replace(html, searchPattern, delegate(Match oMatch)
{
    return string.Format("{1}", HttpUtility.UrlDecode(oMatch.Groups[2].Value), oMatch.Groups[4].Value);
});

You can try this (System.Web has to be added to use System.Web.HttpUtility):
string input = @"http://on.fb.me/OE6gnBsomehtml";
// Wrap the fragment in a root element so it parses as a single XML document.
string rootedInput = String.Format("<root>{0}</root>", input);
XDocument doc = XDocument.Parse(rootedInput, LoadOptions.PreserveWhitespace);
var anchors = doc.Descendants("a").ToArray();
// Walk backwards so replacing an anchor does not disturb the ones still to be visited.
for (int i = anchors.Length - 1; i >= 0; i--)
{
    // The first query-string value of the href holds the decoded target URL.
    string href = HttpUtility.ParseQueryString(anchors[i].Attribute("href").Value)[0];
    XElement newAnchor = new XElement("a");
    newAnchor.SetAttributeValue("href", href);
    newAnchor.SetValue(href.Replace(@"http://", String.Empty));
    anchors[i].ReplaceWith(newAnchor);
}
string output = doc.Root.ToString(SaveOptions.DisableFormatting)
    .Replace("<root>", String.Empty)
    .Replace("</root>", String.Empty);
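For comparison, here is a minimal regex-only sketch of the same idea. It assumes the redirect anchors look like <a href="/l.php?u=ENCODED_URL&h=...">text</a>; the u= parameter name and the FacebookLinkCleaner class are assumptions for illustration, since the exact markup is not shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FacebookLinkCleaner
{
    // Assumption: the target URL is percent-encoded in a u= query parameter of /l.php.
    private static readonly Regex RedirectAnchor = new Regex(
        "<a\\b[^>]*?href=\"/l\\.php\\?u=([^\"&]+)[^\"]*\"[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Unwrap(string html)
    {
        return RedirectAnchor.Replace(html, m =>
        {
            // Decode the percent-encoded target and keep the original link text.
            string target = Uri.UnescapeDataString(m.Groups[1].Value);
            return string.Format("<a href=\"{0}\">{1}</a>", target, m.Groups[2].Value);
        });
    }
}
```

Calling FacebookLinkCleaner.Unwrap(html) rewrites each matching anchor and leaves all other markup untouched. A regex is fragile against unusual attribute quoting, so the XDocument approach is the safer choice when the input is well-formed.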

extracting a substring within a multiline string

I have a text file containing the following lines:
<TestInfo."Content">
{
<Label> "Content"
<Visible> "true"
"This is the text I want to get"
}
<TestInfo."Content2">
{
<Label> "Content2"
<Visible> "true"
"I don't want e.g. this"
}
I want to extract This is the text I want to get.
I tried e.g. the following:
string tmp = File.ReadAllText(textfile);
string result = Regex.Match(tmp, @"<Label> ""Content"" \n\s+ <Visible> ""true"" \n\s+ ""(.+?)""", RegexOptions.Singleline).Groups[1].Value;
However, in this case I get only the first word.
So, my output is: This
And I have no idea why...
I would appreciate any help. Thanks!
If you want the entire line after the line that starts with <Visible>, you'd better read the file line by line instead of using File.ReadAllText and a regular expression:
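string result = null;
using (StreamReader sr = new StreamReader(textfile))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // TrimStart tolerates indented lines.
        if (line.TrimStart().StartsWith("<Visible>"))
        {
            // The wanted text is on the line right after <Visible>.
            result = sr.ReadLine();
            break;
        }
    }
}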
Try this:
var tmp = File.ReadAllText("TextFile1.txt");
var result = Regex.Match(tmp, "This is the text I want to get", RegexOptions.Multiline);
// Groups.Count is always at least 1, so test Success to detect a match.
if (result.Success)
    Console.WriteLine(result.Value);
else
    Console.WriteLine("string not found.");
Regards,
//jafc
You could change your regex this way:
var result = Regex.Match(tmp, @"<Visible> ""true""\s*""([\S ]+)""", RegexOptions.Singleline).Groups[1].Value;
If you want to get all the matches, not only the first one, you could use Regex.Matches.
Thanks a lot for your input! This helped me to find a final solution:
First, I extracted only a small part containing the string I want to extract to avoid ambiguities:
string[] tmp = File.ReadAllLines(textfile);
List<string> Content = new List<string>();
bool dumpA = false;
Regex regBEGIN = new Regex(@"<TestInfo\.""Content"">");
Regex regEND = new Regex(@"<TestInfo\.""Content2"">");
foreach (string line in tmp)
{
    // Collect every line after the BEGIN marker, up to and including the END marker.
    if (dumpA)
        Content.Add(line.Trim());
    if (regBEGIN.IsMatch(line))
        dumpA = true;
    if (regEND.IsMatch(line)) break;
}
Then I can extract the (now only once existing) line starting with '"':
string result = "";
foreach (string line in Content)
{
    if (line.StartsWith("\""))
    {
        result = line;
        result = result.Replace("\"", "");
        result = result.Trim();
    }
}
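The two steps above can also be folded into a single regex that is anchored on the block name. This is only a sketch under the assumption that each block has the shape shown in the question (a header, an opening brace, some <Tag> "value" lines, then one bare quoted line); TestInfoReader and ExtractText are made-up names.

```csharp
using System.Text.RegularExpressions;

public static class TestInfoReader
{
    // Returns the bare quoted line of the block named by label, or "" if not found.
    public static string ExtractText(string fileContent, string label)
    {
        string pattern =
            @"<TestInfo\.""" + Regex.Escape(label) + @""">\s*\{" + // block header and opening brace
            @"(?:\s*<\w+>[^\r\n]*)*" +                              // any number of <Tag> "value" lines
            @"\s*""([^""\r\n]*)""";                                 // the bare quoted line to capture
        Match m = Regex.Match(fileContent, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

With the file from the question, TestInfoReader.ExtractText(File.ReadAllText(textfile), "Content") should give the first block's text. Because the header pattern includes the closing quote, "Content" does not accidentally match "Content2".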

replace links in string with my link

I want to replace every link in a string with a link I provide. This is what I have tried:
StreamReader reader = new StreamReader(dd1.SelectedItem.Value);
string readFile = reader.ReadToEnd();
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*([a-zA-Z0-9\\?\\#\\=\\/]){1})?", RegexOptions.IgnoreCase);
string output=regx.ToString();
output = readFile;
MatchCollection matches = regx.Matches(output);
foreach (Match match in matches)
{
    output = output.Replace(@"match.Value", @"http://localhost:61187/two?" + "sender=" + Server.UrlEncode(this.txtUsername.Text) + "&reciever=" + output);
}
Here, output is a string that contains some links, so I used a regex to find them. But the links in output are never replaced: I get neither an error nor the expected result.
It seems to me that you should be using regx.Replace(...) instead:
string readFile;
using (StreamReader reader = new StreamReader(dd1.SelectedItem.Value))
{
    readFile = reader.ReadToEnd();
}
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*([a-zA-Z0-9\\?\\#\\=\\/]){1})?", RegexOptions.IgnoreCase);
string username = Server.UrlEncode(this.txtUsername.Text);
// The evaluator runs once per match and returns the replacement text for it.
string output = regx.Replace(readFile, match =>
{
    var url = Uri.EscapeDataString(match.Value);
    return $"http://localhost:61187/two?sender={username}&receiver={url}";
});
This will replace every match with the URL returned by the anonymous function.
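To try the MatchEvaluator idea outside of a web page, here is a small self-contained sketch. The URL pattern is deliberately simplified, and LinkRewriter, baseUrl and sender are hypothetical names that stand in for the page-specific values in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Simplified URL pattern (an assumption for this sketch, not the regex from the question).
    private static readonly Regex Url = new Regex(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase);

    public static string Rewrite(string text, string baseUrl, string sender)
    {
        string encodedSender = Uri.EscapeDataString(sender);
        // Each matched link is replaced by the tracking URL that carries the original link.
        return Url.Replace(text, m =>
            baseUrl + "?sender=" + encodedSender + "&receiver=" + Uri.EscapeDataString(m.Value));
    }
}
```

For example, LinkRewriter.Rewrite("see http://a.com/x now", "http://localhost/two", "bob") rewrites only the link and keeps the surrounding words.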

Replacing Special Character with their codes

I am passing XML data from a text box to a server, and the XML breaks on symbols like &, < and |. So I want to replace these symbols with their equivalent character codes.
If I chain string.Replace calls, characters introduced by an earlier replacement get replaced again:
.Replace("&", "&#38;")
.Replace("<", "&#60;")
.Replace("|", "&#124;")
.Replace("!", "&#33;")
.Replace("#", "&#35;")
This happens because each call goes through the complete string again.
So &<# becomes "&&#35;38;&&#35;60;&#35;" instead of the intended "&#38;&#60;&#35;".
I also tried Dictionary method:
var replacements = new Dictionary<string, string>
{
    {"&", "&#38;"},
    {"<", "&#60;"},
    {"|", "&#124;"},
    {"!", "&#33;"},
    {"#", "&#35;"}
};
var output = replacements.Aggregate(input, (current, replacement) => current.Replace(replacement.Key, replacement.Value));
return output;
But I get the same issue here as well. I also tried a StringBuilder, with the same repeated-replacement problem. Any advice?
You shouldn't be trying to escape characters manually. There are libraries and methods already built to do this, such as SecurityElement.Escape(). It escapes the characters that are invalid in XML into a known safe format that can be unescaped later.
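A minimal sketch of that call, wrapped in a hypothetical XmlEscapeDemo helper so it can be tried on its own:

```csharp
using System.Security;

public static class XmlEscapeDemo
{
    public static string Escape(string raw)
    {
        // SecurityElement.Escape replaces <, >, ", ' and & with their XML entities.
        return SecurityElement.Escape(raw);
    }
}
```

Note that it produces named entities such as &amp; and &lt; rather than numeric codes, and it leaves |, ! and # alone because those characters are not special in XML.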
I strongly advise using proper XML handling to build XML:
var id = 3;
var message = "&'<crazyMessage&&";
var xmlDoc = new XmlDocument();
using (var writer = xmlDoc.CreateNavigator().AppendChild())
{
    writer.WriteStartElement("ROOT");
    writer.WriteElementString("ID", id.ToString());
    writer.WriteStartElement("INPUT");
    writer.WriteElementString("ENGMSG", message);
    writer.WriteEndElement(); // INPUT
    writer.WriteEndElement(); // ROOT
}
var xmlString = xmlDoc.InnerXml;
Console.WriteLine(xmlString);
If you are using .NET 3.5 or higher, you can use Linq2Xml to build the XML, which is a bit cleaner:
var id = 3;
var message = "&'<crazyMessage&&";
var xml = new XElement("ROOT",
    new XElement("ID", id),
    new XElement("INPUT",
        new XElement("ENGMSG", message)
    )
);
var xmlString = xml.ToString();
Console.WriteLine(xmlString);
public static string Transform(string input, Dictionary<string, string> replacements)
{
    string finalString = string.Empty;
    // Visit each input character exactly once, so replaced text is never scanned again.
    for (int i = 0; i < input.Length; i++)
    {
        if (replacements.ContainsKey(input[i].ToString()))
        {
            finalString = finalString + replacements[input[i].ToString()];
        }
        else
        {
            finalString = finalString + input[i].ToString();
        }
    }
    return finalString;
}
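The same single-pass idea can also be written with one Regex.Replace call, which scans the input once and never revisits text it has already produced. This is a sketch, not a drop-in replacement; SinglePassReplacer is a made-up name, and it assumes no key is a prefix of another key.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglePassReplacer
{
    public static string Transform(string input, IDictionary<string, string> replacements)
    {
        if (replacements.Count == 0) return input;
        // Build one alternation of all keys, escaped so they match literally.
        string pattern = string.Join("|", replacements.Keys.Select(k => Regex.Escape(k)));
        // Every match is looked up in the dictionary; the output is not scanned again.
        return Regex.Replace(input, pattern, m => replacements[m.Value]);
    }
}
```

With a dictionary that maps & to &#38;, < to &#60; and # to &#35;, the input &<# comes out as &#38;&#60;&#35;, without the double replacement that the chained string.Replace calls produce.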

C# Regexp change link format

On my forum I have a lot of redundant link data like:
[url:30l7ypk7]http://www.box.net/shared/0p28sf6hib[/url:30l7ypk7]
In regexp how can I change these to the format:
http://www.box.net/shared/0p28sf6hib
string orig = "[url:30l7ypk7]http://www.box.net/shared/0p28sf6hib[/url:30l7ypk7]";
string replace = "$1";
string regex = @"\[url:.*?](.*?)\[/url:.*?]";
string fixedLink = Regex.Replace(orig, regex, replace);
This isn't doing it totally in Regex but will still work...
string oldUrl = "[url:30l7ypk7]http://www.box.net/shared/0p28sf6hib[/url:30l7ypk7]";
Regex regExp = new Regex(@"http://[^\[]*");
var match = regExp.Match(oldUrl);
string newUrl = string.Format("<a href='{0}' rel='nofollow'>{0}</a>", match.Value);
The pattern \[([^\]]+)\]([^[]+)\[/\1\] should match the whole tag and capture the URL in its second group, so you can pull it out like this:
Regex re = new Regex(@"\[([^\]]+)\]([^[]+)\[/\1\]");
var s = @"[url:30l7ypk7]http://www.box.net/shared/0p28sf6hib[/url:30l7ypk7]";
// Group 1 is the tag (url:30l7ypk7); group 2 is the URL between the tags.
var replaced = re.Replace(s, "$2");
Console.WriteLine(replaced);
This is just from memory but I will try to check it over when I have more time. Should help get you started.
string matchPattern = @"\[(url:\w+)\](.+?)\[/\1\]";
string replacePattern = @"<a href='$2' rel='nofollow'>$2</a>";
string blogText = ...;
// Regex.Replace takes the input first, then the pattern, then the replacement.
blogText = Regex.Replace(blogText, matchPattern, replacePattern);
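Here is a compact, compilable version of the backreference approach for converting every such tag in a longer post. BbCodeLinks is a hypothetical helper name, and the sketch assumes the tag id after "url:" consists only of word characters, as in the example.

```csharp
using System.Text.RegularExpressions;

public static class BbCodeLinks
{
    // \1 forces the closing tag to carry the same id as the opening tag.
    private static readonly Regex UrlTag = new Regex(@"\[(url:\w+)\](.+?)\[/\1\]");

    public static string ToAnchors(string text)
    {
        // $2 is the URL captured between the opening and closing tags.
        return UrlTag.Replace(text, "<a href='$2' rel='nofollow'>$2</a>");
    }
}
```

To get the bare URL instead of an anchor, pass "$2" as the replacement. The URL is inserted as-is, so it should be HTML-encoded separately if the forum content is untrusted.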

Saving an XML that has invalid characters

There are code snippets that strip invalid characters from a string before it is saved as XML, but I have one more problem. Say my user wants a column name like "[MyColumnOne]". I do not want to strip the "[" and "]" characters, because the user defined them and wants to see them. The stripping code I have found removes "[" and "]" as well, but in this case I still need them to be saved. What can I do?
Never mind, I changed my regex to target XML 1.1 instead of XML 1.0, and now it works well:
string pattern = String.Empty;
//pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])"; //XML 1.0
pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])"; // XML 1.1
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
if (regex.IsMatch(sString))
{
sString = regex.Replace(sString, String.Empty);
File.WriteAllText(outputPath, sString, Encoding.UTF8); // outputPath is a placeholder for the destination file path
}
return sString;
This worked for me, and it was fast.
private object NormalizeString(object p) {
object result = p;
if (p is string || p is long) {
string s = string.Format("{0}", p);
string resultString = s.Trim();
if (string.IsNullOrWhiteSpace(resultString)) return "";
Regex rxInvalidChars = new Regex("[\r\n\t]+", RegexOptions.IgnoreCase);
if (rxInvalidChars.IsMatch(resultString)) {
resultString = rxInvalidChars.Replace(resultString, " ");
}
//string pattern = String.Empty;
//pattern = @"";
////pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])"; //XML 1.0
////pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])"; // XML 1.1
//Regex rxInvalidXMLChars = new Regex(pattern, RegexOptions.IgnoreCase);
//if (rxInvalidXMLChars.IsMatch(resultString)) {
// resultString = rxInvalidXMLChars.Replace(resultString, "");
//}
result = string.Join("", resultString.Where(c => c >= ' '));
}
return result;
}
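As an alternative to hand-written patterns, the framework can decide which characters are legal. The sketch below uses XmlConvert.IsXmlChar, which follows the XML 1.0 character rules; XmlTextSanitizer is a made-up name. Square brackets are ordinary legal characters, so they survive.

```csharp
using System.Linq;
using System.Xml;

public static class XmlTextSanitizer
{
    public static string StripInvalidXmlChars(string text)
    {
        // Keep only characters that XML 1.0 allows; '[' and ']' pass this test.
        return new string(text.Where(c => XmlConvert.IsXmlChar(c)).ToArray());
    }
}
```

One limitation of this per-character check: IsXmlChar returns false for lone surrogate halves, so characters outside the Basic Multilingual Plane are dropped too. If those matter, the surrogate pairs would need separate handling.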
