Formatting Twitter text (TweetText) with C#

Formatting Twitter text (TweetText) with C# - c#

Is there a better way to format text from Twitter to link the hyperlinks, username and hashtags? What I have is working but I know this could be done better. I am interested in alternative techniques. I am setting this up as a HTML Helper for ASP.NET MVC.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;
namespace Acme.Mvc.Extensions
{
public static class MvcExtensions
{
const string ScreenNamePattern = #"#([A-Za-z0-9\-_&;]+)";
const string HashTagPattern = #"#([A-Za-z0-9\-_&;]+)";
const string HyperLinkPattern = #"(http://\S+)\s?";
public static string TweetText(this HtmlHelper helper, string text)
{
return FormatTweetText(text);
}
public static string FormatTweetText(string text)
{
string result = text;
if (result.Contains("http://"))
{
var links = new List<string>();
foreach (Match match in Regex.Matches(result, HyperLinkPattern))
{
var url = match.Groups[1].Value;
if (!links.Contains(url))
{
links.Add(url);
result = result.Replace(url, String.Format("{0}", url));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, ScreenNamePattern))
{
var screenName = match.Groups[1].Value;
if (!names.Contains(screenName))
{
names.Add(screenName);
result = result.Replace("#" + screenName,
String.Format("#{0}", screenName));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, HashTagPattern))
{
var hashTag = match.Groups[1].Value;
if (!names.Contains(hashTag))
{
names.Add(hashTag);
result = result.Replace("#" + hashTag,
String.Format("#{1}",
HttpUtility.UrlEncode("#" + hashTag), hashTag));
}
}
}
return result;
}
}
}

That is remarkably similar to the code I wrote that displays my Twitter status on my blog. The only further things I do that I do are
1) looking up #name and replacing it with Real Name;
2) multiple #name's in a row get commas, if they don't have them;
3) Tweets that start with #name(s) are formatted "To #name:".
I don't see any reason this can't be an effective way to parse a tweet - they are a very consistent format (good for regex) and in most situations the speed (milliseconds) is more than acceptable.
Edit:
Here is the code for my Tweet parser. It's a bit too long to put in a Stack Overflow answer. It takes a tweet like:
#user1 #user2 check out this cool link I got from #user3: http://url.com/page.htm#anchor #coollinks
And turns it into:
<span class="salutation">
To Real Name,
Real Name:
</span> check out this cool link I got from
<span class="salutation">
Real Name
</span>:
http://site.com/...
#coollinks
It also wraps all that markup in a little JavaScript:
document.getElementById('twitter').innerHTML = '{markup}';
This is so the tweet fetcher can run asynchronously as a JS and if Twitter is down or slow it won't affect my site's page load time.

I created helper method to shorten text to 140 chars with url included. You can set share length to 0 to exclude url from tweet.
public static string FormatTwitterText(this string text, string shareurl)
{
if (string.IsNullOrEmpty(text))
return string.Empty;
string finaltext = string.Empty;
string sharepath = string.Format("http://url.com/{0}", shareurl);
//list of all words, trimmed and new space removed
List<string> textlist = text.Split(' ').Select(txt => Regex.Replace(txt, #"\n", "").Trim())
.Where(formatedtxt => !string.IsNullOrEmpty(formatedtxt))
.ToList();
int extraChars = 3; //to account for the two dots ".."
int finalLength = 140 - sharepath.Length - extraChars;
int runningLengthCount = 0;
int collectionCount = textlist.Count;
int count = 0;
foreach (string eachwordformated in textlist
.Select(eachword => string.Format("{0} ", eachword)))
{
count++;
int textlength = eachwordformated.Length;
runningLengthCount += textlength;
int nextcount = count + 1;
var nextTextlength = nextcount < collectionCount ?
textlist[nextcount].Length :
0;
if (runningLengthCount + nextTextlength < finalLength)
finaltext += eachwordformated;
}
return runningLengthCount > finalLength ? finaltext.Trim() + ".." : finaltext.Trim();
}

There is a good resource for parsing Twitter messages this link, worked for me:
How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0
http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/
It contains support for:
Urls
#hashtags
#usernames
BTW: Regex in the ParseURL() method needs reviewing, it parses stock symbols (BARC.L) into links.

Related

Extracting data from CSV file (fusion table and kml workaround)

In Xamarin google maps for Android using C# you can create polygons like so based on this tutorial:
public void OnMapReady(GoogleMap googleMap)
{
mMap = googleMap;
PolylineOptions geometry = new PolylineOptions()
.Add(new LatLng(37.35, -37.0123))
.Add(new LatLng(37.35, -37.0123))
.Add(new LatLng(37.35, -37.0123));
Polyline polyline = mMap.AddPolyline(geometry);
}
However I have downloaded a CSV file from my Fusion Table Layer from google maps as I think this might be the easiest option to work with polygon/polyline data. The output looks like this:
description,name,label,geometry
,Highland,61,"<Polygon><outerBoundaryIs><LinearRing><coordinates>-5.657018,57.3352 -5.656396,57.334463 -5.655076,57.334556 -5.653439,57.334477 -5.652366,57.334724 -5.650064,57.334477 -5.648096,57.335082 -5.646846,57.335388 -5.644733,57.335539 -5.643309,57.335428 -5.641981,57.335448 -5.640451,57.33578 -5.633217,57.339118 -5.627278,57.338921 -5.617161,57.337649 -5.607948,57.341015 -5.595812,57.343583 -5.586043,57.345373 -5.583581,57.350648 -5.576851,57.353609 -5.570088,57.354017 -5.560732,57.354102 -5.555254,57.354033 -5.549713,57.353146 -5.547766,57.352275 -5.538932,57.352255 -5.525891,57.356217 -5.514888,57.361865 -5.504272,57.366027 -5.494515,57.374515 -5.469829,57.383765 -5.458661,57.389781 -5.453695,57.395033 -5.454057,57.402943 -5.449189,57.40731 -5.440583,57.411447 -5.436133,57.414616 -5.438312,57.415474 -5.438628,57.417955 -5.440956,57.417909 -5.444013,57.414976 -5.450778,57.421362 -5.455035,57.422333 -5.462081,57.420719 -5.468775,57.416975 -5.475205,57.41135 -5.475976,57.409117 -5.47705,57.407092 -5.478101,57.406056 -5.478901,57.40536 -5.479489,57.404534 -5.480051,57.403782 -5.481036,57.403107 -5.484538,57.402102 -5.485647,57.401856 -5.487358,57.401287 -5.488709,57.400962 -5.490175,57.400616 -5.491116,57.400176 -5.493832,57.399318 -5.495279,57.399134 -5.496726,57.39771 -5.498724,57.396836 -5.49974,57.396314 -5.501317,57.39627 -5.502869,57.395426</coordinates></LinearRing></innerBoundaryIs></Polygon>"
,Strathclyde,63,"<Polygon><outerBoundaryIs><LinearRing><coordinates>-5.603129,56.313564 -5.603163,56.312536 -5.603643,56.311794 -5.601467,56.311875 -5.601038,56.312481 -5.600697,56.313489 -5.60071,56.31535 -5.60159,56.316107 -5.600729,56.316598 -5.598625,56.316058 -5.596203,56.317477 -5.597024,56.318119 -5.596095,56.318739 -5.595432,56.320116 -5.589343,56.322469 -5.584888,56.325178 -5.582907,56.327169 -5.581414,56.327472 -5.581435,56.326663 -5.582355,56.325602 -5.581515,56.323891 -5.576993,56.331062 -5.57886,56.331475 -5.57676,56.334449 -5.572748,56.335689 -5.569012,56.338143 -5.564802,56.342113 -5.555237,56.346668 -5.551214,56.347448 -5.547651,56.346391 -5.54444,56.344945 -5.541247,56.345945 -5.539099,56.349674 -5.533874,56.34763 -5.525195,56.342888 -5.523518,56.345066 -5.52345,56.346605 -5.526417,56.354361 -5.535455,56.353681 -5.537463,56.35508 -5.536035,56.356271 -5.538923,56.357205 -5.53891,56.359336 -5.539952,56.361491 -5.538102,56.36372 -5.535934,56.36567 -5.53392,56.367705 -5.531369,56.369729 -5.529853,56.371022 -5.532371,56.371274 -5.534177,56.371708 -5.532846,56.373256 -5.529845,56.37496 -5.527675,56.375327 -5.528531,56.375995 -5.526732,56.376343 -5.525442,56.377809 -5.524739,56.379843 -5.526069,56.380561</coordinates></LinearRing></innerBoundaryIs></Polygon>"
I uploaded a KML file to Google Maps Fusion Table Layer, it then created the map. I then went File>Download>CSV and it gave me the above example.
I have added this csv file to my assets folder of my xamarin android google map app and my question would be because LatLng takes two doubles as its input, is there a way I could input the above data from the csv file into this method and if so how?
Not sure how to read the above csv and then extract the <coordinates> and then add those coordinates as new LatLng in the example code above?
If you notice however the coordinates are split into lat and lng and then the next latlng is seperated by a space -5.657018,57.3352 -5.656396,57.334463.
Sudo code (this may or may not require xamarin or android experience and may just require C#/Linq):
Read CSV var sr = new StreamReader(Read csv from Asset folder);
Remove description,name,label,geometry
Foreach line in CSV
Extract Item that contains double qoutes
Foreach Item Remove Qoutes and <Polygon><outerBoundaryIs><LinearRing><coordinates> from start and end
Foreach item seperated by a space Extract coordinates
(This will now leave a long list of 37.35,-37.0123 coordinates for each line)
Place in something like this maybe?:
public class Row
{
public double Lat { get; set; }
public double Lng { get; set; }
public Row(string str)
{
string[] separator = { "," };
var arr = str.Split(separator, StringSplitOptions.None);
Lat = Convert.ToDouble(arr[0]);
Lng = Convert.ToDouble(arr[1]);
}
}
private void OnMapReady()
var rows = new List<Row>();
Foreach name/new line
PolylineOptions geometry = new PolylineOptions()
ForEach (item in rows) //not sure how polyline options will take a foreach
.Add(New LatLng(item.Lat, item.Lng))
Polyline polyline = mMap.AddPolyline(geometry);
As there is no way of using Fusion Table Layers in Xamarin Android with Google Maps API v2 this may provide a quick and easier workaround for those that need to split maps into regions.

If I understand correctly, the question is how to parse the above CSV file.
Each line (except the first one with headers) can be represented with the following class:
class MapEntry
{
public string Description { get; set; }
public string Name { get; set; }
public string Label { get; set; }
public IEnumerable<LatLng> InnerCoordinates { get; set; }
public IEnumerable<LatLng> OuterCoordinates { get; set; }
}
Note the Inner and Outer coordinates. They are represented inside the XML by outerBoundaryIs (required) and innerBoundaryIs (optional) elements.
A side note: the Highland line in your post is incorrect - you seem to trimmed part of the line, leading to incorrect XML (<outerBoundaryIs>...</innerBoundaryIs>).
Here is the code that does the parsing:
static IEnumerable<MapEntry> ParseMap(string csvFile)
{
return from line in File.ReadLines(csvFile).Skip(1)
let tokens = line.Split(new[] { ',' }, 4)
let xmlToken = tokens[3]
let xmlText = xmlToken.Substring(1, xmlToken.Length - 2)
let xmlRoot = XElement.Parse(xmlText)
select new MapEntry
{
Description = tokens[0],
Name = tokens[1],
Label = tokens[2],
InnerCoordinates = GetCoordinates(xmlRoot.Element("innerBoundaryIs")),
OuterCoordinates = GetCoordinates(xmlRoot.Element("outerBoundaryIs")),
};
}
static IEnumerable<LatLng> GetCoordinates(XElement node)
{
if (node == null) return Enumerable.Empty<LatLng>();
var element = node.Element("LinearRing").Element("coordinates");
return from token in element.Value.Split(' ')
let values = token.Split(',')
select new LatLng(XmlConvert.ToDouble(values[0]), XmlConvert.ToDouble(values[1]));
}
I think the code is self explanatory. The only details to be mentioned are:
let tokens = line.Split(new[] { ',' }, 4)
Here we use the string.Split overload that allows us to specify the maximum number of substrings to return, thus avoiding the trap of processing the commas inside the XML token.
and:
let xmlText = xmlToken.Substring(1, xmlToken.Length - 2)
which strips the quotes from the XML token.
Finally, a sample usage for your case:
foreach (var entry in ParseMap(csv_file_full_path))
{
PolylineOptions geometry = new PolylineOptions()
foreach (var item in entry.OuterCoordinates)
geometry.Add(item)
Polyline polyline = mMap.AddPolyline(geometry);
}
UPDATE: To make Xamarin happy (as mentioned in the comments), replace the File.ReadLines call with a call to the following helper:
static IEnumerable<string> ReadLines(string path)
{
var sr = new StreamReader(Assets.Open(path));
try
{
string line;
while ((line = sr.ReadLine()) != null)
yield return line;
}
finally { sr.Dispose(); }
}

I'm not sure to understand your questions but I think that can help:
You have to read the csv file line by line and foreach line extract the third argument, remove the quote and read the result with a XMLReader for extract the line you want (coordinate). After you split the result on white space, you get the list of LatLng and you split each LatLng on ",", then you've got all the information you need.
System.IO.StreamReader file = new System.IO.StreamReader(#"file.csv");
string line = file.ReadLine(); //escape the first line
while((line = file.ReadLine()) != null)
{
var xmlString = line.Split({","})[2]; // or 3
**Update**
xmlString = xmlString.Replace("\"","")
using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
{
reader.ReadToFollowing("coordinate");
string coordinate = reader.Value;
PolylineOptions geometry = new PolylineOptions();
var latLngs = coordinate.Split({' '});
foreach (var latLng in latLngs)
{
double lat = 0;
double lng = 0;
var arr = latLng.Split({','});
double.TryParse(arr[0], out lat);
double.TryParse(arr[1], out lng);
geometry.Add(new LatLng(lat, lng));
}
Polyline polyline = mMap.AddPolyline(geometry);
// Do something with your polyline
}
}
file.Close();

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Xml.Linq;
namespace ConsoleApplication35
{
public class LatLng
{
public LatLng(double lat, double lng)
{
}
}
class Program
{
private static IEnumerable<LatLng> GetCoordinates(string polygon)
{
var xElement = XElement.Parse(polygon);
//var innerBoundaryCoordinates = xElement.Elements("innerBoundaryIs").FirstOrDefault()?.Value ?? "";
var outerBoundaryCoordinates = xElement.Elements("outerBoundaryIs").Single()?.Value ?? "";
return outerBoundaryCoordinates
.Split(' ')
.Select(latLng =>
{
var splits = latLng.Split(',');
var lat = double.Parse(splits[0], CultureInfo.InvariantCulture);
var lng = double.Parse(splits[1], CultureInfo.InvariantCulture);
return new LatLng(lat, lng);
});
}
static void Main()
{
const string header = "description,name,label,geometry";
var latLngs = File.ReadLines("file.csv")
.SelectMany(x => x.Split(new[] { header }, StringSplitOptions.RemoveEmptyEntries)) //all geometry`s in one array
.Select(x => x.Split('"'))
.SelectMany(x => GetCoordinates(x[1]))
.ToArray();
}
}
}

Can't get Regex group value

The string I'm trying to parse from looks something like this:
{"F4q6i9xe":{"Hhgi79M1":"cTZ3W2JG","j0Uszek2":"0"},"a3vSYuq2":{"Kn51uR4Y":"c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qsh6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W/b1/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzbtfSOikdIjhxpD5m9HsKayo2ZSc2EYd/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1aUOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX13VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5KrwafBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7EAf50UXpMHd9xAJzsuIb6/lB/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HLCASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw/Tcg/rvR5qwZ/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnjQqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE/IxbJJWEIInMbH9HU/w0bS0MtIgofpIQAZ/m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgfpQ0toQbjhpn/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS/lNf9hK98jQqGU7eqpQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45MIU5fxUNZQSIfCSyIq5G/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVcxeX/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA=="}}
And this is my regular expression:
Hhgi79M1":"(?<encodeKeyID>.*?)",.*Kn51uR4Y":"(?<encodedBody>.*?)"}
And this is the code I'm using in my C# application:
string responsePattern = "Hhgi79M1\":\"(?<encodeKeyID>.*?)\",.*Kn51uR4Y\":\"(?<encodedBody>.*?)\"}";
if (Regex.IsMatch(body, responsePattern))
{
var match = Regex.Match(body, responsePattern);
string encodeKeyID = match.Groups["encodeKeyID"].Value;
string encodedBody = match.Groups["encodedBody"].Value;
Now it works, but it doesn't get the value of "encodedBody". I tested my expression with the data on https://regex101.com/ and it seems to work fine on there. However, when getting the value in my program it's just an empty string.

I sense the issue is that your body string is not escaped properly, as your pattern works fine in the following code:
string body = "{\"F4q6i9xe\":{\"Hhgi79M1\":\"cTZ3W2JG\",\"j0Uszek2\":\"0\"},\"a3vSYuq2\":{\"Kn51uR4Y\":\"c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qsh6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W/b1/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzbtfSOikdIjhxpD5m9HsKayo2ZSc2EYd/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1aUOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX13VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5KrwafBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7EAf50UXpMHd9xAJzsuIb6/lB/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HLCASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw/Tcg/rvR5qwZ/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnjQqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE/IxbJJWEIInMbH9HU/w0bS0MtIgofpIQAZ/m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgfpQ0toQbjhpn/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS/lNf9hK98jQqGU7eqpQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45MIU5fxUNZQSIfCSyIq5G/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVcxeX/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA==\"}}\n" +
"";
string responsePattern = "Hhgi79M1\":\"(?<encodeKeyID>.*?)\",.*Kn51uR4Y\":\"(?<encodedBody>.*?)\"}";
if (Regex.IsMatch(body, responsePattern))
{
var match = Regex.Match(body, responsePattern);
string encodeKeyID = match.Groups["encodeKeyID"].Value;
string encodedBody = match.Groups["encodedBody"].Value;
string msg = String.Format("encodeKeyID: {0}\nencodedBody: {1}", encodeKeyID, encodedBody);
//show in message box
MessageBox.Show(msg, "Pattern Match Result");
}
Output:

Try it this way.
string sTarget = #"
{""F4q6i9xe"":{""Hhgi79M1"":""cTZ3W2JG"",""j0Uszek2"":""0""},""a3vSYuq2"":{""Kn51uR4Y"":""c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qsh6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W\/b1\/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzbtfSOikdIjhxpD5m9HsKayo2ZSc2EYd\/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1aUOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX13VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5KrwafBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7EAf50UXpMHd9xAJzsuIb6\/lB\/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HLCASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw\/Tcg\/rvR5qwZ\/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnjQqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE\/IxbJJWEIInMbH9HU\/w0bS0MtIgofpIQAZ\/m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgfpQ0toQbjhpn\/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS\/lNf9hK98jQqGU7eqpQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45MIU5fxUNZQSIfCSyIq5G\/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVcxeX\/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA==""}}
";
Regex responseRx = new Regex(#"Hhgi79M1"":""(?<encodeKeyID>.*?)"",.*Kn51uR4Y"":""(?<encodedBody>.*?)""}");
Match responseMatch = responseRx.Match(sTarget);
if (responseMatch.Success)
{
Console.WriteLine("ID = {0}", responseMatch.Groups["encodeKeyID"].Value);
Console.WriteLine("Body = {0}", responseMatch.Groups["encodedBody"].Value);
}
Output
ID = cTZ3W2JG
Body = c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qs
h6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W\/b1\/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzb
tfSOikdIjhxpD5m9HsKayo2ZSc2EYd\/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1a
UOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX1
3VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5Krw
afBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7
EAf50UXpMHd9xAJzsuIb6\/lB\/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HL
CASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw\/T
cg\/rvR5qwZ\/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnj
QqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE\/IxbJJWEIInMbH9HU\/w0bS0MtIgofpIQAZ\/
m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgf
pQ0toQbjhpn\/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS\/lNf9hK98jQqGU7eq
pQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45
MIU5fxUNZQSIfCSyIq5G\/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS
4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVc
xeX\/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA==

How can I get all HTML attributes with GeckoFX/C#

In C# viaGeckoFx, I have not found a method to find all attributes of an element.
To do this, I made a JavaScript function. Here is my code
GeckoWebBrowser GeckoBrowser = ....;
GeckoNode NodeElement = ....; // HTML element where to find all HTML attributes
string JSresult = "";
string JStext = #"
function getElementAttributes(element)
{
var AttributesAssocArray = {};
for (var index = 0; index < element.attributes.length; ++index) { AttributesAssocArray[element.attributes[index].name] = element.attributes[index].value; };
return JSON.stringify(AttributesAssocArray);
}
getElementAttributes(this);
";
using (AutoJSContext JScontext = new AutoJSContext(GeckoBrowser.Window.JSContext)) { JScontext.EvaluateScript(JStext, (nsISupports)NodeElement.DomObject, out JSresult); }
Do you have others suggestions to achieve this in C# (with no Javascript)?

The property GeckoElement.Attributes allows access to an elements attributes.
So for example (this is untested and uncompiled code):
public string GetElementAttributes(GeckoElement element)
{
var result = new StringBuilder();
foreach(var a in element.Attributes)
{
result.Append(String.Format(" {0} = '{1}' ", a.NodeName, a.NodeValue));
}
return result.ToString();
}

C# HtmlDecode Specific tags only

I have a large htmlencoded string and i want decode only specific whitelisted html tags.
Is there a way to do this in c#, WebUtility.HtmlDecode() decodes everything.
`I am looking for an implementaiton of DecodeSpecificTags() that will pass below test.
[Test]
public void DecodeSpecificTags_SimpleInput_True()
{
string input = "<span>i am <strong color=blue>very</strong> big <br>man.</span>";
string output = "<span>i am <strong color=blue>very</strong> big <br>man.</span>";
List<string> whiteList = new List<string>(){ "strong","br" } ;
Assert.IsTrue(DecodeSpecificTags(whiteList,input) == output);
}`

You could do something like this
public string DecodeSpecificTags(List<string> whiteListedTagNames,string encodedInput)
{
String regex="";
foreach(string s in whiteListedTagNames)
{
regex="<"+#"\s*/?\s*"+s+".*?"+">";
encodedInput=Regex.Replace(encodedInput,regex);
}
return encodedInput;
}

A better approach could be to use some html parser like Agilitypack or csquery or Nsoup to find specific elements and decode it in a loop.
check this for links and examples of parsers
Check It, i did it using csquery :
string input = "<span>i am <strong color=blue>very</strong> big <br>man.</span>";
string output = "<span>i am <strong color=blue>very</strong> big <br>man.</span>";
var decoded = HttpUtility.HtmlDecode(output);
var encoded =input ; // HttpUtility.HtmlEncode(decoded);
Console.WriteLine(encoded);
Console.WriteLine(decoded);
var doc=CsQuery.CQ.CreateDocument(decoded);
var paras=doc.Select("strong").Union(doc.Select ("br")) ;
var tags=new List<KeyValuePair<string, string>>();
var counter=0;
foreach (var element in paras)
{
HttpUtility.HtmlEncode(element.OuterHTML).Dump();
var key ="---" + counter + "---";
var value= HttpUtility.HtmlDecode(element.OuterHTML);
var pair= new KeyValuePair<String,String>(key,value);
element.OuterHTML = key ;
tags.Add(pair);
counter++;
}
var finalstring= HttpUtility.HtmlEncode(doc.Document.Body.InnerHTML);
finalstring.Dump();
foreach (var element in tags)
{
finalstring=finalstring.Replace(element.Key,element.Value);
}
Console.WriteLine(finalstring);

Or you could use HtmlAgility with a black list or white list based on your requirement. I'm using black listed approach.
My black listed tag is store in a text file, for example "script|img"
public static string DecodeSpecificTags(this string content, List<string> blackListedTags)
{
if (string.IsNullOrEmpty(content))
{
return content;
}
blackListedTags = blackListedTags.Select(t => t.ToLowerInvariant()).ToList();
var decodedContent = HttpUtility.HtmlDecode(content);
var document = new HtmlDocument();
document.LoadHtml(decodedContent);
decodedContent = blackListedTags.Select(blackListedTag => document.DocumentNode.Descendants(blackListedTag))
.Aggregate(decodedContent,
(current1, nodes) =>
nodes.Select(htmlNode => htmlNode.WriteTo())
.Aggregate(current1,
(current, nodeContent) =>
current.Replace(nodeContent, HttpUtility.HtmlEncode(nodeContent))));
return decodedContent;
}

C# String manipulation

I am working on an application that gets text from a text file on a page.
Example link: http://test.com/textfile.txt
This text file contains the following text:
1 Milk Stuff1.rar
2 Milk Stuff2.rar
3 Milk Stuff2-1.rar
4 Union Stuff3.rar
What I am trying to do is as follows, to remove everything from each line, except for "words" that start with 'Stuff' and ends with '.rar'.
The problem is, most of the simple solutions like using .Remove, .Split or .Replace end up failing. This is because, for example, formatting the string using spaces ends up returning this:
1
Milk
Stuff1.rar\n2
Milk
Stuff2.rar\n3
Milk
Stuff2-1.rar\n4
Union
Stuff3.rar\n
I bet it's not as hard as it looks, but I'd apreciate any help you can give me.
Ps: Just to be clear, this is what I want it to return:
Stuff1.rar
Stuff2.rar
Stuff2-1.rar
Stuff3.rar
I am currently working with this code:
client.HeadOnly = true;
string uri = "http://test.com/textfile.txt";
byte[] body = client.DownloadData(uri);
string type = client.ResponseHeaders["content-type"];
client.HeadOnly = false;
if (type.StartsWith(#"text/"))
{
string[] text = client.DownloadString(uri);
foreach (string word in text)
{
if (word.StartsWith("Patch") && word.EndsWith(".rar"))
{
listBox1.Items.Add(word.ToString());
}
}
}
This is obviously not working, but you get the idea.
Thank you in advance!

This should work:
using (var writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadAllLines("input.txt"))
{
var match = Regex.Match(line, "Stuff.*?\\.rar");
if (match.Success)
writer.WriteLine(match.Value);
}
}

I would be tempted to use a regular expression for this sort of thing.
Something like
Stuff[^\s]*.rar
will pull out just the text you require.
How about a function like:
public static IEnumerable<string> GetStuff(string fileName)
{
var regex = new Regex(#"Stuff[^\s]*.rar");
using (var reader = new StreamReader(fileName))
{
string line;
while ((line = reader.ReadLine()) != null)
{
var match = regex.Match(line);
if (match.Success)
{
yield return match.Value;
}
}
}
}

for(string line in text)
{
if(line.EndsWith(".rar"))
{
int index = line.LastIndexOf("Stuff");
if(index != -1)
{
listBox1.Items.Add(line.Substring(index));
}
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Formatting Twitter text (TweetText) with C# - c#

Related

Extracting data from CSV file (fusion table and kml workaround)

Can't get Regex group value

How can I get all HTML attributes with GeckoFX/C#

C# HtmlDecode Specific tags only

C# String manipulation

Categories

Resources