Regex string but with several options

Regex string but with several options - c#

My string:
string str = "user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname";
Now I want to get the values out with Regex:
var reg = Regex.Matches(str, #"user:(.+?)\sid:(\d+)\s+nickname:(.+?)")
.Cast<Match>()
.Select(a => new
{
user = a.Groups[1].Value,
id = a.Groups[2].Value,
nickname = a.Groups[3].Value
})
.ToList();
foreach (var ca in reg)
{
Console.WriteLine($"{ca.user} id: {ca.id} nickname: {ca.nickname}");
}
I do not know how I can do it with regex that I can use nickname:(the nickname) I only want use the nickname if it has a nickname like nickname:kevo200 and noch nickname

I am not a 100% sure if this answers your question, but i fetched a list from the given input string via regex parsing and either return the nick when available or the username otherwise.
PS C:\WINDOWS\system32> scriptcs
> using System.Text.RegularExpressions;
> var regex = new Regex(#"\|?(?:user(?::?(?<user>\w+))\sid(?::?(?<id>\d*))\s?nickname(?::?(?<nick>\w+))?)");
> var matches = regex.Matches("user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname");
> matches.Cast<Match>().Select(m=>new {user=m.Groups["user"].Value,nick=m.Groups["nick"].Value}).Select(u=>string.IsNullOrWhiteSpace(u.nick)?u.user:u.nick);
[
"steo",
"kevo200",
"noko"
]
edit: regex designer: https://regexr.com/3uf8t
edit: improved version to accept escape sequences in nicknames
PS C:\WINDOWS\system32> scriptcs
> using System.Text.RegularExpressions;
> var regex = new Regex(#"\|?(?:user(?::(?<user>\w+))?\sid(?::(?<id>\d*))?\s?nickname(?::(?<nick>[\w\\]+))?)");
> var matches = regex.Matches("user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname|user:kevo id:2 nickname:kev\\so200");
> matches.Cast<Match>().Select(m=>new {user=m.Groups["user"].Value,nick=m.Groups["nick"].Value.Replace("\\s"," ")}).Select(u=>string.IsNullOrWhiteSpace(u.nick)?u.user:u.nick);
[
"steo",
"kevo200",
"noko",
"kev o200"
]

Try this: user:(.+?)\sid:(\d+)\s+nickname:*(.*?)(\||$).
At first I proposed this regex: user:(.+?)\sid:(\d+)\s+nickname:*(.*?)\|* – wrong, doesn't capture name because of lazy quantifier.
Then this regex expression: user:(.+?)\sid:(\d+)\s+nickname(:(.+?)|)(\||$) – this should match all the parts divided by '|' in your string and give nickname="" for empty nicknames. But in case Groups[4] is not defined (when nickname is not followed by ":") you'll need check on the value existence.

If it were up to me and the data you are processing is always pipe separated and in a constant order, I would probably just skip the regex and split the string into it's pieces using String.Split like this.
string str = "user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname";
var entries = str.Split('|');
foreach(var entry in entries)
{
var subs = entry.Split(' ');
var userName = subs[0].Split(':')[1];
var id = subs[1].Split(':')[1];
var tempNick = subs[2].Split(':');
var nick = tempNick.Length == 2 ? tempNick[1] : string.Empty;
Console.WriteLine(userName + " id:" + id + " nickname " + nick);
}

Without Regex:
static void GetInfo()
{
string input = #"user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname";
var users =
from info in input.Split('|')
let x = info.Split(" ")
let nick_split = x[2].Split(':')
let has_nick = nick_split.GetUpperBound(0) > 0
let z = new
{
User = x[0].Split(':')[1],
Id = x[1].Split(':')[1],
Nickname = has_nick ? nick_split[1] : String.Empty
}
select z;
foreach (var user in users)
{
Console.WriteLine($"user: {user.User}, id: {user.Id}, nickname: {user.Nickname}");
}
}

Related

How can I pick out a number from a string in C#

I have this string:
http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2
How can I pick out the number between "q=" and "&amp" as an integer?
So in this case I want to get the number: 1007040

What you're actually doing is parsing a URI - so you can use the .Net library to do this properly as follows:
var str = "http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2";
var uri = new Uri(str);
var query = uri.Query;
var dict = System.Web.HttpUtility.ParseQueryString(query);
Console.WriteLine(dict["amp;q"]); // Outputs 1007040
If you want the numeric string as an integer then you'd need to parse it:
int number = int.Parse(dict["amp;q"]);

Consider using regular expressions
String str = "http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2";
Match match = Regex.Match(str, #"q=\d+&amp");
if (match.Success)
{
string resultStr = match.Value.Replace("q=", String.Empty).Replace("&amp", String.Empty);
int.TryParse(resultStr, out int result); // result = 1007040
}

Seems like you want a query parameter for a uri that's html encoded. You could do:
Uri uri = new Uri(HttpUtility.HtmlDecode("http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2"));
string q = HttpUtility.ParseQueryString(uri.Query).Get("q");
int qint = int.Parse(q);

A regex approach using groups:
public int GetInt(string str)
{
var match = Regex.Match(str,#"q=(\d*)&amp");
return int.Parse(match.Groups[1].Value);
}
Absolutely no error checking in that!

How to filter values read from a text file

I want to get a couple of values from a textfile in C#. Example:
1.sex=male
1.name=barack
1.lastname=obama
1.age = 55
2.sex=female
2.name= kelly
2.lastname=clinton
2.age = 24
3.sex = male
3.firstname= mike
3.lastname= james
3.age= 19
I only want to get all the "name", "lastname" and ages from the textFile, not the "sex". How can I filter this? I have tried something like this, but it only shows 1 value.
var list = new List<string>();
var text = File.ReadAllLines(#"C:\Users\Jal\Desktop\Test.text");
foreach (var s in text)
{
if (s.Contains("Name"))
{
if (s.Contains("Name"))
{
var desc = s.Substring(s.IndexOf("=") + 1);
list.Add(desc);
ListView.Items.Add(desc);
}
}
}
I found this code on Stack Overflow, but it doesn't get all of the values I want.

var names = new List<string>();
var lastnames = new List<string>();
var text = File.ReadAllLines(#"C:\Users\Jal\Desktop\Test.text");
foreach (var s in text)
{
if (s.Contains("lastname"))
{
var lastname = s.Substring(s.IndexOf("=") + 1);
lastnames.Add(lastname);
continue;
}
if (s.Contains("name"))
{
var name = s.Substring(s.IndexOf("=") + 1);
names.Add(name);
continue;
}
}
And in same way you can add another properties.

s.Contains("Name") won't ever be true on this case because it's case-sensitive, and your string in the file is "name".
Try using s.Contains("name")
But you would be better off using a Regex for this kind of thing.

Can't get Regex group value

The string I'm trying to parse from looks something like this:
{"F4q6i9xe":{"Hhgi79M1":"cTZ3W2JG","j0Uszek2":"0"},"a3vSYuq2":{"Kn51uR4Y":"c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qsh6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W/b1/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzbtfSOikdIjhxpD5m9HsKayo2ZSc2EYd/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1aUOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX13VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5KrwafBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7EAf50UXpMHd9xAJzsuIb6/lB/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HLCASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw/Tcg/rvR5qwZ/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnjQqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE/IxbJJWEIInMbH9HU/w0bS0MtIgofpIQAZ/m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgfpQ0toQbjhpn/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS/lNf9hK98jQqGU7eqpQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45MIU5fxUNZQSIfCSyIq5G/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVcxeX/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA=="}}
And this is my regular expression:
Hhgi79M1":"(?<encodeKeyID>.*?)",.*Kn51uR4Y":"(?<encodedBody>.*?)"}
And this is the code I'm using in my C# application:
string responsePattern = "Hhgi79M1\":\"(?<encodeKeyID>.*?)\",.*Kn51uR4Y\":\"(?<encodedBody>.*?)\"}";
if (Regex.IsMatch(body, responsePattern))
{
var match = Regex.Match(body, responsePattern);
string encodeKeyID = match.Groups["encodeKeyID"].Value;
string encodedBody = match.Groups["encodedBody"].Value;
Now it works, but it doesn't get the value of "encodedBody". I tested my expression with the data on https://regex101.com/ and it seems to work fine on there. However, when getting the value in my program it's just an empty string.

I sense the issue is that your body string is not escaped properly, as your pattern works fine in the following code:
string body = "{\"F4q6i9xe\":{\"Hhgi79M1\":\"cTZ3W2JG\",\"j0Uszek2\":\"0\"},\"a3vSYuq2\":{\"Kn51uR4Y\":\"c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qsh6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W/b1/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzbtfSOikdIjhxpD5m9HsKayo2ZSc2EYd/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1aUOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX13VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5KrwafBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7EAf50UXpMHd9xAJzsuIb6/lB/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HLCASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw/Tcg/rvR5qwZ/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnjQqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE/IxbJJWEIInMbH9HU/w0bS0MtIgofpIQAZ/m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgfpQ0toQbjhpn/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS/lNf9hK98jQqGU7eqpQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45MIU5fxUNZQSIfCSyIq5G/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVcxeX/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA==\"}}\n" +
"";
string responsePattern = "Hhgi79M1\":\"(?<encodeKeyID>.*?)\",.*Kn51uR4Y\":\"(?<encodedBody>.*?)\"}";
if (Regex.IsMatch(body, responsePattern))
{
var match = Regex.Match(body, responsePattern);
string encodeKeyID = match.Groups["encodeKeyID"].Value;
string encodedBody = match.Groups["encodedBody"].Value;
string msg = String.Format("encodeKeyID: {0}\nencodedBody: {1}", encodeKeyID, encodedBody);
//show in message box
MessageBox.Show(msg, "Pattern Match Result");
}
Output:

Try it this way.
string sTarget = #"
{""F4q6i9xe"":{""Hhgi79M1"":""cTZ3W2JG"",""j0Uszek2"":""0""},""a3vSYuq2"":{""Kn51uR4Y"":""c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qsh6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W\/b1\/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzbtfSOikdIjhxpD5m9HsKayo2ZSc2EYd\/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1aUOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX13VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5KrwafBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7EAf50UXpMHd9xAJzsuIb6\/lB\/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HLCASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw\/Tcg\/rvR5qwZ\/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnjQqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE\/IxbJJWEIInMbH9HU\/w0bS0MtIgofpIQAZ\/m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgfpQ0toQbjhpn\/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS\/lNf9hK98jQqGU7eqpQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45MIU5fxUNZQSIfCSyIq5G\/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVcxeX\/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA==""}}
";
Regex responseRx = new Regex(#"Hhgi79M1"":""(?<encodeKeyID>.*?)"",.*Kn51uR4Y"":""(?<encodedBody>.*?)""}");
Match responseMatch = responseRx.Match(sTarget);
if (responseMatch.Success)
{
Console.WriteLine("ID = {0}", responseMatch.Groups["encodeKeyID"].Value);
Console.WriteLine("Body = {0}", responseMatch.Groups["encodedBody"].Value);
}
Output
ID = cTZ3W2JG
Body = c6QUklQzv+kRLnmvS0zsheqDsRCXbWFVpYhq2rsElKyFQnla01E6P3qQ26b+xWSrscFJCi2qs
h6WjJKSW5FN9EwxRAWc3rfyToaEhRngI2WPu3W\/b1\/hkkS2tEEk7LEpS2ItxLYKjEQGneO5E9rcGzb
tfSOikdIjhxpD5m9HsKayo2ZSc2EYd\/cN9yfYrNfLOCx+xeGqGcPmmImAzOZM0Q1IXIDQZ5r70vUS1a
UOMOFnN1fVxmQ8ISofEKNEpHFfwWW9QUl9eVDgpQ2HES13s20kvVH2FOlE5uahIJBnyTLWYzViAWYyX1
3VK2PgrQcZ0oLuV84bSbjHGZf+JAjvImuyYhkKDhtTWAGkQ8LZ6r07duo71Gbz3glOUZZyz+FFiH5Krw
afBpzGhRlI8El6lLZASN9Z5iZNTKs+sOti1fzsuOBzUXEjFURJGa93GMqPC+OyTw6YxKvgapz7go1XL7
EAf50UXpMHd9xAJzsuIb6\/lB\/6v+XjD70wc74D2OosxtR6DIbQj3gbaBUBABQsJHQZLNnHWqikh+HL
CASVbkqw6YPm5NecGaOQxonDkh3ZVQSF15WCEsgNdoWG94OuLF8DSw1t1Kt8sMFkvBJgB7LJS3pAw\/T
cg\/rvR5qwZ\/n31rtOzaJgCdVc6XeIK6Rttj4KvNtgtTwkJIjb97FgIS0CXrR0UCp3BsuLwQg3CRTnj
QqsjGgJeinLXX2lq6vNOkxRzgXlWnap96kheYj7sIE\/IxbJJWEIInMbH9HU\/w0bS0MtIgofpIQAZ\/
m6XKhu9LpPZBgsBf9KX5chUPeuRfs3MfHFzPYPlCCd2dkE8BHKOTmfS3okqhkzIZvFN3Ni+7enktFdgf
pQ0toQbjhpn\/TszP34pBaIy77me+GvePNO5bAFECLWSpGvsmW16rLCP0H+xIS\/lNf9hK98jQqGU7eq
pQdai7WFie2yN75Up7MrgBMp5w9Z5C7qpG2iYGiqynpqTCEnfQ3IZumbL+YvGTiuI2c320MGjKzOdO45
MIU5fxUNZQSIfCSyIq5G\/XGIeXCG3KETyKDZygXUTgEWbJWNTADE0AhXvP7HMtsuvstyuHvlTZGcfKS
4oLnDPFiV1ndIV7+W74Ytv9bAdDIVl36xTzA2PV6waqXBfSPUCTx3jVrAeXcHGFjZtxbk3pFmuqPqgVc
xeX\/aDbK0NHkR6phQcFEREBjfdCOLAAbCkWiRF0JAA==

Working with Regex to get 2 strings out of a source code [duplicate]

I am using webrequest to download a source from a page and then I need to use Regex to grab the string and store it in a string:
U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
also need:
bpvsid=nvnN2JFJqJc.&dcz=1
Both out of:
<td style="cursor:pointer;" class="" onclick="NewWindow('U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..', 'bpvsid=nvnN2JFJqJc.&dcz=1', 'bpvstage_edit', '1200', '800')" onmouseout="HideHover();"><img src="gfx/info.gif" alt="" tipwidth="450" ajaxtip="openajax.php?target=modules/bpv/bpvstage_hover_info.php&rid=&oid=&bpvsid=&bpvname=" /></td>
It keep giving me errors like not enough )'s?
Thanks in advance.
Current code, probably wrong in every way. Really new to this:
Regex rx = new Regex("(?<=class=\"\" onclick=\"NewWindow(').*(?=')");
longId = (rx.Match(textBox2.Text).Value);
textBox1.Text = longId;

var match = Regex.Match(s, #"onclick=""NewWindow\('([^']*)',\s*'([^']*)',.*");
if (match.Success)
{
string longId = match.Groups[1].Value;
string other = match.Groups[2].Value;
}
That will give you two groups with values:
U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
bpvsid=nvnN2JFJqJc.&dcz=1

The regex NewWindow\('([^']*)', '([^']*) will match what you require. The two strings required will be in Groups[1] and Groups[2].
var match = Regex.Match(textBox2.Text, "NewWindow\('([^']*)', '([^']*)");
var id1 = match.Groups[1].Value;
var id2 = match.Groups[2].Value;

Note that you could also use simply string functions instead of a regex:
var s = "<td style=\"cursor:pointer;\" class=\"\" onclick=\"NewWindow('U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..', 'bpvsid=nvnN2JFJqJc.&dcz=1', 'bpvstage_edit', '1200', '800')\" onmouseout=\"HideHover();\"><img src=\"gfx/info.gif\" alt=\"\" tipwidth=\"450\" ajaxtip=\"openajax.php?target=modules/bpv/bpvstage_hover_info.php&rid=&oid=&bpvsid=&bpvname=\" /></td>";
var tmp = s.Substring(s.IndexOf("NewWindow('")).Split('\'');
var value1 = tmp[1]; // U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
var value2 = tmp[3]; // bpvsid=nvnN2JFJqJc.&dcz=1

I would use HtmlAgilityPack to parse HTML, then this non-regex approach works:
string html = // get your html ...
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // doc.Load can also consume a response-stream directly
var result = Enumerable.Empty<string>();
var firstTD = doc.DocumentNode.SelectNodes("//td").FirstOrDefault();
if (firstTD != null)
{
if (firstTD.Attributes.Contains("onclick"))
{
string onclick = firstTD.Attributes["onclick"].Value;
int newWindowIndex = onclick.IndexOf("newWindow(", StringComparison.OrdinalIgnoreCase);
if (newWindowIndex >= 0)
{
string functionBody = onclick.Substring(newWindowIndex + "newWindow(".Length);
string[] tokens = functionBody.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
result = tokens.Take(2).Select(s => s.Trim(' ', '\''));
}
}
}

Formatting Twitter text (TweetText) with C#

Is there a better way to format text from Twitter to link the hyperlinks, username and hashtags? What I have is working but I know this could be done better. I am interested in alternative techniques. I am setting this up as a HTML Helper for ASP.NET MVC.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;
namespace Acme.Mvc.Extensions
{
public static class MvcExtensions
{
const string ScreenNamePattern = #"#([A-Za-z0-9\-_&;]+)";
const string HashTagPattern = #"#([A-Za-z0-9\-_&;]+)";
const string HyperLinkPattern = #"(http://\S+)\s?";
public static string TweetText(this HtmlHelper helper, string text)
{
return FormatTweetText(text);
}
public static string FormatTweetText(string text)
{
string result = text;
if (result.Contains("http://"))
{
var links = new List<string>();
foreach (Match match in Regex.Matches(result, HyperLinkPattern))
{
var url = match.Groups[1].Value;
if (!links.Contains(url))
{
links.Add(url);
result = result.Replace(url, String.Format("{0}", url));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, ScreenNamePattern))
{
var screenName = match.Groups[1].Value;
if (!names.Contains(screenName))
{
names.Add(screenName);
result = result.Replace("#" + screenName,
String.Format("#{0}", screenName));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, HashTagPattern))
{
var hashTag = match.Groups[1].Value;
if (!names.Contains(hashTag))
{
names.Add(hashTag);
result = result.Replace("#" + hashTag,
String.Format("#{1}",
HttpUtility.UrlEncode("#" + hashTag), hashTag));
}
}
}
return result;
}
}
}

That is remarkably similar to the code I wrote that displays my Twitter status on my blog. The only further things I do that I do are
1) looking up #name and replacing it with Real Name;
2) multiple #name's in a row get commas, if they don't have them;
3) Tweets that start with #name(s) are formatted "To #name:".
I don't see any reason this can't be an effective way to parse a tweet - they are a very consistent format (good for regex) and in most situations the speed (milliseconds) is more than acceptable.
Edit:
Here is the code for my Tweet parser. It's a bit too long to put in a Stack Overflow answer. It takes a tweet like:
#user1 #user2 check out this cool link I got from #user3: http://url.com/page.htm#anchor #coollinks
And turns it into:
<span class="salutation">
To Real Name,
Real Name:
</span> check out this cool link I got from
<span class="salutation">
Real Name
</span>:
http://site.com/...
#coollinks
It also wraps all that markup in a little JavaScript:
document.getElementById('twitter').innerHTML = '{markup}';
This is so the tweet fetcher can run asynchronously as a JS and if Twitter is down or slow it won't affect my site's page load time.

I created helper method to shorten text to 140 chars with url included. You can set share length to 0 to exclude url from tweet.
public static string FormatTwitterText(this string text, string shareurl)
{
if (string.IsNullOrEmpty(text))
return string.Empty;
string finaltext = string.Empty;
string sharepath = string.Format("http://url.com/{0}", shareurl);
//list of all words, trimmed and new space removed
List<string> textlist = text.Split(' ').Select(txt => Regex.Replace(txt, #"\n", "").Trim())
.Where(formatedtxt => !string.IsNullOrEmpty(formatedtxt))
.ToList();
int extraChars = 3; //to account for the two dots ".."
int finalLength = 140 - sharepath.Length - extraChars;
int runningLengthCount = 0;
int collectionCount = textlist.Count;
int count = 0;
foreach (string eachwordformated in textlist
.Select(eachword => string.Format("{0} ", eachword)))
{
count++;
int textlength = eachwordformated.Length;
runningLengthCount += textlength;
int nextcount = count + 1;
var nextTextlength = nextcount < collectionCount ?
textlist[nextcount].Length :
0;
if (runningLengthCount + nextTextlength < finalLength)
finaltext += eachwordformated;
}
return runningLengthCount > finalLength ? finaltext.Trim() + ".." : finaltext.Trim();
}

There is a good resource for parsing Twitter messages this link, worked for me:
How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0
http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/
It contains support for:
Urls
#hashtags
#usernames
BTW: Regex in the ParseURL() method needs reviewing, it parses stock symbols (BARC.L) into links.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex string but with several options - c#

Related

How can I pick out a number from a string in C#

How to filter values read from a text file

Can't get Regex group value

Working with Regex to get 2 strings out of a source code [duplicate]

Formatting Twitter text (TweetText) with C#

Categories

Resources