Find two strings using regex in any order - c#

For example I have an input: "Test your Internet connection bandwidth. Test your Internet connection bandwidth." (two times repeated) and I want to search for strings internet and bandwidth.
string keyword = tbSearch.Text; // holds the value "internet bandwidth"
string input = "Test your Internet connection bandwidth. Test your Internet connection bandwidth.";
Regex r = new Regex(keyword.Replace(' ', '|'), RegexOptions.IgnoreCase);
if (r.Matches(input).Count == keyword.Split(' ').Length)
{
//Do something
}
This doesn't work because it finds 2 "internet" and 2 "bandwidth", so the count is 4 while the number of keywords is 2. What can I do?

var pattern = keyword.Split()
.Aggregate(new StringBuilder(),
(sb, s) => sb.AppendFormat(@"(?=.*\b{0}\b)", Regex.Escape(s)),
sb => sb.ToString());
if (Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase))
{
// contains all keywords
}
The first part generates the pattern from your keywords. If there are two keywords, "internet bandwidth", the generated regex pattern will look like:
"(?=.*\binternet\b)(?=.*\bbandwidth\b)"
It will match following inputs:
"Test your Internet connection bandwidth."
"Test your Internet connection bandwidth. Test your Internet bandwidth."
Following inputs will not match (not all words contained):
"Test your Internet2 connection bandwidth bandwidth."
"Test your connection bandwidth."
Another option (verifying each keyword separately):
var allWordsContained = keyword.Split().All(word =>
Regex.IsMatch(input, String.Format(@"\b{0}\b", Regex.Escape(word)), RegexOptions.IgnoreCase));

Not sure what you are trying to do, but you could try something like this:
public bool allWordsContained(string input, string keyword)
{
bool result = true;
string[] words = keyword.Split(' ');
foreach (var word in words)
{
if (input.IndexOf(word, StringComparison.OrdinalIgnoreCase) < 0) // case-insensitive, so "internet" finds "Internet"
result = false;
}
return result;
}
public bool atLeastOneWordContained(string input, string keyword)
{
bool result = false;
string[] words = keyword.Split(' ');
foreach (var word in words)
{
if (input.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0) // case-insensitive, so "internet" finds "Internet"
result = true;
}
return result;
}

Here is the solution. The key is to collect the matched values and apply Distinct():
string keyword = "internet bandwidth";
string input = "Test your Internet connection bandwidth. Test your Internet connection bandwidth.";
Regex r = new Regex(keyword.Replace(' ', '|'), RegexOptions.IgnoreCase);
MatchCollection mc = r.Matches(input);
List<string> res = new List<string>();
for (int i = 0; i < mc.Count;i++ )
{
res.Add(mc[i].Value);
}
if (res.Distinct().Count() == keyword.Split(' ').Length)
{
//Do something
}

Regex r = new Regex(keyword.Replace(' ', '|'), RegexOptions.IgnoreCase);
int distinctKeywordsFound = r.Matches(input)
.Cast<Match>()
.Select(m => m.Value)
.Distinct()
.Count();
if (distinctKeywordsFound == keyword.Split(' ').Length)
{
//Do something
}
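One caveat with the Distinct() approaches above: Distinct() compares strings case-sensitively, so an input containing both "Internet" and "internet" would count them as two different matches, and a keyword containing regex metacharacters would change the pattern. Below is a minimal sketch that addresses both, assuming a hypothetical helper named ContainsAllKeywords (the class and method names are illustrative, not from the answers above):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordSearch
{
    // Hypothetical helper: true when every space-separated keyword occurs
    // at least once in the input, ignoring case.
    public static bool ContainsAllKeywords(string input, string keyword)
    {
        // De-duplicate the keywords themselves, ignoring case.
        string[] words = keyword
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();

        // Escape each keyword so characters such as '.' or '+' are matched literally.
        string pattern = string.Join("|", words.Select(Regex.Escape));

        // Count the distinct matched values, again ignoring case.
        int distinctFound = Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count();

        return distinctFound == words.Length;
    }
}
```

Note that this still matches substrings (for example "internet" inside "Internet2"); the \b word-boundary pattern from the first answer is the stricter option.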

Related

Regex string but with several options

My string:
string str = "user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname";
Now I want to get the values out with Regex:
var reg = Regex.Matches(str, @"user:(.+?)\sid:(\d+)\s+nickname:(.+?)")
.Cast<Match>()
.Select(a => new
{
user = a.Groups[1].Value,
id = a.Groups[2].Value,
nickname = a.Groups[3].Value
})
.ToList();
foreach (var ca in reg)
{
Console.WriteLine($"{ca.user} id: {ca.id} nickname: {ca.nickname}");
}
I don't know how to write the regex so that the nickname value is optional. I only want to use the nickname when one is given, as in nickname:kevo200, and not when there is just a bare nickname with no value.
I am not 100% sure if this answers your question, but I built a list from the given input string via regex parsing, returning the nick when available and the username otherwise.
PS C:\WINDOWS\system32> scriptcs
> using System.Text.RegularExpressions;
> var regex = new Regex(@"\|?(?:user(?::?(?<user>\w+))\sid(?::?(?<id>\d*))\s?nickname(?::?(?<nick>\w+))?)");
> var matches = regex.Matches("user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname");
> matches.Cast<Match>().Select(m=>new {user=m.Groups["user"].Value,nick=m.Groups["nick"].Value}).Select(u=>string.IsNullOrWhiteSpace(u.nick)?u.user:u.nick);
[
"steo",
"kevo200",
"noko"
]
edit: regex designer: https://regexr.com/3uf8t
edit: improved version to accept escape sequences in nicknames
PS C:\WINDOWS\system32> scriptcs
> using System.Text.RegularExpressions;
> var regex = new Regex(@"\|?(?:user(?::(?<user>\w+))?\sid(?::(?<id>\d*))?\s?nickname(?::(?<nick>[\w\\]+))?)");
> var matches = regex.Matches("user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname|user:kevo id:2 nickname:kev\\so200");
> matches.Cast<Match>().Select(m=>new {user=m.Groups["user"].Value,nick=m.Groups["nick"].Value.Replace("\\s"," ")}).Select(u=>string.IsNullOrWhiteSpace(u.nick)?u.user:u.nick);
[
"steo",
"kevo200",
"noko",
"kev o200"
]
Try this: user:(.+?)\sid:(\d+)\s+nickname:*(.*?)(\||$).
At first I proposed this regex: user:(.+?)\sid:(\d+)\s+nickname:*(.*?)\|* – wrong, doesn't capture name because of lazy quantifier.
Then this regex: user:(.+?)\sid:(\d+)\s+nickname(:(.+?)|)(\||$). It should match all the parts divided by '|' in your string and give nickname="" for empty nicknames. The nickname is in Groups[4]; when "nickname" is not followed by ":", that group does not take part in the match, so check Groups[4].Success before relying on its value.
If it were up to me and the data you are processing is always pipe separated and in a constant order, I would probably just skip the regex and split the string into its pieces using String.Split like this.
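The optional-nickname idea behind the regexes above can also be sketched with named groups and a Group.Success check. This is only an illustration: the NicknameParser class, the DisplayNames method, and the exact pattern are assumptions, not code from the answers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NicknameParser
{
    // The nickname value sits in an optional non-capturing group, so a bare
    // "nickname" still matches and the "nick" group is simply unsuccessful.
    private static readonly Regex EntryPattern = new Regex(
        @"user:(?<user>[^\s|]+)\s+id:(?<id>\d+)\s+nickname(?::(?<nick>[^|]+))?");

    // Hypothetical helper: returns the nickname when present, otherwise the user name.
    public static List<string> DisplayNames(string input)
    {
        var names = new List<string>();
        foreach (Match m in EntryPattern.Matches(input))
        {
            Group nick = m.Groups["nick"];
            names.Add(nick.Success ? nick.Value : m.Groups["user"].Value);
        }
        return names;
    }
}
```

For the sample string in the question this yields steo, kevo200 and noko, in that order.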
string str = "user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname";
var entries = str.Split('|');
foreach(var entry in entries)
{
var subs = entry.Split(' ');
var userName = subs[0].Split(':')[1];
var id = subs[1].Split(':')[1];
var tempNick = subs[2].Split(':');
var nick = tempNick.Length == 2 ? tempNick[1] : string.Empty;
Console.WriteLine(userName + " id:" + id + " nickname " + nick);
}
Without Regex:
static void GetInfo()
{
string input = @"user:steo id:1 nickname|user:kevo id:2 nickname:kevo200|user:noko id:3 nickname";
var users =
from info in input.Split('|')
let x = info.Split(' ')
let nick_split = x[2].Split(':')
let has_nick = nick_split.GetUpperBound(0) > 0
let z = new
{
User = x[0].Split(':')[1],
Id = x[1].Split(':')[1],
Nickname = has_nick ? nick_split[1] : String.Empty
}
select z;
foreach (var user in users)
{
Console.WriteLine($"user: {user.User}, id: {user.Id}, nickname: {user.Nickname}");
}
}

Working with Regex to get 2 strings out of a source code [duplicate]

I am using webrequest to download a source from a page and then I need to use Regex to grab the string and store it in a string:
U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
also need:
bpvsid=nvnN2JFJqJc.&dcz=1
Both out of:
<td style="cursor:pointer;" class="" onclick="NewWindow('U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..', 'bpvsid=nvnN2JFJqJc.&dcz=1', 'bpvstage_edit', '1200', '800')" onmouseout="HideHover();"><img src="gfx/info.gif" alt="" tipwidth="450" ajaxtip="openajax.php?target=modules/bpv/bpvstage_hover_info.php&rid=&oid=&bpvsid=&bpvname=" /></td>
It keeps giving me errors like "not enough )'s".
Thanks in advance.
Current code, probably wrong in every way. Really new to this:
Regex rx = new Regex("(?<=class=\"\" onclick=\"NewWindow(').*(?=')");
longId = (rx.Match(textBox2.Text).Value);
textBox1.Text = longId;
var match = Regex.Match(s, @"onclick=""NewWindow\('([^']*)',\s*'([^']*)',.*");
if (match.Success)
{
string longId = match.Groups[1].Value;
string other = match.Groups[2].Value;
}
That will give you two groups with values:
U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
bpvsid=nvnN2JFJqJc.&dcz=1
The regex NewWindow\('([^']*)', '([^']*) will match what you require. The two strings required will be in Groups[1] and Groups[2].
var match = Regex.Match(textBox2.Text, @"NewWindow\('([^']*)', '([^']*)"); // verbatim string, so \( reaches the regex engine unchanged
var id1 = match.Groups[1].Value;
var id2 = match.Groups[2].Value;
Note that you could also simply use string functions instead of a regex:
var s = "<td style=\"cursor:pointer;\" class=\"\" onclick=\"NewWindow('U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..', 'bpvsid=nvnN2JFJqJc.&dcz=1', 'bpvstage_edit', '1200', '800')\" onmouseout=\"HideHover();\"><img src=\"gfx/info.gif\" alt=\"\" tipwidth=\"450\" ajaxtip=\"openajax.php?target=modules/bpv/bpvstage_hover_info.php&rid=&oid=&bpvsid=&bpvname=\" /></td>";
var tmp = s.Substring(s.IndexOf("NewWindow('")).Split('\'');
var value1 = tmp[1]; // U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
var value2 = tmp[3]; // bpvsid=nvnN2JFJqJc.&dcz=1
I would use HtmlAgilityPack to parse HTML, then this non-regex approach works:
string html = // get your html ...
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // doc.Load can also consume a response-stream directly
var result = Enumerable.Empty<string>();
var firstTD = doc.DocumentNode.SelectSingleNode("//td"); // returns null when there is no <td>
if (firstTD != null)
{
if (firstTD.Attributes.Contains("onclick"))
{
string onclick = firstTD.Attributes["onclick"].Value;
int newWindowIndex = onclick.IndexOf("newWindow(", StringComparison.OrdinalIgnoreCase);
if (newWindowIndex >= 0)
{
string functionBody = onclick.Substring(newWindowIndex + "newWindow(".Length);
string[] tokens = functionBody.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
result = tokens.Take(2).Select(s => s.Trim(' ', '\''));
}
}
}

C# String manipulation

I am working on an application that gets text from a text file on a page.
Example link: http://test.com/textfile.txt
This text file contains the following text:
1 Milk Stuff1.rar
2 Milk Stuff2.rar
3 Milk Stuff2-1.rar
4 Union Stuff3.rar
What I am trying to do is as follows, to remove everything from each line, except for "words" that start with 'Stuff' and ends with '.rar'.
The problem is, most of the simple solutions like using .Remove, .Split or .Replace end up failing. This is because, for example, formatting the string using spaces ends up returning this:
1
Milk
Stuff1.rar\n2
Milk
Stuff2.rar\n3
Milk
Stuff2-1.rar\n4
Union
Stuff3.rar\n
I bet it's not as hard as it looks, but I'd appreciate any help you can give me.
Ps: Just to be clear, this is what I want it to return:
Stuff1.rar
Stuff2.rar
Stuff2-1.rar
Stuff3.rar
I am currently working with this code:
client.HeadOnly = true;
string uri = "http://test.com/textfile.txt";
byte[] body = client.DownloadData(uri);
string type = client.ResponseHeaders["content-type"];
client.HeadOnly = false;
if (type.StartsWith(@"text/"))
{
string[] text = client.DownloadString(uri);
foreach (string word in text)
{
if (word.StartsWith("Patch") && word.EndsWith(".rar"))
{
listBox1.Items.Add(word.ToString());
}
}
}
This is obviously not working, but you get the idea.
Thank you in advance!
This should work:
using (var writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadAllLines("input.txt"))
{
var match = Regex.Match(line, "Stuff.*?\\.rar");
if (match.Success)
writer.WriteLine(match.Value);
}
}
I would be tempted to use a regular expression for this sort of thing.
Something like
Stuff[^\s]*\.rar
will pull out just the text you require.
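Because the pattern stops at whitespace, it can be run over the whole downloaded text at once, with no need to split into lines or words first. A minimal sketch, assuming a hypothetical RarNames.Extract helper (the names are illustrative, and \S is used as a shorthand for [^\s]):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RarNames
{
    // "\S*" stops at whitespace, and "\." matches a literal dot before "rar".
    private static readonly Regex FilePattern = new Regex(@"Stuff\S*\.rar");

    // Hypothetical helper: returns every matching file name in the text, in order.
    public static List<string> Extract(string text)
    {
        return FilePattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Passing the result of client.DownloadString(uri) to Extract would give one entry per file name, which can then be added to the list box.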
How about a function like:
public static IEnumerable<string> GetStuff(string fileName)
{
var regex = new Regex(@"Stuff[^\s]*\.rar");
using (var reader = new StreamReader(fileName))
{
string line;
while ((line = reader.ReadLine()) != null)
{
var match = regex.Match(line);
if (match.Success)
{
yield return match.Value;
}
}
}
}
foreach (string line in text)
{
if(line.EndsWith(".rar"))
{
int index = line.LastIndexOf("Stuff");
if(index != -1)
{
listBox1.Items.Add(line.Substring(index));
}
}
}

C#: What's an efficient way of parsing a string with one delimiter through ReadLine() of TextReader?

My objective is to load a list of proxies into a ListView with two columns (Proxy|Port), reading from a .txt file. How would I go about splitting each ReadLine() result into the proxy and port variables with the delimiter ":"?
This is what I've got so far,
public void loadProxies(string FilePath)
{
string Proxy; // example/temporary place holders
int Port; // updated at each readline() loop.
using (TextReader textReader = new StreamReader(FilePath))
{
string Line;
while ((Line = textReader.ReadLine()) != null)
{
// How would I go about directing which string to return whether
// what's to the left of the delimiter : or to the right?
//Proxy = Line.Split(':');
//Port = Line.Split(':');
// listview stuff done here (this part I'm familiar with already)
}
}
}
If not, is there a more efficient way to do this?
string [] parts = line.Split(':');
string proxy = parts[0];
string port = parts[1];
You could split them this way:
string line;
string[] tokens;
while ((line = textReader.ReadLine()) != null)
{
tokens = line.Split(':');
proxy = tokens[0];
port = tokens[1];
// listview stuff done here (this part I'm familiar with already)
}
It's best practice to use camelCase (lower-case initial) names for local variables in C#; PascalCase is conventionally used for class, namespace, and member names.
How about running a Regex on the whole file?
var parts=
Regex.Matches(input, @"(?<left>[^:]*):(?<right>.*)",RegexOptions.Multiline)
.Cast<Match>()
.Where(m=>m.Success)
.Select(m => new
{
left = m.Groups["left"],
right = m.Groups["right"]
});
foreach(var part in parts)
{
//part.left
//part.right
}
Or, if the file is too big, why not make the ReadLine loop LINQ-friendly with an iterator method?
static IEnumerable<string> Lines(string filename)
{
using (var sr = new StreamReader(filename))
{
while (!sr.EndOfStream)
{
yield return sr.ReadLine();
}
}
}
And run it like so:
var parts=Lines(filename)
.Select(
line=>Regex.Match(line, @"(?<left>[^:]*):(?<right>.*)")
)
.Where(m=>m.Success)
.Select(m => new
{
left = m.Groups["left"],
right = m.Groups["right"]
});
foreach(var part in parts)
{
//part.left
//part.right
}
In terms of efficiency I expect you'd be hard-pressed to beat:
int index = line.IndexOf(':');
if (index < 0) throw new InvalidOperationException();
Proxy = line.Substring(0, index);
Port = int.Parse(line.Substring(index + 1));
This avoids the array construction / allocation associated with Split, and only looks as far as the first delimiter. But I should stress that this is unlikely to be a genuine performance bottleneck unless the data volume is huge, so pretty much any approach should be fine. In fact, perhaps the most important thing (I've been reminded by the comment below) is to suspend UI updates while adding:
myListView.BeginUpdate();
try {
// TODO: add all the items here
} finally {
myListView.EndUpdate();
}
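If the file may contain malformed lines, the IndexOf approach above can be wrapped so that it reports failure instead of throwing. This is a sketch under that assumption; the ProxyLine class and its TryParse method are hypothetical names, not part of the answer.

```csharp
using System;
using System.Globalization;

public static class ProxyLine
{
    // Hypothetical helper: splits "host:port" at the first ':' and validates the port.
    // Returns false instead of throwing when the line is malformed.
    public static bool TryParse(string line, out string proxy, out int port)
    {
        proxy = null;
        port = 0;
        if (string.IsNullOrWhiteSpace(line)) return false;

        int index = line.IndexOf(':');
        if (index <= 0) return false; // no delimiter, or nothing before it

        proxy = line.Substring(0, index).Trim();

        // NumberStyles.None accepts digits only (no sign, no embedded whitespace).
        return int.TryParse(line.Substring(index + 1).Trim(),
                            NumberStyles.None, CultureInfo.InvariantCulture, out port);
    }
}
```

Inside the ReadLine loop, lines for which TryParse returns false can simply be skipped before the item is added to the ListView.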
You might want to try something like this.
var items = File.ReadAllText(FilePath)
.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries)
.Select(line => line.Split(':'))
.Select(pieces => new {
Proxy = pieces[0],
Port = int.Parse(pieces[1])
});
If you know that the file contains no blank lines (for example, from stray extra newlines at the end), you can do this.
var items = File.ReadAllLines(FilePath)
.Select(line => line.Split(':'))
.Select(pieces => new {
Proxy = pieces[0],
Port = Convert.ToInt32(pieces[1])
});

Formatting Twitter text (TweetText) with C#

Is there a better way to format text from Twitter to link the hyperlinks, username and hashtags? What I have is working but I know this could be done better. I am interested in alternative techniques. I am setting this up as a HTML Helper for ASP.NET MVC.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;
namespace Acme.Mvc.Extensions
{
public static class MvcExtensions
{
const string ScreenNamePattern = @"@([A-Za-z0-9\-_&;]+)";
const string HashTagPattern = @"#([A-Za-z0-9\-_&;]+)";
const string HyperLinkPattern = @"(http://\S+)\s?";
public static string TweetText(this HtmlHelper helper, string text)
{
return FormatTweetText(text);
}
public static string FormatTweetText(string text)
{
string result = text;
if (result.Contains("http://"))
{
var links = new List<string>();
foreach (Match match in Regex.Matches(result, HyperLinkPattern))
{
var url = match.Groups[1].Value;
if (!links.Contains(url))
{
links.Add(url);
result = result.Replace(url, String.Format("{0}", url));
}
}
}
if (result.Contains("@"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, ScreenNamePattern))
{
var screenName = match.Groups[1].Value;
if (!names.Contains(screenName))
{
names.Add(screenName);
result = result.Replace("@" + screenName,
String.Format("@{0}", screenName));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, HashTagPattern))
{
var hashTag = match.Groups[1].Value;
if (!names.Contains(hashTag))
{
names.Add(hashTag);
result = result.Replace("#" + hashTag,
String.Format("#{1}",
HttpUtility.UrlEncode("#" + hashTag), hashTag));
}
}
}
return result;
}
}
}
That is remarkably similar to the code I wrote that displays my Twitter status on my blog. The only further things I do are
1) looking up @name and replacing it with Real Name;
2) multiple @name's in a row get commas, if they don't have them;
3) Tweets that start with @name(s) are formatted "To @name:".
I don't see any reason this can't be an effective way to parse a tweet - they are a very consistent format (good for regex) and in most situations the speed (milliseconds) is more than acceptable.
Edit:
Here is the code for my Tweet parser. It's a bit too long to put in a Stack Overflow answer. It takes a tweet like:
@user1 @user2 check out this cool link I got from @user3: http://url.com/page.htm#anchor #coollinks
And turns it into:
<span class="salutation">
To Real Name,
Real Name:
</span> check out this cool link I got from
<span class="salutation">
Real Name
</span>:
http://site.com/...
#coollinks
It also wraps all that markup in a little JavaScript:
document.getElementById('twitter').innerHTML = '{markup}';
This is so the tweet fetcher can run asynchronously as a JS and if Twitter is down or slow it won't affect my site's page load time.
I created a helper method to shorten text to 140 chars with the URL included. You can set the share length to 0 to exclude the URL from the tweet.
public static string FormatTwitterText(this string text, string shareurl)
{
if (string.IsNullOrEmpty(text))
return string.Empty;
string finaltext = string.Empty;
string sharepath = string.Format("http://url.com/{0}", shareurl);
//list of all words, trimmed and new space removed
List<string> textlist = text.Split(' ').Select(txt => Regex.Replace(txt, @"\n", "").Trim())
.Where(formatedtxt => !string.IsNullOrEmpty(formatedtxt))
.ToList();
int extraChars = 3; //to account for the two dots ".."
int finalLength = 140 - sharepath.Length - extraChars;
int runningLengthCount = 0;
int collectionCount = textlist.Count;
int count = 0;
foreach (string eachwordformated in textlist
.Select(eachword => string.Format("{0} ", eachword)))
{
count++;
int textlength = eachwordformated.Length;
runningLengthCount += textlength;
int nextcount = count + 1;
var nextTextlength = nextcount < collectionCount ?
textlist[nextcount].Length :
0;
if (runningLengthCount + nextTextlength < finalLength)
finaltext += eachwordformated;
}
return runningLengthCount > finalLength ? finaltext.Trim() + ".." : finaltext.Trim();
}
There is a good resource for parsing Twitter messages at this link; it worked for me:
How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0
http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/
It contains support for:
Urls
#hashtags
@usernames
BTW: Regex in the ParseURL() method needs reviewing, it parses stock symbols (BARC.L) into links.
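As one alternative to the repeated Replace calls in the question, all three token types can be handled in a single Regex.Replace pass with a match evaluator, so that a "#anchor" inside a URL is never mistaken for a hashtag. This is a rough sketch: the TweetFormatter class, the Linkify method, the pattern, and both link targets are assumptions for illustration, and HTML-encoding of the input is left out.

```csharp
using System.Text.RegularExpressions;

public static class TweetFormatter
{
    // Assumed link targets, for illustration only.
    private const string ProfileUrl = "https://twitter.com/";
    private const string SearchUrl = "https://twitter.com/search?q=%23";

    // The URL alternative comes first, so a URL is consumed whole
    // (including any "#fragment") before @names and #tags are considered.
    private static readonly Regex Token = new Regex(
        @"(?<url>https?://\S+)|(?<!\w)@(?<user>\w+)|(?<!\w)#(?<tag>\w+)");

    public static string Linkify(string text)
    {
        return Token.Replace(text, m =>
        {
            if (m.Groups["url"].Success)
                return string.Format("<a href=\"{0}\">{0}</a>", m.Value);
            if (m.Groups["user"].Success)
                return string.Format("<a href=\"{0}{1}\">@{1}</a>",
                                     ProfileUrl, m.Groups["user"].Value);
            return string.Format("<a href=\"{0}{1}\">#{1}</a>",
                                 SearchUrl, m.Groups["tag"].Value);
        });
    }
}
```

Because \S+ is greedy, punctuation directly after a URL would become part of the link; a stricter URL pattern would be needed to avoid that.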
