Extract tokens from string [closed] - c#

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have an HTML file with an unknown number of tokens. The tokens will be mapped to some data later by the user. I want to determine how many tokens the HTML contains.
Tokens can look like: ¤SomeID¤ or ¤Name¤ or even ¤SomeLongerWord¤.
Can somebody give me complete code, using a regex, that collects the tokens from a string into a list?
Example:
string ExtractFromThis = "Hello ¤Name¤, do you speak ¤SomeLanguage¤?";
List<string> IldLikeToHave = Magic(ExtractFromThis);
// IldLikeToHave should contain {"¤Name¤", "¤SomeLanguage¤"}
Thank you!

You could use a simple regular expression such as ¤.*?¤ (notice the non-greedy star), which matches anything enclosed in ¤. Regex.Matches() returns all the matches.
If you're interested in the text inside the delimiters, you could put the quantifier inside a capture group, like this: ¤(.*?)¤, and read the capture groups of every match through the Match.Groups property.
I don't do C#, but here's a sample of what it should probably look like:
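// requires: using System.Collections.Generic; using System.Text.RegularExpressions;
string pattern = @"¤(.*?)¤";
string input = "Hello ¤Name¤, do you speak ¤SomeLanguage¤?";
MatchCollection matches = Regex.Matches(input, pattern);
List<string> l = new List<string>();
foreach (Match match in matches) {
    // Groups[1] is the text between the delimiters; use match.Value to keep the ¤ marks.
    l.Add(match.Groups[1].Value);
}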
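Since the question asks for the tokens with their delimiters and for a count, here is a minimal sketch of what the asker's Magic method might look like. The class and method names (TokenExtractor, ExtractTokens) are made up for illustration, and the pattern assumes a token never contains ¤ itself and is never empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // [^¤]+ requires at least one non-delimiter character between the two ¤ marks
    // (an assumption; the question does not say whether "¤¤" can occur).
    private static readonly Regex TokenPattern = new Regex("¤[^¤]+¤");

    // Returns every token, delimiters included, in order of appearance.
    public static List<string> ExtractTokens(string input)
    {
        return TokenPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

With the question's example input, ExtractTokens returns ¤Name¤ and ¤SomeLanguage¤. The Count of the list gives the number of tokens, and Distinct().Count() gives the number of different tokens if the same one can appear several times.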

Related

How do I get multiple substrings in a large string in C#? [closed]

Closed 9 years ago.
I've got a large string. It has multiple substrings in it like this:
pin=1234&
pin=827373&
pin=110&
What is the most efficient way to pull out every number between the pin= and the & symbol and store it in an array, starting from the beginning of the string to the end?

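// requires: using System.Linq; using System.Text.RegularExpressions;
// [0-9]+ (not *) so an empty "pin=" never reaches Convert.ToInt32 and throws.
var pins = Regex.Matches(html, @"pin=([0-9]+)");
var pinArray = (from Match pin in pins select Convert.ToInt32(pin.Groups[1].Value)).ToArray();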
Enjoy :)

I'm not sure if this is the *most* efficient way on earth, but you can split by pin= or &. Using the simple String methods would be the way to go: Split, for loops and IndexOf.
But it would require increasingly complex code (ifs, successive splits) depending on the structure and complexity of your large string.
I'd still use a regex for reliability and simplicity:
Filter the string for what really matters: pin=\d+
Then extract the value: Regex(\d+), or better, value.Replace("pin=", "").Replace("&", "")
I can't tell more without knowing the structure of your large string.
I've now seen in the comments that your large string has HTML content. I would take the regex approach. I believe it wouldn't be the performance bottleneck, even for large strings.
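For comparison, the plain String-method route mentioned above could look roughly like this. It is only a sketch: PinScanner and ExtractPins are hypothetical names, and it assumes every value of interest sits between a literal pin= and the next &.

```csharp
using System;
using System.Collections.Generic;

public static class PinScanner
{
    // Walks the text with IndexOf instead of a regex and collects every
    // integer found between "pin=" and the following '&'.
    public static List<int> ExtractPins(string text)
    {
        const string prefix = "pin=";
        var pins = new List<int>();
        int pos = 0;
        while ((pos = text.IndexOf(prefix, pos, StringComparison.Ordinal)) >= 0)
        {
            int start = pos + prefix.Length;
            int end = text.IndexOf('&', start);
            if (end < 0) break; // no closing '&' left in the text

            // int.TryParse skips values that are not integers. Unlike \d+, it also
            // accepts a sign and surrounding whitespace.
            if (int.TryParse(text.Substring(start, end - start), out int pin))
                pins.Add(pin);

            pos = start; // continue scanning right after this "pin="
        }
        return pins;
    }
}
```

Even this small case already needs bookkeeping for missing terminators and non-numeric values, which is the growing complexity the answer warns about.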

You can extract multiple pattern matches with RegEx:
string pattern = @"pin=(\d+)&";
Regex rgx = new Regex(pattern);
foreach (Match match in rgx.Matches(sentence))
    Console.WriteLine(match.Groups[1].Value);

how to parse a search query string like SO [closed]

Closed 9 years ago.
I want to build a search function with a keyword format on top of Entity Framework.
void funcSearch(string keywork)
{
    if (keywork == "[tag]")
    {
        //regex for is tag
        //do search tag
    }
    if (keywork == "user:1234")
    {
        //regex for userid is 1234
        //do search user with 1234
    }
    ...
}
Can I use a regex, or any other method, to parse a query string format like SO's? I need a function that can analyze all of the following cases and their corresponding keywords:
tags        [tag]
exact       "words here"
author      user:1234
            user:me (yours)
score       score:3 (3+)
            score:0 (none)
answers     answers:3 (3+)
            answers:0 (none)
            isaccepted:yes
            hasaccepted:no
            inquestion:1234
views       views:250
sections    title:apples
            body:"apples oranges"
url         url:"*.example.com"
favorites   infavorites:mine
            infavorites:1234
status      closed:yes
            duplicate:no
            migrated:no
            wiki:no
types       is:question
            is:answer
Thank you for any advice.

Yes, you can. You'd have to create a list of regular expressions to check and loop through them until you find a match. (Make sure to prioritize them correctly.)
For example, to find out if a search query is querying tags, you can use the following regex:
string query = "[tag]";
bool isTag = Regex.IsMatch(query, @"^\[.+?\]$");
Here's another regex matching a user ID:
string query = "user:1234";
var match = Regex.Match(query, @"^user:(\d+)$", RegexOptions.IgnoreCase);
Note that you should trim your query first.
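The "list of regular expressions in priority order" idea can be sketched as follows. The rule names, the SearchQueryParser class and the fallback kind "text" are all assumptions made for illustration, and only four of the operators from the question are covered.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SearchQueryParser
{
    // Order matters: the first pattern that matches wins.
    private static readonly List<KeyValuePair<string, Regex>> Rules =
        new List<KeyValuePair<string, Regex>>
        {
            new KeyValuePair<string, Regex>("tag",   new Regex(@"^\[(.+?)\]$")),
            new KeyValuePair<string, Regex>("exact", new Regex("^\"(.+)\"$")),
            new KeyValuePair<string, Regex>("user",  new Regex(@"^user:(\d+|me)$", RegexOptions.IgnoreCase)),
            new KeyValuePair<string, Regex>("score", new Regex(@"^score:(\d+)$", RegexOptions.IgnoreCase)),
        };

    // Returns (rule name, captured value), or ("text", term) when no rule matches.
    public static KeyValuePair<string, string> Classify(string term)
    {
        term = term.Trim(); // trim first so the ^ and $ anchors line up
        foreach (var rule in Rules)
        {
            Match m = rule.Value.Match(term);
            if (m.Success)
                return new KeyValuePair<string, string>(rule.Key, m.Groups[1].Value);
        }
        return new KeyValuePair<string, string>("text", term);
    }
}
```

Classify("[tag]") yields the pair ("tag", "tag") and Classify("user:1234") yields ("user", "1234"). The caller can then switch on the key to build the matching Entity Framework query. Splitting a full search box input into individual terms is a separate step that this sketch does not cover.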

Replace substring which comes in middle of string only [closed]

Closed 9 years ago.
I want a regex pattern in C# that finds a substring only when it occurs in the middle of a word. Let's say:
Input: "toprohitpop rohittoppop toppoprohit"
Find substring: "rohit"
Replace with: "$$$$"
Output: "top$$$$pop rohittoppop toppoprohit"
If the substring "rohit" occurs at the left or right end of a word, it should not be replaced. "rohit" should only be replaced when it occurs in the middle of a word.
Thanks in advance.

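Use non-word-boundary anchors:
\Brohit\B
\B matches only at a position that is not a word boundary, so the pattern matches rohit only when there is a word character directly before it and directly after it, i.e. in the middle of a word.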

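var input = "toprohitpop rohittoppop toppoprohit";
var regex = new Regex(@"\Brohit\B");
// "$$" in a replacement string is an escaped literal "$", so this inserts "$$$$".
var output = regex.Replace(input, "$$$$$$$$");
See "Anchors" in Regular Expression Language.
Also, be careful with the '$' in the substitution string: it introduces substitutions such as $1, so a literal '$' has to be written as "$$".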
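One way to sidestep the '$' escaping is the Regex.Replace overload that takes a MatchEvaluator, because the string the delegate returns is inserted literally. A small sketch, with MiddleReplacer and ReplaceInsideWords as made-up names, assuming the search word starts and ends with a word character:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MiddleReplacer
{
    // Replaces 'word' only where it has a word character on both sides.
    public static string ReplaceInsideWords(string input, string word, string replacement)
    {
        // Regex.Escape keeps any regex metacharacters in 'word' literal.
        string pattern = @"\B" + Regex.Escape(word) + @"\B";

        // The evaluator's return value is not scanned for $-substitutions.
        return Regex.Replace(input, pattern, m => replacement);
    }
}
```

ReplaceInsideWords("toprohitpop rohittoppop toppoprohit", "rohit", "$$$$") returns "top$$$$pop rohittoppop toppoprohit", with the replacement written exactly as it should appear.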

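Use the following regex: .+rohit.+
Basically, it enforces at least one character before rohit and one after. Note that, as written, the match also includes the surrounding text, so for a replacement you would need to capture that text and put it back.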

RegEx to Match Phrases and create Capture Groups [closed]

Closed 9 years ago.
I am trying to create a RegEx and C# pattern that will match a phrase like:
Photos of Washington DC taken by Jane Doe
Where the capture groups result in "Photos" "Washington DC" and "Jane Doe". Other possibilities would be:
Videos of Austin taken by Ruby : Videos, Austin, Ruby
Photos of Red Bud Dogs taken by Willa Shepherd : Photos, Red Bud Dogs, Willa Shepherd
Is this even possible with RegEx?
It appears that I got flagged...did I mention that I don't know RegEx?
I tried: (Photo of).*?((?:[a-z][a-z]+))(Taken by)((?:[a-z][a-z]+)) but that failed.

.* matches any string (except newlines). By adding a ? to it (.*?), you tell the regex engine to match as few characters as possible, which is probably the right approach here, so the very first instances of " of " and " taken by " will be used as separators of your intended sub-matches:
Match matchResults = Regex.Match(subjectString, "^(.*?) of (.*?) taken by (.*)");
// matchResults.Groups[1].Value contains "Photos" etc.
If you don't expect more than one " of " and one " taken by " in your input, you can change all .*? into .*.
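The same pattern reads a little more clearly with named groups. This is a sketch only: the group names (kind, subject, author) and the PhraseParser class are invented for illustration, and .+ is used instead of .* so that empty parts are rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseParser
{
    private static readonly Regex Phrase = new Regex(
        @"^(?<kind>.+?) of (?<subject>.+?) taken by (?<author>.+)$");

    // Returns { kind, subject, author }, or null when the phrase
    // does not have the "X of Y taken by Z" shape.
    public static string[] Parse(string phrase)
    {
        Match m = Phrase.Match(phrase);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups["kind"].Value,
            m.Groups["subject"].Value,
            m.Groups["author"].Value
        };
    }
}
```

Parse("Photos of Washington DC taken by Jane Doe") returns "Photos", "Washington DC" and "Jane Doe". Checking Success (here, the null return) matters because Groups on a failed match only yields empty strings.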

How to write this regex to export links [closed]

Closed 9 years ago.
Here is the example.
Starts with : imgurl=
Ends with : &amp
Example extraction
asfasfasfasimgurl=http://www.mysite.com&ampasgasgas
Result: http://www.mysite.com
So how can I write regex to extract all instances like this?

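You can use a lookbehind and a lookahead:
(?<=imgurl=).*?(?=&amp)
See also: Lookahead and lookbehind, Greedy Quantifiers.
You can get a list of URLs using:
// requires: using System.Linq; using System.Text.RegularExpressions;
string regex = @"(?<=imgurl=).*?(?=&amp)";
List<string> urls = Regex.Matches(input, regex)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();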

A simple regex could be:
(?:imgurl=)(.*)(?:&amp)
The (?:[stuff here]) is a non-capturing group. It requires the pattern to match, but does not capture/extract it. The (.*) captures everything in between the two non-capturing groups. Because .* is greedy, use (.*?) instead if the input can contain more than one imgurl= ... &amp pair.
To learn more about capture groups, see "What is a non-capturing group? What does a question mark followed by a colon (?:) mean?"
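The difference between the greedy and the lazy quantifier shows up as soon as the input holds more than one link. A minimal sketch, with ImgUrlExtractor as a hypothetical helper and the two example.com-style URLs made up for the demonstration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ImgUrlExtractor
{
    // lazy == true  uses (.*?) and stops at the first "&amp" after each "imgurl=".
    // lazy == false uses (.*)  and runs on to the last "&amp" in the input.
    public static List<string> Extract(string input, bool lazy)
    {
        string pattern = lazy ? "imgurl=(.*?)&amp" : "imgurl=(.*)&amp";
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

For the input "ximgurl=http://a.example&ampyimgurl=http://b.example&ampz", the lazy version returns the two URLs separately, while the greedy version returns a single item, "http://a.example&ampyimgurl=http://b.example".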
