Find hashtags in string - c#

I am working on a Xamarin.Forms PCL project in C# and would like to detect all the hashtags in a string.
I tried splitting on spaces and checking whether each word begins with a #, but if the post contains two consecutive spaces, as in "Hello #World  Test", the double space is lost.
string body = "Example string with a #hashtag in it";
string newbody = "";
foreach (var word in body.Split(' '))
{
if (word.StartsWith("#"))
newbody += "[" + word + "]";
newbody += word;
}
Goal output:
Example string with a [#hashtag] in it
I also only want it to have A-Z a-z 0-9 and _ stopping at any other character
Test #H3ll0_W0rld$%Test => Test [#H3ll0_W0rld]$%Test
Other Stack Overflow questions try to detect the hashtag and extract it. I would like to work with it and put it back in the string without losing anything that methods such as splitting on certain characters would lose.

You can use Regex with #\w+ and $&
Explanation
# matches the character # literally (case sensitive)
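\w matches any word character. In .NET this covers [a-zA-Z0-9_] and also Unicode letters and digits; use [A-Za-z0-9_] or RegexOptions.ECMAScript to restrict it to ASCII
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)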
$& Includes a copy of the entire match in the replacement string.
Example
var input = "asdads sdfdsf #burgers, #rabbits dsfsdfds #sdf #dfgdfg";
var regex = new Regex(@"#\w+");
var matches = regex.Matches(input);
foreach (var match in matches)
{
Console.WriteLine(match);
}
or
var result = regex.Replace(input, "[$&]" );
Console.WriteLine(result);
Output
#burgers
#rabbits
#sdf
#dfgdfg
asdads sdfdsf [#burgers], [#rabbits] dsfsdfds [#sdf] [#dfgdfg]
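The question asks for hashtags limited to A-Z, a-z, 0-9 and _. A minimal sketch of the same replace, assuming an explicit ASCII character class in place of \w (the class name HashtagSketch and the method Bracket are hypothetical, chosen only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashtagSketch
{
    // Explicit ASCII class: \w in .NET would also accept Unicode letters and digits.
    private static readonly Regex Hashtag = new Regex("#[A-Za-z0-9_]+");

    // Wraps every hashtag in square brackets and leaves all other text,
    // including runs of spaces, exactly as it was.
    public static string Bracket(string body)
    {
        // "$&" inserts a copy of the whole match into the replacement.
        return Hashtag.Replace(body, "[$&]");
    }

    public static void Main()
    {
        Console.WriteLine(Bracket("Test #H3ll0_W0rld$%Test")); // Test [#H3ll0_W0rld]$%Test
        Console.WriteLine(Bracket("Hello #World  Test"));      // Hello [#World]  Test
    }
}
```

Because the replacement works on the original string rather than on split words, the double space in the second call survives.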

Use a regular expression: \#\w*
string pattern = @"\#\w*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);

Related

Remove Adjacent Space near a Special Character using regex

Using a regex, I want to remove the spaces adjacent to a replacement character.
replacementCharacter = '-'
this._adjacentSpace = new Regex($@"\s*([\{replacementCharacter}])\s*");
MatchCollection replaceCharacterMatch = this._adjacentSpace.Matches(extractedText);
foreach (Match replaceCharacter in replaceCharacterMatch)
{
if (replaceCharacter.Success)
{
cleanedText = Extactedtext.Replace(replaceCharacter.Value, replaceCharacter.Value.Trim());
}
}
Extractedtext = - whi, - ch
cleanedtext = -whi, -ch
expected result : cleanedtext = -whi,-ch

You can use
var Extactedtext = "- whi, - ch";
var replacementCharacter = "-";
var _adjacentSpace = new Regex($@"\s*({Regex.Escape(replacementCharacter)})\s*");
var cleanedText = _adjacentSpace.Replace(Extactedtext, "$1");
Console.WriteLine(cleanedText); // => -whi,-ch
NOTE:
replacementCharacter is of type string in the code above
$@"\s*({Regex.Escape(replacementCharacter)})\s*" creates a regex like \s*(-)\s*. Regex.Escape() escapes any regex-special characters (such as + or () so they can be used literally in a pattern. The whole regex matches the replacementCharacter, captured into Group 1 by the capturing parentheses, together with zero or more whitespace characters on either side
There is no need to use Regex.Matches; just replace all matches if there are any, which is how Regex.Replace works.
_adjacentSpace is the Regex object; to replace, just call the .Replace() method on that instance
The replacement is a backreference to the Group 1 value, the - char here.
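To see why Regex.Escape matters, here is a small sketch that runs the same replace with "-" and with "+", which is a quantifier when left unescaped. The class name AdjacentSpaceSketch and the method TrimAround are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AdjacentSpaceSketch
{
    // Removes whitespace on both sides of every occurrence of separator.
    public static string TrimAround(string text, string separator)
    {
        // Regex.Escape makes the separator literal, so "+" or "(" are safe to pass in.
        string pattern = $@"\s*({Regex.Escape(separator)})\s*";
        // "$1" puts back only the captured separator, dropping the whitespace.
        return Regex.Replace(text, pattern, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(TrimAround("- whi, - ch", "-")); // -whi,-ch
        Console.WriteLine(TrimAround("a + b + c", "+"));   // a+b+c
    }
}
```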

How do I replace all instances of any special characters between each occurrence of a set of delimiters in a string?

I'm attempting to replace all instances of any special characters between each occurrence of a set of delimiters in a string. I believe the solution will include some combination of a regular expression match to retrieve the text between each set of delimiters and a regular expression replace to replace each offending character within the match with a space. Here’s what I have so far:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = "(~N3\\*)(.*?)(~N4\\*)";
string replacePattern = "[^0-9a-zA-Z ]?";
var matches = Regex.Matches(input, matchPattern);
foreach (Match match in matches)
{
match.Value = "~N3*" + Regex.Replace(match.Value, replacePattern, " ") + "~N4*";
}
MessageBox.Show(input);
I would expect the message box to show the following:
"***XX*123456789~N3*123 E Fake St Apt 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W False Ave *Apt 6B~N4*Beverly Hills*CA*90210~DMG*"
Obviously this isn’t working because I can’t assign to the matched value inside the loop, but I hope you can follow my thought process. It is important that any characters which are not between the delimiters remain unchanged. Any direction or advice would be helpful. Thank you so much!

Use a Regex.Replace with a match evaluator where you may call the second Regex.Replace:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = @"(~N3\*)(.*?)(~N4\*)";
string replacePattern = "[^0-9a-zA-Z ]";
string res = Regex.Replace(input, matchPattern, m =>
string.Format("{0}{1}{2}",
m.Groups[1].Value,
Regex.Replace(m.Groups[2].Value, replacePattern, " "), // Here, you modify just inside the 1st regex matches
m.Groups[3].Value));
Console.Write(res); // Just to print the demo result
// => ***XX*123456789~N3*123 E  Fake St  Apt  456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W  False Ave  Apt   6B~N4*Beverly Hills*CA*90210~DMG*
Actually, since ~N3* and ~N4* are literal strings, you may use a single capturing group in the pattern and then add those delimiters as hard-coded in the match evaluator, but it is up to you to decide what suits you best.
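The single-group variant mentioned above could look roughly like this. It is only a sketch: the class name DelimiterSketch, the method CleanBetween and the shortened sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterSketch
{
    // Replaces every character other than letters, digits and spaces with a
    // space, but only inside the text between "~N3*" and "~N4*".
    public static string CleanBetween(string input)
    {
        return Regex.Replace(input, @"~N3\*(.*?)~N4\*", m =>
            // The delimiters are literal, so they are re-added as hard-coded strings.
            "~N3*" + Regex.Replace(m.Groups[1].Value, "[^0-9a-zA-Z ]", " ") + "~N4*");
    }

    public static void Main()
    {
        // Text outside the delimiters ("A.B" and "C.D*") stays unchanged.
        Console.WriteLine(CleanBetween("A.B~N3*Apt. #6B~N4*C.D*"));
        // prints: A.B~N3*Apt   6B~N4*C.D*
    }
}
```

Each replaced character becomes its own space, so a run such as ". #" turns into three spaces. If single spaces are wanted, a further pass over the cleaned group would be needed.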

search string for everything before a set of characters in C#

I'm looking for a way to search a string for everything before a set of characters in C#. For example, if this is my string value:
This is is a test.... 12345
I want to build a new string with all of the characters before "12345".
So my new string would equal "This is is a test.... "
Is there a way to do this?
I've found Regex examples where you can focus on one character but not a sequence of characters.

You don't need to use a Regex:
public string GetBitBefore(string text, string end)
{
var index = text.IndexOf(end);
if (index == -1) return text;
return text.Substring(0, index);
}

You can use a lazy quantifier to match anything, followed by a lookahead:
var match = Regex.Match("This is is a test.... 12345", @".*?(?=\d{5})");
where:
.*? lazily matches everything (up to the lookahead)
(?=…) is a positive lookahead: the pattern must be matched, but is not included in the result
\d{5} matches exactly five digits. I'm assuming this is your lookahead; you can replace it

You can do so with help of regex lookahead.
.*(?=12345)
Example:
var data = "This is is a test.... 12345";
var rxStr = ".*(?=12345)";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var match = rx.Match(data);
if (match.Success) {
Console.WriteLine (match.Value);
}
The code snippet above prints everything up to 12345:
This is is a test....
For more detail, see the documentation on regex positive lookahead.

This should get you started:
var reg = new Regex("^(.+)12345$");
var match = reg.Match("This is is a test.... 12345");
var group = match.Groups[1]; // This is is a test....
Of course you'd want to do some additional validation, but this is the basic idea.

^ means start of string
$ means end of string
The asterisk tells the engine to attempt to match the preceding token zero or more times. The plus tells the engine to attempt to match the preceding token once or more
{min,max} indicate the minimum/maximum number of matches.
\d matches a single character that is a digit, \w matches a "word character" (alphanumeric characters plus underscore), and \s matches a whitespace character (includes tabs and line breaks).
[^a] is a negated character class: it matches any single character except a
The dot matches a single character, except line break characters
In your case there are many ways to accomplish the task.
E.g. excluding digits: ^[^\d]*
If you know the set of characters and they are not only digits, don't use a regex; use IndexOf() instead. If you know the separator between the first and second part, such as "...", you can use Split()
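As a rough comparison of two of the options above, the sketch below puts the IndexOf approach next to the ^[^\d]* regex. The class name PrefixSketch and both method names are hypothetical, and the ordinal comparison is an assumption about what kind of matching is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixSketch
{
    // Known marker: plain string search, no regex needed.
    public static string BeforeMarker(string text, string marker)
    {
        int index = text.IndexOf(marker, StringComparison.Ordinal);
        // Return the whole text when the marker is absent.
        return index < 0 ? text : text.Substring(0, index);
    }

    // Unknown digits: take everything from the start up to the first digit.
    public static string BeforeFirstDigit(string text)
    {
        return Regex.Match(text, @"^[^\d]*").Value;
    }

    public static void Main()
    {
        string input = "This is is a test.... 12345";
        // Brackets make the trailing space visible.
        Console.WriteLine("[" + BeforeMarker(input, "12345") + "]"); // [This is is a test.... ]
        Console.WriteLine("[" + BeforeFirstDigit(input) + "]");      // [This is is a test.... ]
    }
}
```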

Take a look at this snippet:
class Program
{
static void Main(string[] args)
{
string input = "This is is a test.... 12345";
// Here we call Regex.Matches with two named groups.
MatchCollection matches = Regex.Matches(input, @"(?<MySentence>(\w+\s*)*)(?<MyNumberPart>\d*)");
foreach (Match item in matches)
{
Console.WriteLine(item.Groups["MySentence"]);
Console.WriteLine("******");
Console.WriteLine(item.Groups["MyNumberPart"]);
}
Console.ReadKey();
}
}

You could also just split; this is not as efficient as the IndexOf solution:
string value = "oiasjdoiasj12345";
string end = "12345";
string result = value.Split(new string[] { end }, StringSplitOptions.None)[0]; // Take the first part of the result; not the quickest, but fairly simple

Regex match between two strings that might contain another string

I'm doing a regex that is trying to match the following string:
.\SQL2012
From the two strings (they are contained within another larger string but that is irrelevant in this case):
/SERVER "\".\SQL2012\""
/SERVER .\SQL2012
So the "\" before and the \"" after the match may both be omitted in some cases. The regex I've come up with (from a previous question here on StackOverflow) is the following:
(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )
Which works fine if I'm trying to match TEST_SERVER instead of .\SQL2012 (because \w does not match special characters). Is there a way to match anything until \"" or a whitespace occurs?
I'm doing this in C#, here's my code:
string input = "/SERVER \"\\\".\\SQL2012\\\"\"";
string pattern = @"(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )";
Regex regEx = new Regex(pattern);
MatchCollection matches = regEx.Matches(input);
foreach (Match match in matches)
{
Console.WriteLine(match.ToString());
}
Console.ReadKey();

Add a word boundary \b just before the lookahead:
string input = "/SERVER .\\SQL2012";
Regex rgx = new Regex(@"(?<=\/SERVER\s+""\\"").*?\b(?=\\""""|$| )|(?<=\/SERVER\s+).*?\b(?= |$)");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
Console.WriteLine(input);
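An alternative sketch for the same inputs, not taken from the answer above: it assumes the optional prefix is exactly "\" and the optional suffix is exactly \"", and it captures the server name in a named group instead of using a lookbehind. The class name ServerNameSketch and the method Extract are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ServerNameSketch
{
    // /SERVER, whitespace, an optional "\" prefix, then the shortest run of
    // characters that are neither whitespace nor quotes and that is followed
    // by \"", by whitespace, or by the end of the input.
    private static readonly Regex Pattern =
        new Regex(@"/SERVER\s+(?:""\\"")?(?<name>[^\s""]+?)(?=\\""""|\s|$)");

    // Returns the server name, or null when the input has no /SERVER switch.
    public static string Extract(string commandLine)
    {
        Match m = Pattern.Match(commandLine);
        return m.Success ? m.Groups["name"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("/SERVER \"\\\".\\SQL2012\\\"\"")); // .\SQL2012
        Console.WriteLine(Extract("/SERVER .\\SQL2012"));              // .\SQL2012
        Console.WriteLine(Extract("/SERVER TEST_SERVER /OTHER"));      // TEST_SERVER
    }
}
```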

Regex help with sample pattern. C#

I decided to use Regex, now I have two problems :)
Given the input string "hello world [2] [200] [%8] [%1c] [%d]",
What would be an appropriate pattern to match the instances of "[%8]", "[%1c]" and "[%d]"? (So a percent sign, followed by an alphanumeric string of any length, all enclosed in square brackets.)
For "[2]" and "[200]", I already use
Regex.Matches(input, "(\\[)[0-9]*?\\]");
Which works fine.
Any help would be appreciated.

MatchCollection matches = null;
try {
Regex regexObj = new Regex(@"\[[%\w]+\]");
matches = regexObj.Matches(input);
if (matches.Count > 0) {
// Access individual matches using matches[i]
} else {
// Match attempt failed
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

The regex needed to match the pattern "[%anyLengthAlphaNumeric]" in a string is "\[(%\w+)\]".
The leading "[" is escaped with "\", then (...) creates a group, defined here as %\w+. \w is shorthand for any word character (letters, digits and underscore, but no spaces). The + matches one or more instances of the preceding character, class or group. Finally the trailing "]" is escaped with "\" and matches the closing bracket.
Here is a basic code example:
string input = @"hello world [2] [200] [%8] [%1c] [%d]";
Regex example = new Regex(@"\[(%\w+)\]");
MatchCollection matches = example.Matches(input);
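To show what the capturing group gives you, here is a short sketch that reads Group 1 from each match. The class name PercentTokenSketch and the method PercentTokens are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PercentTokenSketch
{
    // Returns the contents of every [%...] token, without the brackets.
    public static List<string> PercentTokens(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\[(%\w+)\]"))
        {
            // Group 0 is the whole match, e.g. "[%8]"; Group 1 is "%8".
            tokens.Add(m.Groups[1].Value);
        }
        return tokens;
    }

    public static void Main()
    {
        var tokens = PercentTokens("hello world [2] [200] [%8] [%1c] [%d]");
        Console.WriteLine(string.Join(", ", tokens)); // %8, %1c, %d
    }
}
```

The purely numeric tokens [2] and [200] are skipped because the pattern requires a % right after the opening bracket.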

Try this:
Regex.Matches(input, "\\[%[0-9a-f]+\\]");
Or as a combined regular expression:
Regex.Matches(input, "\\[(\\d+|%[0-9a-f]+)\\]");

How about @"\[%[0-9a-f]*?\]"?
string input = "hello world [2] [200] [%8] [%1c] [%d]";
MatchCollection matches = Regex.Matches(input, @"\[%[0-9a-f]*?\]");
matches.Count // = 3
