c# extracting a certain value within a string - c#

I'm trying to remove a certain bit of text within a string.
Say the string I have contains HTML elements, like paragraph tags. I created tokens that are identified by "{" at the beginning and "}" at the end.
So essentially the string I have would look like this:
text = "<p>{token}</p><p> text goes here {token3}</p>"
Is there a way to extract all the tokens from the string, including the surrounding "{" and "}", using C#?
Each token could be different from the next, which is why I must use "{" and "}" to identify them.
At the moment I've got this code:
var newWord = text.Contains("{") && text.Contains("}");

Something like
var r = new Regex("({.*?})");
foreach (Match match in r.Matches(myString)) ...
The ? makes the quantifier non-greedy. If you omit it, you'll simply get everything between the first { and the last }.
Alternatively, you may also use this:
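var result = new List<string>();
var index = text.IndexOf("{");
while (index != -1)
{
    var end = text.IndexOf("}", index);
    if (end == -1) break; // unmatched "{": stop instead of letting Substring throw
    result.Add(text.Substring(index, end - index + 1));
    index = text.IndexOf("{", end + 1); // continue searching after the closing brace
}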

I would just use a regex for this:
Regex reg = new Regex("{.*?}");
var results = reg.Matches(text);
The regex searches for any characters between { and }.
The .*? means match any character, but in a non-greedy way, so it finds the shortest possible string between the braces.
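A minimal sketch of the regex approach described above, wrapped in a helper so the matches end up in a list. The class and method names (TokenExtractor, ExtractTokens) are made up for illustration, and the braces are escaped here only for readability.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // \{ and \} match literal braces; .*? is the non-greedy "shortest match" part.
    private static readonly Regex TokenPattern = new Regex(@"\{.*?\}");

    // Returns every token in the input, braces included, in order of appearance.
    public static List<string> ExtractTokens(string text)
    {
        var tokens = new List<string>();
        foreach (Match match in TokenPattern.Matches(text))
        {
            tokens.Add(match.Value);
        }
        return tokens;
    }
}
```

Calling TokenExtractor.ExtractTokens on the sample string from the question should give a list containing {token} and {token3}.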

Related

Replace a part of string containing Password

Slightly similar to this question, I want to replace argv contents:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
to this:
"-help=none\n-URL=(default)\n-password=********\n-uname=Khanna\n-p=100"
I have tried very basic string search operations (using IndexOf, Substring, etc.). I am looking for a more elegant solution to replace this part of the string:
-password=AnyPassword
to:
-password=*******
And keep the rest of the string intact. I am wondering whether String.Replace or Regex.Replace may help.
What I've tried (without much error checking):
var pwd_index = argv.IndexOf("-password="); // single dash, as in the sample argv
string converted;
if (pwd_index >= 0)
{
    var leftPart = argv.Substring(0, pwd_index);
    var pwdStr = argv.Substring(pwd_index);
    var rightPart = pwdStr.Substring(pwdStr.IndexOf("\n") + 1);
    converted = leftPart + "-password=********\n" + rightPart;
}
else
    converted = argv;
Console.WriteLine(converted);
Solution
Similar to Rubens Farias' solution but a little bit more elegant:
string argv = "-help=none\n-URL=(default)\n-password=\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
It matches password= literally, stores it in capture group $1, and then keeps matching until a \n is reached.
This yields a constant number of *'s, though. But revealing how many characters a password has might already convey too much information to hackers anyway.
Working example: https://dotnetfiddle.net/xOFCyG
Regular expression breakdown
( // Store the following match in capture group $1.
password= // Match "password=" literally.
)
[ // Match one from a set of characters.
^ // Negate a set of characters (i.e., match anything not
// contained in the following set).
\n // The character set: consists only of the new line character.
]
* // Match the preceding character set zero or more times.
This code replaces the password value by several "*" characters:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)([\s\S]*?\n)",
    match => match.Groups[1].Value + new String('*', match.Groups[2].Value.Length - 1) + "\n");
You can also remove the new String() part and replace it with a string constant.
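For what it's worth, here is a small sketch that combines the two ideas: a fixed-length mask and a pattern that also works when the password is the last argument (no trailing \n). The names ArgMasker and MaskPassword are hypothetical, and it assumes the arguments are separated by \n only (with \r\n the \r would be swallowed by the mask).

```csharp
using System.Text.RegularExpressions;

public static class ArgMasker
{
    // With RegexOptions.Multiline, ^ matches at the start of every line.
    // Group 1 keeps the "-password=" prefix; [^\n]* consumes the rest of that line.
    private static readonly Regex PasswordPattern =
        new Regex(@"^(-password=)[^\n]*", RegexOptions.Multiline);

    // Replaces the password value with a fixed mask and leaves everything else intact.
    public static string MaskPassword(string argv)
    {
        return PasswordPattern.Replace(argv, "$1********");
    }
}
```

Anchoring on the start of the line is a design choice: it avoids touching other arguments whose names merely end in "password=".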

How to replace surrounding groups of a match using regex?

I have the following string:
[Element][TOPINCLUDESAMEVALUES:5][ParentElement][ORDERBY:DateAdded]
and want to transform it to this:
[Element][TOP:5:WITHTIES][ParentElement][ORDERBY:DateAdded]
So, [TOPINCLUDESAMEVALUES:5] is transformed to [TOP:5:WITHTIES].
The input string could contain more [elements]. Each element is surrounded by square brackets []. For example:
...[element1][element2][TOPINCLUDESAMEVALUES:5]...[element3][element4][TOPINCLUDESAMEVALUES:105][element3]...
So, I need to transform each [TOPINCLUDESAMEVALUES:X] element to [TOP:X:WITHTIES] elements.
I tried some combinations of regex replace substitutions but was not able to get it working myself:
string statement = "[Campaign][TOPINCLUDESAMEVALUES:5][InstanceID][GROUPBY:Campaign]";
statement = Regex.Replace(statement, @"(?<=\[TOPINCLUDESAMEVALUES:)[^\]]+(?=\])", "");
Could anyone tell me whether there is a way to do such a replacement?
Since you want to reuse the value inside [TOPINCLUDESAMEVALUES:...] in the replacement, you need to capture it. The lookbehind and lookahead you are using are zero-width and non-capturing, so the match consists of the value alone, and replacing it with "" simply deletes the number.
Here is how you should be able to do it:
statement = Regex.Replace(
    statement,
    @"\[TOPINCLUDESAMEVALUES:([^\]]+)\]",
    "[TOP:$1:WITHTIES]"
);
This expression would match the entire [TOPINCLUDESAMEVALUES:5] bracketed portion, and additionally capture 5 as capturing group number 1. The replacement value refers to that group as $1, pasting its content in between TOP: and :WITHTIES.
Demo.
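To make the capture-group replacement easy to try on inputs with several elements, here is a short sketch using the same pattern. StatementRewriter and RewriteTopElements are illustrative names, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class StatementRewriter
{
    // Rewrites every [TOPINCLUDESAMEVALUES:X] element to [TOP:X:WITHTIES].
    // ([^\]]+) captures X as group 1; $1 pastes it into the replacement.
    public static string RewriteTopElements(string statement)
    {
        return Regex.Replace(
            statement,
            @"\[TOPINCLUDESAMEVALUES:([^\]]+)\]",
            "[TOP:$1:WITHTIES]");
    }
}
```

Regex.Replace substitutes all matches, so an input containing both [TOPINCLUDESAMEVALUES:5] and [TOPINCLUDESAMEVALUES:105] gets both elements rewritten in one call.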
Try this:
string statement = "[Campaign][TOPINCLUDESAMEVALUES:5][InstanceID][GROUPBY:Campaign]";
string[] arrstatement = statement.Split(']');
for (int i = 0; i < arrstatement.Length; i++)
{
    if (arrstatement[i].Contains("TOPINCLUDESAMEVALUES"))
        arrstatement[i] = "[TOP" + arrstatement[i].Substring(arrstatement[i].IndexOf(":")) + ":WITHTIES";
}
statement = string.Join("]", arrstatement);

Get specific word from string

I have table URLs and I'd like to select only the table name. What is the best approach to achieve this?
URLs:
"db://SQL Table.Table.[dbo].[DirectoryType]"
"db://SQL Table.Table.[dbo].[IX_AnalysisResult_ConceptVariations]"
"db://SQL Table.Table.[dbo].[IX_AnalysisResult_DocXConcepts]"
DESIRED OUTPUT:
DirectoryType
IX_AnalysisResult_ConceptVariations
IX_AnalysisResult_DocXConcepts
NOTE: These URLs will have db://SQL Table.Table.[dbo]. in common most of the time so I am using following code to achieve this:
CODE:
var trimURL = tableURL.Replace("db://SQL Table.Table.[dbo].", String.Empty).Replace("[",String.Empty).Replace("]",String.Empty);
OUTPUT:
DirectoryType
IX_AnalysisResult_ConceptVariations
IX_AnalysisResult_DocXConcepts
If for some reason the URL prefix changes, my code won't work. So what is the best way to get the table name from these types of URLs?
You could get the last index of '[' and ']' and get the substring therein:
var startIndex = tableUrl.LastIndexOf('[') + 1; // +1 to start after the opening bracket
var endIndex = tableUrl.LastIndexOf(']');
var charsToRead = endIndex - startIndex; // number of characters between the brackets
var tableName = tableUrl.Substring(startIndex, charsToRead);
Of course, this assumes you can guarantee no brackets in your table name.
References:
String.Substring
String.LastIndexOf
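A slightly more defensive sketch of the LastIndexOf idea, in case an input has no brackets at all. TableUrlParser and TryGetTableName are hypothetical names, and like the answer it assumes the table name itself contains no brackets.

```csharp
public static class TableUrlParser
{
    // Extracts the text inside the last [...] pair of the input.
    // Returns false (and an empty name) when no such pair exists.
    public static bool TryGetTableName(string tableUrl, out string tableName)
    {
        tableName = string.Empty;

        int close = tableUrl.LastIndexOf(']');
        if (close < 0) return false;

        // Search backwards from the closing bracket for its opening bracket.
        int open = tableUrl.LastIndexOf('[', close);
        if (open < 0) return false;

        tableName = tableUrl.Substring(open + 1, close - open - 1);
        return true;
    }
}
```

Returning a bool instead of throwing lets the caller decide what to do with URLs that do not follow the expected shape.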
You can use this regex to capture the contents of the last [] group that appears at the very end of a string:
\[([^\[\]]*)\]$
For the input db://SQL Table.Table.[dbo].[DirectoryType] it captures the string DirectoryType.
The $ symbol anchors the match to the end of the string.
You can see it in action here.
An example:
var regex = new Regex(@"\[([^\[\]]*)\]$", RegexOptions.Singleline);
Match match_result = regex.Match("db://SQL Table.Table.[dbo].[DirectoryType]");
string result = "";
if (match_result.Success) // Groups.Count is the same whether or not the match succeeded
    result = match_result.Groups[1].Value;
// result == "DirectoryType"
Remember to add using System.Text.RegularExpressions;
var matcher = new System.Text.RegularExpressions.Regex(@"^.*\[(?<table>.*?)\]""$", RegexOptions.Compiled);
var results = matcher.Match(/*your input string*/);
Inspect results in the debugger and you'll find how to extract what you are looking for: the table name is in results.Groups["table"].Value.
Note that this pattern assumes that your data actually includes the quotation marks shown in your question.
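As a rough sketch of reading the named group without a debugger: the variant below makes the trailing quotation mark optional, which is my own assumption so that it works on inputs with or without the quotes. TableNameMatcher and MatchTableName are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class TableNameMatcher
{
    // (?<table>...) is a named group; ""? in a verbatim string is an optional
    // literal quotation mark before the end of the input.
    private static readonly Regex Pattern =
        new Regex(@"\[(?<table>[^\[\]]*)\]""?$");

    // Returns the captured table name, or an empty string when nothing matches.
    public static string MatchTableName(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups["table"].Value : string.Empty;
    }
}
```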
You were doing it right; I just used Split on '.'. I am assuming your URL always ends with at least anything.[DirectoryType]
string op = tableURL.Split('.')[tableURL.Split('.').Length - 1].Replace("[", "").Replace("]", "");

Regex to strip characters except given ones?

I would like to strip strings but only leave the following:
[a-zA-Z]+[_a-zA-Z0-9-]*
I am trying to output strings that start with a letter and can then contain alphanumeric characters, underscores, and dashes. How can I do this with RegEx or another function?
Because everything in the second part of the regex is in the first part, you could do something like this:
String foo = "_-abc.!@#$5o993idl;)"; // your string here.
//First replace removes all the characters you don't want.
foo = Regex.Replace(foo, "[^_a-zA-Z0-9-]", "");
//Second replace removes any characters from the start that aren't allowed there.
foo = Regex.Replace(foo, "^[^a-zA-Z]+", "");
So start out by paring it down to only the allowed characters. Then get rid of any allowed characters that can't be at the beginning.
Of course, if your regex gets more complicated, this solution falls apart fairly quickly.
Assuming that you've got the strings in a collection, I would do it this way:
foreach element in the collection try match the regex
if !success, remove the string from the collection
Or the other way round - if it matches, add it to a new collection.
If the strings are not in a collection, can you add more details about what your input looks like?
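The collection-filtering idea above could look roughly like this in C#. It follows the "add matches to a new collection" variant; IdentifierFilter and KeepValid are made-up names, and the pattern is anchored so that the whole string has to match.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierFilter
{
    // ^ and \z anchor the pattern so the entire string must be a valid identifier.
    private static readonly Regex Identifier =
        new Regex(@"^[a-zA-Z]+[_a-zA-Z0-9-]*\z");

    // Returns a new list with only the strings that match; the input is not modified.
    public static List<string> KeepValid(IEnumerable<string> candidates)
    {
        return candidates.Where(s => Identifier.IsMatch(s)).ToList();
    }
}
```

For example, given "abc", "_abc", "a-b_9", "9a" and "a b", only "abc" and "a-b_9" should survive.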
If you want to pull out all of the identifiers matching your regular expression, you can do it like this:
var input = " _wontmatch f_oobar0 another_valid ";
var re = new Regex( @"\b[a-zA-Z][_a-zA-Z0-9-]*\b" );
foreach( Match match in re.Matches( input ) )
Console.WriteLine( match.Value );
Use MatchCollection matchColl = Regex.Matches("input string","your regex");
Then use:
string [] outStrings = new string[matchColl.Count]; //A string array to contain all required strings
for (int i=0; i < matchColl.Count; i++ )
outStrings[i] = matchColl[i].ToString();
You will have all the required strings in outStrings. Hope this helps.
Edited
var s = Regex.Matches(input_string, "[a-z]+(_*-*[a-z0-9]*)*", RegexOptions.IgnoreCase);
string output_string="";
foreach (Match m in s)
{
output_string = output_string + m;
}
MessageBox.Show(output_string);

Regex questions

I'm trying to get some text from a large text file, the text I'm looking for is:
Type:Production
Color:Red
I pass the whole text in the following method to get (Type:Production , Color:Red)
private static void FindKeys(IEnumerable<string> keywords, string source)
{
var found = new Dictionary<string, string>(10);
var keys = string.Join("|", keywords.ToArray());
var matches = Regex.Matches(source, @"(?<key>" + @"\B\s" + keys + @"\B\s" + "):",
    RegexOptions.Singleline);
foreach (Match m in matches)
{
var key = m.Groups["key"].ToString();
var start = m.Index + m.Length;
var nx = m.NextMatch();
var end = (nx.Success ? nx.Index : source.Length);
found.Add(key, source.Substring(start, end - start));
}
foreach (var n in found)
{
Console.WriteLine("Key={0}, Value={1}", n.Key, n.Value);
}
}
}
My problems are the following:
The search returns _Type: as well, where I only need Type:
The search returns Color:Red\n\n\n\n\n (with the rest of the text), where I only need Color:Red
So, basically:
- How can I force Regex to match Type exactly and ignore _Type?
- How can I get only the text after the : and ignore \n\n and any other text?
I hope this is clear
Thanks,
Your regex currently looks like this:
(?<key>\B\sWord1|Word2|Word3\B\s):
I see the following issues here:
First, Word1|Word2|Word3 should be put in parentheses. Otherwise, it will search for \B\sWord1 or Word2 or Word3\B\s, which is not what you want (I guess).
Why \B\s? A non-boundary followed by a whitespace? That doesn't make sense. I guess you want just \b (= word boundary). There's no need to use it in the end, because the colon already constitutes a word boundary.
So, I would suggest to use the following. It will fix the _Type problem, because there is no word boundary between _ and Type (since _ is considered to be a word character).
\b(?<key>Word1|Word2|Word3):
If the text following the key is always just a single word, I'd match it in the regex as well: (\s* allows for whitespace after the colon, I don't know if you need this. \w+ ensures that only word characters -- i.e. no line breaks etc. -- are matched as the value.)
\b(?<key>Word1|Word2|Word3):\s*(?<value>\w+)
Then you just need to iterate through all the matches and extract the key and value groups. No need for any string operations or index arithmetic.
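Put together, the suggestion might look something like the sketch below. It reuses the FindKeys name from the question but returns the dictionary instead of printing it, and it escapes the keywords with Regex.Escape, which is my addition. It assumes each value is a single word, as discussed above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyFinder
{
    public static Dictionary<string, string> FindKeys(IEnumerable<string> keywords, string source)
    {
        // Escape each keyword so regex metacharacters in a key are matched literally.
        string alternation = string.Join("|", keywords.Select(Regex.Escape));

        // \b rejects "_Type" because "_" is a word character; \w+ stops at line breaks.
        var pattern = new Regex(@"\b(?<key>" + alternation + @"):\s*(?<value>\w+)");

        var found = new Dictionary<string, string>();
        foreach (Match m in pattern.Matches(source))
        {
            // The indexer overwrites on repeats, so the last occurrence of a key wins.
            found[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return found;
    }
}
```

Using the indexer rather than Add avoids an exception when the same key appears more than once in the text; whether first or last occurrence should win depends on your data.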
So if I understand correctly, you have:
Pairs of key:values
Each pair is separated by a space
Within each pair, the key and value are separated by “:”
Then I would not use regex at all. I would:
use String.Split(' ') to get an array of pairs
loop over all the pairs
use String.Split(':') to get the key and value from each pair
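The steps above can be sketched as follows. PairSplitter and ParsePairs are hypothetical names, and splitting on line breaks and tabs in addition to spaces is an assumption on my part, since the question mentions newlines in the text.

```csharp
using System;
using System.Collections.Generic;

public static class PairSplitter
{
    public static Dictionary<string, string> ParsePairs(string source)
    {
        var result = new Dictionary<string, string>();

        // Split into pairs; RemoveEmptyEntries drops blanks caused by repeated separators.
        string[] pairs = source.Split(
            new[] { ' ', '\n', '\r', '\t' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string pair in pairs)
        {
            // Limit to 2 parts so a value that itself contains ':' stays intact.
            string[] parts = pair.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
            {
                result[parts[0]] = parts[1];
            }
        }
        return result;
    }
}
```

Note that, unlike the regex version, this does not filter by keyword: every key:value pair in the text ends up in the dictionary, and chunks without a colon are skipped.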
