What would the syntax to get all the words in a string after the first space. For example, bobs nice house. So the result should be " nice house" without the quote.
([^\s]+) gives me all 3 words seperated by ;
,[\s\S]*$ > not compiling.
I was really looking shortest possible code. following did the job. thanks guys
\s(.*)
I think it should be done this way:
[^ ]* (.*)
It allows 0 or more elements that are not a space, than a single space and selects whatever comes after that space.
C# usage
var input = "bobs nice house";
var afterSpace = Regex.Match(input, "[^ ]* (.*)").Groups[1].Value;
afterSpace is "nice house".
To get that first space as well in result string change expression to [^ ]*( .*)
No regex solution
var afterSpace = input.Substring(input.IndexOf(' '));
Actually, you don't need to use regex for that process. You just need to use String.Split() method like this;
string s = "bobs nice house";
string[] s1 = s.Split(' ');
for(int i = 1; i < s1.Length; i++)
Console.WriteLine(s1[i]);
Output will be;
nice
house
Here is a DEMO.
use /(?<=\s).*/g to find string after first space.run snippet check
str="first second third forth fifth";
str=str.match(/(?<=\s).*/g)
console.log(str);
Related
Considering strings like this:
"This is a string....!"
"This is another...!!"
"What is this..!?!?"
...
// There are LOTS of examples of weird/angry sentence-endings like the ones above.
I want to replace the unnecessary punctuation at the end to make it look like this:
"This is a string!"
"This is another!"
"What is this?"
What I basically do is:
- split by space
- check if last char in string contains a punctuation
- start replacing with the patterns below
I have tried a very big ".Replace(string, string)" function, but it does not work - there has to be a simpler regex I guess.
Documentation:
Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
As well as:
Because this method returns the modified string, you can chain together successive calls to the Replace method to perform multiple replacements on the original string.
Anything is wrong here.
EDIT: ALL the proposed solutions work fine! Thank you very much!
This one was the best suited solution for my project:
Regex re = new Regex("[.?!]*(?=[.?!]$)");
string output = re.Replace(input, "");
Your solution works almost fine (demo), the only issue is when the same sequence could be matched starting at different spots. For example, ..!?!? from your last line is not part of the substitution list, so ..!? and !? get replaced by two separate matches, producing ?? in the output.
It looks like your strategy is pretty straightforward: in a chain of multiple punctuation characters the last character wins. You can use regular expressions to do the replacement:
[!?.]*([!?.])
and replace it with $1, i.e. the capturing group that has the last character:
string s;
while ((s = Console.ReadLine()) != null) {
s = Regex.Replace(s, "[!?.]*([!?.])", "$1");
Console.WriteLine(s);
}
Demo
Simply
[.?!]*(?=[.?!]$)
should do it for you. Like
Regex re = new Regex("[.?!]*(?=[.?!]$)");
Console.WriteLine(re.Replace("This is a string....!", ""));
This replaces all punctuations but the last with nothing.
[.?!]* matches any number of consecutive punctuation characters, and the (?=[.?!]$) is a positive lookahead making sure it leaves one at the end of the string.
See it here at ideone.
Or you can do it without regExps:
string TrimPuncMarks(string str)
{
HashSet<char> punctMarks = new HashSet<char>() {'.', '!', '?'};
int i = str.Length - 1;
for (; i >= 0; i--)
{
if (!punctMarks.Contains(str[i]))
break;
}
// the very last punct mark or null if there were no any punct marks in the end
char? suffix = i < str.Length - 1 ? str[str.Length - 1] : (char?)null;
return str.Substring(0, i+1) + suffix;
}
Debug.Assert("What is this?" == TrimPuncMarks("What is this..!?!?"));
Debug.Assert("What is this" == TrimPuncMarks("What is this"));
Debug.Assert("What is this." == TrimPuncMarks("What is this."));
I'm searching for a regular expression that will extract 0263563
from
;010263563=2119?
and 0267829
from
%00000026782904?;010267829=4119?
(Must be the same regular expression).
Start at the 3rd character after the semicolon and take 7 characters.
;..(\d{7})
or more general:
;..(.{7})
Or according to comment : "To clarify characters to take are digits"
;\d\d(\d{7})
Well, you need this:
;\d{2}(\d+)
The First group will contain the number you want.
;[\S]{2}([\S]{7})
Since you said characters and not numbers, but it would work either way
For rules that are that precise (start 3 chars after ;, then next 7), you could use a plain substring:
string s = "%00000026782904?;010267829=4119?";
var pos = s.IndexOf(';');
var number = s.Substring(pos+3, 7);
And of course, test whether that IndexOf really found the ;
The below regex would exactly 7 digits which must be preceded by a ; , any two characters.
(?<=;.{2})\d{7}
Code:
String input = #";010263563=2119?
%00000026782904?;010267829=4119?";
Regex rgx = new Regex(#"(?<=;.{2})\d{7}");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
IDEONE
Output:
0263563
0267829
Assuming regex is not an absolute requirement, you could use:
var input = "%00000026782904?;010267829=4119?";
// output will be: 0267829
var digits = input.SkipWhile(x => x != ';').Skip(3).Take(7).ToArray();
var output = new string(digits);
Let's say I have the following within my source code, and I want to return only the numbers within the string:
The source is coming from a website, just fyi, and I already have it parsed out so that it comes into the program, but then I need to actually parse these numbers to return what I want. Just having a doosy of a time trying to figure it out tho :(
like: 13|100|0;
How could I write this regex?
var cData = new Array(
"g;13|g;100|g;0",
"g;40|g;100|g;1.37",
"h;43|h;100|h;0",
"h;27|h;100|h;0",
"i;34|i;100|i;0",
"i;39|i;100|i;0",
);
Not sure you actually need regex here.
var str = "g;13|g;100|g;0";
str = str.Replace("g;", "");
would give you "13|100|0".
Or a slight improvement on spinon's answer:
// \- included in case numbers can be negative. Leave it out if not.
Regex.Replace("g;13|g;100|g;0", "[^0-9\|\.\-]", "");
Or an option using split and join:
String.Join("|", "g;13|g;100|g;0".Split('|').Select(pipe => pipe.Split(';')[1]));
I would use something like this so you only keep numbers and separator:
Regex.Replace("g;13|g;100|g;0", "[^0-9|]", "");
Regex might be overkill in this case. Given the uniform delimiting of | and ; I would recommend String.Split(). Then you could either split again or use String.Replace() to get rid of the extra chars (i.e. g;).
It looks like you have a number of solutions, but I'll throw in one more where you can iterate over each group in a match to get the number out if you want.
Regex regexObj = new Regex(#"\w;([\d|.]+)\|?");
Match matchResults = regexObj.Match("g;13|g;100|g;0");
if( matchResults.IsMatch )
{
for (int i = 1; i < matchResults.Groups.Count; i++)
{
Group groupObj = matchResults.Groups[i];
if (groupObj.Success)
{
//groupObj.Value will be the number you want
}
}
}
I hope this is helps.
I need to take a string, and capitalize words in it. Certain words ("in", "at", etc.), are not capitalized and are changed to lower case if encountered. The first word should always be capitalized. Last names like "McFly" are not in the current scope, so the same rule will apply to them - only first letter capitalized.
For example: "of mice and men By CNN" should be changed to "Of Mice and Men by CNN". (Therefore ToTitleString won't work here.)
What would be the best way to do that?
I thought of splitting the string by spaces, and go over each word, changing it if necessary, and concatenating it to the previous word, and so on.
It seems pretty naive and I was wondering if there's a better way to do it. I am using .NET 3.5.
Use
Thread.CurrentThread.CurrentCulture.TextInfo.ToTitleCase("of mice and men By CNN");
to convert to proper case and then you can loop through the keywords as you have mentioned.
Depending on how often you plan on doing the capitalization I'd go with the naive approach. You could possibly do it with a regular expression, but the fact that you don't want certain words capitalized makes that a little trickier.
You can do it with two passes using regular expressions:
var result = Regex.Replace("of mice and men isn't By CNN", #"\b(\w)", m => m.Value.ToUpper());
result = Regex.Replace(result, #"(\s(of|in|by|and)|\'[st])\b", m => m.Value.ToLower(), RegexOptions.IgnoreCase);
This outputs Of Mice and Men Isn't by CNN.
The first expression capitalizes every letter on a word boundary and the second one downcases any words matching the list that are surrounded by white space.
The downsides to this approach is that you're using regexs (now you have two problems) and you'll need to keep that list of excluded words up to date. My regex-fu isn't good enough to be able to do it in one expression, but it might be possible.
An answer from another question, How to Capitalize names -
CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;
Console.WriteLine(textInfo.ToTitleCase(title));
Console.WriteLine(textInfo.ToLower(title));
Console.WriteLine(textInfo.ToUpper(title));
Use ToTitleCase() first and then keep a list of applicable words and Replace back to the all-lower-case version of those applicable words (provided that list is small).
The list of applicable words could be kept in a dictionary and looped through pretty efficiently, replacing with the .ToLower() equivalent.
Try something like this:
public static string TitleCase(string input, params string[] dontCapitalize) {
var split = input.Split(' ');
for(int i = 0; i < split.Length; i++)
split[i] = i == 0
? CapitalizeWord(split[i])
: dontCapitalize.Contains(split[i])
? split[i]
: CapitalizeWord(split[i]);
return string.Join(" ", split);
}
public static string CapitalizeWord(string word)
{
return char.ToUpper(word[0]) + word.Substring(1);
}
You can then later update the CapitalizeWord method if you need to handle complex surnames.
Add those methods to a class and use it like this:
SomeClass.TitleCase("a test is a sentence", "is", "a"); // returns "A Test is a Sentence"
A slight improvement on jonnii's answer:
var result = Regex.Replace(s.Trim(), #"\b(\w)", m => m.Value.ToUpper());
result = Regex.Replace(result, #"\s(of|in|by|and)\s", m => m.Value.ToLower(), RegexOptions.IgnoreCase);
result = result.Replace("'S", "'s");
You can have a Dictionary having the words you would like to ignore, split the sentence in phrases (.split(' ')) and for each phrase, check if the phrase exists in the dictionary, if it does not, capitalize the first character and then, add the string to a string buffer. If the phrase you are currently processing is in the dictionary, simply add it to the string buffer.
A non-clever approach that handles the simple case:
var s = "of mice and men By CNN";
var sa = s.Split(' ');
for (var i = 0; i < sa.Length; i++)
sa[i] = sa[i].Substring(0, 1).ToUpper() + sa[i].Substring(1);
var sout = string.Join(" ", sa);
Console.WriteLine(sout);
The easiest obvious solution (for English sentences) would be to:
"sentence".Split(" ") the sentence on space characters
Loop through each item
Capitalize the first letter of each item - item[i][0].ToUpper(),
Remerge back into a string joined on a space.
Repeat this process with "." and "," using that new string.
You should create your own function like you're describing.
i have strings in the form [abc].[some other string].[can.also.contain.periods].[our match]
i now want to match the string "our match" (i.e. without the brackets), so i played around with lookarounds and whatnot. i now get the correct match, but i don't think this is a clean solution.
(?<=\.?\[) starts with '[' or '.['
([^\[]*) our match, i couldn't find a way to not use a negated character group
`.*?` non-greedy did not work as expected with lookarounds,
it would still match from the first match
(matches might contain escaped brackets)
(?=\]$) string ends with an ]
language is .net/c#. if there is an easier solution not involving a regex i'd be also happy to know
what really irritates me is the fact, that i cannot use (.*?) to capture the string, as it seems non-greedy does not work with lookbehinds.
i also tried: Regex.Split(str, #"\]\.\[").Last().TrimEnd(']');, but i'm not really pround of this solution either
The following should do the trick. Assuming the string ends after the last match.
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
var search = new Regex("\\.\\[(.*?)\\]$", RegexOptions.RightToLeft);
string ourMatch = search.Match(input).Groups[1]);
Assuming you can guarantee the input format, and it's just the last entry you want, LastIndexOf could be used:
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
int lastBracket = input.LastIndexOf("[");
string result = input.Substring(lastBracket + 1, input.Length - lastBracket - 2);
With String.Split():
string input = "[abc].[some other string].[can.also.contain.periods].[our match]";
char[] seps = {'[',']','\\'};
string[] splitted = input.Split(seps,StringSplitOptions.RemoveEmptyEntries);
you get "out match" in splitted[7] and can.also.contain.periods is left as one string (splitted[4])
Edit: the array will have the string inside [] and then . and so on, so if you have a variable number of groups, you can use that to get the value you want (or remove the strings that are just '.')
Edited to add the backslash to the separator to treat cases like '\[abc\]'
Edit2: for nested []:
string input = #"[abc].[some other string].[can.also.contain.periods].[our [the] match]";
string[] seps2 = { "].["};
string[] splitted = input.Split(seps2, StringSplitOptions.RemoveEmptyEntries);
you our [the] match] in the last element (index 3) and you'd have to remove the extra ]
You have several options:
RegexOptions.RightToLeft - yes, .NET regex can do this! Use it!
Match the whole thing with greedy prefix, use brackets to capture the suffix that you're interested in
So generally, pattern becomes .*(pattern)
In this case, .*\[([^\]]*)\], then extract what \1 captures (see this on rubular.com)
References
regular-expressions.info/Grouping with brackets