I'm fairly new to regular expressions and, despite a few tutorials I've read, I can't get this step of my Regex.Replace formatting to work.
Here's the scenario I'm working on: when I pull my data from the listbox, I want to format it into a CSV-like format and then save the file. Is Replace a good fit for this scenario?
Example before the regular expression formatting:
FirstName LastName Salary Position
-------------------------------------
John Smith $100,000.00 M
Proposed format after the regular expression replace:
John Smith,100000,M
Current output:
John,Smith,100000,M
Note: is there a way to replace the first comma with a space?
Snippet of my code
using(var fs = new FileStream(filepath, FileMode.OpenOrCreate, FileAccess.Write))
{
using(var sw = new StreamWriter(fs))
{
foreach (string stw in listBox1.Items)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(stw);
//Piecing the list back to the original format
sb_trim = Regex.Replace(stw, @"[$,]", "");
sb_trim = Regex.Replace(sb_trim, @"[.][0-9]+", "");
sb_trim = Regex.Replace(sb_trim, @"\s", ",");
sw.WriteLine(sb_trim);
}
}
}
You can do this with two replacements:
//let stw be "John Smith $100,000.00 M"
sb_trim = Regex.Replace(stw, @"\s+\$|\s+(?=\w+$)", ",");
//sb_trim becomes "John Smith,100,000.00,M"
sb_trim = Regex.Replace(sb_trim, @"(?<=\d),(?=\d)|[.]0+(?=,)", "");
//sb_trim becomes "John Smith,100000,M"
sw.WriteLine(sb_trim);
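As a minimal sketch of the two-pass approach above, the program below wraps both replacements in a helper and runs it on the sample line. The class and method names (TwoPassCsvDemo, ToCsvLine) are made up for illustration. It assumes every line has the same shape as the sample: a name, a dollar amount whose cents are all zeros, and a one-word position. Non-zero cents such as .50 would not be stripped by the second pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class TwoPassCsvDemo
{
    public static string ToCsvLine(string line)
    {
        // Pass 1: turn the whitespace before the "$" (dropping the "$")
        // and the whitespace before the last word into commas.
        string result = Regex.Replace(line, @"\s+\$|\s+(?=\w+$)", ",");

        // Pass 2: remove thousands separators (a comma between two digits)
        // and an all-zero decimal part that is followed by a comma.
        return Regex.Replace(result, @"(?<=\d),(?=\d)|[.]0+(?=,)", "");
    }

    public static void Main()
    {
        Console.WriteLine(ToCsvLine("John Smith $100,000.00 M")); // John Smith,100000,M
    }
}
```

Because the space between the first and last name is never touched, this version does not need the extra step of turning the first comma back into a space.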
Try this:
sb_trim = Regex.Replace(stw, @"(\D+)\s+\$([\d,]+)\.\d+\s+(.)",
m => string.Format(
"{0},{1},{2}",
m.Groups[1].Value,
m.Groups[2].Value.Replace(",", string.Empty),
m.Groups[3].Value));
This is about as clean an answer as you'll get, at least with regexes.
(\D+): First capture group. One or more non-digit characters.
\s+\$: One or more spacing characters, then a literal dollar sign ($).
([\d,]+): Second capture group. One or more digits and/or commas.
\.\d+: Decimal point, then at least one digit.
\s+: One or more spacing characters.
(.): Third capture group. Any non-line-breaking character.
The second capture group additionally needs to have its commas stripped. You could do this with another regex, but it's really unnecessary and bad for performance. This is why we need to use a lambda expression and string format to piece together the replacement. If it weren't for that, we could just use this as the replacement, in place of the lambda expression:
"$1,$2,$3"
Add the following two lines:
var regex = new Regex(Regex.Escape(","));
sb_trim = regex.Replace(sb_trim, " ", 1);
If sb_trim is "John,Smith,100000,M", the code above returns "John Smith,100000,M".
This should do the job:
var result=Regex.Replace("John Smith $100,000.00 M", @"^(\w+)\s+(\w+)\s+\$([\d,\.]+)\s+(\w+)$","$1,$2,$3,$4");
//result: "John,Smith,100,000.00,M"
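For simplicity, if you just need the number from the currency string, strip everything that is not a digit. Note that this also keeps the digits after the decimal point, so "$100,000.00" becomes "10000000".
Regex.Replace(yourcurrency, "[^0-9]","")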
I have subtitle text in a string:
string subtitle = Encoding.ASCII.GetString(srt_text);
srt_text is a byte array, which I convert to a string as shown. The subtitle starts and ends like this:
Starts:
1
00:00:40,152 --> 00:00:43,614
Out west there was this fella,
2
00:00:43,697 --> 00:00:45,824
fella I want to tell you about,
Finish:
1631
01:52:17,016 --> 01:52:20,019
Catch ya later on
down the trail.
1632
01:52:20,102 --> 01:52:24,440
Say, friend, you got any more
of that good Sarsaparilla?
Now I want to extract the times and put them into an array. I tried:
Regex rgx = new Regex(@"^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],[0-9][0-9][0-9]$", RegexOptions.IgnoreCase);
Match m = rgx.Match(subtitle);
I thought this would find the times, but it doesn't put them into an array.
Assume 'times' is my string array. I want the array to look like this:
times[0] = "00:00:40,152"
times[1] = "00:00:43,614"
...
times[n-1] = "01:52:20,102"
times[n] = "01:52:24,440"
It has to keep going until the end of the subtitle, so that all the times end up in the array.
I am open to your advice. How can I do this? I am new to this and have probably made a lot of mistakes, for which I apologize. I hope you can understand and help me.
Using Regular Expressions
You can do this with Regex.Matches, which returns every match in the input.
The regex used is
(\d{2}:\d{2}:\d{2},\d+)
\d matches a digit
{2} repeats the preceding element exactly twice
+ repeats the preceding element one or more times
: and , are literal characters with no special meaning.
Here is the syntax:
var matchList = Regex.Matches(subtitle, @"(\d{2}:\d{2}:\d{2},\d+)",RegexOptions.Multiline);
var times = matchList.Cast<Match>().Select(match => match.Value).ToList();
With this your times variable will be filled with all the time substrings.
Also note: RegexOptions.Multiline is optional in this scenario, because the pattern contains no ^ or $ anchors for it to affect.
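For reference, here is a self-contained sketch of the Regex.Matches approach that returns a string array, as the question asks. The names SubtitleTimesDemo and ExtractTimes are invented for this example, and the pattern assumes three-digit milliseconds as in the sample data. The attempt in the question likely found nothing for two reasons: without RegexOptions.Multiline, the ^ and $ anchors only match when the whole string is a single timestamp, and Match returns just the first match rather than all of them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class SubtitleTimesDemo
{
    public static string[] ExtractTimes(string subtitle)
    {
        // No anchors: the timestamps sit in the middle of lines,
        // two per timing line, so each one is matched on its own.
        return Regex.Matches(subtitle, @"\d{2}:\d{2}:\d{2},\d{3}")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string sample =
            "1\n00:00:40,152 --> 00:00:43,614\nOut west there was this fella,\n\n" +
            "2\n00:00:43,697 --> 00:00:45,824\nfella I want to tell you about,\n";

        string[] times = ExtractTimes(sample);
        for (int i = 0; i < times.Length; i++)
        {
            // Prints times[0] = "00:00:40,152" through times[3] = "00:00:45,824"
            Console.WriteLine("times[{0}] = \"{1}\"", i, times[i]);
        }
    }
}
```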
This might help you get the times from the string you have.
string subtitle = @"1
00:00:40,152 --> 00:00:43,614
Out west there was this fella,
2
00:00:43,697 --> 00:00:45,824
fella I want to tell you about,";
List<string> timestrings = new List<string>();
List<string> splittedtimestrings = new List<string>();
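// Split on both Windows and Unix line endings.
List<string> splittedstring = subtitle.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries ).ToList();
foreach(string st in splittedstring)
{
// Timing lines contain the " --> " separator; a test for "00" would miss lines such as 01:52:17,016 --> 01:52:20,019.
if(st.Contains(" --> "))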
{
timestrings.Add(st);
}
}
foreach(string s in timestrings)
{
string[] foundstr = s.Split(new string[] { " --> " }, StringSplitOptions.RemoveEmptyEntries);
splittedtimestrings.Add(foundstr[0]);
splittedtimestrings.Add(foundstr[1]);
}
I split the string to get the times instead of using Regex, because I think Regex is better suited to processing text based on pattern matches than to comparing and matching literal text.
I want to insert a dollar sign at a specific position between two named capturing groups. The problem is that this puts two consecutive dollar signs in the replacement string, which causes problems.
How can I do that directly with the Replace method? The only workaround I found was to add some temporary garbage that I immediately remove again.
See code for the problem:
// We want to add a dollar sign before a number and use named groups for capturing;
// varying parts of the strings are in brackets []
// [somebody] has [some-dollar-amount] in his [something]
string joeHas = "Joe has 500 in his wallet.";
string jackHas = "Jack has 500 in his pocket.";
string jimHas = "Jim has 740 in his bag.";
string jasonHas = "Jason has 900 in his car.";
Regex dollarInsertion = new Regex(@"(?<start>^.*? has )(?<end>\d+ in his .*?$)", RegexOptions.Multiline);
Console.WriteLine(joeHas);
Console.WriteLine(jackHas);
Console.WriteLine(jimHas);
Console.WriteLine(jasonHas);
Console.WriteLine("--------------------------");
joeHas = dollarInsertion.Replace(joeHas, @"${start}$${end}");
jackHas = dollarInsertion.Replace(jackHas, @"${start}$-${end}");
jimHas = dollarInsertion.Replace(jimHas, @"${start}\$${end}");
jasonHas = dollarInsertion.Replace(jasonHas, @"${start}$kkkkkk----kkkk${end}").Replace("kkkkkk----kkkk", "");
Console.WriteLine(joeHas);
Console.WriteLine(jackHas);
Console.WriteLine(jimHas);
Console.WriteLine(jasonHas);
Output:
Joe has 500 in his wallet.
Jack has 500 in his pocket.
Jim has 740 in his bag.
Jason has 900 in his car.
--------------------------
Joe has ${end}
Jack has $-500 in his pocket.
Jim has \${end}
Jason has $900 in his car.
Use this replacement pattern: "${start}$$${end}"
The double $$ escapes the $ so that it is treated as a literal character. The third $ is really part of the named group ${end}. You can read about this on the MSDN Substitutions page.
I would stick with the above approach. Alternatively, you can use the Replace overload that accepts a MatchEvaluator and concatenate what you need, similar to the following:
jackHas = dollarInsertion.Replace(jackHas,
m => m.Groups["start"].Value + "$" + m.Groups["end"].Value);
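Both variants can be tried side by side with the small program below. It is only a sketch: the class and method names (DollarInsertionDemo, WithSubstitution, WithEvaluator) are invented for the example, while the regex and the replacement pattern are the ones from the question and the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from the question.
public static class DollarInsertionDemo
{
    private static readonly Regex DollarInsertion = new Regex(
        @"(?<start>^.*? has )(?<end>\d+ in his .*?$)", RegexOptions.Multiline);

    public static string WithSubstitution(string s)
    {
        // "$$" emits one literal '$'; the "${end}" that follows is the named group.
        return DollarInsertion.Replace(s, "${start}$$${end}");
    }

    public static string WithEvaluator(string s)
    {
        // Inside a MatchEvaluator the '$' is an ordinary character, so no escaping is needed.
        return DollarInsertion.Replace(s,
            m => m.Groups["start"].Value + "$" + m.Groups["end"].Value);
    }

    public static void Main()
    {
        Console.WriteLine(WithSubstitution("Joe has 500 in his wallet.")); // Joe has $500 in his wallet.
        Console.WriteLine(WithEvaluator("Jack has 500 in his pocket."));   // Jack has $500 in his pocket.
    }
}
```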
Why are you using regex for this in the first place?
string name = "Joe";
int amount = 500;
string place = "car";
string output = string.Format("{0} has ${1} in his {2}",name,amount,place);
What regular expression can be used to make the following conversions?
City -> CITY
FirstName -> FIRST_NAME
DOB -> DOB
PATId -> PAT_ID
RoomNO -> ROOM_NO
The following almost works - it just adds an extra underscore to the beginning of the word:
var rgx = @"(?x)( [A-Z][a-z,0-9]+ | [A-Z]+(?![a-z]) )";
var tests = new string[] { "City",
"FirstName",
"DOB",
"PATId",
"RoomNO"};
foreach (var test in tests)
Console.WriteLine("{0} -> {1}", test,
Regex.Replace(test, rgx, "_$0").ToUpper());
// output:
// City -> _CITY
// FirstName -> _FIRST_NAME
// DOB -> _DOB
// PATId -> _PAT_ID
// RoomNO -> _ROOM_NO
Flowing from John M Gant's idea of adding underscores then capitalizing, I think this regular expression should work:
([A-Z])([A-Z][a-z])|([a-z0-9])([A-Z])
replacing with:
$1$3_$2$4
You can name the capture groups to make the replacement string a little nicer to read. For any given match, only one of $1 and $3 has a value, and the same goes for $2 and $4. The general idea is to add underscores when:
There are two capital letters followed by a lowercase letter: place the underscore between the two capital letters. (PATId -> PAT_Id)
There is a lowercase letter or digit followed by a capital letter: place the underscore between the two. (RoomNO -> Room_NO and FirstName -> First_Name)
Hope this helps.
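To check the pattern against the five names from the question, a short sketch follows. UpperSnakeDemo and ToUpperSnake are names invented for this example, and ToUpperInvariant is used instead of ToUpper as an assumption that culture-specific casing is not wanted here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built around the regex from this answer.
public static class UpperSnakeDemo
{
    public static string ToUpperSnake(string name)
    {
        // Groups that did not take part in a match substitute as empty strings,
        // so "$1$3_$2$4" works for either alternative.
        string withUnderscores = Regex.Replace(
            name,
            @"([A-Z])([A-Z][a-z])|([a-z0-9])([A-Z])",
            "$1$3_$2$4");
        return withUnderscores.ToUpperInvariant();
    }

    public static void Main()
    {
        string[] tests = { "City", "FirstName", "DOB", "PATId", "RoomNO" };
        foreach (string test in tests)
        {
            Console.WriteLine("{0} -> {1}", test, ToUpperSnake(test));
        }
        // City -> CITY
        // FirstName -> FIRST_NAME
        // DOB -> DOB
        // PATId -> PAT_ID
        // RoomNO -> ROOM_NO
    }
}
```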
I suggest a simple Regex to insert the underscore, and then string.ToUpper() to convert to uppercase.
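Regex.Replace(test, @"(\p{Ll})(\p{Lu})", "$1_$2").ToUpper()
It's two operations instead of one, but to me it's much easier to read than one big complicated regex replace. Note that it only splits at a lowercase-to-uppercase boundary, so PATId comes out as PATID rather than PAT_ID.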
I can probably come up with a regex that will do it... but I believe a transformative regex may not be the right answer. I suggest you take what you already have and just chop the first character (the leading underscore) off the output. The CPU time is probably going to be the same or less that way, and your coding time inconsequential.
Try: (?x)(.)( [A-Z][a-z,0-9]+ | [A-Z]+(?![a-z]) ) and change your code to output $0_$1 instead of _$0 (a misguided and failed attempt to dream up what I said was a silly idea).
Seems like Rails does it using more than one regular expression.
var rgx = @"([A-Z]+)([A-Z][a-z])";
var rgx2 = @"([a-z\d])([A-Z])";
foreach (var test in tests)
{
var result = Regex.Replace(test, rgx, "$1_$2");
result = Regex.Replace(result, rgx2, "$1_$2");
result = result.ToUpper();
Console.WriteLine("{0} -> {1}", test, result);
}
I realize this is an old question, but it is still something that comes up often, so I have decided to share my own approach to it.
Instead of trying to do it with replacements, the idea is to find all “words” in the string and then convert them to upper case and join:
var tests = new string[] { "City",
"FirstName",
"DOB",
"PATId",
"RoomNO"};
foreach (var test in tests)
Console.WriteLine("{0} -> {1}", test,
String.Join("_", new Regex(@"^(\p{Lu}(?:\p{Lu}*|[\p{Ll}\d]*))*$")
.Match(test)
.Groups[1]
.Captures
.Cast<Capture>()
.Select(c => c.Value.ToUpper())));
Not terribly concise, but it lets you concentrate on defining exactly what a “word” is, instead of struggling with anchors, separators and whatnot. In this case I've defined a word as an uppercase letter followed by either a sequence of uppercase letters or a mix of lowercase letters and digits. If I wanted to separate sequences of digits too, "^(\p{Lu}(?:\p{Lu}*|\p{Ll}*)|\d+)*$" would do the trick. Or, if I wanted the digits to be part of the preceding uppercase word, I'd use "^(\p{Lu}(?:[\p{Lu}\d]*|[\p{Ll}\d]*))*$".
There is no JavaScript answer here, so I may as well add one.
(This uses the regex from @John McDonald.)
var text = "fooBar barFoo";
var newText = text.replace(/([A-Z])([A-Z][a-z])|([a-z0-9])([A-Z])/g, "$1$3_$2$4");
newText.toLowerCase()
The input string looks something like this: OU=TEST:This001. We need to extract "This001", preferably in C#.
What about:
/OU=.*?:(.*)/
Here is how it works:
OU= // Must contain OU=
. // Any character
* // Repeated but not mandatory
? // Ungreedy (lazy) (Don't try to match everything)
: // Match the colon
( // Start to capture a group
. // Any character
* // Repeated but not mandatory
) // End of the group
The / characters are delimiters that mark where the regex starts and ends (and where options can be added). They are not part of the pattern, so leave them out of a C# string.
The captured group will contain This001.
But a simple Substring() would be faster:
yourString.Substring(yourString.IndexOf(":")+1);
Resources :
regular-expressions.info
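Put into C#, the two options from this answer might look like the sketch below. The names AfterColonDemo, WithRegex and WithSubstring are made up for the example, and returning an empty string when there is no match or no colon is an assumption of this sketch. The one-line Substring call above would instead return the whole input when no colon is present, because IndexOf returns -1.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers, not part of the original answer.
public static class AfterColonDemo
{
    public static string WithRegex(string input)
    {
        // Same pattern as above, without the / delimiters.
        Match m = Regex.Match(input, @"OU=.*?:(.*)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static string WithSubstring(string input)
    {
        int colon = input.IndexOf(':');
        // Assumption of this sketch: no colon means nothing to extract.
        return colon >= 0 ? input.Substring(colon + 1) : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(WithRegex("OU=TEST:This001"));     // This001
        Console.WriteLine(WithSubstring("OU=TEST:This001")); // This001
    }
}
```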
"OU=" smells like you're doing an Active Directory or LDAP search and responding to the results. While regex is a brilliant tool, I just wanted to make sure that you're also aware of the excellent System.DirectoryServices.Protocols classes that were made for parsing, filtering and manipulating just this sort of data.
The SearchResult, SearchResultEntry and DirectoryAttribute in particular would be the friends you might be looking for. I don't doubt that you can regex or substring as cleverly as the next guy but it's also nice to have another good tool in the toolbox.
Have you tried these classes?
A solution without regex:
var str = "OU=TEST:This00:1";
var result = str.Split(new char[] { ':' }, 2)[1];
// result == This00:1
Regex vs Split vs IndexOf
Split
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = str.Split(new char[] { ':' }, 2)[1];
sw.Stop();
// sw.ElapsedTicks == 15
Regex
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = (new Regex(":(.*)", RegexOptions.Compiled)).Match(str).Groups[1];
sw.Stop();
// sw.ElapsedTicks == 7000 (Compiled)
IndexOf
var str = "OU=TEST:This00:1";
var sw = new Stopwatch();
sw.Start();
var result = str.Substring(str.IndexOf(":") + 1);
sw.Stop();
// sw.ElapsedTicks == 40
Winner: Split
If OU=TEST: must precede the string you want to match, use this regex:
(?<=OU\s*=\s*TEST\s*:\s*).*
This regex matches all the text after the colon; the text before the colon is only a required prefix and is not part of the match.
You can replace TEST with [A-Za-z]+ to match any run of letters instead of the literal TEST, or with [\w]+ to match any combination of letters, digits and underscores.
\s* means that any amount of whitespace, or none, may appear in that position; remove it if you don't need such a check.
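A small sketch of the lookbehind variant follows, using \w+ in place of TEST as suggested above. LookbehindDemo and AfterOuPrefix are hypothetical names, and the empty-string fallback is an assumption of this example. Because the prefix sits in a lookbehind, the match value itself is the wanted text and no capture group is needed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LookbehindDemo
{
    public static string AfterOuPrefix(string input)
    {
        // The lookbehind requires "OU = <word> :" before the match
        // but does not include it in the result.
        Match m = Regex.Match(input, @"(?<=OU\s*=\s*\w+\s*:\s*).*");
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(AfterOuPrefix("OU=TEST:This001"));   // This001
        Console.WriteLine(AfterOuPrefix("OU = ABC :This001")); // This001
    }
}
```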