C# regular expression match triple quotes """ - c#

I have a text file that contains three quotation marks (""") on various lines. Each occurrence is preceded by six spaces.
I have tried @"\s{6}\"{3}"; and various other forms, but C# doesn't seem to like three quotation marks in a row. What I'm trying to do is find that sequence and add a new line after it.
This is what I have tried:
string pattern4 = @"\s{6}"{3}";
var match4 = Regex.Match(body, pattern4, RegexOptions.Multiline);
while (match4.Success)
{
string index = """;
output.Insert(index, "\r\n");
}
Sample Input:
"""Step: 33 And I enter
Step: 34 And I set the
Desired Output:
"""
Step: 33 And I enter
Step: 34 And I set the

To escape a quote inside a verbatim string (one that starts with @), double it (""). There is also a Regex.Replace method that you could use like this:
string input = @"      """"""Step: 33 And I enter
Step: 34 And I set the ";
string pattern = @"\s{6}""{3}";
string replacement = "\"\"\"\r\n";
string output = Regex.Replace(input, pattern, replacement);
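The two ways of writing the pattern can be compared side by side. The following is a minimal sketch, not part of the original answer: the class and method names (TripleQuoteDemo, BreakAfterTripleQuote) are made up for illustration, and the sample line assumes exactly six leading spaces as described in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class TripleQuoteDemo
{
    // Verbatim literal: each "" inside @"..." stands for one literal quote.
    public const string VerbatimPattern = @"\s{6}""{3}";

    // Regular literal: the backslash and the quote are both backslash-escaped.
    public const string RegularPattern = "\\s{6}\"{3}";

    // Replaces six whitespace characters followed by """ with """ and a line break.
    public static string BreakAfterTripleQuote(string text)
    {
        return Regex.Replace(text, VerbatimPattern, "\"\"\"\r\n");
    }

    public static void Main()
    {
        // Both literals contain the same regex text: \s{6}"{3}
        Console.WriteLine(VerbatimPattern == RegularPattern);

        // Assumed sample line with six leading spaces.
        Console.WriteLine(BreakAfterTripleQuote("      \"\"\"Step: 33 And I enter"));
    }
}
```

Note that the replacement drops the six leading spaces, as in the answer above. If they should be kept, one option would be to capture them in a group and refer to it in the replacement string.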

Related

Locate RegEx match then extract

I am trying to read text from a RichTextBox in order to locate the first occurrence of a matched expression. I would then like to extract the string that satisfies the query so I can use it as a variable. Below is the basic bit of code I have to start off with and build upon.
private string returnPostcode()
{
string[] allLines = rtxtDocViewer.Text.Split('\n');
string expression = "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$";
foreach (string line in allLines)
{
if (Regex.Matches(line, expression, RegexOptions.Count > 0)
{
//extract and return the string that is found
}
}
}
Example of what's contained in the RichTextBox is below. I want to extract "E12 8SD" which the above regex should be able to find. Thanks
Damon Brown
Flat B University Place
26 Park Square
London
E12 8SD
Mobile: 1111 22222
Email: dabrown192882@gmail.com Date of birth: 21/03/1986
Gender: Male
Marital Status: Single
Nationality: English
Summary
I have acquired a multifaceted skill set with experience using several computing platforms.
You need to use Regex.IsMatch and remove the RegexOptions.Count > 0:
string[] allLines = s.Split('\n');
string expression = "^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([AZa-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$";
foreach (string line in allLines)
{
if (Regex.IsMatch(line, expression)) // Regex.IsMatch will check if a string matches the regex
{
Console.WriteLine(line); // Print the matched line
}
}
See the IDEONE Demo
It is quite possible that your text contains CR+LF line breaks. If so, adjust your code as follows:
string[] allLines = s.Split(new[] {"\r\n"}, StringSplitOptions.RemoveEmptyEntries);
See this demo
UPDATE
To just extract the code with your regex, you need not split the contents into lines, just use a Regex.Match on the whole text:
string s = "Damon Brown\nFlat B University Place\n26 Park Square \nLondon\nTW1 1AJ Twickenham Mobile: +44 (0) 7711223344\nMobile: 1111 22222\nEmail: dabrown192882@gmail.com Date of birth: 21/03/1986\nGender: Male\nMarital Status: Single\nNationality: English\nSummary\nI have acquired a multifaceted skill set with experience using several computing platforms.";
string expression = @"(?i)\b(gir 0a{2})|((([a-z][0-9]{1,2})|(([a-z][a-hj-y][0-9]{1,2})|(([a-z][0-9][a-z])|([a-z][a-hj-y][0-9]?[a-z])))) [0-9][a-z]{2})\b";
Match res = Regex.Match(s, expression);
if (res.Success)
Console.WriteLine(res.Value); // => TW1 1AJ
I also removed the uppercase ranges to replace them with a case-insensitive modifier (?i).
See this IDEONE demo.

Replace special characters or special characters followed by space

I have this particular string:
Administrationsomkostninger I -2.889 - r0.l l0
I would like to replace these characters: r, l and i with 1.
I use this expression:
([(t|r|l|i|)])
That gives me this string:
Adm1n1s11a11onsomkos1n1nge1 1 -2.889 - 10.1 10
Now I want to fix all numbers that contain a digit followed by whitespace,
so in this case only - 10.1 10 gets converted to -10.110.
Try this:
string input = "Administrationsomkostninger I -2.889 - r0.l l0";
string pattern = @"(?'spaces'\s){2,}";
string output = Regex.Replace(input, pattern, " ");

how to replace a string having single quote with some characters in C#

I need code that checks whether a string contains a single quote ' before a character and, if so, replaces that single quote with two single quotes ''.
Example:
input = "test's"
output = "test''s"
input = "test'"
output = "test'"
input = "test' "
output = "test' "
Use a positive lookahead to check whether the next character is a word character:
string input = "test's";
var result = Regex.Replace(input, @"'(?=\w)", @"""");
This code uses a regular expression to replace each match in the input string with a double quote. The pattern to match is '(?=\w). It consists of a single quote and a positive lookahead for the next character (the character itself is not included in the match). If a match is found (i.e. the input contains a single quote followed by a word character), the quote is replaced with the given string (a double quote in this case).
UPDATE: After your edit and comments, the correct replacement should look like this:
var result = Regex.Replace(input, "'(?=[a-zA-Z])", "''");
Inputs:
"test's"
"test'"
"test' "
"test'42"
"Mr Jones' test isn't good - it's bad"
Outputs:
"test''s"
"test'"
"test' "
"test'42"
"Mr Jones' test isn''t good - it''s bad"
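The updated pattern can be checked against the listed inputs with a small program. This is a sketch only: the class and method names (QuoteDoubler, DoubleQuoteBeforeLetter) are invented for illustration, while the pattern and replacement are the ones from the update above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class QuoteDoubler
{
    // Doubles a single quote only when it is directly followed by an ASCII letter.
    // The lookahead (?=[a-zA-Z]) checks the next character without consuming it.
    public static string DoubleQuoteBeforeLetter(string input)
    {
        return Regex.Replace(input, "'(?=[a-zA-Z])", "''");
    }

    public static void Main()
    {
        string[] samples =
        {
            "test's",
            "test'",
            "test' ",
            "test'42",
            "Mr Jones' test isn't good - it's bad"
        };

        foreach (string sample in samples)
        {
            // Print the input and the result side by side.
            Console.WriteLine("{0} -> {1}", sample, DoubleQuoteBeforeLetter(sample));
        }
    }
}
```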
Try it this way:
input = input.Replace("'", "\"");

String Splitting whenever there is a space and next character is a numeral

I want to split a string whenever there is space and next character is numeral.
For example input string:
Golden State 97 Indiana 108 (FINAL)
It should be split as:
string[0]:Golden State
string[1]:97
string[2]:Indiana
string[3]:108
string[4]:FINAL
Please help me out with this.
Thank you
If you use the regular expression
var regex = new Regex(@"^(.*) (\d+) (.*) (\d+) \((.*)\)$");
then you can get your values as
var m = regex.Match("Golden State 97 Indiana 108 (FINAL)");
if (m.Success)
{
var string0 = m.Groups[1].Value; // Golden State
var string1 = m.Groups[2].Value; // 97
var string2 = m.Groups[3].Value; // Indiana
var string3 = m.Groups[4].Value; // 108
var string4 = m.Groups[5].Value; // FINAL
}
(as long as none of the team names have any groups of digits in!)
You could do this via splitting, but there's no need.
I'd recommend a regular expression to parse the string.
Here is one I've quickly created
"^([a-zA-Z\s]+) (\d+) ([a-zA-Z\s]+) (\d+) (.+)$"
This will match the various parts assuming it is always in the shape you've described above.
Read up about the RegEx class on MSDN. The parts of my RegEx in brackets are called groups. Once you've matched the string against the RegEx, you can access the Groups and each group will contain a part of the string e.g. "Golden State".
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
Yet another solution with regular expressions. This is a bit more generic than the other solutions, because the input format is not fixed.
var input = "Golden State 97 Indiana 108 (FINAL)";
var regex = new Regex(@"(?:[a-z][a-z ]*)?[a-z]|\d+", RegexOptions.IgnoreCase);
var values = regex.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
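For completeness, the splitting approach mentioned above can also be sketched with Regex.Split. This is an illustrative sketch rather than one of the posted answers: the names ScoreSplitter and SplitAroundNumbers are hypothetical, and trimming the parentheses from the last part is an assumption based on the desired output in the question. Like the first answer, it assumes team names contain no digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class ScoreSplitter
{
    // Splits on a space that is either preceded by a digit or followed by a digit.
    // The lookarounds consume nothing, so only the space itself is removed.
    public static string[] SplitAroundNumbers(string input)
    {
        return Regex.Split(input, @"(?<=\d) | (?=\d)")
                    .Select(part => part.Trim('(', ')')) // assumption: drop the parentheses around FINAL
                    .ToArray();
    }

    public static void Main()
    {
        string[] parts = SplitAroundNumbers("Golden State 97 Indiana 108 (FINAL)");
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine("string[{0}]:{1}", i, parts[i]);
        }
    }
}
```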

C# - Removing a Line that matches a Regex

I have some data.. it looks similar to this:
0423 222222 ADH, TEXTEXT
0424 1234 ADH,MORE TEXT
0425 98765 ADH, TEXT 3609
2000 98765-4 LBL,IUC,PCA,S/N
0010 99999-27 LBL,IUI,1.0x.25
9000 12345678 HERE IS MORE, TEXT
9010 123-123 SOMEMORE,TEXT1231
9100 SD178 YAYFOR, TEXT01
9999 90123 HEY:HOW-TO DOTHIS
And I would like to remove each entire line that begins with 9xxx. So far I have tried replacing the value using Regex. Here is what I have for that:
output = Regex.Replace(output, @"^9[\d]{3}\s+[\d*\-*\w*]+\s+[\d*\w*\-*\,*\:*\;*\.*\d*\w*]+", "");
However, this is really hard to read and it actually does not delete the entire line.
CODE:
Here is the section of the code I am using:
try
{
// Resets the formattedTextRichTextBox so multiple files aren't loaded on top of each other.
formattedTextRichTextBox.ResetText();
foreach (string line in File.ReadAllLines(openFile.FileName))
{
// Uses regular expressions to find a line that has, digit(s), space(s), digit(s) + letter(s),
// space(s), digit(s), space(s), any character (up to 25 times).
Match theMatch = Regex.Match(line, @"^[\.*\d]+\s+[\d\w]+\s+[\d\-\w*]+\s+.{25}");
if (theMatch.Success)
{
// Stores the matched value in string output.
string output = theMatch.Value;
// Replaces the text with the required layout.
output = Regex.Replace(output, @"^[\.*\d]+\s+", "");
//output = Regex.Replace(output, @"^9[\d]{3}\s+[\d*\-*\w*]+\s+[\d*\w*\-*\,*\:*\;*\.*\d*\w*]+", "");
output = Regex.Replace(output, @"\s+", " ");
// Sets the formattedTextRichTextBox to the string output.
formattedTextRichTextBox.AppendText(output);
formattedTextRichTextBox.AppendText("\n");
}
}
}
OUTCOME:
So what I would like the new data to look like is in this format (removed 9xxx):
0423 222222 ADH, TEXTEXT
0424 1234 ADH,MORE TEXT
0425 98765 ADH, TEXT 3609
2000 98765-4 LBL,IUC,PCA,S/N
0010 99999-27 LBL,IUI,1.0x.25
QUESTIONS:
Is there an easier way to go about this?
If so, can I use regex to go about this or must I use a different way?
Just reformulate the regex that tests your format to match everything that doesn't begin with 9 - that way lines starting with 9 are not added to the rich text box.
Try this (uses LINQ):
// Create a regex to identify lines that start with 9XXX
Regex rgx = new Regex(@"^9\d{3}");
// Below is the LINQ expression that filters out the lines that start with 9XXX
var validLines =
(
// The following line specifies which enumeration to pick the data from
from ln in File.ReadAllLines(openFile.FileName)
// The following line specifies the filter applied when selecting the data
where !rgx.IsMatch(ln)
// The following line specifies what to select from the filtered data
select ln
).ToArray(); // Converts the IEnumerable<string> into an array of strings (ln in the expression above is a string)
// Finally, join the filtered entries with \n using String.Join and append the result to the text box
formattedTextRichTextBox.AppendText(String.Join("\n", validLines));
Yes, there is a simpler way. Just use the Regex.Replace method and provide the Multiline option.
Why don't you just match the leading 9xxx part and then use a wildcard to match the rest of the line? It would be a lot more readable.
output = Regex.Replace(output, @"^9\d{3}.*", "");
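The Multiline suggestion can be sketched on a whole block of text instead of one line at a time. This is a hedged example, not the posters' code: the names LineRemover and RemoveLinesStartingWith9 are hypothetical, and the pattern also consumes the trailing line break so that no empty lines are left behind, which is an addition to the pattern shown above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class LineRemover
{
    // With RegexOptions.Multiline, ^ matches at the start of every line.
    // The pattern removes the 9xxx line together with its line break
    // (\r\n or \n), or up to the end of the text for the last line.
    public static string RemoveLinesStartingWith9(string text)
    {
        return Regex.Replace(text, @"^9\d{3}.*(?:\r?\n|$)", "", RegexOptions.Multiline);
    }

    public static void Main()
    {
        string text =
            "0423 222222 ADH, TEXTEXT\n" +
            "9000 12345678 HERE IS MORE, TEXT\n" +
            "0010 99999-27 LBL,IUI,1.0x.25\n" +
            "9999 90123 HEY:HOW-TO DOTHIS";

        Console.Write(RemoveLinesStartingWith9(text));
    }
}
```

Compared with the line-by-line LINQ filter, this keeps everything in a single string, which may be convenient when the text is already loaded into memory.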
