C# Regex Replacement Not Working

I'm trying to remove new lines from a text file. Opening the file in Notepad doesn't reveal the line breaks I'm trying to remove (it looks like one big wall of text), but when I open it in Sublime Text I can see them.
In Sublime Text, I can remove the pattern '\n\n' and then the pattern '\n(?!AAD)' with no problem. However, when I run the following code, the resulting text file is unchanged:
public void Format(string fileloc)
{
string str = File.ReadAllText(fileloc);
File.WriteAllText(fileloc + "formatted", Regex.Replace(Regex.Replace(str, "\n\n", ""), "\n(?!AAD)", ""));
}
What am I doing wrong?

If you do not want to spend hours trying to re-adjust the code for various types of linebreaks, here is a generic solution:
string str = File.ReadAllText(fileloc);
File.WriteAllText(fileloc + "formatted",
Regex.Replace(Regex.Replace(str, @"(?>\r\n|\n|\r){2}", ""), @"(?>\r\n|\n|\r)(?!AAD)", "")
);
Details:
A linebreak can be matched with (?>\r\n|\n|\r): a CRLF pair, a lone LF, or a lone CR. The group is atomic, so a CRLF pair is never split by backtracking and counted as two linebreaks, which a plain (?:\r?\n|\r) allows. To match 2 consecutive linebreaks, a limiting quantifier can be appended: (?>\r\n|\n|\r){2}.
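To see how much the choice of linebreak pattern matters on CRLF input, here is a small sketch that runs the same two-pass replacement with a non-atomic and an atomic pattern. The class and member names (LineBreakDemo, Loose, Atomic, Strip) are made up for this illustration, and the sample text is invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Non-atomic: the engine may backtrack and treat CR and LF as two breaks.
    public const string Loose = @"(?:\r?\n|\r)";
    // Atomic: once CRLF is consumed as one break, it is never split again.
    public const string Atomic = @"(?>\r\n|\n|\r)";

    // Removes pairs of line breaks, then every line break not followed by "AAD".
    public static string Strip(string text, string lineBreak)
    {
        string noBlankLines = Regex.Replace(text, lineBreak + "{2}", "");
        return Regex.Replace(noBlankLines, lineBreak + "(?!AAD)", "");
    }

    // Makes CR and LF visible when printed to the console.
    private static string Show(string text)
    {
        return text.Replace("\r", "\\r").Replace("\n", "\\n");
    }

    public static void Main()
    {
        string sample = "one\r\n\r\ntwo\r\nAAD three";
        Console.WriteLine(Show(Strip(sample, Loose)));
        Console.WriteLine(Show(Strip(sample, Atomic)));
    }
}
```

With the non-atomic pattern the single CRLF in front of AAD is also removed, because {2} can match its CR and LF as two separate breaks; the atomic pattern leaves that break in place.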

An empirical solution. Opening your sample file in binary mode revealed that it contains 0x0D characters, which are carriage returns \r. So I came up with this (multiple lines for easier debugging):
public void Format(string fileloc)
{
var str = File.ReadAllText(fileloc);
var firstround = Regex.Replace(str, @"\r\r", "");
var secondround = Regex.Replace(firstround, @"\r(?!AAD)", "");
File.WriteAllText(fileloc + "formatted", secondround);
}
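If you would rather not open the file in a hex viewer, a rough sketch like the following can report which kinds of line endings a text contains. LineEndingReport and Count are hypothetical names, and the file path is assumed to arrive as the first command-line argument.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class LineEndingReport
{
    // Counts CRLF pairs, LF not preceded by CR, and CR not followed by LF.
    public static (int CrLf, int Lf, int Cr) Count(string text)
    {
        int crlf = Regex.Matches(text, @"\r\n").Count;
        int lf = Regex.Matches(text, @"(?<!\r)\n").Count;
        int cr = Regex.Matches(text, @"\r(?!\n)").Count;
        return (crlf, lf, cr);
    }

    public static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the file to inspect.
        string text = File.ReadAllText(args[0]);
        var counts = Count(text);
        Console.WriteLine($"CRLF: {counts.CrLf}, LF: {counts.Lf}, CR: {counts.Cr}");
    }
}
```

A file that reports only lone CR characters would explain why a pattern built around \n never matches.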

Is this possibly a Windows/Linux line-ending mismatch? Try replacing '\r\n' instead of '\n'.

Related

How to add suffix at each line of large string

I have written code to add a suffix at the end of each line of a multi-line string, but it only appends the suffix at the end of the whole string. I am a beginner. Can somebody help me understand where I went wrong? Here is my code:
protected void Prefix_Suffix_Btn_Click(object sender, EventArgs e)
{
String txt_input = Input_id.InnerText.ToString().Trim();
String txt_suffix = Suffix_id.InnerText.ToString().Trim();
String txt_output = Output_id.InnerText.ToString().Trim();
txt_input = txt_input.Replace(txt_suffix + "\n", "\n");
txt_input = txt_input + txt_suffix;
Output_id.InnerText = txt_input;
}
Input:
Line1
Line2
Line3
Desired output:
Line1AppendedText
Line2AppendedText
Line3AppendedText

Let's Split the text into lines, append the suffix to each line and, finally, Join the lines back into a single string:
string source = string.Join(Environment.NewLine,
"Line1",
"Line2",
"Line3");
// Let's have a look at the initial string;
Console.WriteLine(source);
Console.WriteLine();
string result = string.Join(Environment.NewLine, source
.Split(new string[] { Environment.NewLine }, StringSplitOptions.None)
.Select(line => line + "AppendedText"));
Console.Write(result);
Outcome:
Line1
Line2
Line3
Line1AppendedText
Line2AppendedText
Line3AppendedText

The string that comes out of your Input_id.InnerText consists of many lines. So if you want to append to each line, you need to think of a way to treat those lines separately.
A line end is denoted by the character '\n'. It looks like 2 characters to you, but the engine treats it as one: line end.
What you can do is split (break up) this string into multiple strings, cutting it wherever you find a '\n'. You can do this as follows:
var lines = Input_id.InnerText.ToString().Split('\n');
Now lines contains an array of strings, each item in there containing a line of the input.
Now you could create a new string that will be built up by your split array as follows:
var newString = "";
foreach(var line in lines) {
newString += line + "<appendText>\n"; //note how we add the \n again since those disappeared by splitting
}
Now newString will contain the new string with the text appended to each line.
A much shorter answer would be, for instance, to use the Replace function like this:
var newString = Input_id.InnerText.ToString().Replace("\n", "<AppendedText>\n");
Note that this only inserts the text before each '\n', so a last line without a trailing '\n' is left unchanged. There are many ways to do what you want.
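As a sketch of the split-and-rebuild idea that also copes with Windows line endings and with a last line that has no trailing line break, something like this could work. LineSuffixer and AppendToEachLine are names invented for the example, and joining with Environment.NewLine is an assumption about the desired output.

```csharp
using System;
using System.Linq;

public static class LineSuffixer
{
    public static string AppendToEachLine(string text, string suffix)
    {
        // "\r\n" is listed first so that a CRLF pair counts as one separator
        // instead of being split into an empty line between CR and LF.
        string[] lines = text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
        // Every line, including the last one, gets the suffix.
        return string.Join(Environment.NewLine, lines.Select(line => line + suffix));
    }

    public static void Main()
    {
        string input = "Line1\r\nLine2\nLine3";
        Console.WriteLine(AppendToEachLine(input, "AppendedText"));
    }
}
```

Because the suffix is added per array item rather than per '\n', no separate step is needed for the final line.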

You just made a mistake when passing your values into the Replace() method. The documentation for String.Replace() defines it like this:
public string Replace (string oldValue, string newValue);
The first argument ("oldValue") should be the thing you want to replace. The second argument ("newValue") should be the thing you want to change it to. You've just got them the wrong way round. You're asking it to replace the new text (suffix and newline) with the old text (just the newline), which clearly it can't do because the suffix text doesn't exist in the string yet - and it wouldn't be logical even if it worked.
Change
txt_input = txt_input.Replace(txt_suffix + "\n", "\n");
to
txt_input = txt_input.Replace("\n", txt_suffix + "\n");
and you should be fine. As other answers alluded to, there may be nicer ways of achieving the same output, but in terms of fixing your original code this is all you should need to do.
Here's a live demo (just using console output instead of HTML elements): https://dotnetfiddle.net/jnzgUy

How to concatenate the whole text of a file into one string, avoiding empty lines between strings

How do I get the whole text of a document concatenated into one string? I'm trying to split the text by dot: string[] words = s.Split('.'); I want to take this text from a text document. But if my text document contains empty lines between strings, for example:
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
but the desired output should look like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So to do this, I first need to process the text file content to get the whole text as a single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't do this the same way as I would with list content, for example: string concat = String.Join(" ", text.ToArray());
I'm not sure how to concatenate the text of a text document into a string.

I think this is what you want:
var fileLocation = @"c:\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
//replace Environment.NewLine with whatever new line sequence your file uses;
//a space is substituted so that words on adjacent lines do not run together
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, " ");
//modify to remove any unwanted character
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
var withoutTwoSpaces = withoutUglyCharacters.Replace("  ", " ");
var result = withoutTwoSpaces.Split('.').Where(i => !string.IsNullOrWhiteSpace(i)).Select(i => i.TrimStart()).ToList();
So first you read all the text from your file, then you remove all unwanted characters, and then you split by '.' and return the non-empty items.
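The read, flatten, split sequence can also be sketched without knowing which line ending the file uses, by collapsing every run of whitespace into one space. SentenceSplitter and Split are hypothetical names; punctuation stripping is deliberately left out here, so commas and quotes stay in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    public static List<string> Split(string text)
    {
        // \s+ covers spaces, tabs, CR, LF and blank lines in one pass.
        string singleLine = Regex.Replace(text, @"\s+", " ");
        return singleLine
            .Split('.')
            .Select(part => part.Trim())
            .Where(part => part.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        string text = "one two.\r\n\r\nthree,\nfour. five.";
        int number = 1;
        foreach (string sentence in Split(text))
        {
            Console.WriteLine(number + ". " + sentence);
            number++;
        }
    }
}
```

Passing the result of File.ReadAllText to Split would apply the same steps to a file.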

Have you tried replacing double new-lines before splitting using a period?
static string[] GetSentences(string filePath) {
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
return sentences;
}
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
throw new FileNotFoundException($"Could not find file { filePath }!");
Throws an exception if the file could not be found. It is advisable to wrap the method call in a try/catch.
var lines = string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line)));
Creates a string, and ignores any lines which are purely whitespace or empty.
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
Creates a string array, where the string is split at every period and whitespace following the period.
E.g:
The string "I came. I saw. I conquered" would become
I came
I saw
I conquered
Update:
Here's the method as a one-liner, if that's your style?
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join("", File.ReadLines(filePath).Where(line => !string.IsNullOrEmpty(line) && !string.IsNullOrWhiteSpace(line))), @"\.[\s]{1,}?") : null;

I would suggest iterating through all the characters and checking whether each one is in the range 'a' <= char <= 'z' or is a space. If it matches the condition, add it to the newly created string; otherwise check whether it is the '.' character, and if it is, end the current line and start another one:
List<string> lines = new List<string>();
string line = string.Empty;
foreach(char c in str)
{
if((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20)
line += c;
else if(c == '.')
{
lines.Add(line.Trim());
line = string.Empty;
}
}
Or if you prefer "one-liner"s :
IEnumerable<string> lines = new string(str.Select(c => (char)(((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20) ? c : c == '.' ? '\n' : '\0')).ToArray()).Split('\n').Select(s => s.Trim());

I may be wrong about this, but I would think that you may not want to alter the string if you are only splitting it. For example, there are curly double and single quotes (“ ” ’) in part of the string, and removing them may not be desired. That raises a further question: reading a text file that contains such quotes (as your example data shows) like below:
var stringFromFile = File.ReadAllText(fileLocation);
will not display those characters properly in a text box or the console if the file was saved in the system's ANSI code page, because ReadAllText decodes the file as UTF-8 by default. The quotes then show up as replacement characters: diamonds in a text box on a form, and question marks (?) on the console. To keep the quotes and have them display properly, you can pass the OS's current ANSI encoding as a second parameter to ReadAllText, like below (this applies to .NET Framework; on .NET Core and later, Encoding.Default is UTF-8):
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
Below is code using a simple Split call to split the string on periods (.). Hope this helps.
private void button1_Click(object sender, EventArgs e) {
string fileLocation = @"C:\YourPath\YourFile.txt";
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
string bigString = stringFromFile.Replace(Environment.NewLine, "");
string[] result = bigString.Split('.');
int count = 1;
foreach (string s in result) {
if (s != "") {
textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
Console.WriteLine(count + ". " + s.Trim());
count++;
}
else {
// period at the end of the string
}
}
}

How to strip a string from the point a hyphen is found within the string C#

I'm currently trying to strip part of a string that may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The current C# code I'm trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.

How about just doing this?
stringin.Split('-')[0].Trim();
You could even cap the number of substrings at two, using an overload of Split that takes a count, so that the string is cut only at the first hyphen:
stringin.Split(new[] { '-' }, 2)[0].Trim();

Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this
stringin.Substring(0, stringin.IndexOf("-") - 1)
which throws an ArgumentOutOfRangeException: there is no hyphen to find, so IndexOf returns -1 and the requested length becomes -2.
Make a simple change to your regex and it works with or without a hyphen: ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
here -------------------------^

It seems that you want the substring after the dash ('-') if the original string contains one, or the original string if it doesn't.
If that's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);

I wouldn't use a regex for this case if I were you; it makes the solution more complex than it needs to be.
For a string that I am sure will have no more than a couple of dashes, I would use this code, because it is a one-liner and very simple:
string str= entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it creates a large number of strings even though you are only looking for the first one. In that case, the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1? entryString.Substring(0, firstDashIndex) : entryString;

You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, and you can clean the string up from there.
This is also a great place to start writing unit tests. They are very good for stuff like this.
Here's what the code could look like:
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
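To make the unit-test suggestion concrete, here is a minimal sketch that wraps the IndexOf approach in a method and checks a few inputs by hand. HyphenTrimmer, BeforeHyphen and Check are hypothetical names, trimming the result is an assumption based on the "test - 9894" example in the question, and a real project would more likely use a test framework than a Main method.

```csharp
using System;

public static class HyphenTrimmer
{
    // Returns the trimmed text before the first hyphen,
    // or the whole trimmed input when there is no hyphen.
    public static string BeforeHyphen(string input)
    {
        int hyphenIndex = input.IndexOf('-');
        string head = hyphenIndex >= 0 ? input.Substring(0, hyphenIndex) : input;
        return head.Trim();
    }

    private static void Check(string actual, string expected)
    {
        if (actual != expected)
            throw new InvalidOperationException($"expected '{expected}' but got '{actual}'");
    }

    public static void Main()
    {
        Check(BeforeHyphen("test - 9894"), "test");   // hyphen with surrounding spaces
        Check(BeforeHyphen("test"), "test");          // no hyphen at all
        Check(BeforeHyphen("ho-something"), "ho");    // hyphen without spaces
        Check(BeforeHyphen("a-b-c"), "a");            // only the first hyphen counts
        Console.WriteLine("all checks passed");
    }
}
```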

Splitting strings using Environment.NewLine leaves \n in most array items?

I used MyString.Split(Environment.NewLine.ToCharArray()[0]) to split my string from a file into different pieces. But after I did that, every item in the array except the first one starts with \n. I know the way I'm splitting by newlines is kind of "cheaty", for lack of a better word, so if there is a better way of doing this, please tell me...
Here is the file...

If you want to keep using .Split() instead of reading the file one line at a time, you can do...
var splitResult = MyString.Split( new string[]{ System.Environment.NewLine },
System.StringSplitOptions.RemoveEmptyEntries );
/* or System.StringSplitOptions.None if you want empty results as well */
EDIT:
The problem you were having is that in a non-Unix environment the new-line "character" is actually two characters. So when you grabbed the zero index you were actually splitting on a carriage return (\r), not the new-line character (\n).
Windows = "\r\n"
Unix = "\n"
Per http://msdn.microsoft.com/en-us/library/system.environment.newline.aspx
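The effect is easy to reproduce with a short sketch. The Windows sequence is written out as "\r\n" here instead of Environment.NewLine so the result does not depend on the platform the code runs on; SplitDemo and its method names are made up for the example.

```csharp
using System;

public static class SplitDemo
{
    // Mirrors the question: split on the first character of the newline sequence.
    public static string[] SplitOnFirstNewLineChar(string text)
    {
        char first = "\r\n".ToCharArray()[0];   // this is '\r'
        return text.Split(first);
    }

    // Splits on the complete two-character sequence instead.
    public static string[] SplitOnWholeNewLine(string text)
    {
        return text.Split(new[] { "\r\n" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string text = "a\r\nb\r\nc";
        foreach (string piece in SplitOnFirstNewLineChar(text))
            Console.WriteLine("[" + piece.Replace("\n", "\\n") + "]");
        foreach (string piece in SplitOnWholeNewLine(text))
            Console.WriteLine("[" + piece + "]");
    }
}
```

The first loop shows the leftover \n at the start of every item after the first; the second loop shows clean items.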

A newline in Windows is two characters (\r and \n). The Environment.NewLine.ToCharArray()[0] expression specifies only one of those characters: \r. Therefore, the other character (\n) remains as a portion of the split string.
May I suggest you read your file using something like this:
public IEnumerable<string> ReadFile(string filePath)
{
using (StreamReader reader = new StreamReader(filePath))
{
string line;
while ( (line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
You might need more error handling, or to specify a different file open option, or to pass a stream to the method rather than the path, but the idea of using an iterator over the ReadLine() method is sound. The result is you can just use code like this:
foreach (string line in ReadFile(" ... my file path ... "))
{
}

How to format the given xml into single line (without spaces)

Using C#, how can I format a given XML file into a single line (without spaces)?
My output contains stray symbols wherever there are spaces and new lines.

Use this:
public static string StripXmlWhitespace(string Xml)
{
Regex Parser = new Regex(@">\s*<");
Xml = Parser.Replace(Xml, "><");
return Xml.Trim();
}

You can use string's Replace method to format xmlString and then save it to output:
string singleLineXml = xml.Replace(System.Environment.NewLine, " ");
or
string singleLineXml = xml.Replace("\r\n", " ");
After removing line breaks, remove spaces:
singleLineXml.Remove(' ');
Yes @Steve Wellens, Remove(' ') is a bad idea. Let's try this instead (Replace returns a new string, so the result has to be assigned):
singleLineXml = singleLineXml.Replace("> <", "><");
I also found a related thread that may help: Writing string to XML file without formatting (C#)
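As an alternative to string replacement, a sketch along these lines lets the XML parser decide which whitespace is insignificant. It assumes the input is well-formed XML and that LINQ to XML (System.Xml.Linq) is available; XmlMinifier and ToSingleLine are names chosen for the example. As far as I know, XDocument.ToString() leaves out the XML declaration, so add it back yourself if you need it.

```csharp
using System;
using System.Xml.Linq;

public static class XmlMinifier
{
    public static string ToSingleLine(string xml)
    {
        // Parse drops whitespace-only nodes between elements unless
        // LoadOptions.PreserveWhitespace is passed; DisableFormatting then
        // writes the tree without indentation or line breaks.
        return XDocument.Parse(xml).ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string xml = "<root>\r\n  <item>1</item>\r\n  <item>two words</item>\r\n</root>";
        Console.WriteLine(ToSingleLine(xml));
    }
}
```

Unlike a blanket Replace, this keeps the spaces inside element text such as "two words".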
