I have a CSV file. It is not very big; the problem is this: every line ends with the two characters CR LF.
Unfortunately, in one single record there is a column with an LF character in the middle. When I try to import the file, this character causes a conflict.
The line looks like this in Notepad++:
text1, text2,te(LF)
xt3, text4 (CR LF)
And I need this:
text1, text2,text3, text4 (CR LF)
Now, my question is: how can I delete this character in C# without affecting the end of the row?
Regards
Try this code:
string result = Regex.Replace(text, @"([^\r])\n", "$1");
This replaces every LF that does not come directly after a CR with the character that preceded it, so the stray LF disappears while the CR LF row endings stay intact.
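A minimal, self-contained sketch of the same idea. It swaps the capture group for a negative lookbehind, `(?<!\r)\n`, so only the LF itself is matched; as a side effect it also removes an LF at the very start of the text and both LFs of a pair like `\n\n`, which `([^\r])\n` can miss. `RemoveLoneLf` is a hypothetical helper name, and the sample line is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// RemoveLoneLf is a hypothetical helper name.
// Removes every LF that is not immediately preceded by a CR,
// leaving the CR LF row terminators untouched.
static string RemoveLoneLf(string text) =>
    Regex.Replace(text, @"(?<!\r)\n", string.Empty);

string line = "text1, text2,te\nxt3, text4\r\n";
Console.Write(RemoveLoneLf(line)); // text1, text2,text3, text4 (followed by CR LF)
```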
Delete all LFs, then replace every CR with CR LF. string.Replace is enough for this.
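A sketch of that two-step `string.Replace` approach. It assumes a CR never occurs on its own inside a field; if one did, it would also be turned into a row break. `FixLineEndings` is a hypothetical helper name.

```csharp
using System;

// FixLineEndings is a hypothetical helper name.
// Step 1: drop every LF (both the stray ones and the ones in CR LF).
// Step 2: turn every remaining CR back into a CR LF row terminator.
// Assumes no field contains a CR on its own.
static string FixLineEndings(string text) =>
    text.Replace("\n", string.Empty).Replace("\r", "\r\n");

string csv = "text1, text2,te\nxt3, text4\r\nrow2\r\n";
Console.Write(FixLineEndings(csv));
```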
Related
I need to import a multi-line string into a single cell of a CSV file:
string message = @"This is 1st string.
This is 2nd string.
This is 3rd string.";
When I try to import the string into the CSV, it is split across multiple rows.
If I remove the newline characters from message, the string is added as a single line.
I need to add the message as it is into a single cell (a multi-line string in a single cell).
I tried the following code:
var regex = new Regex(@"\r\n?|\t", RegexOptions.Compiled);
message = regex.Replace(message, String.Empty);
char lf = (char)10;
message = message.Replace("\n", lf.ToString());
But the message is still added to multiple rows in the CSV.
If the string contains HTML, you should also remove the <br> tags, and then you could replace the line breaks like this:
.Replace("<br>", "");
.Replace(Environment.NewLine, "");
I'm not sure whether the regex replaces them one by one, but also try this:
.Replace("\r", "")
and then:
.Replace("\n", "")
Hope it works.
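Put together, those replacements might look like the sketch below. It assumes the HTML line break is written exactly as `<br>` (variants such as `<br/>` or `<BR>` would need their own `Replace` calls), and `Flatten` is a hypothetical helper name. Note that this joins the lines into one, as the answer proposes, rather than keeping them as separate lines inside the cell.

```csharp
using System;

// Flatten is a hypothetical helper name.
// Removes HTML <br> tags (exact spelling assumed) and all CR / LF
// characters so the message becomes a single line.
static string Flatten(string message) =>
    message.Replace("<br>", string.Empty)
           .Replace("\r", string.Empty)
           .Replace("\n", string.Empty);

string message = "This is 1st string.\r\nThis is 2nd string.<br>\nThis is 3rd string.";
Console.WriteLine(Flatten(message));
```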
Using \r\n?|\t, you replace each carriage return, optionally followed by a line feed, but also every tab (the |\t means "or a tab").
I guess you probably want to replace \r\n?\t*, i.e. a carriage return, optionally followed by a line feed, followed by any number of tabs, and replace that whole block.
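The difference between the two patterns can be checked with a short sketch. Both only match line breaks that start with a CR, so text that uses bare LF line endings passes through unchanged; the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample: one tab inside a line, one tab right after a line break.
string text = "col1\tcol2\r\n\tcol3";

// \r\n?|\t : removes each CR (with an optional LF) and, separately, every tab.
string a = Regex.Replace(text, @"\r\n?|\t", string.Empty);

// \r\n?\t* : removes a CR, an optional LF and only the tabs that directly follow it.
string b = Regex.Replace(text, @"\r\n?\t*", string.Empty);

// Neither pattern touches a bare LF.
string c = Regex.Replace("line1\nline2", @"\r\n?\t*", string.Empty);

Console.WriteLine(a);
Console.WriteLine(b);
Console.WriteLine(c);
```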
I have the following line, which I want to split by comma.
"Clark Kent，Hello Mr.Wayne，发送于 3:38 PM。"
Sounds easy, right? The problem is that the text does not contain a plain comma character. The commas you see in the text are a single character that looks like a comma combined with a space (just copy and paste the sentence above into your text editor and check it out).
The problem is: I need to split the text by comma. Although I could copy and paste the character and add it as one of my delimiter characters, I am wondering whether I could just convert such text into text that can be split by a regular comma. Don't worry about the Chinese words for now; the same applies to the last character in the text. This behavior arises when my application's language is set to Chinese.
FYI: I thought such a comma was a non-printable/non-ASCII character, but to my surprise, when I printed the text in the console, I got:
Here is my input and expected output:
Input: "Clark Kent，Hello Mr.Wayne，发送于 3:38 PM。"
Expected output: {"Clark Kent", "Hello Mr.Wayne", "发送于 3:38 PM。"}.
The comma you are facing is a 'Fullwidth Comma' (U+FF0C). It is an ordinary Unicode character, which can be replaced by a comma and a space using the string.Replace method:
s = s.Replace("\uFF0C", ", ");
What I suggest (as @Chris suggested in the comments) is to replace the strange comma with a regular comma before splitting:
var s = "Clark Kent\uFF0CHello Mr.Wayne\uFF0C发送于 3:38 PM。";
s = s.Replace('\uFF0C', ',');
var splitted = s.Split(',');
The benefit is that if the text contains the strange comma, it is replaced; otherwise the text is simply split on the regular commas.
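A runnable version of the normalize-then-split idea, using the escape `\uFF0C` so the fullwidth comma stays visible in the source code. The second call is a sketch of the alternative mentioned in the question, passing both characters as delimiters to `Split`; both give the same result on this input.

```csharp
using System;

// Input from the question; \uFF0C is the fullwidth comma.
string s = "Clark Kent\uFF0CHello Mr.Wayne\uFF0C发送于 3:38 PM。";

// Option 1: normalize the fullwidth comma to an ASCII comma, then split.
string[] normalized = s.Replace('\uFF0C', ',').Split(',');

// Option 2: leave the text alone and treat both characters as delimiters.
string[] direct = s.Split(new[] { ',', '\uFF0C' });

foreach (string part in normalized)
    Console.WriteLine(part);
```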
I want to filter a regex with a ... regex ...
My target is in a file whose content is:
...
information 1...
Entity1=^\|1[\s\t]+[\S]+[\s\t]+(.*)$
information 2...
...
The file is read into mystring with File.ReadAllText(path), where path is the path to the text file.
I use the code:
// Retrieve a regex like ^\|1[\s\t]+[\S]+[\s\t]+(.*)$ from Entity1=^\|1[\s\t]+[\S]+[\s\t]+(.*)$
// Entity\d: "Entity" followed by any digit, then =
// (.+)\s: any character, one or more times, followed by a whitespace character
m = Regex.Match(mystring, @"Entity\d=(.+)\s");
string regex = m.Groups[1].Value;
which works almost fine.
What I get (as seen in the debugger) is:
^\|1[\s\t]+[\S]+[\s\t]+(.*)$\r
There is an additional \r at the end of the result. It causes an unwanted extra newline in other parts of the code.
Trying @"Entity\d=(.+)" (i.e. removing the final \s) does not help.
Any idea how to avoid the additional \r gracefully? (If possible, I do not want to track the final \r and remove it.)
Online regex testers like regex101 did not let me foresee this problem before moving to C# code.
Use a negated character class to make sure \r is not matched:
m = Regex.Match(mystring, @"Entity\d=([^\r\n]+)");
The [^\r\n] class means match any character other than a carriage return and a line feed.
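A self-contained check of the negated-class pattern. The `mystring` content below is a made-up stand-in for the file, with CR LF line endings as produced on Windows. In .NET, `.` matches every character except `\n`, which is why the original `(.+)` picked up the `\r`.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up stand-in for File.ReadAllText(path), with CR LF line endings.
string mystring = "information 1...\r\nEntity1=^\\|1[\\s\\t]+[\\S]+[\\s\\t]+(.*)$\r\ninformation 2...\r\n";

// Original pattern: '.' also matches '\r', so the capture ends with it.
Match before = Regex.Match(mystring, @"Entity\d=(.+)\s");

// Negated class: the capture stops at the first CR or LF.
Match after = Regex.Match(mystring, @"Entity\d=([^\r\n]+)");

Console.WriteLine(before.Groups[1].Value.EndsWith("\r", StringComparison.Ordinal)); // True
Console.WriteLine(after.Groups[1].Value);
```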
It is true that regex101 does not keep carriage returns. You can see the \r being matched at regexhero.net:
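Check whether this works:
@"Entity\d=(.+)(?=(\r|\n))";
(?=(\r|\n)) is a positive lookahead, which means that the \r or \n it checks for won't be included in the result.
Edit: with the greedy .+, the \r can still end up in the capture (the lookahead is then satisfied by the \n that follows it), so use a lazy quantifier:
@"Entity\d=(.+?)(?=\r|\n)";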
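Why the edit matters, as a small sketch on a made-up CR LF string: with the greedy `.+` the engine first runs to the end of the line, including the `\r`, and the lookahead is then satisfied by the following `\n`, so the `\r` stays in the capture. The lazy `.+?` stops at the first position where a CR or LF comes next.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample with CR LF line endings.
string mystring = "Entity1=^\\|1(.*)$\r\nnext line\r\n";

// Greedy: '.+' swallows the '\r', then (?=(\r|\n)) matches the '\n'.
string greedy = Regex.Match(mystring, @"Entity\d=(.+)(?=(\r|\n))").Groups[1].Value;

// Lazy: '.+?' stops as soon as the next character is '\r' or '\n'.
string lazy = Regex.Match(mystring, @"Entity\d=(.+?)(?=\r|\n)").Groups[1].Value;

Console.WriteLine(greedy.EndsWith("\r", StringComparison.Ordinal)); // True
Console.WriteLine(lazy.EndsWith("\r", StringComparison.Ordinal));   // False
```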
With C#, I'm now trying to use regular expressions to replace the newline (\n) in a text file with a semicolon (;), but only if the line has some content.
If the text file is:
This is the program
Hello World
Then my result should be:
This is the program;
Hello World;
I'm trying to use:
my_str = Regex.Replace(val, "\n", ";");
But it also affects the lines without content.
Try capturing one or more characters followed by a newline:
(.+)[\r\n]?
\1;\n
C#:
my_str = Regex.Replace(val, "(.+)[\r\n]?", "$1;\n");
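I think something like this will work:
my_str = Regex.Replace(val, @"(?<prev>.+)\n", "${prev};\n");
Note: .NET does not interpret escapes such as \n in the replacement string, so the replacement has to contain an actual newline character (a regular "\n" literal) rather than the escaped "\\n".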
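Both answers rely on `.`, which in .NET also matches `\r`, so with CR LF line endings the semicolon from the first answer ends up after the `\r`. Below is a sketch of a different variant (not taken from either answer) that uses `RegexOptions.Multiline`, keeps CR and LF out of the captured text and puts any `\r` back after the semicolon. `AppendSemicolons` is a hypothetical helper name; lines containing only spaces still count as having content.

```csharp
using System;
using System.Text.RegularExpressions;

// AppendSemicolons is a hypothetical helper name.
// Appends ';' to every line that has at least one character.
// [^\r\n]+ never consumes the CR of a CR LF pair; (\r?) captures it
// so it can be restored after the semicolon. Empty lines are left alone.
static string AppendSemicolons(string val) =>
    Regex.Replace(val, @"^([^\r\n]+)(\r?)$", "$1;$2", RegexOptions.Multiline);

Console.WriteLine(AppendSemicolons("This is the program\n\nHello World"));
```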
I am running into some problems with a regular expression. Can you please help me out? This is the problem I am trying to solve:
Input: :,... :D..:::))How are you today :P?..:(*
Output: :D :) :P :(
Basically, I want to remove the punctuation (.,:; etc.) and the text from the input string by replacing them with an empty string, but I want to keep smileys such as :), :( or :P. I have written the following regex, but it is not working:
Regex= "[A-Za-z]|:[D(P(]"
but it also removes the :D and :P smileys.
The following regex string should work for you:
(((?<!:)[^:])|(:(?![PD\(\)])))[^:]*
It's made up of two parts:
( ((?<!:)[^:]) | (:(?![PD\(\)])) )
[^:]*
The first part is an OR (|) statement that uses negative lookahead and lookbehind. It finds the first character of a block of text that doesn't contain a smiley by looking for either:
A character that is clearly not part of a smiley:
Any character that is not preceded by a colon: (?<!:)
and is not a colon itself: [^:]
OR a colon that is not followed by a smiley character:
A colon :
That is not followed by a character that is the second half of a smiley: (?![PD\(\)])
The second part ([^:]*) continues looking until we find the beginning of a potential smiley (a colon).
This Regex currently only finds the following smileys:
:D
:P
:(
:)
You can update the second half of the OR statement to find other smileys.
To sum up, this regex should find everything that is not part of a smiley. You can declare it in a Regex variable and then call .Replace(string input, string replacement), passing in your input string and the string you want to replace the non-smiley characters with (String.Empty in this case).
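A minimal sketch of that usage with the sample input from the question. The pattern is the one above; the result is the four smileys with nothing between them, because spaces are removed along with the other non-smiley characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Matches runs of text that are not part of :D, :P, :( or :)
var nonSmiley = new Regex(@"(((?<!:)[^:])|(:(?![PD\(\)])))[^:]*");

string input = ":,... :D..:::))How are you today :P?..:(*";
string output = nonSmiley.Replace(input, String.Empty);

Console.WriteLine(output); // :D:):P:(
```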
A not-so-perfect solution:
string text = ":,... :D..:::))How are you today :P?..:(*";
// Protect the smileys by swapping them for placeholders
text = text.Replace(":)", "###)");
text = text.Replace(":(", "###(");
text = text.Replace(":D", "###D");
text = text.Replace(":P", "###P");
// clean up your punctuation marks here
//
// Restore the smileys
text = text.Replace("###)", ":)");
text = text.Replace("###(", ":(");
text = text.Replace("###D", ":D");
text = text.Replace("###P", ":P");