How to replace new line character with csv newline character - c#

I need to import a multi-line string into a single cell of a CSV file:
string message = @"This is 1st string.
This is 2nd string.
This is 3rd string.";
When I try to import the string into the CSV, it is split across multiple rows.
If I remove the newline characters from "message", the string is added as a single line.
I need to add the message as it is into a single cell (multi-line text in a single cell).
I tried the following code:
var regex = new Regex(@"\r\n?|\t", RegexOptions.Compiled);
message = regex.Replace(message, String.Empty);
char lf = (char)10;
message = message.Replace("\n", lf.ToString());
But the message is still added to multiple rows in the CSV.

You should try to remove the <br> tag in case the message contains HTML, and maybe you can replace like this:
.Replace("<br>", "");
.Replace(Environment.NewLine, "");
And I'm not sure if the regex would replace them one by one, but try this:
.Replace("\r", "")
and then:
.Replace("\n", "")
Hope it works.

Using \r\n?|\t, you replace each carriage return, possibly followed by a line feed, but also every tab (the |\t means 'or a tab'). A line feed that is not preceded by a carriage return is not matched at all.
I guess you probably want to replace \r\n?\t*, i.e. a carriage return, possibly followed by a line feed, followed by an undetermined number of tabs, and then replace this whole block.
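The question asks to keep the line breaks inside one cell rather than strip them. A common approach, not mentioned in the answers above, is to wrap the field in double quotes and double any quotes inside it; CSV readers that follow RFC 4180 style quoting then treat the whole quoted value, line breaks included, as one field. The sketch below is written as a top-level program (C# 9 or later), and EscapeCsvField is a hypothetical helper name, not part of any library.

```csharp
using System;

// Hypothetical helper (an assumption, not from the question): wraps a value in
// double quotes when it contains a comma, a quote, or a line break, and doubles
// any embedded quotes so the reader does not end the field early.
static string EscapeCsvField(string value)
{
    if (value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        return value; // plain values can be written unchanged
    return "\"" + value.Replace("\"", "\"\"") + "\"";
}

string message = "This is 1st string.\nThis is 2nd string.\nThis is 3rd string.";

// Build one CSV row with two fields; the second field keeps its line breaks.
string row = string.Join(",", EscapeCsvField("1"), EscapeCsvField(message));
Console.WriteLine(row);
```

Whether the cell is then displayed on several lines depends on the program that opens the file, but the row itself stays a single record.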

Related

Remove redundant characters from string after importing a file

I'm working on an application where my goal is to import data from a .txt file and store it in a database.
One row in that file looks like this: (image: .txt file for import)
Let's take a look at "Papua New Guinea", which I marked with a red square in the previous image.
After importing this file using IFormFile I get something like this: (image: list of items in code)
My plan is to store these values in the database, but I am getting redundant characters, as can be seen in the previous picture: "\"Papua New Guinea\"".
How can I remove those redundant characters? Keep in mind that not every item will have those redundant (\") characters (in the 2nd image you can see some integer values).
The \ is used as an escape character to show you that the following " does not mark the end of the string but is part of it. The escape character itself is not part of the string. I.e., in "\"Papua New Guinea\"", the string begins with the characters ", P, a and ends with e, a, ".
You can use an overload of Trim accepting characters to be trimmed from the beginning and the end of the string.
string s = "\"Papua New Guinea\"";
string trimmed = s.Trim('"');
Note that since character literals are delimited with single quotes, you can specify the double quote character without escape character.
Those backslashes won't actually show up if you print the string or do anything else with it. They're just there because string literals start and end with a ", and \" is the way to put a quotation mark inside a string, just like \n adds a new line. The integers don't have them because they don't include a quotation mark.
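A small check can make both answers concrete: the backslashes are not counted in the string's length, and Trim('"') leaves values without quotes untouched. This is only an illustrative sketch written as a top-level program (C# 9 or later); the sample values are taken from the question.

```csharp
using System;

string s = "\"Papua New Guinea\"";

// 16 characters of text plus the two quotation marks; the backslashes
// exist only in the source code, not in the string.
Console.WriteLine(s.Length);        // 18

string trimmed = s.Trim('"');
Console.WriteLine(trimmed);         // Papua New Guinea

// A value without surrounding quotes is returned unchanged.
string number = "42".Trim('"');
Console.WriteLine(number);          // 42
```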

How to split a text having a character which is combination of comma and space?

I have the following line, which I want to split by comma.
"Clark Kent,Hello Mr.Wayne,发送于 3:38 PM。"
Sounds easy, right? The problem is that the text does not contain a single ordinary comma character. Each comma you see in the text is a single character that looks like a combination of a comma and a space (just copy and paste the above sentence into your text editor and check it out).
The problem is: I need to split the text by comma. Although I can copy and paste the character and add it as one of my delimiter characters, I am wondering if I could just convert such text into text that can be split by a regular comma. Don't worry about the Chinese words for now. The same applies to the last character you see in the text. This behavior arises when my application language is set to Chinese.
FYI: I thought such a comma was a non-printable/non-ASCII character, but to my surprise, when I printed the text in the console, I got:
Here is my input and expected output:
Input : "Clark Kent,Hello Mr.Wayne,发送于 3:38 PM。"
Expected output: {"Clark Kent", "Hello Mr.Wayne", "发送于 3:38 PM。"}.
The comma you are facing is a 'Fullwidth Comma' (U+FF0C, hex 0xFF0C). It is a regular Unicode character, which can be replaced by a comma and a space using the string.Replace method:
s = s.Replace("\uFF0C", ", ");
What I suggest (same as @Chris suggested in the comments) is to replace the fullwidth comma with a regular comma before splitting.
var s = "Clark Kent,Hello Mr.Wayne,发送于 3:38 PM。";
s = s.Replace(',', ',');
var splitted = s.Split(',');
The benefit is that a fullwidth comma, if present, is replaced, and text that already uses regular commas is processed unchanged.
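As an alternative to replacing first, both comma characters can be passed to Split as delimiters. The sketch below writes the fullwidth comma as the escape \uFF0C so it cannot be confused with an ASCII comma in the source; it is a top-level program (C# 9 or later) and only illustrates the idea.

```csharp
using System;

// \uFF0C is the fullwidth comma; the sample text is the one from the question.
string s = "Clark Kent\uFF0CHello Mr.Wayne\uFF0C发送于 3:38 PM。";

// Split on either the regular comma or the fullwidth comma.
string[] parts = s.Split(new[] { ',', '\uFF0C' });

Console.WriteLine(parts.Length);    // 3
Console.WriteLine(parts[0]);        // Clark Kent
Console.WriteLine(parts[1]);        // Hello Mr.Wayne
```

Unlike replacing with a comma plus a space, this leaves no leading space on the second and third items.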

Delete Line Feed character in c#

I have a CSV file. It is not very big; the problem is this: every line ends with the two characters CR LF.
Unfortunately, in one single record there is a column with an LF character in the middle. When I try to import the document, this character causes a conflict.
The line looks like this in Notepad++
text1, text2,te(LF)
xt3, text4 (CR LF)
And I need this
text1, text2,text3, text4 (CR LF)
Now my question is: how can I delete this character in C# without affecting the end of the row?
Regards
Try this code:
string result = Regex.Replace(text, @"([^\r])\n", "$1");
This simply replaces any line feed that does not come right after a CR with the character that precedes it.
Delete all LF. Then replace all CR by CR,LF. Use string.Replace for this.
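A variant of the regex answer, offered here as a sketch rather than as the answerer's own pattern, uses a negative lookbehind instead of a captured character. Because the lookbehind consumes nothing, it also handles an LF at the very start of the text and two stray LFs in a row, which the capture-based pattern can miss. RemoveStrayLineFeeds is a hypothetical name, and the code is a top-level program (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every LF that is not directly preceded by CR,
// so the CR LF row terminators are left alone.
static string RemoveStrayLineFeeds(string text) =>
    Regex.Replace(text, @"(?<!\r)\n", "");

string broken = "text1, text2,te\nxt3, text4\r\n";
string repaired = RemoveStrayLineFeeds(broken);

// Show the control characters so the result is visible on the console.
Console.WriteLine(repaired.Replace("\r", "(CR)").Replace("\n", "(LF)"));
```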

Match regex pattern in a line of text without targeting the text within quotations

Stackoverflow has been very generous with answers to my regex questions so far, but with this one I'm blanking on what to do and just can't seem to find it answered here.
So I'm parsing a string, let's say for example's sake, a line of VB-esque code like either of the following:
Call Function ( "Str ing 1 ", "String 2" , " String 3 ", 1000 ) As Integer
Dim x = "This string should not be affected "
I'm trying to parse the text in order to eliminate all leading spaces, trailing spaces, and extra internal spaces (when two "words/chunks" are separated with two or more space or when there is one or more spaces between a character and a parentheses) using regex in C#. The result after parsing the above should look like:
Call Function("Str ing 1 ", "String 2", " String 3 ", 1000) As Integer
Dim x = "This string should not be affected "
The issue I'm running into is that, I want to parse all of the line except any text contained within quotation marks (i.e. a string). Basically if there are extra spaces or whatever inside a string, I want to assume that it was intended and move on without changing the string at all, but if there are extra spaces in the line text outside of the quotation marks, I want to parse and adjust that accordingly.
So far I have the following regex which does all of the parsing I mentioned above, the only issue is it will affect the contents of strings just like any other part of the line:
var rx = new Regex(@"\A\s+|(?<=\s)\s+|(?<=.)\s+(?=\()|(?<=\()\s+(?=.)|(?<=.)\s+(?=\))|\s+\z");
.
.
.
lineOfText = rx.Replace(lineOfText, String.Empty);
Anyone have any idea how I can approach this, or know of a past question answering this that I couldn't find? Thank you!
Since you are reading the file line by line, you can use the following fix:
("[^"]*(?:""[^"]*)*")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$
Replace the matched text with $1 to restore the captured string literals that were captured with ("[^"]*(?:""[^"]*)*").
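The pattern from this answer can be used from C# roughly as sketched below. Inside a verbatim string every " has to be doubled, which is why the first alternative looks denser than in the answer. Clean is a hypothetical helper name, and the sample lines are made up for illustration; the code is a top-level program (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 matches a whole string literal; every other alternative matches
// whitespace that should be dropped.
var rx = new Regex(
    @"(""[^""]*(?:""""[^""]*)*"")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$");

// Replacing with $1 puts matched string literals back unchanged and removes
// the matched whitespace (group 1 is empty for those alternatives).
string Clean(string line) => rx.Replace(line, "$1");

Console.WriteLine(Clean("  Dim   x  =  \"keep   this  \"  "));
Console.WriteLine(Clean("Call   Function ( 1000 )  As Integer"));
```

Because the lookarounds use \w, a space between a parenthesis and a quotation mark, or between a closing quotation mark and a comma, is left as it is; those cases would need additional alternatives.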

Replace a string in multiline regex with end of line token

I got the following regex
var fixedString = Regex.Replace(subject, @"(:[\w]+ [\d]+)$", "",
    RegexOptions.Multiline);
which doesn't work. It works if I use \r\n, but I would like to support all types of line breaks. As another answer states I have to use RegexOptions.Multiline to be able to use $ as end of line token (instead of end of string). But it doesn't seem to help.
What am I doing wrong?
I am not sure what you want to achieve, but I think you also want to remove the newline character at the end of the row.
The problem is that $ is a zero-width assertion. It does not match the newline character; it matches the position before \n.
You could do a few different things:
If it is OK to match all following newlines, which also removes any following empty rows, you could do this:
var fixedString = Regex.Replace(subject, @"(:[\w]+ [\d]+)[\r\n]+", "");
If you only want to match the newline after the row and keep following empty rows, you have to make a pattern for all possible combinations, e.g.:
var fixedString = Regex.Replace(subject, @"(:[\w]+ [\d]+)\r?\n", "");
This would match both \n and \r\n.
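In .NET, $ with RegexOptions.Multiline matches only before \n, so in a file with \r\n endings the \r sits between the digits and the position where $ can match; that is one likely reason the original pattern fails. To cover \r\n, a lone \r, and a lone \n, the line break can be spelled out as an alternation, with \r\n listed first so it is consumed as a unit. The sketch below also accepts the end of the text (\z) for a last line without a line break; StripTrailingToken is a hypothetical name, and the code is a top-level program (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes ":word digits" at the end of a line together
// with exactly one line break of any style, or at the very end of the text.
static string StripTrailingToken(string subject) =>
    Regex.Replace(subject, @":\w+ \d+(?:\r\n|\r|\n|\z)", "");

string subject = "a :foo 12\r\nb :bar 3\nc :baz 7\rd :end 9";
string fixedString = StripTrailingToken(subject);

Console.WriteLine(fixedString);     // a b c d (with a trailing space)
```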
