Remove redundant characters from string after importing a file - c#

I'm working on an application where my goal is to import data from a .txt file and store it in a database.
One row in that file looks like this: [image: .txt file for import]
Let's take a look at "Papua New Guinea", which I marked with a red square in the previous image.
After importing this file using IFormFile I get something like this: [image: list of items in code]
My plan is to store these values in the database, but I am getting redundant characters, as can be seen in the previous picture: "\"Papua New Guinea\"".
How can I remove those redundant characters? Keep in mind that not every item will have those redundant (\") characters (in the 2nd image you can see some integer values).

The \ is used as an escape character to show you that the following " does not mark the end of the string but is part of it. The escape character itself is not part of the string. I.e., in "\"Papua New Guinea\"", the string begins with the characters ", P, a and ends with e, a, ".
You can use an overload of Trim accepting characters to be trimmed from the beginning and the end of the string.
string s = "\"Papua New Guinea\"";
string trimmed = s.Trim('"');
Note that since character literals are delimited with single quotes, you can specify the double quote character without escape character.

Those backslashes won't actually show up if you try to print or do anything with the string. They're just there because string literals start and end with a ", and \" is the way of putting a quotation mark inside a string, just like \n adds a new line. The integers don't have them because they don't include a quotation mark.
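To make both answers concrete, here is a minimal sketch. The class name QuoteDemo and the helper StripQuotes are made up for illustration, and the value "598" just stands in for one of the unquoted integer fields from the question.

```csharp
using System;

public static class QuoteDemo
{
    // Hypothetical helper: removes leading and trailing double quotes.
    // Values without quotes (e.g. the integer fields) pass through unchanged.
    public static string StripQuotes(string field) => field.Trim('"');

    public static void Main()
    {
        string quoted = "\"Papua New Guinea\"";

        Console.WriteLine(quoted);               // printed with quotes, without backslashes
        Console.WriteLine(quoted.Length);        // 18: 16 characters of text plus 2 quote characters
        Console.WriteLine(StripQuotes(quoted));  // Papua New Guinea
        Console.WriteLine(StripQuotes("598"));   // 598
    }
}
```

Note that Trim('"') only touches the ends of the string, so a quotation mark in the middle of a value would be left alone.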

Related

How to replace new line character with csv newline character

I need to import a multi-line string into a single cell of CSV
string message = @"This is 1st string.
This is 2nd string.
This is 3rd string.";
When I try to import string into CSV it splits up in multiple rows.
If I remove new line chars from "message" the string is added as single line.
I need to add message as it is into a single cell (multi-line into single cell).
I tried following codes:
var regex = new Regex(@"\r\n?|\t", RegexOptions.Compiled);
message= regex.Replace(message, String.Empty);
char lf = (char)10;
message= message.Replace("\n", lf.ToString());
But the message is added to multiple rows in CSV.
You should try to remove the <br> tag in case the message contains HTML, and maybe you can replace like this:
.Replace("<br>", "");
.Replace(Environment.NewLine, "");
And I'm not sure if the regex would replace them one by one, but try this:
.Replace("\r", "")
and then:
.Replace("\n", "")
Hope it works.
Using \r\n?|\t, you replace each carriage return, possibly followed by a line feed, but also every tab (the |\t means 'or a tab').
I guess you probably want to replace \r\n?\t*, i.e. a carriage return, possibly followed by a line feed, followed by an undetermined number of tabs, and then replace this whole block.
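A rough sketch of that last suggestion, assuming you want each line break (plus any tabs that follow it) collapsed into a single space. LineBreakDemo and Flatten are hypothetical names, and the replacement " " is a choice made here, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Hypothetical helper: replaces a carriage return, an optional line feed,
    // and any tabs right after it with one space.
    public static string Flatten(string message) =>
        Regex.Replace(message, @"\r\n?\t*", " ");

    public static void Main()
    {
        string message = "This is 1st string.\r\n\tThis is 2nd string.\rThis is 3rd string.";
        Console.WriteLine(Flatten(message));
        // This is 1st string. This is 2nd string. This is 3rd string.
    }
}
```

One thing to watch: this pattern requires a \r, so a bare \n (Unix-style line ending) is not matched by it.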

Reading path written in text box removes backslash

In my current project, the user writes a file path (Example: "C:\Data") into a Textbox. Then I read it with:
string PathInput = tbPath.Text;
And then send it into an SQL Insert Query.
If I then read the data from SQL, I get back: C:Data
So I tried to do:
string Path = PathInput.Replace(@"\", "\\");
So that it would double the \, because when I enter C:\\Data I get C:\Data. But it looks like the \ gets lost in the TextBox and not in the database.
So, how can I read the TextBox without losing the \s?
Your replace doesn't actually replace anything:
PathInput.Replace(@"\", "\\");
Since you use an @ before the first string, you don't have to escape anything in it. But the second string has no @, so the backslash has to be escaped there, and "\\" is just a single \ character. That means you're replacing each \ with another \.
Change it to:
PathInput.Replace(@"\", @"\\");
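A small sketch to check this for yourself. BackslashDemo and DoubleBackslashes are made-up names; the path is the one from the question.

```csharp
using System;

public static class BackslashDemo
{
    // Hypothetical helper: doubles every backslash in the path.
    public static string DoubleBackslashes(string path) => path.Replace(@"\", @"\\");

    public static void Main()
    {
        Console.WriteLine("\\" == @"\");                   // True: both are one backslash
        Console.WriteLine(@"\\".Length);                   // 2
        Console.WriteLine(DoubleBackslashes(@"C:\Data"));  // C:\\Data
    }
}
```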

Match regex pattern in a line of text without targeting the text within quotations

Stackoverflow has been very generous with answers to my regex questions so far, but with this one I'm blanking on what to do and just can't seem to find it answered here.
So I'm parsing a string, let's say for example's sake, a line of VB-esque code like either of the following:
Call Function ( "Str ing 1 ", "String 2" , " String 3 ", 1000 ) As Integer
Dim x = "This string should not be affected "
I'm trying to parse the text in order to eliminate all leading spaces, trailing spaces, and extra internal spaces (when two "words/chunks" are separated with two or more space or when there is one or more spaces between a character and a parentheses) using regex in C#. The result after parsing the above should look like:
Call Function("Str ing 1 ", "String 2", " String 3 ", 1000) As Integer
Dim x = "This string should not be affected "
The issue I'm running into is that, I want to parse all of the line except any text contained within quotation marks (i.e. a string). Basically if there are extra spaces or whatever inside a string, I want to assume that it was intended and move on without changing the string at all, but if there are extra spaces in the line text outside of the quotation marks, I want to parse and adjust that accordingly.
So far I have the following regex which does all of the parsing I mentioned above, the only issue is it will affect the contents of strings just like any other part of the line:
var rx = new Regex(@"\A\s+|(?<=\s)\s+|(?<=.)\s+(?=\()|(?<=\()\s+(?=.)|(?<=.)\s+(?=\))|\s+\z");
.
.
.
lineOfText = rx.Replace(lineOfText, String.Empty);
Anyone have any idea how I can approach this, or know of a past question answering this that I couldn't find? Thank you!
Since you are reading the file line by line, you can use the following fix:
("[^"]*(?:""[^"]*)*")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$
Replace the matched text with $1 to restore the captured string literals that were captured with ("[^"]*(?:""[^"]*)*").
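In C# the pattern has to go into a verbatim string with every " doubled. A minimal sketch of how that might look; WhitespaceDemo and Normalize are hypothetical names and the input line is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Group 1 captures a whole string literal; the other alternatives match
    // whitespace that should be dropped outside of literals.
    private static readonly Regex Pattern = new Regex(
        @"(""[^""]*(?:""""[^""]*)*"")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$");

    // Replacing with $1 puts a captured literal back unchanged and
    // replaces every other match with nothing.
    public static string Normalize(string line) => Pattern.Replace(line, "$1");

    public static void Main()
    {
        Console.WriteLine(Normalize("  Call   Foo  ( x )  \"a  b\"  "));
        // Call Foo(x) "a  b"
    }
}
```

The literal alternative has to come first: the regex engine consumes the quoted text as one match, so the whitespace alternatives never get a chance to start inside it.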

Can't replace a particular Line Feed from an MYOB tab delimited file

I have a tab-delimited file, and in one particular field the content sometimes contains a sentence with a line feed character in the middle of it (I can see it in Notepad++). Subsequently, when the program tries to split this line on the delimiter, it stops at that point and starts again on the next line, which is bad.
So I've been doing the usual with Replace, and then Trim, to get rid of it, but neither picks it up.
i.e.
line = line.Replace("\r", " ").Replace("\n", " ");
and
line = line.Trim('\r', '\n');
What am I missing? Is there another representation of \n out there?
Edit. I have also tried (char)10 and didn't find it either.
Edit 2. As a big picture, I've solved what I needed to do, but not with this particular method of replacing. Because I was using .Readline() on my file, I determined replace wouldn't work regardless as that line had finished even though I know it wasn't, so I would read into the next line and then combine the two strings together and my mystery line feed was gone.
From the comments to the question:
You mentioned you tried (char)10, but it only worked if it was in quotes. That just treats it as "(char)10" (a string) instead of the character that a base 10 (decimal) value of "10" actually is: a linefeed character.
It didn't work without quotes because replacing a char requires that the char be replaced with a char instead of a string: Replace takes either two chars or two strings, and " " is a string. (Confusing, I know.) In VB a char literal is denoted with a c after the closing quote; in C# it is written with single quotes instead.
Example:
line = line.Replace("\r", " ").Replace("\n", " ").Replace((char)10, ' ');
The other possible way of doing this without a char literal is to use the char of the space's ASCII value (32):
line = line.Replace("\r", " ").Replace("\n", " ").Replace((char)10, (char)32);
That's probably why your initial attempt didn't work when trying to replace the linefeed character.
I think you also may have been fine if you replaced the \n by itself (especially before replacing the \r), or Environment.NewLine, which on Windows is the CRLF pair: carriage return (\r) followed by line feed ((char)10). (Environment.NewLine adapts to the system the code is running on, whereas \n is always a single line feed, the same character as (char)10.)
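For what it's worth, the fix described in Edit 2 of the question (ReadLine has already split on the line feed, so Replace never sees it, and the two pieces have to be glued back together) could be sketched roughly like this. RecordReader, ReadRecords and the expectedFields parameter are all hypothetical; it assumes every complete record has a known number of tab-separated fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RecordReader
{
    // Hypothetical helper: joins physical lines until a record has at least
    // expectedFields tab-separated fields, using a space where the stray
    // line feed used to be.
    public static List<string> ReadRecords(TextReader reader, int expectedFields)
    {
        var records = new List<string>();
        string pending = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            pending = pending == null ? line : pending + " " + line;
            if (pending.Split('\t').Length >= expectedFields)
            {
                records.Add(pending);
                pending = null;
            }
        }
        if (pending != null) records.Add(pending); // incomplete last record, kept as-is
        return records;
    }

    public static void Main()
    {
        var input = new StringReader("1\tfoo\tbar\n2\tbroken\nsentence\tbaz\n");
        foreach (string record in ReadRecords(input, 3))
            Console.WriteLine(record.Replace("\t", " | "));
    }
}
```

With a real file you would pass a StreamReader instead of the StringReader used here.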

Using "@" to insert multiple lines of strings in StringBuilder

I have a StringBuilder object and wanted to use its Append() method to add this whole string to it:
so I used "@" and copy-pasted that whole string like this, but it gives a lot of errors such as "; expected", "Invalid Expression '<'", etc.
myString.Append(@"COPY-PASTED-THAT_WHOLE-STRING");
What is the correct way of adding this string to my string builder object?
Thank you.
Even with an @ prefixing the string, you need to escape any " characters, otherwise they will be interpreted as the end of the string literal.
EDIT:
e.g.
var entity = @"<!ENTITY xsd ""http://www.w3.org/2001/XMLSchema#"">";
Double-quotes (") inside the string you want to paste need to be escaped by being replaced with two consecutive double-quotes, as in "". Here's a trick to use:
Paste your string into a new instance of Notepad
Replace all double quotes (") with two double quotes ("")
Select and copy the content from Notepad back into clipboard
Paste it into @"…" in your code/text editor
From C# docs:
In a verbatim string literal, the characters between the delimiters
are interpreted verbatim, the only exception being a
quote-escape-sequence.
You can use the @ syntax to add multiple lines. But you need to escape the "s inside your string by using ""
For example
@"<Ontology xmlns=""http://www.w3.org/2002/07/owl#"""
If you don't escape them, C# will treat the quote mark as the end of the string.
One option, as others have said, is to escape all of the double quotes (") with a double double quote ("").
What I prefer to do, as it makes the code more readable, when adding an XML block as a literal string, is to use single quotes rather than double quotes. Just put the XML file into a text editor and do a replace all on double quote with a single quote (').
Another option, since your XML literal isn't all that short, is to put it into a file and read in that file at runtime.
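You can escape them with a backslash as well, but only in a regular string literal (without the @); inside a verbatim string \" does not work as an escape...
"<Ontology xmlns=\"http://www.w3.org/2002/07/owl#\""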
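Putting the doubled-quote advice together with the StringBuilder from the question, a minimal sketch might look like this. VerbatimDemo and BuildHeader are hypothetical names, and the DOCTYPE wrapper around the ENTITY line is only there to give the example more than one line; it is not the asker's actual string.

```csharp
using System;
using System.Text;

public static class VerbatimDemo
{
    // Hypothetical helper: appends a multi-line verbatim literal in which
    // every embedded " is written as "".
    public static string BuildHeader()
    {
        var sb = new StringBuilder();
        sb.Append(@"<!DOCTYPE rdf:RDF [
    <!ENTITY xsd ""http://www.w3.org/2001/XMLSchema#"">
]>");
        return sb.ToString();
    }

    public static void Main()
    {
        // Each "" in the source comes out as a single " in the result.
        Console.WriteLine(BuildHeader());
    }
}
```

The line breaks in the result are whatever line endings the source file itself uses, which is worth keeping in mind if the output has to be byte-for-byte stable.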
