Im trying to insert literal strings into c++ files using a c# tool, and Im tasked with automatically adding escapes.
To start with " => \". However I cannot figure out the regular expression required to transform instances of " to \"
public String AddEscapeCharactersForCode(String content)
{
String escaper = "\\\\";
String ncontent = Regex.Replace(content, "\\\\\"");
ncontent = Regex.Replace(ncontent, "'", "\\\\'");
ncontent = Regex.Replace(ncontent, "\n", "\\\\\n");
return content;
}
The above code does nothing to my strings resulting in unescaped quotes and broken code files =(
Well, you've got:
// ...
return content;
...which simply returns the string passed in. So, all of that Regex.Replace goodness gets thrown away.
For this simple task, you don't really need a regexp. Using String.Replace() is straightforward.
String.Replace Method
Returns a new string in which all occurrences of a specified Unicode character or String in this instance are replaced with another specified Unicode character or String.
s1 = "some \"parts\" may be \"quoted\" here"
// s1 is <some "parts" may be "quoted" here>
s2 = s.replace("\"", "\\\"")
// s2 is <some \"parts\" may be \"quoted\" here>
If you must do it with regex, minimize the number of replacements by using a regular expression that handles backslashes and double quotes in one step.
public String AddEscapeCharactersForCode(String content)
{
content = Regex.Replace(content, "[\"\\\\]", "\\$&");
content = Regex.Replace(content, "\n", "\\n");
return content;
}
I think you have too many backslashes in your example. To me the output of the above looks right.
Related
I'm a doing an massive uploading of information from a .csv file and I need replace this character non ASCII "�" for a normal space, " ".
The character "�" corresponds to "\uFFFD" for C, C++, and Java, which it seems that it is called REPLACEMENT CHARACTER. There are others, such as spaces type like U+FEFF, U+205F, U+200B, U+180E, and U+202F in the C# official documentation.
I'm trying do the replace this way:
public string Errors = "";
public void test(){
string textFromCsvCell = "";
string validCharacters = "^[0-9A-Za-z().:%-/ ]+$";
textFromCsvCell = "This is my text from csv file"; //All spaces aren't normal space " "
string cleaned = textFromCsvCell.Replace("\uFFFD", "\"")
if (Regex.IsMatch(cleaned, validCharacters ))
//All code for insert
else
Errors=cleaned;
//print Errors
}
The test method shows me this text:
"This is my�texto from csv file"
I try some solutions too:
Trying solution 1: Using Trim
Regex.Replace(value.Trim(), #"[^\S\r\n]+", " ");
Try solution 2: Using Replace
System.Text.RegularExpressions.Regex.Replace(str, #"\s+", " ");
Try solution 3: Using Trim
String.Trim(new char[]{'\uFEFF', '\u200B'});
Try solution 4: Add [\S\r\n] to validCharacters
string validCharacters = "^[\S\r\n0-9A-Za-z().:%-/ ]+$";
Nothing works.
How can I replace it?
Sources:
Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD)
Trying to replace all white space with a single space
Strip the byte order mark from string in C#
Remove extra whitespaces, but keep new lines using a regular expression in C#
EDITED
This is the original string:
"SYSTEM OF MONITORING CONTINUES OF GLUCOSE"
in 0x... notation
SYSTEM OF0xA0MONITORING CONTINUES OF GLUCOSE
Solution
Go to the Unicode code converter. Look at the conversions and do the replace.
In my case, I do a simple replace:
string value = "SYSTEM OF MONITORING CONTINUES OF GLUCOSE";
//value contains non-breaking whitespace
//value is "SYSTEM OF�MONITORING CONTINUES OF GLUCOSE"
string cleaned = "";
string pattern = #"[^\u0000-\u007F]+";
string replacement = " ";
Regex rgx = new Regex(pattern);
cleaned = rgx.Replace(value, replacement);
if (Regex.IsMatch(cleaned,"^[0-9A-Za-z().:<>%-/ ]+$"){
//all code for insert
else
//Error messages
This expression represents all possible spaces: space, tab, page break, line break and carriage return
[ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]
References
Regular expressions (MDN)
Using String.Replace:
Use a simple String.Replace().
I've assumed that the only characters you want to remove are the ones you've mentioned in the question: � and you want to replace them by a normal space.
string text = "imp�ortant";
string cleaned = text.Replace('\u00ef', ' ')
.Replace('\u00bf', ' ')
.Replace('\u00bd', ' ');
// Returns 'imp ortant'
Or using Regex.Replace:
string cleaned = Regex.Replace(text, "[\u00ef\u00bf\u00bd]", " ");
// Returns 'imp ortant'
Try it out: Dotnet Fiddle
Define a range of ASCII characters, and replace anything that is not within that range.
We want to find only Unicode characters, so we will match on a Unicode character and replace.
Regex.Replace("This is my te\uFFFDxt from csv file", #"[^\u0000-\u007F]+", " ")
The above pattern will match anything that is not ^ in the set [ ] of this range \u0000-\u007F (ASCII characters (everything past \u007F is Unicode)) and replace it with a space.
Result
This is my te xt from csv file
You can adjust the range provided \u0000-\u007F as needed to expand the range of allowed characters to suit your needs.
If you just want ASCII then try the following:
var ascii = new ASCIIEncoding();
byte[] encodedBytes = ascii.GetBytes(text);
var cleaned = ascii.GetString(encodedBytes).Replace("?", " ");
How can I replace Line Breaks within a string in C#?
Use replace with Environment.NewLine
myString = myString.Replace(System.Environment.NewLine, "replacement text"); //add a line terminating ;
As mentioned in other posts, if the string comes from another environment (OS) then you'd need to replace that particular environments implementation of new line control characters.
The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, #"\r\n?|\n", replacementString);
To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Dependent on where your file originated, you may want to look at making sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...
I would use Environment.Newline when I wanted to insert a newline for a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.
if you want to "clean" the new lines, flamebaud comment using regex #"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, #"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, #"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}
Use new in .NET 6 method
myString = myString.ReplaceLineEndings();
Replaces ALL newline sequences in the current string.
Documentation:
ReplaceLineEndings
Don't forget that replace doesn't do the replacement in the string, but returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use #Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values first before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}
To make sure all possible ways of line breaks (Windows, Mac and Unix) are replaced you should use:
string.Replace("\r\n", "\n").Replace('\r', '\n').Replace('\n', 'replacement');
and in this order, to not to make extra line breaks, when you find some combination of line ending chars.
Why not both?
string ReplacementString = "";
Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), #"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.
I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}
var answer = Regex.Replace(value, "(\n|\r)+", replacementString);
As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines should go to the parseCSV method:
function parseCSV(data) {
//alert(data);
//replace UNIX new lines
data = data.replace(/\r\n/g, "\n");
//replace MAC new lines
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
}
Use the .Replace() method
Line.Replace("\n", "whatever you want to replace with");
Best way to replace linebreaks safely is
yourString.Replace("\r\n","\n") //handling windows linebreaks
.Replace("\r","\n") //handling mac linebreaks
that should produce a string with only \n (eg linefeed) as linebreaks.
this code is usefull to fix mixed linebreaks too.
Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".
string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hopes it helps.
If you want to replace only the newlines:
var input = #"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = #"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(#"\n", replaceWith).Replace(#"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
var input = #"sdfhlusdkuidfs\r\ndfgdgfd";
var match = #"[\\s]+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);
This is a very long winded one-liner solution but it is the only one that I had found to work if you cannot use the the special character escapes like "\r" and "\n" and \x0d and \u000D as well as System.Environment.NewLine as parameters to thereplace() method
MyStr.replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat offtopic but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up like it is shown below.
The Visual Studio XML --> .NET environment just would not accept the special character escapes like "\r" and "\n" and \x0d and \u000D as well as System.Environment.NewLine as parameters to thereplace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))
Based on #mark-bayers answer and for cleaner output:
string result = Regex.Replace(ex.Message, #"(\r\n?|\r?\n)+", "replacement text");
It removes \r\n , \n and \r while perefer longer one and simplify multiple occurances to one.
I have read an entire file into a string object :-
string result = File.ReadAllText(#"C:\TestLog.log");
I want to remove all line breaks, "\n", but i also need to preserve any instances in the string object where "\r\n" exists.
How can i do this?
It looks like you want a regular expression replace.
string result = File.ReadAllText(#"C:\TestLog.log");
string newresult = Regex.Replace(result, #"[^\\r]\\n", "");
So the pattern looks for any \n that is not preceded by \r.
result = result.Replace("\n","").Replace("\r","\r\n")
Without using regex though. Not sure if you intend to use that or not.
Using C# we can do string check like if string.contains() method, e.g.:
string test = "Microsoft";
if (test.Contains("i"))
test = test.Replace("i","a");
This is fine. But what if I want to replace a string which contains " symbol to be replaced.
I want to achieve this:
"<html><head>
I want to remove the " symbol present in check so that the result would be:
<html><head>
The " character can also be replaced, just like any other:
test = test.Replace("\"","");
Also, note that you don't have to test if the character exists : your test.Contains("i") could be removed since the .Replace() method won't do anything (no replace, no error thrown) if the character doesn't exist inside the string.
To include a quote symbol in a string, you need to escape it, using a backslash. In your example, you want to use something lik this:
if (test.Contains("\""))
There are two ways to include a '"' character in a string literal. All the answers so far have used the c-style way:
var quotation = "Parting is such sweet sorrow";
var howSweetIsIt = quotation + " that I shall say \"good-night\" till it be morrow.";
In some contexts (especially for users experienced with Visual Basic), the verbatim string literal may be easier to read. A verbatim string literal begins with an # sign, and the only character that requires escaping is the quotation mark -- all other characters are included verbatim (hence the name). Significantly, the method of escaping the quotation mark is different: rather than preceding it with a backslash, it must be doubled:
var howSweetIsIt = quotation + " that I shall say ""good-night"" till it be morrow.";
string SymbolString = "Micro\"so\"ft";
The string above use scape char \ to insert " between the characters
string Result = SymbolString.Replace("\"", string.Empty);
With the following replace I replace the character "" for empty.
This is what you try to achieve?
if (check.Contains("\"")
output = check.Replace("\"", "");
output = check.Replace("\"", "");
Just remember to use "\"" for the quote sign as the backslash is an escape character.
if (str.Contains("\""))
{
str = str.Replace("\"", "");
}
I'm trying to match on some inconsistently formatted HTML and need to strip out some double quotes.
Current:
<input type="hidden">
The Goal:
<input type=hidden>
This is wrong because I'm not escaping it properly:
s = s.Replace(""","");
This is wrong because there is not blank character character (to my knowledge):
s = s.Replace('"', '');
What is syntax / escape character combination for replacing double quotes with an empty string?
I think your first line would actually work but I think you need four quotation marks for a string containing a single one (in VB at least):
s = s.Replace("""", "")
for C# you'd have to escape the quotation mark using a backslash:
s = s.Replace("\"", "");
I didn't see my thoughts repeated already, so I will suggest that you look at string.Trim in the Microsoft documentation for C# you can add a character to be trimmed instead of simply trimming empty spaces:
string withQuotes = "\"hellow\"";
string withOutQotes = withQuotes.Trim('"');
should result in withOutQuotes being "hello" instead of ""hello""
s = s.Replace("\"", "");
You need to use the \ to escape the double quote character in a string.
You can use either of these:
s = s.Replace(#"""","");
s = s.Replace("\"","");
...but I do get curious as to why you would want to do that? I thought it was good practice to keep attribute values quoted?
s = s.Replace("\"",string.Empty);
c#: "\"", thus s.Replace("\"", "")
vb/vbs/vb.net: "" thus s.Replace("""", "")
If you only want to strip the quotes from the ends of the string (not the middle), and there is a chance that there can be spaces at either end of the string (i.e. parsing a CSV format file where there is a space after the commas), then you need to call the Trim function twice...for example:
string myStr = " \"sometext\""; //(notice the leading space)
myStr = myStr.Trim('"'); //(would leave the first quote: "sometext)
myStr = myStr.Trim().Trim('"'); //(would get what you want: sometext)
You have to escape the double quote with a backslash.
s = s.Replace("\"","");
s = s.Replace(#"""", "");
This worked for me
//Sentence has quotes
string nameSentence = "Take my name \"Wesley\" out of quotes";
//Get the index before the quotes`enter code here`
int begin = nameSentence.LastIndexOf("name") + "name".Length;
//Get the index after the quotes
int end = nameSentence.LastIndexOf("out");
//Get the part of the string with its quotes
string name = nameSentence.Substring(begin, end - begin);
//Remove its quotes
string newName = name.Replace("\"", "");
//Replace new name (without quotes) within original sentence
string updatedNameSentence = nameSentence.Replace(name, newName);
//Returns "Take my name Wesley out of quotes"
return updatedNameSentence;
s = s.Replace( """", "" )
Two quotes next to each other will function as the intended " character when inside a string.
if you would like to remove a single character i guess it's easier to simply read the arrays and skip that char and return the array. I use it when custom parsing vcard's json.
as it's bad json with "quoted" text identifiers.
Add the below method to a class containing your extension methods.
public static string Remove(this string text, char character)
{
var sb = new StringBuilder();
foreach (char c in text)
{
if (c != character)
sb.Append(c);
}
return sb.ToString();
}
you can then use this extension method:
var text= myString.Remove('"');