Substring with special character - c#

Let's say I have a string:
string str = "09012013 Receipt 09012013 #12";
I want to take a substring that returns "Receipt 09012013 #12", so I used Substring:
var result = str.Substring(9);
and the result was only "Receipt 09012013"
I tried with other special characters (%,§,$ ...), it worked, substring returned "Receipt 09012013 %12", but with # and &, substring only returned "Receipt 09012013".
Any thoughts? Thanks.
EDIT
My code:
new NaviListItem("renameBtn", "showwaitingscreen", "akte/renameakte?entityid=" + Request["parentid"] + "&aktenkurzbezeichnung=" + Model.Node.Header.Substring(Model.Node.Ordnungsnummer.Length+1), "umbenennen.png", Model.RenameVisible, "Umbenennen", "Umbenennen"),

The result of Substring does not depend on the characters at the end:
string str = "09012013 Receipt 09012013 #12".Substring(9);
produces "Receipt 09012013 #12" as its result.
Most likely this is a display issue: if you are delivering the result of the Substring over some sort of HTML-enabled display mechanism, the & and # would often be treated as meta-characters, and therefore require escaping.
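Judging by the code in the question's EDIT, the substring is concatenated into a query string, where # starts the URL fragment and & separates parameters. The following is a minimal sketch of escaping the values before building the URL, assuming that this is the cause; the class and method names (RenameUrlBuilder, Build) are hypothetical and not part of the original code.

```csharp
using System;

public static class RenameUrlBuilder
{
    // Hypothetical helper: builds the rename URL from the question with
    // both parameter values percent-encoded, so that '#' and '&' inside a
    // value are not interpreted as URL syntax.
    public static string Build(string parentId, string shortName)
    {
        return "akte/renameakte?entityid=" + Uri.EscapeDataString(parentId)
             + "&aktenkurzbezeichnung=" + Uri.EscapeDataString(shortName);
    }
}
```

With this, a value such as "Receipt 09012013 #12" is sent as Receipt%2009012013%20%2312 and arrives intact on the server side after decoding.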

Related

Find and replace the string in paragraph

I want to remove every token that is wrapped in hyphens: anything with a hyphen prefix and a hyphen suffix should be replaced with an empty string.
string templateContent = "Template content -macro- -UnitDetails- -testEmail- sending Successfully";
Output
templateContent = "Template content sending Successfully";
templateContent = Regex.Replace(templateContent, @"-\w*-\s?", string.Empty).TrimEnd(' ');
@"-\w*-\s?" - is the regex pattern for '-Word- '
- - matches a literal hyphen
\w - word character
* - zero or more occurrences of \w
\s - pattern for a whitespace character
? - marks \s as optional
TrimEnd(' ') - removes the trailing space left behind if a pattern was at the end of the string
There are many ways to do this, however given your example the following should work
var split = templateContent
.Split(' ')
.Where(x => !x.StartsWith("-") && !x.EndsWith("-"));
var result = string.Join(" ",split);
Console.WriteLine(result);
Output
Template content sending Successfully
Note : I personally think regex is better suited to this
You can use regex for this
string regExp = "(-[a-zA-Z]*-)";
string tmp = Regex.Replace(templateContent , regExp, "");
string finalStr = Regex.Replace(tmp, " {2,}", " ");
var resultWithSpaces = Regex.Replace(templateContent, @"-\S+-", string.Empty);
This regular expression looks for two hyphens surrounding one or more characters that are not white space.
It will leave the spaces that were around the removed word. To get rid of those you can do another Regex to replace multiple spaces with a single space.
var result = Regex.Replace(resultWithSpaces, @"\s+", " ");
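The answers above share two steps: remove the hyphen-wrapped tokens, then tidy the leftover whitespace. Here is a small sketch that combines both in one helper, assuming tokens consist only of word characters; the names PlaceholderRemover and Strip are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PlaceholderRemover
{
    // Hypothetical helper: removes -token- placeholders, then collapses
    // the runs of whitespace they leave behind into single spaces.
    public static string Strip(string input)
    {
        // Step 1: drop every hyphen-wrapped run of word characters.
        string withoutTokens = Regex.Replace(input, @"-\w+-", string.Empty);

        // Step 2: collapse repeated whitespace and trim both ends.
        return Regex.Replace(withoutTokens, @"\s+", " ").Trim();
    }
}
```

Calling PlaceholderRemover.Strip on the template string from the question gives "Template content sending Successfully". Text that contains ordinary hyphenated words between two hyphens on the same word would need a stricter pattern.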

Splitting a string on an "undefined" variable

I have a piece of text that is in multiple formats, and I want to try and create a method that encompasses all of them. I know where I can split these lines, however, I am uncertain of how to define this.
An example of the text:
.0 index .0.label unicode "Area" .0.value unicode "6WAY DB" .1 index .1.label unicode "SubStation" .1.value unicode "E782DB257" .2 (etc...)
I want to split these lines on the ".0", ".1", etc, so that my list will look like:
.0 index
.0.label unicode "Area"
.0.value unicode "6WAY DB"
.1 index
.1.label unicode "SubStation"
This will make the data easier to manipulate. However, since the number changes from entry to entry, I can't simply state the delimiter as a regular string. Instead, I was thinking of stating it more like
string Split = "." + n.IsInt();
Or something similar. However, I can't find anything that has worked yet.
If I understand you correctly, you can do the following with Regex.Replace
var input = ".0 index .0.label unicode \"Area\" .0.value unicode \"6WAY DB\" .1 index .1.label unicode \"SubStation\" .1.value unicode \"E782DB257\" .2 (etc...)";
var result = Regex.Replace(input, @"\.\d", $"{Environment.NewLine}$&");
Console.WriteLine(result);
or to actually split
var lines = result.Split(new[]{Environment.NewLine},StringSplitOptions.None);
foreach (var line in lines)
Console.WriteLine(line);
Output
.0 index
.0.label unicode "Area"
.0.value unicode "6WAY DB"
.1 index
.1.label unicode "SubStation"
.1.value unicode "E782DB257"
.2 (etc...)
Explanation
\. matches a literal dot (an unescaped . would match any character except line terminators)
\d matches a digit (equal to [0-9])
$& replaces with the original match
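As an alternative to inserting newlines and splitting afterwards, the split can be done in one step with a lookahead. This is a sketch under the assumption that every entry starts with a dot followed by a digit and is preceded by whitespace; RecordSplitter and SplitEntries are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // Hypothetical helper: splits on the whitespace that precedes each
    // ".<digit>" marker. The lookahead (?=\.\d) checks for the marker
    // without consuming it, so every entry keeps its ".0", ".1", ... prefix.
    public static string[] SplitEntries(string input)
    {
        return Regex.Split(input.Trim(), @"\s+(?=\.\d)");
    }
}
```

Because the whitespace itself is the delimiter, the resulting entries carry no leading empty element and no trailing spaces. A quoted value that itself contains a space followed by ".<digit>" would be split too, so this only suits data where that cannot happen.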
If your string follows a fixed format and you want to extract the values from it, you can implement a custom function for that, something like this (in JavaScript).
function splitCustom(str){
    var retVal=[];
    str = str.split('.0 index')[1].trim();
    var totalRecord=str[str.lastIndexOf(' index')-1];
    for(var i=0;i<=totalRecord;i++){
        var obj={};
        var substr=str.split("." + (i+1) + ' index');
        var curRecord="";
        if(substr.length>1){
            curRecord=substr[0].trim();
            str = substr[1].trim();
        }
        else{
            curRecord=str;
        }
        obj.index=i;
        var labelString=curRecord.split("." + i + ".")[1].trim();
        obj.label=labelString.substr(labelString.indexOf('"')+1, labelString.lastIndexOf('"')-labelString.indexOf('"')-1);
        var valueString=curRecord.split("." + i + ".")[2].trim();
        obj.value=valueString.substr(valueString.indexOf('"')+1, valueString.lastIndexOf('"')-valueString.indexOf('"')-1);
        retVal.push(obj);
    }
    return retVal;
}
var str='.0 index .0.label unicode "Area" .0.value unicode "6WAY DB" .1 index .1.label unicode "SubStation" .1.value unicode "E782DB257"';
var response = splitCustom(str);
Output
[
{"index":0,"label":"Area","value":"6WAY DB"},
{"index":1,"label":"SubStation","value":"E782DB257"}
]

Replace Unicode character "�" with a space

I'm doing a massive upload of information from a .csv file and I need to replace the non-ASCII character "�" with a normal space, " ".
The character "�" corresponds to "\uFFFD" in C, C++, and Java, and it seems to be called REPLACEMENT CHARACTER. There are other space-like characters, such as U+FEFF, U+205F, U+200B, U+180E, and U+202F, listed in the official C# documentation.
I'm trying to do the replacement this way:
public string Errors = "";
public void test(){
string textFromCsvCell = "";
string validCharacters = "^[0-9A-Za-z().:%-/ ]+$";
textFromCsvCell = "This is my text from csv file"; //All spaces aren't normal space " "
string cleaned = textFromCsvCell.Replace("\uFFFD", " ");
if (Regex.IsMatch(cleaned, validCharacters ))
//All code for insert
else
Errors=cleaned;
//print Errors
}
The test method shows me this text:
"This is my�texto from csv file"
I tried some other solutions too:
Attempt 1: Using Trim
Regex.Replace(value.Trim(), @"[^\S\r\n]+", " ");
Attempt 2: Using Replace
System.Text.RegularExpressions.Regex.Replace(str, @"\s+", " ");
Attempt 3: Using Trim
String.Trim(new char[]{'\uFEFF', '\u200B'});
Attempt 4: Add [\S\r\n] to validCharacters
string validCharacters = @"^[\S\r\n0-9A-Za-z().:%-/ ]+$";
Nothing works.
How can I replace it?
Sources:
Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD)
Trying to replace all white space with a single space
Strip the byte order mark from string in C#
Remove extra whitespaces, but keep new lines using a regular expression in C#
EDITED
This is the original string:
"SYSTEM OF MONITORING CONTINUES OF GLUCOSE"
in 0x... notation
SYSTEM OF0xA0MONITORING CONTINUES OF GLUCOSE
Solution
Go to the Unicode code converter. Look at the conversions and do the replace.
In my case, I do a simple replace:
string value = "SYSTEM OF MONITORING CONTINUES OF GLUCOSE";
//value contains non-breaking whitespace
//value is "SYSTEM OF�MONITORING CONTINUES OF GLUCOSE"
string cleaned = "";
string pattern = @"[^\u0000-\u007F]+";
string replacement = " ";
Regex rgx = new Regex(pattern);
cleaned = rgx.Replace(value, replacement);
if (Regex.IsMatch(cleaned, "^[0-9A-Za-z().:<>%-/ ]+$")) {
//all code for insert
} else {
//Error messages
}
This expression represents all possible whitespace characters, including space, tab, form feed, line feed and carriage return, as well as the Unicode space characters:
[ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]
References
Regular expressions (MDN)
Using String.Replace:
Use a simple String.Replace().
I've assumed that the only characters you want to remove are the ones you've mentioned in the question: � and you want to replace them by a normal space.
string text = "imp�ortant";
string cleaned = text.Replace('\u00ef', ' ')
.Replace('\u00bf', ' ')
.Replace('\u00bd', ' ');
// Returns 'imp ortant'
Or using Regex.Replace:
string cleaned = Regex.Replace(text, "[\u00ef\u00bf\u00bd]", " ");
// Returns 'imp ortant'
Define a range of ASCII characters, and replace anything that is not within that range.
We want to find only non-ASCII characters, so we will match anything outside the ASCII range and replace it.
Regex.Replace("This is my te\uFFFDxt from csv file", @"[^\u0000-\u007F]+", " ")
The above pattern matches anything that is not (^) in the set ([ ]) covering the range \u0000-\u007F (the ASCII characters; everything past \u007F is non-ASCII) and replaces it with a space.
Result
This is my te xt from csv file
You can adjust the range provided \u0000-\u007F as needed to expand the range of allowed characters to suit your needs.
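If you just want ASCII then try the following. ASCIIEncoding turns every non-ASCII character into "?", so note that the final Replace also turns any genuine question marks in the text into spaces: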
var ascii = new ASCIIEncoding();
byte[] encodedBytes = ascii.GetBytes(text);
var cleaned = ascii.GetString(encodedBytes).Replace("?", " ");
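If other non-ASCII text (accented letters, for instance) should survive and only the stray space-like characters should become plain spaces, a character-by-character pass is another option. The sketch below assumes that the characters to normalize are U+FFFD, U+FEFF, U+200B and everything in the Unicode "space separator" category (which includes U+00A0); TextCleaner and NormalizeSpaces are illustrative names.

```csharp
using System.Globalization;
using System.Text;

public static class TextCleaner
{
    // Hypothetical helper: replaces the replacement character, the BOM,
    // the zero-width space and all Unicode space separators with ' '.
    public static string NormalizeSpaces(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool spaceLike =
                c == '\uFFFD' || c == '\uFEFF' || c == '\u200B' ||
                char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

            // Keep every other character untouched.
            sb.Append(spaceLike ? ' ' : c);
        }
        return sb.ToString();
    }
}
```

The cleaned string can then be passed to the existing Regex.IsMatch validation unchanged.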

conversion of special character string

I am using a web service and the result comes back like this:
" methew wade watto"
Then I tried string.Replace():
jsona = jsona.Replace(@"", "");
but the problem is that I am unable to put a special character like " into my Replace statement. How can I remove " from the input string? And what other options are there for replacing the string?
In C#, the @ symbol marks a verbatim string literal: the string is read literally and backslash escape sequences are not interpreted. In a regular string literal, a backslash introduces an escape sequence, so \" stands for a double quote.
So you have to use "\"" to represent " in .Replace() instead of the @ literal.
I think you have to try something like this:
string jsonInput = "\"methew wade watto\""; // be the input
string replacedQuotes = jsonInput.Replace("\"", "");
You need to escape the " with \ , right now, you are just saying to replace empty string with empty string:
jsona= jsona.Replace("\"","");
Now this will replace the " sign in your string with empty string.
Output:
methew wade watto
Use a backslash to escape the special character:
str = str.Replace("\"", "");
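To make the two literal styles concrete, here is a short sketch showing that the regular literal "\"" and the verbatim literal @"""" denote the same one-character string, plus a variant that removes only the surrounding quotes. QuoteStripper and its method names are hypothetical.

```csharp
public static class QuoteStripper
{
    // Regular literal: the quote is escaped with a backslash.
    public const string QuoteRegular = "\"";

    // Verbatim literal: the quote is escaped by doubling it.
    public const string QuoteVerbatim = @"""";

    // Removes every double quote from the input.
    public static string StripAllQuotes(string input)
    {
        return input.Replace(QuoteRegular, "");
    }

    // Removes only leading and trailing double quotes.
    public static string TrimOuterQuotes(string input)
    {
        return input.Trim('"');
    }
}
```

TrimOuterQuotes may be the safer choice when the payload can legitimately contain quotes in the middle of the text.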

Replace a part of string containing Password

Slightly similar to this question, I want to replace argv contents:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
to this:
"-help=none\n-URL=(default)\n-password=********\n-uname=Khanna\n-p=100"
I have tried very basic string find and search operations (using IndexOf, Substring etc.). I am looking for a more elegant solution to replace this part of the string:
-password=AnyPassword
to:
-password=*******
and keep the rest of the string intact. I am wondering whether String.Replace or a Regex replace may help.
What I've tried (not much of error-checks):
var pwd_index = argv.IndexOf("-password=");
string converted;
if (pwd_index >= 0)
{
    var leftPart = argv.Substring(0, pwd_index);
    var pwdStr = argv.Substring(pwd_index);
    var rightPart = pwdStr.Substring(pwdStr.IndexOf("\n") + 1);
    converted = leftPart + "-password=********\n" + rightPart;
}
else
    converted = argv;
Console.WriteLine(converted);
Solution
Similar to Rubens Farias' solution but a little bit more elegant:
string argv = "-help=none\n-URL=(default)\n-password=\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
It matches password= literally, stores it in capture group $1 and then keeps matching until a \n is reached.
This yields a constant number of *'s, though. But revealing how many characters a password has might already convey too much information to hackers, anyway.
Working example: https://dotnetfiddle.net/xOFCyG
Regular expression breakdown
(            // Store the following match in capture group $1.
  password=  // Match "password=" literally.
)
[            // Match one from a set of characters.
  ^          // Negate the set (i.e., match anything not
             // contained in the following set).
  \n         // The character set: consists only of the newline character.
]
*            // Match the preceding character set 0 to n times.
This code replaces the password value with as many "*" characters as the password has:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)([\s\S]*?\n)",
match => match.Groups[1].Value + new String('*', match.Groups[2].Value.Length - 1) + "\n");
You can also remove the new String() part and replace it with a string constant.
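Both variants above can also be written with a lookbehind, which avoids the capture group and does not require a newline after the password. This is only a sketch, assuming the option is always spelled -password=; ArgMasker and its method names are invented for illustration.

```csharp
using System.Text.RegularExpressions;

public static class ArgMasker
{
    // (?<=-password=) asserts that "-password=" precedes the match
    // without being part of it; [^\n]* then matches the value itself.
    private const string Pattern = @"(?<=-password=)[^\n]*";

    // Replaces the value with a fixed-length mask.
    public static string MaskFixed(string argv)
    {
        return Regex.Replace(argv, Pattern, "********");
    }

    // Replaces the value with one '*' per original character.
    public static string MaskSameLength(string argv)
    {
        return Regex.Replace(argv, Pattern, m => new string('*', m.Length));
    }
}
```

The fixed-length mask hides the password length as well, which fits the concern raised above; the same-length variant is shown only for comparison.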
