Remove characters from C# string not belonging to a specific code page


In C# I have a string that goes on to be inserted into a DB table using code page 37 (US). So, for instance, a '€' will cause the insert operation to fail.
What is a good way to clean my string of characters not represented in code page 37, and possibly replace those characters with some default character?

Something like this?
// using System.Text;
// On .NET Core / .NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var euroString = "abc?€./*";
var encoding37 = Encoding.GetEncoding(
    37,
    new EncoderReplacementFallback("_"), //replacement char
    new DecoderExceptionFallback());
var byteArrayWithFallbackChars = encoding37.GetBytes(euroString);
var utfStringFromBytesWithFallback = encoding37.GetString(byteArrayWithFallbackChars);
//returns "abc?_./*"
P.S.: you can just use GetEncoding(37), but then the replacement char is '?', which I don't think is really OK for a DB :)
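The same round trip can be wrapped in a reusable helper. This is a minimal sketch, assuming .NET 5 or later, where code page 37 (IBM EBCDIC US-Canada) only becomes available after registering CodePagesEncodingProvider; on .NET Framework, GetEncoding(37) works without that call. CleanForCodePage is a hypothetical name, and the replacement string must itself be encodable in the target code page.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only ship a few encodings by default; this registration
// makes the Windows/IBM code pages, including 37, available to GetEncoding.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: encodes the text with the target code page, replacing
// every character it cannot represent, then decodes it back so the caller
// gets exactly what the database would be able to store.
static string CleanForCodePage(string input, int codePage, string replacement)
{
    Encoding encoding = Encoding.GetEncoding(
        codePage,
        new EncoderReplacementFallback(replacement),
        DecoderFallback.ExceptionFallback);
    byte[] bytes = encoding.GetBytes(input);
    return encoding.GetString(bytes);
}

Console.WriteLine(CleanForCodePage("abc?€./*", 37, "_")); // abc?_./*
```

Decoding back to a string is a design choice: it keeps the rest of the code working with strings while guaranteeing that every remaining character exists in code page 37.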

Here is a regex to restrict input to a range of allowed characters:
https://dotnetfiddle.net/WIrSSO
const string Allowed = @"0-9\."; //Add allowed chars here
string cleanStr = Regex.Replace("£1.11", "[^" + Allowed + "]", ""); // "1.11"

Related

C# Finding numerical value from specific string

I have a string of varying length that I am trying to retrieve a number from. The format of the string is always:
"some text lines
FC = 1234
more text here
and so on"
So I know the string of numbers comes after "FC = ", and I know it finishes at the next \n. How can I return this number (which will vary in size) into a new string?

Try the following code snippet:
var str = "some text lines \nFC = 1234\n more text here and so on";
Console.WriteLine(Regex.Match(str, @"\d+\.*\d*").Value);

Thanks to all. I think I managed to find a way with Regex, based on ScareCrow's suggestion:
string rgSearch = Regex.Escape(searchString) + @"\d+\.*\d*"; // searchString is e.g. "FC = "
FC = Regex.Match(diagnostics, rgSearch).Value;
FC = FC.Replace(searchString, ""); //Leaves the number only
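ScareCrow's pattern on its own matches the first number anywhere in the text, so it picks up the wrong value if another number appears before "FC = ". A minimal sketch that anchors on the marker and captures only the number, plus a regex-free IndexOf variant that reads up to the next '\n' as the question describes. The helper names ExtractFcValue and ExtractFcValueNoRegex are hypothetical, and allowing an optional decimal part is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the number that follows "FC = ", or null
// when the marker is absent. The capture group holds only the number.
static string ExtractFcValue(string text)
{
    Match match = Regex.Match(text, @"FC = (\d+(?:\.\d+)?)");
    return match.Success ? match.Groups[1].Value : null;
}

// Hypothetical regex-free alternative: find the marker, then take everything
// up to the next '\n'. Trim() also drops a '\r' from Windows line endings.
static string ExtractFcValueNoRegex(string text)
{
    const string marker = "FC = ";
    int start = text.IndexOf(marker, StringComparison.Ordinal);
    if (start < 0) return null;
    start += marker.Length;
    int end = text.IndexOf('\n', start);
    if (end < 0) end = text.Length;
    return text.Substring(start, end - start).Trim();
}

var sample = "some text lines\nFC = 1234\nmore text here\nand so on";
Console.WriteLine(ExtractFcValue(sample));        // 1234
Console.WriteLine(ExtractFcValueNoRegex(sample)); // 1234
```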

Identify the string that does not exist in another string using regex and C#

I am trying to capture the part of a string that is not contained in another string.
string searchedString = " This is my search string";
string subsetofSearchedString = "This is my";
My output should be "Search string". I would like to use only regex so that I can handle complex strings.
Below is the code I have tried so far, without success.
string UnmatchedString = searchedString;
Match match = new Regex(subsetofSearchedString).Match(searchedString);
if (!string.IsNullOrWhiteSpace(match.Value))
{
    UnmatchedString = UnmatchedString.Replace(match.Value, string.Empty);
}
Update: the code above does not work for the texts below.
text1 = 'Property Damage (2015 ACURA)' Exposure Added Automatically for IP:Claimant DriverLoss Reserve Line :Property DamageReserve Amount $ : STATIP Role(s): Owner, DriverExposure Owner :Jaimee Watson_csr Author:
text2 = 'Property Damage (2015 ACURA)' Exposure Added Automatically for IP:Claimant DriverLoss Reserve Line :Property DamageReserve Amount $ : STATIP Role(s): Owner, Driver
Match match = new Regex(text2).Match(text1);

You can use Regex.Split:
var ans = Regex.Split(searchedString, subsetofSearchedString);
If you want the answer as a single string minus the subset, you can join it:
var ansjoined = String.Join("", ans);
Replacing with String.Empty will also work:
var ans = Regex.Replace(searchedString, subsetofSearchedString, String.Empty);

Answer:
Regex wasn't working for me because of the metacharacters in my string, and Regex.Escape did not help me with the comparison.
String.Contains worked like a charm here:
if (text1.Contains(text2))
{
    status = TestResult.Pass;
    text1 = text1.Replace(text2, string.Empty);
}
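The texts in the update contain regex metacharacters: the parentheses in (2015 ACURA) form a group and $ is an end-of-input anchor, so the unescaped pattern cannot match the literal text. A minimal sketch comparing the unescaped pattern, the same pattern passed through Regex.Escape, and the Contains/Replace approach; the strings are shortened versions of text1/text2 (an assumption for brevity).

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened versions of text1/text2 that keep the metacharacters ( ) $
string text1 = "'Property Damage (2015 ACURA)' Reserve Amount $ : STAT Author:";
string text2 = "'Property Damage (2015 ACURA)' Reserve Amount $ : STAT";

// Unescaped, "(2015 ACURA)" matches "2015 ACURA" without the parentheses
// and "$" only matches at the end, so this finds nothing.
bool rawMatches = Regex.IsMatch(text1, text2);

// Regex.Escape turns the metacharacters into literals.
string viaRegex = Regex.Replace(text1, Regex.Escape(text2), string.Empty);

// Plain string replacement, as in the answer above.
string viaString = text1.Contains(text2) ? text1.Replace(text2, string.Empty) : text1;

Console.WriteLine(rawMatches); // False
Console.WriteLine(viaRegex);   // " Author:"
Console.WriteLine(viaString);  // " Author:"
```

When the "pattern" is really a literal string, String.Replace is the simpler tool; Regex.Escape is the option to keep if the literal has to be combined with real regex syntax.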

Convert String to Byte Array replacing characters between specific character

I am trying to find a way to convert a string (entered into a TextBox) to a byte array to send out over a serial port / socket.
I am fine with the string-to-byte[] conversion but am struggling a bit with the replacement.
Essentially the GUI allows the user to specify the format of the response to send, and I was looking at something like the following:
User enters: [2] Test {1} {2} [3]
{1} and {2} are variable fields which can be pulled from the incoming message, so they are currently being replaced without issue.
What I am trying to achieve is to replace [2] with an STX character and [3] with an ETX character, 2 and 3 being their ASCII codes. www.asciitable.com
The user can enter any valid ASCII character code in this format, so [13] for CR, etc.
Would the best way be to loop through the string, remembering the index of [ and then the index of ], and grab all characters between these two indexes? Or is there a more efficient way?
Thanks,
Daniel.

A regular expression can find digits between brackets and replace them with a calculated value.
Your replacement scheme looks like it might be similar to String.Format but you'll have to compare that and decide on the order of operations and meaning of special characters.
The encoding will throw an exception if the bracketed number is outside of 0-127. You could have some other behavior if you want.
// using System; using System.Text; using System.Text.RegularExpressions;
var encoding = Encoding.GetEncoding(Encoding.ASCII.CodePage,
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);
var bracketRegex = new Regex(@"\[(?<digits>\d+)\]", RegexOptions.Compiled);
MatchEvaluator convertToCodepoint = (match) =>
    Char.ConvertFromUtf32(Int32.Parse(match.Groups["digits"].Value));
var values = new[] { "a", "b", "c" }; // {0}, {1}, {2}
var input = "[2] Test {1} {2} [3]";
var bytes = encoding.GetBytes(String.Format(bracketRegex.Replace(input, convertToCodepoint), values));
Console.WriteLine(BitConverter.ToString(bytes)); // print the bytes as hex
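A minimal sketch of the same regex idea without String.Format, which would also treat any literal { or } typed by the user as format syntax. ExpandAsciiCodes is a hypothetical name, the braces are assumed to have been substituted beforehand (here with placeholder values), and codes above 127 are rejected explicitly instead of relying on the encoder.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces every "[n]" (up to three digits) with the
// character whose ASCII code is n, rejecting codes outside 0-127.
static string ExpandAsciiCodes(string template)
{
    return Regex.Replace(template, @"\[(\d{1,3})\]", match =>
    {
        int code = int.Parse(match.Groups[1].Value);
        if (code > 127)
            throw new FormatException($"[{code}] is not a 7-bit ASCII code.");
        return ((char)code).ToString();
    });
}

string input = "[2] Test {1} {2} [3]";
// {1} and {2} are assumed to be replaced already; "b" and "c" are placeholders.
string expanded = ExpandAsciiCodes(input.Replace("{1}", "b").Replace("{2}", "c"));
byte[] bytes = Encoding.ASCII.GetBytes(expanded);

Console.WriteLine(BitConverter.ToString(bytes)); // 02-20-54-65-73-74-20-62-20-63-20-03
```

Limiting the match to three digits means something like [1234] is left as literal text rather than parsed; adjust that if the user is expected to type longer numbers.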

I think you should write code similar to this:
// needs: using System.Linq;
string input = textBox.Text; // your TextBox control, e.g. "[2] Test {1} {2} [3]"
//Use these lines if you don't know how many times you have to iterate.
var totalOfBraces = input.Count(x => x == '{');
var totalOfBrackets = input.Count(x => x == '[');
var totalOfElements = totalOfBraces + totalOfBrackets;
Then you can get the elements between the braces and brackets and replace them. That is why I added totalOfElements: it lets you drive a for loop.
For example:
for (var i = 0; i < totalOfElements; i++)
{
    if (i < totalOfBrackets)
    {
        int open = input.IndexOf('[');
        int close = open < 0 ? -1 : input.IndexOf(']', open);
        if (close < 0) continue;
        // The text between the brackets, e.g. "2" in "[2]"
        string code = input.Substring(open + 1, close - open - 1);
        // Replace every "[2]" with the character whose ASCII code is 2 (STX)
        input = input.Replace("[" + code + "]", ((char)int.Parse(code)).ToString());
    }
    else
    {
        //Do the same for braces
    }
}
//NOW HERE, YOU HAVE YOUR TEXT FORMATTED AND READY TO CONVERT IT TO BYTE[]

How to strip a string from the point a hyphen is found within the string C#

I'm currently trying to strip a string of data that may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The current C# code I'm trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.

How about just doing this?
stringin.Split('-')[0].Trim();
You could even specify the maximum number of substrings using an overload of Split (a count of 2 stops after the first hyphen):
stringin.Split(new[] { '-' }, 2)[0].Trim();

Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. After that you do this:
stringin.Substring(0, stringin.IndexOf("-") - 1)
When there is no hyphen, IndexOf returns -1, so this becomes Substring(0, -2) and throws an ArgumentOutOfRangeException.
Make a simple change to your regex and it works with or without a hyphen: ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
The only change is -? becoming -+.

It seems that you want the (sub)string after the dash ('-') if the original contains '-', or the original string if it doesn't have a dash.
If it's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);

I wouldn't use regex for this case if I were you, it makes the solution much more complex than it can be.
For strings that I am sure will have no more than a couple of dashes, I would use this code, because it is a one-liner and very simple:
string str = entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it creates a separate string for every piece, although you are only looking for the first one. In that case the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1? entryString.Substring(0, firstDashIndex) : entryString;

You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, and you can clean the string up from there.
This is also a great place to start writing unit tests; they are very good for stuff like this.
Here's what the code could look like:
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
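A minimal sketch of the kind of checks this answer suggests, written as plain assertions instead of a test framework (an assumption, to keep it self-contained). StripFromHyphen is a hypothetical name, and trimming the space before the hyphen is an assumption based on the question's expected output "test".

```csharp
using System;

// Hypothetical helper: returns the text before the first hyphen (or the
// whole input when there is none), with trailing whitespace removed.
static string StripFromHyphen(string input)
{
    int hyphenIndex = input.IndexOf('-');
    string head = hyphenIndex > -1 ? input.Substring(0, hyphenIndex) : input;
    return head.TrimEnd();
}

// Minimal assertion helper so the checks run without a test framework.
static void AssertEqual(string expected, string actual)
{
    if (expected != actual)
        throw new Exception($"Expected \"{expected}\" but got \"{actual}\"");
}

AssertEqual("test", StripFromHyphen("test - 9894"));   // hyphen with spaces
AssertEqual("test", StripFromHyphen("test"));          // no hyphen: the case that crashed
AssertEqual("hsh4a", StripFromHyphen("hsh4a - 8989")); // the question's input
AssertEqual("", StripFromHyphen("-leading"));          // hyphen at index 0
AssertEqual("", StripFromHyphen(""));                  // empty input
Console.WriteLine("All checks passed.");
```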

Split a string at 2 points

I have a file called file_test1.txt and I want to extract just test1 from the name and place it in a string. What's the best way of doing this?
E.g.
string fullfile = @"C:\file_test1.txt";
string section = [test1] from fullfile; // <- expected result
I want to be able to split on 'file_' and '.txt' as the 'test1' section could be larger or smaller however the 'file_' and '.txt' will always be the same.

Try Path.GetFileNameWithoutExtension(fullfile).Substring(5) (or Substring(TEMPLATE_PREFIX.Length), where TEMPLATE_PREFIX is a constant such as "file_")

You can try Split:
var test = Path.GetFileNameWithoutExtension(fullfile).Split('_')[1];

Try the following:
string fullfile = @"C:\file_test1.txt";
var name = fullfile.Substring(8, fullfile.Length - 12);
As "C:\file_" and ".txt" are fixed, you can take a Substring starting at index 8 (skipping the leading part), with a length of the total string length minus 12 (12 = the 8 characters of "C:\file_" plus the 4 characters of ".txt").

Thought I'd give a solution that uses Split and handles files with multiple underscores:
string.Join("_", Path.GetFileNameWithoutExtension(fullfile).Split('_').Skip(1));

String.Split() works quite well for my uses:
http://msdn.microsoft.com/en-us/library/b873y76a.aspx

Obviously many ways to accomplish this. Here's yet another approach:
string fullfile = @"C:\file_test1.txt";
int index1 = fullfile.LastIndexOf("file_");
if (index1 != -1)
{
int index2 = fullfile.IndexOf(".", index1);
if (index2 != -1)
{
string section = fullfile.Substring(index1 + 5, index2 - index1 - 5);
}
}

You could also get "test1", or any subsequent filename (assuming your file naming convention remains constant!) using this regular expression:
var defaultRegex = new Regex(@"(?<=_).*(?=\.txt)");
var matches = defaultRegex.Matches(fullfile);
var match = matches[0].Value;
The regular expression:
(?<=_).*(?=\.txt)
uses a positive lookbehind to find text preceded by '_', and a positive lookahead to find text which has '.txt' ahead of it (the dot is escaped so it only matches a literal '.').
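A minimal sketch that splits on the two fixed markers the question names, "file_" and ".txt", without hard-coded offsets like 8 and 12, so it also works for other directories and longer names. ExtractSection is a hypothetical name; the file name is cut out manually at the last '\' or '/' (an assumption) because Path.GetFileNameWithoutExtension only treats '\' as a separator on Windows.

```csharp
using System;

// Hypothetical helper: returns the text between a known prefix and suffix in
// the file name, or null if the name does not have that shape.
static string ExtractSection(string path, string prefix, string suffix)
{
    // File name = everything after the last '\' or '/' (whole path if none).
    string name = path.Substring(path.LastIndexOfAny(new[] { '\\', '/' }) + 1);
    if (name.Length < prefix.Length + suffix.Length ||
        !name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ||
        !name.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
        return null;
    return name.Substring(prefix.Length, name.Length - prefix.Length - suffix.Length);
}

Console.WriteLine(ExtractSection(@"C:\file_test1.txt", "file_", ".txt")); // test1
```

Unlike Split('_')[1], this keeps underscores inside the section, e.g. "file_my_long_name.txt" yields "my_long_name".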
