Fastest way to remove leading special characters from a string in C#

I am using C# and I have strings like:
-Xyz
--Xyz
---Xyz
-Xyz-Abc
--Xyz-Abc
I simply want to remove any leading special characters until a letter appears. Note: special characters in the middle of the string must stay as they are. What is the fastest way to do this?

You could use string.TrimStart and pass in the characters you want to remove:
var result = yourString.TrimStart('-', '_');
However, this is only a good idea if the number of special characters you want to remove is well-known and small.
If that's not the case, you can use regular expressions:
var result = Regex.Replace(yourString, "^[^A-Za-z0-9]*", "");
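Since the question asks for the fastest way, a hand-written single-pass scan is another option. This is a minimal sketch, assuming "special character" means anything that is not a letter (use char.IsLetterOrDigit instead if leading digits should be kept, as the regex above does). It finds the index of the first letter and makes one Substring call, so there is no regex or LINQ overhead; I have not benchmarked it against TrimStart, so measure with your own data if speed really matters. TrimLeadingNonLetters is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper: drop every character that comes before the first letter.
static string TrimLeadingNonLetters(string s)
{
    int i = 0;
    // Walk forward while the current character is not a letter.
    while (i < s.Length && !char.IsLetter(s[i]))
    {
        i++;
    }
    // One Substring call returns the rest, including special characters in the middle.
    return s.Substring(i);
}

foreach (var sample in new[] { "-Xyz", "--Xyz", "---Xyz", "-Xyz-Abc", "--Xyz-Abc" })
{
    Console.WriteLine(TrimLeadingNonLetters(sample));
}
```

This prints Xyz, Xyz, Xyz, Xyz-Abc and Xyz-Abc on separate lines; a string made only of special characters comes back empty.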

I prefer these two methods:
using System;
using System.Collections.Generic;
using System.Linq; // for SkipWhile and ToArray
List<string> strings = new List<string>()
{
"-Xyz",
"--Xyz",
"---Xyz",
"-Xyz-Abc",
"--Xyz-Abc"
};
foreach (var s in strings)
{
string temp;
// String.Trim Method
char[] charsToTrim = { '*', ' ', '\'', '-', '_' }; // Add more
temp = s.TrimStart(charsToTrim);
Console.WriteLine(temp);
// Enumerable.SkipWhile Method
// Char.IsPunctuation Method (see also Char.IsLetter, Char.IsLetterOrDigit, etc.)
temp = new String(s.SkipWhile(x => Char.IsPunctuation(x)).ToArray());
Console.WriteLine(temp);
}
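One caveat with Char.IsPunctuation: it only covers the Unicode punctuation categories, so symbols such as '$' (a currency symbol) or '+' (a math symbol) are not skipped. A minimal sketch comparing it with a "skip until the first letter" predicate, which matches the question's wording more closely; SkipPunctuation and SkipNonLetters are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical: skip leading characters while they are punctuation (as in the answer above).
static string SkipPunctuation(string s) =>
    new string(s.SkipWhile(c => char.IsPunctuation(c)).ToArray());

// Hypothetical: skip leading characters until the first letter.
static string SkipNonLetters(string s) =>
    new string(s.SkipWhile(c => !char.IsLetter(c)).ToArray());

string input = "$+-Xyz-Abc";
Console.WriteLine(SkipPunctuation(input)); // '$' is not punctuation, so nothing is skipped
Console.WriteLine(SkipNonLetters(input));  // Xyz-Abc
```

For the dash-only prefixes in the question both versions give the same result; they only differ once symbols appear in the prefix.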

Related

Get everything before a dot or comma in C#

How can I get a substring of everything before a dot or comma?
For example:
string input = "2.1";
int charLocation = input.IndexOf(".", StringComparison.Ordinal);
string test = input.Substring(0, charLocation );
But what if I have input = "2,1"?
I would like to do it in one method, without calling Substring twice (once for the dot and once for the comma).
string test = input.Split(new Char[] { ',', '.' })[0];
This splits the string on either a comma or a period; element [0] is everything before the first one.
input.Split(',','.');
Use the IndexOfAny function. It allows you to specify a list of characters to look for, rather than just a single character. You could then make a substring up to the return value of that function.
e.g.
char[] chars = { '.', ',' };
string result = s.Substring(0, s.IndexOfAny(chars)); // "out" is a C# keyword, so it cannot be a variable name
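One detail the IndexOfAny answer glosses over: IndexOfAny returns -1 when none of the characters occurs, and Substring(0, -1) throws ArgumentOutOfRangeException. A minimal sketch with that case handled, assuming the whole string should be returned when there is no dot or comma; BeforeDotOrComma is a hypothetical name.

```csharp
using System;

// Hypothetical helper: everything before the first '.' or ','; the whole string if neither occurs.
static string BeforeDotOrComma(string s)
{
    int index = s.IndexOfAny(new[] { '.', ',' });
    // IndexOfAny returns -1 when no character matches, which Substring would reject.
    return index >= 0 ? s.Substring(0, index) : s;
}

Console.WriteLine(BeforeDotOrComma("2.1")); // 2
Console.WriteLine(BeforeDotOrComma("2,1")); // 2
Console.WriteLine(BeforeDotOrComma("21"));  // 21
```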

C#: replace each instance of a set of characters in a string

I've found many references to doing something similar, but none seem to be exactly what I'm after, so I'm hoping someone can help.
In simple terms, I want to take a string entered by a user (into a WinForms input), first strip out any blanks, and then replace each of a list of 'illegal' characters with the UK currency symbol (£). The input itself is used as entered, but the file generated by the process gets the modified filename.
I wrote a function (based on an extension method) but it's not working quite as expected:
public static class ExtensionMethods
{
public static string Replace(this string s, char[] separators, string newVal)
{
var temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
return String.Join(newVal, temp);
}
}
public static string RemoveUnwantedChars(string enteredName, char[] unwanted, string rChar)
{
return enteredName.Replace(unwanted, rChar);
}
I call it twice in my code:
char[] blank = { ' ' };
string ename = Utilities.RemoveUnwantedChars(this.txtTableName.Text, blank, string.Empty);
char[] unwanted = { '(', ')', '.', '%', '/', '&', '+' };
string fname = Utilities.RemoveUnwantedChars(ename, unwanted, "£");
If I enter a string that contains at least one space, all of the characters above and some other letters (for example, " (GH) F16.5% M X/Y&1+1"), I get the following results:
ename = "(GH)F16.5%MX/Y&1+1" - this is correct in that it has removed the blanks.
fname = "GH£F16£5£MX£Y£1£1" - this hasn't worked correctly in that it has not replaced the first character but removed it.
The rest of the characters are replaced correctly. It only happens when one of the 'illegal' characters is at the start of the string: if my string was "G(H) F16.5% M X/Y&1+1", I would correctly get "G£H£F16£5£MX£Y£1£1". It also replaces consecutive 'illegal' characters with a single '£', so "M()GX+.1" becomes "M£GX£1" but should be "M££GX££1".
I think the problem is in your Replace extension. You are splitting in this line:
var temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
Passing RemoveEmptyEntries drops the empty strings that a leading or doubled separator produces, and that causes the unexpected result. Use this instead:
var temp = s.Split(separators, StringSplitOptions.None);
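To check the fix against the examples from the question, here is the same split-and-join logic as the Replace extension with StringSplitOptions.None, written as a local function (ReplaceChars, a hypothetical name) so the snippet runs on its own. Keeping the empty entries means a leading illegal character produces a leading empty string, so Join puts a "£" in front of it, and consecutive illegal characters each get their own "£".

```csharp
using System;

// Hypothetical stand-in for the extension method: split on any separator, keep empty entries, rejoin.
static string ReplaceChars(string s, char[] separators, string newVal)
{
    string[] parts = s.Split(separators, StringSplitOptions.None);
    return string.Join(newVal, parts);
}

char[] unwanted = { '(', ')', '.', '%', '/', '&', '+' };
Console.WriteLine(ReplaceChars("(GH)F16.5%MX/Y&1+1", unwanted, "£")); // £GH£F16£5£MX£Y£1£1
Console.WriteLine(ReplaceChars("M()GX+.1", unwanted, "£"));           // M££GX££1
```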
The problem occurs because string.Join() only puts separators between substrings; it never puts one at the start.
One possible solution is to avoid using string.Join() and write Replace() like this instead:
using System.Text; // for StringBuilder
public static class ExtensionMethods
{
public static string Replace(this string s, char[] separators, string newVal)
{
var sb = new StringBuilder(s);
foreach (char ch in separators)
{
string target = new string(ch, 1);
sb.Replace(target, newVal);
}
return sb.ToString();
}
}
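The StringBuilder version above makes one pass over the text per separator. If the list of illegal characters grows, a single pass that checks each character against a set is an alternative. A minimal sketch, assuming each illegal character should become its own replacement string; ReplaceEach is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: replace every character found in `illegal` with `replacement`, in one pass.
static string ReplaceEach(string s, ISet<char> illegal, string replacement)
{
    var sb = new StringBuilder(s.Length);
    foreach (char ch in s)
    {
        if (illegal.Contains(ch))
            sb.Append(replacement);
        else
            sb.Append(ch);
    }
    return sb.ToString();
}

var illegal = new HashSet<char> { '(', ')', '.', '%', '/', '&', '+' };
string noBlanks = " (GH) F16.5% M X/Y&1+1".Replace(" ", string.Empty);
Console.WriteLine(ReplaceEach(noBlanks, illegal, "£")); // £GH£F16£5£MX£Y£1£1
```

A HashSet lookup costs the same no matter how many illegal characters there are, which is the main reason to prefer it over one Replace call per character when the list is long.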
When you use split method in your Replace function you get following strings:
GH, F16, 5, MX, Y, 1, 1.
When you join them with your newVal you get:
GH + newVal + F16 + newVal + ..., thus omitting the first replaced character.
You would probably need a special case that checks whether the first char is "illegal" and puts newVal at the start of your string.

How to split a string by position with String.Split

I'd like to ask a question about String.Split.
For example:
char[] semicolon=new [] {';'};
char[] bracket=new [] {'[',']'};
string str="AND[Firstpart;Sndpart]";
I can split str by bracket and then by semicolon.
Finally, I get the Firstpart and Sndpart inside the brackets.
But if str = "AND[AND[Firstpart;Sndpart];sndpart]",
how can I get AND[Firstpart;Sndpart] and sndpart?
Is there a way to tell C# to split by the second semicolon?
Thanks for your help
One way is to hide the characters inside brackets behind a character that is not used in any of your strings.
Method HideSplit: this method replaces separator characters inside brackets with placeholder characters, performs the split, and then gives back the result with the original characters restored.
This method may be overkill if you need to do this many times, but you should be able to optimize it easily once you get the idea.
private static void Main()
{
char[] semicolon = new[] { ';' };
char[] bracket = new[] { '[', ']' };
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] splitbyBracket = HideSplit(str, bracket);
}
// Requires: using System; using System.Linq; using System.Text;
private static string[] HideSplit(string str, char[] separator)
{
int counter = 0; // When counter is more than 0 it means we are inside brackets
StringBuilder result = new StringBuilder(); // To build up string as result
foreach (char ch in str)
{
if (ch == ']') counter--;
if (counter > 0) // if we are inside brackets perform hide
{
if (ch == '[') result.Append('\uFFF0'); // add '\uFFF0' instead of '['
else if (ch == ']') result.Append('\uFFF1');
else if (ch == ';') result.Append('\uFFF2');
else result.Append(ch);
}
else result.Append(ch);
if (ch == '[') counter++;
}
// Perform split (characters are hidden now). RemoveEmptyEntries drops the empty string
// that a trailing ']' would otherwise leave at the end of the array.
string[] split = result.ToString().Split(separator, StringSplitOptions.RemoveEmptyEntries);
return split.Select(x => x
.Replace('\uFFF0', '[')
.Replace('\uFFF1', ']')
.Replace('\uFFF2', ';')).ToArray(); // unhide characters and give back result.
}
Some examples:
string[] a1 = HideSplit("AND[AND[Firstpart;Sndpart];sndpart]", bracket);
// Will give you this array { AND , AND[Firstpart;Sndpart];sndpart }
string[] a2 = HideSplit("AND[Firstpart;Sndpart];sndpart", semicolon);
// Will give you this array { AND[Firstpart;Sndpart] , sndpart }
string[] a3 = HideSplit("AND[Firstpart;Sndpart]", bracket);
// Will give you this array { AND , Firstpart;Sndpart }
string[] a4 = HideSplit("Firstpart;Sndpart", semicolon);
// Will give you this array { Firstpart , Sndpart }
And you can continue splitting this way.
Is there a way to tell C# to split by the second semicolon?
There is no direct way to do that, but if that is precisely what you want, it's not hard to achieve:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] tSplits = str.Split(new[] { ';' }, 3); // at most 3 parts
string[] splits = { tSplits[0] + ";" + tSplits[1], tSplits[2] };
You could achieve the same result using a combination of IndexOf() and Substring(), however that is most likely not what you'll end up using as it's too specific and not very helpful for various inputs.
For your case, you need something that understands context.
In real-world complex cases you'd probably use a lexer/parser, but that seems like overkill here.
Your best bet would probably be a loop that walks through all characters, counting opening and closing square brackets, and splits when it finds a semicolon while the count is 1.
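The bracket-counting loop described above can be sketched like this, assuming the goal is to return the operands inside the outermost brackets, split at the semicolons that sit directly inside them (depth 1). SplitTopLevel is a hypothetical name; the operator prefix such as "AND" is dropped, and unbalanced brackets are not validated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: split the contents of the outermost [...] at semicolons of depth 1.
static List<string> SplitTopLevel(string s)
{
    var parts = new List<string>();
    var current = new StringBuilder();
    int depth = 0; // number of currently open '['

    foreach (char ch in s)
    {
        if (ch == '[')
        {
            depth++;
            if (depth == 1) continue; // skip the outermost '[' itself
        }
        else if (ch == ']')
        {
            depth--;
            if (depth == 0) // the outermost ']' closes the last part
            {
                parts.Add(current.ToString());
                current.Clear();
                continue;
            }
        }
        else if (ch == ';' && depth == 1)
        {
            parts.Add(current.ToString()); // top-level separator
            current.Clear();
            continue;
        }

        if (depth >= 1) current.Append(ch); // text outside the brackets (e.g. "AND") is ignored
    }
    return parts;
}

Console.WriteLine(string.Join(" | ", SplitTopLevel("AND[AND[Firstpart;Sndpart];sndpart]")));
// AND[Firstpart;Sndpart] | sndpart
```

Each returned part can be passed back into the same function to go one level deeper.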
You can use Regex.Split, which is a more flexible form of String.Split:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] arr = Regex.Split(str, @"(.*?;.*?;)");
foreach (var s in arr)
Console.WriteLine("'{0}'", s);
// output: ''
// 'AND[AND[Firstpart;Sndpart];'
// 'sndpart]'
Regex.Split splits not by characters but by substrings that match a regular expression, so it comes down to constructing a pattern that meets your requirements. Splitting by the second semicolon is in practice splitting by a string that ends in a semicolon and contains another semicolon before it, so the pattern to split the input by could be, for example: (.*?;.*?;).
The returned array has three elements instead of two because the match starts at the beginning of the input string, so an empty string is returned as the first element. Because the pattern is wrapped in a capturing group, the matched text itself is also included in the result; without the parentheses it would be dropped.
You can read more about Regex.Split on MSDN.

String Regex Help C#

I'm trying to create a regex that reads a string and, if the last character is something like !"£$% etc., ignores that character, reads the rest of the string (so my code can look it up in a dictionary class), and then outputs the string with the ignored character back on the end. Is this actually possible, or do I have to just remove the last character?
So far...
foreach(var line in yourReader)
{
var dict = new Dictionary<string,string>(); // your replacement dictionaries
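var result = line; // foreach variables are read-only, so work on a copy
foreach(var kvp in dict)
{
// Verbatim strings (@"...") pass \s and \. through to the regex engine; ${1} and ${2} re-insert
// the captured delimiters. Regex.Replace returns a new string, so the result must be assigned.
result = System.Text.RegularExpressions.Regex.Replace(result, @"(\s|,|\.|:|\t)" + kvp.Key + @"(\s|,|\.|:|\t)", "${1}" + kvp.Value + "${2}");
}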
}
I've also been told to try this
var trans = textbox1.Text;
foreach (var kvp in d) //d is my dictionary so use yours
{
trans = trans.Replace(kvp.Key, kvp.Value);
}
textbox2.Text = trans;
but I have no idea what it does.
I don't see the need for a regex here, so I hope this helps:
const int ARRAY_OFFSET = 1;
List<char> ForbiddenChars = new List<char>()
{
'!', '@', '#', '$', '%', '^', '&', '*', '£' //Add more if you'd like
};
string myString = "Hello World!&";
foreach (var forbiddenChar in ForbiddenChars)
{
if (myString[myString.Length - ARRAY_OFFSET] == forbiddenChar)
{
myString = myString.Remove(myString.Length - ARRAY_OFFSET);
break;
}
}
Edit:
I checked the old code and it had a problem: when the string ended with several "forbidden" characters that appear in the same order as in the ForbiddenChars list, it deleted all of them. If your string was "Hello World&!", it would delete both the ! and the &. So I added a break; now it removes at most one character.
Take a look at Regex.Replace. A regular expression such as [!"£$%]$ should do what you need.
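To do what the question actually asks (ignore the trailing character, look the word up, then put the character back), the regex can capture the trailing symbol instead of discarding it. A minimal sketch, assuming a Dictionary<string, string> lookup like the one in the question; the dictionary contents and the name TranslateKeepingTrailingSymbol are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical lookup table standing in for the dictionary from the question.
var dict = new Dictionary<string, string> { ["Hello"] = "Hi", ["World"] = "Earth" };

// Hypothetical helper: split off one optional trailing symbol, translate the rest, re-attach the symbol.
static string TranslateKeepingTrailingSymbol(string word, IDictionary<string, string> lookup)
{
    // Group 1: the word itself (lazy); group 2: at most one trailing !, ", £, $ or %.
    Match m = Regex.Match(word, "^(.*?)([!\"£$%]?)$");
    string core = m.Groups[1].Value;
    string symbol = m.Groups[2].Value;
    // Fall back to the original text when the word is not in the dictionary.
    string translated = lookup.TryGetValue(core, out var value) ? value : core;
    return translated + symbol;
}

Console.WriteLine(TranslateKeepingTrailingSymbol("Hello!", dict)); // Hi!
Console.WriteLine(TranslateKeepingTrailingSymbol("World%", dict)); // Earth%
Console.WriteLine(TranslateKeepingTrailingSymbol("Other", dict));  // Other
```

Because group 1 is lazy, only the last symbol is split off; add characters to the class in group 2 as needed.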
In your case I'd recommend using a regular expression character range to remove the !"£$% etc.
The way you'd want to use this in your case would be something like:
"<the bit you want to capture>(?:[!-%]\\r)"
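The (?:[!-%]\\r) part matches, but doesn't capture, a single character in the range !-% (that is, ! " # $ %) that comes right before a carriage return character. Note that £ (U+00A3) is outside that range and would have to be listed separately.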
I also recommend this handy cheat sheet of regular expressions:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet

Read text file word-by-word using LINQ

I am learning LINQ, and I want to read a text file (let's say an e-book) word by word using LINQ.
This is what I could come up with:
static void Main()
{
string[] content = File.ReadAllLines("text.txt");
var query = (from c in content
select c);
foreach (var line in query)
{
Console.Write(line+"\n");
}
}
This reads the file line by line. If I change ReadAllLines to ReadAllText, the file is read letter by letter (iterating over a single string yields its characters).
Any ideas?
string[] content = File.ReadAllLines("text.txt");
var words = content.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
foreach(string word in words)
{
// process each word here
}
You'll need to add whatever whitespace characters you need. Using StringSplitOptions to deal with consecutive whitespace is cleaner than the Where clause I originally used.
In .NET 4 and later you can use File.ReadLines for lazy evaluation and thus lower memory usage when working on large files.
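A self-contained sketch of the SelectMany approach: an in-memory array stands in for File.ReadAllLines so it runs without a file, and the sample lines and separator set are assumptions. SelectMany flattens each line's word array into one sequence of words.

```csharp
using System;
using System.Linq;

// Sample lines standing in for File.ReadAllLines("text.txt").
string[] content = { "The quick  brown fox", "\tjumps over", "", "the lazy dog" };
char[] whitespace = { ' ', '\t' };

// Flatten each line's words into a single sequence; RemoveEmptyEntries drops the
// empty strings produced by consecutive separators and empty lines.
var words = content.SelectMany(line => line.Split(whitespace, StringSplitOptions.RemoveEmptyEntries));

foreach (string word in words)
{
    Console.WriteLine(word);
}
```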
string str = File.ReadAllText("text.txt");
char[] separators = { '\n', ',', '.', ' ', '"', ' ' }; // add your own
var words = str.Split(separators, StringSplitOptions.RemoveEmptyEntries);
string content = File.ReadAllText("Text.txt");
var words = from word in content.Split(WhiteSpace, StringSplitOptions.RemoveEmptyEntries)
select word;
You will need to define the array of whitespace characters with your own values, like so:
char[] WhiteSpace = { '\r', '\n', ' ', '\t' }; // Split needs chars; Environment.NewLine is a string, so list '\r' and '\n' separately
This code assumes that punctuation is part of the word (for example, a trailing comma).
It's probably better to read all the text using ReadAllText() and then use regular expressions to get the words. Using the space character as a delimiter can cause trouble because it also retrieves punctuation (commas, periods, etc.). For example:
Regex re = new Regex("[a-zA-Z0-9_-]+", RegexOptions.Compiled); // You'll need to change the RE to fit your needs
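Match m = re.Match(text); // text holds the file contents, e.g. from File.ReadAllText
while (m.Success)
{
string word = m.Value; // the pattern has no capture group, so m.Groups[1] would be empty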
// do your processing here
m = m.NextMatch();
}
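Regex.Matches returns all matches at once, which avoids the manual Match/NextMatch loop and the group index entirely. A minimal sketch using the same pattern, assuming the text is already in a string (a literal stands in for File.ReadAllText here).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Hello, world. It's a well-known test_case.";

// Each match's Value is the whole matched word; the pattern has no capture groups.
var words = Regex.Matches(text, "[a-zA-Z0-9_-]+")
                 .Cast<Match>()
                 .Select(m => m.Value)
                 .ToList();

Console.WriteLine(string.Join(" | ", words));
```

Note how "It's" comes out as two words ("It" and "s"); that is the kind of case where you would adjust the pattern to your needs.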
The following uses iterator blocks, and therefore defers loading. Other solutions load the entire file into memory before you can iterate over the words.
static IEnumerable<string> GetWords(string path){
foreach (var line in File.ReadLines(path)){
foreach (var word in line.Split(null)){
yield return word;
}
}
}
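(Split(null) splits on any whitespace character. Consecutive whitespace still produces empty strings; pass StringSplitOptions.RemoveEmptyEntries if you don't want them.)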
Use it like this:
foreach (var word in GetWords(@"text.txt")){
Console.WriteLine(word);
}
Works with standard LINQ operators too:
GetWords(@"text.txt").Take(25);
GetWords(@"text.txt").Where(w => w.Length > 3);
Of course, error handling etc. is left out for the sake of learning.
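To see the deferred loading in action without a file, here is a variant of GetWords that reads from any TextReader, with a StringReader standing in for the file; GetWordsFrom is a hypothetical name. It also passes RemoveEmptyEntries, because splitting on whitespace alone still yields empty strings for runs of consecutive spaces. Because the method uses yield return, Take(3) stops after the line that contains the third word, and in this example the last line is still unread afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical variant of GetWords: lazily yield words line by line from any TextReader.
static IEnumerable<string> GetWordsFrom(TextReader reader)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // A null separator array means "split on whitespace"; RemoveEmptyEntries drops
        // the empty strings that runs of spaces would otherwise produce.
        foreach (var word in line.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries))
        {
            yield return word;
        }
    }
}

var reader = new StringReader("one  two\nthree four\nfive six");
var firstThree = GetWordsFrom(reader).Take(3).ToList();
string? rest = reader.ReadLine(); // the third line was never read by the iterator

Console.WriteLine(string.Join(",", firstThree)); // one,two,three
Console.WriteLine(rest);                         // five six
```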
You could write content.ToList().ForEach(p => p.Split(' ').ToList().ForEach(Console.WriteLine)), but that's not a lot of LINQ.
