Get String (Text) before next upper letter - c#

I have the following:
string test = "CustomerNumber";
or
string test2 = "CustomerNumberHello";
the result should be:
string result = "Customer";
The result is the first word of the string, which runs up to the first uppercase letter after the start (here 'N').
I already tried some things like this:
var result = string.Concat(s.Select(c => char.IsUpper(c) ? " " + c.ToString() : c.ToString()))
.TrimStart();
But without success. I hope someone can offer a small, clean solution (without RegEx).

The following should work:
var result = new string(
test.TakeWhile((c, index) => index == 0 || char.IsLower(c)).ToArray());
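A loop-based take on the same idea, in case the input may be empty or may contain no later uppercase letter. This is only a sketch: the class and method names (`FirstWordSketch`, `FirstWord`) are made up for illustration, and returning the whole string when nothing is found is an assumption about what you want.

```csharp
using System;

public static class FirstWordSketch
{
    // Returns the text before the first uppercase letter found after index 0.
    // The first character is always kept, even when it is uppercase.
    public static string FirstWord(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return string.Empty;
        }

        for (int i = 1; i < source.Length; i++)
        {
            if (char.IsUpper(source[i]))
            {
                return source.Substring(0, i);
            }
        }

        // Assumption: with no later uppercase letter, the whole string is one word.
        return source;
    }
}
```

Calling `FirstWordSketch.FirstWord("CustomerNumberHello")` gives "Customer". Because it uses `char.IsUpper`, it also treats non-ASCII uppercase letters as word boundaries.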

You could just go through the string to see which values (ASCII) are below 97 and remove the end. Not the prettiest or LINQiest way, but it works...
string test2 = "CustomerNumberHello";
for (int i = 1; i < test2.Length; i++)
{
if (test2[i] < 97)
{
test2 = test2.Remove(i, test2.Length - i);
break;
}
}
Console.WriteLine(test2); // Prints Customer

Try this
private static string GetFirstWord(string source)
{
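// IndexOfAny returns -1 when there is no later uppercase letter, which would make Substring throw
int index = source.IndexOfAny("ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray(), 1);
return index < 0 ? source : source.Substring(0, index);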
}

Use the regex [A-Z][a-z]+; it splits the string into substrings that each start with a capital letter. Here is an example:
string regex = "[A-Z][a-z]+";
MatchCollection mc = Regex.Matches(richTextBox1.Text, regex);
foreach (Match match in mc)
if (!match.ToString().Equals(""))
Console.WriteLine(match.ToString() + "\n");

I have tested, this works:
string cust = "CustomerNumberHello";
string[] str = System.Text.RegularExpressions.Regex.Split(cust, @"[a-z]+");
string str2 = cust.Remove(cust.IndexOf(str[1], 1));


Get count of unique characters between first and last letter

I'm trying to get the unique characters count that are between the first and last letter of a word. For example: if I type Yellow the expected output is Y3w, if I type People the output should be P4e and if I type Money the output should be M3y. This is what I tried:
//var strArr = wordToConvert.Split(' ');
string[] strArr = new[] { "Money","Yellow", "People" };
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
//ignore 2-letter words
string newword = null;
int distinctCount = 0;
int k = word.Length;
int samecharcount = 0;
int count = 0;
for (int i = 1; i < k - 2; i++)
{
if (word.ElementAt(i) != word.ElementAt(i + 1))
{
count++;
}
else
{
samecharcount++;
}
}
distinctCount = count + samecharcount;
char frst = word[0];
char last = word[word.Length - 1];
newword = String.Concat(frst, distinctCount.ToString(), last);
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine("Output: " + result);
Console.WriteLine("----------------------------------------------------");
With this code I'm getting the expected output for Yellow, but it doesn't seem to work with People and Money. What can I do to fix this? I'm also wondering whether there is a better way to do this, for example using LINQ or Regex.
Here's an implementation that uses Linq:
string[] strArr = new[]{"Money", "Yellow", "People"};
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
// we want the first letter, the last letter, and the distinct count of everything in between
var first = word.First();
var last = word.Last();
var others = word.Skip(1).Take(word.Length - 2);
// Case sensitive
var distinct = others.Distinct();
// Case insensitive
// var distinct = others.Select(c => char.ToLowerInvariant(c)).Distinct();
string newword = first + distinct.Count().ToString() + last;
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine(result);
Output:
M3y Y3w P4e
Note that this is case-sensitive, so the upper- and lowercase forms of a letter count as different characters: the output for FiIsSh is F4h.
Maybe not the most performant, but here is another example using linq:
var words = new[] { "Money","Yellow", "People" };
var transformedWords = words.Select(Transform);
var sentence = String.Join(' ', transformedWords);
public string Transform(string input)
{
if (input.Length < 3)
{
return input;
}
var count = input.Skip(1).SkipLast(1).Distinct().Count();
return $"{input[0]}{count}{input[^1]}";
}
You can implement it with the help of Linq. e.g. (C# 8+)
private static string EncodeWord(string value) => value.Length <= 2
? value
: $"{value[0]}{value.Substring(1, value.Length - 2).Distinct().Count()}{value[^1]}";
Demo:
string[] tests = new string[] {
"Money","Yellow", "People"
};
var report = string.Join(Environment.NewLine, tests
.Select(test => $"{test} :: {EncodeWord(test)}"));
Console.Write(report);
Outcome:
Money :: M3y
Yellow :: Y3w
People :: P4e
A lot of people have put up some good solutions. I have two solutions for you: one uses LINQ and the other does not.
LINQ, Probably not much different from others
if (str.Length < 3) return str;
var midStr = str.Substring(1, str.Length - 2);
var midCount = midStr.Distinct().Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
Non-LINQ
if (str.Length < 3) return str;
var uniqueLetters = new Dictionary<char, int>();
var midStr = str.Substring(1, str.Length - 2);
foreach (var c in midStr)
{
if (!uniqueLetters.ContainsKey(c))
{
uniqueLetters.Add(c, 0);
}
}
var midCount = uniqueLetters.Keys.Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
I tested this with the following 6 strings:
Yellow
Money
Purple
Me
You
Hiiiiiiiii
Output:
LINQ: Y3w, Non-LINQ: Y3w
LINQ: M3y, Non-LINQ: M3y
LINQ: P4e, Non-LINQ: P4e
LINQ: Me, Non-LINQ: Me
LINQ: Y1u, Non-LINQ: Y1u
LINQ: H1i, Non-LINQ: H1i
Performance-wise I'd guess they're pretty much the same, if not identical, but I haven't run any real perf test on the two approaches. I can't imagine they'd be much different, if at all. The only real difference is that the second route expands Distinct() into what it probably does under the covers anyway (I haven't looked at the source to see if that's true, but that's a pretty common way to get a count of distinct items). And the first route is certainly less code.
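For comparison, here is a sketch of the same count done with a HashSet<char> instead of a Dictionary. It isn't meant as a claim about how Distinct() is implemented, just another common way to count unique items. The names `WordEncoderSketch` and `Encode` are hypothetical.

```csharp
using System.Collections.Generic;

public static class WordEncoderSketch
{
    // Encodes a word as first letter + number of distinct inner characters + last letter.
    // Words shorter than 3 characters are returned unchanged.
    public static string Encode(string word)
    {
        if (word == null || word.Length < 3)
        {
            return word;
        }

        // A set keeps each character once, so its Count is the distinct count.
        var seen = new HashSet<char>();
        for (int i = 1; i < word.Length - 1; i++)
        {
            seen.Add(word[i]);
        }

        return word[0] + seen.Count.ToString() + word[word.Length - 1];
    }
}
```

The comparison is case-sensitive, like the other answers. To ignore case, you could add `char.ToLowerInvariant(word[i])` to the set instead.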
I would use LINQ for that purpose:
string[] words = new string[] { "Yellow" , "People", "Money", "Sh" }; // Sh for 2-letter words (or you can insert 0 and then remove the ternary operator)
foreach (string word in words)
{
int uniqeCharsInBetween = word.Substring(1, word.Length - 2).ToCharArray().Distinct().Count();
string result = word[0] + (uniqeCharsInBetween == 0 ? string.Empty : uniqeCharsInBetween.ToString()) + word[word.Length - 1];
Console.WriteLine(result);
}

Retrieve String Containing Specific substring C#

My output is a string in the following format:
"ABCDED 0000A1.txt PQRSNT 12345"
I want to retrieve the substring(s) containing .txt from the above string. E.g., for the above it should return 0000A1.txt.
Thanks
You can either split the string at whitespace boundaries like it's already been suggested or repeatedly match the same regex like this:
var input = "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO";
var match = Regex.Match (input, @"\b([\w\d]+\.txt)\b");
while (match.Success) {
Console.WriteLine ("TEST: {0}", match.Value);
match = match.NextMatch ();
}
Split will work if spaces are the separator. If you use other separators, you can add them as needed.
string input = "ABCDED 0000A1.txt PQRSNT 12345";
string filename = input.Split(' ').FirstOrDefault(f => System.IO.Path.HasExtension(f));
filename = "0000A1.txt", and this will work for any extension.
You can use a regex pattern match in C# :)
Here is the code; plug it in and try it. Please comment.
string test = "afdkljfljalf dkfjd.txt lkjdfjdl";
string ffile = Regex.Match(test, @"([a-z0-9]+\.txt)").Groups[1].Value;
Console.WriteLine(ffile);
Reference: regexp
I did something like this:
string subString = "";
char period = '.';
char[] chArString;
int iSubStrIndex = 0;
if (myString != null)
{
chArString = new char[myString.Length];
chArString = myString.ToCharArray();
for (int i = 0; i < myString.Length; i ++)
{
if (chArString[i] == period)
iSubStrIndex = i;
}
subString = myString.Substring(iSubStrIndex);
}
Hope that helps.
First split your string into an array using
char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);
Then find .txt in the array...
// Find the first element containing .txt.
//
string value1 = Array.Find(ssizes,
element => element.Contains(".txt", StringComparison.Ordinal));
Now value1 will hold "0000A1.txt"
Happy coding.
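If there can be more than one .txt name in the output, a small variation on the split approach collects all of them. This is a sketch under a few assumptions: tokens are separated by spaces or tabs, the extension check ignores case, and the names `TxtTokenSketch` and `FindTxtTokens` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TxtTokenSketch
{
    // Returns every whitespace-separated token that ends with ".txt".
    public static List<string> FindTxtTokens(string input)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(input))
        {
            return result;
        }

        string[] tokens = input.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            // EndsWith avoids matching tokens that merely contain ".txt" in the middle.
            if (token.EndsWith(".txt", StringComparison.OrdinalIgnoreCase))
            {
                result.Add(token);
            }
        }

        return result;
    }
}
```

For "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO" this returns a list holding 0000A1.txt and THE.txt.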

Find NOT matching characters in a string with regex?

I'm able to check whether a string contains invalid characters:
Regex r = new Regex("[^A-Z]$");
string myString = "SOMEString";
if (r.IsMatch(myString))
{
Console.WriteLine("invalid string!");
}
That works fine. But what if I would like to print out every invalid character in the string? In the example, SOMEString => the invalid chars are t, r, i, n, g. Any ideas?
Use LINQ. The following gives you an array of the 5 characters that the regex flags as invalid.
char[] myCharacterArray = myString.Where(c => r.IsMatch(c.ToString())).ToArray();
foreach (char c in myCharacterArray)
{
Console.WriteLine(c);
}
Output will be:
t
r
i
n
g
EDIT:
It looks like you want to treat all lowercase characters as invalid. You may try:
char[] myCharacterArray2 = myString
.Where(c => ((int)c) >= 97 && ((int)c) <= 122)
.ToArray();
In your example the regex would succeed on one character since it's looking for the last character if it isn't uppercase, and your string has such a character.
The regex should be changed to Regex r = new Regex("[^A-Z]");.
(updated following @Chris's comments)
However, for your purpose the regex is actually what you want - just use Matches.
e.g.:
foreach (Match item in r.Matches(myString))
{
Console.WriteLine(item.ToString() + " is invalid");
}
Or, if you want one line:
string str = "";
foreach (Match item in r.Matches(myString))
{
str += item.ToString() + ", ";
}
Console.WriteLine(str + " are invalid");
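A variation on the Matches loop that lists each invalid character only once and avoids the trailing comma. It is a sketch that assumes the corrected pattern [^A-Z] (anything other than A-Z is invalid); the names `InvalidCharSketch` and `DescribeInvalid` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InvalidCharSketch
{
    // Builds a message listing each distinct character that is not in A-Z,
    // in order of first appearance.
    public static string DescribeInvalid(string value)
    {
        var invalid = new List<string>();
        foreach (Match m in Regex.Matches(value, "[^A-Z]"))
        {
            if (!invalid.Contains(m.Value))
            {
                invalid.Add(m.Value);
            }
        }

        // string.Join places the separator only between items.
        return invalid.Count == 0
            ? "valid"
            : string.Join(", ", invalid) + " are invalid";
    }
}
```

For "SOMEString" this returns "t, r, i, n, g are invalid".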
Try with this:
char[] list = new char[5];
Regex r = new Regex("[^A-Z]*$");
string myString = "SOMEString";
foreach (Match match in r.Matches(myString))
{
list = match.Value.ToCharArray();
break;
}
string str = "invalid chars are ";
foreach (char ch in list)
{
str += ch + ", ";
}
Console.Write(str);
OUTPUT: invalid chars are t, r, i, n, g

All elements before last comma in a string in c#

How can I get all elements before the last comma (,) in a string in C#?
For example, if my string is
string s = "a,b,c,d";
then I want all the elements before d, i.e. before the last comma. So my new string should look like
string new_string = "a,b,c";
I have tried Split, but with that I can only get one particular element at a time.
string new_string = s.Remove(s.LastIndexOf(','));
If you want everything before the last occurrence, use:
int lastIndex = input.LastIndexOf(',');
if (lastIndex == -1)
{
// Handle case with no commas
}
else
{
string beforeLastIndex = input.Substring(0, lastIndex);
...
}
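Wrapped up as a helper, the LastIndexOf approach could look like the sketch below. Returning the input unchanged when there is no comma is an assumption, so pick whatever fits your case. The names `LastCommaSketch` and `BeforeLastComma` are made up for illustration. Note that s.Remove(s.LastIndexOf(',')) throws an ArgumentOutOfRangeException when the string has no comma, because LastIndexOf returns -1.

```csharp
public static class LastCommaSketch
{
    // Returns everything before the last comma.
    // Assumption: a string without any comma is returned as-is.
    public static string BeforeLastComma(string input)
    {
        if (input == null)
        {
            return null;
        }

        int lastIndex = input.LastIndexOf(',');
        return lastIndex < 0 ? input : input.Substring(0, lastIndex);
    }
}
```

`LastCommaSketch.BeforeLastComma("a,b,c,d")` gives "a,b,c".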
Use the following regex: "(.*),"
Regex rgx = new Regex("(.*),");
string s = "a,b,c,d";
Console.WriteLine(rgx.Match(s).Groups[1].Value);
You can also try:
string s = "a,b,c,d";
string[] strArr = s.Split(',');
Array.Resize(ref strArr, Math.Max(strArr.Length - 1, 1));
string truncatedS = string.Join(",", strArr);

Find substring ignoring specified characters

Do any of you know of an easy/clean way to find a substring within a string while ignoring some specified characters to find it. I think an example would explain things better:
string: "Hello, -this- is a string"
substring to find: "Hello this"
chars to ignore: "," and "-"
found the substring, result: "Hello, -this"
Using Regex is not a requirement for me, but I added the tag because it feels related.
Update:
To make the requirement clearer: I need the resulting substring with the ignored chars, not just an indication that the given substring exists.
Update 2:
Some of you are reading too much into the example, sorry, i'll give another scenario that should work:
string: "?A&3/3/C)412&"
substring to find: "A41"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)41"
And as a bonus (not required per se), it would be great if the solution did not assume that the substring to find is free of ignored chars, e.g. given the last example we should be able to do:
substring to find: "A3C412&"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)412&"
Sorry if I wasn't clear before, or still I'm not :).
Update 3:
Thanks to everyone who helped!, this is the implementation I'm working with for now:
http://www.pastebin.com/pYHbb43Z
And here are some tests:
http://www.pastebin.com/qh01GSx2
I'm using some custom extension methods I'm not including, but I believe they should be self-explanatory (I will add them if you like).
I've taken a lot of your ideas for the implementation and the tests, but I'm giving the answer to @PierrOz because he was one of the first and pointed me in the right direction.
Feel free to keep giving suggestions as alternative solutions or comments on the current state of the impl. if you like.
In your example you would do:
string input = "Hello, -this-, is a string";
string ignore = "[-,]*";
Regex r = new Regex(string.Format("H{0}e{0}l{0}l{0}o{0} {0}t{0}h{0}i{0}s{0}", ignore));
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
Dynamically, you would build the [-,] part from all the characters to ignore and insert it between all the characters of your query.
Take care of '-' in the class []: put it at the beginning or at the end
So more generically, it would give something like:
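public string Test(string query, string input, char[] ignoreList)
{
    // Build a character class of ignored characters, e.g. "[-,]*".
    // Note: characters such as ']', '\' or '^' would need escaping here.
    string ignorePattern = "[";
    for (int i = 0; i < ignoreList.Length; i++)
    {
        if (ignoreList[i] == '-')
        {
            // Strings are immutable, so keep the result of Insert;
            // the dash goes first so it is not read as a range.
            ignorePattern = ignorePattern.Insert(1, "-");
        }
        else
        {
            ignorePattern += ignoreList[i];
        }
    }
    ignorePattern += "]*";
    // Allow any run of ignored characters after each character of the query.
    string pattern = "";
    for (int i = 0; i < query.Length; i++)
    {
        pattern += Regex.Escape(query[i].ToString()) + ignorePattern;
    }
    Regex r = new Regex(pattern);
    Match m = r.Match(input);
    return m.Success ? m.Value : string.Empty;
}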
Here's a non-regex string extension option:
public static class StringExtensions
{
public static bool SubstringSearch(this string s, string value, char[] ignoreChars, out string result)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentException("Search value cannot be null or empty.", "value");
bool found = false;
int matches = 0;
int startIndex = -1;
int length = 0;
for (int i = 0; i < s.Length && !found; i++)
{
if (startIndex == -1)
{
if (s[i] == value[0])
{
startIndex = i;
++matches;
++length;
}
}
else
{
if (s[i] == value[matches])
{
++matches;
++length;
}
else if (ignoreChars != null && ignoreChars.Contains(s[i]))
{
++length;
}
else
{
startIndex = -1;
matches = 0;
length = 0;
}
}
found = (matches == value.Length);
}
if (found)
{
result = s.Substring(startIndex, length);
}
else
{
result = null;
}
return found;
}
}
EDIT: here's an updated solution addressing the points in your recent update. The idea is the same, except that if you have a single substring it inserts the ignore pattern between each character. If the substring contains spaces, it splits on the spaces and inserts the ignore pattern between those words. If you don't need the latter functionality (which was more in line with your original question), you can remove the Split and the if check that selects that pattern.
Note that this approach is not going to be the most efficient.
string input = @"foo ?A&3/3/C)412& bar A341C2";
string substring = "A41";
string[] ignoredChars = { "&", "/", "3", "C", ")" };
// builds up the ignored pattern and ensures a dash char is placed at the end to avoid unintended ranges
string ignoredPattern = String.Concat("[",
String.Join("", ignoredChars.Where(c => c != "-")
.Select(c => Regex.Escape(c)).ToArray()),
(ignoredChars.Contains("-") ? "-" : ""),
"]*?");
string[] substrings = substring.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string pattern = "";
if (substrings.Length > 1)
{
pattern = String.Join(ignoredPattern, substrings);
}
else
{
pattern = String.Join(ignoredPattern, substring.Select(c => c.ToString()).ToArray());
}
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine("Index: {0} -- Match: {1}", match.Index, match.Value);
}
Try this solution out:
string input = "Hello, -this- is a string";
string[] searchStrings = { "Hello", "this" };
string pattern = String.Join(@"\W+", searchStrings);
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine(match.Value);
}
The \W+ will match one or more non-word characters (anything other than letters, digits, and underscore). If you feel like specifying them yourself, you can replace it with a character class of the characters to ignore, such as [ ,.-]+ (always place the dash character at the start or end to avoid unintended range specifications). Also, if you need case to be ignored, use RegexOptions.IgnoreCase:
Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
If your substring is in the form of a complete string, such as "Hello this", you can easily get it into an array form for searchStrings in this way:
string[] searchStrings = substring.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
This code will do what you want, although I suggest you modify it to fit your needs better:
string resultString = null;
try
{
resultString = Regex.Match(subjectString, "Hello[, -]*this", RegexOptions.IgnoreCase).Value;
}
catch (ArgumentException ex)
{
// Syntax error in the regular expression
}
You could do this with a single Regex but it would be quite tedious as after every character you would need to test for zero or more ignored characters. It is probably easier to strip all the ignored characters with Regex.Replace(subject, "[-,]", ""); then test if the substring is there.
Or the single Regex way
Regex.IsMatch(subject, "H[-,]*e[-,]*l[-,]*l[-,]*o[-,]* [-,]*t[-,]*h[-,]*i[-,]*s[-,]*")
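Stripping the ignored characters only tells you that the substring exists; the question also wants the matching slice of the original string. One way to get it, sketched below, is to remember where each kept character came from and map the match back. The names `IgnoreSearchSketch` and `FindIgnoring` are hypothetical, and the sketch assumes an ordinal, case-sensitive comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class IgnoreSearchSketch
{
    // Returns the slice of source that matches value once ignored characters
    // are skipped, or null when there is no match.
    public static string FindIgnoring(string source, string value, char[] ignoreChars)
    {
        var ignored = new HashSet<char>(ignoreChars ?? new char[0]);

        // Strip ignored characters from the source, recording the original
        // index of every character that is kept.
        var filtered = new StringBuilder();
        var originalIndex = new List<int>();
        for (int i = 0; i < source.Length; i++)
        {
            if (!ignored.Contains(source[i]))
            {
                filtered.Append(source[i]);
                originalIndex.Add(i);
            }
        }

        // Strip the same characters from the search value, so the value
        // itself may contain ignored characters.
        var needle = new StringBuilder();
        foreach (char c in value)
        {
            if (!ignored.Contains(c))
            {
                needle.Append(c);
            }
        }
        if (needle.Length == 0)
        {
            return null;
        }

        int pos = filtered.ToString().IndexOf(needle.ToString(), StringComparison.Ordinal);
        if (pos < 0)
        {
            return null;
        }

        // Map the first and last matched characters back to the original string.
        int start = originalIndex[pos];
        int end = originalIndex[pos + needle.Length - 1];
        return source.Substring(start, end - start + 1);
    }
}
```

With the question's examples, searching "Hello, -this- is a string" for "Hello this" while ignoring ',' and '-' gives "Hello, -this", and searching "?A&3/3/C)412&" for "A41" gives "A&3/3/C)41". The result ends at the last matched non-ignored character, so the bonus case ("A3C412&") gives "A&3/3/C)412" without the trailing "&".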
Here's a non-regex way to do it using string parsing.
private string GetSubstring()
{
string searchString = "Hello, -this- is a string";
string searchStringWithoutUnwantedChars = searchString.Replace(",", "").Replace("-", "");
string desiredString = string.Empty;
if(searchStringWithoutUnwantedChars.Contains("Hello this"))
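// Substring takes (startIndex, length), so subtract the start index from the end position
desiredString = searchString.Substring(searchString.IndexOf("Hello"), searchString.IndexOf("this") + 4 - searchString.IndexOf("Hello"));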
return desiredString;
}
You could do something like this, since almost all of these answers require rebuilding the string in some form.
myString is the string you want to look through
//Create a List(Of string) that contains the ignored characters
List<string> ignoredCharacters = new List<string>();
//Add all of the characters you wish to ignore in the method you choose
//Use a function here to get a return
public bool subStringExist(List<string> ignoredCharacters, string myString, string toMatch)
{
//Copy Your string to a temp
string tempString = myString;
bool match = false;
//Replace Everything that you don't want
foreach (string item in ignoredCharacters)
{
tempString = tempString.Replace(item, "");
}
//Check if your substring exist
if (tempString.Contains(toMatch))
{
match = true;
}
return match;
}
You could always use a combination of RegEx and string searching
public class RegExpression {
public static void Example(string input, string ignore, string find)
{
string output = string.Format("Input: {1}{0}Ignore: {2}{0}Find: {3}{0}{0}", Environment.NewLine, input, ignore, find);
if (SanitizeText(input, ignore).ToString().Contains(SanitizeText(find, ignore)))
Console.WriteLine(output + "was matched");
else
Console.WriteLine(output + "was NOT matched");
Console.WriteLine();
}
public static string SanitizeText(string input, string ignore)
{
Regex reg = new Regex("[^" + ignore + "]");
StringBuilder newInput = new StringBuilder();
foreach (Match m in reg.Matches(input))
{
newInput.Append(m.Value);
}
return newInput.ToString();
}
}
Usage would be like
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this"); //Should match
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this2"); //Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C)", "A41"); // Should match
RegExpression.Example("?A&3/3/C) 412&", "&/3C)", "A41"); // Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C)", "A3C412&"); // Should match
Output
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this
was matched
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this2
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A41
was matched
Input: ?A&3/3/C) 412&
Ignore: &/3C)
Find: A41
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A3C412&
was matched
