c# string manipulation - c#

I have string of 9 letters.
string myString = "123987898";
I want to retrieve the first 3 letters "123"
then 2 more letters "98"
and then 4 more letters "7898".
Which c# string function support this functionality.

You can use Substring:
myString.Substring(0,3)
myString.Substring(3,2)
myString.Substring(5,4)
It's also possible to make it more complicated than necessary by using a combination of regular expressions and LINQ:
string myString = "123987898";
Regex regex = new Regex("(.{3})(.{2})(.{4})");
string[] bits = regex
.Match(myString)
.Groups
.Cast<Group>()
.Skip(1)
.Select(match => match.Value)
.ToArray();

There is nothing built in but it is easy enough to make yourself.
public static IEnumerable<string> SplitBySize(string value, IEnumerable<int> sizes)
{
if (value == null) throw new ArgumentNullException("value");
if (sizes == null) throw new ArgumentNullException("sizes");
var length = value.Length;
var currentIndex = 0;
foreach (var size in sizes)
{
var nextIndex = currentIndex + size;
if (nextIndex > length)
{
throw new ArgumentException("The sum of the sizes specified is larger than the length of the value specified.", "sizes");
}
yield return value.Substring(currentIndex, size);
currentIndex = nextIndex;
}
}
Example Usage
foreach (var item in SplitBySize("1234567890", new[] { 2, 3, 5 }))
{
Console.WriteLine(item);
}
Console.ReadKey();

The best and robust way of dealing with this is to use a regex
public Regex MyRegex = new Regex(
"(?\\d{3})(?\\d{2})(?\\d{4})",
RegexOptions.IgnoreCase
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
Then you can access them through the Groups property of the Match instance
Match m = MyRegex.Match("123987898");
if (m.Success){
int first3 = int.Parse(m.Groups["first3"].Value;
int next2 = int.Parse(m.Groups["next2"].Value;
int last4 = int.Parse(m.Groups["last4"].Value;
/* Do whatever you have to do with first3, next2 and last 4! */
}

Related

Get count of unique characters between first and last letter

I'm trying to get the unique characters count that are between the first and last letter of a word. For example: if I type Yellow the expected output is Y3w, if I type People the output should be P4e and if I type Money the output should be M3y. This is what I tried:
//var strArr = wordToConvert.Split(' ');
string[] strArr = new[] { "Money","Yellow", "People" };
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
//ignore 2-letter words
string newword = null;
int distinctCount = 0;
int k = word.Length;
int samecharcount = 0;
int count = 0;
for (int i = 1; i < k - 2; i++)
{
if (word.ElementAt(i) != word.ElementAt(i + 1))
{
count++;
}
else
{
samecharcount++;
}
}
distinctCount = count + samecharcount;
char frst = word[0];
char last = word[word.Length - 1];
newword = String.Concat(frst, distinctCount.ToString(), last);
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine("Output: " + result);
Console.WriteLine("----------------------------------------------------");
With this code I'm getting the expect output for Yellow, but seems that is not working with People and Money. What can I do to fix this issue or also I'm wondering is maybe there is a better way to do this for example using LINQ/Regex.
Here's an implementation that uses Linq:
string[] strArr = new[]{"Money", "Yellow", "People"};
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
// we want the first letter, the last letter, and the distinct count of everything in between
var first = word.First();
var last = word.Last();
var others = word.Skip(1).Take(word.Length - 2);
// Case sensitive
var distinct = others.Distinct();
// Case insensitive
// var distinct = others.Select(c => char.ToLowerInvariant(c)).Distinct();
string newword = first + distinct.Count().ToString() + last;
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine(result);
Output:
M3y Y3w P4e
Note that this doesn't take account of case, so the output for FiIsSh is 4.
Maybe not the most performant, but here is another example using linq:
var words = new[] { "Money","Yellow", "People" };
var transformedWords = words.Select(Transform);
var sentence = String.Join(' ', transformedWords);
public string Transform(string input)
{
if (input.Length < 3)
{
return input;
}
var count = input.Skip(1).SkipLast(1).Distinct().Count();
return $"{input[0]}{count}{input[^1]}";
}
You can implement it with the help of Linq. e.g. (C# 8+)
private static string EncodeWord(string value) => value.Length <= 2
? value
: $"{value[0]}{value.Substring(1, value.Length - 2).Distinct().Count()}{value[^1]}";
Demo:
string[] tests = new string[] {
"Money","Yellow", "People"
};
var report = string.Join(Environment.NewLine, tests
.Select(test => $"{test} :: {EncodeWord(test)}"));
Console.Write(report);
Outcome:
Money :: M3y
Yellow :: Y3w
People :: P4e
A lot of people have put up some good solutions. I have two solutions for you: one uses LINQ and the other does not.
LINQ, Probably not much different from others
if (str.Length < 3) return str;
var midStr = str.Substring(1, str.Length - 2);
var midCount = midStr.Distinct().Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
Non-LINQ
if (str.Length < 3) return str;
var uniqueLetters = new Dictionary<char, int>();
var midStr = str.Substring(1, str.Length - 2);
foreach (var c in midStr)
{
if (!uniqueLetters.ContainsKey(c))
{
uniqueLetters.Add(c, 0);
}
}
var midCount = uniqueLetters.Keys.Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
I tested this with the following 6 strings:
Yellow
Money
Purple
Me
You
Hiiiiiiiii
Output:
LINQ: Y3w, Non-LINQ: Y3w
LINQ: M3y, Non-LINQ: M3y
LINQ: P4e, Non-LINQ: P4e
LINQ: Me, Non-LINQ: Me
LINQ: Y1u, Non-LINQ: Y1u
LINQ: H1i, Non-LINQ: H1i
Fiddle
Performance-wise I'd guess they're pretty much the same, if not identical, but I haven't run any real perf test on the two approaches. I can't imagine they'd be much different, if at all. The only real difference is that the second route expands Distinct() into what it probably does under the covers anyway (I haven't looked at the source to see if that's true, but that's a pretty common way to get a count of . And the first route is certainly less code.
I Would use Linq for that purpose:
string[] words = new string[] { "Yellow" , "People", "Money", "Sh" }; // Sh for 2 letter words (or u can insert 0 and then remove the trinary operator)
foreach (string word in words)
{
int uniqeCharsInBetween = word.Substring(1, word.Length - 2).ToCharArray().Distinct().Count();
string result = word[0] + (uniqeCharsInBetween == 0 ? string.Empty : uniqeCharsInBetween.ToString()) + word[word.Length - 1];
Console.WriteLine(result);
}

Getting a numbers from a string with chars glued

I need to recover each number in a glued string
For example, from these strings:
string test = "number1+3"
string test1 = "number 1+4"
I want to recover (1 and 3) and (1 and 4)
How can I do this?
CODE
string test= "number1+3";
List<int> res;
string[] digits= Regex.Split(test, #"\D+");
foreach (string value in digits)
{
int number;
if (int.TryParse(value, out number))
{
res.Add(number)
}
}
This regex should work
string pattern = #"\d+";
string test = "number1+3";
foreach (Match match in Regex.Matches(test, pattern))
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
Note that if you intend to use it multiple times, it's better, for performance reasons, to create a Regex instance than using this static method.
var res = new List<int>();
var regex = new Regex(#"\d+");
void addMatches(string text) {
foreach (Match match in regex.Matches(text))
{
int number = int.Parse(match.Value);
res.Add(number);
}
}
string test = "number1+3";
addMatches(test);
string test1 = "number 1+4";
addMatches(test1);
MSDN link.
Fiddle 1
Fiddle 2
This calls for a regular expression:
(\d+)\+(\d+)
Test it
Match m = Regex.Match(input, #"(\d+)\+(\d+)");
string first = m.Groups[1].Captures[0].Value;
string second = m.Groups[2].Captures[0].Value;
An alternative to regular expressions:
string test = "number 1+4";
int[] numbers = test.Replace("number", string.Empty, StringComparison.InvariantCultureIgnoreCase)
.Trim()
.Split("+", StringSplitOptions.RemoveEmptyEntries)
.Select(x => Convert.ToInt32(x))
.ToArray();

Finding the longest substring regex?

Someone knows how to find the longest substring composed of letters using using MatchCollection.
public static Regex pattern2 = new Regex("[a-zA-Z]");
public static string zad3 = "ala123alama234ijeszczepsa";
You can loop over all matches and get the longest:
string max = "";
foreach (Match match in Regex.Matches(zad3, "[a-zA-Z]+"))
if (max.Length < match.Value.Length)
max = match.Value;
Try this:
MatchCollection matches = pattern2.Matches(txt);
List<string> strLst = new List<string>();
foreach (Match match in matches)
strLst.Add(match.Value);
var maxStr1 = strLst.OrderByDescending(s => s.Length).First();
or better way :
var maxStr2 = matches.Cast<Match>().Select(m => m.Value).ToArray().OrderByDescending(s => s.Length).First();
best solution for your task is:
string zad3 = "ala123alama234ijeszczepsa54dsfd";
string max = Regex.Split(zad3,#"\d+").Max(x => x);
You must change your Regex pattern to include the repetition operator + so that it matches more than once.
[a-zA-Z] should be [a-zA-Z]+
You can get the longest value using LINQ. Order by the match length descending and then take the first entry. If there are no matches the result is null.
string pattern2 = "[a-zA-Z]+";
string zad3 = "ala123alama234ijeszczepsa";
var matches = Regex.Matches(zad3, pattern2);
string result = matches
.Cast<Match>()
.OrderByDescending(x => x.Value.Length)
.FirstOrDefault()?
.Value;
The string named result in this example is:
ijeszczepsa
Using linq and the short one:
string longest= Regex.Matches(zad3, pattern2).Cast<Match>()
.OrderByDescending(x => x.Value.Length).FirstOrDefault()?.Value;
you can find it in O(n) like this (if you do not want to use regex):
string zad3 = "ala123alama234ijeszczepsa";
int max=0;
int count=0;
for (int i=0 ; i<zad3.Length ; i++)
{
if (zad3[i]>='0' && zad3[i]<='9')
{
if (count > max)
max=count;
count=0;
continue;
}
count++;
}
if (count > max)
max=count;
Console.WriteLine(max);

Find string in an array based on first word

I'm trying to filter out an array based on 2 keywords that must be in order, So far i've got this:
string[] matchedOne = Array.FindAll(converList, s => s.Contains(split[1]));
string[] matchedTwo = Array.FindAll(matchedOne, s => s.Contains(split[2]));
if (matchedTwo.Length == 0)
{
Console.Clear();
Console.WriteLine("Sorry, Your Conversion is invalid");
Main();
}
converlist =
ounce,gram,28.0
ounce,fake,28.0 - Fake one I added for examples
gram, ounce, 3.0 - Fake one I added for examples
pound,ounce,16.0
pound,kilogram,0.454
pint,litre,0.568
inch, centimetre,2.5
mile,inch,63360.0
If the user types in 5, ounce, gram, When passed through "matchedOne" it would find; "once,gram,28.0" and "ounce,fake,28.0" . not those, "pound,ounce,16.0" and "gram,ounce,3.0" as it does now.
Then in "matchedTwo" it would only find "once,gram,28.0" not that and "gram,ounce,3.0"
-- Just to add: I cant use anything over "system;".
var regex = new Regex(
string.Format("^.*{0}.*{1}.*$",
Regex.Escape(split[1]), Regex.Escape(split[2])),
RegexOptions.IgnoreCase
| RegexOptions.Multiline
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
Match m = regex.Match(converlist);
which basically just matches a line where split[1] comes before split[2]
or without regex
string[] match = Array.FindAll(matchedOne, s => s.IndexOf(split[1])==-1?false: s.IndexOf(split[2], s.IndexOf(split[1])) != -1);
and working conversion...
const string converlist = "ounce,gram,28.0\r\nounce,fake,28.0\r\ngram, ounce, 3.0\r\npound,ounce,16.0\r\npound,kilogram,0.454\r\npint,litre,0.568\r\ninch, centimetre,2.5\r\nmile,inch,63360.0\r\n";
var split = "5,ounce,gram".Split(new[] { ',' });
var list = converlist.Split(new[]{"\r\n"}, StringSplitOptions.RemoveEmptyEntries).ToList();
var matches = list.FindAll(s => s.IndexOf(split[1])==-1?false: s.IndexOf(split[2], s.IndexOf(split[1])) != -1);
var conversionLine = matches[0];
Console.WriteLine(conversionLine);
var conversionFactor = decimal.Parse(conversionLine.Split(new[] { ',' })[2]);
var valueToConvert = decimal.Parse(split[0].Trim());
Console.WriteLine(string.Format("{0} {2}s is {1} {3}s", valueToConvert, conversionFactor * valueToConvert, split[1], split[2]));
Above answer satisfies what you need, however, this will also work.
string[] s = { "ounce,gram", "gram,ounce", "pound,carret" };
foreach (string temp in s.Where(x => (x.IndexOf("ounce")>-1) && (x.IndexOf("ounce") < x.IndexOf("gram"))))
Debug.WriteLine(temp);

Splitting a string array

I have a string array string[] arr, which contains values like N36102W114383, N36102W114382 etc...
I want to split the each and every string such that the value comes like this N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(#"\w\d+"); # matches a character followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
matchResults = matchResults.NextMatch(); #two mathches N36102 and W114383
}
If you have the fixed format every time you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
.Split(",", StringSplitOptions.None);
Here you insert a recognizable delimiter into your string and then split it by this delimiter.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
char [] chars = str.ToCharArray();
int last = 0;
for(int i = 1; i < chars.Length; i++) {
if(char.IsLetter(chars[i])) {
yield return new string(chars, last, i - last);
last = i;
}
}
yield return new string(chars, last, chars.Length - last);
}
If you use C#, please try:
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Length > 0 && e != ",").ToArray();
in case you're only looking for the format NxxxxxWxxxxx, this will do just fine :
Regex r = new Regex(#"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1];
string W = mc.Groups[2];
Using the 'Split' and 'IsLetter' string functions, this is relatively easy in c#.
Don't forget to write unit tests - the following may have some corner case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
string[] inputStrings = string.Split(',');
List<string> outputStrings = new List<string>();
foreach (string value in inputstrings) {
List<string> valuesInString = ParseValuesInString(value);
outputStrings.Add(valuesInString);
}
return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
List<string> outputValues = new List<string>();
string currentValue = string.Empty;
foreach (char c in inputString)
{
if (char.IsLetter(c))
{
if (currentValue .Length == 0)
{
currentValue += c;
} else
{
outputValues.Add(currentValue);
currentValue = string.Empty;
}
}
currentValue += c;
}
outputValues.Add(currentValue);
return outputValues;
}

Categories