LINQ conditional selection and formatting - c#

I have a string of characters and I'm trying to set up a query that'll substitute a specific sequence of similar characters into a character count. Here's an example of what I'm trying to do:
agd69dnbd555bdggjykbcx555555bbb
In this case, I'm trying to isolate and count ONLY the occurrences of the number 5, so my output should read:
agd69dnbd3bdggjykbcx6bbb
My current code is the following, where GroupAdjacentBy is a function that groups and counts the character occurrences as above.
var res = text
.GroupAdjacentBy((l, r) => l == r)
.Select(x => new { n = x.First(), c = x.Count()})
.ToArray();
The problem is that the above function groups and counts EVERY SINGLE character in my string, not just the one character I'm after. Is there a way to conditionally perform that operation on ONLY the character I need counted?

Regex is a better tool for this job than LINQ.
Have a look at this:
string input = "agd69dnbd555bdggjykbcx555555bbb";
string pattern = @"5+"; // Match 1 or more adjacent 5 characters
string output = Regex.Replace(input, pattern, match => match.Length.ToString());
// output = agd69dnbd3bdggjykbcx6bbb
Not sure if you're intending to replace every 5 character, or only runs of more than one adjacent 5.
If it's the latter, just change your pattern to:
string pattern = @"5{2,}"; // Match 2 or more adjacent 5's
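A minimal sketch of the same idea wrapped in a reusable method, in case the character to count is not fixed. The names RunCounter, ReplaceRuns and minRun are made up for illustration and are not part of the answer above.

```csharp
using System.Text.RegularExpressions;

public static class RunCounter
{
    // Replaces every run of at least minRun adjacent copies of target
    // with the length of that run. Hypothetical helper, not a library API.
    public static string ReplaceRuns(string input, char target, int minRun = 1)
    {
        // Regex.Escape keeps metacharacters such as '.' or '+' literal.
        string pattern = Regex.Escape(target.ToString()) + "{" + minRun + ",}";
        return Regex.Replace(input, pattern, m => m.Length.ToString());
    }
}
```

Calling RunCounter.ReplaceRuns(text, '5') covers the "1 or more" case, and passing minRun: 2 leaves single 5s untouched.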

The best answer was already given by Johnathan Barclay. But in case you need something similar using LINQ, here is an alternative solution:
var charToCombine = '5';
var res = text
.GroupAdjacentBy((l, r) => l == r )
.SelectMany(x => x.Count() > 1 && x.First() == charToCombine ? x.Count().ToString() : x)
.ToArray();
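The question never shows GroupAdjacentBy itself, so the following is only a guess at what such an extension method might look like: it walks the sequence once and starts a new group whenever the predicate says the previous and current items do not belong together. The class name AdjacentGrouping and the parameter name sameGroup are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacentGrouping
{
    // Hypothetical implementation: yields runs of consecutive items for which
    // sameGroup(previous, current) returns true.
    public static IEnumerable<IEnumerable<T>> GroupAdjacentBy<T>(
        this IEnumerable<T> source, Func<T, T, bool> sameGroup)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;   // empty input, no groups

            var run = new List<T> { e.Current };
            var previous = e.Current;
            while (e.MoveNext())
            {
                if (!sameGroup(previous, e.Current))
                {
                    yield return run;         // close the current run
                    run = new List<T>();
                }
                run.Add(e.Current);
                previous = e.Current;
            }
            yield return run;                 // last run
        }
    }
}
```

Each group is returned as IEnumerable<T> so that the conditional expression in the SelectMany call above can pick between a string and a group without an explicit cast.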

Related

How to check for a particular range of keys exists in json string or not

I have the following json string data as input:
string json = "{\"-1\":0,\"78\":6,\"79\":6,\"80\":2,\"81\":16777215,\"82\":16777215,\"83\":1,\"84\":0,\"85\":0,\"86\":\"2023/05/07\",\"87\":0,\"88\":0,\"89\":1,\"90\":1,\"124\":1,\"16\":5,\"17\":null,\"18\":null,\"19\":0,\"20\":2,\"21\":2000,\"22\":0,\"23\":0,\"24\":0,\"25\":0,\"26\":0,\"109\":0,\"110\":0,\"29\":0,\"30\":0,\"31\":0,\"32\":2000,\"33\":13710,\"34\":15710,\"135\":null}";
I want to check whether any of the keys above falls outside the range 1 to 150, and return false if so. How can I achieve this in C#? Condition: I don't want to use a JSON deserializer here. I have tried json.Contains("\"-1\":").
This works only for one key, i.e. -1. What I want instead is a check against the whole range from 1 to 150.
Ideally you wouldn't try to parse this yourself. However, since you can't use a parser, I guess you could do this using Regex.
Given
var input = "{\"-1\":0,\"78\":6,\"79\":6,\"80\":2,\"81\":16777215,\"82\":16777215,\"83\":1,\"84\":0,\"85\":0,\"86\":\"2023/05/07\",\"87\":0,\"88\":0,\"89\":1,\"90\":1,\"124\":1,\"16\":5,\"17\":null,\"18\":null,\"19\":0,\"20\":2,\"21\":2000,\"22\":0,\"23\":0,\"24\":0,\"25\":0,\"26\":0,\"109\":0,\"110\":0,\"29\":0,\"30\":0,\"31\":0,\"32\":2000,\"33\":13710,\"34\":15710,\"135\":null}";
Option 1
var isOutOfRange = Regex.Matches(input, @"""-?\d+""")
.Cast<Match>()
.Select(x => int.Parse(x.Value.Trim('"')))
.Any(x => x < 1 || x > 150);
Option 2
With a positive lookbehind and a positive lookahead (which keep the quotes out of the match)
var isOutOfRange = Regex.Matches(input, @"(?<="")-?\d+(?="")")
.Cast<Match>()
.Select(x => int.Parse(x.Value))
.Any(x => x < 1 || x > 150);
Update
In the comments, Jimi suggested that it might be prudent to also check for the trailing colon, in case one of the values is enclosed in quotes.
In that case you could modify the patterns to the following:
"-?\d+"(?=:)
(?<=")-?\d+(?=":)
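Putting the colon check together with the range test could look roughly like the sketch below. KeyRangeCheck and AllKeysInRange are hypothetical names, the optional whitespace before the colon is my own assumption about the input, and non-numeric keys are simply ignored.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyRangeCheck
{
    // Returns true only if every quoted integer that is followed by a colon
    // (i.e. used as a key) lies between min and max inclusive.
    // Sketch only: this is pattern matching, not real JSON parsing.
    public static bool AllKeysInRange(string json, int min = 1, int max = 150)
    {
        return Regex.Matches(json, @"""(-?\d+)""(?=\s*:)")
            .Cast<Match>()
            // TryParse fails on values too large for int; treat those as out of range.
            .All(m => int.TryParse(m.Groups[1].Value, out int key)
                      && key >= min && key <= max);
    }
}
```

For the input in the question this returns false because of the "-1" key. A quoted number used as a value, such as "200" in {"1":"200"}, is not followed by a colon and is therefore skipped.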

Check if a string contains a certain set of characters? (duplicate characters might be needed)

I know there are plenty of ways to check if a string contains certain characters, but I'm trying to figure out a way of excluding duplicate letters.
So for instance, we have the string "doaurid" (random letters entered by the user)
And they type the word "dad" to see if it's valid
I can't figure out a simple solution to check whether that string has two D's and one A.
The only way I've thought of is to use nested for loops, go through every element of a char array, and mark used letters with a 1 or something similar.
You can use:
var userInput = "doaurid";
var toCheck = "dad";
var check = toCheck.GroupBy(c=> c).ToDictionary(g => g.Key, g => g.Count());
var input = userInput.GroupBy(c=> c).ToDictionary(g => g.Key, g => g.Count());
bool validMatch = check.All(g => input.ContainsKey(g.Key) && input[g.Key] == g.Value);
This will only be valid if the userInput string contains all of the letters in toCheck, and the exact same number of letters.
If the input string can allow more duplicated letters (ie: if "dddoauriddd" should match), the check could be done via:
bool validMatch = check.All(g => input.ContainsKey(g.Key) && input[g.Key] >= g.Value);
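The "at least as many" variant can also be written without LINQ by consuming letters from a tally. This is only a sketch; LetterCheck and CanSpell are names invented here.

```csharp
using System.Collections.Generic;

public static class LetterCheck
{
    // True if word can be built from the letters in available,
    // using each available letter at most once. Case sensitive.
    public static bool CanSpell(string available, string word)
    {
        var remaining = new Dictionary<char, int>();
        foreach (char c in available)
        {
            remaining.TryGetValue(c, out int n);   // n is 0 when c is not present yet
            remaining[c] = n + 1;
        }

        foreach (char c in word)
        {
            if (!remaining.TryGetValue(c, out int n) || n == 0)
                return false;                      // letter missing or used up
            remaining[c] = n - 1;
        }
        return true;
    }
}
```

With the example from the question, LetterCheck.CanSpell("doaurid", "dad") succeeds because two d's and one a are available, while removing one d from the input makes it fail.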
Reed Copsey's answer is correct. Anyway, here is another alternative with LINQ:
var userInput = "doaurid";
var searchWord = "dad";
var control = userInput.Where(searchWord.Contains).Count() == searchWord.Length;
One possibility for this is using regular expressions. For instance, the following code will detect whether or not the supplied string contains any duplicate letters.
var expression = new Regex(@"(?<letter>\w).*\k<letter>");
if (expression.IsMatch(userInput)) {
Console.WriteLine("Found a duplicate letter: {0}", expression.Match(userInput).Groups["letter"].Value);
}
This expression works by first matching any word character, and then storing that result in the "letter" group. It then skips over 0 or more other intervening letters and finally matches a "backreference" to whatever it captured in the "letter" group. If a string doesn't contain any duplicate letters, this regular expression will not match - so if it matches, you know it contains duplicates, and you know at least one of those duplicates by examining the value it captured in the letter group.
In this case, it would be case sensitive. If you wanted it to be case insensitive, you could pass the RegexOptions.IgnoreCase argument to the constructor of your regular expression.
Another approach, using ToLookup:
var charCount = "dad".ToLookup(chr => chr);
bool allSame = charCount.All(g => g.Count() == "doaurid".Count(c => c == g.Key));
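I could think of a simpler solution like this (note that it only matches when the letters appear in the same order as in toCheck):
var userInput = "dddoauriddd"; //"doaurid";
var toCheck = "dad";
var toCheckR = "";
foreach (var c in toCheck)
{
    // Builds ".*d.*a.*d": each letter may be preceded by any other characters.
    toCheckR += ".*" + Regex.Escape(c.ToString());
}
Console.WriteLine(Regex.IsMatch(userInput, toCheckR));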

Is there a better way to create acronym from upper letters in C#?

What is the best way to create acronym from upper letters in C#?
Example:
Alfa_BetaGameDelta_Epsilon
Expected result:
ABGDE
My solution works, but it's not nice
var classNameAbbreviationRegex = new Regex("[A-Z]+", RegexOptions.Compiled);
var matches = classNameAbbreviationRegex.Matches(enumTypeName);
var letters = new string[matches.Count];
for (var i = 0; i < matches.Count; i++)
{
letters[i] = matches[i].Value;
}
var abbreviation = string.Join(string.Empty, letters);
string.Join("", s.Where(char.IsUpper));
string.Join("", s.Where(x => char.IsUpper(x)));
string test = "Alfa_BetaGameDelta_Epsilon";
string result = string.Concat(test.Where(char.IsUpper));
You can use the Where method to filter out the upper case characters, and the Char.IsUpper method can be used as a delegate directly without a lambda expression. You can create the resulting string from an array of characters:
string abbreviation = new String(enumTypeName.Where(Char.IsUpper).ToArray());
By using MORE regexes :-)
var ac = string.Join(string.Empty,
Regex.Match("Alfa_BetaGameDelta_Epsilon",
"(?:([A-Z]+)(?:[^A-Z]*))*")
.Groups[1]
.Captures
.Cast<Capture>()
.Select(p => p.Value));
More regexes are always the solution, especially with LINQ! :-)
The regex puts all the [A-Z] in capture group 1 (because all the other () are non-capturing group (?:)) and "skips" all the non [A-Z] ([^A-Z]) by putting them in a non-capturing group. This is done 0-infinite times by the last *. Then a little LINQ to select the value of each capture .Select(p => p.Value) and the string.Join to join them.
Note that this isn't Unicode friendly... ÀÈÌÒÙ will be ignored. A better regex would use @"(?:(\p{Lu}+)(?:[^\p{Lu}]*))*" where \p{Lu} is the Unicode category UppercaseLetter.
(yes, this is useless... The other methods that use LINQ + IsUpper are better :-) but the whole example was built just to show the problems of Regexes with Unicode)
MUCH EASIER:
var ac = Regex.Replace("Alfa_BetaGameDelta_Epsilon", @"[^\p{Lu}]", string.Empty);
simply remove all the non-uppercase letters :-)
var str = "Alfa_BetaGammaDelta_Epsilon";
var abbreviation = string.Join(string.Empty, str.Where(c => char.IsUpper(c)));

LINQ to count occurrences

I have the following query which works great:
string[] Words = {"search","query","example"};
... Snip ...
var Results = (
from a in q
from w in Words
where
(
a.Title.ToLower().Contains(w)
|| a.Body.ToLower().Contains(w)
)
select new
{
a,
Count = 0
}).OrderByDescending(x=> x.Count)
.Distinct()
.Take(Settings.ArticlesPerPage);
What I need it to do, is return Count which is the total occurrences of the words. I'm going to weight it in favour of the title as well, example:
Count = (OccuranceInTitle * 5) + (OccurancesInBody)
I'm assuming I need to use the Linq.Count but I'm not sure how to apply it in this instance.
This is what I came up with:
var query =
from a in q
from w in Words
let title = a.Title.ToLower()
let body = a.Body.ToLower()
let replTitle = Regex.Replace(title, string.Format("\\b{0}\\b", w), string.Empty)
let replBody = Regex.Replace(body, string.Format("\\b{0}\\b", w), string.Empty)
let titleOccurences = (title.Length - replTitle.Length) / w.Length
let bodyOccurences = (body.Length - replBody.Length) / w.Length
let score = titleOccurences * 5 + bodyOccurences
where score > 0
select new { Article = a, Score = score };
var results = query.GroupBy(r => r.Article)
.OrderByDescending(g => g.Sum(r => r.Score))
.Take(Settings.ArticlesPerPage);
Counting occurrences is done with the (surprisingly) quick and dirty method of replacing occurrences with string.Empty and calculating based on the resulting string length. After the scores for each article and each word are calculated, I'm grouping by article, ordering by the sum of scores for all the words, and taking a chunk out of the results.
I didn't fire up the compiler, so please excuse any obvious mistakes.
Update: This version uses regexes as in
Regex.Replace(title, string.Format("\\b{0}\\b", w), string.Empty)
instead of the original version's
title.Replace(w, string.Empty)
so that it now matches only whole words (the string.Replace version would also match word fragments).
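If the length arithmetic feels too indirect, the whole-word count can also be read straight from Regex.Matches. The sketch below assumes in-memory strings (it is not meant to be translated to SQL), and WordScore, CountWholeWord and Score are hypothetical names; the title weight of 5 comes from the question.

```csharp
using System.Text.RegularExpressions;

public static class WordScore
{
    // Counts whole-word, case-insensitive occurrences of word in text.
    public static int CountWholeWord(string text, string word)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the word literal.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    // Weighted score: occurrences in the title count titleWeight times.
    public static int Score(string title, string body, string[] words, int titleWeight = 5)
    {
        int score = 0;
        foreach (string w in words)
            score += CountWholeWord(title, w) * titleWeight + CountWholeWord(body, w);
        return score;
    }
}
```

Because the match is case insensitive, the ToLower() calls are not needed with this variant.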

Using Regular Expressions to extract groups of numbers from a string

I need to convert a string like,
"[1,2,3,4][5,6,7,8]"
into groups of integers, adjusted to be zero based rather than one based:
{0,1,2,3} {4,5,6,7}
The following rules also apply:
The string must contain at least 1 group of numbers with enclosing square brackets.
Each group must contain at least 2 numbers.
Every number must be unique (not something I'm attempting to achieve with the regex).
0 is not valid, but 10, 100 etc are.
Since I'm not that experienced with regular expressions, I'm currently using two;
@"^(?:\[(?:[1-9]+[\d]*,)+(?:[1-9]+[\d]*){1}\])+$";
and
@"\[(?:[1-9]+[\d]*,)+(?:[1-9]+[\d]*){1}\]";
I'm using the first one to check the input and the second to get all matches of a set of numbers inside square brackets.
I'm then using .Net string manipulation to trim off the square brackets and extract the numbers, parsing them and subtracting 1 to get the result I need.
I was wondering if I could get at the numbers better by using captures, but not sure how they work.
Final Solution:
In the end I used the following regular expression to validate the input string
@"^(?<set>\[(?:[1-9]\d{0,7}(?:]|,(?=\d))){2,})+$"
agent-j's pattern is fine for capturing the information needed but also matches a string like "[1,2,3,4][5]" and would require me to do some additional filtering of the results.
I access the captures via the named group 'set' and use a second simple regex to extract the numbers.
The '[1-9]\d{0,7}' simplifies parsing ints by limiting numbers to 99,999,999 and avoiding overflow exceptions.
MatchCollection matches = new Regex(@"^(?<set>\[(?:[1-9]\d{0,7}(?:]|,(?=\d))){2,})+$").Matches(inputText);
if (matches.Count != 1) return;
CaptureCollection captures = matches[0].Groups["set"].Captures;
var resultJArray = new int[captures.Count][];
var numbersRegex = new Regex(@"\d+");
for (int captureIndex = 0; captureIndex < captures.Count; captureIndex++)
{
    string capture = captures[captureIndex].Value;
    MatchCollection numberMatches = numbersRegex.Matches(capture);
    resultJArray[captureIndex] = new int[numberMatches.Count];
    for (int numberMatchIndex = 0; numberMatchIndex < numberMatches.Count; numberMatchIndex++)
    {
        string number = numberMatches[numberMatchIndex].Value;
        int numberAdjustedToZeroBase = Int32.Parse(number) - 1;
        resultJArray[captureIndex][numberMatchIndex] = numberAdjustedToZeroBase;
    }
}
string input = "[1,2,3,4][5,6,7,8][534,63433,73434,8343434]";
string pattern = @"\G(?:\[(?:(\d+)(?:,|(?=\]))){2,}\])";//\])+$";
MatchCollection matches = Regex.Matches (input, pattern);
To start out, any (regex) in plain parentheses is a capturing group. This means that the regex engine will capture (store the positions matched by) that group. To avoid this when you don't need it, use (?:regex). I did that above.
Index 0 of the Groups collection is special: it refers to the whole match. I.e. match.Groups[0].Value is always the same as match.Value and match.Groups[0].Captures[0].Value. So you can consider the explicit groups to start at index 1; the Captures collection of each group is indexed from 0 as usual.
As you can see below, each match contains a bracketed digit group. You'll want to use every capture from Group 1 of each match.
foreach (Match match in matches)
{
    // e.g. [1,2]
    // Use every capture (index 0 to Count - 1) from the first group.
    for (int i = 0; i < match.Groups[1].Captures.Count; i++)
    {
        int number = int.Parse(match.Groups[1].Captures[i].Value);
        if (number == 0)
            throw new Exception("Cannot be 0.");
    }
}
Match[0] => [1,2,3,4]
    Group[0] => [1,2,3,4]
        Capture[0] => [1,2,3,4]
    Group[1] => 4
        Capture[0] => 1
        Capture[1] => 2
        Capture[2] => 3
        Capture[3] => 4
Match[1] => [5,6,7,8]
    Group[0] => [5,6,7,8]
        Capture[0] => [5,6,7,8]
    Group[1] => 8
        Capture[0] => 5
        Capture[1] => 6
        Capture[2] => 7
        Capture[3] => 8
Match[2] => [534,63433,73434,8343434]
    Group[0] => [534,63433,73434,8343434]
        Capture[0] => [534,63433,73434,8343434]
    Group[1] => 8343434
        Capture[0] => 534
        Capture[1] => 63433
        Capture[2] => 73434
        Capture[3] => 8343434
The \G anchors each match at the position where the previous match ended (so you won't match [1,2] [3,4]). The {2,} satisfies your requirement that there be at least 2 numbers per match.
The expression will match even if there is a 0. I suggest that you put that validation in with the other non-regex stuff. It will keep the regex simpler.
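As a rough sketch of how the \G approach and the non-regex validation might fit together, the method below parses the groups, rejects 0, and returns zero-based values. GroupParser and Parse are hypothetical names, the pattern is a variation on the one above rather than the exact same expression, and the 8-digit limit is borrowed from the question's final solution to avoid overflow. Uniqueness of the numbers is not checked here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupParser
{
    // Parses "[1,2,3,4][5,6,7,8]" into { {0,1,2,3}, {4,5,6,7} }.
    // Throws FormatException when the input is not a gapless sequence of
    // bracketed groups with at least two numbers each, or when a number is 0.
    public static int[][] Parse(string input)
    {
        // \G forces each match to start where the previous one ended.
        // The group name "n" is reused so all numbers land in one capture list.
        MatchCollection matches = Regex.Matches(
            input, @"\G\[(?<n>[0-9]{1,8})(?:,(?<n>[0-9]{1,8}))+\]");

        int covered = matches.Cast<Match>().Sum(m => m.Length);
        if (matches.Count == 0 || covered != input.Length)
            throw new FormatException("Expected one or more [n,n,...] groups.");

        return matches.Cast<Match>()
            .Select(m => m.Groups["n"].Captures.Cast<Capture>()
                .Select(c => ToZeroBased(c.Value))
                .ToArray())
            .ToArray();
    }

    private static int ToZeroBased(string text)
    {
        int value = int.Parse(text);   // at most 8 digits, so no overflow
        if (value == 0)
            throw new FormatException("0 is not a valid number.");
        return value - 1;
    }
}
```

Comparing the total matched length with the input length is what rejects inputs such as "[1,2] [3,4]" or "[1,2,3,4][5]", where \G stops matching partway through.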
The following regex will validate the input and also produce match groups for each bracketed [] group and, inside it, for each number:
(?:([1-9][0-9]*)\,?){2,}
[1][5] - fail
[1] - fail
[] - fail
[a,b,c][5] - fail
[1,2,3,4] - pass
[1,2,3,4,5,6,7,8][5,6,7,8] - pass
[1,2,3,4][5,6,7,8][534,63433,73434,8343434] - pass
What about \d+ and a global flag?
