Elegant way to count alphanumeric chars in a string?

Elegant way to count alphanumeric chars in a string? - c#

I am looking for an elegant way, preferably a short linq expression, to count how many alphanumeric chars a given string contains.
The 'boring' way I do it now is this:
int num = 0;
for (int i = 0; i < password.Length; i++)
{
if (!char.IsLetterOrDigit(password, i))
{
num++;
}
}
if (num < MinRequiredNonAlphanumericCharacters)
return false;
This is rather short already, but I am sure with some linq magic this can be done in an even shorter, equally understandable expression, right?

Here's the quick and dirty LINQ way to get the letter & digit count:
password.Count(char.IsLetterOrDigit)
This is more of a direct copy of what you're doing:
password.Count(c => !char.IsLetterOrDigit(c))

int num = password.Where((t, i) => !char.IsLetterOrDigit(password, i)).Count();
if (num < MinRequiredNonAlphanumericCharacters)
return false;

You could do it with one line of Regex. Whether that's understandable is a moot point.
num = new Regex(#"\w", RegexOptions.IgnoreCase).Matches(input).Count

You can use Take() to ensure that you don't inspect more letters than necessary:
int minCount = password.Where(x => !char.IsLetterOrDigit(x)).Take(MinRequired).Count();
if (minCount < MinRequired) { // No good }
The idea is that we only keep checking until you've hit the minimum number. At that point, we can stop, because we know we have an admissible password. Take() takes as many as it can, and no more, so if there aren't enough, it will just return a number less than you've requested.

Related

C# Type of String Index

I need to access a very large number in the index of the string which int and long can't handle. I had to use ulong but the problem is that the indexer can only handle the type int.
This is my code and I have marked the line where the error is located. Any ideas how to solve this?
string s = Console.ReadLine();
long n = Convert.ToInt64(Console.ReadLine());
var cont = s.Count(x => x == 'a');
Console.WriteLine(cont);
Console.ReadKey();
The main idea of the code is to identify how many 'a's there are in the string. What are some other ways I can do this?
EDIT:
i didn't know that is the string index Capicity cant exceed the int type. and i fixed my for loop by replacing it with this linq line
var cont = s.Count(x => x == 'a');
now since my string can't exceed certain amount. so how i can repeat my string to append its char for 1,000,000,000,000 times rather than using this code
for (int i = 0; i < 20; i++)
{
s += s;
}
since this code is generating random char numbers in the string and if i raised the 20 may cause to overflow so i need to adjust it to repeat itself to make the string[index] = n // the long i declared above.
so for example if my string input is "aba" and n is 10 so the string will be "abaabaabaa" // total chars 10
PS: I Edited the original code

I assume you got a programming assignment or online coding challenge, where the requirement was "Count all instances of the letter 'a' in this > 2 GB file". You solution is to read the file in memory at once, and loop over it with a variable type that allows values over 2GB.
This causes an XY problem. You cannot have an array that large in memory in the first place, so you're not going to reach the point where you need a uint, long or ulong to index into it.
Instead, use a StreamReader to read the file in chunks, as explained in for example Reading large file in chunks c#.

You can repeat your string using an infinite sequence. I haven't added any check for valid arguments, etc.
static void Main(string[] args)
{
long count = countCharacters("aba", 'a', 10);
Console.WriteLine("Count is {0}", count);
Console.WriteLine("Press ENTER to exit...");
Console.ReadLine();
}
private static long countCharacters(string baseString, char c, long limit)
{
long result = 0;
if (baseString.Length == 1)
{
result = baseString[0] == c ? limit : 0;
}
else
{
long n = 0;
foreach (var ch in getInfiniteSequence(baseString))
{
if (n >= limit)
break;
if (ch == c)
{
result++;
}
n++;
}
}
return result;
}
//This method iterates through a base string infinitely
private static IEnumerable<char> getInfiniteSequence(string baseString)
{
int stringIndex = 0;
while (true)
{
yield return baseString[stringIndex++ % baseString.Length];
}
}
For the given inputs, the result is 7

I highly recommend you rethink the way you are doing this, but a quick fix would be to use a foreach loop instead:
foreach(char c in s)
{
if (c == 'a')
cont++;
}
Alternative using Linq:
cont = s.Count(c => c == 'a');
I'm not sure about what n is supposed to do. According to your code it limits the string length but your question never mentions why or to what end.

i need to access a very large number in the index of the string which
int, long can't handle
this statement is not true
c# string's max length is int.Max since string.Length is an integer and it is limited by that. You should be able to do
for (int i = 0; i <= n; i++)

The maximum length of a string cannot exceed the size of an int so there really is no point in using ulong or long to index into the string.
Simply put, you're trying to solve the wrong problem.
If we disregard the fact that the program is likely to cause an out of memory exception when building such a long string, you can simply fix your code by switching to an int instead of a ulong:
for (int i = 0; i <= n; i++)
Having said that you can also use LINQ to do this:
int cont = s.Take(n + 1).Count(c => c == 'a');
Now, in the first sentence of your question you state this:
I need to access a very large number in the index of the string which int and long can't handle.
This is wholly unnecessary because any legal index of a string will fit inside an int.

If you need to do this on some input that's longer than the maximum length of a string in .NET, you'll need to change your approach; use a Stream instead trying to read all input into a string.
char seeking = 'a';
ulong count = 0;
char[] buffer = new char[4096];
using (var reader = new StreamReader(inStream))
{
int length;
while ((length = reader.Read(buffer, 0, buffer.Length)) > 0)
{
count += (ulong)buffer.Count(c => c == seeking);
}
}

C# largest prime factor with modulo?

I was wondering if it is possible to find the largest prime factor of a number by using modulos in C#. In other words, if i % x == 0 then we could break a for loop or something like that, where x is equal to all natural numbers below our i value.
How would I specify the all natural numbers below our i value as equal to our x variable? It becomes a little tedious to write out conditionals for every single integer if you know what I'm saying.
By the way, I'm sure there is a far easier way to do this in C#, so please let me know about it if you have an idea, but I'd also like to try and solve it this way, just to see if I can do it with my beginner knowledge.
Here is my current code if you want to see what I have so far:
static void Main()
{
int largestPrimeFactor = 0;
for (long i = 98739853; i <= 98739853; i--)
{
if (true)
{
largestPrimeFactor += (int) i;
break;
}
}
Console.WriteLine(largestPrimeFactor);
Console.ReadLine();
}

If I were to do this using loop and modulos I would do:
long number = 98739853;
long biggestdiv = number;
while(number%2==0) //get rid of even numbers
number/=2;
long divisor = 3;
if(number!=1)
while(divisor!=number)
{
while(number%divisor==0)
{
number/=divisor;
biggestdiv = divisor;
}
divisor+=2;
}
In the end, biggestdiv would be the largest prime factor.
Note: This code is written directly in browser. I didn't try to compile or run it. This is only for showing my concept. There might be algorithm mistakes. It they are, let me know. I'm aware of the fact that it is not optimized at all (I think Sieve is the best for this).
EDIT:
fixed: previous code would return 1 when number were prime.
fixed: previous code would end in loop leading to overflow of divisor where number were power of 2

Ooh, this sounds like a fun use for iterator blocks. Don't turn this in to your professor, though:
private static List<int> primes = new List<int>() {2};
public static IEnumerable<int> Primes()
{
int p;
foreach(int i in primes) {p = i; yield return p;}
while (p < int.MaxValue)
{
p++;
if (!primes.Any(i => p % i ==0))
{
primes.Add(p);
yield return p;
}
}
}
public int LargestPrimeFactor(int n)
{
return Primes.TakeWhile(p => p <= Math.Sqrt(n)).Where(p => n % p == 0).Last();
}

I'm not sure quite what your question is: perhaps you need a loop over the numbers? However there are two clear problems with your code:
Your for loop has the same stop and end value. Ie it will run once and once only
You have a break before the largestPrimeFactor sum. This sum will NEVER execute, because break will stop the for loop ( and hence execution of that block). The compiler should be giving a warning that this sum is unreachable.

Testing for repeated characters in a string

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.
`ABCDE` // does not contain repeats
`AABCD` // does contain repeats, ie A is repeated
I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?
EDIT:
The string will be sorted, so order is not important so ABCDA => AABCD
The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.

If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.
Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
Bonus: here's a slick LINQ solution to implement Jon's functionality:
int longestRun =
s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();
So, OK, it's not very fast! You got a problem with that?!
:-)

If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:
public static bool IsSortedNoRepeats(string text)
{
if (text.Length == 0)
{
return true;
}
char current = text[0];
for (int i=1; i < text.Length; i++)
{
char next = text[i];
if (next <= current)
{
return false;
}
current = next;
}
return true;
}
A shorter alternative if you don't mind repeating the indexer use:
public static bool IsSortedNoRepeats(string text)
{
for (int i=1; i < text.Length; i++)
{
if (text[i] <= text[i-1])
{
return false;
}
}
return true;
}
EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.
First a string-specific version:
public static int LongestRun(string text)
{
if (text.Length == 0)
{
return 0;
}
char current = text[0];
int currentRun = 1;
int bestRun = 0;
for (int i=1; i < text.Length; i++)
{
if (current != text[i])
{
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = text[i];
}
currentRun++;
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Now we can also do this as a general extension method on IEnumerable<T>:
public static int LongestRun(this IEnumerable<T> source)
{
bool first = true;
T current = default(T);
int currentRun = 0;
int bestRun = 0;
foreach (T element in source)
{
if (first || !EqualityComparer<T>.Default(element, current))
{
first = false;
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = element;
}
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Then you can call "AABCD".LongestRun() for example.

This will tell you very quickly if a string contains duplicates:
bool containsDups = "ABCDEA".Length != s.Distinct().Count();
It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...
Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.

Since you're using 3.5, you could do this in one LINQ query:
var results = stringInput
.ToCharArray() // not actually needed, I've left it here to show what's actually happening
.GroupBy(c=>c)
.Where(g=>g.Count()>1)
.Select(g=>new {Letter=g.First(),Count=g.Count()})
;
For each character that appears more than once in the input, this will give you the character and the count of occurances.

I think the easiest way to achieve that is to use this simple regex
bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, #"(\w)\1");
If you need more information about the match (start, length etc)
Match match = null;
string testString = "ABCDE AABCD";
match = Regex.Match(testString, #"(\w)\1+?");
if (match.Success)
{
string matchText = match.Value; // AA
int matchIndnex = match.Index; // 6
int matchLength = match.Length; // 2
}

How about something like:
string strString = "AA BRA KA DABRA";
var grp = from c in strString.ToCharArray()
group c by c into m
select new { Key = m.Key, Count = m.Count() };
foreach (var item in grp)
{
Console.WriteLine(
string.Format("Character:{0} Appears {1} times",
item.Key.ToString(), item.Count));
}

Update Now, you'd need an array of counters to maintain a count.
Keep a bit array, with one bit representing a unique character. Turn the bit on when you encounter a character, and run over the string once. A mapping of the bit array index and the character set is upto you to decide. Break if you see that a particular bit is on already.

/(.).*\1/
(or whatever the equivalent is in your regex library's syntax)
Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...

I started looking for some info on the net and I got to the following solution.
string input = "aaaaabbcbbbcccddefgg";
char[] chars = input.ToCharArray();
Dictionary<char, int> dictionary = new Dictionary<char,int>();
foreach (char c in chars)
{
if (!dictionary.ContainsKey(c))
{
dictionary[c] = 1; //
}
else
{
dictionary[c]++;
}
}
foreach (KeyValuePair<char, int> combo in dictionary)
{
if (combo.Value > 1) //If the vale of the key is greater than 1 it means the letter is repeated
{
Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
}
}
I hope it helps, I had a job interview in which the interviewer asked me to solve this and I understand it is a common question.

When there is no order to work on you could use a dictionary to keep the counts:
String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
if (!result.ContainsKey(c))
{
result[c] = 0; // initialize the counter in the result
}
result[c]++;
}
foreach (var charCombo in result)
{
Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);
}

The hash solution Jon was describing is probably the best. You could use a HybridDictionary since that works well with small and large data sets. Where the letter is the key and the value is the frequency. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key))

What is the most efficient (read time) string search method? (C#)

I find that my program is searching through lots of lengthy strings (20,000+) trying to find a particular unique phrase.
What is the most efficent method for doing this in C#?
Below is the current code which works like this:
The search begins at startPos because the target area is somewhat removed from the start
It loops through the string, at each step it checks if the substring from that point starts with the startMatchString, which is an indicator that the start of the target string has been found. (The length of the target string varys).
From here it creates a new substring (chopping off the 11 characters that mark the start of the target string) and searches for the endMatchString
I already know that this is a horribly complex and possibly very inefficent algorithm.
What is a better way to accomplish the same result?
string result = string.Empty;
for (int i = startPos; i <= response.Length - 1; i++)
{
if (response.Substring(i).StartsWith(startMatchString))
{
string result = response.Substring(i).Substring(11);
for (int j = 0; j <= result.Length - 1; j++)
{
if (result.Substring(j).StartsWith(endMatchString))
{
return result.Remove(j)
}
}
}
}
return result;

You can use String.IndexOf, but make sure you use StringComparison.Ordinal or it may be one order of magnitude slower.
private string Search2(int startPos, string startMatchString, string endMatchString, string response) {
int startMarch = response.IndexOf(startMatchString, startPos, StringComparison.Ordinal);
if (startMarch != -1) {
startMarch += startMatchString.Length;
int endMatch = response.IndexOf(endMatchString, startMarch, StringComparison.Ordinal);
if (endMatch != -1) { return response.Substring(startMarch, endMatch - startMarch); }
}
return string.Empty;
}
Searching 1000 times a string at about the 40% of a 183 KB file took about 270 milliseconds. Without StringComparison.Ordinal it took about 2000 milliseconds.
Searching 1 time with your method took over 60 seconds as it creates a new string (O(n)) each iteration, making your method O(n^2).

There are a whole bunch of algorithms,
boyer and moore
Sunday
Knuth-Morris-Pratt
Rabin-Karp
I would recommend to use the simplified Boyer-Moore, called Boyer–Moore–Horspool.
The C-code appears at the wikipedia.
For the java code look at
http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html
A nice article about these is available under
http://www.ibm.com/developerworks/java/library/j-text-searching.html
If you want to use built-in stuff go for regular expressions.

It depends on what you're trying to find in the string. If you're looking for a specific sequence IndexOf/Contains are fast, but if you're looking for wild card patterns Regex is optimized for this kind of search.

I would try to use a Regular Expression instead of rolling my own string search algorithm. You can precompile the regular expression to make it run faster.

For very long strings you cannot beat the boyer-moore search algorithm. It is more complex than I might try to explain here, but The CodeProject site has a pretty good article on it.

You could use a regex; it’s optimized for this kind of searching and manipulation.
You could also try IndexOf ...
string result = string.Empty;
if (startPos >= response.Length)
return result;
int startingIndex = response.IndexOf(startMatchString, startPos);
int rightOfStartIndex = startingIndex + startMatchString.Length;
if (startingIndex > -1 && rightOfStartIndex < response.Length)
{
int endingIndex = response.IndexOf(endMatchString, rightOfStartIndex);
if (endingIndex > -1)
result = response.Substring(rightOfStartIndex, endingIndex - rightOfStartIndex);
}
return result;

Here's an example using IndexOf (beware: written from the top of my head, didn't test it):
int skip = 11;
int start = response.IndexOf(startMatchString, startPos);
if (start >= 0)
{
int end = response.IndexOf(startMatchString, start + skip);
if (end >= 0)
return response.Substring(start + skip, end - start - skip);
else
return response.Substring(start + skip);
}
return string.Empty;

As said before regex is your friend.
You might want to look at RegularExpressions.Group.
This way you can name part of the matched resultset.
Here is an example

performance issues with finding nth occurence of a character with a regular expression

I have a regex to find the nth occurrence of a character in a string, here's the code:
public static int NthIndexOf(this string target, string value, int n)
{
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");
if (m.Success)
{
return m.Groups[2].Captures[n - 1].Index;
}
else
{
return -1;
}
}
Now, I have 1594 entries in this string, with 1593 semicolons. If I write:
tempstring.NthIndexOf(";", 1593)
The answer comes back immediately and correctly. If I give it anything over 1594 it hangs. Does anyone know how to fix this?
Test Case
string holder = "test;test2;test3";
string test = "";
for (int i = 0; i < 600; i++)
{
test += holder;
}
int index = test.NthIndexOf(";", 2000);
This takes a very long time. Change 600 to 6 and it is very fast. Make 2000 to 1700 and it is very fast as well.
Why is my regular expression so slow?

If you're really only looking for character repetitions, and not string repetitions, then you should be able to replace you method with something simple like
public static int NthIndexOf(this string target, char testChar, int n)
{
int count = 0;
for(int i=0; i<target.Length; i++)
{
if(target[i] == testChar)
{
count++;
if(count == n) return i;
}
}
return -1;
}
and use that. It should have far fewer limitations.
As for why your original regex is going slow, here's what I suspect:
For your fast case, it's working because it can find a match on it's first pass through (with each group matching exactly one character)
For the slow case is because it can't find a match (and won't ever find one, because there aren't enough semicolons to satisfy the regex), but it recursively tries every possible way to break up the string (which is a really big operation)

Try to use a more distinct and efficient regular expression:
"^(?:[^" + value + "]*" + value + "){" + (n - 1) + "}([^" + value + "]*)
This will build the following regular expression for tempstring.NthIndexOf(";", 1593):
^(?:[^;]*;){1592}([^;]*)
But this will only work for single characters as separator.
Another approach would be to step through each character and count the occurences of the character you were looking for.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Elegant way to count alphanumeric chars in a string? - c#

Here's the quick and dirty LINQ way to get the letter & digit count: password.Count(char.IsLetterOrDigit) This is more of a direct copy of what you're doing: password.Count(c => !char.IsLetterOrDigit(c))

int num = password.Where((t, i) => !char.IsLetterOrDigit(password, i)).Count(); if (num < MinRequiredNonAlphanumericCharacters) return false;

You could do it with one line of Regex. Whether that's understandable is a moot point. num = new Regex(#"\w", RegexOptions.IgnoreCase).Matches(input).Count

Related

C# Type of String Index

C# largest prime factor with modulo?

Testing for repeated characters in a string

What is the most efficient (read time) string search method? (C#)

performance issues with finding nth occurence of a character with a regular expression

Categories

Resources