Neat solution to a counting within a string

Neat solution to a counting within a string - c#

I am trying to solve the following problem but cannot find an elegant solution. Any ideas?
Thanks.
Input - a variable length string of numbers, e.g.,
string str = "5557476374202110373551116201";
Task - Check (from left to right) that each number (ignoring the repetitions) does not appear in the following 2 indexes. Using eg. above, First number = 5. Ignoring reps we see that last index of 5 in the group is 2. So we check next 2 indexes, i.e. 3 and 4 should not have 5. If it does we count it as error. Goal is to count such errors in the string.
In the above string errors are at indexes, 3,10 and 16.

in addition to the other excellent solutions you can use a simple regexp:
foreach (Match m in Regexp.Matches(str, #"(\d)(?!\1)(?=\d\1)"))
Console.WriteLine("Error: " + m.Index);
returns 3,10,16. this would match adjacent errors using lookahead with a backreference. handles repetitions. .net should support that. if not, you can use a non-backreference version:
(?<=0[^0])0|(?<=1[^1])1|(?<=2[^2])2|(?<=3[^3])3|(?<=4[^4])4|(?<=5[^5])5|(?<=6[^6])6|(?<=7[^7])7|(?<=8[^8])8|(?<=9[^9])9

A simple indexed for loop with a couple of look ahead if checks would work. You can treat a string as a char[] or as an IEnumerable - either way you can use that to loop over all of the characters and perform a lookahead check to see if the following one or two characters is a duplicate.

Sorry, not a C# man, but here's a simple solution in Ruby:
a="5557476374202110373551116201"
0.upto(a.length) do |i|
puts "error at #{i}" if a[i]!=a[i+1] && a[i]==a[i+2]
end
Output:
error at 3
error at 10
error at 16

Here's something I threw together in C# that worked with the example input from the question. I haven't checked it that thoroughly, though...
public static IEnumerable<int> GetErrorIndices(string text) {
if (string.IsNullOrEmpty(text))
yield break;
int i = 0;
while (i < text.Length) {
char c = text[i];
// get the index of the next character that isn't a repetition
int nextIndex = i + 1;
while (nextIndex < text.Length && text[nextIndex] == c)
nextIndex++;
// if we've reached the end of the string, there's no error
if (nextIndex + 1 >= text.Length)
break;
// we actually only care about text[nextIndex + 1],
// NOT text[nextIndex] ... why? because text[nextIndex]
// CAN'T be a repetition (we already skipped to the first
// non-repetition)
if (text[nextIndex + 1] == c)
yield return i;
i = nextIndex;
}
yield break;
}

Related

What is the appropriate way to append string to the end of the current line?

I'm working through a practice problem for one of my classes and I'm having a bit of trouble with part of the prompt.
We need to:
Write code to go through number -20 - 20 (but skip 5-15 inclusively)
If the number is negative, the line should start with an "*"
If the number is divisible by 2, add "#" to the end of the current line
If the number is divisible by 3, add "!" to the front of the line
If the previous line has both "!" and "#", then add "wow" to the end of the current line (Hint: use bool)
With the code I've written so far, I've managed to complete the first two tasks on the list, but I run into trouble starting with the third task. In my code I'm using
if (num%2==0)
{
Console.WriteLine(num+"#");
}
but all it's doing is outputting another number with "#" instead of putting "#" on the current line. How do I make it so "#" is appended to the end of the current line?
Here's my code for reference:
static void Main(string[] args)
{
int num = -20;
while (num <= 20)
{
if (num < 5 || num > 15)
{
if (num < 0)
{
Console.WriteLine("*" + num);
}
if (num%2==0)
{
Console.WriteLine(num+"#");
}
else
{
Console.WriteLine(num);
}
}
num++;
}
}

Since we are only working with the first 3 points in this question (try the other 2 yourself), I will only address those.
The main problem with the code is that it will always write the number more than once if more than one rule applies to it. There's 2 ways to tackle this. Also note I cleaned up the code a bit and I'll explain why later since it's secondary.
Fix
Method 1 : Incremental Writing
This method uses incremental writing to apply rules and then write a new line at the end before going to the next iteration.
// More succinct than a while loop for this particular scenario
for (int num = -20; num <=20; num++)
{
//Skip while avoiding nested if
if (num >= 5 && num <= 15)
continue;
//Since it needs to start with * in this case we prioritize it
if (num < 0)
Console.Write("*");
Console.Write(num);
// Since this would need to be appended at the end if it's true
if (num % 2 == 0)
Console.Write("#");
Console.Write(Environment.NewLine);
}
Method 2: Store the line then print before next iteration
In this case you would build the line that you want to print and then use one Console.WriteLine statement to write it while avoiding duplication. The writing would need to be done before moving to the next iteration.
You can use string concatenation instead of StringBuilder which would generally be more costly but in this case performance doesn't really matter (string are really small and amount of concatenation is minimal). However this would be a typical use case for a StringBuilder.
Also, since we know we know that we will always print out num when we aren't skipping then we can start off with num. But we could also do it like Method 1 where we add it in the middle. I'll illustrate both ways.
StringBuilder constructor with number
// More succinct than a while loop for this particular scenario
for (int num = -20; num <= 20; num++)
{
//Skip while avoiding nested if
if (num >= 5 && num <= 15)
continue;
// Create and update your string in a string builder, apply rules thereafter
// (this constructor usage means we don't need to add the number later)
var line = new StringBuilder(num.ToString());
//Since it needs to start with * in this case we prioritize it
if (num < 0)
line.Insert(0, "*");
// No need to add the number, already present
// Since this would need to be appended at the end if it's true
if (num % 2 == 0)
line.Append("#");
Console.WriteLine(line.ToString());
}
StringBuilder constructor without number
// More succinct than a while loop for this particular scenario
for (int num = -20; num <= 20; num++)
{
//Skip while avoiding nested if
if (num >= 5 && num <= 15)
continue;
// Create and update your string in a string builder, apply rules thereafter
// (this constructor usage means we must add the number later)
var line = new StringBuilder();
//Since it needs to start with * in this case we prioritize it
if (num < 0)
line.Append("*"); // NOTICE: This is now Append instead of Insert since line is empty
// Since we didn't add the number before
line.Append(num);
// Since this would need to be appended at the end if it's true
if (num % 2 == 0)
line.Append("#");
Console.WriteLine(line.ToString());
}
Additional Changes
for loop is better suited for this situation since you have an int with a clear start, end and an incrementor supported by the structure.
Avoid unnecessary nesting of conditions, common newbie mistake. If you have a condition to skip in certain cases, simply check and skip and otherwise the rest of the code will apply. This could otherwise lead to really annoying duplication and condition checks that are unnecessary (most of the time).

use string.concat but first save the string into variable and reach the end and finally do the concatenation

How string quote, length total count and arrayname[int] work?

I'm making a program which reverses words to sdrow.
I understand that for it to work it needs to be written this way.
I'm more interested on WHY it has to be this way.
Here is my code:
Console.WriteLine("Enter a word : ");
string word = Console.ReadLine();
string rev = "";
int length;
for (length = word.Length - 1; length >= 0; length--)
{
rev = rev + word[length];
}
Console.WriteLine("The reversed word is : {0}", rev);
My questions are:
a) why you must use quotes to initialize your string
b) Why must you start your loop at one less than the total length of your string
c) How arrayname[int] works
I'm pretty new to C# and this site, as well. I hope my questions make sense and that I've asked them in the correct and proper way.
Thank you for your time!

This is how I've interpreted your question.
You want to know why you must use quotes to initialize your string
Why must you start your loop at one less than the total length of your string
How arrayname[int] works
I think that the best way to explain this to you is to go through your code, and explain what it does.
Console.WriteLine("Enter a word : ");
The first line of code prints Enter a word : into the console.
string word = Console.ReadLine();
This line "reads" the input from the console, and puts it into a string called word.
string rev = "";
This initiates a string called rev, setting it's value to "", or an empty string. The other way to initiate the string would be this:
string rev;
That would initiate a string called rev to the value of null. This does not work for your program because rev = rev + word[length]; sets rev to itself + word[length]. It throws an error if it is null.
The next line of your code is:
int length;
That sets an int (which in real life we call an integer, basically a number) to the value of null. That is okay, because it gets set later on without referencing itself.
The next line is a for loop:
for (length = word.Length - 1; length >= 0; length--)
This loop sets an internal variable called length to the current value of word.Length -1. The second item tells how long to run the loop. While the value of length is more than, or equal to 0, the loop will continue to run. The third item generally sets the rate of increase or decrease of your variable. In this case, it is length-- that decreases length by one each time the loop runs.
The next relevant line of code is his:
rev = rev + word[length];
This, as I said before sets rev as itself + the string word at the index of length, whatever number that is at the time.
At the first run through the for loop, rev is set to itself (an empty string), plus the word at the index of length - 1. If the word entered was come (for example), the index 0 would be c, the index 1 would be o, 2 would be m, and 3 = e.
The word length is 4, so that minus one is 3 (yay - back to Kindergarten), which is the last letter in the word.
The second time through the loop, length will be 2, so rev will be itself (e) plus index 2, which is m. This repeats until length hits -1, at which point the loop does not run, and you go on to the next line of code.
...Which is:
Console.WriteLine("The reversed word is : {0}", rev);
This prints a line to the console, saying The reversed word is : <insert value of rev here> The {0} is an internal var set by the stuff after the comma, which in this case would be rev.
The final output of the program, if you inserted come, would look something like this:
>Enter a word :
>come
>
>The reversed word is : emoc

a) you must to initialize or instantiate the variable in order to can work with it. In your case is better to use and StringBuilder, the string are inmutable objects, so each assignement means to recreate a new string.
b) The arrays in C# are zero index based so their indexes go from zero to length -1.
c) An string is an characters array.
Any case maybe you must to try the StringBuilder, it is pretty easy to use
I hope this helps

a) you need to initialize a variable if you want to use it in an assignment rev = rev + word[length];
b) you are using a for loop which means that you define a start number length = word.Length - 1, a stop criteria length >= 0 and a variable change length--
So lenght basically descends from 5 to 0 (makes 6 loops). The reason is that Arrays like 'string' a char[] are zerobased indexed.. means that first element is 0 last is array.Length - 1
c) So a string is basically a chain of char's.. with the []-Operator you can access a single index. The Return Type of words[index] is in this case a character.
I hope it helped

a) You need to first instantiate the string rev before you can assign it values in the loop. You could also say "string rev = String.Empty". You are trying to add word to rev and if you don't tell it first that rev is an empty string it doesn't know what it is adding word to.
b) The characters of the string have indexes to show which position they appear in the string. These indexes start at 0. So you're first character will have an index of 0. If the length of your string is 6 then the last character's index will be 5. So length - 1 will give you the value of the last character's index which is 5. A c# for loop uses the following parameters
for (int i = 0; i < 10; i++)
{
Console.WriteLine(i);
}
where "int i = 0;" is the starting index; "i < 10" tells the loop when to stop looping; and "i++" tells it to increment i (the index) after each loop.
So in your code you are saying start at the last character of my string; perform this loop while there are still characters in the string; and decrease the index of the string after each loop so the next loop will look at the previous character in the string.
c) word[length] then in this scenario is saying add the character that has the position with index "length" to the rev string. This will basically reverse the word.

As usual you can do it in linq and avoid the (explicit) loops
using System;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
public static void Main(string[] args)
{
Console.WriteLine("Enter a word : ");
string word = Console.ReadLine();
Console.WriteLine("The reversed word is : {0}", new string (word.Reverse().ToArray()));
Console.ReadLine();
}
}
}

var reversedWords = string.Join(" ",
str.Split(' ')
.Select(x => new String(x.Reverse().ToArray())));
Take a look here :
Easy way to reverse each word in a sentence

Constantly Incrementing String

So, what I'm trying to do this something like this: (example)
a,b,c,d.. etc. aa,ab,ac.. etc. ba,bb,bc, etc.
So, this can essentially be explained as generally increasing and just printing all possible variations, starting at a. So far, I've been able to do it with one letter, starting out like this:
for (int i = 97; i <= 122; i++)
{
item = (char)i
}
But, I'm unable to eventually add the second letter, third letter, and so forth. Is anyone able to provide input? Thanks.

Since there hasn't been a solution so far that would literally "increment a string", here is one that does:
static string Increment(string s) {
if (s.All(c => c == 'z')) {
return new string('a', s.Length + 1);
}
var res = s.ToCharArray();
var pos = res.Length - 1;
do {
if (res[pos] != 'z') {
res[pos]++;
break;
}
res[pos--] = 'a';
} while (true);
return new string(res);
}
The idea is simple: pretend that letters are your digits, and do an increment the way they teach in an elementary school. Start from the rightmost "digit", and increment it. If you hit a nine (which is 'z' in our system), move on to the prior digit; otherwise, you are done incrementing.
The obvious special case is when the "number" is composed entirely of nines. This is when your "counter" needs to roll to the next size up, and add a "digit". This special condition is checked at the beginning of the method: if the string is composed of N letters 'z', a string of N+1 letter 'a's is returned.
Here is a link to a quick demonstration of this code on ideone.

Each iteration of Your for loop is completely
overwriting what is in "item" - the for loop is just assigning one character "i" at a time
If item is a String, Use something like this:
item = "";
for (int i = 97; i <= 122; i++)
{
item += (char)i;
}

something to the affect of
public string IncrementString(string value)
{
if (string.IsNullOrEmpty(value)) return "a";
var chars = value.ToArray();
var last = chars.Last();
if(char.ToByte() == 122)
return value + "a";
return value.SubString(0, value.Length) + (char)(char.ToByte()+1);
}
you'll probably need to convert the char to a byte. That can be encapsulated in an extension method like static int ToByte(this char);
StringBuilder is a better choice when building large amounts of strings. so you may want to consider using that instead of string concatenation.

Another way to look at this is that you want to count in base 26. The computer is very good at counting and since it always has to convert from base 2 (binary), which is the way it stores values, to base 10 (decimal--the number system you and I generally think in), converting to different number bases is also very easy.
There's a general base converter here https://stackoverflow.com/a/3265796/351385 which converts an array of bytes to an arbitrary base. Once you have a good understanding of number bases and can understand that code, it's a simple matter to create a base 26 counter that counts in binary, but converts to base 26 for display.

Testing for repeated characters in a string

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.
`ABCDE` // does not contain repeats
`AABCD` // does contain repeats, ie A is repeated
I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?
EDIT:
The string will be sorted, so order is not important so ABCDA => AABCD
The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.

If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.
Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
Bonus: here's a slick LINQ solution to implement Jon's functionality:
int longestRun =
s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();
So, OK, it's not very fast! You got a problem with that?!
:-)

If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:
public static bool IsSortedNoRepeats(string text)
{
if (text.Length == 0)
{
return true;
}
char current = text[0];
for (int i=1; i < text.Length; i++)
{
char next = text[i];
if (next <= current)
{
return false;
}
current = next;
}
return true;
}
A shorter alternative if you don't mind repeating the indexer use:
public static bool IsSortedNoRepeats(string text)
{
for (int i=1; i < text.Length; i++)
{
if (text[i] <= text[i-1])
{
return false;
}
}
return true;
}
EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.
First a string-specific version:
public static int LongestRun(string text)
{
if (text.Length == 0)
{
return 0;
}
char current = text[0];
int currentRun = 1;
int bestRun = 0;
for (int i=1; i < text.Length; i++)
{
if (current != text[i])
{
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = text[i];
}
currentRun++;
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Now we can also do this as a general extension method on IEnumerable<T>:
public static int LongestRun(this IEnumerable<T> source)
{
bool first = true;
T current = default(T);
int currentRun = 0;
int bestRun = 0;
foreach (T element in source)
{
if (first || !EqualityComparer<T>.Default(element, current))
{
first = false;
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = element;
}
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Then you can call "AABCD".LongestRun() for example.

This will tell you very quickly if a string contains duplicates:
bool containsDups = "ABCDEA".Length != s.Distinct().Count();
It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...
Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.

Since you're using 3.5, you could do this in one LINQ query:
var results = stringInput
.ToCharArray() // not actually needed, I've left it here to show what's actually happening
.GroupBy(c=>c)
.Where(g=>g.Count()>1)
.Select(g=>new {Letter=g.First(),Count=g.Count()})
;
For each character that appears more than once in the input, this will give you the character and the count of occurances.

I think the easiest way to achieve that is to use this simple regex
bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, #"(\w)\1");
If you need more information about the match (start, length etc)
Match match = null;
string testString = "ABCDE AABCD";
match = Regex.Match(testString, #"(\w)\1+?");
if (match.Success)
{
string matchText = match.Value; // AA
int matchIndnex = match.Index; // 6
int matchLength = match.Length; // 2
}

How about something like:
string strString = "AA BRA KA DABRA";
var grp = from c in strString.ToCharArray()
group c by c into m
select new { Key = m.Key, Count = m.Count() };
foreach (var item in grp)
{
Console.WriteLine(
string.Format("Character:{0} Appears {1} times",
item.Key.ToString(), item.Count));
}

Update Now, you'd need an array of counters to maintain a count.
Keep a bit array, with one bit representing a unique character. Turn the bit on when you encounter a character, and run over the string once. A mapping of the bit array index and the character set is upto you to decide. Break if you see that a particular bit is on already.

/(.).*\1/
(or whatever the equivalent is in your regex library's syntax)
Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...

I started looking for some info on the net and I got to the following solution.
string input = "aaaaabbcbbbcccddefgg";
char[] chars = input.ToCharArray();
Dictionary<char, int> dictionary = new Dictionary<char,int>();
foreach (char c in chars)
{
if (!dictionary.ContainsKey(c))
{
dictionary[c] = 1; //
}
else
{
dictionary[c]++;
}
}
foreach (KeyValuePair<char, int> combo in dictionary)
{
if (combo.Value > 1) //If the vale of the key is greater than 1 it means the letter is repeated
{
Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
}
}
I hope it helps, I had a job interview in which the interviewer asked me to solve this and I understand it is a common question.

When there is no order to work on you could use a dictionary to keep the counts:
String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
if (!result.ContainsKey(c))
{
result[c] = 0; // initialize the counter in the result
}
result[c]++;
}
foreach (var charCombo in result)
{
Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);
}

The hash solution Jon was describing is probably the best. You could use a HybridDictionary since that works well with small and large data sets. Where the letter is the key and the value is the frequency. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key))

performance issues with finding nth occurence of a character with a regular expression

I have a regex to find the nth occurrence of a character in a string, here's the code:
public static int NthIndexOf(this string target, string value, int n)
{
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");
if (m.Success)
{
return m.Groups[2].Captures[n - 1].Index;
}
else
{
return -1;
}
}
Now, I have 1594 entries in this string, with 1593 semicolons. If I write:
tempstring.NthIndexOf(";", 1593)
The answer comes back immediately and correctly. If I give it anything over 1594 it hangs. Does anyone know how to fix this?
Test Case
string holder = "test;test2;test3";
string test = "";
for (int i = 0; i < 600; i++)
{
test += holder;
}
int index = test.NthIndexOf(";", 2000);
This takes a very long time. Change 600 to 6 and it is very fast. Make 2000 to 1700 and it is very fast as well.
Why is my regular expression so slow?

If you're really only looking for character repetitions, and not string repetitions, then you should be able to replace you method with something simple like
public static int NthIndexOf(this string target, char testChar, int n)
{
int count = 0;
for(int i=0; i<target.Length; i++)
{
if(target[i] == testChar)
{
count++;
if(count == n) return i;
}
}
return -1;
}
and use that. It should have far fewer limitations.
As for why your original regex is going slow, here's what I suspect:
For your fast case, it's working because it can find a match on it's first pass through (with each group matching exactly one character)
For the slow case is because it can't find a match (and won't ever find one, because there aren't enough semicolons to satisfy the regex), but it recursively tries every possible way to break up the string (which is a really big operation)

Try to use a more distinct and efficient regular expression:
"^(?:[^" + value + "]*" + value + "){" + (n - 1) + "}([^" + value + "]*)
This will build the following regular expression for tempstring.NthIndexOf(";", 1593):
^(?:[^;]*;){1592}([^;]*)
But this will only work for single characters as separator.
Another approach would be to step through each character and count the occurences of the character you were looking for.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Neat solution to a counting within a string - c#

A simple indexed for loop with a couple of look ahead if checks would work. You can treat a string as a char[] or as an IEnumerable - either way you can use that to loop over all of the characters and perform a lookahead check to see if the following one or two characters is a duplicate.

Sorry, not a C# man, but here's a simple solution in Ruby: a="5557476374202110373551116201" 0.upto(a.length) do |i| puts "error at #{i}" if a[i]!=a[i+1] && a[i]==a[i+2] end Output: error at 3 error at 10 error at 16

Related

What is the appropriate way to append string to the end of the current line?

How string quote, length total count and arrayname[int] work?

Constantly Incrementing String

Testing for repeated characters in a string

performance issues with finding nth occurence of a character with a regular expression

Categories

Resources