C# Type of String Index - c#

I need to access a very large number in the index of the string which int and long can't handle. I had to use ulong but the problem is that the indexer can only handle the type int.
This is my code and I have marked the line where the error is located. Any ideas how to solve this?
string s = Console.ReadLine();
long n = Convert.ToInt64(Console.ReadLine());
var cont = s.Count(x => x == 'a');
Console.WriteLine(cont);
Console.ReadKey();
The main idea of the code is to identify how many 'a's there are in the string. What are some other ways I can do this?
EDIT:
i didn't know that is the string index Capicity cant exceed the int type. and i fixed my for loop by replacing it with this linq line
var cont = s.Count(x => x == 'a');
now since my string can't exceed certain amount. so how i can repeat my string to append its char for 1,000,000,000,000 times rather than using this code
for (int i = 0; i < 20; i++)
{
s += s;
}
since this code is generating random char numbers in the string and if i raised the 20 may cause to overflow so i need to adjust it to repeat itself to make the string[index] = n // the long i declared above.
so for example if my string input is "aba" and n is 10 so the string will be "abaabaabaa" // total chars 10
PS: I Edited the original code

I assume you got a programming assignment or online coding challenge, where the requirement was "Count all instances of the letter 'a' in this > 2 GB file". You solution is to read the file in memory at once, and loop over it with a variable type that allows values over 2GB.
This causes an XY problem. You cannot have an array that large in memory in the first place, so you're not going to reach the point where you need a uint, long or ulong to index into it.
Instead, use a StreamReader to read the file in chunks, as explained in for example Reading large file in chunks c#.

You can repeat your string using an infinite sequence. I haven't added any check for valid arguments, etc.
static void Main(string[] args)
{
long count = countCharacters("aba", 'a', 10);
Console.WriteLine("Count is {0}", count);
Console.WriteLine("Press ENTER to exit...");
Console.ReadLine();
}
private static long countCharacters(string baseString, char c, long limit)
{
long result = 0;
if (baseString.Length == 1)
{
result = baseString[0] == c ? limit : 0;
}
else
{
long n = 0;
foreach (var ch in getInfiniteSequence(baseString))
{
if (n >= limit)
break;
if (ch == c)
{
result++;
}
n++;
}
}
return result;
}
//This method iterates through a base string infinitely
private static IEnumerable<char> getInfiniteSequence(string baseString)
{
int stringIndex = 0;
while (true)
{
yield return baseString[stringIndex++ % baseString.Length];
}
}
For the given inputs, the result is 7

I highly recommend you rethink the way you are doing this, but a quick fix would be to use a foreach loop instead:
foreach(char c in s)
{
if (c == 'a')
cont++;
}
Alternative using Linq:
cont = s.Count(c => c == 'a');
I'm not sure about what n is supposed to do. According to your code it limits the string length but your question never mentions why or to what end.

i need to access a very large number in the index of the string which
int, long can't handle
this statement is not true
c# string's max length is int.Max since string.Length is an integer and it is limited by that. You should be able to do
for (int i = 0; i <= n; i++)

The maximum length of a string cannot exceed the size of an int so there really is no point in using ulong or long to index into the string.
Simply put, you're trying to solve the wrong problem.
If we disregard the fact that the program is likely to cause an out of memory exception when building such a long string, you can simply fix your code by switching to an int instead of a ulong:
for (int i = 0; i <= n; i++)
Having said that you can also use LINQ to do this:
int cont = s.Take(n + 1).Count(c => c == 'a');
Now, in the first sentence of your question you state this:
I need to access a very large number in the index of the string which int and long can't handle.
This is wholly unnecessary because any legal index of a string will fit inside an int.

If you need to do this on some input that's longer than the maximum length of a string in .NET, you'll need to change your approach; use a Stream instead trying to read all input into a string.
char seeking = 'a';
ulong count = 0;
char[] buffer = new char[4096];
using (var reader = new StreamReader(inStream))
{
int length;
while ((length = reader.Read(buffer, 0, buffer.Length)) > 0)
{
count += (ulong)buffer.Count(c => c == seeking);
}
}

Related

Filling a character array with characters from a string

I'm trying to fill an array with characters from a string inputted via console. I've tried the code bellow but it doesnt seem to work. I get Index out Of Range exception in the for loop part, and i didn't understand why it occured. Is the for loop range incorrect? Any insight would be greatly appreciated
Console.WriteLine("Enter a string: ");
var name = Console.ReadLine();
var intoarray = new char[name.Length];
for (var i = 0; i <= intoarray.Length; i++)
{
intoarray[i] = name[i];
}
foreach (var n in intoarray)
Console.WriteLine(intoarray[n]);
using ToCharArray() strings can be converted into character arrays.
Console.WriteLine("Enter a string: ");
var name = Console.ReadLine();
var intoarray= name.ToCharArray();
foreach (var n in intoarray)
Console.WriteLine(n);
if you are using foreach, you should wait for the index to behave as if you were taking the value.
Console.WriteLine(n);
Since arrays start at 0 and you are counting inclusive of length, the last iteration will exceed the bounds.
Just update the loop conditional to be less than length rather than less than or equal to..
I like snn bm's answer, but to answer you question directly, you're exceeding the length of the input by one. It should be:
for (var i = 0; i <= intoarray.Length - 1; i++)
(Since strings are zero-indexed, the last character in the underlying array is always going to be in the position of arrayLength - 1.)
1: the iteration should be for (var i = 0; i < intoarray.Length; i++)
2: the code
foreach (var n in intoarray)
Console.WriteLine(intoarray[n]);
also throws an exception, for "n" is a character in the array while it's used as array index.
3: In addition there's a much easier way to convert string to char array
var intoarray = name.ToCharArray();
Here's the result
Here is your mistake. There are so many options to represent i < intoarray.Length.
for (var i = 0; i < intoarray.Length; i++) // original was i <= intoarray.Length in your code
{
intoarray[i] = name[i];
}
With linq:
// Select all chars
IEnumerable<char> intoarray =
from ch in name
select ch; // can use var instead of IEnumerable<char>
// Execute the query
foreach (char temp in intoarray)
Console.WriteLine(temp);

Counting Lines, is this Unsafe Method valid?

SO I tried to make an unsafe fast method to count lines.
I previously used StringReader, but wanted to see if I could make something faster.
So is this code valid, it seems to work but it looks a bit confusing,
and I am very new to C# pointers so I might be doing something bad.
Original Method:
//Return number of (non Empty) lines
private static int getLineCount(string input)
{
int lines = 0;
string line = null;
//Don't count Empty lines
using (StringReader reader = new StringReader(input))
while ((line = reader.ReadLine()) != null)
if (!string.IsNullOrWhiteSpace(line))
lines++;
return lines;
}
Unsafe Method:
//Return number of (non Empty) lines (fast method using pointers)
private unsafe static int getLineCountUnsafe(string input)
{
int lines = 0;
fixed (char* strptr = input)
{
char* charptr = strptr;
int length = input.Length;
//Don't count Empty lines
for (int i = 0; i < length; i++)
{
char c = *charptr;
//If char is an empty line, look if it's empty
if (c == '\n' || c == '\r')
{
//If char is empty, continue till it's not
while (c == '\n' || c == '\r')
{
if (i >= length)
return lines;
i++;
charptr++;
c = *charptr;
}
//Add a line when line is not just a new line (empty)
lines++;
}
charptr++;
}
return lines;
}
}
Benchmark:
(Looped through 100000, 10 times)
Total Milliseconds used.
Safe(Original) - AVG = 770.10334, MIN = 765.678, MAX = 778.0017 , TOTAL 07.701
Unsafe - AVG = 406.91843, MIN = 405.7931, MAX = 408.5505 , TOTAL 04.069
EDIT:
It seems that the Unsafe version isn't always correct,
if it's one line it won't count it, been trying to solve it without making it count too many;(
Your second implementation seems okay, but don't bother too much with learning unsafe, it is not so widely used in C#, neither pointers. This is getting close to C++. The time difference between the both approaches might come from the avoiding garbage collector to collect any memory inside the method until it is done (because of the fixed keyword).
The reason why one should rarely use unsafe is because C# provides much readability and ease of use within it's already defined methods, like in your case:
//Return number of (non Empty) lines
private static int getLineCount(string input)
{
return Regex.Matches(input, Environment.NewLine).Count;
}
which may be even faster because of the evaluating at once of the entire string.

Optimizing counting characters within a string

I just created a simple method to count occurences of each character within a string, without taking caps into account.
static List<int> charactercount(string input)
{
char[] characters = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
input = input.ToLower();
List<int> counts = new List<int>();
foreach (char c in characters)
{
int count = 0;
foreach (char c2 in input) if (c2 == c)
{
count++;
}
counts.Add(count);
}
return counts;
}
Is there a cleaner way to do this (i.e. without creating a character array to hold every character in the alphabet) that would also take into account numbers, other characters, caps, etc?
Conceptually, I would prefer to return a Dictionary<string,int> of counts. I'll assume that it's ok to know by omission rather than an explicit count of 0 that a character occurs zero times, you can do it via LINQ. #Oded's given you a good start on how to do that. All you would need to do is replace the Select() with ToDictionary( k => k.Key, v => v.Count() ). See my comment on his answer about doing the case insensitive grouping. Note: you should decide if you care about cultural differences in characters or not and adjust the ToLower method accordingly.
You can also do this without LINQ;
public static Dictionary<string,int> CountCharacters(string input)
{
var counts = new Dictionary<char,int>(StringComparer.OrdinalIgnoreCase);
foreach (var c in input)
{
int count = 0;
if (counts.ContainsKey(c))
{
count = counts[c];
}
counts[c] = counts + 1;
}
return counts;
}
Note if you wanted a Dictionary<char,int>, you could easily do that by creating a case invariant character comparer and using that as the IEqualityComparer<T> for a dictionary of the required type. I've used string for simplicity in the example.
Again, adjust the type of the comparer to be consistent with how you want to handle culture.
Using GroupBy and Select:
aString.GroupBy(c => c).Select(g => new { Character = g.Key, Num = g.Count() })
The returned anonymous type list will contain each character and the number of times it appears in the string.
You can then filter it in any way you wish, using the static methods defined on Char.
Your code is kind of slow because you are looping through the range a-z instead of just looping through the input.
If you only need to count letters (like your code suggests), the fastest way to do it would be:
int[] CountCharacters(string text)
{
var counts = new int[26];
for (var i = 0; i < text.Length; i++)
{
var charIndex - text[index] - (int)'a';
counts[charIndex] = counts[charindex] + 1;
}
return counts;
}
Note that you need to add some thing like verify the character is in the range, and convert it to lowercase when needed, or this code might throw exceptions. I'll leave those for you to add. :)
Based on +Ran's answer to avoiding IndexOutOfRangeException:
static readonly int differ = 'a';
int[] CountCharacters(string text) {
text = text.ToLower();
var counts = new int[26];
for (var i = 0; i < text.Length; i++) {
var charIndex = text[i] - differ;
// to counting chars between 'a' and 'z' we have to do this:
if(charIndex >= 0 && charIndex < 26)
counts[charIndex] += 1;
}
return counts;
}
Actually using Dictionary and/or LINQ is not optimized enough as counting chars and working with a low level array.

C# Char permutation with repetition on large set of chars

Hello Im trying to get all possible combinations with repetitions of given char array.
Char array consists of alphabet letters(only lower) and I need to generate strings with length of 30 or more chars.
I tried with method of many for-loops,but when I try to get all combinations of char in char array with length of string more then 5 I get out of Memory Exception.
So I created similar Method that takes only first 200000 strings,then next 2000000 and so on this was proven sucessfull but only with smaller length strings.
This was my method with length of 7 chars:
public static int Progress = 0;
public static ArrayList CreateRngUrl7()
{
ArrayList AllCombos = new ArrayList();
int passed = 0;
int Too = Progress + 200000;
char[] alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToLower().ToCharArray();
for (int i = 0; i < alpha.Length; i++)
for (int i1 = 0; i1 < alpha.Length; i1++)
for (int i2 = 0; i2 < alpha.Length; i2++)
for (int i3 = 0; i3 < alpha.Length; i3++)
for (int i4 = 0; i4 < alpha.Length; i4++)
for (int i5 = 0; i5 < alpha.Length; i5++)
for (int i6 = 0; i6 < alpha.Length; i6++)
{
if (passed > (Too - 200000) && passed < Too)
{
string word = new string(new char[] { alpha[i], alpha[i1], alpha[i2], alpha[i3], alpha[i4], alpha[i5],alpha[i6] });
AllCombos.Add(word);
}
passed++;
}
if (Too >= passed)
{
MessageBox.Show("All combinations of RNG7 were returned");
}
Progress = Too;
return AllCombos;
}
I tried adding 30 for-loops with in way described above so i Would get strings with lenghts of 30 but application just hangs.Is there any better way to do this? All answers would be much appreciated. Thank you in advance!
Can someone please just post method how it is done with larger legth strings I just want to see an example? I don't have to store that data,I just need to compare it with something and release it from memory. I used alphabet for example I don't need whole alphabet.Question was not how long it would take or how much combinations would it be!!!!!
You get an OutOfMemoryException because inside the loop you allocate a string and store it in an ArrayList. The strings have to stay in memory until the ArrayList is garbage collected and your loop creates more strings than you will be able to store.
If you simply want to check the string for a condition you should put the check inside the loop:
for ( ... some crazy loop ...) {
var word = ... create word ...
if (!WordPassesTest(word)) {
Console.WriteLine(word + " failed test.");
return false;
}
}
return true;
Then you only need storage for a single word. Of course, if the loop is crazy enough, it will not terminate before the end of the universe as we know it.
If you need to execute many nested but similar loops you can use recursion to simplify the code. Here is an example that is not incredible efficient, but at least it is simple:
Char[] chars = "ABCD".ToCharArray();
IEnumerable<String> GenerateStrings(Int32 length) {
if (length == 0) {
yield return String.Empty;
yield break;
}
var strings = chars.SelectMany(c => GenerateStrings(length - 1), (c, s) => c + s);
foreach (var str in strings)
yield return str;
}
Calling GenerateStrings(3) will generate all strings of length 3 using lazy evaluation (so no additional storage is required for the strings).
Building on top of an IEnumerable generating your strings you can create primites to buffer and process buffers of strings. An easy solution is to using Reactive Extensions for .NET. Here you already have a Buffer primitive:
GenerateStrings(3)
.ToObservable()
.Buffer(10)
.Subscribe(list => ... ship the list to another computer and process it ...);
The lambda in Subscribe will be called with a List<String> with at most 10 strings (the parameter provided in the call to Buffer).
Unless you have an infinte number of computers you will still have to pull the computers from a pool and only recycle them back to the pool when they have finished the computation.
It should be obvious from the comments on this question that you will not be able to process 26^30 strings even if you have multiple computers at your disposal.
I don't have time right now to write some code but essentially if you are running out of RAM use disk. I'm thinking along the lines of one thread running an algorithm to find the combinations and another persisting the results to disk and releasing the RAM.

Testing for repeated characters in a string

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.
`ABCDE` // does not contain repeats
`AABCD` // does contain repeats, ie A is repeated
I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?
EDIT:
The string will be sorted, so order is not important so ABCDA => AABCD
The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.
If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.
Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
Bonus: here's a slick LINQ solution to implement Jon's functionality:
int longestRun =
s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();
So, OK, it's not very fast! You got a problem with that?!
:-)
If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:
public static bool IsSortedNoRepeats(string text)
{
if (text.Length == 0)
{
return true;
}
char current = text[0];
for (int i=1; i < text.Length; i++)
{
char next = text[i];
if (next <= current)
{
return false;
}
current = next;
}
return true;
}
A shorter alternative if you don't mind repeating the indexer use:
public static bool IsSortedNoRepeats(string text)
{
for (int i=1; i < text.Length; i++)
{
if (text[i] <= text[i-1])
{
return false;
}
}
return true;
}
EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.
First a string-specific version:
public static int LongestRun(string text)
{
if (text.Length == 0)
{
return 0;
}
char current = text[0];
int currentRun = 1;
int bestRun = 0;
for (int i=1; i < text.Length; i++)
{
if (current != text[i])
{
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = text[i];
}
currentRun++;
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Now we can also do this as a general extension method on IEnumerable<T>:
public static int LongestRun(this IEnumerable<T> source)
{
bool first = true;
T current = default(T);
int currentRun = 0;
int bestRun = 0;
foreach (T element in source)
{
if (first || !EqualityComparer<T>.Default(element, current))
{
first = false;
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = element;
}
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Then you can call "AABCD".LongestRun() for example.
This will tell you very quickly if a string contains duplicates:
bool containsDups = "ABCDEA".Length != s.Distinct().Count();
It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...
Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.
Since you're using 3.5, you could do this in one LINQ query:
var results = stringInput
.ToCharArray() // not actually needed, I've left it here to show what's actually happening
.GroupBy(c=>c)
.Where(g=>g.Count()>1)
.Select(g=>new {Letter=g.First(),Count=g.Count()})
;
For each character that appears more than once in the input, this will give you the character and the count of occurances.
I think the easiest way to achieve that is to use this simple regex
bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, #"(\w)\1");
If you need more information about the match (start, length etc)
Match match = null;
string testString = "ABCDE AABCD";
match = Regex.Match(testString, #"(\w)\1+?");
if (match.Success)
{
string matchText = match.Value; // AA
int matchIndnex = match.Index; // 6
int matchLength = match.Length; // 2
}
How about something like:
string strString = "AA BRA KA DABRA";
var grp = from c in strString.ToCharArray()
group c by c into m
select new { Key = m.Key, Count = m.Count() };
foreach (var item in grp)
{
Console.WriteLine(
string.Format("Character:{0} Appears {1} times",
item.Key.ToString(), item.Count));
}
Update Now, you'd need an array of counters to maintain a count.
Keep a bit array, with one bit representing a unique character. Turn the bit on when you encounter a character, and run over the string once. A mapping of the bit array index and the character set is upto you to decide. Break if you see that a particular bit is on already.
/(.).*\1/
(or whatever the equivalent is in your regex library's syntax)
Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...
I started looking for some info on the net and I got to the following solution.
string input = "aaaaabbcbbbcccddefgg";
char[] chars = input.ToCharArray();
Dictionary<char, int> dictionary = new Dictionary<char,int>();
foreach (char c in chars)
{
if (!dictionary.ContainsKey(c))
{
dictionary[c] = 1; //
}
else
{
dictionary[c]++;
}
}
foreach (KeyValuePair<char, int> combo in dictionary)
{
if (combo.Value > 1) //If the vale of the key is greater than 1 it means the letter is repeated
{
Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
}
}
I hope it helps, I had a job interview in which the interviewer asked me to solve this and I understand it is a common question.
When there is no order to work on you could use a dictionary to keep the counts:
String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
if (!result.ContainsKey(c))
{
result[c] = 0; // initialize the counter in the result
}
result[c]++;
}
foreach (var charCombo in result)
{
Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);
}
The hash solution Jon was describing is probably the best. You could use a HybridDictionary since that works well with small and large data sets. Where the letter is the key and the value is the frequency. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key))

Categories