String compression for repeated chars or substrings [closed]

String compression for repeated chars or substrings [closed] - c#

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I have a string that may contain any letter
string uncompressed = "dacacacd";
I need to compress this string in the format
string compressed = "d3(ac)d";
But also compress any substring if possible ex:
string uncompressed = "dabcabcdabcabc";
string compressed = "2(d2(abc))";
Is there a way to achieve this without any third party library?

Here's an example that will do the compression based on the longest sub-strings first. Maybe not the most efficient or best compression for something like "abababcabc" but should at least get you started.
public class CompressedString
{
private class Segment
{
public Segment(int count, CompressedString value)
{
Count = count;
Value = value;
}
public int Count { get; set; }
public CompressedString Value { get; set; }
}
private List<Segment> segments = new List<Segment>();
private string uncompressible;
private CompressedString(){}
public static CompressedString Compress(string s)
{
var compressed = new CompressedString();
// longest possible repeating substring is half the length of the
// string, so try that first and work down to shorter lengths
for(int len = s.Length/2; len > 0; len--)
{
// look for the substring at each possible index
for(int i = 0; i < s.Length - len - 1; i++)
{
var sub = s.Substring(i, len);
int count = 1;
// look for repeats of the substring immediately after it.
for(int j = i + len; j <= s.Length - len; j += len)
{
// increase the count of times the substring is found
// or stop looking when it doesn't match
if(sub == s.Substring(j, len))
{
count++;
}
else
{
break;
}
}
// if we found repeats then handle the substring before the
// repeats, the repeast, and everything after.
if(count > 1)
{
// if anything is before the repeats then add it to the
// segments with a count of one and compress it.
if (i > 0)
{
compressed.segments.Add(new Segment(1, Compress(s.Substring(0, i))));
}
// Add the repeats to the segments with the found count
// and compress it.
compressed.segments.Add(new Segment(count, Compress(sub)));
// if anything is after the repeats then add it to the
// segments with a count of one and compress it.
if (s.Length - (count * len) > i)
{
compressed.segments.Add(new Segment(1, Compress(s.Substring(i + (count * len)))));
}
// We're done compressing so break this loop and the
// outer by setting len to 0.
len = 0;
break;
}
}
}
// If we failed to find any repeating substrings then we just have
// a single uncompressible string.
if (!compressed.segments.Any())
{
compressed.uncompressible = s;
}
// Reduce the the compression for something like "2(2(ab))" to "4(ab)"
compressed.Reduce();
return compressed;
}
private void Reduce()
{
// Attempt to reduce each segment.
foreach(var seg in segments)
{
seg.Value.Reduce();
// If there is only one sub segment then we can reduce it.
if(seg.Value.segments.Count == 1)
{
var subSeg = seg.Value.segments[0];
seg.Value = subSeg.Value;
seg.Count *= subSeg.Count;
}
}
}
public override string ToString()
{
if(segments.Any())
{
StringBuilder builder = new StringBuilder();
foreach(var seg in segments)
{
if (seg.Count == 1)
builder.Append(seg.Value.ToString());
else
{
builder.Append(seg.Count).Append("(").Append(seg.Value.ToString()).Append(")");
}
}
return builder.ToString();
}
return uncompressible;
}
}

Related

function complex_decode( string str) that takes a non-simple repeated encoded string, and returns the original un-encoded string

Im trying to write a function complex_decode( string str) in c sharp that takes a non-simple repeated encoded string, and returns the original un-encoded string.
for example, "t11h12e14" would return "ttttttttttthhhhhhhhhhhheeeeeeeeeeeeee". I have been successful in decoding strings where the length is less than 10, but unable to work with length for than 10. I am not allowed to use regex, libraries or loops. Only recursions.
This is my code for simple decode which decodes when length less than 10.
public string decode(string str)
{
if (str.Length < 1)
return "";
if(str.Length==2)
return repeat_char(str[0], char_to_int(str[1]));
else
return repeat_char(str[0], char_to_int(str[1]))+decode(str.Substring(2));
}
public int char_to_int(char c)
{
return (int)(c-48);
}
public string repeat_char(char c, int n)
{
if (n < 1)
return "";
if (n == 1)
return ""+c;
else
return c + repeat_char(c, n - 1);
}
This works as intended, for example input "a5" returns "aaaaa", "t1h1e1" returns "the"
Any help is appreciated.

Here is another way of doing this, assuming the repeating string is always one character long and using only recursion (and a StringBuilder object):
private static string decode(string value)
{
var position = 0;
var result = decode_char(value, ref position);
return result;
}
private static string decode_char(string value, ref int position)
{
var next = value[position++];
var countBuilder = new StringBuilder();
get_number(value, ref position, countBuilder);
var result = new string(next, Convert.ToInt32(countBuilder.ToString()));
if (position < value.Length)
result += decode_char(value, ref position);
return result;
}
private static void get_number(string value, ref int position, StringBuilder countBuilder)
{
if (position < value.Length && char.IsNumber(value[position]))
{
countBuilder.Append(value[position++]);
get_number(value, ref position, countBuilder);
}
}

I've refactored your code a bit. I've removed 2 unnecessary methods that you don't actually need. So, the logic is simple and it works like this;
Example input: t3h2e4
Get the first digit. (Which is 2 and has index of 1)
Get the first letter comes after that index, which is our next letter. (Which is "h" and has index of 2)
Slice the string. Start from index 1 and end the slicing on index 2 to get repeat count. (Which is 3)
Repeat the first letter of string for repeat count times and combine it with the result you got from step 5.
Slice the starting from the next letter index we got in second step, to the very end of the string and pass this to recursive method.
public static string Decode(string input)
{
// If the string is empty or has only 1 character, return the string itself to not proceed.
if (input.Length <= 1)
{
return input;
}
// Convert string into character list.
var characters = new List<char>();
characters.AddRange(input);
var firstDigitIndex = characters.FindIndex(c => char.IsDigit(c)); // Get first digit
var nextLetterIndex = characters.FindIndex(firstDigitIndex, c => char.IsLetter(c)); // Get the next letter after that digit
if (nextLetterIndex == -1)
{
// This has only one reason. Let's say you are in the last recursion and you have c2
// There is no letter after the digit, so the index will -1, which means "not found"
// So, it will raise an exception, since we try to use the -1 in slicing part
// Instead, if it's not found, we set the next letter index to length of the string
// With doing that, you either get repeatCount correctly (since remaining part is only digits)
// or you will get empty string in the next recursion, which will stop the recursion.
nextLetterIndex = input.Length;
}
// Let's say first digit's index is 1 and the next letter's index is 2
// str[2..3] will start to slice the string from index 2 and will stop in index 3
// So, it will basically return us the repeat count.
var repeatCount = int.Parse(input[firstDigitIndex..nextLetterIndex]);
// string(character, repeatCount) constructor will repeat the "character" you passed to it for "repeatCount" times
return new string(input[0], repeatCount) + Decode(input[nextLetterIndex..]);
}
Examples;
Console.WriteLine(Decode("t1h1e1")); // the
Console.WriteLine(Decode("t2h3e4")); // tthhheeee
Console.WriteLine(Decode("t3h3e3")); // ttthhheee
Console.WriteLine(Decode("t2h10e2")); // tthhhhhhhhhhee
Console.WriteLine(Decode("t2h10e10")); // tthhhhhhhhhheeeeeeeeee

First you can simplify your repeat_char function, you have to have a clear stop condition:
public static string repeat_char(char c, int resultLength)
{
if(resultLength < 1) throw new ArgumentOutOfRangeException("resultLength");
if(resultLength == 1) return c.ToString();
return c + repeat_char(c, resultLength - 1);
}
See the use of the parameter as equivalent of a counter on a loop.
So you can have something similar on the main function, a parameter that tells when your substring is not an int anymore.
public static string decode(string str, int repeatNumberLength = 1)
{
if(repeatNumberLength < 1) throw new ArgumentOutOfRangeException("length");
//stop condition
if(string.IsNullOrWhiteSpace(str)) return str;
if(repeatNumberLength >= str.Length) repeatNumberLength = str.Length; //Some validation, just to be safe
//keep going until str[1...repeatNumberLength] is not an int
int charLength;
if(repeatNumberLength < str.Length && int.TryParse(str.Substring(1, repeatNumberLength), out charLength))
{
return decode(str, repeatNumberLength + 1);
}
repeatNumberLength--;
//Get the repeated Char.
charLength = int.Parse(str.Substring(1, repeatNumberLength));
var repeatedChar = repeat_char(str[0], charLength);
//decode the substring
var decodedSubstring = decode(str.Substring(repeatNumberLength + 1));
return repeatedChar + decodedSubstring;
}
I used a default parameter, but you can easily change it for a more traditonal style.
This also assumes that the original str is in a correct format.
An excellent exercise is to change the function so that you can have a word, instead of a char before the number. Then you could, for example, have "the3" as the parameter (resulting in "thethethe").

I took more of a Lisp-style head and tail approach (car and cdr if you speak Lisp) and created a State class to carry around the current state of the parsing.
First the State class:
internal class State
{
public State()
{
LastLetter = string.Empty;
CurrentCount = 0;
HasStarted = false;
CurrentValue = string.Empty;
}
public string LastLetter { get; private set; }
public int CurrentCount { get; private set; }
public bool HasStarted { get; private set; }
public string CurrentValue { get; private set; }
public override string ToString()
{
return $"LastLetter: {LastLetter}, CurrentCount: {CurrentCount}, HasStarted: {HasStarted}, CurrentValue: {CurrentValue}";
}
public void AddLetter(string letter)
{
CurrentCount = 0;
LastLetter = letter;
HasStarted = true;
}
public int AddDigit(string digit)
{
if (!HasStarted)
{
throw new InvalidOperationException($"The input must start with a letter, not a digit");
}
if (!int.TryParse(digit, out var num))
{
throw new InvalidOperationException($"Digit passed to {nameof(AddDigit)} ({digit}) is not a number");
}
CurrentCount = CurrentCount * 10 + num;
return CurrentCount;
}
public string GetValue()
{
if (string.IsNullOrEmpty(LastLetter))
{
return string.Empty;
}
CurrentValue = new string(LastLetter[0], CurrentCount);
return CurrentValue;
}
}
You'll notice it's got some stuff in there for debugging (example, the ToString override and the CurrentValue property)
Once you have that, the decoder is easy, it just recurses over the string it's given (along with (initially) a freshly constructed State instance):
private string Decode(string input, State state)
{
if (input.Length == 0)
{
_buffer.Append(state.GetValue());
return _buffer.ToString();
}
var head = input[0];
var tail = input.Substring(1);
var headString = head.ToString();
if (char.IsDigit(head))
{
state.AddDigit(headString);
}
else // it's a character
{
_buffer.Append(state.GetValue());
state.AddLetter(headString);
}
Decode(tail, state);
return _buffer.ToString();
}
I did this in a simple Windows Forms app, with a text box for input, a label for output and a button to crank her up:
const string NotAllowedPattern = #"[^a-zA-Z0-9]";
private static Regex NotAllowedRegex = new Regex(NotAllowedPattern);
private StringBuilder _buffer = new StringBuilder();
private void button1_Click(object sender, EventArgs e)
{
if (textBox1.Text.Length == 0 || NotAllowedRegex.IsMatch(textBox1.Text))
{
MessageBox.Show(this, "Only Letters and Digits Allowed", "Bad Input", MessageBoxButtons.OK, MessageBoxIcon.Error);
return;
}
label1.Text = string.Empty;
_buffer.Clear();
var result = Decode(textBox1.Text, new State());
label1.Text = result;
}
Yeah, there's a Regex there, but it's just to make sure that the input is valid; it's not involved in calculating the output.

How to run-length encode 'EEDDDNE' to '2E3DNE'?

Explanation: The task itself is that we have 13 strings (stored in the sor[] array) like the one in the title or 'EEENKDDDDKKKNNKDK'
and we have to shorten it in a way that if there's two or more of the same letter next to eachother then we have to write it in the form of 'NumberoflettersLetter'
So by this rule, 'EEENKDDDDKKKNNKDK' would become '3ENK4D3K2NKDK'
using System;
public class Program
{
public static void Main(string[] args)
{
string[] sor = new string[] { "EEENKDDDDKKKNNKDK", "'EEDDDNE'" };
char holder;
int counter = 0;
string temporary;
int indexholder;
for (int i = 0; i < sor.Length; i++)
{
for (int q = 0; q < sor[i].Length; q++)
{
holder = sor[i][q];
indexholder = q;
counter = 0;
while (sor[i][q] == holder)
{
q++;
counter++;
}
if (counter > 1)
{
temporary = Convert.ToString(counter) + holder;
sor[i].Replace(sor[i].Substring(indexholder, q), temporary); // EX here
}
}
}
Console.ReadLine();
}
}
Sorry I didn't make the error clear, it says that :
"The value of index and length has to represent a place inside the string (System.ArgumentOutOfRangeException) - name of parameter: length"
...but I have no clue what's wrong with it, maybe it's a tiny little mistake, maybe the whole thing is messed up, so this is why I'd like someone to help me with this D:
(Ps 'indexholder' is there because i need it for another exercise)
EDIT:
'sor' is the string array that holds these strings (there are 13 of them) like the one mentioned in the title or in the example

You can use regex for this:
Regex.Replace("EEENKDDDDKKKNNKDK", #"(.)\1+", m => $"{m.Length}{m.Groups[1].Value}")
Explanation:
(.) matches any character and puts it in group #1
\1+ matches group #1 as many times can it can

Shortening the same string inplace is more difficult then construction a new one while iterating the old one char by char. If you plan to iteratively add to a string it is better to use the StringBuilder - class instead of adding directly to a string (performance reasons).
You can streamline your approach by using IEnumerable.Aggregate function wich does the iteration on one string for you automatically:
using System;
using System.Linq;
using System.Text;
public class Program
{
public static string RunLengthEncode(string s)
{
if (string.IsNullOrEmpty(s)) // avoid null ref ex and do simple case
return "";
// we need a "state" between the differenc chars of s that we store here:
char curr_c = s[0]; // our current char, we start with the 1st one
int count = 0; // our char counter, we start with 0 as it will be
// incremented as soon as it is processed by Aggregate
// ( and then incremented to 1)
var agg = s.Aggregate(new StringBuilder(), (acc, c) => // StringBuilder
// performs better for multiple string-"additions" then string itself
{
if (c == curr_c)
count++; // same char, increment
else
{
// other char
if (count > 1) // store count if > 1
acc.AppendFormat("{0}", count);
acc.Append(curr_c); // store char
curr_c = c; // set current char to new one
count = 1; // startcount now is 1
}
return acc;
});
// add last things
if (count > 1) // store count if > 1
agg.AppendFormat("{0}", count);
agg.Append(curr_c); // store char
return agg.ToString(); // return the "simple" string
}
Test with
public static void Main(string[] args)
{
Console.WriteLine(RunLengthEncode("'EEENKDDDDKKKNNKDK' "));
Console.ReadLine();
}
}
Output for "'EEENKDDDDKKKNNKDK' ":
'3ENK4D3K2NKDK'
Your approach without using the same string is more like this:
var data = "'EEENKDDDDKKKNNKDK' ";
char curr_c = '\x0'; // avoid unasssinged warning
int count = 0; // counter for the curr_c occurences in row
string result = string.Empty; // resulting string
foreach (var c in data) // process every character of data in order
{
if (c != curr_c) // new character found
{
if (count > 1) // more then 1, add count as string and the char
result += Convert.ToString(count) + curr_c;
else if (count > 0) // avoid initial `\x0` being put into string
result += curr_c;
curr_c = c; // remember new character
count = 1; // so far we found this one
}
else
count++; // not new, increment counter
}
// add the last counted char as well
if (count > 1)
result += Convert.ToString(count) + curr_c;
else
result += curr_c;
// output
Console.WriteLine(data + " ==> " + result);
Output:
'EEENKDDDDKKKNNKDK' ==> '3ENK4D3K2NKDK'
Instead of using the indexing operator [] on your string and have to struggle with indexes all over I use foreach c in "sometext" ... which will proceed char-wise through the string - much less hassle.
If you need to run-length encode an array/list (your sor) of strings, simply apply the code to each one (preferably by using foreach s in yourStringList ....

How to auto-increment number and letter to generate a string sequence wise in c#

I have to make a string which consists a string like - AAA0009, and once it reaches AAA0009, it will generate AA0010 to AAA0019 and so on.... till AAA9999 and when it will reach to AAA9999, it will give AAB0000 to AAB9999 and so on till ZZZ9999.
I want to use static class and static variables so that it can auto increment by itself on every hit.
I have tried some but not even close, so help me out thanks.
Thanks for being instructive I was trying as I Said already but anyways you already want to put negatives over there without even knowing the thing:
Code:
public class GenerateTicketNumber
{
private static int num1 = 0;
public static string ToBase36()
{
const string base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
var sb = new StringBuilder(9);
do
{
sb.Insert(0, base36[(byte)(num1 % 36)]);
num1 /= 36;
} while (num1 != 0);
var paddedString = "#T" + sb.ToString().PadLeft(8, '0');
num1 = num1 + 1;
return paddedString;
}
}
above is the code. this will generate a sequence but not the way I want anyways will use it and thanks for help.

Though there's already an accepted answer, I would like to share this one.
P.S. I do not claim that this is the best approach, but in my previous work we made something similar using Azure Table Storage which is a no sql database (FYI) and it works.
1.) Create a table to store your running ticket number.
public class TicketNumber
{
public string Type { get; set; } // Maybe you want to have different types of ticket?
public string AlphaPrefix { get; set; }
public string NumericPrefix { get; set; }
public TicketNumber()
{
this.AlphaPrefix = "AAA";
this.NumericPrefix = "0001";
}
public void Increment()
{
int num = int.Parse(this.NumericPrefix);
if (num + 1 >= 9999)
{
num = 1;
int i = 2; // We are assuming that there are only 3 characters
bool isMax = this.AlphaPrefix == "ZZZ";
if (isMax)
{
this.AlphaPrefix = "AAA"; // reset
}
else
{
while (this.AlphaPrefix[i] == 'Z')
{
i--;
}
char iChar = this.AlphaPrefix[i];
StringBuilder sb = new StringBuilder(this.AlphaPrefix);
sb[i] = (char)(iChar + 1);
this.AlphaPrefix = sb.ToString();
}
}
else
{
num++;
}
this.NumericPrefix = num.ToString().PadLeft(4, '0');
}
public override string ToString()
{
return this.AlphaPrefix + this.NumericPrefix;
}
}
2.) Make sure you perform row-level locking and issue an error when it fails.
Here's an oracle syntax:
SELECT * FROM TICKETNUMBER WHERE TYPE = 'TYPE' FOR UPDATE NOWAIT;
This query locks the row and returns an error if the row is currently locked by another session.
We need this to make sure that even if you have millions of users generating a ticket number, it will not mess up the sequence.
Just make sure to save the new ticket number before you perform a COMMIT.
I forgot the MSSQL version of this but I recall using WITH (ROWLOCK) or something. Just google it.
3.) Working example:
static void Main()
{
TicketNumber ticketNumber = new TicketNumber();
ticketNumber.AlphaPrefix = "ZZZ";
ticketNumber.NumericPrefix = "9999";
for (int i = 0; i < 10; i++)
{
Console.WriteLine(ticketNumber);
ticketNumber.Increment();
}
Console.Read();
}
Output:

Looking at your code that you've provided, it seems that you're backing this with a number and just want to convert that to a more user-friendly text representation.
You could try something like this:
private static string ValueToId(int value)
{
var parts = new List<string>();
int numberPart = value % 10000;
parts.Add(numberPart.ToString("0000"));
value /= 10000;
for (int i = 0; i < 3 || value > 0; ++i)
{
parts.Add(((char)(65 + (value % 26))).ToString());
value /= 26;
}
return string.Join(string.Empty, parts.AsEnumerable().Reverse().ToArray());
}
It will take the first 4 characters and use them as is, and then for the remainder of the value if will convert it into characters A-Z.
So 9999 becomes AAA9999, 10000 becomes AAB0000, and 270000 becomes ABB0000.
If the number is big enough that it exceeds 3 characters, it will add more letters at the start.

Here's an example of how you could go about implementing it
void Main()
{
string template = #"AAAA00";
var templateChars = template.ToCharArray();
for (int i = 0; i < 100000; i++)
{
templateChars = IncrementCharArray(templateChars);
Console.WriteLine(string.Join("",templateChars ));
}
}
public static char Increment(char val)
{
if(val == '9') return 'A';
if(val == 'Z') return '0';
return ++val;
}
public static char[] IncrementCharArray(char[] val)
{
if (val.All(chr => chr == 'Z'))
{
var newArray = new char[val.Length + 1];
for (int i = 0; i < newArray.Length; i++)
{
newArray[i] = '0';
}
return newArray;
}
int length = val.Length;
while (length > -1)
{
char lastVal = val[--length];
val[length] = Increment(lastVal);
if ( val[length] != '0') break;
}
return val;
}

Is there any way I can make this C# code faster? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I am reading in a large file X12 and parsing the information within. I have two bottleneck functions that I can't seem to work around. read_line() and get_element() Is there any way I could make these two functions faster? The main bottleneck in the get_element function seems to be the Substring method.
public String get_element(int element_number) {
int count = 0;
int start_index = 0;
int end_index = 0;
int current_index = 0;
while (count < element_number && current_index != -1) {
current_index = line_text.IndexOf(x12_reader.element_delimiter, start_index);
start_index = current_index + 1;
count++;
}
if (current_index != -1) {
end_index = line_text.IndexOf(x12_reader.element_delimiter, start_index);
if (end_index == -1) end_index = line_text.Length;
return line_text.Substring(start_index, end_index - start_index);
} else {
return "";
}
}
private String read_line() {
string_builder.Clear();
int n;
while ((n = stream_reader.Read()) != -1) {
if (n == line_terminator) return string_builder.ToString();
string_builder.Append((char)n);
}
return string_builder.ToString();
}
I am reading x12 data. Here is an example of what it looks like. http://examples.x12.org/005010X221/dollars-and-data-sent-together/

Since your profiler tells you get_element is a bottleneck, and the method itself is coded very efficiently, you need to minimize the number of times this method is called.
Calling get_element repeatedly in a loop forces it to performs the same parsing job repeatedly:
for (int i = 0 ; i != n ; i++) {
var element = get_element(i);
... // Do something with the element
}
You should be able to fix this problem by rewriting get_element as GetElements returning all elements as a collection, and then taking individual elements from the same collection in a loop:
var allElements = GetElements();
for (int i = 0 ; i != n ; i++) {
var element = allElements[i];
... // Do something with the element
}
in most cases I only need one or two elements
In this case you could make a method that retrieves all required indexes at once - for example, by passing BitArray of required indexes.

Ok, second try. Discarding String.Split due to performance reasons, something like this should work much faster than your implementation:
//DISCLAIMER; typed in my cell phone, not tested. Sure it has bugs but you should get the idea.
public string get_element(int index)
{
var buffer = new StringBuilder();
var counter = -1;
using (var enumerator = text_line.GetEnumerator())
{
while (enumerator.MoveNext())
{
if (enumerator.Current == x12_reader.element_delimiter)
{
counter++;
}
else if (counter == index)
{
buffer.Append(enumerator.Current);
}
else if (counter > index)
break;
}
}
return buffer.ToString();
}

I'm not sure what you are doing exactly, but if I'm understanding your code correctly, wouldn't get element be simpler as follows?
public string get_Element(int index)
{
var elements = line_text.Split(new[] { x12_reader.element_delimiter });
if (index > elements.Length)
return "";
return elements[index];
}

Program to find minimum number in string

I have a c# class like so
internal class QueuedMinimumNumberFinder : ConcurrentQueue<int>
{
private readonly string _minString;
public QueuedMinimumNumberFinder(string number, int takeOutAmount)
{
if (number.Length < takeOutAmount)
{
throw new Exception("Error *");
}
var queueIndex = 0;
var queueAmount = number.Length - takeOutAmount;
var numQueue = new ConcurrentQueue<int>(number.ToCharArray().Where(m => (int) Char.GetNumericValue(m) != 0).Select(m=>(int)Char.GetNumericValue(m)).OrderBy(m=>m));
var zeroes = number.Length - numQueue.Count;
while (queueIndex < queueAmount)
{
int next;
if (queueIndex == 0)
{
numQueue.TryDequeue(out next);
Enqueue(next);
} else
{
if (zeroes > 0)
{
Enqueue(0);
zeroes--;
} else
{
numQueue.TryDequeue(out next);
Enqueue(next);
}
}
queueIndex++;
}
var builder = new StringBuilder();
while (Count > 0)
{
int next = 0;
TryDequeue(out next);
builder.Append(next.ToString());
}
_minString = builder.ToString();
}
public override string ToString() { return _minString; }
}
The point of the program is to find the minimum possible integer that can be made by taking out any x amount of characters from a string(example 100023 is string, if you take out any 3 letters, the minimum int created would be 100). My question is, is this the correct way to do this? Is there a better data structure that can be used for this problem?
First Edit:
Here's how it looks now
internal class QueuedMinimumNumberFinder
{
private readonly string _minString;
public QueuedMinimumNumberFinder(string number, int takeOutAmount)
{
var queue = new Queue<int>();
if (number.Length < takeOutAmount)
{
throw new Exception("Error *");
}
var queueIndex = 0;
var queueAmount = number.Length - takeOutAmount;
var numQueue = new List<int>(number.Where(m=>(int)Char.GetNumericValue(m)!=0).Select(m=>(int)Char.GetNumericValue(m))).ToList();
var zeroes = number.Length - numQueue.Count;
while (queueIndex < queueAmount)
{
if (queueIndex == 0)
{
var nextMin = numQueue.Min();
numQueue.Remove(nextMin);
queue.Enqueue(nextMin);
} else
{
if (zeroes > 1)
{
queue.Enqueue(0);
zeroes--;
} else
{
var nextMin = numQueue.Min();
numQueue.Remove(nextMin);
queue.Enqueue(nextMin);
}
}
queueIndex++;
}
var builder = new StringBuilder();
while (queue.Count > 0)
{
builder.Append(queue.Dequeue().ToString());
}
_minString = builder.ToString();
}
public override string ToString() { return _minString; }
}

A pretty simple and efficient implementation can be made, once you realize that your input string digits map to the domain of only 10 possible values: '0' .. '9'.
This can be encoded as the number of occurrences of a specific digit in your input string using a simple array of 10 integers: var digit_count = new int[10];
#MasterGillBates describes this idea in his answer.
You can then regard this array as your priority queue from which you can dequeue the characters you need by iteratively removing the lowest available character (decreasing its occurrence count in the array).
The code sample below provides an example implementation for this idea.
public static class MinNumberSolver
{
public static string GetMinString(string number, int takeOutAmount)
{
// "Add" the string by simply counting digit occurrance frequency.
var digit_count = new int[10];
foreach (var c in number)
if (char.IsDigit(c))
digit_count[c - '0']++;
// Now remove them one by one in lowest to highest order.
// For the first character we skip any potential leading 0s
var selected = new char[takeOutAmount];
var start_index = 1;
selected[0] = TakeLowest(digit_count, ref start_index);
// For the rest we start in digit order at '0' first.
start_index = 0;
for (var i = 0; i < takeOutAmount - 1; i++)
selected[1 + i] = TakeLowest(digit_count, ref start_index);
// And return the result.
return new string(selected);
}
private static char TakeLowest(int[] digit_count, ref int start_index)
{
for (var i = start_index; i < digit_count.Length; i++)
{
if (digit_count[i] > 0)
{
start_index = ((--digit_count[i] > 0) ? i : i + 1);
return (char)('0' + i);
}
}
throw new InvalidDataException("Input string does not have sufficient digits");
}
}

Just keep a count of how many times each digit appears. An array of size 10 will do. Count[i] gives the count of digit i.
Then pick the smallest non-zero i first, then pick the smallest etc and form your number.

Here's my solution using LINQ:
public string MinimumNumberFinder(string number, int takeOutAmount)
{
var ordered = number.OrderBy(n => n);
var nonZero = ordered.SkipWhile(n => n == '0');
var zero = ordered.TakeWhile(n => n == '0');
var result = nonZero.Take(1)
.Concat(zero)
.Concat(nonZero.Skip(1))
.Take(number.Length - takeOutAmount);
return new string(result.ToArray());
}

You could place every integer into a list and find all possible sequences of these values. From the list of sequences, you could sort through taking only the sets which have the number of integers you want. From there, you can write a quick function which parses a sequence into an integer. Next, you could store all of your parsed sequences into an array or an other data structure and sort based on value, which will allow you to select the minimum number from the data structure. There may be simpler ways to do this, but this will definitely work and gives you options as far as how many digits you want your number to have.

If I'm understanding this correctly, why don't you just pick out your numbers starting with the smallest number greater than zero. Then pick out all zeroes, then any remaining number if all the zeroes are picked up. This is all depending on the length of your ending result
In your example you have a 6 digit number and you want to pick out 3 digits. This means you'll only have 3 digits left. If it was a 10 digit number, then you would end up with a 7 digit number, etc...
So have an algorithm that knows the length of your starting number, how many digits you plan on removing, and the length of your ending number. Then just pick out the numbers.
This is just quick and dirty code:
string startingNumber = "9999903040404"; // "100023";
int numberOfCharactersToRemove = 3;
string endingNumber = string.Empty;
int endingNumberLength = startingNumber.Length - numberOfCharactersToRemove;
while (endingNumber.Length < endingNumberLength)
{
if (string.IsNullOrEmpty(endingNumber))
{
// Find the smallest digit in the starting number
for (int i = 1; i <= 9; i++)
{
if (startingNumber.Contains(i.ToString()))
{
endingNumber += i.ToString();
startingNumber = startingNumber.Remove(startingNumber.IndexOf(i.ToString()), 1);
break;
}
}
}
else if (startingNumber.Contains("0"))
{
// Add any zeroes
endingNumber += "0";
startingNumber = startingNumber.Remove(startingNumber.IndexOf("0"), 1);
}
else
{
// Add any remaining numbers from least to greatest
for (int i = 1; i <= 9; i++)
{
if (startingNumber.Contains(i.ToString()))
{
endingNumber += i.ToString();
startingNumber = startingNumber.Remove(startingNumber.IndexOf(i.ToString()), 1);
break;
}
}
}
}
Console.WriteLine(endingNumber);
100023 starting number resulted in 100 being the end result
9999903040404 starting number resulted in 3000044499 being the end result

Here's my version to fix this problem:
DESIGN:
You can sort your list using a binary tree , there are a lot of
implementations , I picked this one
Then you can keep track of the number of the Zeros you have in your
string Finally you will end up with two lists: I named one
SortedDigitsList and the other one ZeroDigitsList
perform a switch case to determine which last 3 digits should be
returned
Here's the complete code:
class MainProgram2
{
static void Main()
{
Tree theTree = new Tree();
Console.WriteLine("Please Enter the string you want to process:");
string input = Console.ReadLine();
foreach (char c in input)
{
// Check if it's a digit or not
if (c >= '0' && c <= '9')
{
theTree.Insert((int)Char.GetNumericValue(c));
}
}
//End of for each (char c in input)
Console.WriteLine("Inorder traversal resulting Tree Sort without the zeros");
theTree.Inorder(theTree.ReturnRoot());
Console.WriteLine(" ");
//Format the output depending on how many zeros you have
Console.WriteLine("The final 3 digits are");
switch (theTree.ZeroDigitsList.Count)
{
case 0:
{
Console.WriteLine("{0}{1}{2}", theTree.SortedDigitsList[0], theTree.SortedDigitsList[1], theTree.SortedDigitsList[2]);
break;
}
case 1:
{
Console.WriteLine("{0}{1}{2}", theTree.SortedDigitsList[0], 0, theTree.SortedDigitsList[2]);
break;
}
default:
{
Console.WriteLine("{0}{1}{2}", theTree.SortedDigitsList[0], 0, 0);
break;
}
}
Console.ReadLine();
}
}//End of main()
}
class Node
{
public int item;
public Node leftChild;
public Node rightChild;
public void displayNode()
{
Console.Write("[");
Console.Write(item);
Console.Write("]");
}
}
class Tree
{
public List<int> SortedDigitsList { get; set; }
public List<int> ZeroDigitsList { get; set; }
public Node root;
public Tree()
{
root = null;
SortedDigitsList = new List<int>();
ZeroDigitsList = new List<int>();
}
public Node ReturnRoot()
{
return root;
}
public void Insert(int id)
{
Node newNode = new Node();
newNode.item = id;
if (root == null)
root = newNode;
else
{
Node current = root;
Node parent;
while (true)
{
parent = current;
if (id < current.item)
{
current = current.leftChild;
if (current == null)
{
parent.leftChild = newNode;
return;
}
}
else
{
current = current.rightChild;
if (current == null)
{
parent.rightChild = newNode;
return;
}
}
}
}
}
//public void Preorder(Node Root)
//{
// if (Root != null)
// {
// Console.Write(Root.item + " ");
// Preorder(Root.leftChild);
// Preorder(Root.rightChild);
// }
//}
public void Inorder(Node Root)
{
if (Root != null)
{
Inorder(Root.leftChild);
if (Root.item > 0)
{
SortedDigitsList.Add(Root.item);
Console.Write(Root.item + " ");
}
else
{
ZeroDigitsList.Add(Root.item);
}
Inorder(Root.rightChild);
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

String compression for repeated chars or substrings [closed] - c#

Related

function complex_decode( string str) that takes a non-simple repeated encoded string, and returns the original un-encoded string

How to run-length encode 'EEDDDNE' to '2E3DNE'?

How to auto-increment number and letter to generate a string sequence wise in c#

Is there any way I can make this C# code faster? [closed]

Program to find minimum number in string

Categories

Resources