Using Dictionary to map byte to BitArray - c#

I am developing an application that implements a simple substitution cipher. For speed reasons (and because it was one of the requirements) I need to use BitArray for encryption and decryption. The user will enter a "coded" alphabet that I need to map in some way, so I chose Dictionary, since it uses a hash table and gives O(1) lookups. But now I am wondering how to do this when the "coded" alphabet is initialized like this:
BitArray codedAlphabet = new BitArray(bytes);
This would force me to use two for loops to achieve my goal. Does anyone have a different idea? Hopefully you understand what I am trying to achieve. Thank you in advance.
Code:
namespace Harpokrat.EncryptionAlgorithms
{
// Simple substitution cypher algorithm
public class SimpleSubstitutionStrategy : IEncryptionStrategy
{
private string alphabet; // message to be encrypted
private string coded; // this will be the key (input from file or from UI)
private ArrayList AlphabetBackUp = new ArrayList();
private ArrayList CodedBackUp = new ArrayList();
#region Properties
public string Alphabet
{
get
{
return this.alphabet;
}
set
{
this.alphabet = value;
foreach (char c in this.alphabet.ToCharArray())
{
this.AlphabetBackUp.Add(c);
}
}
}
public string Coded
{
get
{
return this.coded;
}
set
{
this.coded = "yqmnnsgwatkgetwtawuiqwemsg"; //for testing purposes
foreach (char c in this.coded.ToCharArray())
{
this.CodedBackUp.Add(c);
}
}
}
#endregion
public string Decrypt(string message)
{
message = message.ToLower();
string result = "";
for (int i = 0; i < message.Length; i++)
{
int indexOfSourceChar = CodedBackUp.IndexOf(message[i]);
if (indexOfSourceChar < 0 || (indexOfSourceChar > alphabet.Length - 1))
{
result += "#";
}
else
{
result += alphabet[indexOfSourceChar].ToString();
}
}
return result;
}
public string Encrypt(string message)
{
message = message.ToLower();
string result = "";
for(int i = 0; i < message.Length; i++)
{
int indexOfSourceChar = AlphabetBackUp.IndexOf(message[i]);
if (indexOfSourceChar < 0 || (indexOfSourceChar > coded.Length - 1))
{
result += "#";
}
else
{
result += coded[indexOfSourceChar].ToString();
}
}
return result;
}
}
}

I'd recommend a single method to set alphabet and coded at the same time, that internally builds the two dictionaries you'd need to do Encryption and Decryption, and a helper method to do a get-or-return-default ('#' in your case) on them.
That way you can implement a single function that does either Encryption or Decryption depending on the dictionary passed in (which could be implemented in a single line of code if you're comfortable using LINQ).
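A minimal sketch of that suggestion, assuming a plain string-based cipher (the BitArray requirement from the question is left out). The names SubstitutionTables and Translate are illustrative and do not come from the original project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: names are illustrative, not part of the original project.
public class SubstitutionTables
{
    private readonly Dictionary<char, char> encryptMap = new Dictionary<char, char>();
    private readonly Dictionary<char, char> decryptMap = new Dictionary<char, char>();

    // Sets the alphabet and the key together and builds both lookup tables.
    public SubstitutionTables(string alphabet, string coded)
    {
        if (alphabet.Length != coded.Length)
            throw new ArgumentException("Alphabet and key must have the same length.");
        for (int i = 0; i < alphabet.Length; i++)
        {
            // Assumes the key is a permutation of the alphabet;
            // duplicate key characters would make decryption ambiguous.
            encryptMap[alphabet[i]] = coded[i];
            decryptMap[coded[i]] = alphabet[i];
        }
    }

    // Get-or-default lookup: characters missing from the map become '#'.
    private static string Translate(string message, Dictionary<char, char> map)
    {
        return new string(message.ToLower()
            .Select(c => map.TryGetValue(c, out char mapped) ? mapped : '#')
            .ToArray());
    }

    public string Encrypt(string message) { return Translate(message, encryptMap); }
    public string Decrypt(string message) { return Translate(message, decryptMap); }
}
```

With new SubstitutionTables("abc", "xyz"), Encrypt("Cab!") gives "zxy#" and Decrypt("zxy") gives "cab". Note that the test key hard-coded in the question contains repeated letters, so it could not be decrypted unambiguously with any lookup scheme.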

Related

What is a Unicode safe replica of String.IndexOf(string input) that can handle Surrogate Pairs?

I am trying to figure out an equivalent to C# string.IndexOf(string) that can handle surrogate pairs in Unicode characters.
I am able to get the index when only comparing single characters, like in the code below:
public static int UnicodeIndexOf(this string input, string find)
{
return input.ToTextElements().ToList().IndexOf(find);
}
public static IEnumerable<string> ToTextElements(this string input)
{
var e = StringInfo.GetTextElementEnumerator(input);
while (e.MoveNext())
{
yield return e.GetTextElement();
}
}
But if I try to actually use a string as the find variable then it won't work because each text element only contains a single character to compare against.
Are there any suggestions as to how to go about writing this?
Thanks for any and all help.
EDIT:
Below is an example of why this is necessary:
CODE
Console.WriteLine("HolyCow𪘁BUBBYY𪘁YY𪘁Y".IndexOf("BUBB"));
Console.WriteLine("HolyCow#BUBBYY#YY#Y".IndexOf("BUBB"));
OUTPUT
9
8
Notice where I replace the 𪘁 character with # the values change.
You basically want to find the index of one string array in another string array. We can adapt code from this question for that:
public static class Extensions {
public static int UnicodeIndexOf(this string input, string find, StringComparison comparison = StringComparison.CurrentCulture) {
return IndexOf(
// split input into text elements
input.ToTextElements().ToArray(),
// split searched value into text elements
find.ToTextElements().ToArray(),
comparison);
}
// code from another answer
private static int IndexOf(string[] haystack, string[] needle, StringComparison comparision) {
var len = needle.Length;
var limit = haystack.Length - len;
for (var i = 0; i <= limit; i++) {
var k = 0;
for (; k < len; k++) {
if (!String.Equals(needle[k], haystack[i + k], comparision)) break;
}
if (k == len) return i;
}
return -1;
}
public static IEnumerable<string> ToTextElements(this string input) {
var e = StringInfo.GetTextElementEnumerator(input);
while (e.MoveNext()) {
yield return e.GetTextElement();
}
}
}
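The index returned above counts text elements, not UTF-16 code units. As a different route to the same number, the sketch below runs the built-in ordinal IndexOf and then converts the UTF-16 index into a text-element index. The class and method names are hypothetical, and U+1F600 is used as a stand-in for the supplementary-plane character in the question.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts a UTF-16 index into a text-element index.
public static class TextElementIndex
{
    // Returns the position of the text element that starts at charIndex,
    // or -1 if charIndex falls inside a text element (or is out of range).
    public static int FromCharIndex(string input, int charIndex)
    {
        // Start offsets (in UTF-16 code units) of every text element in input.
        int[] starts = StringInfo.ParseCombiningCharacters(input);
        return Array.IndexOf(starts, charIndex);
    }

    // Ordinal search whose result is expressed in text elements.
    public static int UnicodeOrdinalIndexOf(string input, string find)
    {
        int charIndex = input.IndexOf(find, StringComparison.Ordinal);
        return charIndex < 0 ? -1 : FromCharIndex(input, charIndex);
    }
}
```

For "HolyCow" + char.ConvertFromUtf32(0x1F600) + "BUBBYY", the plain IndexOf("BUBB") result is 9, while UnicodeOrdinalIndexOf returns 8. Unlike the array-based version, this variant only supports ordinal comparison and reports -1 when a match begins in the middle of a text element.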

How to auto-increment number and letter to generate a string sequence wise in c#

I have to generate strings like AAA0009. Once it reaches AAA0009, it should continue with AAA0010 to AAA0019 and so on up to AAA9999. When it reaches AAA9999, it should continue with AAB0000 to AAB9999, and so on up to ZZZ9999.
I want to use a static class and static variables so that it auto-increments by itself on every hit.
I have tried a few things but got nowhere close, so please help me out. Thanks.
Thanks for being instructive. As I already said, I was trying, but you were quick to vote this down without even understanding the problem:
Code:
public class GenerateTicketNumber
{
private static int num1 = 0;
public static string ToBase36()
{
const string base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
var sb = new StringBuilder(9);
do
{
sb.Insert(0, base36[(byte)(num1 % 36)]);
num1 /= 36;
} while (num1 != 0);
var paddedString = "#T" + sb.ToString().PadLeft(8, '0');
num1 = num1 + 1;
return paddedString;
}
}
Above is the code. It generates a sequence, but not the way I want. I will use it anyway; thanks for the help.
Though there's already an accepted answer, I would like to share this one.
P.S. I do not claim that this is the best approach, but in my previous work we made something similar using Azure Table Storage, which is a NoSQL database (FYI), and it works.
1.) Create a table to store your running ticket number.
public class TicketNumber
{
public string Type { get; set; } // Maybe you want to have different types of ticket?
public string AlphaPrefix { get; set; }
public string NumericPrefix { get; set; }
public TicketNumber()
{
this.AlphaPrefix = "AAA";
this.NumericPrefix = "0001";
}
public void Increment()
{
int num = int.Parse(this.NumericPrefix);
if (num + 1 > 9999) // roll over only after 9999 has been issued
{
num = 1; // restart at 0001, matching the constructor's default
bool isMax = this.AlphaPrefix == "ZZZ";
if (isMax)
{
this.AlphaPrefix = "AAA"; // reset
}
else
{
// We are assuming that there are only 3 characters.
// Walk left past trailing 'Z's, resetting each one to 'A' (the carry).
StringBuilder sb = new StringBuilder(this.AlphaPrefix);
int i = 2;
while (sb[i] == 'Z')
{
sb[i] = 'A';
i--;
}
sb[i] = (char)(sb[i] + 1);
this.AlphaPrefix = sb.ToString();
}
}
else
{
num++;
}
this.NumericPrefix = num.ToString().PadLeft(4, '0');
}
public override string ToString()
{
return this.AlphaPrefix + this.NumericPrefix;
}
}
2.) Make sure you perform row-level locking and issue an error when it fails.
Here's an oracle syntax:
SELECT * FROM TICKETNUMBER WHERE TYPE = 'TYPE' FOR UPDATE NOWAIT;
This query locks the row and returns an error if the row is currently locked by another session.
We need this to make sure that even if you have millions of users generating a ticket number, it will not mess up the sequence.
Just make sure to save the new ticket number before you perform a COMMIT.
I forgot the MSSQL version of this but I recall using WITH (ROWLOCK) or something. Just google it.
3.) Working example:
static void Main()
{
TicketNumber ticketNumber = new TicketNumber();
ticketNumber.AlphaPrefix = "ZZZ";
ticketNumber.NumericPrefix = "9999";
for (int i = 0; i < 10; i++)
{
Console.WriteLine(ticketNumber);
ticketNumber.Increment();
}
Console.Read();
}
Output:
Looking at your code that you've provided, it seems that you're backing this with a number and just want to convert that to a more user-friendly text representation.
You could try something like this:
private static string ValueToId(int value)
{
var parts = new List<string>();
int numberPart = value % 10000;
parts.Add(numberPart.ToString("0000"));
value /= 10000;
for (int i = 0; i < 3 || value > 0; ++i)
{
parts.Add(((char)(65 + (value % 26))).ToString());
value /= 26;
}
return string.Join(string.Empty, parts.AsEnumerable().Reverse().ToArray());
}
It takes the last four decimal digits of the value and uses them as is, and then converts the remainder of the value into the characters A-Z.
So 9999 becomes AAA9999, 10000 becomes AAB0000, and 270000 becomes ABB0000.
If the number is big enough that it exceeds 3 characters, it will add more letters at the start.
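The question also asks for a static class that increments on every call. A minimal sketch of that, assuming a fixed three-letter prefix and a counter that only needs to be unique within one running process; TicketSequence, Format and Next are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: names are illustrative, not from the original post.
public static class TicketSequence
{
    private const int MaxValue = 26 * 26 * 26 * 10000 - 1; // corresponds to ZZZ9999
    private static int current = -1;                        // first call yields AAA0000

    // Formats 0..MaxValue as three letters followed by four digits.
    public static string Format(int value)
    {
        if (value < 0 || value > MaxValue)
            throw new ArgumentOutOfRangeException("value");
        int digits = value % 10000;
        int letters = value / 10000;
        char c3 = (char)('A' + letters % 26);       // least significant letter
        char c2 = (char)('A' + letters / 26 % 26);
        char c1 = (char)('A' + letters / 676);      // most significant letter
        return new string(new[] { c1, c2, c3 }) + digits.ToString("0000");
    }

    // Thread-safe within one process; the counter is lost on restart.
    public static string Next()
    {
        return Format(Interlocked.Increment(ref current));
    }
}
```

Format(9999) gives "AAA9999" and Format(10000) gives "AAB0000", matching the values above. Because the counter lives in memory, a persistent store is still needed if the numbers must survive a restart or be shared between servers.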
Here's an example of how you could go about implementing it
void Main()
{
string template = @"AAAA00";
var templateChars = template.ToCharArray();
for (int i = 0; i < 100000; i++)
{
templateChars = IncrementCharArray(templateChars);
Console.WriteLine(string.Join("",templateChars ));
}
}
public static char Increment(char val)
{
if(val == '9') return 'A';
if(val == 'Z') return '0';
return ++val;
}
public static char[] IncrementCharArray(char[] val)
{
if (val.All(chr => chr == 'Z'))
{
var newArray = new char[val.Length + 1];
for (int i = 0; i < newArray.Length; i++)
{
newArray[i] = '0';
}
return newArray;
}
int length = val.Length;
while (length > -1)
{
char lastVal = val[--length];
val[length] = Increment(lastVal);
if ( val[length] != '0') break;
}
return val;
}

Multiple string replace in c#

I am dynamically editing a regex for matching text in a pdf, which can contain hyphenation at the end of some lines.
Example:
Source string:
"consecuti?vely"
Replace rules:
.Replace("cuti?",@"cuti?(-\s+)?")
.Replace("con",@"con(-\s+)?")
.Replace("consecu",@"consecu(-\s+)?")
Desired output:
"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"
The replace rules are built dynamically, this is just an example which causes problems.
What's the best solution to perform such a multiple replace, which will produce the desired output?
So far I thought about using Regex.Replace and zipping the word to replace with an optional (-\s+)? between each character, but that would not work, because the word to replace already contains characters with special meaning in a regex context.
EDIT: My current code, which doesn't work when replace rules overlap as in the example above:
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
for (int i = 0; i < hyphenatedParts.Count; i++)
{
var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));
regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + @"(-\s+)?");
}
return regex;
}
The output of this program is "con(-\s+)?secu(-\s+)?ti?(-\s+)?vely",
and as I understand your problem, my code can completely solve it.
class Program
{
class somefields
{
public string first;
public string secound;
public string Add;
public int index;
public somefields(string F, string S)
{
first = F;
secound = S;
}
}
static void Main(string[] args)
{
//declaring input
string input = "consecuti?vely";
List<somefields> rules=new List<somefields>();
//declaring rules
rules.Add(new somefields("cuti?",@"cuti?(-\s+)?"));
rules.Add(new somefields("con",@"con(-\s+)?"));
rules.Add(new somefields("consecu",@"consecu(-\s+)?"));
// finding the string which must be added to output string and index of that
foreach (var rul in rules)
{
var index=input.IndexOf(rul.first);
if (index != -1)
{
var add = rul.secound.Remove(0,rul.first.Count());
rul.Add = add;
rul.index = index+rul.first.Count();
}
else
{
// mark rules that do not match so the insertion loop below skips them
rul.index = -1;
}
}
// sort rules by index
for (int i = 0; i < rules.Count(); i++)
{
for (int j = i + 1; j < rules.Count(); j++)
{
if (rules[i].index > rules[j].index)
{
somefields temp;
temp = rules[i];
rules[i] = rules[j];
rules[j] = temp;
}
}
}
string output = input.ToString();
int k=0;
foreach(var rul in rules)
{
if (rul.index != -1)
{
output = output.Insert(k + rul.index, rul.Add);
k += rul.Add.Length;
}
}
System.Console.WriteLine(output);
System.Console.ReadLine();
}
}
You should probably write your own parser; it's probably easier to maintain :).
Maybe you could add marker characters such as "##" around the patterns to protect them, provided the strings do not already contain them.
Try this one:
var final = Regex.Replace(originalTextOfThePage, @"(\w+)(?:\-[\s\r\n]*)?", "$1");
I had to give up on an easy solution and did the editing of the regex myself. As a side effect, the new approach goes through the string only twice.
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var indexesToInsertPossibleHyphenation = GetPossibleHyphenPositions(regex, searchedPage);
var hyphenationToken = @"(-\s+)?";
return InsertStringTokenInAllPositions(regex, indexesToInsertPossibleHyphenation, hyphenationToken);
}
private static string InsertStringTokenInAllPositions(string sourceString, List<int> insertionIndexes, string insertionToken)
{
if (insertionIndexes == null || string.IsNullOrEmpty(insertionToken)) return sourceString;
var sb = new StringBuilder(sourceString.Length + insertionIndexes.Count * insertionToken.Length);
var linkedInsertionPositions = new LinkedList<int>(insertionIndexes.Distinct().OrderBy(x => x));
for (int i = 0; i < sourceString.Length; i++)
{
if (!linkedInsertionPositions.Any())
{
sb.Append(sourceString.Substring(i));
break;
}
if (i == linkedInsertionPositions.First.Value)
{
sb.Append(insertionToken);
}
if (i >= linkedInsertionPositions.First.Value)
{
linkedInsertionPositions.RemoveFirst();
}
sb.Append(sourceString[i]);
}
return sb.ToString();
}
private List<int> GetPossibleHyphenPositions(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
var indexesToInsertPossibleHyphenation = new List<int>();
//....
// Aho-Corasick to find all occurrences of all
//strings in "hyphenatedParts" in the "regex" string
// ....
return indexesToInsertPossibleHyphenation;
}

How to extract an integer and a two dimensional integer array from a combination of both in C#

I have an input as
2:{{2,10},{6,4}}
I am reading this as
string input = Console.ReadLine();
Next this input has to be passed to a function
GetCount(int count, int[,] arr)
{
}
How can I do so using C#?
Thanks
You could use regular expressions to extract each token of your input string easily. The following example also supports extra whitespace (the \s* in the regular expressions).
Remember that it is usually a good idea to give a class the responsibility for parsing (as in this example) rather than taking a procedural approach.
All the relevant lines are commented for better understanding.
Finally, I tested this and it worked with the provided sample input strings.
using System;
using System.Text.RegularExpressions;
namespace IntPairArrayParserDemo
{
class Program
{
static void Main(string[] args)
{
var input = "2:{{2,10},{6,4}}";
ParseAndPrintArray(input);
var anotherInput = "2 : { { 2 , 10 } , { 6 , 4 } }";
ParseAndPrintArray(anotherInput);
}
private static void ParseAndPrintArray(string input)
{
Console.WriteLine("Parsing array {0}...", input);
var array = IntPairArrayParser.Parse(input);
var pairCount = array.GetLength(0);
for (var i = 0; i < pairCount; i++)
{
Console.WriteLine("Pair found: {0},{1}", array[i, 0], array[i, 1]);
}
Console.WriteLine();
}
}
internal static class IntPairArrayParser
{
public static int[,] Parse(string input)
{
if (string.IsNullOrWhiteSpace(input)) throw new ArgumentOutOfRangeException("input");
// parse array length from string
var length = ParseLength(input);
// create the array that will hold all the parsed elements
var result = new int[length, 2];
// parse array elements from input
ParseAndStoreElements(input, result);
return result;
}
private static void ParseAndStoreElements(string input, int[,] array)
{
// get the length of the first dimension of the array
var expectedElementCount = array.GetLength(0);
// parse array elements
var elementMatches = Regex.Matches(input, @"{\s*(\d+)\s*,\s*(\d+)\s*}");
// validate that the number of elements present in input is correct
if (expectedElementCount != elementMatches.Count)
{
var errorMessage = string.Format("Array should have {0} elements. It actually has {1} elements.", expectedElementCount, elementMatches.Count);
throw new ArgumentException(errorMessage, "input");
}
// parse array elements from input into array
for (var elementIndex = 0; elementIndex < expectedElementCount; elementIndex++)
{
ParseAndStoreElement(elementMatches[elementIndex], elementIndex, array);
}
}
private static void ParseAndStoreElement(Match match, int index, int[,] array)
{
// parse first and second element values from the match found
var first = int.Parse(match.Groups[1].Value);
var second = int.Parse(match.Groups[2].Value);
array[index, 0] = first;
array[index, 1] = second;
}
private static int ParseLength(string input)
{
// get the length from input and parse it as int
var lengthMatch = Regex.Match(input, @"(\d+)\s*:");
return int.Parse(lengthMatch.Groups[1].Value);
}
}
}
Without doing your work for you: you will first have to parse the whole string to find the individual integers, either with regular expressions or, as I would do it myself, with the string.Split method. Then parse the substrings representing the individual integers with int.Parse or int.TryParse.
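A minimal sketch of the string.Split route, assuming the number before the colon is the count of inner arrays and that every inner array holds exactly two integers, as in the sample input. SplitParser is a hypothetical name, and the brace structure itself is not validated.

```csharp
using System;

// Hypothetical helper: names are illustrative and error handling is minimal.
public static class SplitParser
{
    // Parses input such as "2:{{2,10},{6,4}}" into a count and a count x 2 array.
    public static int[,] Parse(string input, out int count)
    {
        string[] halves = input.Split(':');
        if (halves.Length != 2)
            throw new FormatException("Expected '<count>:{{a,b},...}'.");
        count = int.Parse(halves[0].Trim());

        // Treat braces, commas and spaces alike as separators,
        // which leaves only the number tokens.
        string[] tokens = halves[1].Split(
            new[] { '{', '}', ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length != count * 2)
            throw new FormatException("Element count does not match the declared length.");

        var result = new int[count, 2];
        for (int i = 0; i < tokens.Length; i++)
            result[i / 2, i % 2] = int.Parse(tokens[i]);
        return result;
    }
}
```

A call such as SplitParser.Parse(Console.ReadLine(), out count) then yields both arguments for GetCount(count, arr). Swapping int.Parse for int.TryParse would allow reporting bad input without exceptions.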
I doubt you're going to get a serious parsing answer for your custom format. If you NEED to have the value entered that way, I'd look up some info on regular expressions. If that's not powerful enough for you, there are some fairly convenient parser generators you can use.
Alternatively, the much more realistic idea would be something like this:
(NOTE: Haven't tried this at all... didn't even put it in VS... but this is the idea...)
int rows = 0;
string rowsInput = "";
do {
Console.Write("Number of rows:");
rowsInput = Console.ReadLine();
} while (!Int32.TryParse(rowsInput, out rows));
int columns = 0;
string columnsInput = "";
do {
Console.Write("Number of columns:");
columnsInput = Console.ReadLine();
} while (!Int32.TryParse(columnsInput, out columns));
List<List<int>> values = new List<List<int>>();
for (int i = 0; i < rows; i++)
{
bool validInput = false;
do {
Console.Write(String.Format("Enter comma-delimited integers for row #{0}:", i.ToString()));
string row = Console.ReadLine();
string[] items = row.Split(',');
int temp;
validInput = (items.Length == columns) && (from item in items where !Int32.TryParse(item, out temp) select item).Count() == 0;
if (validInput)
{
values.Add(
(from item in items select Convert.ToInt32(item)).ToList()
);
}
} while (!validInput);
}

Boyer-Moore Practical in C#?

Boyer-Moore is probably the fastest non-indexed text-search algorithm known. So I'm implementing it in C# for my Black Belt Coder website.
I had it working and it showed roughly the expected performance improvements compared to String.IndexOf(). However, when I added the StringComparison.Ordinal argument to IndexOf, it started outperforming my Boyer-Moore implementation. Sometimes, by a considerable amount.
I wonder if anyone can help me figure out why. I understand why StringComparison.Ordinal might speed things up, but how could it be faster than Boyer-Moore? Is it because of the overhead of the .NET platform itself, perhaps because array indexes must be validated to ensure they're in range, or something else altogether? Are some algorithms just not practical in C#.NET?
Below is the key code.
// Base for search classes
abstract class SearchBase
{
public const int InvalidIndex = -1;
protected string _pattern;
public SearchBase(string pattern) { _pattern = pattern; }
public abstract int Search(string text, int startIndex);
public int Search(string text) { return Search(text, 0); }
}
/// <summary>
/// A simplified Boyer-Moore implementation.
///
/// Note: Uses a single skip array, which uses more memory than needed and
/// may not be large enough. Will be replaced with multi-stage table.
/// </summary>
class BoyerMoore2 : SearchBase
{
private byte[] _skipArray;
public BoyerMoore2(string pattern)
: base(pattern)
{
// TODO: To be replaced with multi-stage table
_skipArray = new byte[0x10000];
for (int i = 0; i < _skipArray.Length; i++)
_skipArray[i] = (byte)_pattern.Length;
for (int i = 0; i < _pattern.Length - 1; i++)
_skipArray[_pattern[i]] = (byte)(_pattern.Length - i - 1);
}
public override int Search(string text, int startIndex)
{
int i = startIndex;
// Loop while there's still room for search term
while (i <= (text.Length - _pattern.Length))
{
// Look if we have a match at this position
int j = _pattern.Length - 1;
while (j >= 0 && _pattern[j] == text[i + j])
j--;
if (j < 0)
{
// Match found
return i;
}
// Advance to next comparison
i += Math.Max(_skipArray[text[i + j]] - _pattern.Length + 1 + j, 1);
}
// No match found
return InvalidIndex;
}
}
EDIT: I've posted all my test code and conclusions on the matter at http://www.blackbeltcoder.com/Articles/algorithms/fast-text-search-with-boyer-moore.
Based on my own tests and the comments made here, I've concluded that the reason String.IndexOf() performs so well with StringComparison.Ordinal is because the method calls into unmanaged code that likely employs hand-optimized assembly language.
I have run a number of different tests and String.IndexOf() just seems to be faster than anything I can implement using managed C# code.
If anyone's interested, I've written everything I've discovered about this and posted several variations of the Boyer-Moore algorithm in C# at http://www.blackbeltcoder.com/Articles/algorithms/fast-text-search-with-boyer-moore.
My bet is that setting that flag allows String.IndexOf to use Boyer-Moore itself. And its implementation is better than yours.
Without that flag it has to be careful using Boyer-Moore (and probably doesn't) because of potential issues around Unicode. In particular the possibility of Unicode causes the transition tables that Boyer-Moore uses to blow up.
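To illustrate the table-size point, here is a sketch of the Horspool simplification of Boyer-Moore that keeps its skip table in a dictionary, so it only stores entries for characters that occur in the pattern. This is an assumption-laden illustration with a hypothetical class name; it is not the question's code and makes no claim about how String.IndexOf is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: Horspool search with a sparse (dictionary) skip table.
public sealed class HorspoolSearch
{
    private readonly string pattern;
    private readonly Dictionary<char, int> skip = new Dictionary<char, int>();

    public HorspoolSearch(string pattern)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("Pattern must not be empty.", "pattern");
        this.pattern = pattern;
        // Only characters in the pattern (except its last position) get an entry;
        // every other character implicitly skips the full pattern length.
        for (int i = 0; i < pattern.Length - 1; i++)
            skip[pattern[i]] = pattern.Length - 1 - i;
    }

    // Returns the UTF-16 index of the first ordinal match at or after startIndex, or -1.
    public int Search(string text, int startIndex = 0)
    {
        int last = pattern.Length - 1;
        int i = startIndex;
        while (i <= text.Length - pattern.Length)
        {
            // Compare right to left at the current alignment.
            int j = last;
            while (j >= 0 && pattern[j] == text[i + j])
                j--;
            if (j < 0)
                return i;
            // Shift by the entry for the text character under the pattern's last position.
            int shift;
            i += skip.TryGetValue(text[i + last], out shift) ? shift : pattern.Length;
        }
        return -1;
    }
}
```

The dictionary keeps memory proportional to the pattern rather than to the 65,536 possible char values, at the cost of a hash lookup per shift, so it should not be expected to beat an array-backed table on speed.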
I made some small changes to your code, wrote a different implementation of the Boyer-Moore algorithm, and got better results.
I got the idea for this implementation from here
But to be honest, I would expect to reach a higher speed compared to IndexOf.
class SearchResults
{
public int Matches { get; set; }
public long Ticks { get; set; }
}
abstract class SearchBase
{
public const int InvalidIndex = -1;
protected string _pattern;
protected string _text;
public SearchBase(string text, string pattern) { _text = text; _pattern = pattern; }
public abstract int Search(int startIndex);
}
internal class BoyerMoore3 : SearchBase
{
readonly byte[] textBytes;
readonly byte[] patternBytes;
readonly int valueLength;
readonly int patternLength;
private readonly int[] badCharacters = new int[256];
private readonly int lastPatternByte;
public BoyerMoore3(string text, string pattern) : base(text, pattern)
{
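// UTF-8 bytes keep the skip table at 256 entries, but Search then returns
// byte offsets, which differ from char indexes for non-ASCII text
textBytes = Encoding.UTF8.GetBytes(text);
patternBytes = Encoding.UTF8.GetBytes(pattern);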
valueLength = textBytes.Length;
patternLength = patternBytes.Length;
for (int i = 0; i < 256; ++i)
badCharacters[i] = patternLength;
lastPatternByte = patternLength - 1;
for (int i = 0; i < lastPatternByte; ++i)
badCharacters[patternBytes[i]] = lastPatternByte - i;
}
public override int Search(int startIndex)
{
int index = startIndex;
while (index <= (valueLength - patternLength))
{
for (int i = lastPatternByte; textBytes[index + i] == patternBytes[i]; --i)
{
if (i == 0)
return index;
}
index += badCharacters[textBytes[index + lastPatternByte]];
}
// Text not found
return InvalidIndex;
}
}
Changed code from Form1:
private void RunSearch(string pattern, SearchBase search, SearchResults results)
{
var timer = new Stopwatch();
// Start timer
timer.Start();
// Find all matches
int pos = search.Search(0);
while (pos != -1)
{
results.Matches++;
pos = search.Search(pos + pattern.Length);
}
// Stop timer
timer.Stop();
// Add to total Ticks
results.Ticks += timer.ElapsedTicks;
}
