Getting a space character in output, when counting characters from input - c#

I've been working on this for a few days. I am trying to complete a console application that prompts the user to type in a string and then lists every unique character with a count after it. At the end of the results, a count is displayed showing how many unique characters were found. Everything is converted to lowercase regardless of its original case. The key is to use collections. Here is what I have so far. My output shows two space characters in the results despite the fact that I used an if statement to catch them. Can anyone point out a concept that I have overlooked?
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
public class LetterCountTest
{
public static void Main(string[] args)
{
// create sorted dictionary based on user input
SortedDictionary<string, int> dictionary = CollectLetters();
// display sorted dictionary content
DisplayDictionary(dictionary);
} // end Main
// create sorted dictionary from user input
private static SortedDictionary<string, int> CollectLetters()
{
// create a new sorted dictionary
SortedDictionary<string, int> dictionary =
new SortedDictionary<string, int>();
Console.WriteLine("Enter a string: "); // prompt for user input
string input = Console.ReadLine(); // get input
// split every individual letter for the count
string[] letters = Regex.Split(input, "");
// processing input letters
foreach (var letter in letters)
{
string letterKey = letter.ToLower(); // get letter in lowercase
if (letterKey != " ") // statement to exclude whitespace count
{
// if the dictionary contains the letter
if (dictionary.ContainsKey(letterKey))
{
++dictionary[letterKey];
} // end if
else
// add new letter with a count of 1 to the dictionary
dictionary.Add(letterKey, 1);
}
} // end foreach
return dictionary;
} // end method CollectLetters
// display dictionary content
private static void DisplayDictionary<K, V>(
SortedDictionary<K, V> dictionary)
{
Console.WriteLine("\nSorted dictionary contains:\n{0,-12}{1,-12}",
"Key:", "Value:");
// generate output for each key in the sorted dictionary
// by iterating through the Keys property with a foreach statement
foreach (K key in dictionary.Keys)
Console.WriteLine("{0,-12}{1,-12}", key, dictionary[key]);
Console.WriteLine("\nsize: {0}", dictionary.Count);
Console.ReadLine();
} // end method DisplayDictionary
} // end class LetterCountTest
My output states that I am using every letter in the alphabet but also has a whitespace above the 'a' and two instances of it. I don't know where this is coming from, but my guess is that it's counting null characters or carriage returns. The string that I use is...
The quick brown fox jumps over the lazy dog
Aside from counting every letter once, it counts the e three times, the h two times, the o four times, the r two times, the t two times, and the u two times.

The problem you're having lies in Regex.Split()'s behavior. Taking an excerpt from the MSDN page on the method...
If a match is found at the beginning or the end of the input string, an empty string is included at the beginning or the end of the returned array.
This is exactly what is happening when you call Regex.Split(input, "");. To combat this, you can remove the regex matching from your code, change all instances of your dictionaries to a SortedDictionary<char, int>, and use a foreach loop to iterate through each character of your input string instead.
foreach (char letter in input.ToLower())
{
if (!Char.IsWhiteSpace(letter))
{
//Do your stuff
}
}
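As a rough sketch of how the whole counting step could look with a SortedDictionary<char, int>, something like the following should work. The class name LetterCounter and the method name CountLetters are made up for illustration and are not part of the original program.

```csharp
using System;
using System.Collections.Generic;

public static class LetterCounter
{
    // Counts each non-whitespace character of the input, ignoring case.
    // (LetterCounter and CountLetters are hypothetical names for this sketch.)
    public static SortedDictionary<char, int> CountLetters(string input)
    {
        var counts = new SortedDictionary<char, int>();
        foreach (char letter in input.ToLower())
        {
            if (char.IsWhiteSpace(letter))
                continue; // skip spaces, tabs and line breaks

            // current stays 0 when the key is not in the dictionary yet
            counts.TryGetValue(letter, out int current);
            counts[letter] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountLetters("The quick brown fox jumps over the lazy dog");
        foreach (KeyValuePair<char, int> pair in counts)
            Console.WriteLine("{0,-12}{1,-12}", pair.Key, pair.Value);
        Console.WriteLine("size: {0}", counts.Count);
    }
}
```

Because the loop walks over chars rather than the strings produced by Regex.Split(input, ""), the empty strings at the start and end of the split result never enter the dictionary.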

Since a string is already a sequence of characters, you don't need Regex.Split; just use:
foreach (var letter in input)
Then change your if statement to:
foreach (var letter in input)
{
if (!char.IsWhiteSpace(letter))
{
var letterKey = char.ToLower(letter).ToString(); // lowercase, as the question requires
if (dictionary.ContainsKey(letterKey))
{
++dictionary[letterKey];
}
else
dictionary.Add(letterKey, 1);
}
}
Then it should exclude both the empty strings and the whitespace. I suspect you get some empty strings after Regex.Split, and since you are checking only for a single space you get unexpected results. If you work with chars then you don't need to worry about empty strings.

Related

Replacing first 16 digits in a string with Regex.Replace

I'm trying to replace only the first 16 digits of a string with Regex. I want it replaced with "*". I need to take this string:
"Request=Credit Card.Auth
Only&Version=4022&HD.Network_Status_Byte=*&HD.Application_ID=TZAHSK!&HD.Terminal_ID=12991kakajsjas&HD.Device_Tag=000123&07.POS_Entry_Capability=1&07.PIN_Entry_Capability=0&07.CAT_Indicator=0&07.Terminal_Type=4&07.Account_Entry_Mode=1&07.Partial_Auth_Indicator=0&07.Account_Card_Number=4242424242424242&07.Account_Expiry=1024&07.Transaction_Amount=142931&07.Association_Token_Indicator=0&17.CVV=200&17.Street_Address=123
Road SW&17.Postal_Zip_Code=90210&17.Invoice_Number=INV19291"
And replace the credit card number with an asterisk, which is why I say the first 16 digits, as that is how many digits are in a credit card. I am first splitting the string where there is a "." and then checking if it contains "card" and "number". Then if it finds it I want to replace the first 16 numbers with "*"
This is what I've done:
public void MaskData(string input)
{
if (input.Contains("."))
{
string[] userInput = input.Split('.');
foreach (string uInput in userInput)
{
string lowerCaseInput = uInput.ToLower();
string containsCard = "card";
string containsNumber = "number";
if (lowerCaseInput.Contains(containsCard) && lowerCaseInput.Contains(containsNumber))
{
tbStoreInput.Text += Regex.Replace(lowerCaseInput, @"[0-9]", "*") + Environment.NewLine;
}
else
{
tbStoreInput.Text += lowerCaseInput + Environment.NewLine;
}
}
}
}
I am aware that the Regex is wrong, but I'm not sure how to get only the first 16, as right now it's putting asterisks across the entire line, as seen here:
"account_card_number=****************&**"
I don't want it to show the asterisks after the "&".
Same answer as in the comments, but explained.
Your regex pattern "[0-9]" matches a single digit, so each individual digit,
including the digits after the &, is a match and gets replaced.
What you want is a quantifier that restricts the match to a fixed number of characters, i.e. 16. Your regex then becomes "[0-9]{16}", and the replacement string must contain 16 asterisks, since the whole 16-digit run is now replaced as a single match.
Disclaimer
My answer is purposely broader than what is asked by OP but I saw it as an opportunity to raise awareness of other tools that are available in C# (which are objects).
String replacement
Regex is not the only tool available to replace a simple string by another. Instead of
Regex.Replace(lowerCaseInput, @"[0-9]{16}", "****************")
it can also be
new StringBuilder()
.Append(lowerCaseInput.Substring(0, 20)) // "account_card_number=" is 20 characters long
.Append(new string('*', 16))
.Append(lowerCaseInput.Substring(36)) // everything after the 16 digits
.ToString();
Shifting from procedural to object
Now the real meat comes in the possibility to encapsulate the logic into an object which holds a kind of string representation of a dictionary (entries being separated by '.' while keys and values are separated by '=').
The only behavior this object has is to give back a string representation of the initial input but with some value (1 in your case) masked to user (I assume for some security reason).
public sealed class CreditCardRequest
{
private readonly string _input;
public CreditCardRequest(string input) => _input = input;
public static implicit operator string(CreditCardRequest request) => request.ToString();
public override string ToString()
{
var entries = _input.Split(".", StringSplitOptions.RemoveEmptyEntries)
.Select(entry => entry.Split("="))
.ToDictionary(kv => kv[0].ToLower(), kv =>
{
if (kv[0] == "Account_Card_Number")
{
return new StringBuilder()
.Append(new string('*', 16))
.Append(kv[1].Substring(16)) // keep whatever follows the 16 digits
.ToString();
}
else
{
return kv[1];
}
});
var output = new StringBuilder();
foreach (var kv in entries)
{
output.AppendFormat("{0}={1}{2}", kv.Key, kv.Value, Environment.NewLine);
}
return output.ToString();
}
}
Usage then becomes:
tbStoreInput.Text = new CreditCardRequest(input);
The concerns of your code are now independent of each other (the rule to parse the input is no longer tied to a UI component) and the implementation details are hidden.
You can even decide to use Regex in CreditCardRequest.ToString() if you wish to; the UI won't ever notice the change.
The ToString method would then become:
public override string ToString()
{
var output = new StringBuilder();
if (_input.Contains("."))
{
foreach (string uInput in _input.Split('.'))
{
if (uInput.StartsWith("Account_Card_Number"))
{
output.AppendLine(Regex.Replace(uInput.ToLower(), @"[0-9]{16}", "****************"));
}
else
{
output.AppendLine(uInput.ToLower());
}
}
}
return output.ToString();
}
You can match the 16 digits after the account number key and replace them with 16 asterisks:
(?<=\baccount_card_number=)[0-9]{16}\b
Or you can use a capture group and use that group in the replacement like $1****************
\b(account_card_number=)[0-9]{16}\b
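A minimal sketch of the capture-group variant in C# might look like this. The class name CardMasker and the method MaskCardNumber are invented for the example, and RegexOptions.IgnoreCase is an assumption added so that the input does not have to be lowercased first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CardMasker
{
    // Keeps the "account_card_number=" prefix (group 1) and replaces the
    // 16 digits that follow it with 16 asterisks.
    // (CardMasker and MaskCardNumber are hypothetical names for this sketch.)
    public static string MaskCardNumber(string input)
    {
        return Regex.Replace(
            input,
            @"\b(account_card_number=)[0-9]{16}\b",
            "$1****************",
            RegexOptions.IgnoreCase); // assumption: match the key in any casing
    }

    public static void Main()
    {
        string sample = "07.Account_Card_Number=4242424242424242&07.Account_Expiry=1024";
        Console.WriteLine(MaskCardNumber(sample));
    }
}
```

Since the pattern is anchored to the key name, digits elsewhere in the request (the expiry, the amount, the CVV) are left alone, and the string does not need to be split on '.' at all.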

Best way to take an intersection of more than two hashsets in C#, when we do not know beforehand how many hashsets there are

I am making a boolean retrieval system for a large number of documents. I have built a dictionary of hashsets in which the keys are the terms and each hashset contains the ids of the documents in which the term was found.
When I want to search for a single word, I simply look the word up in the dictionary and print out the corresponding hashset.
But I also want to search for sentences. In that case I split the query into individual words and look each one up in the dictionary, which returns one hashset per word. I then want to take the intersection of these hashsets so that I can return the ids of the documents that contain all the words in the query.
My question is: what is the best way to take the intersection of these hashsets?
Currently I put the hashsets into a list and intersect them two at a time: the first two, then that result with the third, and so on.
This is the code
Dictionary<string, HashSet<string>> dt = new Dictionary<string, HashSet<string>>();//assume it is filled with data...
while (true)
{
Console.WriteLine("\n\n\nEnter the query you want to search");
string inp = Console.ReadLine();
string[] words = inp.Split(new Char[] { ' ', ',', '.', ':', '?', '!', '\t' });
List<HashSet<string>> outparr = new List<HashSet<string>>();
foreach(string w in words)
{
HashSet<string> outp = new HashSet<string>();
if (dt.TryGetValue(w, out outp))
{
outparr.Add(outp);
Console.WriteLine("Found {0} documents.", outp.Count);
foreach (string s in outp)
{
Console.WriteLine(s);
}
}
}
HashSet<string> temp = outparr.First();
foreach(HashSet<string> hs in outparr)
{
temp = new HashSet<string>(temp.Intersect(hs));
}
Console.WriteLine("Output After Intersection:");
Console.WriteLine("Found {0} documents: ", temp.Count);
foreach(string s in temp)
{
Console.WriteLine(s);
}
}
IntersectWith is a good approach. Like this:
HashSet<string> res = null;
HashSet<string> outdictinary = null;
foreach(string w in words)
{
if (dt.TryGetValue(w, out outdictinary))
{
if( res==null)
res = new HashSet<string>(outdictinary, outdictinary.Comparer);
else
{
if (res.Count==0)
break;
res.IntersectWith(outdictinary);
}
}
}
if (res == null) res = new HashSet<string>();
Console.WriteLine("Output After Intersection:");
Console.WriteLine("Found {0} documents: ", res.Count);
foreach(string s in res)
{
Console.WriteLine(s);
}
The principle that you are using is sound, but you can tweak it a bit.
By sorting the hash sets on size, you can start with the smallest one, that way you can minimise the number of comparisons.
Instead of using the IEnumerable<>.Intersect method you can do the same thing in a loop, but using the fact that you already have a hash set. Checking if a value exists in a hash set is very fast, so you can just loop through the items in the smallest set and look for matching values in the next set, and put them in a new set.
In the loop you can skip the first item as you start with that. You don't need to intersect it with itself.
outparr = outparr.OrderBy(o => o.Count).ToList();
HashSet<string> combined = outparr[0];
foreach(HashSet<string> hs in outparr.Skip(1)) {
HashSet<string> temp = new HashSet<string>();
foreach (string s in combined) {
if (hs.Contains(s)) {
temp.Add(s);
}
}
combined = temp;
}
To answer your question: it's possible that at some point you'll find a set of documents that contains words a, b and c and another set that contains only other words in your query, so the intersection can become empty after a few iterations. You can check for this and break out of the foreach.
Now, IMHO it doesn't make sense to do that intersection, because usually a search result should contain multiple files ordered by relevance.
That is also much easier, because you already have a list of files containing each word. From the hashsets obtained for each word, count the occurrences of each file id and return a limited number of ids, ordered descending by the number of occurrences.
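The ranking idea could be sketched roughly as follows. The names RankedSearch, Rank and maxResults are hypothetical, and the tie-break on document id is an assumption added only to make the order deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankedSearch
{
    // Returns up to maxResults document ids, ordered by how many of the
    // query words each document contains (most matches first).
    // (RankedSearch, Rank and maxResults are hypothetical names.)
    public static List<string> Rank(
        Dictionary<string, HashSet<string>> index,
        IEnumerable<string> queryWords,
        int maxResults)
    {
        var hits = new Dictionary<string, int>();
        foreach (string word in queryWords.Distinct())
        {
            if (!index.TryGetValue(word, out HashSet<string> docs))
                continue; // word does not occur in any document

            foreach (string doc in docs)
            {
                hits.TryGetValue(doc, out int count); // 0 when doc is new
                hits[doc] = count + 1;
            }
        }

        return hits
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal) // assumed tie-break on id
            .Take(maxResults)
            .Select(pair => pair.Key)
            .ToList();
    }

    public static void Main()
    {
        var index = new Dictionary<string, HashSet<string>>
        {
            ["quick"] = new HashSet<string> { "doc1", "doc2" },
            ["brown"] = new HashSet<string> { "doc2", "doc3" },
            ["fox"]   = new HashSet<string> { "doc2" },
        };
        foreach (string id in Rank(index, new[] { "quick", "brown", "fox" }, 10))
            Console.WriteLine(id);
    }
}
```

A document that contains every query word ends up with a count equal to the number of distinct words found, so the strict intersection is still recoverable by filtering on that count.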

Searching if an array of strings contains an array of chars

I'm currently fiddling with a little project: a Countdown-type game (the TV show).
Currently, the program allows the user to pick a vowel or consonant up to a limit of 9 letters and then asks them to input the longest word they can think of using these 9 letters.
I have a large text file acting as a dictionary, which I search using the user's input to check whether the word they entered is valid. My problem is that I then want to search my dictionary for the longest word that can be made from the nine letters, but I just can't seem to find a way to implement it.
So far I've tried putting every word into an array and searching through each element to check if it contains the letters, but this won't cover me if the longest word that can be made out of the 9 letters is an 8-letter word. Any ideas?
Currently I have this (it is under the submit button on the form; sorry for not providing code or mentioning that it's a Windows Forms application):
StreamReader textFile = new StreamReader("C:/Eclipse/Personal Projects/Local_Projects/Projects/CountDown/WindowsFormsApplication1/wordlist.txt");
int counter1 = 0;
String letterlist = (txtLetter1.Text + txtLetter2.Text + txtLetter3.Text + txtLetter4.Text + txtLetter5.Text + txtLetter6.Text + txtLetter7.Text + txtLetter8.Text + txtLetter9.Text); // stores the letters into a string
char[] letters = letterlist.ToCharArray(); // reads the letters into a char array
string[] line = File.ReadAllLines("C:/Eclipse/Personal Projects/Local_Projects/Projects/CountDown/WindowsFormsApplication1/wordlist.txt"); // reads every line in the word file into a string array (there is a new word on everyline, and theres 144k words, i assume this will be a big performance hit but i've never done anything like this before so im not sure ?)
line.Any(x => line.Contains(x)); // just playing with linq, i've no idea what im doing though as i've never used before
for (int i = 0; i < line.Length; i++)// a loop that loops for every word in the string array
// if (line.Contains(letters)) //checks if a word contains the letters in the char array(this is where it gets hazy if i went this way, i'd planned on only using words witha letter length > 4, adding any words found to another text file and either finding the longest word then in this text file or keeping a running longest word i.e. while looping i find a word with 7 letters, this is now the longest word, i then go to the next word and it has 8 of our letters, i now set the longest word to this)
counter1++;
if (counter1 > 4)
txtLongest.Text += line + Environment.NewLine;
Mike's code:
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args) {
var letters = args[0];
var wordList = new List<string> { "abcbca", "bca", "def" }; // dictionary
var results = from string word in wordList // makes every word in dictionary into a seperate string
where IsValidAnswer(word, letters) // calls isvalid method
orderby word.Length descending // sorts the word with most letters to top
select word; // selects that word
foreach (var result in results) {
Console.WriteLine(result); // outputs the word
}
}
private static bool IsValidAnswer(string word, string letters) {
foreach (var letter in word) {
if (letters.IndexOf(letter) == -1) { // checks if theres letters in the word
return false;
}
letters = letters.Remove(letters.IndexOf(letter), 1);
}
return true;
}
}
Here's an answer I knocked together in a couple of minutes which should do what you want. As others have said, this problem is complex and so the algorithm is going to be slow. The LINQ query evaluates each string in the dictionary, checking whether the supplied letters can be used to produce said word.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args) {
var letters = args[0];
var wordList = new List<string> { "abcbca", "bca", "def" };
var results = from string word in wordList
where IsValidAnswer(word, letters)
orderby word.Length descending
select word;
foreach (var result in results) {
Console.WriteLine(result);
}
}
private static bool IsValidAnswer(string word, string letters) {
foreach (var letter in word) {
if (letters.IndexOf(letter) == -1) {
return false;
}
letters = letters.Remove(letters.IndexOf(letter), 1);
}
return true;
}
}
So where are you getting stuck? Start with the slow brute-force method and just find all the words that contain all the characters. Then order the words by length to get the longest. If you don't want to return a word that is shorter than the number of characters being sought (which I guess is only an issue if there are duplicate characters???), then add a test and eliminate that case.
I've had some more thoughts about this. I think the way to do it efficiently is by preprocessing the dictionary, ordering the letters in each word in alphabetical order and ordering the words in the list alphabetically too (you'll probably have to use some sort of multimap structure to store the original word and the sorted word).
Once you've done that you can much more efficiently find the words that can be generated from your pool of letters. I'll come back and flesh out an algorithm for doing this later, if someone else doesn't beat me to it.
Step 1: Construct a trie in which each word is stored under its letters sorted alphabetically.
Example: EACH is sorted to ACEH and stored as A->C->E->H->(EACH, ACHE, ..) in the trie (ACHE is an anagram of EACH).
Step 2: Sort the input letters and find the longest word corresponding to that set of letters in the trie.
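As a simpler stand-in for the trie, the same preprocessing idea can be sketched with a plain dictionary keyed by each word's sorted letters, looking up every subset of the picked letters (512 subsets for 9 letters). This is a swapped-in technique rather than the trie described above, and all names (LongestWordFinder, BuildIndex, FindLongest) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LongestWordFinder
{
    // Sorts the letters of a word so that all anagrams share one key.
    private static string SortLetters(string word)
    {
        char[] chars = word.ToLowerInvariant().ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Preprocessing step: group dictionary words by their sorted letters.
    public static Dictionary<string, List<string>> BuildIndex(IEnumerable<string> words)
    {
        var index = new Dictionary<string, List<string>>();
        foreach (string word in words)
        {
            string key = SortLetters(word);
            if (!index.TryGetValue(key, out List<string> bucket))
            {
                bucket = new List<string>();
                index[key] = bucket;
            }
            bucket.Add(word);
        }
        return index;
    }

    // Tries every non-empty subset of the picked letters and returns the
    // longest words whose sorted letters equal one of those subsets.
    public static List<string> FindLongest(Dictionary<string, List<string>> index, string letters)
    {
        string sorted = SortLetters(letters);
        var best = new List<string>();
        int bestLength = 0;
        for (int mask = 1; mask < (1 << sorted.Length); mask++)
        {
            // Keep the letters whose bit is set; order is preserved, so the key stays sorted.
            string key = new string(sorted.Where((c, i) => (mask & (1 << i)) != 0).ToArray());
            if (key.Length < bestLength || !index.TryGetValue(key, out List<string> bucket))
                continue;
            if (key.Length > bestLength)
            {
                best.Clear();
                bestLength = key.Length;
            }
            foreach (string word in bucket)
                if (!best.Contains(word)) // duplicate letters can yield the same key twice
                    best.Add(word);
        }
        return best;
    }

    public static void Main()
    {
        var index = BuildIndex(new[] { "each", "ache", "cat", "teach", "dog" });
        foreach (string word in FindLongest(index, "tacehxyzq"))
            Console.WriteLine(word);
    }
}
```

The index is built once for the whole word list, after which each round of the game costs a few hundred dictionary lookups instead of a scan over all 144k words.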
Have you tried implementing something like this? It would be great to see your code you have tried.
string[] strArray = {"ABCDEFG", "HIJKLMNOP"};
string findThisString = "JKL";
int strNumber;
int strIndex = 0;
for (strNumber = 0; strNumber < strArray.Length; strNumber++)
{
strIndex = strArray[strNumber].IndexOf(findThisString);
if (strIndex >= 0)
break;
}
System.Console.WriteLine("String number: {0}\nString index: {1}",
strNumber, strIndex);
This should do the job:
private static void Main()
{
char[] picked_char = {'r', 'a', 'j'};
string[] dictionary = new[] {"rajan", "rajm", "rajnujaman", "rahim", "ranjan"};
var words = dictionary.Where(word => picked_char.All(word.Contains)).OrderByDescending(word => word.Length);
foreach (string needed_words in words)
{
Console.WriteLine(needed_words);
}
}
Output :
rajnujaman
ranjan
rajan
rajm

Finding the number of occurrences of strings in a specific format in a given text

I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The RegEx for getting these words which I've written is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?
Regex is perfect for this job - using Replace with a match evaluator:
This example is not tested nor compiled:
public class Fix
{
public static String Execute(string largeText)
{
return Regex.Replace(largeText, @"^(\w{4,}):", new Fix().Evaluator, RegexOptions.Multiline);
}
private Dictionary<String, int> counters = new Dictionary<String, int>();
private static String[] numbers = {"ONE", "TWO", "THREE", /* ... */};
public String Evaluator(Match m)
{
String word = m.Groups[1].Value;
int count;
if (!counters.TryGetValue(word, out count))
count = 0;
count++;
counters[word] = count;
return word + "_" + numbers[count-1] + ":";
}
}
This should return what you requested when calling:
result = Fix.Execute(largeText);
I think you can do this with Regex.Replace(string, string, MatchEvaluator) and a dictionary.
static Dictionary<string, int> wordCount = new Dictionary<string, int>();
static string AppendIndex(Match m)
{
string word = m.Value.TrimEnd(':'); // matched text without the trailing colon
if (wordCount.ContainsKey(word))
wordCount[word] = wordCount[word] + 1;
else
wordCount.Add(word, 1);
return word + "_" + wordCount[word] + ":"; // in the format: word_1:, word_2:
}
string inputText = "....";
string regexText = @"";
static void Main()
{
string text = "....";
// RegexOptions.Multiline lets ^ match at the start of every line
string result = Regex.Replace(text, @"^\b[A-Za-z0-9_]{4,}\b:",
new MatchEvaluator(AppendIndex), RegexOptions.Multiline);
}
see this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx
If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.ContainsKey(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, then create another dictionary for these, so that it maps 1 to "one", 2 to "two", and so on. Maybe there is another solution for that.
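The steps above can be sketched without regex roughly like this. The class name OccurrenceTagger, the method Tag and the Ordinals table (including its "FOUR" and "FIVE" entries) are assumptions made for the example; the sketch falls back to the plain number once the table runs out.

```csharp
using System;
using System.Collections.Generic;

public static class OccurrenceTagger
{
    // Hypothetical lookup table; extend it to cover the highest count expected.
    private static readonly string[] Ordinals = { "ONE", "TWO", "THREE", "FOUR", "FIVE" };

    // True when s consists only of the characters in [A-Za-z0-9_].
    private static bool IsWord(string s)
    {
        foreach (char c in s)
        {
            bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                   || (c >= '0' && c <= '9') || c == '_';
            if (!ok) return false;
        }
        return true;
    }

    // Appends the occurrence number to every "word:" found at the start of a line.
    public static string Tag(string largeText)
    {
        var counts = new Dictionary<string, int>();
        string[] lines = largeText.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon < 4)
                continue; // no colon, or fewer than four characters before it

            string word = lines[i].Substring(0, colon);
            if (!IsWord(word))
                continue;

            counts.TryGetValue(word, out int seen); // 0 on first sight
            seen++;
            counts[word] = seen;

            string suffix = seen <= Ordinals.Length ? Ordinals[seen - 1] : seen.ToString();
            lines[i] = word + "_" + suffix + lines[i].Substring(colon);
        }
        return string.Join("\n", lines);
    }

    public static void Main()
    {
        Console.WriteLine(Tag("word:\nTEST:\nword:\nTEST:\nTEST: // random text"));
    }
}
```

Everything after the colon is copied through unchanged, so trailing text on the same line survives.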
Look at this example (I know it's not perfect and not so nice).
Let's leave aside the exact argument for the Split function; I think it can help.
static void Main(string[] args)
{
string a = "word:word:test:-1+234=567:test:test:";
string[] tks = a.Split(':');
Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");
var res = from x in tks
where re.Matches(x).Count > 0
select x + DecodeNO(tks.Count(y=>y.Equals(x)));
foreach (var item in res)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static string DecodeNO(int n)
{
switch (n)
{
case 1:
return "_one";
case 2:
return "_two";
case 3:
return "_three";
}
return "";
}

Split string into array then loop, in C#

I have Googled this a LOT but my C# skills are pretty terrible and I just can't see why this isn't working.
I have a string which comes from a session object, which I don't have any control over setting. The string contains some sentences separated by six underscores. e.g.:
Sentence number one______Sentence number two______Sentence number three etc
I want to split this string by the six underscores and return each item in the resultant array.
Here's the code I have:
string itemsPlanner = HttpContext.Current.Session["itemsPlanner"].ToString();
string[] arrItemsPlanner = itemsPlanner.Split(new string[] { "______" }, StringSplitOptions.None);
foreach (string i in arrItemsPlanner)
{
newItemsPlanner += "debug1: " + i; //This returns what looks like a number, as I'd expect, starting at zero and iterating by one each loop.
int itemNumber;
try
{
itemNumber = Convert.ToInt32(i);
string sentence = arrItemsPlanner[itemNumber].ToString();
}
catch (FormatException e)
{
return "Input string is not a sequence of digits.";
}
catch (OverflowException e)
{
return "The number cannot fit in an Int32.";
}
finally
{
return "Fail!"
}
}
Whenever I run this, the session is being retrieved successfully but the line which says: itemNumber = Convert.ToInt32(i); fails every time and I get an error saying "Input string is not a sequence of digits."
Can anyone point me in the right direction with this please?
Many thanks!
If you just want to get each sentence and do something with it, this will do the trick:
string itemsPlanner = HttpContext.Current.Session["itemsPlanner"].ToString();
string[] arrItemsPlanner = itemsPlanner.Split(new string[] { "______" }, StringSplitOptions.None);
foreach (string i in arrItemsPlanner)
{
// Do something with each sentence
}
You can split over a string as well as char (or char[]). In the foreach 'i' will be the value of the sentence, so you can concatenate it or process it or do whatever :)
If I've misunderstood, my apologies. I hope that helps :)
In your case i is not a number; it's the actual element in the array. A foreach loop has no index variable; i only gives you access to the element currently being iterated.
So on the first loop iteration i is Sentence number one, then Sentence number two.
If you want the number, you have to use a for loop instead.
So something like this
for( int i = 0; i < arrItemsPlanner.Length; i++ ){
//on first iteration here
//i is 0
//and arrItemsPlanner[i] is "Sentence number one"
}
Hope it helps.
From your example i does not contain a valid integer number, thus Convert.ToInt32 fails. The foreach loop sets i with the current item in the sentences array, so basically i always contains one of the sentences in your main string. If you want i to be the index in the array, use a for loop.
Example from MSDN.
string words = "This is a list of words______with a bit of punctuation" +
"______a tab character.";
string [] split = words.Split(new Char [] {'_'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in split) {
if (s.Trim() != "")
Console.WriteLine(s);
}
Do you need to trim your string before converting it to a number? If that's not the issue, you may want to use Int32.TryParse().
In your sample code foreach (string i in arrItemsPlanner), 'i' gets the string values of arrItemsPlanner one by one.
For example, on the first iteration it will hold 'Sentence number one', which is obviously not a valid int, hence your conversion fails.
i only contains one of the string fragments, which will be: Sentence number one, Sentence number two and Sentence number three. If you want it to contain an int representing the index, use:
1) a for loop
2) an int defined before your foreach and increase it (myInt++) in the foreach code !
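Both options could look roughly like this. The sample string is taken from the question; the class name SentenceSplitter and the method SplitSentences are made up for the sketch.

```csharp
using System;

public static class SentenceSplitter
{
    // Splits on the six-underscore separator using the string[] overload of Split.
    // (SentenceSplitter and SplitSentences are hypothetical names.)
    public static string[] SplitSentences(string itemsPlanner)
    {
        return itemsPlanner.Split(new[] { "______" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string[] sentences = SplitSentences(
            "Sentence number one______Sentence number two______Sentence number three");

        // Option 1: a for loop, where i is the index.
        for (int i = 0; i < sentences.Length; i++)
            Console.WriteLine("{0}: {1}", i, sentences[i]);

        // Option 2: foreach with a counter maintained by hand.
        int index = 0;
        foreach (string sentence in sentences)
        {
            Console.WriteLine("{0}: {1}", index, sentence);
            index++;
        }
    }
}
```

In both loops the sentence itself is already a string, so no Convert.ToInt32 call is needed anywhere.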
