I'm trying to find a regex pattern to match a word with some given characters. But each character should be used only once. For example if I'm given "yrarbil" (library backwards), it should match these:
library
rar
lib
rarlib
But it should not match the following
libraryy ("y" is used more times than given)
libraries ("i" is used more times than given, and also "es" are not given at all)
I've searched all around but best I could find was code to match a word but the same character is used more than the amount of times it was given. Thank you.
P.S: If this can't be done in regex (I'm a noob at it as you can see) what would be the best way to match a word like this programmatically?
"library" is a tricky example because it contains the letter "r" twice, but in my opinion the problem is still solvable.
Simply create a map<char, int> that stores the count of each character in the pattern. Then build a second map<char, int> for the word to check, holding the count of each of its characters. Finally, iterate over the second map: if any character has a higher count than the same character in the pattern map, or does not appear in the pattern map at all, the word does not match.
As requested, here is the code in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static bool Match(string pattern, string toMatch)
{
Dictionary<char, int> patternMap = new Dictionary<char, int>();
Dictionary<char, int> toMatchMap = new Dictionary<char, int>();
foreach (char ch in pattern)
{
if (patternMap.ContainsKey(ch))
++patternMap[ch];
else
patternMap[ch] = 1;
}
foreach (char ch in toMatch)
{
if (toMatchMap.ContainsKey(ch))
++toMatchMap[ch];
else
toMatchMap[ch] = 1;
}
foreach (var item in toMatchMap)
{
if (!patternMap.ContainsKey(item.Key) || patternMap[item.Key] < item.Value)
return false;
}
return true;
}
static void Main(string[] args)
{
string pattern = "library";
string[] test = { "lib", "rarlib", "rarrlib", "ll" };
foreach (var item in test)
{
if(Match(pattern, item))
Console.WriteLine("Match item : {0}", item);
else
Console.WriteLine("Failed item : {0}", item);
}
Console.ReadKey();
/*
Match item : lib
Match item : rarlib
Failed item : rarrlib
Failed item : ll
*/
}
}
}
A regex won't work for that. A solution would be to simply count the characters of your list.
For example in JavaScript:
function count(str){
return str.split('').reduce(function(m,c){
m[c] = (m[c]||0)+1;
return m;
},{})
}
function check(str, reference){
var ms = count(str), mr = count(reference);
for (var k in ms) {
if (!(ms[k]<=mr[k])) return false;
}
return true;
}
// what follows is only for demonstration in a snippet
$('button').click(function(){
$('#r').text(check($('#a').val(), "library") ? "OK":"NOT OK");
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<input id=a value="rarlib">
<button>CHECK</button>
<div id=r></div>
I do not understand why you want to do this with a regexp when a very simple and straightforward solution is available.
You just count how many times each letter appears in the given word and in the word you are testing. Then you check that each letter in the tested word appears no more times than in the given word.
for ch in given_word
    cnt[ch]++
for ch in test_word
    cnt[ch]--
for ch = 'a'..'z'
    if cnt[ch] < 0
        answer is no
if cnt[ch] >= 0 for all letters
    answer is yes
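The pseudocode above could be sketched in C# roughly as follows. This is only an illustration: the class and method names (LetterPool, CanBuild) are made up here, and a Dictionary is used instead of a fixed 'a'..'z' array so that any character is accepted. It also stops as soon as a count would go negative rather than checking all counts at the end.

```csharp
using System;
using System.Collections.Generic;

static class LetterPool
{
    // Hypothetical helper: true when every character of testWord is
    // available in givenWord, counting repeated characters.
    public static bool CanBuild(string givenWord, string testWord)
    {
        var cnt = new Dictionary<char, int>();

        // Credit one unit for every character of the given word.
        foreach (char ch in givenWord)
        {
            int n;
            cnt.TryGetValue(ch, out n); // n is 0 when ch is not present yet
            cnt[ch] = n + 1;
        }

        // Spend one unit per character of the tested word.
        foreach (char ch in testWord)
        {
            int n;
            cnt.TryGetValue(ch, out n);
            if (n == 0) return false;   // the count would drop below zero
            cnt[ch] = n - 1;
        }
        return true;
    }

    static void Main()
    {
        string[] tests = { "library", "rar", "lib", "rarlib", "libraryy", "libraries" };
        foreach (string w in tests)
            Console.WriteLine("{0}: {1}", w, CanBuild("yrarbil", w));
    }
}
```

With the examples from the question, the first four words should print True and the last two False.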
Related
I'm trying to refresh my knowledge of C# and came across this problem:
Have the function StringChallenge(str) take the str parameter being passed and return a compressed version of the string using the Run-length encoding algorithm. This algorithm works by taking the occurrence of each repeating character and outputting that number along with a single character of the repeating sequence. For example: "wwwggopp" would return 3w2g1o2p. The string will not contain any numbers, punctuation, or symbols.
and my code is
using System;
using System.Text;
class MainClass {
public static string StringChallenge(string str) {
// code goes here
var newString = new StringBuilder();
var result = new StringBuilder();
foreach (var c in str){
if (newString.Length == 0 || newString[newString.Length - 1] == c){
newString.Append(c);
}
else{
result.Append($"{newString.Length}{newString[0]}");
newString.Clear();
newString.Append(c);
}
}
if (newString.Length > 0){
result.Append($"{newString.Length}{newString[0]}");
}
return result;
}
static void Main() {
// keep this function call here
Console.WriteLine(StringChallenge(Console.ReadLine()));
}
}
please help. thank you!
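For reference, here is a minimal sketch of the run-length encoding described in the challenge text. The names RunLength and Encode are placeholders chosen for this example, not part of the challenge. One thing worth noting is that a StringBuilder is not implicitly convertible to string, so a method declared as returning string has to call ToString() on it.

```csharp
using System;
using System.Text;

static class RunLength
{
    // Encodes each run as <count><char>, e.g. "wwwggopp" -> "3w2g1o2p".
    public static string Encode(string str)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < str.Length)
        {
            // Advance j to the end of the run that starts at position i.
            int j = i;
            while (j < str.Length && str[j] == str[i]) j++;

            result.Append(j - i).Append(str[i]);
            i = j;
        }
        // Convert explicitly; StringBuilder is not a string.
        return result.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Encode("wwwggopp")); // 3w2g1o2p
    }
}
```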
I’m just so close, but my program is still not working properly. I am trying to count how many times a set of words appear in a text file, list those words and their individual count and then give a sum of all the found matched words.
If there are 3 instances of “lorem”, 2 instances of “ipsum”, then the total should be 5.
My sample text file is simply a paragraph of “Lorem ipsum” repeated a few times in a text file.
My problem is that this code I have so far, only counts the first occurrence of each word, even though each word is repeated several times throughout the text file.
I am using a “pay for” parser called “GroupDocs.Parser” that I added through the NuGet package manager. I would prefer not to use a paid for version if possible.
Is there an easier way to do this in C#?
Here’s a screen shot of my desired results.
Here is the full code that I have so far.
using GroupDocs.Parser;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApp5
{
class Program
{
static void Main(string[] args)
{
using (Parser parser = new Parser(@"E:\testdata\loremIpsum.txt"))
{
// Extract a text into the reader
using (TextReader reader = parser.GetText())
{
// Define the search terms.
string[] wordsToMatch = { "Lorem", "ipsum", "amet" };
Dictionary<string, int> stats = new Dictionary<string, int>();
string text = reader.ReadToEnd();
char[] chars = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };
// split words
string[] words = text.Split(chars);
int minWordLength = 2;// to count words having more than 2 characters
// iterate over the word collection to count occurrences
foreach (string word in wordsToMatch)
{
string w = word.Trim().ToLower();
if (w.Length > minWordLength)
{
if (!stats.ContainsKey(w))
{
// add new word to collection
stats.Add(w, 1);
}
else
{
// update word occurrence count
stats[w] += 1;
}
}
}
// order the collection by word count
var orderedStats = stats.OrderByDescending(x => x.Value);
// print occurrence of each word
foreach (var pair in orderedStats)
{
Console.WriteLine("Total occurrences of {0}: {1}", pair.Key, pair.Value);
}
// print total word count
Console.WriteLine("Total word count: {0}", stats.Count);
Console.ReadKey();
}
}
}
}
}
Any suggestions on what I'm doing wrong?
Thanks in advance.
Splitting the entire content of the text file to get a string array of the words is not a good idea because doing so will create a new string object in memory for each word. You can imagine the cost when you deal with big files.
An alternative approach is:
Use the Parallel.ForEach method to read the lines from the text file in parallel.
Use the thread-safe ConcurrentDictionary<TKey,TValue> collection to be accessed by the paralleled threads.
Increment the values of each word (key) by the count of the Regex.Matches Method.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.IO;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
static void Main(string[] args)
{
var file = @"loremIpsum.txt";
var obj = new object();
var wordsToMatch = new ConcurrentDictionary<string, int>();
wordsToMatch.TryAdd("Lorem", 0);
wordsToMatch.TryAdd("ipsum", 0);
wordsToMatch.TryAdd("amet", 0);
Console.WriteLine("Press a key to continue...");
Console.ReadKey();
Parallel.ForEach(File.ReadLines(file),
(line) =>
{
foreach (var word in wordsToMatch.Keys)
lock (obj)
wordsToMatch[word] += Regex.Matches(line, word,
RegexOptions.IgnoreCase).Count;
});
foreach (var kv in wordsToMatch.OrderByDescending(x => x.Value))
Console.WriteLine($"Total occurrences of {kv.Key}: {kv.Value}");
Console.WriteLine($"Total word count: {wordsToMatch.Values.Sum()}");
Console.ReadKey();
}
stats is a dictionary, so stats.Count will only tell you how many distinct words there are. You need to add up all the values in it. Something like stats.Values.Sum().
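To illustrate the difference between Count and Values.Sum(), here is a small sketch that needs no third-party parser. WordTally and its Count method are names invented for this example, and the sample sentence is made up. Note that it walks the words of the text, whereas the loop in the question iterates over wordsToMatch, which would explain why each term is counted only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordTally
{
    // Counts how often each search term occurs in text (case-insensitive).
    public static Dictionary<string, int> Count(string text, string[] wordsToMatch)
    {
        var stats = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string term in wordsToMatch) stats[term] = 0;

        char[] separators = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };

        // Walk the words of the text, not the list of search terms.
        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (stats.ContainsKey(word)) stats[word] += 1;
        }
        return stats;
    }

    static void Main()
    {
        var stats = Count("Lorem ipsum dolor. Lorem ipsum, lorem amet",
                          new[] { "Lorem", "ipsum" });

        Console.WriteLine(stats.Count);        // 2: number of distinct terms
        Console.WriteLine(stats.Values.Sum()); // 5: total occurrences
    }
}
```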
You can replace this code with a LINQ query that uses case-insensitive grouping. Eg:
char[] chars = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };
var text=File.ReadAllText(somePath);
var query=text.Split(chars)
.GroupBy(w=>w,StringComparer.OrdinalIgnoreCase)
.Select(g=>new {word=g.Key,count=g.Count()})
.Where(stat=>stat.count>2)
.OrderByDescending(stat=>stat.count);
At that point you can iterate over the query or copy the results to an array or dictionary with ToArray(), ToList() or ToDictionary().
This isn't the most efficient code - for one thing, the entire file is loaded in memory. One could use File.ReadLines to load and iterate over the lines one by one. LINQ could be used to iterate over the lines as well:
var lines=File.ReadLines(somePath);
var query=lines.SelectMany(line=>line.Split(chars))
.GroupBy(w=>w,StringComparer.OrdinalIgnoreCase)
.Select(g=>new {word=g.Key,count=g.Count()})
.Where(stat=>stat.count>2)
.OrderByDescending(stat=>stat.count);
I'm trying to create a method which takes two parameters, "word" and "input". The aim of the method is to print any word where all of its characters can be found in "input" no more than once (this is why the character is removed if a letter is found).
Not all the letters from "input" must be in "word" - eg, for input = "cacten" and word = "ace", word would be printed, but if word = "aced" then it would not.
However, when I run the program it produces unexpected results (words being longer than "input", containing letters not found in "input"), and have coded the solution several ways all with the same outcome. This has stumped me for hours and I cannot work out what's going wrong. Any and all help will be greatly appreciated, thanks. My full code for the method is written below.
static void Program(string input, string word)
{
int letters = 0;
List<string> remaining = new List<string>();
foreach (char item in input)
{
remaining.Add(item.ToString());
}
input = remaining.ToString();
foreach (char letter in word)
{
string c = letter.ToString();
if (input.Contains(c))
{
letters++;
remaining.Remove(c);
input = remaining.ToString();
}
}
if (letters == word.Length)
{
Console.WriteLine(word);
}
}
Ok so just to go through where you are going wrong.
Firstly, when you assign remaining.ToString() to your input variable, what you actually assign is System.Collections.Generic.List`1[System.String]. Calling ToString on a List just gives you the name of the list's type; it doesn't join all your characters back up. That's probably the main thing that is causing you issues.
Also, you are forcing everything into string types when, a lot of the time, you really don't need to. Because string already implements IEnumerable<char>, you can get your string as a list of chars by just calling myString.ToList().
So there is no need for this:
foreach (char item in input)
{
remaining.Add(item.ToString());
}
Things like string.Contains can take a char as well, so again there is no need to convert to string here:
foreach (char letter in word)
{
string c = letter.ToString();
if (input.Contains(c))
{
letters++;
remaining.Remove(c);
input = remaining.ToString();
}
}
You can just use the letter variable of type char and pass that into Contains, and because remaining is now a List<char> you can remove a char from it.
Again, don't reassign remaining.ToString() back into input. Use string.Join like this:
string.Join(string.Empty, remaining);
As someone else has posted, there are probably better ways of doing this, but I hope that what I've put here helps you understand what was going wrong and helps you learn.
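Putting those points together, the method might look something like the sketch below. It is one possible shape, not the only fix; the names WordCheck and CanSpell are invented for this example, and it returns a bool instead of printing so that it is easier to try out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordCheck
{
    // True when each letter of word can be taken from input at most once.
    public static bool CanSpell(string input, string word)
    {
        // string implements IEnumerable<char>, so ToList() yields a List<char>.
        List<char> remaining = input.ToList();

        foreach (char letter in word)
        {
            // List<char>.Remove returns false when no such letter is left.
            if (!remaining.Remove(letter)) return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(CanSpell("cacten", "ace"));  // True
        Console.WriteLine(CanSpell("cacten", "aced")); // False

        // string.Join rebuilds a string from the characters, unlike ToString().
        Console.WriteLine(string.Join(string.Empty, "cacten".ToList())); // cacten
    }
}
```

Because Remove both checks for the letter and consumes it, there is no need to keep a separate letters counter or to rebuild input inside the loop.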
You can also use Regular Expression which was created for such scenarios.
bool IsMatch(string input, string word)
{
var pattern = string.Format("\\b[{0}]+\\b", input);
var r = new Regex(pattern);
return r.IsMatch(word);
}
I created a sample code for you on DotNetFiddle.
You can check what the pattern does at Regex101. It has a pretty "Explanation" and "Quick Reference" panel.
There are a lot of ways to achieve that, here is a suggestion:
static void Main(string[] args)
{
Func("cacten","ace");
Func("cacten", "aced");
Console.ReadLine();
}
static void Func(string input, string word)
{
bool isMatch = true;
foreach (Char s in word)
{
if (!input.Contains(s.ToString()))
{
isMatch = false;
break;
}
}
// success
if (isMatch)
{
Console.WriteLine(word);
}
// no match
else
{
Console.WriteLine("No Match");
}
}
Not really an answer to your question, but it's always fun to do this sort of thing with LINQ:
static void Print(string input, string word)
{
if (word.All(ch => input.Contains(ch) &&
word.GroupBy(c => c)
.All(g => g.Count() <= input.Count(c => c == g.Key))))
Console.WriteLine(word);
}
Functional programming is all about what you want without all the pesky loops, ifs and what nots... Notice that this code does what you'd do in your head without needing to painfully specify step by step how you'd actually do it:
Make sure all characters in word are in input.
Make sure all characters in word are used at most as many times as they are present in input.
Still, getting the basics right is a must, posted this answer as additional info.
I have a string that looks like this
2,"E2002084700801601390870F"
3,"E2002084700801601390870F"
1,"E2002084700801601390870F"
4,"E2002084700801601390870F"
3,"E2002084700801601390870F"
This is one whole string, you can imagine it being on one row.
And I want to split this in the way they stand right now like this
2,"E2002084700801601390870F"
I cannot change the way it is formatted. So my best bet is to split at every second quotation mark. But I haven't found any good ways to do this. I've tried this https://stackoverflow.com/a/17892392/2914876 but I only get an error about invalid arguments.
Another issue is that this project is running .NET 2.0 so most LINQ functions aren't available.
Thank you.
Try this
var regEx = new Regex(@"\d+\,"".*?""");
var lines = regEx.Matches(txt).OfType<Match>().Select(m => m.Value).ToArray();
Use foreach instead of LINQ Select on .NET 2:
Regex regEx = new Regex(@"\d+\,"".*?""");
foreach (Match m in regEx.Matches(txt))
{
var curLine = m.Value;
}
I see three possibilities, none of them are particularly exciting.
As @dvnrrs suggests, if there's no comma where you have line-breaks, you should be in great shape. Replace ," with something novel. Replace the remaining "s with what you need. Replace the "something novel" with ," to restore them. This is probably the most solid--it solves the problem without much room for bugs.
Iterate through the string looking for the index of the next " from the previous index, and maintain a state machine to decide whether to manipulate it or not.
Split the string on "s and rejoin them in whatever way works the best for your application.
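The first of these possibilities could be sketched as below, using only string.Replace and string.Split so that it does not depend on LINQ. The placeholder character is an assumption (it must never occur in the real data), and QuoteSplitter and SplitRecords are names made up for this example.

```csharp
using System;

static class QuoteSplitter
{
    public static string[] SplitRecords(string data)
    {
        // Assumption: this control character never occurs in the data.
        const string placeholder = "\u0001";

        string marked = data.Replace(",\"", placeholder); // hide the opening quotes
        marked = marked.Replace("\"", "\"\n");            // line break after each closing quote
        marked = marked.Replace(placeholder, ",\"");      // restore the opening quotes

        return marked.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string data = "2,\"E2002084700801601390870F\"3,\"E2002084700801601390870F\"1,\"E2002084700801601390870F\"";
        foreach (string record in SplitRecords(data))
            Console.WriteLine(record);
    }
}
```

Each printed line should then have the form 2,"E2002084700801601390870F".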
I realize regular expressions will handle this but here's a pure 2.0 way to handle as well. It's much more readable and maintainable in my humble opinion.
using System;
using System.Collections.Generic;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
const string data = @"2,""E2002084700801601390870F""3,""E2002084700801601390870F""1,""E2002084700801601390870F""4,""E2002084700801601390870F""3,""E2002084700801601390870F""";
var parsedData = ParseData(data);
foreach (var parsedDatum in parsedData)
{
Console.WriteLine(parsedDatum);
}
Console.ReadLine();
}
private static IEnumerable<string> ParseData(string data)
{
var results = new List<string>();
var split = data.Split(new [] {'"'}, StringSplitOptions.RemoveEmptyEntries);
if (split.Length % 2 != 0)
{
throw new Exception("Data Formatting Error");
}
for (var index = 0; index < split.Length; index += 2)
{
results.Add(string.Format(@"{0}""{1}""", split[index], split[index + 1]));
}
return results;
}
}
}
Currently fiddling with a little project I'm working on, it's a count down type game (the tv show).
Currently, the program allows the user to pick a vowel or consonant to a limit of 9 letters and then asks them to input the longest word they can think of using these 9 letters.
I have a large text file acting as a dictionary, which I search using the user's input to check whether the word they entered is valid. My problem is that I then want to search my dictionary for the longest word that can be made from the nine letters, but I just can't seem to find a way to implement it.
So far I've tried putting every word into an array and searching through each element to check if it contains the letters, but this won't cover me if the longest word that can be made out of the 9 letters is an 8-letter word. Any ideas?
Currently I have this (This is under the submit button on the form, sorry for not providing code or mentioning it's a windows form application):
StreamReader textFile = new StreamReader("C:/Eclipse/Personal Projects/Local_Projects/Projects/CountDown/WindowsFormsApplication1/wordlist.txt");
int counter1 = 0;
String letterlist = (txtLetter1.Text + txtLetter2.Text + txtLetter3.Text + txtLetter4.Text + txtLetter5.Text + txtLetter6.Text + txtLetter7.Text + txtLetter8.Text + txtLetter9.Text); // stores the letters into a string
char[] letters = letterlist.ToCharArray(); // reads the letters into a char array
string[] line = File.ReadAllLines("C:/Eclipse/Personal Projects/Local_Projects/Projects/CountDown/WindowsFormsApplication1/wordlist.txt"); // reads every line in the word file into a string array (there is a new word on everyline, and theres 144k words, i assume this will be a big performance hit but i've never done anything like this before so im not sure ?)
line.Any(x => line.Contains(x)); // just playing with linq, i've no idea what im doing though as i've never used before
for (int i = 0; i < line.Length; i++)// a loop that loops for every word in the string array
// if (line.Contains(letters)) //checks if a word contains the letters in the char array(this is where it gets hazy if i went this way, i'd planned on only using words witha letter length > 4, adding any words found to another text file and either finding the longest word then in this text file or keeping a running longest word i.e. while looping i find a word with 7 letters, this is now the longest word, i then go to the next word and it has 8 of our letters, i now set the longest word to this)
counter1++;
if (counter1 > 4)
txtLongest.Text += line + Environment.NewLine;
Mike's code:
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args) {
var letters = args[0];
var wordList = new List<string> { "abcbca", "bca", "def" }; // dictionary
var results = from string word in wordList // makes every word in dictionary into a seperate string
where IsValidAnswer(word, letters) // calls isvalid method
orderby word.Length descending // sorts the word with most letters to top
select word; // selects that word
foreach (var result in results) {
Console.WriteLine(result); // outputs the word
}
}
private static bool IsValidAnswer(string word, string letters) {
foreach (var letter in word) {
if (letters.IndexOf(letter) == -1) { // checks if theres letters in the word
return false;
}
letters = letters.Remove(letters.IndexOf(letter), 1);
}
return true;
}
}
Here's an answer I knocked together in a couple of minutes which should do what you want. As others have said, this problem is complex and so the algorithm is going to be slow. The LINQ query evaluates each string in the dictionary, checking whether the supplied letters can be used to produce said word.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args) {
var letters = args[0];
var wordList = new List<string> { "abcbca", "bca", "def" };
var results = from string word in wordList
where IsValidAnswer(word, letters)
orderby word.Length descending
select word;
foreach (var result in results) {
Console.WriteLine(result);
}
}
private static bool IsValidAnswer(string word, string letters) {
foreach (var letter in word) {
if (letters.IndexOf(letter) == -1) {
return false;
}
letters = letters.Remove(letters.IndexOf(letter), 1);
}
return true;
}
}
So where are you getting stuck? Start with the slow brute-force method and just find all the words that contain all the characters. Then order the words by length to get the longest. If you don't want to return a word that is shorter than the number of characters being sought (which I guess is only an issue if there are duplicate characters???), then add a test and eliminate that case.
I've had some more thoughts about this. I think the way to do it efficiently is by preprocessing the dictionary, ordering the letters in each word in alphabetical order and ordering the words in the list alphabetically too (you'll probably have to use some sort of multimap structure to store the original word and the sorted word).
Once you've done that you can much more efficiently find the words that can be generated from your pool of letters. I'll come back and flesh out an algorithm for doing this later, if someone else doesn't beat me to it.
Step 1: Construct a trie in which each word is stored under its letters sorted alphabetically.
Example: EACH sorts to ACEH and is stored as A->C->E->H->(EACH, ACHE, ..) in the trie (ACHE is an anagram of EACH).
Step 2: Sort the input letters and find the longest word corresponding to that set of letters in the trie.
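As a rough sketch of the same idea, the example below swaps the trie for a plain Dictionary keyed by each word's sorted letters, and then tries every subset of the picked letters as a key. That is a simplification rather than the trie described above, and all names (LongestWordFinder, BuildIndex, FindLongest) as well as the tiny word list are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class LongestWordFinder
{
    // Sorts the letters of a word, e.g. "each" -> "aceh".
    static string SortLetters(string word)
    {
        char[] chars = word.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Preprocessing: group dictionary words by their sorted letters.
    public static Dictionary<string, List<string>> BuildIndex(IEnumerable<string> words)
    {
        var index = new Dictionary<string, List<string>>();
        foreach (string word in words)
        {
            string key = SortLetters(word);
            List<string> bucket;
            if (!index.TryGetValue(key, out bucket))
            {
                bucket = new List<string>();
                index[key] = bucket;
            }
            bucket.Add(word);
        }
        return index;
    }

    // Tries every non-empty subset of the letters (511 for nine letters)
    // and returns one longest word found, or null if there is none.
    public static string FindLongest(Dictionary<string, List<string>> index, string letters)
    {
        string sorted = SortLetters(letters);
        string best = null;
        for (int mask = 1; mask < (1 << sorted.Length); mask++)
        {
            // Picking characters in index order keeps the key sorted.
            var key = new StringBuilder();
            for (int i = 0; i < sorted.Length; i++)
                if ((mask & (1 << i)) != 0) key.Append(sorted[i]);

            List<string> bucket;
            if ((best == null || key.Length > best.Length)
                && index.TryGetValue(key.ToString(), out bucket))
                best = bucket[0];
        }
        return best;
    }

    static void Main()
    {
        var index = BuildIndex(new[] { "each", "ache", "cat", "teacher" });
        Console.WriteLine(FindLongest(index, "hcaextz")); // each
    }
}
```

The index only has to be built once per dictionary file; each lookup afterwards costs a few hundred dictionary probes instead of a scan over every word.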
Have you tried implementing something like this? It would be great to see your code you have tried.
string[] strArray = {"ABCDEFG", "HIJKLMNOP"};
string findThisString = "JKL";
int strNumber;
int strIndex = 0;
for (strNumber = 0; strNumber < strArray.Length; strNumber++)
{
strIndex = strArray[strNumber].IndexOf(findThisString);
if (strIndex >= 0)
break;
}
System.Console.WriteLine("String number: {0}\nString index: {1}",
strNumber, strIndex);
This must do the job :
private static void Main()
{
char[] picked_char = {'r', 'a', 'j'};
string[] dictionary = new[] {"rajan", "rajm", "rajnujaman", "rahim", "ranjan"};
var words = dictionary.Where(word => picked_char.All(word.Contains)).OrderByDescending(word => word.Length);
foreach (string needed_words in words)
{
Console.WriteLine(needed_words);
}
}
Output :
rajnujaman
ranjan
rajan
rajm