Hello there again, dear friends. I do not for the life of me understand what is going on in this code. I'm trying to implement a dictionary that counts how many times each word appears, ignoring case. The output keeps showing "isthis" and I don't know where it's coming from. How do I fix this?
The exercise is as follows:
Write a program that counts how many times each word from a given
text file words.txt appears in it. The result words should be ordered by
their number of occurrences in the text.
Here is the code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Text.RegularExpressions;
namespace Chapter_18_Question_3
{
class Program
{
static void Main(string[] args)
{
const string path = "words.txt";
string line;
using (var reader = new StreamReader(path))
{
line = reader.ReadToEnd();
}
string text = line.ToLower();
string tmp = Regex.Replace(text, "[^a-zA-Z0-9 ]", "");
string[] newText = tmp.Split(' ');
var table = new SortedDictionary<string, int>();
foreach(var item in newText)
{
if(!table.ContainsKey(item))
{
table.Add(item, 1);
}
else
{
table[item] += 1;
}
}
foreach (var item in table)
{
Console.WriteLine("The word {0} appeared {1} times",
item.Key, item.Value);
}
}
}
}
My text is this:
"This is the TEXT. Text, text, text – THIS TEXT! Is this the text?"
And the output is this
The word appeared 1 times
The word is appeared 1 times
The word isthis appeared 1 times
The word text appeared 6 times
The word the appeared 2 times
The word this appeared 2 times
If I were to guess, I'd say that your file contains a line break (LF or CRLF) that gets replaced by your regex (which only allows letters and spaces).
For instance, if the file contents were:
This
is the text.
The line break between This and is would be removed, leaving you with the text:
Thisis the text.
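If this is the case, you might want to use "[^a-zA-Z0-9 \r\n]" as the pattern instead, and split on '\r' and '\n' as well as on ' ' so that the words on either side of a line break stay separate. Passing StringSplitOptions.RemoveEmptyEntries to Split would also get rid of the empty "word" at the top of your output, which comes from consecutive spaces.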
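A minimal sketch of that idea, assuming it is acceptable to treat every non-alphanumeric character (line breaks and the dash included) as a separator. The class and method names here are made up for illustration, and the sample text has a line break inserted before "Is" to mimic the suspected file contents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class WordCounter
{
    // Hypothetical helper: counts words case-insensitively.
    public static Dictionary<string, int> Count(string text)
    {
        // Replace each run of non-alphanumeric characters with ONE space,
        // so a line break separates words instead of gluing them together.
        string cleaned = Regex.Replace(text.ToLowerInvariant(), "[^a-z0-9]+", " ");

        var counts = new Dictionary<string, int>();
        foreach (string word in cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            counts.TryGetValue(word, out int n); // n stays 0 for a new word
            counts[word] = n + 1;
        }
        return counts;
    }

    static void Main()
    {
        string text = "This is the TEXT. Text, text, text – THIS TEXT!\r\nIs this the text?";
        foreach (var pair in Count(text).OrderByDescending(p => p.Value))
            Console.WriteLine("The word {0} appeared {1} times", pair.Key, pair.Value);
    }
}
```

Ordering by descending count also covers the "ordered by their number of occurrences" part of the exercise, which a SortedDictionary keyed by word does not.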
Related
I’m just so close, but my program is still not working properly. I am trying to count how many times a set of words appear in a text file, list those words and their individual count and then give a sum of all the found matched words.
If there are 3 instances of “lorem”, 2 instances of “ipsum”, then the total should be 5.
My sample text file is simply a paragraph of “Lorem ipsum” repeated a few times in a text file.
My problem is that this code I have so far, only counts the first occurrence of each word, even though each word is repeated several times throughout the text file.
I am using a “pay for” parser called “GroupDocs.Parser” that I added through the NuGet package manager. I would prefer not to use a paid for version if possible.
Is there an easier way to do this in C#?
Here’s a screen shot of my desired results.
Here is the full code that I have so far.
using GroupDocs.Parser;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApp5
{
class Program
{
static void Main(string[] args)
{
using (Parser parser = new Parser(@"E:\testdata\loremIpsum.txt"))
{
// Extract a text into the reader
using (TextReader reader = parser.GetText())
{
// Define the search terms.
string[] wordsToMatch = { "Lorem", "ipsum", "amet" };
Dictionary<string, int> stats = new Dictionary<string, int>();
string text = reader.ReadToEnd();
char[] chars = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };
// split words
string[] words = text.Split(chars);
int minWordLength = 2;// to count words having more than 2 characters
// iterate over the word collection to count occurrences
foreach (string word in wordsToMatch)
{
string w = word.Trim().ToLower();
if (w.Length > minWordLength)
{
if (!stats.ContainsKey(w))
{
// add new word to collection
stats.Add(w, 1);
}
else
{
// update word occurrence count
stats[w] += 1;
}
}
}
// order the collection by word count
var orderedStats = stats.OrderByDescending(x => x.Value);
// print occurrence of each word
foreach (var pair in orderedStats)
{
Console.WriteLine("Total occurrences of {0}: {1}", pair.Key, pair.Value);
}
// print total word count
Console.WriteLine("Total word count: {0}", stats.Count);
Console.ReadKey();
}
}
}
}
}
Any suggestions on what I'm doing wrong?
Thanks in advance.
Splitting the entire content of the text file to get a string array of the words is not a good idea because doing so will create a new string object in memory for each word. You can imagine the cost when you deal with big files.
An alternative approach is:
Use the Parallel.ForEach method to read the lines from the text file in parallel.
Use the thread-safe ConcurrentDictionary<TKey,TValue> collection so that the parallel threads can access it safely.
Increment the value of each word (key) by the number of matches returned by the Regex.Matches method.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.IO;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
static void Main(string[] args)
{
var file = @"loremIpsum.txt";
var obj = new object();
var wordsToMatch = new ConcurrentDictionary<string, int>();
wordsToMatch.TryAdd("Lorem", 0);
wordsToMatch.TryAdd("ipsum", 0);
wordsToMatch.TryAdd("amet", 0);
Console.WriteLine("Press a key to continue...");
Console.ReadKey();
Parallel.ForEach(File.ReadLines(file),
(line) =>
{
foreach (var word in wordsToMatch.Keys)
lock (obj)
wordsToMatch[word] += Regex.Matches(line, word,
RegexOptions.IgnoreCase).Count;
});
foreach (var kv in wordsToMatch.OrderByDescending(x => x.Value))
Console.WriteLine($"Total occurrences of {kv.Key}: {kv.Value}");
Console.WriteLine($"Total word count: {wordsToMatch.Values.Sum()}");
Console.ReadKey();
}
stats is a dictionary, so stats.Count will only tell you how many distinct words there are. You need to add up all the values in it. Something like stats.Values.Sum().
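Here is a rough sketch of the difference between Count and Values.Sum(). It also loops over the words of the text rather than over wordsToMatch, since the loop in the question iterates the three search terms only, which would explain every count being 1. The MatchStats class and the sample text are invented for illustration, and no parser library is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MatchStats
{
    // Hypothetical helper: counts only the words listed in wordsToMatch.
    public static Dictionary<string, int> Count(string text, string[] wordsToMatch)
    {
        char[] separators = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };

        // Case-insensitive keys, each search term starting at zero
        var stats = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string w in wordsToMatch)
            stats[w] = 0;

        // Loop over the words of the TEXT, not over the search terms
        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (stats.ContainsKey(word))
                stats[word] += 1;
        }
        return stats;
    }

    static void Main()
    {
        string text = "Lorem ipsum dolor sit amet. Lorem ipsum, lorem.";
        var stats = Count(text, new[] { "Lorem", "ipsum", "amet" });

        foreach (var pair in stats.OrderByDescending(x => x.Value))
            Console.WriteLine("Total occurrences of {0}: {1}", pair.Key, pair.Value);

        Console.WriteLine("Distinct words: {0}", stats.Count);          // number of keys
        Console.WriteLine("Total word count: {0}", stats.Values.Sum()); // sum of all counts
    }
}
```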
You can replace this code with a LINQ query that uses case-insensitive grouping, e.g.:
char[] chars = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };
var text=File.ReadAllText(somePath);
var query=text.Split(chars)
.GroupBy(w=>w,StringComparer.OrdinalIgnoreCase)
.Select(g=>new {word=g.Key,count=g.Count()})
.Where(stat=>stat.count>2)
.OrderByDescending(stat=>stat.count);
At that point you can iterate over the query or copy the results to an array or dictionary with ToArray(), ToList() or ToDictionary().
This isn't the most efficient code - for one thing, the entire file is loaded in memory. One could use File.ReadLines to load and iterate over the lines one by one. LINQ could be used to iterate over the lines as well:
var lines=File.ReadLines(somePath);
var query=lines.SelectMany(line=>line.Split(chars))
.GroupBy(w=>w,StringComparer.OrdinalIgnoreCase)
.Select(g=>new {word=g.Key,count=g.Count()})
.Where(stat=>stat.count>2)
.OrderByDescending(stat=>stat.count);
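For completeness, a small self-contained sketch of the line-by-line variant that copies the result into a dictionary with ToDictionary(). The WordQuery class is hypothetical, the lines are passed in as a sequence so the example runs without a file (File.ReadLines(somePath) could be passed instead), and the count filter is left out so that every word is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordQuery
{
    static readonly char[] Separators = { ' ', '.', ',', ';', ':', '?', '\n', '\r' };

    // Hypothetical helper: groups words case-insensitively and counts them.
    public static Dictionary<string, int> Count(IEnumerable<string> lines)
    {
        return lines
            // RemoveEmptyEntries avoids counting the empty strings that
            // appear between adjacent separators such as ", "
            .SelectMany(line => line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }

    static void Main()
    {
        var lines = new[] { "Lorem ipsum dolor sit amet.", "lorem IPSUM, Lorem" };
        foreach (var pair in Count(lines).OrderByDescending(p => p.Value))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```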
I need help with text processing. I have code that checks whether a line has an even number of words and then finds every second word in a text file. The problem is that I don't know how to append a string after every second word and print the result.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.IO;
namespace sd
{
class Program
{
const string CFd = "..\\..\\A.txt";
const string CFr = "..\\..\\Rezults.txt";
static void Main(string[] args)
{
Apdoroti(CFd, CFr);
Console.WriteLine();
}
static void Apdoroti(string fd, string fr)
{
string[] lines = File.ReadAllLines(fd, Encoding.GetEncoding(1257));
using (var far = File.CreateText(fr))
{
StringBuilder news = new StringBuilder();
VD(CFd,news);
far.WriteLine(news);
}
}
static void VD(string fv, StringBuilder news)
{
using (StreamReader reader = new StreamReader(fv,
Encoding.GetEncoding(1257)))
{
string[] lines = File.ReadAllLines(fv, Encoding.GetEncoding(1257));
int nrl;
int prad = 1;
foreach (string line in lines)
{
nrl = line.Trim().Split(' ').Count();
string[] parts = line.Split(' ');
if (nrl % 2 == 0)
{
Console.WriteLine(nrl);
for (int i = 0; i < nrl; i += 2)
{
int ind = line.IndexOf(parts[i]);
news.Append(parts[i]);
Console.WriteLine(" {0} ", news);
}
}
}
}
}
}
}
For example, if I have a text like:
"Monster in the Jungle Once upon a time a wise lion lived in jungle.
He was always respected for his intelligence and kindness."
Then it should print out:
"Monster in abb the Jungle abb Once upon abb a time abb a wise abb lion lived abb in jungle.
He was always respected for his intelligence and kindness."
You can do it with a regex replace, using this regex:
@"\w+\s\w+\s"
It matches a word, a space, another word and a space.
Now replace it with:
"$&abb "
How to use:
using System.Text.RegularExpressions;
string text = "Monster in the Jungle Once upon a time a wise lion lived in jungle. He was always respected for his intelligence and kindness.";
Regex regex = new Regex(@"\w+\s\w+\s");
string output = regex.Replace(text, "$&abb ");
Now you will get the desired output.
Edit:
To work with any number of words, you can use:
@"(\w+\s){3}"
where the quantifier (here 3) can be set to whatever you want.
Edit2:
If you don't want to replace numbers:
@"([a-zA-Z]+\s){2}"
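A short sketch of the quantifier version wrapped in a helper, so the group size can be passed in. The RegexInserter class and its parameters are made up for illustration. Note that a final group without trailing whitespace is not matched, so nothing is appended after it.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexInserter
{
    // Hypothetical helper: inserts `marker` after every n words.
    // Assumes `marker` contains no '$', which Regex.Replace would
    // otherwise treat as the start of a substitution.
    public static string InsertAfterEvery(string text, int n, string marker)
    {
        // (\w+\s){n} matches n consecutive "word + whitespace" groups
        var regex = new Regex(@"(\w+\s){" + n + "}");
        // $& stands for the whole match, so the matched words are kept
        return regex.Replace(text, "$&" + marker + " ");
    }

    static void Main()
    {
        Console.WriteLine(InsertAfterEvery("Monster in the Jungle Once upon a time a wise", 2, "abb"));
    }
}
```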
You can use LINQ: first split the line on spaces to get a list of words (as you are already doing), then add the required text after every odd-indexed element, and finally join the array back into a string.
string test = "Monster in the Jungle Once upon a time a wise lion lived in jungle. He was always respected for his intelligence and kindness.";
var words = test.Split(' ');
var wordArray = words.Select((w, i) =>
(i % 2 != 0) ? (w+ " asd ") : (w + " ")
).ToArray();
var res = string.Join("", wordArray);
This can also easily be changed to insert after every n words by changing the modulus. Do remember that array indexes start at 0, though.
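As a sketch of the "every n words" variant, assuming single spaces between words. The WordInserter class and its parameters are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Linq;

static class WordInserter
{
    // Hypothetical helper: appends `marker` after every n-th word.
    public static string InsertAfterEvery(string text, int n, string marker)
    {
        var words = text.Split(' ');
        // i is zero-based, so the n-th, 2n-th, ... words satisfy (i + 1) % n == 0
        var parts = words.Select((w, i) => (i + 1) % n == 0 ? w + " " + marker : w);
        return string.Join(" ", parts);
    }

    static void Main()
    {
        Console.WriteLine(InsertAfterEvery("Monster in the Jungle Once upon a time", 2, "abb"));
    }
}
```

Joining with a single space avoids the trailing space that appending " " to every word would leave. This version also adds the marker after the very last word when the word count is a multiple of n.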
Okay, so I'm creating a hang-man game (Lame, I know, but I gotta' start somewhere). I have successfully pulled ~30 random words from a text file into a variable and can properly display the word in a random order onto the screen (just to test and make sure the variable is obtaining a whole word in random order).
But I need to take that string and break it into single characters in order to 'blank' out the letters to be 'guessed' by the user. I assume an array is the best way to do this - coupled with a while loop that will run while the character != null.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Hangman
{
class Program
{
static void Main(string[] args)
{
String[] myWordArrays = File.ReadAllLines("WordList.txt");
Random randomWord = new Random();
int lineCount = File.ReadLines("WordList.txt").Count();
int activeWord = randomWord.Next(0, lineCount);
/*CharEnumerator activeWordChar = activeWord; --- I have tried this,
but it says "Cannot implicitly convert type 'int' to 'System.CharEnumerator'
--- while redlining "activeWord." */
/*CharEnumerator activeWordChar = activeWord.ToString
-- I have tried this but it says "Cannot convert method group 'ToString' to
non-delegate type 'System.CharEnumerator'. Did you intend to invoke the method?
I also tried moving the declaration of activeWordChar below the 'writeline'
that displays the word properly to the console.
I have even tried to create a Char[] activeWordChar = activeWord.toCharArray; But this doesn't work either.
*/
//I'm using this writeline "the word for this game is: " ONLY to test that the
//system is choosing random word **end comment
Console.WriteLine("The Word for this game is: " + myWordArrays[activeWord]);
//Console.WriteLine("The Characters are like this: " + activeWordChar[]);
//my attempt at printing something, but it doesn't work. :(
Console.ReadLine();
}
}
}
I'm open to references in order to figure it out myself, but I'm kinda' stuck here.
Also, how do I close the file that I've opened so that it can be accessed later on in the program if need be? I've only learned the StreamReader("filename") way of 'variable.Close();' - but that isn't working here.
Edit
And why someone would vote this question down is beyond me. lol
A couple of points here (first of all, you are off to a great start):
You are needlessly re-reading your file to get the line count. You can use myWordArrays.Length to set your lineCount variable
Regarding your question about closing the file, per MSDN File.ReadAllLines() closes the file after it is done reading it, so you are fine there with what you already have.
A string itself can be treated like an array in terms of accessing by index and accessing its Length property. There's also the ability to iterate over it implicitly like so:
foreach (char letter in myWordArrays[activeWord])
{
// provide a blanked-out letter for each char
}
You can access any character in the string by its index, so you can think of string as array of chars:
For example, like this snippet:
string word = "word";
char w1 = word[0];
Console.WriteLine(w1);
You can simplify your code a bit as follows. Previously your activeWord variable was an integer and therefore could not be converted to a character array.
static void Main(string[] args)
{
String[] myWordArrays = File.ReadAllLines("WordList.txt");
Random random = new Random();
string activeWord = myWordArrays[random.Next(myWordArrays.Length)];
char[] chars = activeWord.ToCharArray();
}
However a string in C# can be treated as an enumerable object and therefore you should only be using a character array if you need to mutate parts of the string.
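For the hangman case a mutable char array is arguably a reasonable fit, because the blanked-out display changes as letters are guessed. A minimal sketch under that assumption follows; the HangmanMask class, the Reveal method and the fixed word "banana" are all invented for illustration.

```csharp
using System;

static class HangmanMask
{
    // Hypothetical helper: reveals every occurrence of `guess` in the mask
    // and reports whether the letter was found at all.
    public static bool Reveal(string word, char[] mask, char guess)
    {
        bool found = false;
        for (int i = 0; i < word.Length; i++)
        {
            // Compare case-insensitively, then copy the original letter over
            if (char.ToLowerInvariant(word[i]) == char.ToLowerInvariant(guess))
            {
                mask[i] = word[i];
                found = true;
            }
        }
        return found;
    }

    static void Main()
    {
        string word = "banana"; // stands in for myWordArrays[...]
        char[] mask = new string('_', word.Length).ToCharArray();

        Reveal(word, mask, 'a');
        Console.WriteLine(new string(mask));
    }
}
```

The string itself is only read by index; the array holds the part that changes.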
Thanks to Sven, I was able to figure it out and I was able to add some things onto it as well!! I'm posting this for other newbs to understand from a newb's perspective:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Hangman
{
class Program
{
static void Main(string[] args)
{
printWord();
}
/*I created a method to perform this, so that I can have the program stay open for
either 1) a new game or 2) just to see if I could do it. It works!*/
private static void printWord()
{
String[] myWordArrays = File.ReadAllLines("WordList.txt");
Random randomWord = new Random();
//int lineCount = File.ReadLines("WordList.txt").Count();
//The line below fixed my previous issue of double-reading file
int activeWord = randomWord.Next(0, myWordArrays.Length);
string userSelection = "";
Console.WriteLine("Are you Ready to play Hangman? yes/no: ");
userSelection = Console.ReadLine();
if(userSelection == "yes")
{
/*This runs through the randomly chosen word and prints an underscore in
place of each letter - it does work and this is what fixed my
previous issue - thank you Sven*/
foreach(char letter in myWordArrays[activeWord])
{
Console.Write("_ ");
}
//This prints to the console "Can you guess what this 'varyingLength' letter word is?" - it does work.
Console.WriteLine("Can you guess what this "+ myWordArrays[activeWord].Length +" letter word is?");
Console.ReadLine();
}
//else if(userSelection=="no") -- will add more later
}
}
}
I want my program to read a text file character by character and, wherever it finds a double quote ("), add a semicolon before it. For example, we have a paragraph in a text file as follows:
This is a paragraph which conains lots and lots of characters and some
names and dates. My name "Sam" i was born at "12:00" "noon". I live in
"anyplace" .
Now I want the output to be as follows:
This is a paragraph which conains lots and lots of characters and some
names and dates. My name ;"Sam;" i was born at ;"12:00;" ;"noon;". I
live in ;"anyplace;" .
It should open the file using a file stream, read it character by character, and add a semicolon wherever it finds a quote. The output should end up in textbox1.Text.
This is my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
char ch;
int Tchar = 0;
StreamReader reader;
reader = new StreamReader(@"C:\Users\user1\Documents\data.txt");
do
{
ch = (char)reader.Read();
Console.Write(ch);
if (Convert.ToInt32(ch) == 34)
{
Console.Write(@";");
}
Tchar++;
} while (!reader.EndOfStream);
reader.Close();
reader.Dispose();
Console.WriteLine(" ");
Console.WriteLine(Tchar.ToString() + " characters");
Console.ReadLine();
}
}
}
This is the output:
This is a paragraph which conains lots and lots of characters and some
names and dates. My name ";Sam"; i was born at ";12:00"; ";noon";. I
live in ";anyplace"; . 154 characters
I want that semicolon before the quotes. Any help would be appreciated. Thanks!
Swap the order of the operations:
if (Convert.ToInt32(ch) == 34)
{
Console.Write(@";");
}
Console.Write(ch);
i.e. don't write the original character until AFTER you've decided whether or not to output a semicolon.
Try ch = (char)reader.Peek();
This tells you the next character without consuming it. You can then use it to check whether that character is a " and insert a ; accordingly:
if (Convert.ToInt32((char)reader.Peek()) == 34) Console.Write(@";");
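A rough sketch of a full Peek-based loop. It builds the result in a StringBuilder rather than writing to the console, and a StringReader stands in for the StreamReader over the file so the example runs on its own. The PeekDemo class is a made-up name.

```csharp
using System;
using System.IO;
using System.Text;

static class PeekDemo
{
    // Hypothetical helper: copies the input, adding ';' before each '"'.
    public static string PrefixQuotes(TextReader reader)
    {
        var output = new StringBuilder();
        // Peek returns -1 at the end of the input; otherwise it returns
        // the next character without consuming it.
        while (reader.Peek() != -1)
        {
            if (reader.Peek() == '"')
                output.Append(';');             // semicolon goes in first
            output.Append((char)reader.Read()); // then consume the character
        }
        return output.ToString();
    }

    static void Main()
    {
        // StringReader is used here in place of a StreamReader over data.txt
        using (var reader = new StringReader("My name \"Sam\" i was born at \"12:00\" \"noon\"."))
        {
            Console.WriteLine(PrefixQuotes(reader));
        }
    }
}
```

Checking Peek against -1 also avoids relying on EndOfStream, which only exists on StreamReader.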
Do you have to read it in character by character? The following code will do the whole thing as a block and give you a sequence containing all your lines.
var contents = File.ReadAllLines (@"C:\Users\user1\Documents\data.txt")
.Select (l => l.Replace ("\"", ";\""));
using System;
using System.IO;
using System.Text;
namespace getto
{
class Program
{
static void Main(string[] args)
{
var path = @"C:\Users\VASANTH14122018\Desktop\file.v";
string content = File.ReadAllText(path, Encoding.UTF8);
Console.WriteLine(content);
//string helloWorld = "Hello, world!";
foreach(char c in content)
{
Console.WriteLine(c);
}
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
}
I'm currently fiddling with a little project: a Countdown-type game (the TV show).
Currently, the program lets the user pick a vowel or consonant up to a limit of 9 letters and then asks them to input the longest word they can think of using these 9 letters.
I have a large text file acting as a dictionary, which I search using the user's input to check whether the word they entered is valid. My problem is that I then want to search my dictionary for the longest word that can be made from the nine letters, but I just can't seem to find a way to implement it.
So far I've tried putting every word into an array and searching through each element to check whether it contains the letters, but this won't cover me if the longest word that can be made out of the 9 letters is an 8-letter word. Any ideas?
Currently I have this (it sits under the submit button on the form; sorry for not providing code or mentioning earlier that it's a Windows Forms application):
StreamReader textFile = new StreamReader("C:/Eclipse/Personal Projects/Local_Projects/Projects/CountDown/WindowsFormsApplication1/wordlist.txt");
int counter1 = 0;
String letterlist = (txtLetter1.Text + txtLetter2.Text + txtLetter3.Text + txtLetter4.Text + txtLetter5.Text + txtLetter6.Text + txtLetter7.Text + txtLetter8.Text + txtLetter9.Text); // stores the letters into a string
char[] letters = letterlist.ToCharArray(); // reads the letters into a char array
string[] line = File.ReadAllLines("C:/Eclipse/Personal Projects/Local_Projects/Projects/CountDown/WindowsFormsApplication1/wordlist.txt"); // reads every line in the word file into a string array (there is a new word on everyline, and theres 144k words, i assume this will be a big performance hit but i've never done anything like this before so im not sure ?)
line.Any(x => line.Contains(x)); // just playing with linq, i've no idea what im doing though as i've never used before
for (int i = 0; i < line.Length; i++)// a loop that loops for every word in the string array
// if (line.Contains(letters)) //checks if a word contains the letters in the char array(this is where it gets hazy if i went this way, i'd planned on only using words witha letter length > 4, adding any words found to another text file and either finding the longest word then in this text file or keeping a running longest word i.e. while looping i find a word with 7 letters, this is now the longest word, i then go to the next word and it has 8 of our letters, i now set the longest word to this)
counter1++;
if (counter1 > 4)
txtLongest.Text += line + Environment.NewLine;
Mike's code:
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args) {
var letters = args[0];
var wordList = new List<string> { "abcbca", "bca", "def" }; // dictionary
var results = from string word in wordList // treats every word in the dictionary as a separate string
where IsValidAnswer(word, letters) // calls isvalid method
orderby word.Length descending // sorts the word with most letters to top
select word; // selects that word
foreach (var result in results) {
Console.WriteLine(result); // outputs the word
}
}
private static bool IsValidAnswer(string word, string letters) {
foreach (var letter in word) {
if (letters.IndexOf(letter) == -1) { // checks if theres letters in the word
return false;
}
letters = letters.Remove(letters.IndexOf(letter), 1);
}
return true;
}
}
Here's an answer I knocked together in a couple of minutes which should do what you want. As others have said, this problem is complex and so the algorithm is going to be slow. The LINQ query evaluates each string in the dictionary, checking whether the supplied letters can be used to produce said word.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args) {
var letters = args[0];
var wordList = new List<string> { "abcbca", "bca", "def" };
var results = from string word in wordList
where IsValidAnswer(word, letters)
orderby word.Length descending
select word;
foreach (var result in results) {
Console.WriteLine(result);
}
}
private static bool IsValidAnswer(string word, string letters) {
foreach (var letter in word) {
if (letters.IndexOf(letter) == -1) {
return false;
}
letters = letters.Remove(letters.IndexOf(letter), 1);
}
return true;
}
}
So where are you getting stuck? Start with the slow brute-force method and just find all the words that contain all the characters. Then order the words by length to get the longest. If you don't want to return a word that is shorter than the number of characters being sought (which I guess is only an issue if there are duplicate characters???), then add a test and eliminate that case.
I've had some more thoughts about this. I think the way to do it efficiently is by preprocessing the dictionary, ordering the letters in each word in alphabetical order and ordering the words in the list alphabetically too (you'll probably have to use some sort of multimap structure to store the original word and the sorted word).
Once you've done that you can much more efficiently find the words that can be generated from your pool of letters. I'll come back and flesh out an algorithm for doing this later, if someone else doesn't beat me to it.
Step 1: Construct a trie in which each word is stored under its letters sorted alphabetically.
Example: EACH is sorted to ACEH and stored as A->C->E->H->(EACH, ACHE, ..) in the trie (ACHE is an anagram of EACH).
Step 2: Sort the input letters and find the longest word corresponding to that set of letters in the trie.
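A minimal sketch of the sorted-letters idea. To keep it short it swaps the trie for a plain Dictionary keyed by the sorted letters, and it finds candidates by trying every subset of the letter pool (at most 511 subsets for 9 letters), so it is an approximation of the two steps above rather than the trie itself. All names and the tiny word list are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LongestWordFinder
{
    // Sorts the letters of a word: "each" -> "aceh"
    static string SortedKey(string word)
    {
        char[] letters = word.ToLowerInvariant().ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }

    // Step 1: group the dictionary by sorted-letter key (anagrams share a key).
    public static Dictionary<string, List<string>> BuildIndex(IEnumerable<string> words)
    {
        return words.GroupBy(w => SortedKey(w))
                    .ToDictionary(g => g.Key, g => g.ToList());
    }

    // Step 2: try every subset of the sorted pool and keep the longest hits.
    // Only practical for small pools, since there are 2^n subsets.
    public static List<string> FindLongest(Dictionary<string, List<string>> index, string pool)
    {
        string sorted = SortedKey(pool);
        var best = new List<string>();
        int bestLength = 0;

        for (int mask = 1; mask < (1 << sorted.Length); mask++)
        {
            // Picking letters in index order keeps the subset sorted
            var chars = new List<char>();
            for (int i = 0; i < sorted.Length; i++)
                if ((mask & (1 << i)) != 0) chars.Add(sorted[i]);

            if (chars.Count < bestLength) continue; // cannot beat the current best

            if (!index.TryGetValue(new string(chars.ToArray()), out List<string> hits)) continue;

            if (chars.Count > bestLength) { best.Clear(); bestLength = chars.Count; }
            foreach (string w in hits)
                if (!best.Contains(w)) best.Add(w); // duplicate pool letters repeat keys
        }
        return best;
    }

    static void Main()
    {
        var index = BuildIndex(new[] { "each", "ache", "cat", "teacher", "act" });
        Console.WriteLine(string.Join(", ", FindLongest(index, "xhceaqzt")));
    }
}
```

Building the index is a one-off cost per dictionary; each lookup afterwards depends only on the size of the letter pool, not on the 144k words.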
Have you tried implementing something like this? It would be great to see the code you have tried.
string[] strArray = {"ABCDEFG", "HIJKLMNOP"};
string findThisString = "JKL";
int strNumber;
int strIndex = 0;
for (strNumber = 0; strNumber < strArray.Length; strNumber++)
{
strIndex = strArray[strNumber].IndexOf(findThisString);
if (strIndex >= 0)
break;
}
System.Console.WriteLine("String number: {0}\nString index: {1}",
strNumber, strIndex);
This should do the job:
private static void Main()
{
char[] picked_char = {'r', 'a', 'j'};
string[] dictionary = new[] {"rajan", "rajm", "rajnujaman", "rahim", "ranjan"};
var words = dictionary.Where(word => picked_char.All(word.Contains)).OrderByDescending(word => word.Length);
foreach (string needed_words in words)
{
Console.WriteLine(needed_words);
}
}
Output:
rajnujaman
ranjan
rajan
rajm